Managing data

"persistence of data -- commercial services are attractive to faculty, but what if they go away? if something is open and standards based, where the data can be extracted" (Ex 1, 1b-A)

"data flows in so fast, how do you organize it?" (Ex 1, 1b-B)

"Can we contribute open-knowledge content across institutions that we can then use in lieu of purchasing a CD of stuff that my institution must have, but has not assembled for re-use." (Ex 1, 1b-C)

""You can now find things that you never could have found before ... it has revolutionized scholarship" Scholars who have skills to mine that data are coming up with ideas that seem "crazy" to those who lack skills to find and mine resources that were not available a few decades ago." (Ex 1, 1b-C)

"Need for the primary material is enormous: demand increasing faster than the digitized collections can supply. Students want/expect to be able to find 1920s Irish newspapers, and 1950s Chinese census data, etc., etc., instantly available on-line." (Ex 1, 1b-C)

"hope to find better ways to integrate media, embed images, be truly multimedia" (Ex 1, 1b-D)

"in my field drowned in data, need database models, information integration, need to develop an ontology of digital support for humanities." (Ex 1, 1b-D)

"ITunes for my articles. Indexing, referring, recommending for scholarly articles." (Ex 1, 1b-E)

"keeping up with the rate of proliferation of digital data" (Ex 1 group sharing, 1c)

"Separation of communication or archival needs with current solutions." (Ex 1, 1c-A)

"Very clear that once content goes digital, distinction is magnified. Once in digital form, items continue to have meaning, but the meaning may be altered. One can argue that the narrative form was optimized for codex, but in electronic form approaches change. Search, index." (Ex 1, 1c-B)

"Sustainability. How does Mellon foundation see sustainability growing out of PB? What are the external factors affecting sustainability? What needs to be sustainable is not a tool (which is likely to be supplanted), but the artifacts and metadata created with that tool that need to be sustained." (Ex 1, 1c-C)

"Disc space on any given computer is being filled with digital media (not text). What does the "replacement" of text as a primary area of focus mean to humanities scholarship? What tools are available for searching on, annotating, extracting from, publishing, etc. these -- these are at a less-developed stage than tools for text." (Ex 1, 1c-C)

"What are boundaries drawn around a given "object" that both allow and prevent community engagement? How to keep pure but also open enough?" (Ex 1, 1d-B)

"Importance of visual and non-text media" (Ex 1, 1d-G)

"d-space a groundbreaking digital repository system that captures, stores, indexes, preserves, and redistributes an organization's research data" (Ex 1, 1d-H)

"linking different languages. How do you get parallel paragraphs into a usable format. What about copyright issues? are graduate students asked to become programmers and lawyers?" (Ex 1, 1d-H)

"issues related to archiving information in/from multiple media; how to do it better, access to resources that are already in place" (Ex 1, 1d-H)

"One question over past several years---question of Data. Sciences have data sets. Some disciplines in A&H have data, some don't. Scholars taking lots of time making digital data sets so that they can then do something with it." (Day 2, 1a)

"Finding a secure, persistent place for storing resources. Should be like Library of Congress. Minerva project. Academic presses are thinking about this as well." (Ex 2, 1a-B)

"I want to cast a very wide net; this is a basic research question. You keep working your way around databases and explore and find new content and define the space of inquiry. Conference on representation of older women (reference to Slate article on Hilary aging). How do I find materials; start with fragments, what you notice or YouTube. Could also include production and reception history." (Ex 2, 1a-B)

"Store and retrieve personal research; being able to find what you've already digested somewhat and make sense of it" (Ex 2, 1a-C)

"format conversion, mine website, make usable, extend. This is creative in the development of tools and techniques, but the process is actually a data/source material transformation, manipulation, munging" (Ex 2, 1a-C)

"Data set might be original source, secondary data, images, texts, audio files, bibliographies, stories, GIS locations" (Ex 2, 1a-F)

"Data manipulation: be flexible about how I look at / query data in a marked-up data set or across several such sets. Encode and manipulate data, and run queries over it." (Ex 2, 1a-F)

"Preserve digital resources. Notebooks on acid free paper, photo albums are really the most reasonable bet for preservation ... Digital data gets lost." (Ex 2, 1a-F)

"Organize data and information with tools ("we can be messy and let our tools organize")" (Ex 2, 1a-F)

"everyone organizes things in their own idiosyncratic way, which makes it hard to share with others" (Ex 2, 1b-A)

"preservation of data can be the contribution" (Ex 2, 1b-A)

"I'm much better off now than I would have been ten years ago because of ability to find materials in WorldCat or JSTOR. "The field has been leveled considerably" already." (Ex 2, 1b-C)

"With Mellon's support, we've worked on a prototype of a system that integrates all these things, uses TEI DTD used originally for inscriptions, has been extended for lots of ancient witnesses. Providing at the data level a way to deal with that; the next phase is generalizing the tools, having a community base so they don't become obsolete" (Ex 2, 1d-A)

"I can find material online never easily accessible before. I was doing research found a prisoner number, found the prison, sent the information to them, and was able to pay for them to send me a copy of the documents. To be able to get materials rapidly from home rather than traveling, going the archives, etc." (Ex 2, 1d-B)

"Unrestricted or minimally restricted information through sharing of materials, drawing on a wider range of sources and media (including people and places previously not considered authoritative)
technologically mediated approach where the relationship between the user and the raw data is as seamless as possible" (Ex 2, 1d-C)

"Technology is laptop, emacs, and text I'm reading, relatively low tech. On laptop I take notes, stream of consciousness connections, fetish is hyper text note taking. Not data driven, but architecture driven. Systems that model the way we think and the way lit and language should work. That's an underlying theme: frustrating - ought to be a way technology should work, but hasn't helped w/ literary work. Notion of annotation and metadata and becoming formal has numerous degrees. To make it accessible and standards compliant - we've turned to librarians who have rigid standards for metadata. Can write intellectual search machines but has to get more specific. Question of data shoveling is also who is going to define?" (Ex 2, 1d-D)

"Limitation of tools we are provided with. Closed model doesn't make data useful." (Ex 2, 1d-D)

"My relation w/ the institution where I organize other tech guys, archiving our workshop notes, archiving student projects. Was previously done in a closed proprietary system. I'm bringing out into the light. Using Opensource 2.0; I love it. (Drupal) Archiving links to blogs, workshop notes and postmorta and how they went. Past practices and critical evaluation of those practices. Work w/ grad students in instructional design, lots of turnover, rely on institutional memory" (Ex 2, 1d-D)

"We use Blackboard in teaching and learning sites to collect these things. Units were generic depending on the year in cycle (this is 10 yrs), now doing in dozen different teaching fields for 4 yrs. Can invite people in to password-protected sites; want to take material out of that. Graphic references w/ hyperlinks, text. Going back 5 or 6 yrs." (Ex 2, 1d-D)

"refer to "peter's world" spend a lot of time trying to figure out how to get from creating a digital bib. artifact, from something "print" to something "machine actionable", that the stupid pigeons can do something with. how to get from easy to read to easy to program against? whether or not the effort of the transition is worth it. the scholarly practice is about inventing new ways to work with citation (digital modes). embedded in a particular set of subdisciplines (ancient world). - creating citation ("check me")" (Ex 2, 1d-F)

"many universities don't have the resources of (say) harvard for a competent bibliographist. analysis for storage in an archive; looking at metadata so it's useful for unintended uses; preparing material for eventual unknown uses; invention of the "dspace" or fedora products; how to store the information so that it will be there at all (format drift); incorporate discovery of online resources, push to "where students and researchers are" (facebook etc.) rather than library home page as starting point" (Ex 2, 1d-F)

"Process of discovery. Can accomplish high level of granularity online; intersection of scholarly and non-scholarly. What I can find through UVA library plus Google Scholar and Google Books is incredible" (Ex 2, 1d-F)

"friction points? cost? availability? (only so many art history db's out there) fit of products to needs? cost, labor, infrastructure, data migration, "customizable now but need to be sure we can migrate"" (Ex 2, 1d-F)

"have built large database of social science data; 5 year project. for social science scholars, but also for the public. Funding has run out, can't find funding to continue; "on life support" through library. Katz's law, "costs more to maintain a database than to build it." Confronting this a lot. Get funding to build something, scholars come to rely on it, but when funding runs out, it isn't clear how it will be supported over time." (Ex 2, 1d-G)

"Finding primary texts, and secondary materials. How do I do that? Mostly online; sometimes going to library. In Germany, most libraries are closed stacks. Likes open stacks in US; serendipity of finding things; similar experience online. Don't go to particular journal to find articles, but to database." (Ex 2, 1d-G)

"Importance of content export, the exit strategy. History of preservation strategies. Moving stuff between contexts. What about doing something new: connected to OCA, easier to upload. Such a project needs resources LACs don't have. What object? Digital surrogates for physical texts vs born-digital materials. Cost of migration: loss of functionality and/or loss of perspectives. How do we define our objects? Could Bamboo take this on? No. Can Bamboo facilitate communities which engage a data curation conversation towards defining digital objects?" (Ex 2, 1d-H)

"Data Management: Curating, preserving; update/maintain software; geo-references" (Ex 3, 1a-E)

"We need bibliographic file management that fits idiosyncratic workflows" (Ex 3, 1b-C)

"Finding validated or trustworthy primary data on a research subject" (Ex 3, 1b-C)

"To do anything besides paper would be huge amount of work" (Ex 3, 1c-D)

"Difficulty barriers for tools; uncertainty - risk management. I know a sorting tool that'd be ideal, but it would take me ages to code my data and put it into those formats. Think of all people who use e-mails for document management. It's hard to build reliable, intuitive scholarly tools. Developing tools, but not for public - documentation to make it publicly available is time-consuming. Would use Microsoft because you know it, it's easy, will get what you want, even if there's better things out there. 90% of tools built used only by person who built them" (Ex 3, 1c-D)

"sustainability; planning for lifecycle; maintaining access to objects; distinguishing between materials for preservation vs ephemera; planning for persistence" (Ex 3, 1d-B)

"moving from text to database sets as the primary resource of humanities scholarship and it has a far reaching impact" (Ex 3, 1d-C)

"Have you found issues where "if we had done this digitization today, it'd be better/more usable"? It's happened to me. Librarians say "We won't digitize again unless you give us a really good reason why; we're not going to keep doing it - this stuff is fragile." If you're going to do it, do it right." (Ex 3, 1d-D)

"Ideally what we would have from the process is an inductive approach to categories... allowing categories to emerge including uses of the data when we gathered the data. Emergent categorization rather than inductive categories. Taxonomies are incredibly important to not lock everything down." (Ex 3, 1d-G)

"Back to ambiguity and interpretation. Contextual information management. Annotation, or cross-reference. "Every architecture needs to allow for conflicting statements about things"." (Ex 3, 1d-H)

"hard time imagining a system that doesn't have as its apex or subbasement a standard set of terms" (Ex 4, 1b-B)

"even these MD standards and tagging schemes themselves are products of particular communities of scholars that don't cover the waterfront and are "historically contingent products of scholarship that are ultimately ephemeral" so we shouldn't build structure on ultimately ephemeral things" (Ex 4, 1b-B)

"the library world has plenty of thesauri. even a small number of baseline metadata that everyone agrees on, v. broad, is v useful" (Ex 4, 1b-B)

"global schema to which local heterogeneous schemas can be mapped. a very conventional old-fashioned database problem. is there an appropriately abstract and universal global schema that would let you map local taxonomies to a more general tax." (Ex 4, 1b-B)

"they have this problem too in ling; there is one ontology becoming more and more accepted. the task of the researcher isn't to reinvent ideology but to provide a mapping" (Ex 4, 1b-B)

"has a vague image of foraging digitally and being able to see other people's taxonomic structures transparently associated with a given item" (Ex 4, 1b-B)

"foraging combines discovery and serendipity; that kind of investigation can inform both linear and untraditional forms of understanding" (Ex 4, 1b-B)

"he likes the verb "foraging"; it applies on different levels to the cognitive process. He produces CDs, self-sustaining projects; producing stuff with public television, etc. Foraging is artistic process and also idea-finding and -incubation. Librarian at large state school, runs institutional repository; foraging provides a way for her to find value in the weird stuff she has in her repository. Her angle on producing nontraditional materials is sustainability; she's involved in the ebook movement. Foraging as a practice guided by a specific theoretical frame? opposed to a more structured search process...? It's searching, discovering more broadly construed. There's the way you're "supposed" to do it and then there's the other stuff" (Ex 4, 1b-B)

"foraging produces research that doesn't fall into already-researched paradigms, so that can be the object of the foraging" (Ex 4, 1b-B)

"Engage users in folksonomic tagging, giving meaning to a scholarly object, identifying the value or significance of a scholarly object. Engaging people in disambiguation or correction of non-automatable data. Might look to a member of the public like "playing a game". Might not be so interesting to people ... must distinguish between scholarly and general-public communities" (Ex 4, 1b-C)

"conference experience of nobody presenting on what are their research tools, but presenting new tools that are never followed up upon. Couple years later, it's all washed away." (Ex 4, 1c-A)

"What are the priorities in digitizing? Technology can create extra tasks which are gratuitous from the point of research. Type-setting as an example." (Ex 4, 1c-B)

"Resources may sit on shelves if in the wrong form. Requires a funding bid to be put in the right form. Scholar is required to demonstrate a scholarly need in order to transfer/change formats." (Ex 4, 1c-B)

"Highly involved with presentation and curation. Do scholars preserve? Or is it done for them because of the scale?" (Ex 4, 1c-B)

"One may have to convert formats - standardize, via having things digitized. Some early audio/video can simply not be played and need to be digitally redone/recreated. A/V formats which go out of date need to be digitized and updated more often than print sources." (Ex 4, 1c-B)

"What are the current practices that might be facilitated by new technology? Provide alternate means of access to rare collections through technology and visualization. Possibility of virtualizing some data sets and collections can play a key role in humanities scholarship. Much art scholarship can be done with prints, as one day one might be able to do architecture using visual models. There still may be value in visiting the original. Using library, using surrogates (like Google). Cultural restitution and reunification of artifacts. Visiting physical artifacts is a crucial part of research. Assemblages of sensitive artifacts can happen virtually." (Ex 4, 1c-B)

"Tendency of technologically mediated sources > more bite-sized pieces. When talking about issues of research process, this kind of issue becomes important re: workflow. Important to be able to share representation you're using. Not implicit in the same way. Citing electronic resources differently; move around differently" (Ex 4, 1c-D)

"One of the things for scholarship: control in some sense for interactive technology support. You don't get context on-line > Need to see page in relation to ads, juxtaposed with other articles. Allows new opportunities to juxtapose things that physical version can't." (Ex 4, 1c-D)

"Archives and communities are not separable. We should be building tools to help use these materials." (Ex 4, 1d-A)

"My major critique of Bamboo is that it doesn't sufficiently emphasize the diverse and multiple uses of tools and material, and the ability to pull different sets of data captured for different purposes into single sites for new insight." (Ex 4, 1d-A)

"the ancillary information and data that are made on the way toward publication" (Ex 4, 1d-C)

"the more interdisciplinary, the more likely the need to go to multiple sources to review the work of others and to determine if the problem/question has already been solved" (Ex 4, 1d-C)

"Decision of a librarian to buy a book is a filter. Tools to decide which digital resources to use (reviews) don't exist the way they do in print. Bamboo could serve as a sort of gatekeeper; out of fashion but useful. Example of Bryn Mawr Classical Review inviting submissions of classical work for e-mail review" (Ex 4, 1d-E)

"here's a cost issue. with print, cost is about lighting, air conditioning, shelf space, circulation. in digital, costs have moved. equipment, staff skills. Is digital cost comparable, higher, lower? Cost is similar but types are different. costs of maintaining access to the electronic; things break. print stays put unless stolen. Stability has value (print). digital can break on no notice. social stability of citation to print vs. references to digital resources which change or move. social effects: print has a certain authority. author, press, costs to produce don't have clear digital analogs. digital dramatically different from print in important ways. not sure how stability translates into digital. comparison of print revolution to digital revolution: less technological than social. scholarly apparatus, not a technical apparatus. Shared agreements for what one will do with digital works once completed." (Ex 4, 1d-E)

"How scholars manage their data sets? If data sets are part of what scholars do in the future, this is relevant. How creating/tagging data to allow the research we want to happen is a scholarly practice" (Ex 4, 1d-E)

"Your tool over time would teach you to rely on certain scholars; also be able to find their comments. Folksonomies where anyone can comment > how useful is that? You might want to block comments in Chinese, by certain scholars, before this date, etc." (Ex 4, 1d-E)

"Internet is no longer usable after a certain point for my research. Can get some kind of bibliography, but then you have to read articles/footnotes, constantly go back to originals, which aren't available on-line. These are temporary measures at this time, a first step. But what if everything has been digitized? What tools will you need for that? At that point, have to refine what you're doing, ask more interesting questions. Interesting if you develop techniques where you can sense that there's a gap of stuff that needs to be brought on-line sooner rather than later. Start down these roads learning new ways to do it; leads to machine intuition, etc." (Ex 4, 1d-E)

"As you get more fine-grained, is this text the same as that one? Two versions created sequentially? Etc. In library community, we've tried to classify intellectual works > expressions > instances, etc. Naming those relationships is another practice within that. In Germany, they do a lot of editing. Using technology to show various levels in the textual structure. First version, all the way up, can click back and forth > this is going on for earlier periods" (Ex 4, 1d-E)

"can db access be done in a standardized way? valley of the shadow: 1990s presentation, not interactive particularly, outdated, could be done better now. images etc. independent of indexes. would be nice if humanists had a space where all that was preserved, could be revisited, rebuilt, recreated. amazon S3 model: content-free infrastructure that humanists could fill with content." (Ex 4, 1d-F)

"who will be responsible for all ways all these media will be joined and made accessible? library might have different baskets for jpegs, tiffs, etc. library can preserve the object, who indexes for access. in SOA that would be done on a case by case basis, library not responsible. isn't that the problem? if goal is for re-use, maintaining at faculty level... objects maintained in library or such, mashups would be handled by consortia." (Ex 4, 1d-F)

"Tools and services that can grow and resift. Planned growth. A built-in assumption that it's an imperfect description. Need checks to make sure you haven't left something out. Automatic classification of the '70s. How do you let a machine help you define something? Ambiguity could be developmental---evolution." (Ex 4, 1d-F)

"big question but small part of it: somebody (libraries) should provide a service that allows lowering of threshold bar for making repositories (huh?) example of access to nyu lib. architecture built in abstract ways. built for afghan libraries, now looking at using for collections of images. successful consortial efforts: JSTOR, ArtSTOR, interface not what i'd want but saves our library a lot of pain. can these be seen as analogs at a different scale for collaboration facilitation? jstor, artstor... what other stors? there's not a lot of humanistic methodology embedded in jstor. a title is a title." (Ex 4, 1d-F)

"Extended metadata - what is that? What do people need to know about these materials? Key decision. Hard to go back and change that decision. Key moment to define your ontologies." (Ex 4, 1d-G)

"metadata story about people adding terms offensively (not deliberately)" (Ex 4, 1d-H)

"we aren't talking about changing the basic scholarly practices, we're talking about refining it and incorporating new things. convergence: we began talking about data sets, tagging, metadata, and getting better at that, getting consistency, and now we've come back to that cluster of subjects" (Ex 5, 1b-B)

"we would like to hang on to everything for posterity, but that sticks posterity with the problem of picking out what's valuable. letting a thousand flowers bloom vs. existence of evaluative criteria. so in the digital world there ought to be clear ways of evaluating and sharing evaluations" (Ex 5, 1b-B)

""Linguists talk about archiving stuff, when they mean slapping it up on a website, and that's not archiving, in my view." "Tell it, brother! Testify!"" (Ex 5, 1b-B)

"aesthetic of just-good-enough. better to do something badly than not at all" (Ex 5, 1b-B)

"University presses need to step up to this challenge. They're going down... and they need to create new ways of letting repositories be recognized." (Ex 5, 1b-E)

"Field linguists go out and collect language, collect lexicons. The dissertation is given importance, and there's a mass of related information that never gets published, that is put in a box. But it takes a lot to put it in a box (assuming someone will take it). The discipline also needs to recognize the value of this primary data. Right now, only traditional forms of publication are given sufficient value. As a result, not much effort is put into preserving this." (Ex 5, 1b-E)

"Isn't it about tagging and bagging? Bagging into conceptual bins, maybe one object in multiple bins. Tagging w/ something that distinguishes in those bins, or strikes you, something that's a mnemonic. This would really work for hybrid system; whether you're in research library looking at document you can't take away, or have digital file - if there's enough of that tagging to get you back to it, and can put into multiple categories... If you could set up some kind of paradigm saying these are important categories and tags, these are kind of what Windows does in Word, it's slightly inefficient but it wouldn't take care of hybridity but would give you access. Once I have those photocopy pages, there's a physical process of juxtaposing pages. You put the materials together, you could do it on the screen, but the problem is hybridity of documents. More than bagging and tagging, also ordering. Also difference where while you're doing project work, they do take an awful lot of time to do (workflows, databases, ways for people to manage information) but it's different when you're doing your own thing" (Ex 5, 1c-D)

"If you have 3000, 10,000 files on your computer but nothing but Word to access, you need something else, is very ad hoc, different for everyone" (Ex 5, 1c-D)

"Personalization in scholarly context - more than the ability to annotate and search on annotation and organize by it? I can imagine an argument that all I need in scholarly context is to store materials I have-- as long as I can link back, I don't care where they are-- but the environment notion" (Ex 5, 1c-D)

"Once your materials come from different sources, then the question of putting them in comparable format does become a burden. Lots of things can be digital, but what good does it do you if you have PDF's of images, then HTML from other things, and then you might as well treat them as analog" (Ex 5, 1c-D)

"I might tend to do more with images, because then I know I might be using those images for other sorts of things in a readily available form. If there's things likely to use in research > digitization. Students might want to use them, I might for PowerPoint - potential for multi-use" (Ex 5, 1c-D)

"A fair number of film annotation projects. If anything, there's a lot of redundancy. No convergence yet around any project" (Ex 5, 1c-D)

"Lots of issues re: different formats for recorded versions. With text, at some level you can move from one platform to another. Much harder for video. YouTube doesn't cut it for scholarly analysis of most performances" (Ex 5, 1c-D)

"Problem of one-off technology > You're not likely to curate beyond immediate needs of project" (Ex 5, 1c-D)

"We haven't had new hires lately. New grad students use blog like environments to relate, communicate, think about their content. Relational links. One place where everything is. Text, images, other modalities. Virtual desk. RSS feeds, links to other sites they find interesting. Totally different from 'my' practice. I have different sites. Stacks of printed articles. I don't like to read online. They are much more comfortable reading online. I use a lot of paper." (Ex 6a, 1a-C)

"Younger scholars are interested and expect access to primary data. We didn't have access, it's now expected. They want to trace arguments through data. They are willing to make data available even if not published. Maybe in a decade's time that these objects will be 'published' contributions." (Ex 6a, 1a-C)

"Persistence of older technologies is something Bamboo should take into account. Realizes she needs to attend to both digital and non-digital archival concerns." (Ex 6a, 1b-B)

"One way in which the field becomes boundless is in connections to everything else. Is that part of the process? Yes, something about getting the failure you'd expect." (Ex 6a, 1c-A)

"People used to have mentality of as things came across their desk, they'd decide keep/not keep. Now people want to keep everything so they can later do whatever. Need metadata, huge storage department > people want huge e-mail quotas" (Ex 6a, 1c-C)

"Grad students expect digital surrogates of print forms provide enhanced access (allow them to do more). Is added value seen as coming along with the surrogates themselves, or that tools can be brought to bear on the surrogates? Will future scholars see tools and bodies of material as closely linked? Tools can be brought to bear on them, but the surrogates have to be structured in a particular way. Grad students don't care how it happens. If it's digital, there's an assumption we can read it anytime, anywhere. Internet Archive, we've submitted things including an enormous pdf file we can't figure out how to do anything with. An enormous pdf file isn't helpful, but having tools to dissect and analyze it, give enhanced access would make it useful. You end up reading/understanding/remembering it differently in digital format. They want to do different things, take scholarship in a different direction. Pushing the boundaries. They're looking to make their mark, have their work build upon others', hope that technology will help them do that, and if technology doesn't, they don't care so much. I'm a graduate student, I don't know what others are finding, but I value the ability to do non-traditional projects in a digital medium, exploring digital media for substantial scholarly use" (Ex 6a, 1d-D)

"Persistent trusted storage" (Ex 6b, 1a-G)

"Where do we put data & back it up. Trusted repository." (Ex 6b, 1a-G)

"Graduate student online networked, with server space," (Ex 6b, 1b-C)

"I want several full-time people to organize all of my information" (Ex 6b, 1c-A)

"We have a problem of data management? Who does what? Everybody is inventing everything in parallel." (Ex 7, 1a-A)

"conversion, versioning, repository archive services, metadata schema harmonizing, presentation and visualization, licensing, and citation" (Ex 7 flipcharts, 1a-C)

"sustainability, preservation, maintenance, succession ('estate planning'), migration" (Ex 7, 1a-D)

"Clearly need a foundation piece to get more resources (digital) available. So many things are resources we don't even know they're valuable yet. Trying to solve digitizing everything in this consortium won't work. What about the next layer, of the stuff that's being produced: i.e., the standards of what's being produced." (Ex 7, 1a-E)

"Focus on special collections & archives; categorical & systematic ways of collecting data" (Ex 7, 1a-E)

"Perhaps Bamboo should concentrate not on tech and widgets, but on the tools for collaboration: e.g., shared repositories, etc." (Ex 7, 1a-H)

"Faculty have data sets: how to preserve them." (Ex 7, 1a-H)

"question about finding collections elsewhere, but can't get as digital? sort of. Often exist, but may just be images and need transcriptions. Digitization is a very common theme for them. advocates a lot for digitizing non-book materials, especially Goog and MSFT are doing so much in book space. Can Bamboo help with this?" (Ex 7, 1a-H)

"Bamboo might have a place if could provide services that make it easier to digitize/transcribe materials." (Ex 7, 1a-H)

"Faculty have data sets: how to preserve them." (Ex 7, 1a-H)

"services around data sets: munging tools, versioning, citation, etc. there are many old data sets around, dead or half-dead. Need tools to resuscitate these. ..and a lawyer to figure out ownership and permissions. often just get a screen scrape. Need better tools. moving from punch cards to something modern. Bamboo as a virtualization space for data sets and assoc. services." (Ex 7, 1a-H)

"Access to assets: standards based repository that can be federated; aggregation of data for discovery & re-use" (Ex 7, 1b-B)

"A volume of massive amounts of data: how do you look at that, get a grip on it, free it up so you can analyze it in a way that's valuable. It's agreeing on common practices that will facilitate other zappy things later. People most passionate about digitizing shouldn't have to be involved in the technology to distribute it when its' done - no sustainable incentives to do right thing with those resources" (Ex 7, 1c-C)

"Various strands - you can't digitize everything. But you can get people in touch w/ the state of play: what's available, who's doing what. It's a hybrid working environment. You work with what you've got, the value of this to me would be the communication and a place you can go or can come to you, on your homepage, to see what's going on. Form a nice synergy in the longer term; but how you balance in the early days is a problem. Marginal benefit - won't have long tail because funders get exhausted, passion only fills in a little gap" (Ex 7, 1c-C)

"In terms of repositories, institutions recognize these are their assets and might not want to hand them over. Notion of substrate of interoperable services that could guide to... Institution would participate by plugging in, not giving it to Bamboo." (Ex 7, 1d-B)

"how many have read acls report for cyberinfrastructure? found striking: isn't this what libraries (e.g. at harvard) already do? maybe we should link all these libraries together, given that we rent access to datasets. good for institutions with lots of funds. maybe bamboo could analyze carefully what infrastructure would work for the poor as well as the rich. one way we move from paper to digital (maps, e.g.) is by scanning, but at $20-30/sheet, hundreds of sheets, adds up. need to know who else has already done this. think of slide libraries already moving to scanned images, separately, when costs of moving slides into database with metadata is quite high." (Ex 7, 1d-E)

"business of bringing resources to poor institutions - rationale is we have colleagues at professionally remote organizations who are underperforming due to poor access to resources. pushing resources out benefits whole field." (Ex 7, 1d-E)

"a lot of repetition of scanning at different insts. has to do with copyright, unique artifacts. we've been talking about this for years. somebody needs to be organizing, promoting sharing, seeing how far fair use can be taken." (Ex 7, 1d-E)

"any cyber infrastructure must give access to content" (Ex 7, 1d-E)

"we're forced to work together because the digital is ubiquitous but must be highly structured. ubiquity does not mean independence. must be thought out in advance. "fascist" program of commonality, uniformity. potential objection: in doing this we are blinding ourselves, limiting ourselves... if we could all be google apps in the big google world... i don't think this precludes idiosyncracy" (Ex 7, 1d-E)

"How do you manage the information flow you have to deal with as a scholar? Pressure to keep up." (W2, Scholarly Networks, group notes)

"Recent research in the field of evolutionary and developmental biology have challenged conventional notions of the gene as a code script. This project intends to collect citations of important papers, copies of the original papers, video interviews of researchers, material artifacts, and written stories from those involved to document this change. The project will work as a online database giving access to materials, a workspace for researchers to tag and collect materials, an exhibit space where invited scholars put together exhibits, and an online means to gather stories participants. Currently the prooject is using Zotero as the citation management system and will then use Omeka as the database/exhibition/collection space." (SN-0035 Extended Development Project- Online Database on the History of Evolutionary and Developmental Biology, Phillip Thurtle, 12/23/08)

"TASK: Managing and integrating digital images of archival / special collections materials with secondary sources and data analysis tools to support the interpretation process.
AUDIENCE: Historians, any other scholars who work with primary texts (e.g., English, Comparative Literature, Religious Studies, Languages, American Culture, etc.)
WHAT: Research using primary sources entails research visits or remotely contacting multiple repositories and special collections to collect materials about a person, activity, or subject. Now, many archives and special collections scan photographs or documents and deliver these to researchers electronically. However, researchers have no good means of managing these images or ones they have downloaded from existing online projects, such as American Memory. In particular, there is no software available to help integrate the primary sources with secondary sources, personal or research group notes, transcriptions, or other applications, such as GIS etc. to facilitate the interpretation process.
HOW: For example, if I am interested in a specific Civil War battle, I may collect hundreds of diary entries, documents, and letters about the event from participants, their family members, and government archives. I need to manage these scanned images on my server and integrate them with secondary sources (citations and perhaps even the full books out of copyright that are freely available). I also want to map the materials using GIS software: where was the soldier on the battlefield, where was he from (community/city/state), and can I integrate census data about the socio-economic status of the locality? Many of the diary entries are difficult to read, so my research group has transcribed portions, and I want to view these side by side with the originals. All this data manipulation is needed before I can really begin any interpretation. In short, I need to establish a context for my subject and that involves triangulating information from multiple places in order to gain new insights and generate new knowledge. The big X here is amassing, organizing, and preparing these data; the Y would be the actual analysis.
HELPS: There are currently no standard applications out there so different scholars use personalized and idiosyncratic solutions that are not sharable – and this complicates subsequent re-use or sharing (and definitely collaboration). There are currently discrete applications that enable some types of management: Zotero, Treepad, Transana, Google maps, A.nnotate; but they only do pieces of this puzzle. The multiple application approach also means that one must do multiple searches to get information and there is no interoperability for data exchange. Lots of duplicative data entry to maintain consistency across these data.
NEED: The technology I have in mind would not only be able to integrate information in different formats from different sources (with metadata) but also be able to search across the different types of information and help make new connections.
PLAYERS: Creating and maintaining this resource would require a variety of institutional players. Librarians, archivists, and curators would be data providers, so they would have to provide interoperable data in digital formats. There is a large programming role here for computer scientists to integrate existing tools (e.g., Zotero, A.nnotate), and academic technologists would have to support this program." (SN-0018 Plutarch - Portal for Learning and UndersTanding ARCHival sources)
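The NEED described above, one search across sources of different types that each carry their own metadata, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Item` record, the `search` function, the item kinds, and the sample data (including the place name "Example Hill") are hypothetical and do not come from the narrative or from any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One research object of any type (hypothetical record layout)."""
    kind: str                      # e.g. "scan", "transcription", "citation", "note"
    title: str
    text: str = ""                 # transcribed or free text, if any
    metadata: dict = field(default_factory=dict)


def search(items, query):
    """Return items whose title, text, or metadata values contain the query."""
    q = query.lower()
    hits = []
    for item in items:
        # Flatten every searchable field into one string per item.
        haystack = " ".join(
            [item.title, item.text, *(str(v) for v in item.metadata.values())]
        )
        if q in haystack.lower():
            hits.append(item)
    return hits


# Invented sample collection mixing four kinds of material.
collection = [
    Item("scan", "Diary page 12",
         metadata={"place": "Example Hill", "file": "diary_012.tif"}),
    Item("transcription", "Diary page 12 (transcribed)",
         text="We marched toward Example Hill at dawn."),
    Item("citation", "A secondary source on the campaign",
         metadata={"topic": "Example Hill campaign"}),
    Item("note", "Census lookup",
         text="Check census records for the soldier's home county."),
]

hits = search(collection, "example hill")
kinds = [item.kind for item in hits]   # kinds of material that matched
```

A single query here reaches the scanned image (through its metadata), the transcription (through its text), and the citation, which is the cross-type behaviour the narrative asks for. A real system would also need shared identifiers and metadata mappings between tools, which this sketch does not attempt.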

"It is indeed very difficult to use for performing art digital content cataloguing the traditional library identification and classification models, since vastly different interrelated types of media are present in performing arts works - video, audio, images, drawings, blueprints, texts, music scores, lyrics, 3D models, etc. Thus, there is a pressing needs to define an integrated specific metadata model and standard for performing art digital content and works. Moreover, oftentimes Performing Arts archives were born as simple repositories of recordings without effective links to digital content (this is especially true of company archives). Performing art digital content is in the 90% of case comprised of unique copies of content since each experience is unique and rarely replicated in other locations. The integration of them in a unique European Digital Library would be of great value for the valorisation of the European Culture and will be a great service for all the Performing Arts institutions. This lack of a systematic cataloguing is another reason why both users and participating institutions would greatly benefit from improved interoperability and organisation - in the form of a common metadata standard and of easy search and access." (SN-0032 E-PALS- European Performing Arts Library Association, Raffaella Santucci, 1/7/09)

"Fruition of performing-arts content in libraries is difficult:
(i) several formats and metadata types are used to store and catalogue films, tapes, images, texts, scores produced in many languages, which makes search and access cumbersome and expensive;
(ii) for many reasons, performing-arts archives are scattered around the EU and mutually disjointed;
(iii) scattered content should be bundled together in order to achieve the added value for the user that comes from accessing a single large online repository;
(iv) performing archives need to have a proven model to work together.
E-PALS is conceived as a best-practice network for solving these problems, providing:
(1) strongly interoperable tools for automated gathering, preparation and posting of content and metadata, connecting several performing-arts libraries in a network;
(2) thousands of hours of content of outstanding importance (including Nobel winner Dario Fo), available online for the first time;
(3) highly functional, intuitive interface satisfying user needs;
(4) MPEG21-based solutions for the integration of content and metadata, compatible with a large number of metadata models and content formats;
(5) best-practice guidelines for putting content on the EDL, defining multilingual metadata, mapping metadata, and IPR/DRM;
(6) integration of user contributions for tagging and titling;
(7) a self-sustainable framework for content distribution to extend collections, based on offering copyright owners unwilling to grant free access the possibility to exploit commercially the items that they will provide - this is done with the view to enlarge the choice of content given with materials that wouldn't be available otherwise, not even commercially." (SN-0032 E-PALS- European Performing Arts Library Association, Raffaella Santucci, 1/7/09)

"The EVIA Digital Archive Project is a collaborative effort to establish a digital archive of ethnographic video recordings and an infrastructure of tools and systems supporting scholars in the ethnographic disciplines. With a special focus on the fields of ethnomusicology, folklore, anthropology, and dance ethnology, the project has developed a set of tools and systems for use by scholars and instructors, as well as librarians and archivists. Since its inception in 2001, the archive has been built through funding by the Andrew W. Mellon Foundation, Indiana University, the University of Michigan, and the collaborative efforts of ethnomusicologists, archivists, librarians, technologists, and legal experts. The primary mission of the EVIA Project has been to preserve ethnographic field video created by scholars as part of their research. Its secondary mission is to make those materials available in conjunction with rich descriptive annotations that create a unique resource for scholars, instructors, and students. Project staff and contributors have created a support system and a suite of software tools for video annotation, online collection searching, controlled vocabulary and thesaurus maintenance, peer review, and technical metadata collection. The EVIA Digital Archive Project Summer Institute provides the principal channel for individuals depositing ethnographic materials in the Archive. The Summer Institutes are a wonderful opportunity for depositors to spend two weeks on the campus of Indiana University focusing on their own work without the distraction of every-day commitments. During the Summer Institutes of 2004, 2006 and 2008, EVIA depositors spent their days writing descriptive and analytical annotations time-coded to digital video images through the use of an EVIA Project software interface developed specifically for the project, called "Annotators Workbench." 
(SN-0055 Video Preservation, Annotation and Publishing for the Arts and Humanities - The EVIA Digital Archive Project, Alan Burdette)

"Tracing DNA in parchment to trace movement across medieval systems of commerce. Lots of interest in things in the humanities. When we deal w/ things in digital representation, have problems of re-presentation in digital. These come up a lot in digital humanities. High level of abstraction - want to engage in prosaic level. With text objects as example, but any of the problems are ones you'd encounter in your own flavor in other media. For text, there's TEI - maintains guidelines for literary and linguistic text encoding. International/interdisciplinary standards used by widely and inconsistently used by libraries, museums and publishers." (W3, Perspectives: Content, John Unsworth, Dean, Graduate School of Library and Information Science, University of Illinois)

""Irtnog efsitz". Comes from a short piece that EB White put in New Yorker, November 30, 1935. Arguing that now we have reviews, soon we'll need reviews of review. Until there's one word that would summarize everything that happened on one day. When one tries to be totally inclusive and concise everything into one thing, you get something that could be an accurate summary but not terribly useful. Awful lot of degrees of freedom in tech opportunities. If we try to capture all of it and embrace it all, we'll end up with this." (W3, Perspectives: Information Technology, Greg Jackson)

"Very broadly speaking, the Bidwern project is concerned with digital and environmental mapping. In various ways it seeks to digitally record and preserve aspects of what is happening across the spectrum of these new endeavours on the Plateau with a view to dissemination and use now and in the future. Processes include observational use of digital video, language, oral history and music recording, and GIS mapping of rock-art and other sites. Of great importance too is the assembly and collation of many hours of recordings, thousands of photographs, texts, maps and so on that have been created in the past but which have not been brought together as a documented, inter-related corpus of information.
A key challenge for the Bidwern project stems from the diversity of digital information involved - both in terms of media formats and, more particularly, in the nature of the content of information deriving from so many disparate sources and different disciplines. Although some, indeed much, information can be mapped geospatially, the project also needs a broader sense of "mapping" by which information will ultimately be able to be retrieved and analysed in new ways unforeseen by any one of us entering resources from the standpoint of our own interests. Of absolute importance is the need for the Bidwern project and its outcomes to belong to the rock country, in a way to be a part of its future. If it isn't used by, and useful to, all the people who care for this area - both now and in years to come - there will not have been much point." (SN-0030 The Bidwern Project - Digital recording and archiving on the western Arnhem Land Plateau, Kim McKenzie, 2/22/09)

"Questions of scale are raised in Gregory Crane's 'memography' (Scholarly narrative 0033), where I find myself in a very familiar disciplinary terrain of Nachleben or the survival of ancient literature. Trace Plato through myriads of pages of European writing, perhaps the 50 billion words printed annually in American newspapers of the 19th century. Here we find the application to scholarly agendas in the humanities of techniques that were developed in the very different domain of information retrieval such as
1. named entity analysis (telling one Plato apart from another)
2. sequence analysis that lets you identify quotations
3. collocation techniques that let you identify semantic clusters associated with particular people or topics
4. machine translation
5. visualization routines
Crane's interest in multilinguality is very much rooted in his experience as a classicist. In the humanities, Classics remains a distinctly cosmopolitan discipline, with a backlog of a multilingual secondary literature that is likely to remain relevant for decades to come. Projects of this kind point towards the utility of cross-national funding efforts, for which there are precedents in cooperative ventures between JISC and NEH or the NEH and the German DFG." (Tools & Content Partners working group, Analyzing Scholarly Narratives, Martin Mueller, 3/27/09)
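Of the techniques listed above, "sequence analysis that lets you identify quotations" is the easiest to illustrate in a few lines. The following Python sketch is one simple way to approximate it, by finding word sequences (n-grams) that two texts share. It is an assumption for illustration, not the method used in Crane's project. The function names and both sample texts are invented.

```python
import re


def word_ngrams(text, n=4):
    """Return the set of lowercase word n-grams in text (digits are ignored)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(source, candidate, n=4):
    """Return the n-word sequences found in both texts, sorted alphabetically."""
    common = word_ngrams(source, n) & word_ngrams(candidate, n)
    return sorted(" ".join(gram) for gram in common)


# Invented sample texts: a source sentence and a later text that quotes part of it.
source = "The unexamined life is not worth living for a human being."
candidate = ("As the editor wrote in 1885, the unexamined life is not "
             "worth living, and our readers would do well to remember it.")

matches = shared_passages(source, candidate)
for passage in matches:
    print(passage)
```

Overlapping matches such as these would normally be merged into one longer quotation, and work at the scale described (billions of words) would need indexing rather than pairwise comparison. The sketch shows only the core idea of matching shared sequences.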
