Availability of data

"I can find material online never easily accessible before. I was doing research found a prisoner number, found the prison, sent the information to them, and was able to pay for them to send me a copy of the documents. To be able to get materials rapidly from home rather than traveling, going the archives, etc." (Ex 2, 1d-B)

"What are the current practices that might be facilitated by new technology? Provide alternate means of access to rare collections through technology and visualization. Possibility of virtualizing some data sets and collections can play a key role in humanities scholarship. Much art scholarship can be done with prints, as one day one might be able to do architecture using visual models. There still may be value in visiting the original. Using library, using surrogates (like Google). Cultural restitution and reunification of artifacts. Must visit physical artifacts is a crucial part of research. Assemblages of sensitive artifacts can happen virtually." (Ex 4, 1c-B)

"I want a magic room, where I have access to information, everything in the world digitize." (Ex 6b, 1a-A)

"It's in the air. My vision, the future will be that the libraries and museums will be together as repositories of digital content. Michael will have a site pop up on his screen and all the content will be there. The sociological barriers are formidable." (Ex 6b, 1a-C)

"If all the repository information could get from the desktop of scholars. The repositories should not be siloed at any level. The other thing is if we could map publishing onto layers of publishing." (Ex 6b, 1a-C)

"Access total repository of all humanities material that is CURRENT... drafts of articles, conference papers, publications, etc., with citation index that is NOT months or years old ... A "highwire" equivalent of humanities secondary sources ... Like jStor etc. but current, today, and hyperlinked to cited items" (Ex 6b, 1a-F)

"Everything digitized. (We've invested a lot in this.) Not just digitized, digitized well. They want access to everything. People get a lot of mileage out of even things not digitized well." (Ex 6b, 1a-G)

"Everything digitized" (Ex 6b, 1a-G)

"more funding to digitize objects than to create finding aids for physical objects? -- need financial support for metadata, maintenance, discovery tools" (Ex 6b, 1b-A)

"An archivist who is conversant with my discipline. Also the labor to perform the grunt work of digitization." (Ex 6b, 1b-B)

"Be able to afford a digitization platform. $12K per year is too much!" (Ex 6b, 1b-B)

"And fewer barriers to access to collections that are commercially owned." (Ex 6b, 1b-B)

"More stuff digitized without cost-restrictions and with higher quality." (Ex 6b, 1b-C)

"Quick and easy digitization of texts. So that we can offer a glossed texts." (Ex 6b, 1b-D)

"Sustainable ways to gather baseline data about student writing practices. Gathering reservoirs of samples of student data, and finding ways to slice and dice that data. Multimedia makes that proposition even more challenging." (Ex 6b, 1b-D)

"High resolution art photographs, easily accessed. May be there on ArtStor, but not quite sure how?" (Ex 6b, 1c-A)

"Authors insisting that books be published alongside a keychain drive, or CD that is searchable, along with libraries putting more full-texts searchable online. No reason for publishers to not do this. More incentive for publishers to provide service. Perhaps if more authors would ask about this, it seems to work with publishers. Publishers find that this increases hard copy sales, since most people don't want to read full books online. Publishers can be more agreeable if authors would pursue this. It is a changing landscape, a bit like music. Most musics are now freely available digitally, so publishers may change their attitude as they're forced to. This happens in music with LPs being packaged along keychain drives. The good thing about digital releases is that they can consistently be edited if there are typos. Not preserving mistakes - F15" (Ex 6b, 1c-A)

"Wikipedia effect for mapping, or art images which would somehow be instantly available" (Ex 6b, 1c-A)

"Digitization. Someone please convert files, put them into a database in a way that's flexible. I use analog audio and 35mm, so digitization is essential but a lot of work. It's easier right now to flip through 3-ring binders, but this "data" is not as persistent. I need a useful retrieval database that's as easy to use. I want the technology option to happen in the same time as my analog -- loading a 40-slide carousel takes time T, and so should the digital alternative, while still providing accessibility, reproducibility, and other advantages of digital." (Ex 6b, 1c-B)

"more resources for digitization. Not duplicating what others are doing" (Ex 6b, 1c-C)

""Free information", but when you look at free vs. paid, services are not cheap - everything seems to be going towards the worst" (Ex 6b, 1c-C)

" image scans that auto-update to new resolution cameras can produce. Magic wand - digitzation of text and OCR problem, would be useful" (Ex 6b, 1c-C)

"What about preservation of diverse materials? That's the real magic wand. Assurance of preservation and migration, and have the resources to do it. (J1) In the library world we have large collaborative and national initiatives and organizations with scattered missions, but some entity and collection of entities with a focused mission on preservation and migration needs to emerge. There are LC and Mellon funded programs in this area - JSTOR, Portico, etc." (Ex 6b, 1d-A)

"Bottom layer of that layer cake is content, research is based around some initial matter. One major impediment is scholars have less access now to stuff that might be of research interest. Worse at capturing and making accessible the world's content output. I'd find infrastructures that'd get us out into the archives of the world" (Ex 6b, 1d-C)

"Then there's delivery, how do you access it? Easier remote access. Not always possible without traveling to somewhere to use it." (Ex 6b, 1d-C)

"access those things no matter what institution i'm at, regardless of location, resources" (Ex 6b, 1d-D)

"would like that for film studies. tap into every film made, copyright poses a problem for this. MIT open courseware can't make films available because of copyright. people doing research on youtube, not the method we'd like but it's available. need to extend availability, adjust IP law" (Ex 6b, 1d-D)

"conference in may on supplementary materials for journal articles, talk of trying to get data sets out of cancer researchers, faced resistance about turning over data sets. journals should require turnover of datasets for publication. code of best practices proposed. as editor, do i have to edit datasets now? "dataverse" solves this for journals. where to put datasets, versions them." (Ex 6b, 1d-D)

"Intellectual Property -- copyright laws with shorter more rational lengths to allow commercially published material, passed the Orphaned Works Law" (Ex 6b, 1d-E)

""every object need to be accessed would be represented in the open scholarly information universe to a persistent surrogate" -- this will transform citation, ubiquitous infrastructure and record keeping" (Ex 6b, 1d-E)

"better, faster, stronger, cheaper way of getting physical object digitized, especially handwriting recognition; OSR - optical SCRIPT recognition" (Ex 6b, 1d-E)

"Clearly need a foundation piece to get more resources (digital) available. So many things are resources we don't even know they're valuable yet. Trying to solve digitizing everything in this consortium won't work. What about the next layer, of the stuff that's being produced: ie. The standards of what's being produced." (Ex 7, 1a-E)

"Focus on special collections & archives; categorical & systematic ways of collecting data" (Ex 7, 1a-E)

"question about finding collections elsewhere, but can't get as digital? sort of. Often exist, but may just be images and need transcriptions. Digitization is a very common theme for them. advocates a lot for digitizing non-book materials, especially Google and MSFT are doing so much in book space. Cal Bamboo help with this." (Ex 7, 1a-H)

"Access to assets: standards based repository that can be federated; aggregation of data for discovery & re-use" (Ex 7, 1b-B)

"In terms of repositories, institutions recognize these are their assets and might not want to hand them over. Notion of substrate of interoperable services that could guide to... Institution would participate by plugging in, not giving it to Bamboo." (Ex 7, 1d-B)

"how many have read acls report for cyberinfrastructure? found striking: isn't this what libraries (e.g. at harvard) already do? maybe we should link all these libraries together, given that we rent access to datasets. good for institutions with lots of funds. maybe bamboo could analyze carefully what infrastructure would work for the poor as well as the rich. one way we move from paper to digital (maps, e.g.) is by scanning, but at $20-30/sheet, hundreds of sheets, adds up. need to know who else has already done this. think of slide libraries already moving to scanned images, separately, when costs of moving slides into database with metadata is quite high." (Ex 7, 1d-E)

"any cyber infrastructure must give access to content" (Ex 7, 1d-E)

"a lot of repetition of scanning at different insts. has to do with copyright, unique artifacts. we've been talking about this for years. somebody needs to be organizing, promoting sharing, seeing how far fair use can be taken." (Ex 7, 1d-E)

"A long time before individual scholars have all resources they need to do their work. Have to facilitate process of digitization for people working with a particular corpus of materials." (W2, Analyzing Directions, Group L)

"Here is a 'demonstrator' that may be relevant to Bamboo but doesn't depend on it and can't really make use of the offered help. Katrina Fenlon, a graduate student at UIUC's Information and Library Science school, will do a practicum with Tim Cole, Alan Renear, and me in early 2009. The Rare Book Library at UIUC is thinking of digitizing its valuable collection of 19th century British novels. Can the resultant OCR texts be transformed with an acceptable level of human labor into TEI texts that will play well with existing digitized collections, such as Early American Fiction or the TCP EEBO, ECCO, and EVANS projects? If the answer to that question is positive, one can see a five- to seven-year project in which high-value texts in the public domain become part of a very large and fully interoperable collection of texts that will form a single document space for many inquiries. The approximately 50,000 texts from the TCP collections will pass into the public domain after the middle of the next decade. Add a large collection of similarly encoded 19th century texts and you have what you can call a Book of English or cultural genome of written English from 1475, the date of the earliest printed book in English, and 1923, the current copyright cutoff. Katrina will work with a handful of 19th century novels. She will look at an XML output in which the very mechanical XML generated by the OCR process has been transformed into a TEI format, which will require various forms of adjustment. She will look for the necessary adjustments, keep a careful log of what needs doing, and think about repetitive steps that may lend themselves to algorithmic treatment. On the basis of her experience we will try to figure out realistic estimates for human editorial labor and think about a model of distributed editing that would allow for a user-driven process of adding to the Book of English. 
There are terrific opportunities for volunteer labor in this field, provided one can construct a sufficiently user-friendly and network-based editing environment. I should add that in the Monk project we have solved the problem of transforming TCP texts (and similar texts) into a linguistic corpus where a single morphosyntactic tagging and lemmatization scheme has been applied to texts from the late 1400's to the early 1900's, thus making those texts fully comparable at the level of their metadata. If Katrina's experiment gives us hope that OCR texts can with acceptable human intervention be turned into good enough TEI editions, these editions can without trouble be morphosyntactically tagged and become part of a Book of English. That is a lot of interoperable content." (Tools & Content Partners working group, Fleshing out ideas for Demonstrators, Martin Mueller, 11/28/08)

"The model we have discussed is a web-portal integrating web archives and analytical tools and e-publishing, in consultation with applied multi-disciplinary organization(s) such as the Association of Moving Image Archivists. I attended the recent AMIA convention, and this idea was met with enthusiasm. One take on the process to follow would be to locate historical newscasts already online; advocate for additional newscasts to be placed online; cultivate tool sets for tagging and metadata; investigate possible tools for precise citation and other scholarly notation; deliberate about possible licenses for mash-up capabilities, etc. This project would require skills across the full complement of Bamboo's constituency: Humanities, IT, and library personnel. Because of recent amendments to Section 108(f)(3) of the copyright law, there may be a key role for libraries as points of distribution. It is assumed that we might start small, working with local and regional collections, and aspire to work with major network news collections. (Could we eventually posit, for example, an Academic Hulu that provides fair use scholarly access to news libraries, etc?)." (Tools & Content Partners working group, Fleshing out ideas for Demonstrators, Mark Williams, 11/28/08)

"Over the centuries Buddhist monasteries have housed thousands of "books" of Tibetan Buddhist literature, each composed of the print from several hundred
woodblocks. As was common in the early phases of the digital revolution in the humanities, many of these have been scanned - approximately four million page
images are available at www.tbrc.org - but many have not. First attempts at transcribing and collating these vast collections and creating a simple index, all done
by hand, have proved expensive and time-consuming. With an investment estimated to be in excess of $1M, less than 2% of Tibetan texts have been input." (SN-0051 Tibetan Buddhist Literature Scenario, Jim Muehlenberg)

"Doing research with inscriptions is much easier with a digital corpus. Using print editions a scholar has to read through many volumes, as the material has been published over many years - some corpora have been in continuous publication for over 100 years. The other significant improvement that a digital corpus can offer are high quality images of the inscriptions." (SN-0034 Finding and using inscriptions- Building a corpus, Elli Mylonas, 12/18/08)

"The Internet provides unprecedented new ways of compiling and publishing this information [Catalogue Raisonné is "a monograph giving a comprehensive catalogue of artworks by an artist"] as a dynamic, collaborative, ongoing process. A seed catalogue containing known information, perhaps from an already well-documented collection or exhibition catalogue can provide a model for the entries and can form the basis for gathering new works and information. One of the important characteristics of this method of research is to make the seed catalogue discoverable on the Internet. This can be a potent way of attracting potential collaborators and also of enabling non-researchers - dealers, buyers, sellers and private collectors - to discover the project and contribute information on works they own or that have passed through their hands. Some may become researchers in their own right and contribute directly to the growing catalogue. Others may provide information via more traditional methods - letters, email, telephone discussions, visits. Support for a creative commons approach might be the default for this tool. However, there would need to be a minimum level of access control for inviting contributors and approving contributions. At some stage in the process, a version of the Catalogue Raisonné might still be published in the form of a high quality printed monograph." (SN-0002 Technology Support for Collaborative Development of a Catalogue Raisonné, Judith Pearce, 1/5/09)

"A rich audiovisual heritage exists in the field of Performing Arts: collections of films, videos and audio tapes, pertaining to shows, film versions of plays, rehearsals, seminars, workshops, interviews are scattered around the world and they form an invaluable repository of knowledge. The lack of tools to remotely access these resources, coupled to their limited marketability in terms of selling them as videos, means that these unique resources are, for all practical purposes, 'locked off' from the circulation of knowledge. It must be pointed out again that these records really are central to the study of the Performing Arts." (SN-0038 IM-Theatre, Interactive Multimedia Theatre, Raffaella Santucci, 1/7/09)

"A third problem, although not directly related to the inaccessibility of media collections, is felt to be important: to provide tuition to students who are unable to personally attend lectures. This can be due to a variety of reasons (working students, students living too far away to be able to commute, etc.). In any case, remote tuition in the form of e-learning is part of the remit of many institutions to provide equal opportunities and, in particular, easy access to their courses and the related materials." (SN-0038 IM-Theatre, Interactive Multimedia Theatre, Raffaella Santucci, 1/7/09)

"In order to perform tasks #1 and #2 [1) Find all mention of these artists in texts that date to the sixteenth and seventeenth centuries. Such material primarily includes the collected writings of individuals, local and imperial histories, and gazetteers. Read and translate such material. 2) Because these painters were categorized with the label "Zhe School" at some point in the 17th century (this label was construed as perjorative), I also need to find all uses of the term Zhe pai 浙派 in texts that date to the sixteenth and seventeenth centuries. Read and translate such material.], I currently have to find all collected writings (wenji 文集and biji 筆記), local and imperial histories and gazetteers in print form and examine the table of contents (if one exists) for titles of texts that might relate to painting and then look at those individual texts. This is tedious and extremely time consuming. Because the closest research university (University of Minnesota) now has the electronic imperial library from the 18th century, the Siku quan shu, which contains all books extant at the time and not subject to censorship, I can electronically search for artists' names and other terms with vastly more efficiency and speed. The problem is that the University owns the CD-Rom version, which is only installed on one workstation and is only accessible by driving an hour to the university when the limited hours of the East Asia Library are open (they are not open on weekends). There is also no printing facility available for the terminal. There is a Web-based version of the Siku quan shu and ideally access to this would enable me to do my research better and faster. This is very expensive, and my institution (small liberal arts college) simply cannot afford a subscription. Evidently, it was even too costly for the University of Minnesota to consider. 
There are also other electronic databases of historical texts that might be useful to me, mostly from Academia Sinica in Taiwan, but again, my institution cannot afford access. Databases of scholarly articles in Chiense also exist and the University of Minnesota subscribes to some, but I need to go there and download to PDF files to disk. While my situation could be worse, lack of easy access to the Siku quanshu database due to the fact that it can only be used on one computer terminal during the work week when I teach makes using this revolutionary tool very difficult." (SN-0013 Limited Access, Quality and Technology Support for Historical Chinese Painting Collections, Kathleen Ryor, 1/9/09)

"For tasks #3 and #4 [3) Examine all extant attributions to these painters, with particular attention to any inscriptions and seals by other contemporary figures who either saw or owned the work. 4)Examine anonymous paintings attributed to the Song dynasty and anonymous paintings of the Ming dynasty that exhibit the styles of these artists in order to look for seals of sixteenth century individuals.], print sources do exist that reproduce all Chinese paintings in public collections (and a few private ones), and they have indices. The problem with this is that the individual photographs in such print sources are tiny black and white thumbnails for the most part. Thus, the inscriptions and seals are not legible. In the end, I need to see all works of potential importance to the project in person. This may not be feasible, but the specific technology that would best support my research in this area is the high resolution scanning of Chinese paintings in all museums worldwide . This would necessarily have to include any colophons attached to the original work of art. Then if one could gain access to such databases, it would be possible to save time and money by eliminating extensive travel. Even if only the museums with the largest and/or most important collections digitized in this manner, it would still greatly improve my ability to conduct research on this and other similar types of projects." (SN-0013 Limited Access, Quality and Technology Support for Historical Chinese Painting Collections, Kathleen Ryor, 1/9/09)

"The current portion of the research requires access to full-text databases of nineteenth-century newspapers. Proquest Historical Newspapers is the most reliable, but contains only 11 papers, all major dailies. What makes this project possible is the rapid development of full-text archives for genealogy research. These archives are developed from microform, and the full text OCR is very unreliable. Currently the two fullest archives are NewspaperArchive.com and GenealogyBank.com, but there are numerous smaller databases as well. So my process is to locate a poem (as soon as the procedure is set I will work from one large daily, covering a month at a time), select 2-3 word search phrases, then search NA and GB for them. On the result lists I have to verify each hit visually because both databases are notoriously incorrect on dating. I must search on multiple strings because of the unreliable text. And I can't do a single search for both databases-there is no search aggregator. Currently I do not keep a copy of each hit PDF due to file sizes and some poems have 50 reprints spanning a decade." (SN-0014 Tools to Aid Search, Review and Citation of 19th Century Newspapers, Clai Rice, 1/9/09)

"The Mellon Foundation's various initiatives have laid a solid foundation for the future of scholarly publishing in the humanities with JSTOR (as well as its contemporary cousin, Project Muse), and for the future of the scholarly research process with Zotero. The latter is particularly interesting because we now have a solid tool for individual research that will, I believe, shortly also offer the possibility for making research "social," creating new forms of collaboration and innovation heretofore the product of combing through footnotes or chasing someone down at a conference. (More on this in a moment.) Humanities scholarship is the study of complex artifacts in the service of understanding human nature. What humanists need are these artifacts as well as the variety of information "clouds" (culture, history, biography) that surround them in order better to understand how the artifacts refract/reveal human nature. The kinds of artifacts humanities scholars work with vary by discipline. In some, the data is widely available as already published texts; in others, the data for their research is not as readily available but still secured in various kinds of archives — museums, yes, but also local courthouses. ARTstor provides a solid foundation for scholars researching materials found in conventional arts collections. But what about those humanists who create their own data? There are scores of verbal and material items that will never grace the pages of most books and will never be catalogued in any collection. These are the focus of documentary efforts by scholars in the humanities disciplines of folklore studies and oral history, fields which blend over into the human sciences of linguistics, anthropology, psychology, and cognitive science. In the future, they will have DATAstor." (Tools & Content Partners working group, DATAstor, 1/12/09)

"Use case for DATAstor: A scholar of American social history has just returned from an interview with an individual involved in designing the Higgins boat, the signature landing craft of WW2. She has a recording of the interview, captured on an Edirol R-09 and thus sitting on an SD card as a WAV file, and she has the scan of sketch the individual had kept of an early version of the craft, captured on her laptop as a high-resolution TIFF file using a portable Canon LED scanner. As a member of the Oral History Association, she has access to the on-line archive the Association maintains of oral history materials. She fires up her web-browser and accesses the archive using the friendly, Zotero-like interface. She clicks on the collection that bears her name and types in her password to authenticate herself as a content creator. She types in the various metadata suggested by the extant fields in the database, including the individual interviewed, the date and place (given both as the human place of Gretna, Louisiana but also as the geo-coordinates) of the interview. As content creator, she knows the application will automatically fill in those fields that associate her own part in the process with the data she is publishing.) The first record she creates is for the interview, which she outlines from memory, with a notation that the text is just that, and she then uploads the WAV file, knowing that the DATAstor application will, when it makes the file available to others, automatically transform the file into a compressed version with proper watermarking. Since this is research early in the process of a book she is working on, she marks the data to remain private for the next year, whereupon others will have access to the recording. 
She, however, is more than happy to have the metadata made public, because someone else working in an adjacent field may come across this entry and contact her with a question which could lead to an interesting dialogue or with some information she could use right away. The second record is for the sketch, and so she tells the DATAstor UI to duplicate the previous entry but still to create a new one with the TIFF file attached." (Tools & Content Partners working group, DATAstor, 1/12/09)
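
The record handling in this use case (public metadata, a recording kept private for a year, a second record duplicated from the first) might be modeled along these lines. DATAstor is itself only proposed in the passage, so this is a sketch under assumptions, not a description of any existing system: the class, the field names, the example values, and the choice to let a duplicated record inherit the embargo are all hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One archive entry: descriptive metadata plus an attached file."""
    creator: str
    interviewee: str
    interview_date: date
    place: str
    description: str
    attachment: str                       # file name, e.g. a WAV or TIFF
    embargo_until: Optional[date] = None  # file stays private until this date

    def public_view(self, today: date) -> dict:
        """Metadata is always visible; the file only once the embargo ends."""
        view = {
            "creator": self.creator,
            "interviewee": self.interviewee,
            "interview_date": self.interview_date.isoformat(),
            "place": self.place,
            "description": self.description,
        }
        if self.embargo_until is None or today >= self.embargo_until:
            view["attachment"] = self.attachment
        return view


def duplicate_with(record: Record, **changes) -> Record:
    """Copy a record's metadata into a new record with some fields changed."""
    return replace(record, **changes)


# Hypothetical example values; only the place name comes from the scenario.
created = date(2009, 1, 12)
interview = Record(
    creator="A. Scholar",
    interviewee="Interviewee",
    interview_date=created,
    place="Gretna, Louisiana",
    description="Outline of the interview, written from memory",
    attachment="interview.wav",
    embargo_until=created + timedelta(days=365),
)
sketch = duplicate_with(
    interview,
    description="Early sketch of the craft",
    attachment="sketch.tiff",
)
```

Separating the always-public metadata from the embargoed attachment reflects the scholar's wish that others can find the entry and contact her before the recording itself is released.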

"DATAstor use case: Hundreds of miles from our scholar, and with a different set of concerns about data, the archivist for the Urban Appalachian Center in Cincinnati, Ohio has opened up a box of papers left to the Center by an eminent sociologist who had been about to throw away a lifetime's worth of research until someone told her that the UAC would be able to make the research available to others. The box contains letters, field notes, and photographs — some of which have careful annotations about who is in the photograph and others that do not. In cooperation with a local university, the UAC has a dedicated server with the DATAstor application on it and a reasonable amount of room for attached files. Our archivist considers possible uses of the materials and the limitations of their current infrastructure, he decides to scan and upload the images, but that he will OCR what he can of the texts — some of the field notes are typed (yay!) — and then do the rest himself, reading hand-written letters and notes and typing them in, as he has time. Later in the week, the photos are at least up, with some annotations still to type in, and our archivist has poked at a few letters and other documents, but there really isn't that much time. He has just finished figuring out how many months it's going to take if he dedicates an hour a day to the task when in walks a young linguist, who is interested in the written works of Appalachians. Through the scholarly grapevine, she has heard about the eminent sociologist's materials being here and wonders if she can look through the collection in hopes of discovering a few letters from some Appalachians that the scholar worked with. They would be a real boon to her research into the differences between oral and written discourse among ethnic minorities. The archivist is glad to take the box from her desk. As our young scholar reaches down to pick it up, she notices the DATAstor UI on his screen. 
"I'd be happy to input any materials not yet catalogued," she says. "I'll be typing up whatever I use anyway, since I'm using a particular XML format my advisor has recently shown me." "I'll set you up with a limited user account on our system," he replies. As she sits down at a nearby table, he goes to an administrator page of the DATAstor application and enters information for her that gives her limited abilities to enter content, and, just as importantly, alerts him to the new content for him to review before making it public. Later, when she leaves with some photocopies of documents, he gives her her login information for the UAC's DATAstor application and reminds her that the account will automatically expire in 30 days." (Tools & Content Partners working group, DATAstor, 1/12/09)

"A senior professor in American Indian Studies whose work deals with the popular representation of American Indians struggles with the question of how best to organize, preserve, and make accessible both the history of the department ( founding documents, curricular and language materials, and materials collected from local Native organizations and communities in Minnesota), as well as the corpus of primary documents collected over the course of her own research career. This unique collection, mostly picture postcards, tourism materials, and other paraphernalia collected at rummage sales, private sales, and from EBAY, consists of over 21,000 images of American Indians and more than 50,000 American ethnographic images. Currently, the collection is inaccessible to other researchers, and generally unknown. It is organized in local file cabinets and organized according to geographic region or sub-region, tribal name, time period, and publisher or author. Some rare and fragile materials are kept in special cases for protection. All relevant information about the individual documents are catalogued and stored on a local, personal computer, but none of the images or materials have been digitized by the scholar. Though the scholar intends eventually to donate the personal research collection to an institution that specializes in the field of American Indian Studies and history, there is no immediate sense of when the collection would be publicly accessible. In the interim, the scholar is unsure how to begin the massive digitization project, as well as how to build the kind of website that would appropriately display and describe the materials. Rights issues, often a thorny process for book and article publication, is a process the scholar is reticent to take on without support from publishers, research assistants, and the like. 
Though some digitization and technical support is available through the University, this scholar, and many other scholars, are unaware of the resources that are available. Further, there is inadequate financial support for the cost of such projects, as well as for the cost of research assistants who would likely have a primary role in the production and maintenance of a such a website. The future of the department's holdings of unique materials relating to the department's history and the history of local Native groups and communities is even more uncertain. Currently, boxes of uncatalogued materials, and some web-based materials are kept in the department's limited storage space or are in the possession of individual faculty members. There is no substantial organization of these materials, which are not available for use by students or other scholars. Despite the University's efforts at promoting the University Archives for preservation of such collections, this scholar has some apprehensions about donating the collection to them. The exception to this predicament is a small collection of web-based materials that trace the material foundations of Native language and cultures. The department of American Indian Studies and the Minnesota Historical Society are working to make this collection available through the Minnesota Historical Society." (SN-0043 Personal Research Collections- Data and Archival Preservation and Access in the Humanities and Social Sciences, Cecily Marcus)

"Methods of organization are haphazard, idiosyncratic, and often bordering on untenable. At the same time, researchers engage in more structured and intentional activities---scanning and digitizing archival materials and working with experts to make those materials accessible online or through searchable databases; storing large data sets and thinking about how to preserve data from multiple media; building substantial archival collections with idiosyncratic organization and naming practices; and sometimes planning to donate materials to a specialized archive or institution. Often, the fate of these collections is dependent of the individual skill set of the researcher or one of their close colleagues (often an administrative employee who happens to know something about, say, Photoshop). There is little systematic knowledge of how to go about preserving and making accessible a collection. The scholars how engage in complex projects to do so, often with the support of small grants, are rare. Further, some scholars express reticence to share "too much" of their research collections until their careers are firmly established or even approaching their end." (SN-0043 Personal Research Collections- Data and Archival Preservation and Access in the Humanities and Social Sciences, Cecily Marcus)

"Discussion earlier - obvious ways in which we each contribute (resources: money, people time = money, expertise = people time = money) might get back tools/services, but tools/services are nothing w/o content. In selling the plan, we need to remember we are all providers of content and collections. Content is being neglected in the big picture. Tools/service facilitate access to everyone's content - implies being plugged into standards PB might define; leads to implication that whatever else it is, PB grid is a grid of rights of some kind; sharing content rights. Content becomes another currency of involvement and participation. This could be a hook for individuals and institutions alike at every scale" (W3, Table Discussions of Consortial Model 2, Table 9)

"How should we think of JSTOR environment? Acceleration/transitioning of content into electronic format for greater usage. Content in a digital form so you don't have to travel to see it. Helped provide savings in community - shelf space, making content more discoverable by scholars across the globe. Also, important globally: broadening access so eeryone can get access to it. Expanding our outreach to make it available all over." (W3, Perspectives: Content, Timothy Babbitt, Chief Information Officer, JSTOR)

"Lower the tech barrier. A lot more files are there than are immediately obvious from library UI. Make all those files discoverable, visible to people. Sometimes we'll get a page image with OCR, but you can't download the OCR (just for searching). Re-OCRing is ridiculous. Structural files that hold all the pages, describe all the relationships - should be able to give those to you. Lots of issues involved; should also be able to give you high-quality images, we keep these off-line." (W3, Perspectives: Content, Stacy Kowalczyk, Digital Library Program, Indiana University)

"Not only problems in giving data, but also would like to receive data from researchers. That's another problematic area. Most with digital library programs/content have repository systems. Preservation is a big part - persistent data store that helps maintain integrity of individual files + files that make an intellectual object. Persistent access to those. Most libraries don't have tools/processes for individuals who aren't associated to contribute content - this is a real problem. This would be helpful to make the round-trip of data from library to researcher and back. We would like to do this for different types of data - individual additions but also community and reference collections. Not just a tech problem, also a policy problem. Lib don't have collection models for dealing with taking digital assets." (W3, Perspectives: Content, Stacy Kowalczyk, Digital Library Program, Indiana University)

"Primary source material that comes from a number of anything you can think of. Need to be judicious about what we decide to invest in. Auction catalogs - each catalog going back to 1600s so you can look at gavel price of a piece of art through all major auction houses in the world. That's primary source material, even though it wasn't thought of that way. Rock art to plant specimens. If you try hard enough, "every closet is a walk-in closet" -- everything is a resource content." W3, Perspectives: Content, Q&A, Timothy Babbitt, Chief Information Officer, JSTOR)

"I was particularly intrigued by Scholarly Narrative 0051, the construction of a fully digitized corpus of Tibetan Buddhist writings originally encoded on wooden blocks. What value do you you add if you can take all or most of the writings of a culture and transform them into a digital corpus that not only emulates the sequence of glyphs in the original but adds bibliographical, lexical, semantic, morphological, and syntactic metadata so that the original body of writings in its digital medium becomes an enhanced surrogate with affordances for inquiry that far exceed the original? This project takes you through the stages of this process from the first step of digital images through manual transcription (too expensive), optical character recognition (accurate enough?) to algorithmically applied annotation. A wonderful project, although I suspect that in the end its execution will require more editorial human intervention at all stages of the process. It is useful to remember that the 'thousands' of Tibetan books add up to a quite small collection of highly curated data. Tens (but not hundreds?) of millions of words, but they will all fit comfortably on a laptop, metadata included. Digitization is in many ways easier with dead cultures where you have a limited number of documents, and the project of digitizing all or many of them is a manageable task in terms of scale --never mind the scholarly labor that classicists, Assyriologists, or Anglo-Saxonists don't mind lavishing on the objects of their research. The Perseus project, which includes just about all major texts of ancient Greek from Homer to the Second Sophistic, is just five million words. The Thesaurus Linguae Graecae (TLG), which includes most written Greek from Homer to late antiquity, adds up to ~100 milion words but will fit comfortably on the flash drive of a digital camera." (Tools & Content Partners working group, Analyzing Scholarly Narratives, Martin Mueller, 3/27/09)

"Two scholarly narratives (0034 and 0039) concern ancient inscriptions. 0034 begins with the sentence "Doing research with inscriptions is much easier with a digital corpus." Indeed. Inscriptions, by their nature scattered documents, do not yield their full query potential until organized in a firm data structure. That was the central insight of Mommsen's great 19th century project of gathering and publishing inscriptions across the entire Roman empire, which revolutionized the study of Roman administrative history. Digitization adds powerful affordances to print publication." (Tools & Content Partners working group, Analyzing Scholarly Narratives, Martin Mueller, 3/27/09)

"[Scholarly narrative] 0043 is a very interesting narrative about digitization of personal research collections. Somewhere in Minnesota a professor has over a lifetime assembled an archive of ~70,000 postcards, images, and other memorabilia of American Indian culture. You don't want this stuff to be tossed or stowed away in file cabinets in some basements. Problems of this kind arise daily all over the world from sleepy local history societies to Research I libraries. Do I remember that Mellon once funded the development of mobile and easy-to-use digitization equipment that at least secures the good enough digital capture of the original stuff?" (Tools & Content Partners working group, Analyzing Scholarly Narratives, Martin Mueller, 3/27/09)

