Standards for interoperability and durability

"persistence of data -- commercial services are attractive to faculty, but what if they go away? if something is open and standards based, where the data can be extracted" (Ex 1, 1b-A)

"Bamboo: to introduce a set of standards and practices, a peer-review equivalent" (Ex 1, 1b-B)

"coordinating work, finding structures in which we can come together and share similar kinds of digital resources ... Bamboo's work should feed back to projects like ArtSTOR vis-à-vis meeting needs of scholarship, accommodating institutional collections ... what commonalities and standards can allow sharing ... how can I get and use digital images of sufficient quality that they're better than analog (though they can be of sufficient quality in theory, and might actually exist, access to the best digital images isn't possible for me right now)?" (Ex 1, 1b-C)

"want to see Bamboo solve TEI issues and standardization issues." (Ex 1, 1b-D)

"Persistence of digital resources would be a great win." (Ex 1, 1b-F)

"Research communities are not usually within a technical framework. Must be open and accessible to other frameworks - while certain materials (i.e. music, film) must be in closed environments." (Ex 1, 1c-A)

"Being able to find content is key. Being able to discover existing tools is key. Being able to learn how to use available tools is key. Develop particular kinds of interfaces that are standardized for canonical activities in A&H scholarship. Glue that cuts the learning-about-tools barrier for A&H scholars. Recognition that "standard" does limit function, but if the entry-point is easier and more sophisticated (in terms of functionality) that would be a win." (Ex 1, 1c-C)

"Many organizations expressing interest in PB is very promising in that it may allow opportunity to set international standards for scholarly collaboration in this domain (digital humanities). Critical mass may be achievable, and thus evolve into an international body for pulling together institutions and nations, a hoped-for outcome." (Ex 1, 1c-C)

"Hard to find tools---other people's custom tools are based on different assumptions" (Ex 1, 1c-E)

"It's not about "preservation" it's "durability" (as a way to get away from the freeze-dry parts of the metaphor). Need more standards for e.g. context." (Ex 1, 1d-A)

"Would like to see a growing awareness of what the tools discovery process is so entrenched. But what if we find them? Old interfaces, defunct languages, different purposes.. still effectively unusable. So then what are the obstacles to re-use? Discovery alone is not enough." (Ex 1, 1d-A)

"Need to create a more generalizable template" (Ex 1, 1d-B)

"access to resources requires interoperability" (Ex 1, 1d-F)

"Bamboo as force for data abstractions, document complexity, standards" (Ex 1, 1d-G)

"Getting a common and/or standard format. It is more than the hardware and software." (Ex 1, 1d-H)

"Defining digital formats and standards. Linguistics. Holy grail is building tools. Had to start by defining data so when got to building tools, would have data in a format open, interoperable, long-lasting." (Day 2, 1a)

"Could be highest value is helping people to knit tools together" (Day 2, 1a)

"format conversion, mine website, make usable, extend. This is creative in the development of tools and techniques, but the process is actually a data/source material transformation, manipulation, munging" (Ex 2, 1a-C)

"Develop common terminology for the good of the community" (Ex 2, 1a-E)

"A google of most-referenced pieces. affinity algorithms, expert tagging, learning what is good and what is bad. Online socially-enhanced version of EndNote or RefWorks. A trusted evaluation & classification of resources. Would people tag things? What about methodological clashes? Grade the resource at the end? A guided get-started, most-used list as a first cut. Citation index for all arts & humanities journals? Old systems failed the "completeness" test. How do we blend the new/untried with the expert/high quality? High value to one might be low value to another? Avoid too much reliance on a highly-engaged small group?" (Ex 2, 1b-C)

"Faculty collect data without assistance. Work closely to integrate their dataset into standardized methods. Adding schema to the dataset." (Ex 2, 1b-E)

"The varied schema that exist within archives have to communicate with each other." (Ex 2, 1b-E)

"Must balance old habits with new ones - working at tasks to put projects in compatibility/collaboration with other existing projects." (Ex 2, 1c-B)

"Automated annotation and analysis of video. One of the best way of working out shot is int./ext. is color-analysis, via low-level "brute-force" analyses. If these can be embedded in standards, it would facilitate abilities to look for "stuff."" (Ex 2, 1c-B)

"Different ways of digitization across archives can lead to researcher having to organize things non-electronically. Easier to photocopy, maybe make notes on the computer re: context" (Ex 2, 1c-D)

"Working with faculty on what to digitize, but different scholars want to do different things with output. Scholars wanted to use text mining, requiring some flavor of TEI (so just give it to me that way!) " (Ex 2, 1d-A)

"great concern about how standardization can inhibit innovation, but came to think about how standardization will open new paths for innovation and bring new people into the discussion who cannot create their own digital tools but need them regardless; in particular, could Bamboo serve as a clearing house/ meeting place where those involved in Digital Humanities could share their work, find new collaborators, share ideas, and push for increased interoperability and expand fair use in the digital environment" (Ex 1, 1d-F)

"Identify best practices and then implement them in a tool set: gives scholars a roadmap, gives IT people a set of practices and tools to deploy/develop on behalf/ in collaboration with scholars. Finding commonalities on the "what do I want to know". Structured data as a work in of itself is the real revolution in scholarship" (Ex 3, 1d-C)

"We've done a lot of data mining with numbers, they're trying to do the same principles applied to texts, and you come up with different problems. This informs our approach, it distinguishes humanities from the sciences. We've solved the problems of how to encode numbers a long time ago, but encoding humanities material is an ongoing issue. Still no standard format for film. New versions of MPEG every year, no standard for music encoding. Difficult to build tools. A question of representation of humanities objects. Problem with text is you have to deal with synonyms/homonyms/alternative spelling. Fields like astronomy have agreed on a standard way to do things; progress will be made when fields decide "this is how we're going to do X". Semantics are understood when scientists use numbers: a count of something, a spectrum, etc. Nothing is inherent about the semantics of Humanities texts. When you're talking about texts, text is in a historical context - a particular point in time. Important to see it in a continuum of time and evolution. Has to be room for multiple interpretations. Ambiguity, which you try not to have in science" (Ex 3, 1d-D)

"Have you found issues where "if we had done this digitization today, it'd be better/more usable"? It's happened to me. Librarians say "We won't digitize again unless you give us a really good reason why; we're not going to keep doing it - this stuff is fragile." If you're going to do it, do it right." (Ex 3, 1d-D)

"Bringing it back to technology: archeology has this problem of managing a huge amount of data. And how to find from other people's excavations parallels between your work? This is v hard to do digitally because every archeologist has a different framework. So they have a huge information integration problem. So the question is how can tech help our foraging integrate data from other areas?" (Ex 4, 1b-B)

"global schema to which local heterogeneous schemas can be mapped. a very conventional old-fashioned database problem. is there an appropriately abstract and universal global schema that would let you map local taxonomies to a more general tax." (Ex 4, 1b-B)

"One may have to convert formats - standardize, via having things digitized. Some early audio/video can simply not be played and need to be digitally redone/recreated. A/V formats which go out of date need to be digitized and updated more often than print sources." (Ex 4, 1c-B)

"Highly involved with presentation and curation. Do scholars preserve? Or is it done for them because of the scale?" (Ex 4, 1c-B)

"What are the priorities in digitizing? Technology can create extra tasks which are gratuitous from the point of research. Type-setting as an example." (Ex 4, 1c-B)

"Resources may sit on shelves if in the wrong form. Requires a funding bid to be put in the right form. Scholar is required to demonstrated a scholarly need in order to transfer/change formats." (Ex 4, 1c-B)

"Relatedly, what scholars have to do to transform between models. Enriching by looking at what else is out there. People are creating GIS in my field for their archeological project. As we get more projects, we get patchwork quilts of databases that don't talk to each other" (Ex 4, 1d-E)

"can db access be done in a standardized way? valley of the shadow: 1990s presentation, not interactive particularly, outdated, could be done better now. images etc. independent of indexes. would be nice if humanists had a space where all that was preserved, could be revisited, rebuild, recreated. amazon S3 model: content-free infrastructure that humanists could fill with content." (Ex 4, 1d-F)

"who will be responsible for all ways all these media will be joined and made accessible? library might have different baskets for jpegs, tiffs, etc. library can preserve the object, who indexes for access. in SOA that would be done on a case by case basis, library not responsible. isn't that the problem? if goal is for re-use, maintaining at faculty level... objects maintained in library or such, mashups would be handled by consortia." (Ex 4, 1d-F)

"aesthetic of just-good-enough. better to do something badly than not at all" (Ex 5, 1b-B)

"Highly heterogeneous; Cannot come up with a convergence because of variances. Need a common means for incubating these different approaches." (Ex 5, 1c-A)

"A fair number of film annotation projects. If anything, there's a lot of redundancy. No convergence yet around any project" (Ex 5, 1c-D)

"Once your materials come from different sources, then the question of putting them in comparable format does become a burden. Lots of things can be digital, but what good does it do you if you have PDF's of images, then HTML from other things, and then you might as well treat them as analog" (Ex 5, 1c-D)

"Lots of issues re: different formats for recorded versions. With text, at some level you can move from one platform to another. Much harder for video. YouTube doesn't cut it for scholarly analysis of most performances" (Ex 5, 1c-D)

"Yes. Say I want to cite a passage in a PDF, want to cut & paste. In Jstore, cut & paste of PDFs is common. Two orthogonal issues: Maybe don't have $; versus access. Why doesn't this material work with this format? It's a standards issue." (Ex 6b, 1a-G)

"Subset category of digitizing: Wand would solve the problem of locking & unlocking materials. Look at PDFs: For some purposes want to unlock PDFs" (Ex 6b, 1a-G)

"I can't edit an .MOV in Adobe. Compatibility issue. Ability to easily move in & out video etc. doesn't exist" (Ex 6b, 1a-G)

"Requires standards for interoperability. Heard that the more you have standards, It hamstrings things. When you have open standards you don't have the problems w. proprietary stnds. Just having a standard doesn't solve the problem; need a community that supports it. That's where Bamboo can help. Important to not recreate standards that already exist. Some corps. Make variants because they don't like 1 thing. A Microsoft std. isn't a standard if it's proprietary. Having been thru a bunch of standard-setting processes, they're very difficult & don't always happen in the time they're needed." (Ex 6b, 1a-G)

"Solve format compatibility issues" (Ex 6b, 1a-G)

"Standards: open dev. Process, reuse of existing" (Ex 6b, 1a-G)

"tools must be translatable among roman/non-roman languages" (Ex 6b, 1b-A)

"Transparent data to separate processing from creating" (Ex 6b, 1c-C)

"Sharing of data in common formats and common software across fields and sharing earlier in projects. Example: linguists data is highly relevant for film maker." (Ex 6b, 1d-A)

"a better way to digitize original (print, photos, audio, film) materials. There is not a good way of changing from one medium to another. Born digital objects. " (Ex 6b, 1d-B)

"A technical Esperanto. That people actually speak. Breaking down technical silos that exist; exposing content so someone else can use it even if they use a different technology" (Ex 6b, 1d-C)

"Convertible, interoperable technologies for transferring technologies" (Ex 6b, 1d-E)

"Bamboo enables sharing of digitized resources and digital technologies and publications" (Ex 7, 1a-C)

"conversion, versioning, repository archive services, metadata schema harmonizing, presentation and visualization, licensing, and citation". (Ex 7 flipcharts, 1a-C)

"sustainability, preservation, maintenance, succession ('estate planning'), migration" (Ex 7, 1a-D)

"Going back to digitization, how do we establish common formats across all this digitization and getting access to all these archives without having to travel all over the planet. And how to make them all interoperable. The way you get access to archives in Spain requires a letter vouching for you and if the archivist likes the look of you, then maybe they'll let you see some things. (That could even happen in California.) It's a very different culture." (Ex 7, 1a-E)

"interoperable with existing open source things like Sakai & Kuali - take advantage of local things, leverage a lot of existing tools & services. perpetual beta. cross-domain, unbiased by disciplinary boundaries. making sustainable, preservable, maintainable, succession, migrable" (Ex 7, 1a-F)

"need schema harmonization tools. that would be nice. Right now is hard. getting people to agree on what a schema means is very hard, and may be not very meaningful. story of EAD development and requests for tons of outlier fields, etc., many of which are equivalent and parallel." (Ex 7, 1a-H)

"format converters to make it easier to apply std. Tools. also need archival and preservation formats and tools to convert data and results into those formats. Needs help in how to collect data in better formats and practices, to make it easier to manipulate." (Ex 7, 1a-H)

"Access to assets: standards based repository that can be federated; aggregation of data for discovery & re-use" (Ex 7, 1b-B)

"Can tools built to common standards coffer a compromise between commercial and open-source? Always an investment - no way to de-couple tools from standards in an effective way. True, sometimes the interface of certain tools require training that are beyond the scope of a projects. Return to a J-Stor for tools. A company like Oxygen has a better record for producing tools that are useful and sustainable, as opposed to Digital Humanitites Project. Should be careful not to be tied into particular commercial software. We haven't got a good track record for building useful open-source tools. But people have been building tools for themselves. We have to explore possibility of approaching open-source developers to make these tools more widely available and easier to use. If commercial activity stops at some point, it is important so that the base is available for open-source development" (Ex 7, 1c-B)

"Focus on how one gets tools to work together. Interrelation/integration/interoperability of all tools. Workshop has identified this as a major problem, so it must be addressed. Interoperability of not only tools, but data sets. This is desirable, but how Bamboo would do this is not clear. Must establish standards and guidelines" (Ex 7, 1c-B)

"Want to reuse technology/resources where you can, but sometimes these have to be models that engage with each other, rather than comprehensive" (Ex 7, 1c-C)

"You don't have to start w/ universal agreement on standards to have something of value. If there's some good reason for people to come together, standards will emerge" (Ex 7, 1c-C)

"should be pushing some standards for tool building. Develop the infrastructure for the humanities that lets people build durable projects. Need good level of abstractions, before we jump on protocols. Principals. Standards, but at a pretty high level of shared assumptions. That will facilitate interoperability. interoperability is the key word. Without the standards, you build projects that stand alone" (Ex 7, 1d-C)

"suggest a deliverable for bamboo... If people are providing web services according to standards, distinguish between reliable services that you can build on vs other of short term projects or things under development. EG jstor or google maps. Things you can build on. Provide a tools registry which lets you know which are "production" and how to use them. Registry itself is a service; not simply a directory. Provide formally defined inputs and outputs. A "mashup registry service" readable by human beings, but also by software" (Ex 7, 1d-C)

"different example, have a grant to fund someone with a large amount of data to allow them to move that data into an open environment. Provide funding for the transformation. Eg Perseus" (Ex 7, 1d-C)

"the durability of data often gets left out. Application designer has to account for the durability of the information. It's all about putting repositories under everything. It's about packaging in a way that's abstract enough from the technologies you're using at the moment. the abstractions have to evolve. The way the data is stored affects how the information is used. but the abstractions should evolve at a slower rate than the technology" (Ex 7, 1d-C)

"A Facebook for databases and objects, so they can "friend" each other" (Ex 7, 1d-D)

"once heard it said, "metadata is like a toothbrush; everybody agrees we should use them but nobody wants to use anybody else's"" (Ex 7, 1d-E)

"transition assistance: refactoring, retrofitting, rehosting of datasets or processes to de-silo them." (Ex 7, 1d-E)

"why 18 different metadata standards: different views. mot difficult piece if himalaya digital library was heirarchical ontologies. that's where arguments happen and that's where limits must be set. certain mode of scholarship always involves prioritizing certain kinds of information which can be gotten at in varous ways. true coherence is hard to pin down. when you move into codable facts, we're talking about db's, documents and such which have meaning, meaning is set aside because meaning doesn't code easily. in data extraction from biographies, we want a system that doesn't involve judgment. factoids. place names, titles... everything has edges but when you start talking about personalities that can't be coded. if db driven, how to know what giving up? how to make it easy to pay attention to things that can't be encoded? one thing google and others are doing in figuring out similarities commonalities and ranking, is not to take AI approach of understanding, but to take context where something is identified by someone, characterizing clusters of groups mathermatically and then finding analogous clusters. parts of speech, clusters of words, n-grams, statistical hashes. find another instance where these things predminate. area of activity for us whether we're trying to communicate machine or "hand" work, ways of exposing judgment to computational proxies, ways of doing that are helpful. some kind of analysis that chunks texts and then applies analysis to distill to a group of words, some way to expose that in ways easily translated (e.g. links) modes of exchange can become trained datasets. easy case is reference works. ways of linking between discursive analyses may bias. "st augustine" is not a town in florida, ways of disambiguating. people worry about students doing reasarch google lets them do easily. 
as undergrad found one book on the illiad, thought had found "the book", copied down something from that book thought he was done writing about the iliad" (Ex 7, 1d-E)
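The clustering approach described in these notes, comparing n-grams rather than attempting to "understand" a text, can be illustrated with a deliberately simple stand-in. The sketch below compares passages by the Jaccard overlap of their word bigrams; this is a swapped-in textbook technique, not the method used by Google or by any project mentioned here, and the sample passages are invented.

```python
# Illustrative only: compares passages by overlap of word bigrams (Jaccard
# similarity). A stand-in technique, not any search engine's actual method.
def word_ngrams(text: str, n: int = 2) -> set:
    """Set of lowercase word n-grams in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared n-grams divided by all distinct n-grams; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query: str, passages: dict, n: int = 2) -> str:
    """Key of the passage whose n-grams overlap most with the query's."""
    q = word_ngrams(query, n)
    return max(passages, key=lambda key: jaccard(q, word_ngrams(passages[key], n)))


if __name__ == "__main__":
    passages = {
        "saint": "st augustine wrote the confessions in north africa",
        "town": "st augustine florida is a city on the atlantic coast",
    }
    print(most_similar("the confessions of st augustine were written in africa", passages))
```

Both sample passages share the bigram "st augustine"; only the surrounding context separates them, which is the disambiguation problem the notes raise.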

"focus on making existing tools more usable. emphasize lightweight interfaces rather than complex protocols of access" (Ex 7, 1d-E)

"I should add that in the Monk project we have solved the problem of transforming TCP texts (and similar texts) into a linguistic corpus where a single morphosyntactic tagging and lemmatization scheme has been applied to texts from the late 1400's to the early 1900's, thus making those texts fully comparable at the level of their metadata. If Katrina's experiment gives us hope that OCR texts can with acceptable human intervention be turned into good enough TEI editions, these editions can without trouble be morphosyntactically tagged and become part of a Book of English. That is a lot of interoperable content." (Tools & Content Partners working group, Fleshing out ideas for Demonstrators, Martin Mueller, 11/28/08)

"Piers Plowman Background Piers Plowman is a Middle English poem attributed to William Langland. It is written in unrhymed alliterative verse into sections called Passus. It stands next to Chaucer's Canterbury Tales as one of the great early English poems. A large number of versions exist. Professor Miceal Vaughan is an expert on the "A" version of the poem. The most popular version and the one found widely is the "B" version of the poem. Some indication of the various texts and fragments of texts can be seen at Harvard University's Piers page at Early Vaughan digital copy: About 10 years ago Professor Vaughan and a graduate student wrote an HTML version of Piers Plowman and mounted it on Vaughan's website. It was publicly available for student use and teaching. This file featured custom JavaScript and style code. With the development/change of HTML browsers, by 2006 the HTML version of Piers Plowman no longer worked in Internet Explorer. In short, to continue to use Piers Plowman in teaching, a new digital version was required. Motivation of a new digital version: Vaughan wished to re-edit the poem with new commentary, and produced both paper and a new HTML version. In effect, however, his earlier electronic version was being held hostage by the HTML presentation technology. Vaughan received a grant from the Simpson Center for the Humanities at the Unversity of Washington for support of his re-editing efforts, but the grant stipulated that he work with another faculty member at the University of Washington. He contacted Terrence Brooks of the Information School and requested technical support in his project to produce a new digital version of the Piers Plowman. Structuring a middle English poem: The first task was the development of an XML Schema for the poem. The following diagram outlines some of the complexity of the structural model of the poem. 
This chart illustrates that the Prologue of the poem can be composed of many individual lines and that each line may contain many words. Each word could fall into three categories: Old reading/New reading/both. Individual words could appear with a gloss term of explanation and a Middle English spelling. Any given line of the poem could also have numerous footnotes. " (SN-0031 Electronic Piers Plowman, Terrence A. Brooks, 12/19/08)
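The structural model described above (a Prologue made of lines, lines made of words, each word tagged as an old, new, or both reading with an optional gloss and Middle English spelling, and footnotes attached to lines) can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names used here (`prologue`, `line`, `word`, `reading`, `gloss`, `meSpelling`, `footnote`) are hypothetical: they do not reproduce the project's actual XML Schema, and they are not TEI.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence, Tuple

# Hypothetical element and attribute names; not the project's actual schema.
READINGS = {"old", "new", "both"}
Word = Tuple[str, str, Optional[str], Optional[str]]  # text, reading, gloss, ME spelling


def build_line(number: int, words: Sequence[Word],
               footnotes: Sequence[str] = ()) -> ET.Element:
    """Build one <line> containing <word> children and optional <footnote>s."""
    line = ET.Element("line", n=str(number))
    for text, reading, gloss, me_spelling in words:
        if reading not in READINGS:
            raise ValueError(f"unexpected reading: {reading!r}")
        word = ET.SubElement(line, "word", reading=reading)
        word.text = text
        if gloss:
            ET.SubElement(word, "gloss").text = gloss
        if me_spelling:
            ET.SubElement(word, "meSpelling").text = me_spelling
    for note in footnotes:
        ET.SubElement(line, "footnote").text = note
    return line


# A Prologue is a sequence of lines; the sample content is illustrative.
prologue = ET.Element("prologue")
prologue.append(build_line(1, [("In", "both", None, None),
                               ("somer", "both", "summer", None)],
                           footnotes=["Illustrative note."]))

if __name__ == "__main__":
    print(ET.tostring(prologue, encoding="unicode"))
```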

"I stumble upon "promise" here, because I remember working for a short time with a team at Indiana University back in the early nineties which had been commissioned by AT&T to work on what it was calling a "WorldBoard." (I think the term was supposed to stand in contrast with the electronic bulletin boards of the time, for those who are old enough to remember, in being "location-aware" information.) Fifteen years later and it doesn't really seem like we've made all that much progress. There is KML and there is the Dublin Core. But there is nothing like a Zotero that allows one either to write data to some sort of common database or to "browse" it. I bring up Zotero here because I find myself using it and liking it. It's not the world's greatest UI, but it offers a fair amount of flexibility for me as a particular researcher and it seems on its way to offering a way to share information with me as part of a greater collective of individuals studying humans as they move through the world. I can even imagine Zotero becoming a kind of front-end for prior Mellon Foundation funded projects like JSTOR and Project Muse." (Tools & Content Partners working group, John Laudun, 12/21/08 comment)

"It's not enough to simply discover relevant digital content -- scholars need to be able to re-purpose and manipulate digital resources (including derivatives and representations of resources), making use of repository-independent tools and being able to reference resources used via persistent URIs and in accord with bibliographic citation standards. This requires that content providers describe and make digital resources available in accord with standards that support interoperability and integration. It requires tool builders to construct tools that can ingest resources without regard to brand or location of repository and can generate output and new derivatives that persist and are themselves referenceable. Additionally, while newer tools and repositories are more sensitive to the need for integration and conformance to emerging standards, additional services and frameworks are needed to glue everything together more seamlessly, which is where Bamboo might come in. This demonstrator focuses on exploitation of digital image resources using a simple Djatoka-based Web application. Djatoka is a robust, repository-independent image manipulation tool through which users and other applications can transform and manipulate Web-addressable digital image resources and create new cropped, rotated, and re-sized derivative image resources with persistent identifiers. The illustration scenario used hints at the potential benefits to be realized when quality digital content and tools like Djatoka are integrated in support of scholarship, but it also highlights the clusimness of current methods used to integrate digital content and tools." (Tools & Content Partners working group, Djatoka-based image cropping demonstrator, 1/9/09)

"Sound archives have reached a critical point in their history marked by the simultaneous rapid deterioration of unique original materials, the development of powerful new digital technologies, and the consequent decline of analog formats and media. Motivated by these concerns, in 2005 the Indiana University Archives of Traditional Music and the Archive of World Music at Harvard University began Phase 1 of Sound Directions: Digital Preservation and Access for Global Audio Heritage - a joint technical archiving project with funding from the National Endowment for the Humanities. One major goal of the project was to test emerging standards and develop best practices for audio preservation. The project created a number of software tools that may be placed into service including the Harvard Sound Directions Toolkit - a suite of forty open-source, scriptable, command line interface tools that streamline workflow, reduce labor costs, and reduce the potential for human error in the creation of preservation metadata and in the encompassing preservation package. To aid selection for preservation, Indiana University developed the Field Audio Collection Evaluation Tool (FACET), which is a point-based, open-source software tool for ranking field collections for the level of deterioration they exhibit and the amount of risk they carry. These tools are all open source. Indiana also developed the Audio Technical Metadata Collector (ATMC) software for collecting and storing technical and digital provenance metadata. Harvard also produced Audio Object Manager for audio object metadata creation and Audio Processing XML Editor (APXE) for collection of digital provenance metadata. These tools will be released as open source later after further development. On a broad scale, audiovisual preservation is a key, but often overlooked infrastructure need. 
Many new digital audio and video projects in the arts and humanities have the term "archive" as part of their description, but few are relying on sustainable digital preservation practices. Without attention to the preservation of audiovisual assets, many significant and irreplaceable documents in innumerable fields will be completely lost in the next few decades. The Sound Directions project presents a model for other archives to use, but there is a critical lack of facilities and funding for the necessary transfer of analog collections across the country. Even at the Archives of Traditional Music—one of the partners in Sound Directions— at the current rate of transfer there is not enough existing personnel or funding to effectively preserve their holdings. One of the options they are exploring at Indiana University is a campus-wide or even regional facility that would support preservation transfers. They are in the midst of surveying the audio and video holdings on the entire campus to assess the scope of the needs that exist. Needs are not limited to analog source recordings, either. Most scholars and even many special collections are not equipped to handle the long-term stewardship of born-digital recordings. Another broad need is for equivalent standards and best practices in the field of video preservation to match those that now exist for audio preservation." (SN-0010 Audiovisual Preservation Issues and the Sound Directions Project, Alan Burdette, 1/9/09)
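FACET is described above as a point-based tool for ranking collections by deterioration and risk. The sketch below shows only the general shape of point-based ranking: the carrier types, condition names, and point values are hypothetical and are not FACET's actual criteria or weights.

```python
from dataclasses import dataclass

# Hypothetical point values; NOT FACET's actual criteria or weights.
CARRIER_POINTS = {"lacquer disc": 40, "acetate tape": 30, "polyester tape": 10}
CONDITION_POINTS = {"vinegar syndrome": 25, "sticky shed": 20, "mold": 15}
UNIQUE_POINTS = 10  # extra risk if no other copy is known to exist


@dataclass
class Collection:
    name: str
    carrier: str
    conditions: tuple = ()
    unique: bool = True


def risk_score(c: Collection) -> int:
    """Sum points for carrier type, observed conditions, and uniqueness."""
    score = CARRIER_POINTS.get(c.carrier, 0)
    score += sum(CONDITION_POINTS.get(cond, 0) for cond in c.conditions)
    if c.unique:
        score += UNIQUE_POINTS
    return score


def rank(collections: list) -> list:
    """Highest-risk collections first."""
    return sorted(collections, key=risk_score, reverse=True)


if __name__ == "__main__":
    holdings = [
        Collection("A", "polyester tape", ("sticky shed",), unique=False),
        Collection("B", "lacquer disc", ("mold",)),
        Collection("C", "acetate tape"),
    ]
    for c in rank(holdings):
        print(c.name, risk_score(c))
```

An additive score keeps each point's origin inspectable, so archivists can see why one collection outranks another and adjust weights as local evidence accumulates.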

"Generally, scientists have workflow problems that often limit their abilities to effectively organize, preserve, and disseminate their data and research materials. There is also a need for better methods of retrieving, connecting, and relating data to published research accounts, especially beyond the data presented for publication. A lack of clear policies guiding the storage and preservation of data despite the requirements of funding agencies exacerbates the data problem. At the same time, standards, guidelines, and technological assistance, whether developed and implemented locally or nationally, all need to be sensitive to personal and disciplinary practices, which vary widely. The data challenges faced in the sciences may offer a prescient view of how humanities and social sciences scholars will confront their needs to preserve and make accessible increasingly complex research collections, many of which are data intensive in their right, especially in the social sciences." (SN-0043 Personal Research Collections- Data and Archival Preservation and Access in the Humanities and Social Sciences, Cecily Marcus)

"For text, there's TEI - maintains guidelines for literary and linguistic text encoding. International/interdisciplinary standards used by widely and inconsistently used by libraries, museums and publishers. But TEI call themselves guidelines for encoding and interchange. Interchange has been part of the goal from the beginning." (W3, Perspectives: Content, John Unsworth, Dean, Graduate School of Library and Information Science, University of Illinois)

"Using things in different ways in different purposes, it doesn't achieve interoperability - why not chuck it and roll your own? But it's better to not go alone, you say. ECCO texts we're trying to bring in [to MONK]. Data store includes different kinds of data. "Needless and heedless divergence" - a lot of work, but a ot of problems can be solved by supplementary conventions. TEI level 4 guidelines, if you'd spent 2 hours about what to do re recommendations of soft hyphens, a lot less grief. A lot of difference s in encoding practice are unmotivated, unnecessary, accidental. One of roles for PB is reducing accidental divergence across data collections." (W3, Perspectives: Content, John Unsworth, Dean, Graduate School of Library and Information Science, University of Illinois)

"For a sufficiently uniform format: first requirement for doing things with it. Argument - what's a sufficiently uniform level? You need to have this in the presence of examples of things, communities. TEI hasn't risen to solve problem of interchange, not an error in approaching the problem, but most resources haven't been interchanged yet. Building them in "clumps" without any "runners". Data interoperates within clumps, but not a system of clumps and runners yet." (W3, Perspectives: Content, John Unsworth, Dean, Graduate School of Library and Information Science, University of Illinois)

"It is also a moment to acknowledge the extraordinary success of the TEI Consortium at creating standards that cut across languages and millennia. There is a global and tightly knit community there, and a question that arises in the encoding of a French medieval manuscript may find its answer in a practice developed in the encoding of Buddhist manuscripts in Kyoto. To my mind, the TEI community is a remarkable example of a scholarly group that has global breadth, temporal depth, and is united in a common purpose to use technology to help with philological problems of long standing." (Tools & Content Partners working group, Analyzing Scholarly Narratives, Martin Mueller, 3/27/09)

"Moving from the very large to the very small, there is a scholarly narrative about an electronic edition Langland's Piers Plowman(0031). A nice project, but to judge from the code examples, a project that stands entirely outside the richly collaborative work of medievalists who use the TEI. That seems to me on the face of it a big mistake, and I would make a similar point about a Donne project at Texas A&M (not in the scholarly narratives). In both these cases, we have thoughtfully designed but idiosyncratic encoding schemes that almost guarantee the impossibility of using the data outside their original project." (Tools & Content Partners working group, Analyzing Scholarly Narratives, Martin Mueller, 3/27/09)

"One thing I miss in the scholarly narratives is attention to well-curated archives of texts written in English, especially texts before 1923 that are in the public domain. Interoperable documents, textual or otherwise, are highly desirable, and 'interoperability' has certainly been mentioned often in the two Bamboo conferences I attended. The ability to range quickly among many texts may well be the most important advantage of the digital medium. If you have one book and one reader, the computer adds comparatively little. If you have one researcher and a thousand texts or more, the digital medium shines. American universities have taken the lead in creating highly curated and interoperable textual archives such as the TLG, the Cuneiform Digital Library Initiative, or the Tibetan Buddhist archive with which I began this posting. Where is the sufficiently comprehensive, sufficiently well curated, and fully interoperable archive of texts in English? There are tons of digital texts here and there, but it is difficult or impossible to get them to play well with each other. From the perspective of textually oriented research, this very anecdotal review of some scholarly narratives raises some general agenda items for scholars and librarians. "Only connect," the motto from Howards End may be a good tag line. A digital edition should always be done on the basis of a standard, in practice the TEI. If goals cannot be realized within that standard, it should use the standard as far as it will and use extension beyond it, but in such a way that the edition can be 'stepped down' to be interoperable with other texts. At the level of library collections, much more attention needs to be given to making text archives interoperable. Even more importantly, libraries need to start thinking of digital 'repositories' beyond the simple model of a digital shelf from which readers will pick this or that book for reading. 
The digital repository needs to be rethought as a laboratory in which entire collections or subsets become objects of complex manipulations." (Tools & Content Partners working group, Analyzing Scholarly Narratives, Martin Mueller, 3/27/09)
