Metadata

"database thinking: how can you fit all these different pieces into a central store, so that others can reorganize the same information in ways that help them" (Ex 1, 1b-B)

"Search across multiple digital collections. Finding aids are inconsistent. "There are finding-aids and finding-aids."" (Ex 1, 1b-C)

"want to see the program reflect importance for given data, so certain info is given more weight, certain authors given more weight, etc. need to have databases conceptually linked." (Ex 1, 1b-D)

"Museum curators provide one way of classification, invite help from public to do tagging that the public finds relevant" (Ex 1, 1c-E)

"there are lists [of tools, resources, and projects] in every discipline, but CL model may not work because descriptions vary. "librarians as brokers"; we need a group to curate the research apparatus. how to taxonomically characterize? "rosetta stone" of research." (Ex 1, 1d-C)

"Build a repository of meta-data. A place to harvest the meta-data. " (Ex 1, 1d-H)

"Faculty collect data without assistance. Work closely to integrate their dataset into standardized methods. Adding schema to the dataset." (Ex 2, 1b-E)

"Not "where do I find a parallel for this phrase", but "where has someone talked about this problem?" Much of the secondary literature isn't digital, but we also have a poverty of tools for what has been digitized. I don't yet see the path clearly for that frontier; with other things, I see the path leading to a sustainable method. With other things, when I don't have the energy to sustain it, there's a community that can pick up the slack; issue of needing translation skill. In biology, plant names have changed drastically over 150 years. Now going back and trying to develop a vocabulary that will allow vocabulary switching" (Ex 2, 1d-A)

"You say "we're marking up the primary stuff", but we're not marking up your article? There's no semantic-web type capability for articles, unless I translate X into a series of keywords? Do you think a scholar when they submit their paper would take the time to do some kind of semantic markup? Pushed to do that more by journals now, but that doesn't solve the problem. More sophisticated tools that use Google-like technologies are more likely to help than pre-coded stuff" (Ex 2, 1d-A)

"Notion of annotation and metadata and becoming formal has numerous degrees. To make it accessible and standards compliant - we've turned to librarians who have rigid standards for metadata. Can write intellectual search machines but has to get more specific. Question of data shoveling is also who is going to define?" (Ex 2, 1d-D)

"Non-machine evaluations can be helpful - a selector (a person) helps you know what might be relevant in a way that simple cataloguing, subject headings, cannot." (Ex 2, 1d-F)

"have encoded archival descriptions to help people find materials; an xml representation of "finding aids" - description of materials, along with info about what box it's in. " (Ex 2, 1d-G)
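
The quote above describes Encoded Archival Description (EAD) finding aids that record both a description of materials and the box they sit in. A minimal sketch, assuming a simplified, un-namespaced EAD 2002-style fragment (real finding aids are far richer and often namespaced), of pulling title, date, and box number out of each component with Python's standard `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical EAD 2002-style fragment; real finding aids
# usually carry a namespace and many more elements.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <dsc>
      <c01 level="file">
        <did>
          <unittitle>Correspondence</unittitle>
          <unitdate>1901-1910</unitdate>
          <container type="box">1</container>
        </did>
      </c01>
      <c01 level="file">
        <did>
          <unittitle>Photographs</unittitle>
          <container type="box">2</container>
        </did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

def list_components(ead_xml):
    """Return (title, date, box) tuples for each first-level component."""
    root = ET.fromstring(ead_xml.strip())
    rows = []
    for comp in root.iter("c01"):
        did = comp.find("did")
        title = did.findtext("unittitle", default="")
        date = did.findtext("unitdate")  # None when no date is recorded
        box = None
        for container in did.findall("container"):
            if container.get("type") == "box":
                box = container.text
        rows.append((title, date, box))
    return rows

for row in list_components(SAMPLE_EAD):
    print(row)
```

Keeping the physical location (`container`) next to the descriptive fields is what lets one record answer both "what is this?" and "where do I request it?".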

"Text drill-down: whole text, down to specific point. Need to cite down that far, at that granularity. Other media? Yes. Crane: "everything needs to be rethought" (!). Ontologies - we consider the attempts to build them, but recognize that they are problematic." (Ex 2, 1d-H)

"Problem with working across big, different data sets - different metadata, for example. Name authority is an issue: if we can't bridge different authorities, we're screwed. Build inter-source mapping tools. We should move from command-and-control silos to open networks; anxieties about information quality (faculty) and difficulty in managing collections in flux." (Ex 2, 1d-H)

"Some people were trying to build an image information database. "We're looking for the 'right spelling' of Michelangelo". They should be putting an index pointer to a canonical database! Don't copy it, point to it. Basic database design principle. We're creating these references that you can then use; you put the record, and the record has both. The better library catalogs are doing that now. Controlled vocabulary being used in a flexible fashion. What tools do people need to create that kind of pointing? These are also the tools to say "this resource is about Michelangelo, however you're going to spell it." Making authority files more flexible. Connecting authoritative names and IDs to resources they should be connected to. We know how to do authority control files, but we don't know how to tie things together. Should Bamboo support creation of a universal authoritative database that anyone can point to? There might be an identity you could point to, but the specific information would depend on your discipline (identifying people/place names/concepts canonically)" (Ex 3, 1d-D)
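
The "don't copy it, point to it" idea above can be sketched as a tiny authority file: each entity has one stable identifier and a set of variant names, and resource records store only the identifier. A minimal sketch; the identifiers, names, and record fields below are hypothetical, not drawn from any real authority file.

```python
# Hypothetical authority file: one stable ID per entity, many variant names.
AUTHORITY = {
    "person:0001": {
        "preferred": "Michelangelo Buonarroti",
        "variants": {"Michelangelo", "Michael Angelo", "Michelagnolo Buonarroti"},
    },
}

# Resource records point at the authority ID instead of copying a spelling.
RESOURCES = [
    {"title": "David (photograph)", "subject": "person:0001"},
    {"title": "Sistine Chapel ceiling, detail", "subject": "person:0001"},
]

def build_name_index(authority):
    """Map every lower-cased name form to its authority ID."""
    index = {}
    for auth_id, record in authority.items():
        for name in {record["preferred"], *record["variants"]}:
            index[name.lower()] = auth_id
    return index

def find_resources(name, authority=AUTHORITY, resources=RESOURCES):
    """Find resources about an entity, however the query spells its name."""
    auth_id = build_name_index(authority).get(name.lower())
    if auth_id is None:
        return []
    return [r["title"] for r in resources if r["subject"] == auth_id]

print(find_resources("Michael Angelo"))
```

Because spellings live only in the authority record, adding a variant there makes every linked resource findable under it at once.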

"Making formal controlled vocabularies more useful by doing text mining against primary resources. How can communities maintain and support controlled vocabulary lists? Who's in charge? Without authoritarian structure? Not authoritarian, authority control. Uncontrolled control." (Ex 3, 1d-D)

"Ideally what we would have from the process is an inductive approach to categories... allowing categories to emerge including uses of the data when we gathered the data. Emergent categorization rather than inductive categories. Taxonomies are incredibly important to not lock everything down." (Ex 3, 1d-G)

"Sometimes not possible to distinguish between metadata and the analysis. Intertwined. Private knowledge needs to get out there. Model we have is empirical stuff and the analysis we impose upon it." (Ex 3, 1d-G)

"hard time imagining a system that doesn't have as its apex or subbasement a standard set of terms" (Ex 4, 1b-B)

"global schema to which local heterogeneous schemas can be mapped. a very conventional old-fashioned database problem. is there an appropriately abstract and universal global schema that would let you map local taxonomies to a more general taxonomy?" (Ex 4, 1b-B)
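
The "global schema" approach in the quote above amounts to a set of crosswalks from each local vocabulary into one shared, more abstract vocabulary. A minimal sketch, assuming two hypothetical local taxonomies and a hypothetical global term list; unmapped terms are reported rather than silently dropped, since several quotes here stress that any global schema will be imperfect.

```python
# Hypothetical shared, deliberately broad global vocabulary.
GLOBAL_TERMS = {"person", "place", "artwork", "text"}

# Hypothetical per-project crosswalks: local term -> global term.
CROSSWALKS = {
    "art_history_project": {"artist": "person", "site": "place", "object": "artwork"},
    "classics_project": {"author": "person", "polis": "place", "work": "text"},
}

def map_to_global(project, local_terms):
    """Translate local terms to global ones; collect terms with no mapping."""
    crosswalk = CROSSWALKS[project]
    mapped, unmapped = [], []
    for term in local_terms:
        target = crosswalk.get(term)
        if target in GLOBAL_TERMS:
            mapped.append(target)
        else:
            unmapped.append(term)
    return mapped, unmapped

print(map_to_global("classics_project", ["author", "polis", "meter"]))
```

Two projects that never share a local term can still be compared once both are mapped: an "artist" and an "author" both land on `person`.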

"they have this problem too in ling; there is one ontology becoming more and more accepted. the task of the researcher isn't to reinvent ideology but to provide a mapping" (Ex 4, 1b-B)

"has a vague image of foraging digitally and being able to see other people's taxonomic structures transparently associated with a given item" (Ex 4, 1b-B)

"the library world has plenty of thesauri. even a small number of baseline metadata that everyone agrees on, v. broad, is v useful" (Ex 4, 1b-B)

"even these MD standards and tagging schemes themselves are products of particular communities of scholars that don't cover the waterfront and are "historically contingent products of scholarship that are ultimately ephemeral" so we shouldn't build structure on ultimately ephemeral things" (Ex 4, 1b-B)

"need taxonomies. traditionally would look to Library of Congress as the "name authority," but they're not doing that any more. (identity management). Have seen the rise of folksonomies at the same time. Library of Congress has recently started inviting the public to identify pictures. do taxonomies arise in other walks of life now? taxonomies can help clarify things. Have started to abandon taxonomy-based search in favor of intelligent searching. Contextual filtering - not taxonomy based. used to look at an index to find things; now use search. an index tells you how the author thinks about their own work. how you do tags will bias how search works. would rather use full-text search than an index to find a subject. building taxonomies may be valuable but has a lot of overhead; would want that to be done close to the author. The value of the index varies according to what field you're in. the index or tags or annotation shouldn't be static. Should be able to be expanded over time - extensible." (Ex 4, 1d-D)

"Representing/Recreating/Modeling - scholarly practices and what outstanding issues are there? Create databases, or at least schemas. Could create schema for tagging set of documents. Creating ontologies. Most important aspect is that it's plural; it's not One Ontology. Create controlled vocabularies?" (Ex 4, 1d-E)

"Tools and services that can grow and resift. Planned growth. A built-in assumption that it's an imperfect description. Need checks to make sure you haven't left something out. Automatic classification of the '70s. How do you let a machine help you define something? Ambiguity could be developmental---evolution." (Ex 4, 1d-F)

"Extended metadata - what is that? What do people need to know about these materials? Key decision. Hard to go back and change that decision. Key moment to define your ontologies." (Ex 4, 1d-G)

"metadata story about people adding terms offensively (not deliberately)" (Ex 4, 1d-H)

"we aren't talking about changing the basic scholarly practices, we're talking about refining them and incorporating new things. convergence: we began talking about data sets, tagging, metadata, and getting better at that, getting consistency, and now we've come back to that cluster of subjects" (Ex 5, 1b-B)

"Issues of metadata; access to object itself, but in many multimedia you have access in effect through searching through the metadata" (Ex 5, 1c-D)

"tagging as a way to make exterior what humanities scholars have tended to do privately" (Ex 6a, 1b-B)

"Biggest stumbling block is data structures and metadata. I would have a data architecture that would fulfill everyone's needs and populate them with robust metadata." (Ex 6b, 1b-C)

"Could we create a typology of the kinds of things that exist, that create and enable new methodologies or ways of looking at things?" (Ex 6b, 1d-A)

"Harvest and render searchable large aggregated sets of metadata to enhance resource discovery. This could be a Bamboo, consortial effort." (Ex 6b, 1d-A)

"Lightweight, shared metadata standards. Like RSS, but more flexible and robust. OAI-PMH is a start, but too big a gun. Between RSS and OAI" (Ex 6b, 1d-F)

"conversion, versioning, repository archive services, metadata schema harmonizing, presentation and visualization, licensing, and citation". (Ex 7 flipcharts, 1a-C)

"user-generated contributions to the semantic web" (Ex 7, 1a-F)

"enrich metadata" (Ex 7, 1b-A)

"once heard it said, "metadata is like a toothbrush; everybody agrees we should use them but nobody wants to use anybody else's"" (Ex 7, 1d-E)

"why 18 different metadata standards: different views. most difficult piece of the Himalaya digital library was hierarchical ontologies. that's where arguments happen and that's where limits must be set. certain mode of scholarship always involves prioritizing certain kinds of information which can be gotten at in various ways. true coherence is hard to pin down. when you move into codable facts, we're talking about db's, documents and such which have meaning, meaning is set aside because meaning doesn't code easily. in data extraction from biographies, we want a system that doesn't involve judgment. factoids. place names, titles... everything has edges but when you start talking about personalities that can't be coded. if db driven, how to know what we're giving up? how to make it easy to pay attention to things that can't be encoded? one thing Google and others are doing in figuring out similarities, commonalities and ranking, is not to take the AI approach of understanding, but to take context where something is identified by someone, characterizing clusters of groups mathematically and then finding analogous clusters. parts of speech, clusters of words, n-grams, statistical hashes. find another instance where these things predominate. area of activity for us whether we're trying to communicate machine or "hand" work: ways of exposing judgment to computational proxies, ways of doing that are helpful. some kind of analysis that chunks texts and then applies analysis to distill to a group of words, some way to expose that in ways easily translated (e.g. links); modes of exchange can become trained datasets. easy case is reference works. ways of linking between discursive analyses may bias. "St. Augustine" is not a town in Florida: ways of disambiguating. people worry about students doing research Google lets them do easily. as an undergrad found one book on the Iliad, thought he had found "the book", copied down something from that book, and thought he was done writing about the Iliad" (Ex 7, 1d-E)

"I can find large groups of good images to use in certain on-line databases - ARTstor, the Library Image Database, Gardner's Image Set. After that, I might turn to Flickr and Google Image, which, of course, are not searchable according to any standard metadata system, and whose contents may be completely misidentified; thus finding anything in particular is pure serendipity. In the process of putting all this together (a big presentation of 99 PPT slides in the end), I will experience certain frustrations related to image quality and metadata... I might want to show several views of one particular sculpture from the East Pediment. This sculpture is known as Figure D. He is one of the "Elgin Marbles." He is identified by some scholars as Herakles (or Hercules, if you like the anachronistic Latin form of his name), others call him Dionysos (or Dionysus), while still others might call him Theseus. We don't know for sure - a common situation with archaeological stuff. Furthermore, I might be looking for a detail of his back (which is fabulously sculpted). The metadata in ARTstor is not fine enough to search reliably for something like this (and the same goes for all the other sources). I would have to just scroll through pages and pages and pages of Parthenon sculpture images hoping to stumble upon it." (SN-0044 Preparing Lecture Materials on Architecture and Sculptural Program of the Parthenon, Ann Nicgorski)

"Canonical text services allow us to call up canonical texts by standard chapter/verse citation schemes. Christopher Blackwell and Neel Smith, working in conjunction with Harvard's Center for Hellenic Studies (CHS), have developed a general protocol for canonical text services that provides essential functions for any system that serves classicists - or any scholarly community working with canonical texts. Early modern books or MSS that defy current OCR technology can be indexed by conventional citation (e.g., this page of the Venetus A manuscript contains the following lines of the Iliad)." (SN-0047 Services for eClassics, Gregory Crane)
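
Canonical text services identify passages with CTS URNs such as `urn:cts:greekLit:tlg0012.tlg001:1.1-1.10` (Iliad 1.1 through 1.10). A minimal parsing sketch that splits such a URN into namespace, work hierarchy, and citation range; it ignores subreferences (`@`) and other details of the full CTS URN specification, so treat it as illustrative rather than a conforming implementation.

```python
def parse_cts_urn(urn):
    """Split a CTS URN into namespace, work components, and passage range."""
    parts = urn.split(":")
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace = parts[2]
    # The work component is textgroup[.work[.version[.exemplar]]].
    work_fields = ["textgroup", "work", "version", "exemplar"]
    work = dict(zip(work_fields, parts[3].split(".")))
    passage = parts[4] if len(parts) > 4 and parts[4] else None
    start = end = None
    if passage:
        start, _, end = passage.partition("-")
        end = end or start  # a single reference is a range of one
    return {"namespace": namespace, "work": work, "start": start, "end": end}

print(parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1-1.10"))
```

Because the passage reference is kept separate from the edition (`version`), the same citation can be resolved against whichever edition a service holds.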

"Named entity identification provides semantic classification (e.g., is Salamis a place or a Greek nymph by that name) and then associates names with particular entities in the real world (e.g., if Salamis is a place, is it the Salamis near Athens, Salamis in Cyprus or some other Salamis?). We have developed a serviceable named entity identification system for English and have support from the Advancing Knowledge IMLS/NEH Digital Partnership to extend this work to documents about Greco-Roman antiquity. We expect more general named entity systems to supersede the system that we developed and we are therefore focusing our efforts on creating knowledge sources that will allow these more general systems to perform effective named entity identification on classical materials. Our work focuses on creating (1) a labeled training set, based on print indices, with place and personal names identified, (2) a multilingual list of 60,000 Greek and Latin names in Greek, Latin, English, French, German, Italian, and Spanish, and (3) contextual information, or in other words, which authors mention which people and places in which passages, extracted from the 19th century encyclopedias of biography and geography edited by William Smith." (SN-0047 Services for eClassics, Gregory Crane)
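
The second step the quote describes, deciding which real-world Salamis a mention refers to, can be illustrated with a toy context-overlap scorer. This is a deliberately simple stand-in (a Lesk-style word-overlap heuristic), not the Perseus system; the gazetteer entries and cue words are hypothetical.

```python
import re

# Hypothetical gazetteer: candidate entities for one name, with cue words
# that tend to appear near mentions of each.
GAZETTEER = {
    "Salamis": {
        "Salamis (island near Athens)": {"athens", "themistocles", "xerxes", "attica", "strait"},
        "Salamis (Cyprus)": {"cyprus", "evagoras", "teucer", "kition"},
    }
}

def disambiguate(name, context):
    """Pick the candidate whose cue words overlap the context most."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    candidates = GAZETTEER.get(name, {})
    best, best_score = None, 0
    for entity, cues in candidates.items():
        score = len(cues & words)
        if score > best_score:
            best, best_score = entity, score
    return best  # None when no cue word matches: leave it to a human

print(disambiguate("Salamis", "Themistocles lured Xerxes into the strait at Salamis."))
```

Returning `None` rather than guessing leaves room for the human checkpoints that other comments on this page call for.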

"Citation identification is a particular case of named entity identification that focuses on recognizing particular citations: e.g., determining whether the string "Th. 1.33" refers to book 1, chapter 33 of Thucydides, line 33 of the first Idyll of Theocritus, or something else. Are numbers floating in the text such as "333" or "1.33" partial citations and, if so, what are the full citations? Primary source citations tend to be shorter and more variable in form than the bibliographic citations found in scientific publications. Perseus has, over the course of more than twenty years, extracted millions of citations from thousands of documents but the citation extractors tend to be ad hoc systems tuned for the subtly different formats by which publications represent these already brief and cryptic abbreviations. In the million-book world, we need citation extractors that can recognize the underlying citation conventions of arbitrary documents and then match them to known citations on the fly (e.g., observe numerous references to Thucydides and then infer that strings such as "T. 1,33" describe Thucydides, Book 1, Chapter 33)." (SN-0047 Services for eClassics, Gregory Crane)
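
The citation problem above, reading "Th. 1.33" or "T. 1,33" as Thucydides 1.33, can be sketched as a regular expression plus an abbreviation table. A minimal sketch: the abbreviation table is hypothetical and tiny, and ambiguous abbreviations are resolved only by which author the same document cites unambiguously most often, a crude stand-in for the on-the-fly inference the quote calls for.

```python
import re
from collections import Counter

# Hypothetical abbreviation table: abbreviation -> possible authors,
# with the default reading listed first.
ABBREVIATIONS = {
    "Th.": ["Thucydides", "Theocritus"],
    "Thuc.": ["Thucydides"],
    "Theoc.": ["Theocritus"],
    "T.": ["Thucydides", "Theocritus"],
}

# Abbreviation followed by two numbers separated by "." or ",".
CITATION = re.compile(r"\b([A-Z][a-z]*\.)\s*(\d+)[.,](\d+)")

def extract_citations(text):
    """Return (author, first number, second number) for each citation."""
    matches = [m for m in CITATION.finditer(text) if m.group(1) in ABBREVIATIONS]
    # Count authors this document names unambiguously.
    seen = Counter(
        ABBREVIATIONS[m.group(1)][0]
        for m in matches
        if len(ABBREVIATIONS[m.group(1)]) == 1
    )
    results = []
    for m in matches:
        options = ABBREVIATIONS[m.group(1)]
        # Prefer the most-cited candidate; ties keep the default (first) one.
        author = max(options, key=lambda a: seen[a])
        results.append((author, int(m.group(2)), int(m.group(3))))
    return results

print(extract_citations("As Thuc. 1.22 notes, and again at T. 1,33."))
```

Real documents would also need the partial-citation handling the quote mentions ("333" standing alone), which this sketch does not attempt.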

"Markup projection services, implicit in many of the services above, automatically associate machine actionable data from one source with the same passage in another source. Thus, an index might state that a reference to Salamis in passage A describes Salamis near Athens but that the reference in passage B is to Salamis of Cyprus. Markup projection services would associate those statements with all references to Salamis in various versions of passages A or B, including not only full scholarly editions but also quotations of those passages that appear in journal articles or monographs." (SN-0047 Services for eClassics, Gregory Crane)
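
Markup projection as described above attaches statements made about a canonical passage to every copy of that passage. A minimal sketch, assuming each edition or quotation has already been aligned to a canonical passage ID (the hard part, not shown here); the passage IDs, annotations, and texts are hypothetical.

```python
# Hypothetical annotations keyed by (canonical passage ID, surface form).
ANNOTATIONS = {
    ("passage-A", "Salamis"): "Salamis (island near Athens)",
    ("passage-B", "Salamis"): "Salamis (Cyprus)",
}

# Hypothetical texts, each already aligned to a canonical passage.
VERSIONS = [
    {"source": "critical edition", "passage": "passage-A", "text": "... the fleet at Salamis ..."},
    {"source": "journal article quotation", "passage": "passage-A", "text": "\"the fleet at Salamis\""},
    {"source": "translation", "passage": "passage-B", "text": "... he sailed to Salamis ..."},
]

def project_markup(annotations, versions):
    """Copy each passage-level annotation onto every aligned version."""
    projected = []
    for version in versions:
        for (passage, surface), entity in annotations.items():
            if version["passage"] == passage and surface in version["text"]:
                projected.append((version["source"], surface, entity))
    return projected

for row in project_markup(ANNOTATIONS, VERSIONS):
    print(row)
```

The identification is made once, against passage A or B, and every aligned copy inherits it.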

"Professor Q, a professor of folklore, is collecting oral histories of local storytellers. Using a digital voice recorder, she records multiple interviews with each of her subjects. She uploads the files to the server. She has several graduate assistants transcribe the recordings. These, too, are stored on the server. When the data is ready, she begins to analyze the data using the timeline tool to note the beginnings and endings of stories, the commentary that the subject provided for each story, the different phases of the subject's life, etc. Professor Q and her GAs can use the synchronization tool to coordinate the audio to the transcription. Using the annotation tools, Professor Q can add her own analysis and observations. The timelines and other products can be exported to an interactive Web page that can be sent out for peer review." (SN-0054 Variations - a Tool Set for Music Research and Pedagogy, Stacy Kowalczyk)

"The Inscriptions of Israel/Palestine tries to remedy this problem. It's relatively easy, and very worthwhile, to add as much metadata as possible as part of the encoding of an inscription. For example, an inscription may be marked as written in Greek, containing a Hebrew name, in red letters. This is already more information than the print index can contain. Since the researcher who is building a digital corpus doesn't know what uses it will serve for other scholars, it is important to capture metadata of general interest as well as information that reflects the researcher's own interests." (SN-0034 Finding and using inscriptions- Building a corpus, Elli Mylonas, 12/18/08)

"Many of us in the Arts would like to see a system by which we can search libraries, collections, and other arts-media assets deeply, referring to meta-data that is designed by a cross-disciplinary group of scholars. To be able to search a poetry reading by, not only the usual key words such as date of reading, who read it, who is the author, etc., but also by more "arts-oriented" keys like "rhythm, tempo, rhyme structure, length (time), emotional content, subject matter, key (music), inflection, prosody" etc. To be able to search music, lighting designs, dance compositions (Labanotation?), visual arts, static arts (sculpture, paintings) in this way, and make use of these connections, would enhance performance, research, re-construction, arranging, composition, and teaching. In other words, there needs to be a system by which curators, artists, performers, composers, and all in the arts and humanities, can participate in allowing their unique collections to be connected via cross-media and cross-discipline data search and manipulations. I believe Bamboo could be the perfect consortium to develop the WAY to search these "obscure" types of meta-data within any participant collection, to develop the meta-data itself, and to create, what would be unique, sensible ACCESS to this data connection tool." (SN-0007 Arts Assets Data Base with Intense Meta-Data Referencing, Patrick Neher, 12/20/08)

"We have created the Global Performing Arts Database (www.glopad.org), a multimedia, multilingual, Web-accessible database containing digital images, texts, video clips, sound recordings, and complex media objects related to the performing arts from around the world, plus information about related pieces, productions, performers, and creators. Our partners use the GloPAD ingest system and metadata structure to directly input the digital images and descriptions of their performing arts related items. The database offers a highly sophisticated metadata schema that was created to accommodate the complexity of describing the elements of a performance. One of our pressing needs is to develop an efficient way to incorporate the images and descriptions of performing arts related material that reside in digital collections that were created outside of the GloPAD structure: for example, material from a library's website, an online museum exhibit, or a theatre company's digital archives. Many creators of these small collections would be happy to see their material available in GloPAD, in addition to their home site. Right now the only way that can be done is for someone to manually enter all of the metadata from those collections into the GloPAD system, a huge investment of time and resources. The metadata already exists in electronic form, so it is taking a step backwards to manually re-enter that data into GloPAD. Our dream is to have the means to easily harvest and export the metadata from these smaller digital projects without having to hire a database expert to set up each such transfer. Ideally there would be a service to which we, and others, could send that data, a service that would reformat that data to allow for direct import into the GloPAD metadata schema. The Open Archives Initiative protocol for harvesting is a good step in this direction, but does not offer the non-expert a service that would convert data into the forms for use in various display systems. 
Several database-based content management systems (Drupal, Joomla) and digital collections systems (Omeka, Open Collection) have, or are working on, extensions for export and import of data sets. What is needed is a reliable non-commercial service for carrying out transfers of data between collections. Our goal is to make it easier for scholars to find the digital resources they need for their work. By incorporating material from these smaller sites into GloPAD, we can get closer to providing theatre and other performing art scholars with a single authoritative repository of digital resources for their research." (SN-0019 The Global Performing Arts Database No. 1, Ann Ferguson)
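
The harvest-and-reformat service imagined above could start from an OAI-PMH `ListRecords` response in the simple Dublin Core format (`oai_dc`) that the protocol requires repositories to support. A minimal sketch that parses such a response (an inline sample here, not a live HTTP request) and maps the Dublin Core fields onto a target record; the target field names are hypothetical and are not the actual GloPAD schema.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A trimmed sample ListRecords response (normally fetched over HTTP).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Costume sketch for Act II</dc:title>
          <dc:creator>Unknown designer</dc:creator>
          <dc:date>1923</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical crosswalk: Dublin Core element -> target schema field.
CROSSWALK = {"title": "item_title", "creator": "creator_name", "date": "date_created"}

def harvest(response_xml):
    """Turn each oai_dc record into a dict keyed by target field names."""
    root = ET.fromstring(response_xml)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        target = {"source_id": record.findtext("oai:header/oai:identifier", namespaces=NS)}
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry no metadata
            continue
        for dc_name, field in CROSSWALK.items():
            values = [el.text for el in dc.findall(f"dc:{dc_name}", NS)]
            if values:
                target[field] = values  # Dublin Core elements may repeat
        records.append(target)
    return records

print(harvest(SAMPLE_RESPONSE))
```

Only the `CROSSWALK` table would change from one destination schema to another, which is roughly the kind of non-expert configuration the quote asks for.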


"One of the Google lessons - why do markup at all? Isn't it just words that matter? Can't we reduce it to a sea of words? Recent experience in MONK - good if you're interested in words, but if you want a subset of books you couldn't ask OCA or Google Books "I just want fiction". Not because no one's categorized things that way (OCA has MARC record fields) but because no one fills those fields in. Can't ask for books about England, except by implication. All this metadata isn't there; it would take the form of markup. Do we need structural markup? Do paragraphs matter? Paragraphs, sentences, verses, and lines are meaningful units of composition - meaningful units of analysis. Being able to approach an analytic tool/process with awareness of structural units is useful. This requires structural markup. Also useful if you're going to ask statistical questions - differentiate between core intellectual content and other stuff (table of contents, running headers, index, etc.). Those words could throw off your stat results in ways that would obscure info. What about word level? Tag cloud visualizations of frequency of words in a novel: most frequent words are names of characters. So ignore proper names to get at the next level of what's going on in the book, but to ignore them you have to identify them -> tag as proper names. Other tagging takes place in POS (part-of-speech) tagging, a preliminary stage to almost anything you'd want to do. Argument within MONK project - is interoperability desirable? achievable? Is TEI the way to do it?" (W3, Perspectives: Content, John Unsworth, Dean, Graduate School of Library and Information Science, University of Illinois)
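
Unsworth's two points, that counts should skip non-content text such as running headers and that proper names must be tagged before they can be ignored, can be shown on a tiny fragment. A minimal sketch using a TEI-like fragment (the element names `fw`, `p`, and `persName` follow TEI usage, but the fragment is simplified and un-namespaced); it counts words in paragraphs only, excluding tagged personal names.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified, un-namespaced TEI-like sample with a running header (fw).
SAMPLE_TEI = """<text>
  <body>
    <fw type="header">CHAPTER ONE</fw>
    <p><persName>Emma</persName> walked with <persName>Harriet</persName> to the village.</p>
    <p>The village was quiet, and <persName>Emma</persName> was glad of the walk.</p>
  </body>
</text>"""

def content_word_counts(tei_xml):
    """Count words inside <p> elements, skipping <persName> and running heads."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for p in root.iter("p"):  # only paragraphs, so <fw> headers never count
        pieces = [p.text or ""]
        for child in p:
            # Drop the tagged name itself but keep the text that follows it.
            if child.tag != "persName":
                pieces.append("".join(child.itertext()))
            pieces.append(child.tail or "")
        for word in re.findall(r"[a-z]+", " ".join(pieces).lower()):
            counts[word] += 1
    return counts

print(content_word_counts(SAMPLE_TEI).most_common(3))
```

Without the `persName` tags, "emma" would tie for the most frequent word, which is exactly the tag-cloud problem described in the quote.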

"Metadata guesser, avoiding divergence: there has to be a human element in this process. Have to know about workflows and modules that may be interrupted and checked by humans. Heuristics can't replace what humans need to do. As we evolve services and tools, there are hybrids of human and mechanical work. In this community, same issue comes up. Need for human involvement in the process." (W3, Perspectives: Content, Q&A, Martin Mueller)

"The georef project description (Purdue, in the discussion this morning) is very interesting -- how far can the idea of simple "sameness" services be extended? Databases that have no actual content but provide valuable links between content that exists elsewhere. Geotags are nice and lightweight. Metadata is also, though slightly less so. Sameness services for full text, full image, full media content would be incredibly valuable. They might overcome the description problem (creating metadata) which is still crippling both for individual scholars and librarians/archivists, etc. "I have this thing, how is it like other things..." and push that information back to my data repository (or data repository cataloging tool), or back to the individual scholar." (Tools & Content Partners working group, Claire Stewart, 1/12/09 comment)
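
A "sameness service" as described above stores no content, only assertions that identifiers in different repositories denote the same thing. A minimal sketch using a union-find (disjoint-set) structure so that chains of pairwise assertions become clusters; the class name and identifiers are hypothetical.

```python
class SamenessService:
    """Store only 'A is the same as B' links between external identifiers."""

    def __init__(self):
        self._parent = {}

    def _find(self, item):
        self._parent.setdefault(item, item)
        # Path halving keeps lookup chains short.
        while self._parent[item] != item:
            self._parent[item] = self._parent[self._parent[item]]
            item = self._parent[item]
        return item

    def assert_same(self, a, b):
        """Record that identifiers a and b refer to the same thing."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_as(self, item):
        """Return every identifier known to denote the same thing as item."""
        root = self._find(item)
        return sorted(x for x in list(self._parent) if self._find(x) == root)

svc = SamenessService()
svc.assert_same("museumA:obj/17", "libraryB:img/903")
svc.assert_same("libraryB:img/903", "archiveC:item/5")
print(svc.same_as("archiveC:item/5"))
```

Each repository keeps its own records; the service only answers "how is this like other things" by returning identifiers to follow.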

"Any improvements we make to displaying epigraphic texts online, the texts themselves, the metadata, and the indices. The creation of the indices could be automated so that they could be updated automatically when we enter new texts. Our texts could be taken and cross-walked into other platforms, such as the EAGLE database of Latin Inscriptions." (SN-0039 Inscriptions of the Isparta Archaeological Survey, Paul A. Iversen, 3/6/09)

"Our group felt that categorization of projects, tools and methods would work best if it was "folksonomic" and came from the participants. This is a useful and interesting activity, and a way to really figure out what the community thinks it's doing! Rigid categorization schemes, like the ones that are formally developed and require extra effort to change or supplement, can lead to a simplification of a domain. In scholarship, often it's useful to know how things are different from one another, as well as how they are similar. Funding and prioritization that are based on categories may end up ignoring significant differences among seemingly similar things. So, categorization seems useful, but identifying it as a criterion for validation seems arbitrary. A corollary to this thought is that imposed categories tend to be normative. Scholarship is about the exploration of new and different things." (Shared Services working group, Program Document Sec 4 - Discussion Draft of 9 March 2009, Elli Mylonas, 3/16/09 comment)

"Do librarians at your institution find this a credible concept for navigating and linking diverse perspectives on inter-related primary material and secondary analysis? Run-of-the-mill librarian: if it's folksonomic, they won't like it. If it's taxonomic, they'll like it, but would want to have authority over it. Librarians have an interest in being the gatekeepers for the organization of scholarly resources, which extends to digital resources in the changing library landscape. It's even possible that some librarians might react by wondering why we can't just catalog services and make them available through the library catalog. This isn't likely to come from the Bamboo-friendly librarians." (Shared Services working group, Program Document Sec 4 - Discussion Draft of 9 March 2009, Elli Mylonas, 3/16/09 comment)
