Filtering/synthesizing

"data flows in so fast, how do you organize it?" (Ex 1, 1b-B)

"in my field drowned in data, need database models, information integration, need to develop an ontology of digital support for humanities." (Ex 1, 1b-D)

"want to see program reflect in importance for given data, so certain info given more weight, certain author given more weight, etc. need to have databases conceptually linked." (Ex 1, 1b-D)

"ITunes for my articles. Indexing, referring, recommending for scholarly articles." (Ex 1, 1b-E)

"For a humanist, what happens at scale? "You must screen the backfill." "It's only 14th century; we're going for Rome."" (Ex 1, 1c-B)

"Disc space on any given computer is being filled with digital media (not text). What does the "replacement" of text as a primary area of focus mean to humanities scholarship? What tools are available for searching on, annotating, extracting from, publishing, etc. these -- these are at a less-developed stage than tools for text." (Ex 1, 1c-C)

"Museum curators provide one way of classification, invite help from public to do tagging that the public finds relevant" (Ex 1, 1c-E)

"keeping up with the rate of proliferation of digital data" (Ex 1 group sharing, 1c)

"Read source materials: reading to build congitive foundation; reading to help answer scholarly questions" (Ex 2, 1a-A)

"Assessing relevance. This is the whole issue of sifting the wheat from the chaff." (Ex 2, 1a-C)

"Coalesce materials which can be organized and interpreted to form an argument; ultimately, judged by how many arcs or arguments or narratives you can produce." (Ex 2, 1a-C)

"coordinate the activities of many fields and filter them through [the user's] field" (Ex 2, 1a-E)

"Take objects into different contexts to take a perspective & tell a story. Re-represent after others' works to put things in a new context, to separate layers to make distinct, recombine layers in a new context" (Ex 2, 1a-E)

"Coordinate the activities of many fields and filter them thru my field" (Ex 2, 1a-E)

" Rumination, which may include consideration of material/thoughts we have access to through on-line Web 2.0 tools" (Ex 2, 1a-F)

"Research is an argument. The presentation is in service of forwarding the argument. The Library can be a neutral space where the argument can take place." (Ex 2, 1b-E)

" Thinks in many places (shower, walking dog, riding horse), mostly internally rather than on paper, but it continues as she produces." (Ex 2, 1c-A)

"Sense that knowing what one's looking for allows for selectivity." (Ex 2, 1c-B)

""Well, that's the trick!" Difficult in filtering is the major problem in disseminating content. " (Ex 2, 1c-B)

"Clean up and consolidate databases and spatial analysis - in prep for interpretation." (Ex 2, 1b-F)

"Interactive process of reading (drawing on things I know, looking for parallel documents/images/things with similar handwriting)" (Ex 2, 1d-A)

"How do you decide which essays to read? I know the authors work from reading other bibliographies, authors of books, or I happen to know them. "You can no longer have a mastery of a field. The material available is inexhaustible." When you are working with international literature, much of the material isn't digital and you need language skills to access it or to employ a translator." (Ex 2, 1d-B)

"Entails looking at materials, from magazines, a batch of pubs from 20th century were categorized in ways no longer useful. Critiquing existing metadata. Look at data itself to come up with own gestalt categories. Turn to librarians for someone else's ontology so not reinventing wheel. Discover someone else's imperfect ontology. Goal is to have something machine actionable to create a corpus that lets me read at another degree. Can ask questions of the material w/ enough specifity to get answers. Hopefullly not my perspective alone." (Ex 2, 1d-D)

"a machine reads a book sentence by sentence or phrase by phase to look for similarities. What is your value as the scholar? The trained eye. It is about having training, which takes time." (Ex 2, 1d-E)

"begin with reading primary materials (a story, for example) and get an intuition. Then, begin investigating the intuition. Flesh out the intuition via traditional research-draft, share with colleagues, re-draft. A modality question-read online, read at the library." (Ex 2, 1d-E)

"Non-machine evaluations can be helpful - a selector (a person) helps you know what might be relevant in a way that simple cataloguing, subject headings, cannot." (Ex 2, 1d-F)

"wrong, waste of resources to allow tech. to drive scholarship. desire to have the tool at hand can lead to a lack of self reflection on how a tool works, what it lets us do. this problem is not new; anybody who uses a dictionary may not reflect on how the dictionary influences them" (Ex 2, 1d-F)

"translation from operating in the print world to operating in the "desktop world". complexity of the formats of the resources. understand which resources researcher is after. mindfulness of budgets of time and money. many different sources (journals, newspapers, db's) some better than others enable discovery in a complex world search engine strategy across wide variety of formats (text, visual, etc.) "federation"" (Ex 2, 1d-F)

"historical research which doesn't focus on canonical texts. moving through lots of material very quickly, need for low time investment. need subsets of data which are constrained. "what does medical practice look like from perspective of [patient, small town clinician, small child]?" how to avoid the tyrrany of the database? db's involve overhead. time to create machine-readable db often not worth it unless you have canonical texts. what do db's do well? what constraints do they impose? not sure the db is "the solution" for all kinds of problems. stitching together loosely structured collections, "metacollections" looking across broad collections when you might not even know what you're looking for "low overhead" " (Ex 2, 1d-F)

"Correlate outside materials to one's own expertise and research" (Ex 3, 1a-C)

"Coalesce materials which can be organized and interpreted to form an argument" (Ex 3, 1a-C)

"Synthesizing, crystallizing significant contribution, new, original narrative" (Ex 3, 1a-C)

"Exploration: Synthesize information; match patterns; critical analysis; create opportunities for structured serendipity; choose, select/filter/frame;" (Ex 3, 1a-E)

"Finding validated or trustworthy primary data on a research subject" (Ex 3, 1b-C)

"Engaging w/primary materials: pattern recognition, "junk filter" (i.e. filtering out the things you're not interested in, looking for differences), inductive/deductive relation of evidence and hypothesis." (Ex 3, 1b-F)

"Amazon Books making recommendations; I find the recommendations totally off base a lot of the time. "No, I didn't mean that! I meant the thing I wrote!" Has to be automated in some way, but human judgments have to be made at some point" (Ex 3, 1d-D)

"more material is going to be surfaced, summarized automatically in response to searches, user profiles, on the basis of materials that's being aggregated and mined. structured references are being leveraged into resources. it's interesting to see what google books surfaces next to a given resource. next generation will enter college accustomed to haveing information ordered for them. commercial interests around filtering, aggregating, linking, takes us away from the "hard yard" of light reading at large scale. it's not in google's interests to make everyting available" (Ex 3, 1d-F)

"to students: i can research quickly, you get bogged down in one tool or one resource. how can we train grad students to research more quickly in the digital world?" (Ex 3, 1d-F)

"difference b/w context for digital vs. physical reading. we know what it means to scan a page; we use indexes, but when we scan we're looking for important words. in digital, light reading changes." (Ex 3, 1d-F)

"Grand Narrative of Humanities Scholarship: Condensing meaning from vapor of nuance" (Ex 3, 1d-G)

"When we say sifting we are not talking about ... the idea that there is one refinement to get to. Alternative ways of sifting. One is an elaboration, a contextualization, outside of my personal space. Why building schema is such a challenge." (Ex 3, 1d-G)

"Sometimes not possible to distinguish between metadata and the analysis. Intertwined. Private knowledge needs to get out there. Model we have is empirical stuff and the analysis we impose upon it." (Ex 3, 1d-G)

"what are you making sense OF? Do you have to start with a dataset? Isn't that an assumption in and of itself? what about starting with a phenomenon? a text corpora? What about starting with an observation or an engagement? "data" is methodologically marked -- but materials isn't good either. What counts as "data", and what should the "data" count as? Recognize that the operation is recursive." (Ex 4, 1b-A)

"One of the things that technology brings a benefit in is: when you have a huge pile of material, the unassisted person can only scope one thing at a time, can only see things in the order they come. Technological assistance can help see patterns across materials." (Ex 4, 1c-A)

"Part of the research process, copying everything. Making notes on something is so often copying bits of the things that you're reading. Transforming formats for conservation involves copying things. Prepare it for students, another copy. A talk, another copy or version. Versioning. Perhaps a subset of copying." (Ex 4, 1c-B)

"What other clues do you use to assess whether things are worth pursuing? Citation. Write title and reviews, but often only Amazon. Journals on JStor, can usually make a list of 5-6 decent reviews. Interesting idea for Bamboo - hotels.com for reviews. If it's a field I don't know 100%, I go see if people I trust are in favor of it. Don't need to find it; unless you're doing an article on how something has been reviewed. Cross-check validation: you have an instinct, but you can't go with your gut all the time. You cross-check, try within reason to validate independently your judgment. If my judgment is that I need to go to California for an archive, I need to convince someone through a peer review that I'm worthy of money for that trip. Leveraging community" (Ex 4, 1c-D)

"You want to start by reading the "important stuff". How do you define important? Is the "cited a lot" algorithm enough? It might be the reverse - an inverse citation analysis? Ideally, it'd allow me to tell it what it should look for. I might be saying "ignore citation, look for frequency where things are mentioned". User-customizable and trainable so you develop your own profile. Suddenly tell it "But today, I want citation heavy!" Not new connections, necessarily, but relevant to what you want to do. Just find them, and I'll decide what's important. Citation works for some things, he's got different needs on different days. Need something that's morphable over time, software that works with you to define different metrics" (Ex 4, 1d-E)

"the development of shared annotated bibliography" (Ex 4, 1d-C)

"need taxonomies. traditionally would look to Library of Congress as the "name authority," but they're not doing that any more. (identity management). Have seen the rise of folksonomies at the same time. Library of Congress has recently started inviting the public to identify pictures. do taxonomies arise in other walks of life now? taxonomies can help clarify things. Have started to abandon taxonomic based search in favor of intelligent searching. Contextual filtering - not taxonomy based. used to look at an index to find things; now use search. an index tells you how the author thinks about their own work. how you do tags will bias how search works. would rather use a full text search rather than an index to find a subject. building taxonomies may be valuable but has a lot of overhead; would want that to be done close to the author. The value of the index varies according to what field you're in. the index or tags or annotation shouldn't be static. Should be able to be expanded over time - extensible." (Ex 4, 1d-D)

"Decision of a librarian to buy a book is a filter. tools to decide which digital resources to use (reviews) don't exist the way they do in print. bamboo could serve as a sort of gatekeeper; out of fashion but useful. Example of bryn mawr classical review inviting submissions of classical work for e-mail review" (Ex 4, 1d-E)

"Representing/Recreating/Modeling - scholarly practices and what outstanding issues are there? Create databases, or at least schemas. Could create schema for tagging set of documents. Creating ontologies. Most important aspect is that it's plural; it's not One Ontology. Create controlled vocabularies?" (Ex 4, 1d-E)

"Finding what a work influenced. Issue of similar-to; tricky computer task of figuring out when you have the same thing. Especially if they've used an OCR engine; digital identicality is zero. "Recognizing duplicates". The same article published in two different places. Undesired material - sounds like censorship. Maybe "set aside", not "weed out - you might want it later. It's a question of your information management tools. Spam filters - you tell it what's spam, and over time it tries to learn. Is there any possibility like that? You can imagine the same type of tool > "this is interesting, this isn't." Spam filters are just the negative version of recommender systems. Recognizing that what you have is part of something else" (Ex 4, 1d-E)

"You want to start by reading the "important stuff". How do you define important? Is the "cited a lot" algorithm enough? It might be the reverse - an inverse citation analysis? Ideally, it'd allow me to tell it what it should look for. I might be saying "ignore citation, look for frequency where things are mentioned". User-customizable and trainable so you develop your own profile. Suddenly tell it "But today, I want citation heavy!" Not new connections, necessarily, but relevant to what you want to do. Just find them, and I'll decide what's important. Citation works for some things, he's got different needs on different days. Need something that's morphable over time, software that works with you to define different metrics" (Ex 4, 1d-E)

"In a full text world, you should be able to do better than that. Large area of association making by software routines allows people to look in different terms. What you want is to make associations in your search of secondary literature that haven't been made before" (Ex 4, 1d-E)

"Internet is no longer usable after a certain point for my research. Can get some kind of bibliography, but then you have to read articles/footnotes, constantly go back to originals, which aren't available on-line. These are temporary measures at this time, a first step. But what if everything has been digitized? What tools will you need for that? At that point, have to refine what you're doing, ask more interesting questions. Interesting if you develop techniques where you can sense that there's a gap of stuff that needs to be brought on-line sooner rather than later. Start down these roads learning new ways to do it; leads to machine intuition, etc." (Ex 4, 1d-E)

"Tools to decide which digital resources to use (reviews) don't exist the way they do in print. Bamboo could serve as a sort of gatekeeper; out of fashion but useful." (Ex 4, 1d-F)

"Hermeutic circle - susceptible w/ technology. We have certain tools that give us what we want to do but there's stuff that doesn't fit into those categories. Easy to get seduced into gathering data, instrumenting things because we can. Sifting, elaborating, annealing. How do you sift a piece of art? Done twice, on the level of the item and then on a larger scale. What kind of data are we interested in? Sifting happens over and over again. This painting is an excellent example of "blah" but it's different. We come up w/ an imperfect description, this is interpretation. Something is different, unique in here. Always reassess. Grinding it down, refining it, elaborating. That's where the audience comes in. That's an ideal process, but not sure that always occurs. Start w/ Dublin core, add some fields, and use it. Find out later it doesn't work, but you don't fix it, that's what we've got. Sifting happens too quickly. Groping toward tools, the sifter. Shouldn't be so much trouble to go back. Major shift in last 5 yrs in these paintings. Whole body of contact rock art - depictions of guns, dresses, hats, pipes. Huge interest now in this, so there's a resifting that's going on." (Ex 4, 1d-G)

"Digital technology shapes practices, changes content" (Ex 1, 1d-G)

"we would like to hang on to everything for posterity, but that sticks posterity with the problem of picking out what's valuable. letting a thousand flowers bloom vs. existence of evaluative criteria. so in the digital world there ought to be clear ways of evaluating and sharing evaluations" (Ex 5, 1b-B)

"Smashing = questioning extant structures; metaphor that is instantiated in practice: smash a plaster cast (art piece) and from the rubble make a new piece of art" (Ex 5, 1b-C)

"In my time, could master material. Now, would not be able. Student: interesting challenge, not bothered by fact that she would be unable to conquer topic, satisfied with a chunk of it. Fun is in the quest." (Ex 6a, 1c-A)

"Issues of metadata; access to object itself, but in many multimedia you have access in effect through searching through the metadata" (Ex 5, 1c-D)

"One way in which the field becomes boundless is in connections to everything else. Is that part of the process? Yes, something about getting the failure you'd expect." (Ex 6a, 1c-A)

"On-line museum collections; now detailed photography, etc, but is it the same thing? Someone's filtered it already - new generation doesn't realize that vase photo doesn't equal vase. Not that researchers don't use libraries, just want to use libraries in different ways; Different generations use it in different ways; different strategies; librarians need to change practices-People get frustrated when they can't get right to the stuff. Librarians try to add metadata, and it doesn't help (Ex 6a, 1c-C)

"Grad students expect digital surrogates of print forms provide enhanced access (allow them to do more). Is added value seen as coming along with the surrogates themselves, or that tools can be brought to bear on the surrogates? -Will future scholars see tools and bodies of material as closely linked? Tools can be brought to bear on them, but the surrogates have to be structured in a particular way -Grad students don't care how it happens. If it's digital, there's an assumption we can read it anytime, anywhere. Internet Archive, we've submitted things including an enormous pdf file we can't figure out how to do anything with -An enormous pdf file isn't helpful, but having tools to dissect and analyze it, give enhanced access would make it useful. You end up reading/understanding/remembering it differently in digital format. They want to do different things, take scholarship in a different direction. Pushing the boundaries. They're looking to make their mark, have their work build upon others', hope that technology will help them do that, and if technology doesn't, they don't care so much. I'm a graduate student, I don't know what others are finding, but I value the ability to do non-traditional projects in a digital medium, exploring digital media for substantial scholarly use" (Ex 6a, 1d-D)

"On-line museum collections; now detailed photography, etc, but is it the same thing? Someone's filtered it already - new generation doesn't realize that vase photo doesn't equal vase. Not that researchers don't use libraries, just want to use libraries in different ways; Different generations use it in different ways; different strategies; librarians need to change practices-People get frustrated when they can't get right to the stuff. Librarians try to add metadata, and it doesn't help (Ex 6a, 1c-C)

"Mine data easily in ways that will get interesting results" (Ex 6b, 1a-F)

"Going to library to input text from primary materials. It's an interpretive act itself." (Ex 6b, 1a-G)

"Tracking/finding reports, inferring content/category" (Ex 6b, 1a-G)

"When I'm a researcher, I'm always worried that my research findings are biased. Technology increases this problem." (Ex 6b, 1b-D)

"harness the capacities of computers to archive and code the work and efforts since the 19th century. Moving toward a grand challenge/effort, which bring about, for example, new forms of motion pictures illiteracies." (Ex 6b, 1d-B)

"Thought Ark is a community-oriented collaborative research, social networking and publishing space for learning and discovery in the field of humanities (but also beyond it). It aims to use the "native" digital strategies of undergraduate and graduate students to support their research and learning activities. Thought Ark observes how students and their instructors retrieve, store, and share with other students the results of their bibliographic searches and uses these patterns to create value criteria for sorting and hierarchically ordering bibliographic sources. Sources that are searched the most (and especially those utilized by professors) are evaluated higher and considered to be more relevant. The platform includes a set of interconnected bibliographic search, social networking, and paper writing interfaces. These allow instructors or students to retrieve information from online databases, share customized lists of citations with each other and most importantly o judge the popularity and quality of the citations based on the other users' patterns of use." (Tools & Content Partners working group, Thought Ark Demonstrator, Sorin Matei, 12/8/08)

"Notification services - if you find a resource you're interested in, and you want to know when there's new material, you could be notified. If you're interested in a topic (WWI) - could you get that from a finding aid system as well as the sheet music systems, and other systems. Could be very useful to researchers, could be facilitated by Bamboo" (W3, Perspectives: Content, Stacy Kowalczyk, Digital Library Program, Indiana University)

"Named entity analysis finds passages that refer to Plato the philosopher, filtering out those passages that refer to other figures of the same name (e.g., the Athenian Comic poet named Plato)." (SN-0033 ePhilology and Memographies, Greg Crane)

"Customization and personalization services then provide individual analysts with relevant materials in languages that they understand as well as machine translation and interactive translation support services to help them with languages in which they have little or no fluency. Thus, the system might present scholars of Islamic thought with translations of Plato and translation support geared to their particular knowledge of Greek." (SN-0033 ePhilology and Memographies, Greg Crane)

"Characteristic of memography: Scale: A project becomes a memography as its scope brings in more primary materials than a single human author can effectively analyze. Topics so vast that authors in print culture needed to focus their work on synthesizing specialized studies and could base their work primarily upon the primary sources would be subjects for memographies. The author must depend upon techniques such as sampling and automated analyses. A memography of George Washington would, for example, require, as one foundational dataset, the relative frequency of references to George Washington in multiple periods, genres, languages and cultural contexts. Such figures would require automated named entity analysis applied to very large collections. The memography would include a human author's assessment of the accuracy of the automatically generated data." (SN-0033 ePhilology and Memographies, Greg Crane)

"Transcription captures the keystrokes. Page layout analysis captures the logical structures implicit in the page. These logical structures include not only header, footnote, chapter title, encyclopedia/index/lexicon entry etc., but more scholarly forms such as commentary and textual notes. All disciplines have used tables to represent structured data and we need much better tools with which to convert tabular data into semantically analyzed machine actionable data. Much of the work in the Mellon funded Cybereditions Project will focus on this stage of the workflow, focusing on the problem of mining highly accurate data from OCR output of scholarly editions in Greek and Latin." (SN-0047 Services for eClassics, Gregory Crane)

"Morphological analysis takes an inflected form (e.g, fecit) and identifies its possible morphological analyses (e.g., 3rd sg perfect indicative active) and dictionary entries (e.g., Latin facio, "to do, make"). David Packard developed the first morphological analyzer for classical Greek, Morph, over a generation ago. Gregory Crane began the initial work on what would become the core morphological analyzer for Greek and Latin in Perseus in 1984. Neel Smith and Joshua Kosman, then graduate students at Berkeley, extended this work and created a library of subroutines that remain part of the current code base for Morpheus. Morpheus is written in C, has been compiled on a range of Unix systems over the course of more than twenty years, and contains extensive databases of Greek and Latin inflections and stems. Of all the classics specific services with which we are familar, Morpheus is the most mature and well developed. The goal has long been to create an open source version of Morpheus. Desiderata include new documentation, modern XML formats for the stems and endings and a distributed environment whereby users can add new stems and endings." (SN-0047 Services for eClassics, Gregory Crane)

"Metrical analysis both discovers and analyzes the underlying metrical forms of digital texts. Metrical analysis provides information about vowel quantity that can improve performance of morphological, syntactic and named entity analysis. Metrical analysis is particularly important for areas such as post-classical Latin, which have very large bodies of poetic materials that will never receive the manual analysis applied to Homer, the Athenian Dramatists, Vergil and other canonical authors." (SN-0047 Services for eClassics, Gregory Crane)

"Quotation identification can recognize where one text quotes - either precisely or with small modifications - another even when there is no explicit machine actionable citation information: e.g., it can recognize "arma virumque cano" as a quotation from the first line of the Aeneid. The fundamental problem is analogous to plagiarism detection. Support from the Mellon-funded Classics in the Million Book Library study allowed us to begin work on exploring quotation identification techniques." (SN-0047 Services for eClassics, Gregory Crane)

"Translation identification builds on both CLIR and quotation identification to identify translations, primary but not exclusively, of Greek and Latin texts that are on-line in large digital collections. These translations may be of entire works or of small excerpts." (SN-0047 Services for eClassics, Gregory Crane)

"Text alignment services most commonly align translations with their source texts and are components of word sense disambiguation systems. Text alignment, however, serves also to create human readable links between source texts and translations that do not have machine actionable book/chapter/section/verse or other citation markers or between source texts that are tagged with different citation schemes. Text alignment is one of the priorities of the Mellon-funded Cybereditions Project at Tufts University." (SN-0047 Services for eClassics, Gregory Crane)

"Version analysis services can collate transcriptions of manuscript sources or of different printed editions of the same work. Such services allow readers to identify which versions of a work are closest to one another, which differences are most influential, and, on a smaller scale, how the text in one passage varies in multiple editions. Version analysis can also be used for automated error correction: when two versions of a text differ and one version contains a word that does not generate a valid Greek and Latin morphological analysis, we flag that word as a possible error and associate the parseable word from the other text with it as a possible correction." (SN-0047 Services for eClassics, Gregory Crane)

"The resolution of many pertinent historical questions lies in the identification and cross-correlation of key historical figures across a range of literature. EALCProf2 is interested in using the texts as a historical source, and works with CSProf2 and CS-Prof3 to apply their information-extraction technology to pull out basic historical assertions from the corpus, as well as from other related corpora including those containing the writings of diverse ethnic groups of Silk Route travelers in Central Asia from the same period." (SN-0051 Tibetan Buddhist Literature Scenario, from the Bamboo Planning Project proposal)

"Researcher A wants to compare several performances of Johannes Brahms's Piano Concerto No. 2 in B-flat Major. He turns to his computer, opens a music search tool, and types "brahms" in the composer field and "concerto" in the title field. Scanning the search results, he sees the work he wants and clicks on it, generating a list of all available recordings and scores of that work. He selects recordings of three performances, along with an encoded version of the score, and creates bookmarks for each of them. He instructs the system to synchronize each recording with the score, then uses a set of controls that allow him to play back the piece and view the score, cycling among the three performances on the fly. To help him navigate within the piece, he creates form diagrams for each of its four movements by dividing a timeline of each movement into sections and grouping the sections into higher-level structures. He then uses the timeline to move around within the piece, comparing the performances and storing his notes as text annotations attached to individual time spans. To find a particular section he's interested in, he might play a sequence of notes on a musical instrument digital interface (MIDI) keyboard attached to his computer, prompting the system to locate the sequence in the score. When he finishes, he exports the timelines as an interactive Web page and email the page to his collaborator for comment." (SN-0054 Variations - a Tool Set for Music Research and Pedagogy, Stacy Kowalczyk)
