Gathering/Foraging

"One strength in A&H is sheer open-endedness of it, people cross boundaries all the time. One of my fears (lib and it background)---in those fields solve problems of the past, workflows, concern that will try to solve enterprise problem, contrary to exploratory serendipitous nature of A&H." (Day 1, 1a)

"database thinking: how can you fit all these different pieces into a central store, so that others can reorganize the same information in ways that help them" (Ex 1, 1b-B)

""You can now find things that you never could have found before ... it has revolutionized scholarship" Scholars who have skills to mine that data are coming up with ideas that seem "crazy" to those who lack skills to find and mine resources that were not available a few decades ago." (Ex 1, 1b-C)

"Need for the primary material is enormous: demand increasing faster than the digitized collections can supply. Students want/expect to be able to find 1920s Irish newspapers, and 1950s Chinese census data, etc., etc., instantly available on-line." (Ex 1, 1b-C)

"Search across multiple digital collections. Finding aids are inconsistent. "There are finding-aids and finding-aids."" (Ex 1, 1b-C)

"Being able to find content is key. Being able to discover existing tools is key. Being able to learn how to use available tools is key. Develop particular kinds of interfaces that are standardized for canonical activities in A&H scholarship. Glue that cuts the learning-about-tools barrier for A&H scholars. Recognition that "standard" does limit function, but if the entry-point is easier and more sophisticated (in terms of functionality) that would be a win." (Ex 1, 1c-C)

"Hard to find tools---other people's custom tools are based on different assumptions" (Ex 1, 1c-E)

"Maybe Bamboo can help searching/understanding what's going on, but focused on ways of research rather than discipline names" (Ex 1, 1c-E)

"good w/ indexing text, nothing satisfactory with indexing images. How to derive meaning." (Ex 1, 1d-B)

"Images, audio, data, gen Chinese culture at [large university]. Try to archive my doc video; tools so cumbersome. Video, so complex. How do you make that available so that it doesn't anticipate or impose categories that may not be intuitive." (Ex 1, 1d-B)

"Think laterally: serendipity, non-linear, unplanned, unpredictable" (Ex 2, 1a-A)

"Find source materials: primary - stories, original materials; secondary - analysis, interpretations, etc." (Ex 2, 1a-A)

"I want to cast a very wide net; This is a basic research questions. You keep working your way around databases and explore and find new content and define the space of inquiry. ; Conference on representation of older women, (reference to Slate article on Hilary aging) , How do I find materials; start with fragments, what you notice or YouTube; Could also include production and reception history." (Ex 2, 1a-B)

"Keeping abreast of current relevant scholarship. Question of how to find relevent feeds. Libraries call this current awareness function. For example, I want to get information on "air" from chemists as well as humanites scholars. You can follow a topic, say typewriters, on a metanymic level; almost every example is interdiciplinary." (Ex 2, 1a-B)

"I want to cast a very wide net; This is a basic research questions. You keep working your way around databases and explore and find new content and define the space of inquiry. ; Conference on representation of older women, (reference to Slate article on Hilary aging), How do I find materials; start with fragments, what you notice or YouTube; Could also include production and reception history." (Ex 2, 1a-B)

"Collecting primary sources. This might involve interviewing people and discovering grammars; Discovering grammar is metaphor, I'm interogating dead people. (might be different methodology for back and forth with real human); At a museum you might encounter a primary source (but has already been collected); Are different methodologies different practices? Interogating primary sources (split out?) like evidence gathering; I wouldn't collect Goya paintings, I'd go to musem and experience them;" (Ex 2, 1a-B)

"Keeping abreast of current relevant scholarship. Question of how to find relevent feeds. Libraries call this current awareness function. For example, I want to get information on "air" from chemists as well as humanites scholars. You can follow a topic, say typewriters, on a metanymic level; almost every example is interdiciplinary." (Ex 2, 1a-B)

" Browsing materials in the service of formulating an idea or project. Open-ended ended framework in the service of formulating an idea or project" (Ex 2, 1a-B)

"Searching on net. Very time consuming, but high reward when identify a new resource." (Ex 2, 1a-C)

"Discover resource; particularly one primary source to another primary source" (Ex 2, 1a-C)

"Find stuff you didn't know was there through sampling. Develop a sampling strategy: what to sample through; monitoring your research environment; scan/peruse; identify/filter/accession; create opportunities for structured serendipity" (Ex 2, 1a-F)

"Reading (or otherwise taking in ideas or materials ... Looking ... S-feeds, library lists). "Encounter other people, disciplines, materials."" (Ex 2, 1a-F)

"you don't bring a theoretical basis to a gene bank, you just search it" (Ex 2, 1b-A)

"I'm much better off now than I would have been ten years ago because of ability to find materials in WorldCat or JSTOR. "The field has been leveled considerably" already." (Ex 2, 1b-C)

"Looking at physical archives is a richer experience due to serendipity of how one browses and finds. When looking through an internet portal, one misses "incidentally related" materials." (Ex 2, 1b-C)

"search as a paradigm: best search is to find someone who has been there before" (Ex 2, 1c-C)

"Different levels of judgment working on British Library catalog (title, publication, author are background knowledge to suggest usefulness); if it's a good use of time on that item, it suggests connections to other pieces of evidence. If the first one is a waste of time, follow-up doesn't happen. 50/50 trial & error vs. background knowledge; more experience > background knowledge takes priority. Time/money risks; 20 minutes in British Library isn't bad, but if it's in California you ask much harder questions" (Ex 2, 1c-D)

"In terms of doing research, not looking systematically at a single source, but using researcher's instinct to look at a variety of things including newspapers, searching library catalogs (esp. British library), see what literature is also relevant, looking for individuals' manuscript archives (national register of archives) to see what might be there. Open to critique of missing a potential source; relies on one's own judgment of relevance" (Ex 2, 1c-D)

"Universal search is a one-stop shop, but the biggest problem is selectivity. Knowing a field - who is teaching, who is doing what research, what are the key publications, and trendspotting - is what delivers the most value to the researcher. Is it a push or pull model for access? Both. Experimental. Creating sites dedicated to specific needs (e.g. Digital South Asian Library) and working to increase their discovery and page ranking via Google. Libraries are thinking about (struggling with?) how to make their collections findable in the right set of ways. There is also "face-to-face" pushing. "We meet with scholars daily. It's a 2-way communication where they tell us about their research interests, and we talk about our collections." Scholars want "just in time information" from libraries and IT." (Ex 2, 1d-B)

"Process of discovery. Can accomplish high level of granularity online; intersection of scholarly and non-scholarly. What I can find through UVA library plus google scholar and google books is incredible " (Ex 2, 1d-F)

""trying to develop the discovery mechanisms" for scholar to know what's already been done. what other pieces of content are necessary to be useful to the scholar? struggle to have a corpus large enough. example from art history of having a large enough collection of images of, say, vases from different areas and periods. aggregation as well as context, metadata. tension between common denominator approach vs. building sophisticated tools. technologies, organizations, faculty with specific needs vs. trying to build common infrastructure for broad use." (Ex 2, 1d-F)

"have encoded archival descriptions to help people find materials; an xml representation of "finding aids" - description of materials, along with info about what box it's in. " (Ex 2, 1d-G)

"With materials online, can no longer anticipate who the users will be in library, esp. rare books. Can be overwhelming. Don't see it as our responsibility to make guides for K-12 users " (Ex 2, 1d-G)

"Finding primary texts, and secondary materials. How do I do that? Mostly online; sometimes going to library. In germany, most libraries are closed stacks. Likes open stacks in US; serendipity of finding things; similar experience online. Don't go to particular journal to find articles, but to database. " (Ex 2, 1d-G)

"Contextual information management. Annotation , or cross-reference. "Every architecture needs to allow for conflicting statements about things"." (Ex 2, 1d-H)

"Text drill-down: whole text, down to specific point. Need to cite down that far, at that granularity. Other media? Yes. Crane: "everything needs to be rethought" (!). Ontologies - we consider the attempts to build them, but recognize that they are problematic." (Ex 2, 1d-H)

"Problem with working across big, different data sets - different metadata, for example. Name authority is an issue: if we can't bridge different authorities, we're screwed. Build inter-source mapping tools.. We should move between command and control silos to open networks ; anxieties about information quality (faculty) and difficulty in managing collections in flux." (Ex 2, 1d-H)

"Finding validated or trustworthy primary data on a research subject" (Ex 3, 1b-C)

"reading, searching, seeing, posting on list serves, conversing, brainstorming" (Ex 3, 1b-D)

"discovery. everyone used the internet to discover research materials to investigate; discover other people; discover object's internal structure; identify the research question; discover questions that we couldn't ask before because we couldn't conceptualize it; discovery include search and access - search is finding a resource but access is being able to actually get it." (Ex 3, 1d-B)

"An issue for searching across languages. How do you avoid the digital world becoming an English-only world. The humanities have a lot more linguistic nationalism at work. Less than a quarter century ago, but more than in physics or biology; won't go away anytime soon. Language and national boundaries are two sides to the same coin" (Ex 3, 1d-D)

"took 3 years grad school to learn to read like a historian. tools for light reading are quite simple but don't have digital analogs. useful tool is a pdf reader called Skim (macOS) which displays highlighted text next to the current page. simple concept of laying different pieces of a text next to each other very useful, not available with adobe's tools. "digital equivalent of the post-it note"" (Ex 3, 1d-F)

"what changes between digital and physical worlds? search algorithms change. how to make full use of google books?" (Ex 3, 1d-F)

"foraging produces research that doesn't fall into already-researched paradigms, so that can be the object of the foraging" (Ex 4, 1b-B)

"What other clues do you use to assess whether things are worth pursuing? Citation. Write title and reviews, but often only Amazon. Journals on JStor, can usually make a list of 5-6 decent reviews. Interesting idea for Bamboo - hotels.com for reviews. If it's a field I don't know 100%, I go see if people I trust are in favor of it. Don't need to find it; unless you're doing an article on how something has been reviewed. Cross-check validation: you have an instinct, but you can't go with your gut all the time. You cross-check, try within reason to validate independently your judgment. If my judgment is that I need to go to California for an archive, I need to convince someone through a peer review that I'm worthy of money for that trip. Leveraging community" (W1, Ex 4, 1c-D)

"Interface to the data that allows us to "go fishing" Interfaces that teach people how to search - pop up menus on the search bars to give users additional information (ex. did you want biblical or modern hebrew?)" (Ex 4, 1d-B)

"ethos: part of what we're suggesting is foraging as a typically nonvalidated research method needs to be promoted as worthwhile, and that foraging tools should be available" (Ex 4, 1b-B)

"foraging produces research that doesn't fall into already-researched paradigms, so that can be the object of the foraging" (Ex 4, 1b-B)

"he likes the verb "foraging"; it applies on different levels to the cognitive process. He produces CDs, self-sustaining projects; producing stuff with public television, etc. Foraging is artistic process and also idea-finding and -incubation. Librarian at large state school, runs institutional repository; foraging provides a way for her to find value in the weird stuff she has in her repository. Her angle on producing nontraditional materials is sustainability; she's involved in the ebook movement. Foraging as a practice guided by a specific theoretical frame? opposed to a more structured search process...? It's searching, discovering more broadly construed. There's the way you're "supposed" to do it and then there's the other stuff" (Ex 4, 1b-B)

"foraging combines discovery and serendipity; that kind of investigation can inform both linear and untraditional forms of understanding" (Ex 4, 1b-B)

"the more interdisciplinary, the more likely the need to go to multiple sources to review the work of others and to determine if the problem/question has already been solved" (Ex 4, 1d-C)

"it's all getting better but its also getting more difficult. now with full-text searching, everything is a keyword. does the super-abundance of information make things better? adds more noise to the system" (Ex 4, 1d-D)

"20 years ago, I was able to review the whole body of literature in MLA subject index. Couldn't do that now - can't say now that you've done an exhaustive search given that so much is available" (Ex 4, 1d-D)

"when doing literature searches may be biased toward electronic resources, away from print" (Ex 4, 1d-D)

"We have a lot of bibliographic tools, and they don't accomplish what I want: finding things not as structured by pre-existing description. If I want to find something relevant to my subject, if I go to a bibliography, only a certain # of possibilities for finding the object. I think of the word, and the word is in a title/abstract; or bibliography has a structure, and I figure out how it fits in. These aren't satisfactory; multilingual issues, vocab may not be the words used in the title (synonyms). Bibliography structure reveals mindset of bibliographers maybe 100 years ago" (Ex 4, 1d-E)

"Marginal notes, highlighting, post-it notes; [...] you can walk around a library, spot new things easily [...] mining footnotes, you can flip through a book looking at the bottom inch [...]taking a set of articles and looking only at the first and last paragraphs, abstracts. [...] A database doesn't lend itself well to that." (Ex 4, 1d-F)

"marginal notes, highlighting, post-it notes; you can't browse a database. you can walk around a library, spot new things easily. db doesn't lend itself well to that. "serendipity". i browse amazon. i surf amazon. you have to search. aren't there instances where the library classification scheme surprises you? taking a set of articles and looking only at the first and last paragraphs. abstracts. scanning for main points. mining footnotes, you can flip through a book looking at the bottom inch. i can skim a film, can't do that with the first couple pages on amazon. OCR loses the structure of the book. especially important in journals. clickable annotation, 'pentags' is slow. taking notes. speed is an issue in light reading. waiting is bad. this is all stuff we do with text i a physical env., how change in digital?" (Ex 4, 1d-F)

"Younger scholars are interested and expect access to primary data. We didn't have access, it's now expected. They want to trace arguments through data. They are willing to make data available even if not published. Maybe in a decade's time that these objects will be 'published' contributions." (Ex 6a, 1a-C)

"To what extent are new folk replacing library search w. Google search? "Huge". Frightening how exclusively people rely on Google. It's the first place people go (to Google). There's a reliance on electronic sources (Some young people have never seen book journals.) Students don't always understand the big investment U's pay for online journal access. To what extent are online journals a scholarly source for research? Authors are cited much more when journals are online. Not definitive. Need to disaggregate what's ref'ed for research versus teaching. In my field no one uses journals (technology field) because too old by the time it's published." (Ex 6a, 1a-D)

"Finding everything full text on the web, versus going to the library, if it's not online then it's not of interest." (Ex 6a, 1b-C)

"There's so much more material available now. I can discover elements much more easily. To say there's less mastery isn't accurate; people can find much more about relevant topics. In the 1920s one could find ever so much more *of* a topic. Easy to be confident of the completeness of your findings." (Ex 6a, 1c-A)

"One way in which the field becomes boundless is in connections to everything else. Is that part of the process? Yes, something about getting the failure you'd expect." (Ex 6a, 1c-A)

"Use of library printed text tends to be in decline. Availability of journals online means that students aren't going to the library. On other hand, not necessarily negative" (Ex 6a, 1c-B)

"Some students create self-identity around idea of looking for stuff, finding printed materials to access stuff people didn't know about before
-In some places, spaces > recreational, in others, increasing public space devoted to research rather than hanging out" (Ex 6a, 1c-C)

"Modes of discovery of content, because research depends on what you can find on the screen, relative rise of searching as discovery (vs. browsing in a structured universe of shelved books), scholars would know about something and find it, then find related things; now, people find things with greater increase in serendipity; finding stuff on the stacks was result of librarians' intentions - not serendipity. Keywords allow you to find things out of context. Context is provided by an algorithm not made a librarian, you might not even know why two things show up together. If something has subject headings AND keywords, subject headings are just another mode of access. Focus groups of OPAC's - people don't use subject headings much; people want a box where they can type a word. If they type a word and it comes up in a subject heading, they're happy" (Ex 6a, 1d-D)

"Search across data/collection silos, as if they were all one searchable body of information" (Ex 6b, 1a-F)

"Federated search" (Ex 6b, 1a-G)

"Federated/deep search across DBs" (Ex 6b, 1a-G)

"And how do you get at it in multiple contexts." (Ex 6b, 1a-G)

"I need more finding aids in physical archives -- online tools that tell you "what's there" in the physical archive" (Ex 6b, 1b-A)

"technology that can afford & enable foraging" (Ex 6b, 1b-A)

"A search engine easier than Google. Facebook for scholars." (Ex 6b, 1b-C)

"Better search engine, comprehensive." (Ex 6b, 1b-C)

"Better search engines, fuzzier logic, better metadata, precision that is precise for Humanities." (Ex 6b, 1b-C)

"Computer technologies are up to conceptual levels we are working on, so we don't have to "bow down" to technologies. Searching, for instance, according to concepts instead of text. So that computers can support scholars at the level at which we're working" (Ex 6b, 1c-A)

"magic wand: i want to find all the (chinese) texts that exist relevant to a subject, in rank order of relevance, and then all the secondary work that has made use of those texts, and then hving chosing the text i want to pay attention to i want the text to be fully annotated for dates places persons titles whatever else. that's "bamboogle"" (Ex 6b, 1d-D)

"better searching abilities across resources" (Ex 6b, 1d-E)

"Digital concierge" [...] "Expose digital assets (includes people, tools, content) across institutions [...] across disciplines, engages multi-dimensional search" (Ex 7 flipcharts, 1a-D)

"Wants a concierge. Wants it to know enough about my interests that it goes out and pre-filters. Wants to get back a summary, a digest w/opportunity to drill down w/links. Doesn't want social networking but what social networking provides." (W2, Scholarly Networks, group notes)

"We have created a digital library for the writings of Alexander von Humboldt (1769-1859). We are trying to cover his publications of 29 volumes about the Americas. We completed the fourteen English volumes, the only ones available in translation. We have a system that allows the user to search through all the volumes and continue to search. If you know of other systems that can do this, we would like to know. We are trying to develop the system in four languages, according to paragraphs. We want to recreate the range of information available on the environment at every point in Humboldt's five-year travel (Teneriffa, Venezuela, Cuba, Colombia, Ecuador, Peru, and Mexico). We could use financial support. We have had very little in the eight years to accomplish a goal that Bamboo claims to have set as a future possibility (moving between text corpora). See http://www.avhumboldt.net." (SN-0001 Providing a Multi-language Search System, Frank Baron, 1/5/09)

"One project I am embarking on now is a "distant reading" project (Moretti, Graphs, Maps and Trees, 1). I am interested in patterns of diffusion in American newspaper poetry of the late nineteenth century. It was common for newspapers to reprint poems (and other small items, such as jokes or stories) from other newspapers. I have been wondering lately if there are any geographical, chronological, or formal patterns to this dispersal. Do poems appear first in larger papers and then disperse to smaller ones? Is there an overall geographical pattern, like dispersal from east to west? Do poems on certain topics, or in cast in certain forms, gain preference? To study this, I simply locate a poem in a newspaper, then search for it in other newspapers, noting the date and location of the papers (and examining any significant textual alterations. Titles are commonly quite variable). All of this information is going into a database, with the goal of creating a geographical map-based display that will allow users to track individual poems, groups of poems, authors, topics, and newspapers of origin (what papers print frequently reprinted original poems?)" (SN-0014 Tools to Aid Search, Review and Citation of 19th Century Newspapers, Clai Rice, 1/9/09)

"Current tools include the browser, newspaper databases, and a text editor. Later I will be using a database, probably mysql, with a web interface. The online newspaper databases all have authentication procedures that frequently interrupt searching or make it more time-consuming. The ideal tool would be a search aggregator for the different databases, one that would return hits in a uniform format. Also helpful would be an onscreen OCR that would allow rapid text searching of graphic PDFs. Even if it worked only 50% of the time it would save a good deal of time overall. One way I would do this would be to adapt something like the Zotero ability to make entries from current page views. On one click it could grab and search the PDF, then after visual verification was complete, another click would cause it to store the PDF and create a bibliography entry. Then the data could be dumped into another database as needed for analysis and display." (SN-0014 Tools to Aid Search, Review and Citation of 19th Century Newspapers, Clai Rice, 1/9/09)

"Content wants to be found, we want to help make it so it could be found. Large corpus of journal material - already have some monographs in JSTOR. Addition of monographs, pamphlets, etc. More linking - finding one thing > finding more things. Navigation features - finding things in unique ways (faceted search), text enhancement with keywords, zeroing in on the content of more importance to you w/ searching. Trying to build services for discovery." (W3, Perspectives: Content, Timothy Babbitt, Chief Information Officer, JSTOR)

"Quotation identification finds direct quotations and paraphrases of passages in Plato. Cross language information retrieval extends named entity and quotation identification to multiple languages (e.g., Arabic, Chinese, Latin, English, French, German, Italian, and Russian and other languages for which major cross-lingual resources are available). Text mining identifies words and phrases that appear in conjunction with references to and quotations of Plato. These words and phrases allow us to discover common ideas associated with Plato across different genres and periods. Machine translation links similar words and phrases associated with Plato in multiple languages, identifying cross-lingual cultural units." (SN-0033 ePhilology and Memographies, Greg Crane)

"Characteristic of memographies: Heterogeneity: Memographies include not only more content than authors can review but content that assumes more categories of background knowledge than individual authors can expect to acquire. Such barriers can be language, cultural background, mathematics and any other topic. The history of mechanics could thus justify a memography because it requires not only a substantial understanding of mathematics and physics but sources produced over millennia and across Europe, North Africa and the Middle East in Greek, Latin, Arabic and every European language. Memographies thus require scalable, automated systems that can provide customized background information with which readers can examine and manually analyze any given object referenced. Thus, readers without training in Arabic but familiar with other languages and with the underlying scientific contexts can use automated morphological analyses, links to an on-line dictionary, and existing translations in languages that they do understand to pull apart Arabic source texts and determine which words are used in particular contexts to describe key concepts." (SN-0033 ePhilology and Memographies, Greg Crane)

"I can find large groups of good images to use in certain on-line databases - ARTstor, the Library Image Database, Gardner's Image Set. After that, I might turn to Flickr and Google Image, which, of course, are not search-able according to any standard metadata system, and which may be completely mis-identified, thus finding anything in particular is pure serendipity." (SN-0044 Preparing lecture materials on architecture and sculptural program of the Parthenon, Ann Nicgorski)

"In the process of putting all this together (a big presentation of 99 PPT slides in the end), I will experience certain frustrations related to image quality and metadata. E.g., I will want to show some ground plans - first of the Akropolis (or Acropolis depending on the source...) and then of the building itself. For this particular class, I don't want plans that include too much archaeological detail that will just be confusing for beginning students, so I may reject some of what I find on that pedagogical basis. I may reject other options because the images are too blurry (common problem with maps and plans) and/or not big/high enough in resolution. Ultimately, I may have to order/scan a plan myself or order one from an on-line vendor (Saskia/Universal)."
(SN-0044 Preparing lecture materials on architecture and sculptural program of the Parthenon, Ann Nicgorski)

"Scenario: A scholar is searching for information about an Italian 19th c. archaeologist who was an orientalist and Egyptologist, who is the subject of a chapter of his book. The researcher is interested in this figure because of other work that he is doing, for example, studying orientalism in a Victorian context. The scholar wants to discover recent scholarship on this archaeologist, and as a first step, to do so without moving from his desk. Research starts by searching for books and references on the internet using Google books, Worldcat, other types of searches. This research is preliminary to actually going to the university library, or to London, to search in libraries and archives there. The researcher is also identifying other scholars who have published on the same or relevant topics, and building a network of possible contacts. This first step is dependent on serendipity; the researcher finds useful sources that he wasn't looking for, and that may not have been obviously relevant at first glance. The next step is to buy books, go to the library and look up references, images, articles, and finally, to travel to more distant archives." (SN-0045 Starting a Research Project, Massimo Riva)

"The searching and gathering parts as well as the analysis contain exploratory, playful elements, that can lead to serendipitous discovery." (SN-0046 Research Methods of an Individual Scholar-2, Michael Satlow)

"Canonical text services allow us to call up canonical texts by standard chapter/verse citation schemes. Christopher Blackwell and Neel Smith, working in conjunction with Harvard's Center for Hellenic Studies (CHS), have developed a general protocol for canonical text services that provides essential functions for any system that serves classicists - or any scholarly community working with canonical texts. Early modern books or MSS that defy current OCR technology can be indexed by conventional citation (e.g., this page of the Venetus A manuscript contains the following lines of the Iliad)." (SN-0047 Services for eClassics, Gregory Crane)

"Morphological analysis takes an inflected form (e.g, fecit) and identifies its possible morphological analyses (e.g., 3rd sg perfect indicative active) and dictionary entries (e.g., Latin facio, "to do, make"). David Packard developed the first morphological analyzer for classical Greek, Morph, over a generation ago. Gregory Crane began the initial work on what would become the core morphological analyzer for Greek and Latin in Perseus in 1984. Neel Smith and Joshua Kosman, then graduate students at Berkeley, extended this work and created a library of subroutines that remain part of the current code base for Morpheus. Morpheus is written in C, has been compiled on a range of Unix systems over the course of more than twenty years, and contains extensive databases of Greek and Latin inflections and stems. Of all the classics specific services with which we are familar, Morpheus is the most mature and well developed. The goal has long been to create an open source version of Morpheus. Desiderata include new documentation, modern XML formats for the stems and endings and a distributed environment whereby users can add new stems and endings." (SN-0047 Services for eClassics, Gregory Crane)

"Syntactic analysis identifies the syntactic relationships between words in a sentence; it allows us to provide quantitative data about lexicography (e.g., which nouns are the subjects and objects of particular verbs), word usage (e.g., which verbs take dative indirect objects? Where do we have indirect discourse using the infinitive vs. a participle vs. a conjunction?), style (e.g., hyperbaton, periodic composition), and linguistics (e.g., changes from SOV to SVO word order). Even relatively coarse syntactic analysis can yield valuable results when applied to a large corpus: working with our morphological analyzer and a tiny Latin Treebank of 30,000 words with which to train a syntactic analyzer, we were able to tag 54% of the untagged words correctly, but the correct analyses provided a strong enough signal for us to detect larger lexical patterns. More robust syntactic analysis based on very large treebanks can yield accuracies of 80 and 85%. Human annotators can build upon preliminary automated analysis to create treebanks, where every word's function has been examined and accounted for. Treebanks provide not only training data for automated parsing but also explanatory data whereby readers can see the underlying structure of complex sentences - a valuable instrument to support interdisciplinary researchers from fields such as Philosophy or the History of Science who are not specialists in Latin and Greek." (SN-0047 Services for eClassics, Gregory Crane)

"Word sense discovery automatically identifies distinctive word usage in electronic corpora. Even without syntactic analysis, collocation analysis can reveal words that are closely associated (e.g., phrases such as the English "ham and eggs") and thus identify idiomatic expressions. Jeff Rydberg Cox developed collocational analysis for the Greek and Latin texts in Perseus and the results are visible as part of the on-line Greek and Latin lexica in Perseus 3.0. Access to translations aligned to the original allows us to identify distinct senses: e.g., oratio corresponds both to English "oration" but in other instances to English "prayer." At Perseus, we have been experimenting with this technique since 2005 and have begun a project, funded by the NEH Research and Development Program, to explore methods for a Dynamic Lexicon for Greek and Latin." (SN-0047 Services for eClassics, Gregory Crane)

"Translation support aims at fluent translation of full text but can provide useful results at a much earlier stage of development. Thus, word sense disambiguation, a component within machine translation, helps translate words and phrases: e.g., given an instance of the Latin word oratio, word sense disambiguation identifies when that word most likely corresponds to "oration," "prayer" or some other English word or phrase. The same service also supports semantic queries such as "list all Latin words that correspond to the English word 'prayer' in particular contexts." [cf. Gist - approximate translation - an activity defined by Bamboo's Shared Services working group]" (SN-0047 Services for eClassics, Gregory Crane)

"Cross language information retrieval (CLIR) allows users to pose a query in one language (e.g., English) and retrieve results in other languages (e.g., Arabic or Chinese). For classics, CLIR is an extremely important technology because classicists are expected to work with materials not only in Greek and Latin but, at a minimum, in English, French, German and Italian. CLIR is a mature technology where the cross language queries in some competitions perform better than the monolingual baseline systems (e.g., you get better results searching Arabic with an English query than if you searched with Arabic). Classicists should be able to type queries for secondary sources in various languages such as English, French, German or Italian." (SN-0047 Services for eClassics, Gregory Crane)
