Project Bamboo wiki: XSLT Original Proposal

This page was published on the Project Bamboo wiki at <a href="https://wiki.projectbamboo.org/display/~khc@uchicago.edu/XSLT+Original+Proposal">https://wiki.projectbamboo.org/display/~khc@uchicago.edu/XSLT+Original+P...</a>, last updated 10 November 2008.

To: Kaylea Champion

Demonstrator Project: Building an XSLT web service engine to transform XML-marked-up bibliographic entries into HTML

Kent Hooper
Director of Humanities / Professor of German
University of Puget Sound
Tacoma, Washington

Rick Peterson
Chief Technology Officer
Washington & Lee University
Lexington, Virginia

Our Story, in short [cf. Our Story, full version, later in proposal]:

Professor Hooper has created a bibliographical listing of secondary literature relating to the life and work of the early 20th century German writer/artist Ernst Barlach. Management of this bibliographical listing of approximately 3700 entries grew cumbersome due to the need to create multiple sub-listings (chronological, art exhibition catalogues, dissertations, articles in periodicals, etc.) deriving from the master listing (alpha, by author). These sub-listings, and the potential to create even more sub-listings easily, greatly increase the listing's utility for Barlach scholars. Copying and pasting bibliographic entries by hand from the master to the sub-listings became unmanageable for Hooper and clearly would not allow for easy updating over time. Using XML and XSLT and an HTML engine (currently <a href="http://cnx.org">http://cnx.org</a>), Hooper could publish his work (using Creative Commons licensing), make it available to the scholarly community, and allow the Barlach bibliography to be updated.

Description, in short:

This demonstrator would create a Web Service that takes standard, XML-marked-up bibliographical entries and transforms them using XSLT into multiple HTML files (and PDF files, if possible), according to user input.
Hooper will publish the resulting transformed content on <a href="http://cnx.org">http://cnx.org</a> unless, or until, Project Bamboo (through this demonstrator project?) provides a home for digital scholarship of this nature.

XSLT Web Service Demonstrator:

This demonstrator project will be used to create a web service that scholars could use to easily input TEI-tagged bibliographical information, validated against the TEI document type definition (DTD), and transform it using XSLT into a series of HTML files (and, if possible, PDF and other output formats). These files would be the primary listing (alpha, by author) and sub-listings of the bibliographical data. Hooper can provide a list of sub-listings. Obviously it would be even better if the demonstrator could generate a query tool that would allow scholars to create their own sub-listings. If time is short, these transforms may be pre-set. (A sub-listing transform example would be: "Provide a list of all articles, alpha by title of periodical in which they appear, then by chronological order of publication.")

Project Bamboo Activities that are represented in this demonstrator project:
(based on language deriving from individual theme-pages from <a href="https://wiki.projectbamboo.org/display/BPUB/Theme+Groups">https://wiki.projectbamboo.org/display/BPUB/Theme+Groups</a>)

Discover: The activities, which could be listed upon request, were used to create the XML for this bibliography. We would not need further assistance with these activities in this particular demonstrator.

Aggregate: The activities, which could be listed upon request, were used to create the XML for this bibliography. We would not need further assistance with these activities in this particular demonstrator.

Annotate: The activities, which could be listed upon request, were used to create the XML for this bibliography. We would not need further assistance with these activities in this particular demonstrator.
Consider: The activities, which could be listed upon request, were used to create the XML for this bibliography. We would not need further assistance with these activities in this particular demonstrator.

Share/Publish:
• Publish (a bibliographical listing)

Engage:
• In exchange for the Barlach community's or an individual Barlach scholar's contribution to a scholarly project, open privileged access to the project to the contributors.
• Solicit feedback from Barlach scholars on the scholarship performed.
• Engage the broad community in non-automatable tasks involving disambiguation or correction of data.
• Engage "non-traditional" [communities], such as archives, art museums, individual art collectors, independent scholars.

Preserve:
• Preserve materials [i.e., individual entries gathered from a wide variety of sources listing material relevant to the bibliographical listing].
• Preserve the results of a search for materials [i.e., a permanent record of the origin of the identified titles].
• Preserve materials in a digital content management system [cnx.org].
• Preserve materials in a relational database.

For all preservation activities:
• Provide for notification of appropriate, interested individuals and communities when a digital collection's sustainability or accessibility is threatened.
• Preserve materials in conformity with broadly accepted standards.
• Preserve materials in such a way that they can be found when needed.
• Preserve materials such that appropriate access controls are in place.

Interact:
• Identify, join, and/or create formal and informal groups (networks) of individuals along arbitrary axes of affiliation, shared interest, affinity, and interest in scholarly collaboration.
• Assert affiliation with groups....
• Facilitate interchange between participants who speak different languages.
• Identify and engage an appropriate publisher [cnx.org].
• Find a technologist or librarian who can help realize a scholar's ambitions in digital scholarship.
• Expose maintenance/sustainability costs, issues, and risks involved in use of tools and digital resources.

*********************************************

Organizations and Groups formally engaged with the study, preservation, dissemination, and display of Barlach's works of literature or visual art or with the administration of the artist's estate:

Ernst Barlach Stiftung [EB Foundation]
ernst-barlach-stiftung.de
Heidberg 15
D-18273 Güstrow

Ernst Barlach Gesellschaft, Hamburg [EB Society]
ernst-barlach.de

Ernst-Barlach-Haus, Stiftung Hermann F. Reemtsma [Museum and Archive]
barlach-haus.de
Jenischpark
Baron-Voght-Strasse 50a
D-22609 Hamburg

Ernst Barlach Museum, Wedel [Museum and Archive]
Mühlenstrasse 1
D-22880 Wedel
Germany

Ernst Barlach Museum, "Altes Vatershaus" [Museum and Archive]
Barlachplatz 3
D-23909 Ratzeburg

Ernst Barlach Lizenzverwaltung GmbH & Co. KG [owns the rights to the visual art]
ernst-barlach.com
Königsdamm 2
D-23909 Ratzeburg

Research organization more generally concerned with literature and culture from 1750 to the present, and with an emphasis on digital scholarship:

Deutsches Literatur Archiv Marbach
dla-marbach.de
Schillerhöhe 8-10
D-71672 Marbach a. N

Our Story, in full [as a first-person narrative]

I, Professor Kent W. Hooper, Director of Humanities, University of Puget Sound, Tacoma, Washington, have been working for many years on Ernst Barlach: A Bibliographical Listing of Secondary Literature and had originally planned to publish it conventionally, in two volumes. However, I was recently convinced by Rick Peterson, Chief Technology Officer, Washington and Lee University, Lexington, Virginia, to publish the bibliographical listing online instead. During the summer of 2007 I traveled to Rice University to consult with Peterson, at that time Director of Academic and Research Computing.
By late summer, he and I had managed to place online a beta version of the bibliographical listing. A portion of an outdated and uncorrected version is still available at <a href="http://www.petehills.com/barlach/">http://www.petehills.com/barlach/</a>.

Even this early beta version required both Rick and me to familiarize ourselves with the Text Encoding Initiative, <a href="http://www.tei-c.org/index.xml">http://www.tei-c.org/index.xml</a>, which is by its own definition "a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of Guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics. These guidelines are expressed as a modular, extensible XML schema, accompanied by detailed documentation, and are published under an open-source license." More specifically--after wading through the voluminous documentation associated with the TEI Guidelines (the P4 version, which has since been upgraded to a P5 version)--Rick and I immersed ourselves in Part 6, "Elements Available in All TEI Documents," and, most relevant to our project, 6.10, "Bibliographic Citations and References."

I have been working sporadically on this bibliographical listing since the early 1990s and had always been dissatisfied with proprietary bibliographical software programs for a variety of reasons that need not be discussed here. What both Peterson and I find extremely appealing about TEI and XML is that they are open-source, as opposed to proprietary, and are also cross-platform by definition. XML, or Extensible Markup Language, is a general-purpose specification for creating custom markup languages (for more information, cf. <a href="http://xml.silmaril.ie/">http://xml.silmaril.ie/</a> or <a href="http://www.w3.org/XML/">http://www.w3.org/XML/</a>, etc.).
The primary purpose of XML is to help information systems share structured data (such as a bibliographical listing), particularly via the Internet, and it is used both to encode documents and to serialize data. XML is a World Wide Web Consortium (W3C) Recommendation. It is a fee-free open standard.

When I brought Rick into this project, or rather when he not so gently told me to get off my duff and get back to working on it, I had in Microsoft Word what amounted to a camera-ready copy of a completed 2-volume bibliographical listing and a firm agreement with a publisher, but my manuscript was about 8 years out of date, i.e., I had not listed any publications from years after the late 1990s. Converting a Microsoft Word document to XML turned out to be more difficult than either Peterson or I thought it would be. To get started, though, Peterson offered to write a number of scripts in Perl (a high-level, general-purpose, interpreted, dynamic programming language which is way beyond anything I could hope to master) to convert my master listing (alphabetical by author) to XML, with information tagged according to the TEI Guidelines, and then to machine-generate all my various sublistings (articles appearing in periodicals, dissertations, exhibition catalogues, chronological, etc.). Through his efforts, he expected he could convert about 80% of the master listing quickly and painlessly and that I could tag the rest manually. It turned out that we were only to get what will probably prove to be a bit over 50% of the material auto-converted, and that I would need to tag the remainder manually.

Further complicating matters was the fact that even with a beautifully edited XML document (I am using oXygen for the Mac, an XML editor) with all material tagged according to TEI Guidelines, the document still needed transforms to get it to display properly on a website (to display in XHTML); that is, XML was designed to carry data, not to display data.
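To make the "carry data, not display data" distinction concrete, here is a minimal sketch of the idea behind such a transform. It is written in Python rather than XSLT only so that it runs on its own, and it is not the project's stylesheet: the sample entry is invented, its element layout is only loosely modeled on a TEI `biblStruct`, and the function name `entry_to_xhtml` is hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical entry, loosely modeled on a TEI <biblStruct>.
# It is NOT taken from the actual Barlach listing.
ENTRY_XML = """
<biblStruct type="article">
  <analytic>
    <author>Example, Anna</author>
    <title level="a">A Sample Article on Barlach</title>
  </analytic>
  <monogr>
    <title level="j">Sample Journal</title>
    <imprint><date when="1999">1999</date></imprint>
  </monogr>
</biblStruct>
"""


def entry_to_xhtml(entry: ET.Element) -> str:
    """Render one tagged entry as a single XHTML paragraph."""
    # The XML only says WHAT each piece is (author, title, journal, date).
    author = entry.findtext("analytic/author", default="")
    article = entry.findtext("analytic/title", default="")
    journal = entry.findtext("monogr/title", default="")
    year = entry.findtext("monogr/imprint/date", default="")
    # The transform decides HOW it is shown: quotes, italics, punctuation.
    return (
        f'<p class="entry">{escape(author)}. '
        f"&#8220;{escape(article)}.&#8221; "
        f"<em>{escape(journal)}</em> ({escape(year)}).</p>"
    )


if __name__ == "__main__":
    print(entry_to_xhtml(ET.fromstring(ENTRY_XML)))
```

The tagged entry contains no quotation marks, italics, or punctuation between fields; all of that is supplied by the transform, which is why the same data can later be rendered in a different layout without retagging.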
Rick needed to learn about XSL transforms (Extensible Stylesheet Language Transformations, or XSLT), an XML-based language that transforms XML documents into "human-readable" documents. That is, if most people looked at my XML document with the TEI-tagged information, it would look to them like near gibberish. XSL transformations convert this gibberish into a beautiful webpage (an XHTML document for a web page, if I understand everything correctly).

No colleagues in the Library at the University of Puget Sound are familiar with the Text Encoding Initiative (which is essentially a metadata markup language)---most librarians instead are familiar with MODS (Metadata Object Description Schema) or MARC21 (MAchine-Readable Cataloging); no colleague in Technology Services is more than minimally proficient in XML or with XSL Transformations. And at the outset, nobody was going to generate Perl scripts for me. At any rate, in order even to get that beta version of the bibliographical listing on the website during summer 2007, Peterson had to be able to handle Perl scripts and become minimally proficient with TEI, XML, and XSLT. He explained all these things to me, but quite frankly I decided in the late 1980s to regard myself as "a content expert" on any project involving computers. I lazily allowed Peterson to assume the programming burden, although he did warn me that I would still have to do the manual TEI tagging myself, since only I could make decisions when it came to this point. Fair enough, I figured.

Rick Peterson, who is one of my oldest friends in addition to being a computer software/hardware/programming über-geek, had to put my project on hold during the fall of 2007 for a number of reasons. Most relevant to this proposal, I realized I could no longer afford to rely on the generosity and kindness of my buddy Rick to handle all the computing chores, and so I now have a better understanding than he of TEI Guidelines and tagging.
In October, I attended a NITLE workshop on TEI at Wheaton College led by Julia Flanders and Syd Bauman, from Brown University, where I was helped enormously, and I now feel comfortable enough with the XML editor (oXygen) to say I believe I can tag everything correctly and then validate against TEI standards (although I will require the further assistance of Syd Bauman to do so). I will never be able to do the XSL transforms, however. Either Peterson will have to do them (he has completed simple transforms for a number of the sublistings), or we will have to request funding to hire specialists who do transforms on a daily basis.

During my sabbatical, Spring 2008, I flew out to consult twice with Peterson, who beginning in the summer of 2007 assumed the position of Chief Technology Officer at Washington and Lee University, in Lexington, Virginia. I did this primarily because both he and I needed to talk ourselves through the procedures necessary to upload our bibliographical listing to Connexions (<a href="http://www.cnx.org">www.cnx.org</a>), which by its own definition is "an environment for collaboratively developing, freely sharing, and rapidly publishing scholarly content on the Web....All content is free to use and reuse under the Creative Commons 'attribution' license." cnx.org looks to be a stable place to archive digital material. In addition to being an open source repository of digital material, cnx.org is sponsored generously by the William and Flora Hewlett Foundation, Hewlett-Packard, The NSF Partnership for Innovation Program, National Instruments, Rice University and other similarly deep-pocketed organizations or individuals. In short, it would have made no sense to put my listing on a server located at UPS and to try to manage the site myself or hope that someone in Technical Services would do so.
How digital information is going to be stored (by libraries, for one) and then retrieved in the future by users is as important a consideration for me as actually completing the listing itself (as anyone knows who has moved from typewriter, to the first Mac, to floppy disks, to hard drives, to offsite storage locations....and then who has had to try ten years later to retrieve information from, say, floppy disks). That is, this listing is an attempt to take advantage of new thinking associated with the concept of "Web 2.0," which, to quote from Wikipedia, "is a term describing changing trends in the use of World Wide Web technology and web design that aims to enhance creativity, information sharing, and collaboration among users. These concepts have led to the development and evolution of web-based communities and hosted services, such as social-networking sites, video sharing sites, wikis, blogs, and folksonomies." This change in philosophy is one I agree with wholeheartedly. Moving information from one proprietary product to another, especially if any amount of time is involved, is simply not the way to go---backward compatibility between versions of Word is mere child's play by comparison.

My goal is to have the bibliographical listing (and associated sublistings) updateable and editable. In a way, it turned out to be a fantastic idea not to publish my work nearly a decade ago as two thick volumes that would merely have gathered dust on the shelves of research libraries and that would have made permanent all my mistakes, omissions, etc.---even though scholars back then clamored for just such a work (and still do, I hasten to add). A seriously flawed beta version of my online listing (yet, for demonstration purposes, a quite useful one) is available by going to <a href="http://www.cnx.org">www.cnx.org</a> and searching for "Barlach" or by going directly to <a href="http://cnx.org/content/m17178/latest/">http://cnx.org/content/m17178/latest/</a>.
Peterson and I now know how to upload material to cnx.org (no small feat, mind you, although the details are uninteresting except to us). What I need to do now is to finish tagging all the material according to TEI Guidelines (I have checked and corrected 60% of the listing manually, out of 3632 entries total, and progress on average at about 1% per working day). When I have finished this work, Peterson or someone else will run XSL transforms, and the bibliography will then be uploaded and available for public, or at least scholarly, consumption. If one googles "Barlach," even our beta version at cnx.org currently shows up on page three.

In conclusion: I consider myself one of the lucky colleagues in the Humanities, because I have access to a good friend, Rick Peterson, who can handle most, if not all, of the research computing needs relevant to my bibliographical listing. Had I not collaborated with Peterson, but had Project Bamboo existed, I would have
• had a place to go to ask how best to proceed to put my bibliographical listing online;
• been directed to the TEI, where I might have been provided with templates on which to base my own tagging efforts;
• been pointed to oXygen for the Mac as an XML editor and received training on how to use it; my preliminary tagging efforts could have been validated against TEI Guidelines and problems solved through the programming efforts of someone like Syd Bauman;
• been put in contact with programmers who might have executed the XSL transforms allowing my material to display properly on a website;
• been directed to cnx.org as a place to upload my material and then relied on the generosity of XML-savvy individuals to walk me through that uploading;
• had access to a list of professionals who do XSL transforms for a living, who could have taken my master listing and done the transforms so that the various sublistings I want would be generated automatically after I had entered new information in the master listing.
Peterson and I are just a couple of steps away from completion of this project: I need to finish the rest of the TEI tagging; TEI specialists need to review my work against TEI standards; XSL transforms need to be done to generate sublistings; and then everything needs to be uploaded to cnx.org. Everything beyond the tagging is, quite frankly, beyond me and, I dare say, beyond what I can continue to rely on Peterson to handle. It is my hope, through this Project Bamboo demonstrator project, to demonstrate to other scholars that their bibliographical entries can be easily tagged, transformed, and presented in a way that is sustainable for the long term and available for continued scholarship.
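The sub-listing example given earlier in this proposal ("all articles, alpha by title of periodical in which they appear, then by chronological order of publication") can be sketched in a few lines. The sketch is in Python rather than XSLT only so that it is self-contained. The element layout, the `type` attribute values, the sample entries, and the function name `articles_by_periodical` are all assumptions made for illustration, not the project's actual markup or stylesheets.

```python
import xml.etree.ElementTree as ET

# Hypothetical master listing with invented entries; the structure is
# only loosely modeled on TEI <listBibl>/<biblStruct>.
MASTER_XML = """
<listBibl>
  <biblStruct type="article">
    <analytic><author>Author, B.</author><title>Second Piece</title></analytic>
    <monogr><title level="j">Zeitschrift Beispiel</title>
      <imprint><date when="1975"/></imprint></monogr>
  </biblStruct>
  <biblStruct type="dissertation">
    <monogr><author>Author, C.</author><title>A Thesis</title>
      <imprint><date when="1980"/></imprint></monogr>
  </biblStruct>
  <biblStruct type="article">
    <analytic><author>Author, A.</author><title>Later Piece</title></analytic>
    <monogr><title level="j">Beispiel Journal</title>
      <imprint><date when="1990"/></imprint></monogr>
  </biblStruct>
  <biblStruct type="article">
    <analytic><author>Author, D.</author><title>Earlier Piece</title></analytic>
    <monogr><title level="j">Beispiel Journal</title>
      <imprint><date when="1960"/></imprint></monogr>
  </biblStruct>
</listBibl>
"""


def articles_by_periodical(master: ET.Element) -> list:
    """Return article entries sorted by periodical title, then by year."""
    # Keep only entries tagged as articles (assumed attribute value).
    articles = [e for e in master.iter("biblStruct") if e.get("type") == "article"]

    def sort_key(entry):
        journal = entry.findtext("monogr/title", default="")
        date = entry.find("monogr/imprint/date")
        # Assumes four-digit years, which sort correctly as strings.
        year = date.get("when", "") if date is not None else ""
        return (journal.casefold(), year)

    return sorted(articles, key=sort_key)


if __name__ == "__main__":
    master = ET.fromstring(MASTER_XML)
    for entry in articles_by_periodical(master):
        print(
            entry.findtext("monogr/title"),
            entry.find("monogr/imprint/date").get("when"),
            entry.findtext("analytic/title"),
        )
```

Because the sub-listing is computed from the master listing each time, a new entry added to the master appears in every relevant sub-listing on the next run, with no copying and pasting by hand. In XSLT the same selection and ordering would typically be expressed with a match on the entry type and `xsl:sort` instructions.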