Where's the credit? Attribution in digital humanities slides

I've been at Digital Humanities 2011 since Sunday, and it's been as delightful and inspiring as always. This is the first time I've actively followed the tweet stream at a DH conference while attending in person. The value it has added has varied from session to session, but the stream may have been one of the most fascinating aspects of yesterday's plenary (Chad Gaffield's "Re-Imagining Scholarship in the Digital Age"). The audience was highly impressed that Gaffield walked around the stage and gave a talk rather than reading a paper from the podium, which seems to be the common practice. His slides were also notable. Most used a single interesting image, and fewer were text-heavy or (as is common here) screenshot-heavy.

What struck me, though, was the lack of any credit line for most or all of the images. At least one of the charts had some burned-in metadata indicating its source, but the origin of the other images remained opaque. At first, I thought he was using stock photos that he'd purchased (in which case no credit line would be necessary, though I'd still like to know something about the source). But unless the standards of stock photo companies have plummeted, I don't think that can account for all of his images. A couple of photos of students sleeping (charming, if not great on the technical level) really got me wondering.

Toward the end, the talk stressed the importance of managing open data and copyright, so I found this ironic, but not unexpected. Managing credit for slides takes work. There's no equivalent of the WordPress plugin that lets you find Creative Commons licensed images from within the post-writing screen and inserts them for you with a reasonable credit line. The easiest way to find images for a talk is to do the Wrong Thing™ and just hit up Google Images. A presentation with a wide variety of images and no credit line for any of them (besides, possibly, what's burned into the images) generally suggests that's how the image sourcing happened, and there's no social censure for it.

I've seen a few different approaches to doing image sourcing right. By "right" I mean both legally and in accordance with the ideals espoused by DH-ers, including the importance of giving others credit for their work, given the essential role that credit plays in our professional lives. In the slides for her talk on DH syllabi, Lisa Spiro included the Flickr URLs for the images she used, and put a Creative Commons Attribution license on her own slides, too. I prefer that approach to an alternative I've seen (though not at this conference), where all the credit names/URLs are piled together in a tiny font on the last slide. Doing it that way breaks the connection between each image and its creator, and it feels a little begrudging. For my own talks, I'm probably a bit unusual: my personal stash of 55,000+ Creative Commons licensed images has been enough for a long time.

Ideally, I'd like to see a presentation software plugin that works like the WordPress plugin. It would let you search Creative Commons licensed and public domain images without switching to your browser and copying and pasting. People are far more likely to adopt legal practices if those practices are more convenient than the common illegal alternative. This software would insert the image into your presentation with a small, unobtrusive credit line. The credit would come from the cc:attributionName property in the HTML license, or from the closest equivalent in the data available from the hosting service. (For example, I don't think Flickr includes Creative Commons properties such as attributionName, but "username/Flickr" might be the closest equivalent.) Leaving the full Flickr URL off the slide itself would reduce the amount of extra text and make it clearer who the creator is. It probably wouldn't make the slide any less useful, either: how often do people actually type out those long URLs while the slide is visible?
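To make the credit-line idea a little more concrete, here is a minimal Python sketch of the lookup such a plugin might perform. It assumes the license markup is the kind of RDFa snippet the Creative Commons license chooser produces, where the creator's name sits in an element with `property="cc:attributionName"`. When no such property exists, it falls back to the "username/Flickr" form. The class and function names, and the "Creator unknown" fallback, are hypothetical. This is not an existing plugin's API.

```python
from html.parser import HTMLParser
from typing import Optional

# Void elements never get an end tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AttributionNameParser(HTMLParser):
    """Collects the text of the first element with property="cc:attributionName"."""

    def __init__(self) -> None:
        super().__init__()
        self.name: Optional[str] = None
        self._depth = 0  # > 0 while inside the attributionName element
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # an element nested inside the name
        elif self.name is None:
            # RDFa allows several space-separated values in one property attribute
            props = (dict(attrs).get("property") or "").split()
            if "cc:attributionName" in props:
                self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Normalize whitespace in the collected name
            self.name = " ".join("".join(self._parts).split())

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def credit_line(license_html: Optional[str] = None,
                flickr_username: Optional[str] = None) -> str:
    """Short on-slide credit: attributionName if present, else "username/Flickr"."""
    if license_html:
        parser = AttributionNameParser()
        parser.feed(license_html)
        parser.close()
        if parser.name:
            return parser.name
    if flickr_username:
        return f"{flickr_username}/Flickr"
    return "Creator unknown"  # hypothetical fallback: flag for manual follow-up
```

Keeping the credit this short is deliberate. The full URL and license details can live elsewhere, and only the creator's name needs to be on the slide.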

As a final step, I'd like such a plugin to generate a webpage showing a thumbnail of, and a link to, every image used in the presentation. Each entry would include full information, including a description of, and a link to, the Creative Commons license or the public domain status. The plugin would automatically insert a link to this page on the last slide. There is, admittedly, some hand-waving necessary here about where such credit pages would be hosted. Still, some kind of service for scholars (grant-funded, at least initially) seems not inconceivable. The service could also report which images are used most often. Scholars studying scholarship could use that as material. So could people who are less interested in browsing the wide world of CC-licensed images and would rather choose from images that someone else has already judged good enough for a presentation. The pages would only link to the images rather than host them, so each page would need very little storage. Of course, if such a service were to exist and then go away, the resulting link rot would do non-trivial damage to the traceability of the images used in presentations. That said, people already use link shorteners just to fit URLs within Twitter's arbitrarily small character limit. Given that, an image credit/connector service that makes reusable multimedia more accessible and better cited shouldn't be ruled out immediately because of the possibility of link rot.
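The credits page itself needs very little. The Python sketch below assumes the plugin has already collected one hypothetical `ImageCredit` record per image, holding the title, creator, page URL, thumbnail URL, and license name and URL. It renders these records as an HTML list of linked thumbnails. Only the page is produced, and the images stay on their original hosts. Hosting and link-shortening are left out, and the field names are illustrative rather than any existing service's format.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class ImageCredit:
    title: str
    creator: str         # e.g. the cc:attributionName value, or "username/Flickr"
    page_url: str        # the image's page on its original host
    thumbnail_url: str   # thumbnail served by the original host, not by us
    license_name: str    # e.g. "CC BY 2.0" or "Public domain"
    license_url: str


def credits_page(presentation_title: str, images: list[ImageCredit]) -> str:
    """Render a minimal HTML credits page linking every image used in a talk."""
    title = escape(presentation_title)
    items = []
    for img in images:
        # escape() also escapes quotes, so values are safe inside attributes
        items.append(
            f'  <li><a href="{escape(img.page_url)}">'
            f'<img src="{escape(img.thumbnail_url)}" alt="{escape(img.title)}"></a> '
            f'{escape(img.title)} by {escape(img.creator)}, '
            f'<a href="{escape(img.license_url)}">{escape(img.license_name)}</a></li>'
        )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>Image credits: {title}</title></head>\n'
        f"<body><h1>Image credits: {title}</h1>\n"
        "<ul>\n" + "\n".join(items) + "\n</ul>\n</body></html>\n"
    )
```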

I'd like to hope that the trend towards Creative Commons licenses for scholarship will have the effect of increasing the social pressure for providing reasonable credit, as the implications of not providing credit become personal for more and more digital humanists. Having a tool that makes it easier would be a boon, though. Anyone up for applying for a grant to do something like this?


Collaborations between R1s and liberal arts colleges will not succeed

In the week since the joint session between THATCamp LAC participants and the people at THATCamp Prime, I've found myself reflecting on that conversation a number of times. As I said then (in person, and to the Twitterverse thanks to Rebecca Davis), the collaborations that work well aren’t between institutions, but between people.

When funding agencies state that they want to “fund collaborations between R1 and Liberal Arts Colleges”, there’s a temptation to immediately reach for a consortial solution. Why not have Liberal Arts Colleges join together to form a consortium that can partner with digital humanities centers at R1 institutions as a peer? The problem is, when collaboration is framed as something that happens between entities, people fall back on assumptions. It’s hard to avoid-- how do you, as someone at an R1 tasked with implementing this collaboration, sit down with a Liberal Arts College and find out what it does, what its needs are, and why it wants to collaborate with you? You can talk to representatives on the LAC end, but if your task is to collaborate with an entity as a whole, there’s always a lingering concern that those individuals don’t represent the full range of needs that you must somehow address.

During the THATCamp LAC/Prime conversation, someone on the George Mason end asked (in honest curiosity, with no intent of malice or condescension) what LACs can bring to the table in a partnership with R1s. This provoked an understandably strong reaction from the LAC audience, who were taken aback at what they misread as an implication that collaborating with LACs is an act of charity. It also came to light that some THATCamp Prime attendees didn’t realize that a number of LACs have a computer science department. Certain LAC attendees were disgruntled at what that misunderstanding suggested about how R1s see LACs. For part of the conversation, participants tried to relate to one another as representatives of a class of institutions rather than scholar-to-scholar. That part was colored by a certain awkwardness that I suspect is inevitable given a framing of “LAC/R1 collaboration”.

I think the key to fulfilling funding agencies’ requests for “LAC/R1 collaboration” is to find people who have shared interests and a common goal. Where necessary, put requirements in place as safeguards so the project doesn’t get sidetracked from meeting both parties’ needs. And don’t make a fuss over institutional affiliation. Certainly, the differences in the incentive structures of R1 institutions and LACs will shape the process. But when participants can focus on exchanging ideas and working together towards an outcome, they see each other as valued contributors regardless of institutional affiliation, and the collaboration is more likely to succeed. For collaboration to succeed, the shared interests and goals, and the desire to work together, need to be genuine rather than (primarily) grant-incentivized. Consider a scholar from an R1 institution who is skeptical of the value of working with an LAC scholar, but who needs some grant money to finish a project whose direction he has already determined. He is unlikely to be a good collaborator if “assigned” to an LAC scholar as part of an “R1/LAC partnership program”. Nor is an LAC scholar who feels resentful towards R1s because of previous failed attempts at collaboration, and who hopes the promise of funding will make R1 scholars do what he wants. Both of these are “worst-case scenario” stereotypes, but not without some grounding in reality.

The TAPAS Project and the Bamboo Planning Project are two recent examples of what a successful partnership between individuals from R1 schools and LACs can look like. During the TAPAS planning workshops, all the participants engaged with the problem at hand as individuals with a unique perspective on a common problem, and a genuine desire to work together to find a solution. Needs specific to LACs were identified, and often worked into the project scope. When consensus determined that some of those needs were outside a reasonable project scope, those of us with useful experience helped brainstorm other solutions that could meet the LAC faculty member’s immediate need.

During the Bamboo Planning Project, there were participants from LACs in every working group, and they actively contributed to efforts such as the Scholarly Narrative Repository. I never had the impression that LAC participants were viewed any differently than R1 participants-- everyone was working towards a common set of goals. That said, “what about pedagogy?” became an oft-repeated rallying cry, and ways of facilitating connections between scholars and making the case for digital humanities at one’s local institution were not in the final proposal. If the Planning Project were run again, with the goal of being an ideal LAC/R1 partnership, I think there would need to be some safeguards in place, perhaps in the form of a mandate that the final project plan must contain a pedagogy component.

This is the digital humanities we're talking about. Consortia, centers, institutions, and organizations all play an important role. They provide centralized hubs for knowledge and resources, organize gatherings, and lend clout to grant proposals, among other things. But they're not the level at which meaningful, sustainable collaboration happens. Successful collaboration takes place between humans with mutual interests and a common goal, and I hope upcoming attempts to foster “LAC/R1 collaboration” don’t lose sight of that.


What can you do with Project Bamboo? A 3-year history of ideas

This week marked the 3-year anniversary of Project Bamboo Workshop 1b in Chicago. That workshop kicked off my involvement with what would become the defining project of the next year and a half of my life.

In its current incarnation as the Bamboo Technology Project, Project Bamboo promises to "roll out easy-to-use, highly scalable environments for digital scholarship... develop shared web services, platforms, and frameworks – underlying infrastructure – that higher education institutions can use collectively to sustain and connect research applications and collections... [and] define how e-research environments can evolve to support increasingly complex and large-scale forms of corpora scholarship across disciplines." (Project Bamboo About page, 5/17/11). If you compare the direction and scope of the Bamboo Technology Project with the future-use scenarios defined in the Bamboo Planning Project proposal [PDF] submitted to the Mellon Foundation in January 2008, there's a clear and coherent narrative:
From Bamboo Planning Project to Bamboo Technology Project

A closer look at what actually transpired during the community design process of the Bamboo Planning Project paints a different picture, best conveyed-- as so many complex ideas were during those workshops-- through a very complicated diagram.
Project Bamboo: A 3-Year History of Ideas

The project started with a focus on cyberinfrastructure and tools, but the Workshop 1 series "listening tour" made it clear that these things were far from the minds of many faculty, who were more concerned about how to find people, projects, tools, and content. They wanted to demonstrate the value of digital humanities to skeptical colleagues, and fight for the legitimacy of their digital humanities projects within the tenure and review process. They wanted to engage tech-curious undergraduates so they might consider majoring in the Humanities rather than Computer Science, and create an environment where it would be safe for graduate students to reach beyond traditional printed articles to disseminate their scholarship. Having useful tools interoperate better would be nice, but how useful would it be to a lone digital humanist at a liberal arts college, without anyone to collaborate with, whose colleagues are skeptical of his methodologies and who needs to find a way to incorporate digital tools into a course on Henry James?

When we developed "future directions" for Bamboo to present at Workshop 2, our continuing focus on technology was evident in spite of our inclusion of "Social Networking" and "Education" components. Our diagrams of service stacks failed to resonate with many participants. Feeling excluded from the "community" doing the "design", they called for the inclusion of what was initially called the "Stories" working group, later renamed "Scholarly Narratives". This movement to shift the focus away from tools and towards enabling digital humanities scholarship through a variety of means, technical and otherwise, became a powerful force in the project during Workshops 3-5.

I've gone through all the public notes, working group comment threads, and scholarly narratives collected during Workshops 1-5, organized them thematically, and posted them all for anyone to read. They provide a rich view of the conversations, insights, and needs expressed during the workshops. Still, they are the proverbial trees: organizing the data that way makes it hard to see how various themes manifested at different points in the project. To that end, I've put together the chart above and the brief summary of the major themes that follows.

Funding (gray)

Emerged: Workshop 1
Eliminated: Future Directions; re-emerged in the Working Groups with a focus on funding Bamboo itself

Workshop 1 participants hoped Bamboo would improve the funding situation for digitizing and processing analog materials, and provide the funding necessary to motivate researchers to work together. These aspirations were not captured in any of the possible future directions, perhaps because Bamboo would first need to fund itself. The topic of funding re-emerged during the Working Groups, when the proposed "Institutional Partnerships and Support" direction morphed into the "Strategic Communications" working group. Its focus shifted from exploring models of partnership between organizations on a campus (scholars, DH centers, libraries, IT units, etc.) to making the case for Bamboo on campus. This developed into the "Build and Sustain the Bamboo Consortium" aspect of the Straw Implementation Proposal discussed at Workshop 3.

Advocacy (orange)

Emerged: Workshop 1
Eliminated: Workshop 2

Sometimes it seemed that Project Bamboo was a digital humanities Rorschach test-- it would solve everyone's problems, but if you asked three people what problems it would solve, you'd hear three different answers. Workshop 1 participants hoped that Bamboo would legitimize digital humanities in the eyes of tenure and review boards, reduce the power of publishers, and increase the number of venues for publishing multimedia content.

The advocacy-related themes were captured in a number of community-submitted possible future directions, and presented as a future direction at Workshop 2, "Advocacy: Publication, Academic Recognition, Intellectual Property". As the Workshop 1 participants who were focused on large-scale advocacy were poorly represented at Workshop 2, this direction was tabled indefinitely with little objection.

Transparency (yellow)

Emerged: Workshop 1
Eliminated: Bamboo Technology Proposal

There were a number of assorted topics that largely coalesce around the idea of transparency. These include:

  • Exposing existing filtering processes to scholars so they can understand the underlying assumptions
  • Increasing the uptake of metadata standards
  • Making it easier for scholars to submit their own data for archiving and reuse
  • Helping scholars promote projects they've developed
  • Developing guidelines for tool development to make them intuitive for scholars

Some of these topics were, to varying extents, incorporated into other areas of work defined during the Bamboo Planning Project. The idea of increasing the uptake of metadata standards can be found in the Direction "A body to further common protocols, standards and principles"; there were some ideas about including "guidelines for tool development to make them intuitive for scholars" as part of the Tool & Content Guide, which was incorporated into the Bamboo Atlas.

Increasing the uptake of metadata standards is implicit in the Collections Interoperability area of work in the Bamboo Technology Project, but the remaining ideas have had little to no influence on the project.

Pedagogy (purple)

Emerged: Workshop 1
Eliminated: Bamboo Technology Proposal

The Liberal Arts College participants in the workshops did Project Bamboo a service by continually raising the same questions: "What about undergraduates? How can this be used in the classroom?" Led by two R1 institutions, the Bamboo workshops featured a lot of discussion of research contexts for using tools and content, with far less focus on pedagogy. The Liberal Arts College participants in particular wanted to see Bamboo collect and share good examples of undergraduate curricula making use of digital humanities tools and methodologies, and generally provide support for scholars who want to encourage their students to explore the digital humanities.

These concerns were reflected in each step of the Bamboo Planning Project, though not always as prominently as some participants would have liked. The Bamboo Technology Proposal does not address pedagogy.

Scholar-focused education (turquoise)

Emerged: Workshop 1
Eliminated: Bamboo Technology Proposal

Scholars wanted to be able to leverage knowledge gained through others' previous projects, show their colleagues what was possible with digital humanities tools and methodologies, and learn how to use new tools. At times during the Planning Project, these themes were combined with those related to pedagogy. Some participants objected that showcasing the best digital humanities research to help legitimize its methodologies was entirely different from teaching an undergraduate course using those methodologies. Scholarly Narratives and Recipes (Workflows) were the major manifestations of this theme, eventually combined in the Bamboo Atlas. Neither the Atlas nor its constituent parts were included in the Bamboo Technology Proposal.

Community building (blue)

Emerged: Workshop 1
Eliminated: Bamboo Implementation Proposal

One of the most striking things about the Workshop 1 series was how it brought together faculty, librarians, and IT staff from the same institution who shared common interests but had never met before. The impact of those newly formed relationships was felt in the months that followed. People wanted to keep making those kinds of connections and to convene en masse to commiserate, contemplate, and tackle shared problems. Scholars also wanted some way to connect with faculty members at other institutions who share their interests, and even, more broadly, with independent scholars and the public at large. To some extent this was already happening on existing social networks, following in-person introductions at conferences. The workshops took place during the dying days of MySpace and the rise of Facebook, but before the Digital Humanities Twitter community was as robust as it is today.

The Scholarly Networking thread in the Bamboo Planning Project involved some degree of infrastructure-building. As early as the Working Groups, there was talk of developing scholar-centric "widgets" or "plug-ins" for existing scholarly networks. The alternatives were building a new network for scholars or relying on existing social network or VRE infrastructure as-is.

Between the description of Scholarly Networking in the Bamboo Program Document and the description of the "same thing" in the Bamboo Implementation Proposal, there is a significant shift away from community and towards services and infrastructure. This is most evident in how Scholarly Networking is introduced; in the Program Document, Scholarly Networking is "The virtual place for people to discover, explore, and connect with other people and groups across the Bamboo community. The Bamboo Scholarly Network may be implemented through interconnecting existing social networking tools, including the use of plug-ins and/or widgets based on open interface standards that will allow the Scholarly Network to be easily incorporated into existing portals, virtual research environments, or other research workflow systems and tools."; in the Bamboo Implementation Proposal, it is described as follows: "In collaboration with institutions, scholarly societies, and other development projects, the Scholarly Networking area of work will create two types of software: a set of small components, which we're calling "gadgets", that will plug in to and enhance existing research environments, social platforms, and collaborative forums; and a group of new services, which will filter information from several sources (including the Atlas) and supply relevant and interesting material to the gadgets." Arguably, this change in focus marks the end of the community-centric "Scholarly Networking" theme from earlier in the project. The idea of social networks as an interface for Bamboo content persists in the Bamboo Technology Proposal, called out as a potential User Interface in the Bamboo Architecture Layers (p. 43), but this is a far cry from the "community building" theme that played a significant role prior to the Bamboo Implementation Proposal.

Humanities marketplace (light blue)

Emerged: Workshop 1
Eliminated: Workshop 2 Proposed Directions; re-emerged in the Program Document and was functionally eliminated again in the Bamboo Implementation Proposal

The theme of a "Craigslist for the Digital Humanities" was uniquely resilient. Originally mentioned in Workshop 1, it was not captured by any of the Future Directions or Working Groups, but strong interest particularly from Liberal Arts Colleges contributed to its inclusion in the Program Document. During Workshop 4, the straw polls indicating interest in various sections of the Program Document showed mixed results for the "Bamboo Exchange". Between Workshop 4 and Workshop 5, Rick Peterson of Washington & Lee-- one of the most outspoken proponents of the Exchange-- put together a demonstrator showing how such a system might work, but by that point the Bamboo Exchange had been subsumed into the broad and amorphous Bamboo Atlas.

Content, services and platforms (green)

Emerged: Bamboo Planning Project Proposal
Eliminated: Varies; some are being executed in the Bamboo Technology Proposal

The Bamboo Planning Project began with content, services and platforms, and those threads live on in the Bamboo Technology Proposal. Some phases of the Planning Project brought tools and services closer to the pedagogy, education, and community areas of work. The Tools and Content Partners Working Group aimed to work with the Scholarly Narratives Working Group to distill those narratives into repeatable workflows. The "Tool and Content Guide" in the Program Document would have allowed users "... to both publish information about, and to discover, tools and content sources that are of value for research and teaching." Still, the idea of shared services running on a service platform has persisted, essentially unchanged, from the Bamboo Planning Project Proposal. Similarly, content and tool interoperability find their roots in the Planning Project Proposal.

Of the major facets of the Bamboo Technology Proposal, Work Spaces is one of the more interesting to trace. There was relatively little discussion of workspaces as such in the workshops. Workspaces made an initial appearance in Workshop 1 and got a nod in the future direction "A Social Networking Tool or Environment for the Arts and Humanities". After that, they fell off the radar for an extended period of time. The Program Document referred to a "Bamboo Community Environment," "where Scholarly Network, Narratives, Recipes, Tools/Content Guide, Educational Materials can be found. The environment may take two general forms: (1) as a user interface that Bamboo develops and is run for the community and/or (2) by developing each of these elements as information widgets/gadgets that can be incorporated into existing Virtual Research and Collaborative Environments." There's still a discrepancy between the Bamboo Community Environment and the Work Spaces of today's Bamboo Technology Proposal. Even so, the Community Environment in the Program Document is the closest thing the Planning Project has to a predecessor for Work Spaces. Arguably, the re-conceived, non-community "Scholarly Networking" of the Bamboo Implementation Proposal is another.

What happened to everything else?

The Mellon Foundation merged the Research in Information Technology Program (RIT) with the Scholarly Communications program. RIT had funded the Bamboo Planning Project and was the intended recipient of the Bamboo Implementation Proposal, so the merger resulted in a thorough rewrite of the grant proposal. The "human-focused" pieces (community, pedagogy, and education) were eliminated. Even in the Bamboo Implementation Proposal drafts, these elements were already being sidelined. Scholarly Networking was reframed, and everything else was merged into an amorphous Bamboo Atlas.

For myself, a lesson learned from the Bamboo Planning Project is that in order to accomplish anything, it's better to refrain from ever talking about doing everything. Be targeted with your scope, and specific with what you plan to accomplish. While the community design process built valuable relationships and produced copious data, it also raised hopes that Bamboo would be a panacea for scholars, librarians and IT staff working in the digital humanities, leading to inevitable disillusionment with the project when it failed to deliver the impossible things it suggested it could do.

In his introductory remarks to Workshop 4, Chad Kainz said, "Over time, we'll continue to refer to [the Program Document] and look back on [it]. 'Back in 2008, we had this idea...' Hopefully in 2011 we'll look back at that. Things have changed, but maybe we can evolve that concept." Now, in 2011, looking back on the aspirations laid out in the Program Document, it's heartening to see what has changed, if slowly, and with little or no influence from Project Bamboo. There's a flourishing community of digital humanists on Twitter, some of whom have met via that medium. DHAnswers is a forum for leveraging others' knowledge, including knowledge gained from previous projects. More and more digital humanities courses are being taught at the undergraduate level. There are active discussions underway about developing a project registry, a people-and-projects matching service, a virtual digital humanities center, and consultancy/expertise sharing. And Bamboo might be in a position to partner with these projects through a nascent Affiliate/Consortium program.

Project Bamboo began with services, platforms and infrastructure, and that's what it's building. If nothing else, the community design process shed light on the state of digital humanities, the challenges and needs of scholars, librarians and IT professionals, and ideas for a path forward. It produced a data set that could be used to make the case for funding initiatives outside the scope of the Bamboo Technology Project, but in the absence of a Bamboo panacea, the future of those threads lies in the hands of the digital humanities community.


The data for each stage in the chart was gathered from the Bamboo Planning Project wiki, with the exception of the Bamboo Planning Project Proposal (PDF) and the Bamboo Technology Project Proposal (PDF).


Assorted April updates

It's been a busy month, albeit without one particular thing that would merit its own post. So, in summary...

Digital humanities white paper

NITLE published a white paper I co-authored with Rebecca Davis, "Divided and Conquered: How Multivarious Isolation Is Suppressing Digital Humanities Scholarship" [pdf], which may be the first publication drawing extensively on the data collected during the Bamboo Planning Project (social isolation of scholars, siloed tools, difficulties involved in finding tools and content, etc.).

Bamboo data sorting done

I spent today doing the last data sorting for the Bamboo Planning Project. When I published the data and summaries on January 1st, I hadn't had time to include the Scholarly Narratives, or data from Workshops 3 and 4. Today, I finished posting the last of the Scholarly Narratives and Workshop 4, which wraps up the Bamboo Planning Project data. I'm considering trawling through the current Bamboo Technology Wiki for additional data at some point.

Slavic linguistics wiki

Last Friday, I gave a talk on the Slavic linguistics wiki at the Midwest Slavic Workshop with Monica Vickers. Monica is a first-year grad student at The Ohio State University who used the wiki last quarter in an MA prep class. Slides are available on Google Docs. One of the concerns I've heard from professors about the wiki is that it gives students a way to get out of doing their class reading. Interestingly, Monica noted that the students did try that. She found the wiki a great way to refresh your memory about an article you've already read. When it came to being able to actively participate in a class discussion, though, it was no substitute for doing the reading.

Birchbark letters XML

For a few months, David Birnbaum and I have been working on a way to batch convert the birchbark letter transcriptions available online into Unicode. We've finally gotten the XML file with the PUA/Unicode correspondences right, and he's mostly done with a clever bit of XSLT to actually do the conversion.
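The real conversion is being done in XSLT. The underlying idea is simple enough to sketch in a few lines of Python instead: read the PUA-to-Unicode correspondences from an XML file, then replace each PUA character in the transcription. The `<map pua="E000" unicode="0463"/>` format below is a hypothetical stand-in, not our actual correspondence file. The sample code points are placeholders rather than real birchbark mappings. Because one PUA glyph may need a base letter plus combining marks, the sketch lets each entry map to a sequence of code points.

```python
import xml.etree.ElementTree as ET


def load_mapping(xml_text: str) -> dict[int, str]:
    """Build a str.translate table from hypothetical <map pua=".." unicode=".."/> entries."""
    table: dict[int, str] = {}
    for entry in ET.fromstring(xml_text).iter("map"):
        pua = int(entry.get("pua"), 16)
        # The unicode attribute holds one or more space-separated hex code points
        table[pua] = "".join(chr(int(cp, 16)) for cp in entry.get("unicode").split())
    return table


def convert(text: str, table: dict[int, str]) -> str:
    """Replace every mapped PUA character; anything unmapped is left untouched."""
    return text.translate(table)


def unmapped_pua(text: str) -> set[str]:
    """List Private Use Area code points (U+E000..U+F8FF) that survived conversion."""
    return {f"U+{ord(ch):04X}" for ch in text if 0xE000 <= ord(ch) <= 0xF8FF}


if __name__ == "__main__":
    # Placeholder correspondences, not real birchbark-letter mappings
    table = load_mapping('<mappings><map pua="E000" unicode="0463"/>'
                         '<map pua="E001" unicode="043E 0301"/></mappings>')
    out = convert("\ue000a\ue001\ue002", table)
    print(unmapped_pua(out))  # {'U+E002'}
```

Reporting leftover PUA characters, rather than silently dropping them, makes gaps in the correspondence file easy to spot after each batch run.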

Relatedly, in sewing news, I turned some Spoonflower fabric scraps into giant cuddly oversized birchbark letters.


In the last month, I've built a Drupal VRE-style site for a friend working on a project to analyze Facebook posts from Tunisia and the Tunisian diaspora, and I have some sketchy notes for a write-up. At work, I built a Drupal service catalog in less than two hours (if you exclude the 30+ hours spent cleaning up messy data, doing multiple imports, etc.) A write-up is about half done. I'm dabbling with a new site that uses Feeds to pull in weekly reports from major IT projects and display them in a way that's much more accessible than what we currently provide.

I'm also building a VRE for Bulgarian dialectology, for which I made a proof-of-concept in February. The proof-of-concept used manually entered data. Batch importing all the pre-existing data from Word files is going to be quite a task: over 7,000 word nodes, plus maybe a thousand sentence nodes, and a handful of others. Getting the data into a form where it can be imported has also been a challenge. I've had to clean up inconsistencies and human error, and figure out the XSLT to pull the right data out of the XHTML generated by Word2CleanHTML.
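For anyone facing a similar cleanup, here is a rough Python sketch of the extraction step. It does in Python what I'm doing in XSLT. It assumes, hypothetically, that each dictionary entry ends up as a table row in the Word2CleanHTML output and that the XHTML is well-formed. The sketch collects the text of each row's `<td>` cells, normalizes whitespace, and writes a CSV that an importer such as Drupal's Feeds module could read. The column names in any real run would be your own. Nothing here reflects Word2CleanHTML's actual markup beyond plain XHTML tables.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop an ElementTree "{namespace}" prefix, so XHTML with or without xmlns works."""
    return tag.rsplit("}", 1)[-1]


def rows_from_xhtml(xhtml: str) -> list[list[str]]:
    """Return the whitespace-normalized <td> text of every table row that has any."""
    rows = []
    for tr in ET.fromstring(xhtml).iter():
        if local_name(tr.tag) != "tr":
            continue
        cells = [" ".join("".join(td.itertext()).split())
                 for td in tr if local_name(td.tag) == "td"]
        if any(cells):  # header rows built from <th>, and empty rows, are skipped
            rows.append(cells)
    return rows


def to_csv(rows: list[list[str]], header: list[str]) -> str:
    """Write rows as CSV, padding short rows so every record has every column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for cells in rows:
        writer.writerow(cells + [""] * (len(header) - len(cells)))
    return buf.getvalue()
```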

Cocoon running on an Ubuntu server

Thanks to Gerry Siarny, there's a proof-of-concept running on Slicehost. A blog post on how to do it will be coming soon.


