NITLE 2010: Medieval Slavic wiki meets Project Bamboo

For the NITLE Summit 2010 poster session, I combined my Medieval Slavic wiki with the work I've been doing as part of Project Bamboo. You can download the poster I put together here (PDF, 650 kb); the abstract is below:
Even in a time when digital information can be accessed, searched, and filtered quickly, a number of academic disciplines rely heavily on print-only reference works and associated articles, many of which are not available online. Pulling together various scholars' assessments of any topic is painstaking work, taking up time that could be better spent on analysis. At the same time, article summaries are a common class assignment, but the students' work may never be seen by anyone other than the instructor. What if we could reduce the amount of scut work necessary for scholars, while making student assignments more meaningful? Wikifying Reference is designed to be a working model of what such a system might look like. Using the MediaWiki platform, I am in the process of dissecting commonly used reference works in medieval Slavic studies into cross-linked articles, and incorporating dissenting views, supporting evidence, and other insights from subsequent scholarly articles. Heavy page-level citation of the sources, just as one would find in a scholarly article, both ensures that all the information can be verified and allows the scholar to cite the source, rather than the wiki, to avoid criticism from traditionally minded colleagues. I also plan to illustrate how this project aligns with the future directions for digital scholarship mentioned during the Project Bamboo workshops, in the hope that someone in a larger field might be interested in trying something similar with a class of current students.


Installing Cocoon on Ubuntu

10/11/10: This guide was written for Ubuntu 8.10 (Intrepid) and doesn't work on the latest Ubuntu releases. An updated and working version of the guide is available here.
This guide was prepared with help from a guide written on the GSLIS wiki by Wendell Piez. If you try it and something doesn't work, please e-mail me (quinnd -at- uchicago +dot+ edu). This document is licensed under Creative Commons Attribution.
No knowledge of Ubuntu or Unix is assumed; the intended audience is someone who's managed to install Ubuntu and isn't too intimidated by the Terminal.

Step 1: Installing Java SDK

  1. Open the Terminal (Applications > Accessories > Terminal)
  2. Type: sudo apt-get install sun-java6-jdk
    (Hint: You can copy and paste, but in Terminal, pasting is Ctrl + Shift + V.)
  3. There'll be some downloading, and you'll have to scroll through a long license agreement and accept it, but after that it will install on its own.
  4. Close Terminal.

Step 2: Installing Maven
Derived from Maven in Five Minutes.

  1. Download Maven (choose tar.bz2)
  2. Open the downloaded file; by default, it was probably saved to the Desktop.
  3. Extract it to the Desktop
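  4. Open the Terminal again and type:
    cd /usr/local

    sudo mkdir apache-maven

    At this point, the Terminal will ask you for your sudo password. It's the same as the password you use to log in to Ubuntu. Then:
    cd /home/YOUR_USER_NAME/Desktop (be sure to replace YOUR_USER_NAME with your user name)

    sudo mv apache-maven-2.0.9 /usr/local/apache-maven

    export M2_HOME=/usr/local/apache-maven/apache-maven-2.0.9

    export M2=$M2_HOME/bin

    export PATH=$M2:$PATH

    (If you downloaded a different version of Maven, replace 2.0.9 with that version number.) These export commands only last until you close the Terminal window. To make them permanent, add the three export lines to the end of the hidden file .bashrc in your Home folder; you can open it by typing gedit ~/.bashrc in the Terminal.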

  5. Cross your fingers and type:
    mvn --version
  6. If everything worked right, it should display information about the version of Maven you have installed.
  7. Close Terminal.
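If mvn --version doesn't work, the usual culprit is the environment. Step 2's commands expect three things: M2_HOME points at the Maven folder, M2 holds $M2_HOME/bin, and that bin folder is on PATH. The optional Python sketch below checks exactly those relationships. The function name maven_env_problems is hypothetical, made up for this example; it only inspects the variables and never runs Maven itself.

```python
import os
import shutil


def maven_env_problems(env=None):
    """List problems with the M2_HOME / M2 / PATH setup from Step 2 (empty list if none)."""
    env = os.environ if env is None else env
    m2_home = env.get("M2_HOME", "")
    if not m2_home:
        return ["M2_HOME is not set"]
    problems = []
    expected_bin = os.path.join(m2_home, "bin")  # what M2 should point to
    if env.get("M2", "").rstrip("/") != expected_bin.rstrip("/"):
        problems.append(f"M2 should be {expected_bin}, found {env.get('M2')!r}")
    # PATH is a colon-separated list on Ubuntu; ignore trailing slashes when comparing.
    path_entries = [p.rstrip("/") for p in env.get("PATH", "").split(os.pathsep)]
    if expected_bin.rstrip("/") not in path_entries:
        problems.append(f"{expected_bin} is not on PATH")
    return problems


if __name__ == "__main__":
    issues = maven_env_problems()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Maven variables look OK")
    print("mvn found at:", shutil.which("mvn"))  # None if mvn isn't on PATH
```

Saved under a name of your choosing (check_maven.py, say), it can be run in a fresh Terminal with python3 check_maven.py; an empty problem list plus a path for mvn suggests the exports took effect.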

Step 3: Installing Cocoon

Derived from Your first Cocoon application.

  1. Make a directory for your Cocoon install.
    • Using the GUI: go to Places > Home Folder, then in that new window, File > Create Folder.
    • Or, open the Terminal and type: mkdir cocoon
    • In the following text, I'm assuming you make a folder called cocoon in your Home Folder; if you give it a different name or put it somewhere else, you'll have to change the commands accordingly.
  2. Open the Terminal and change directory to your cocoon folder:
    cd cocoon

    Then type:
    mvn archetype:generate -DarchetypeCatalog=

  3. This begins the install process.
    • For archetype, choose 2
    • Define value for groupId: This should be a unique value. By convention, it's a domain name you own written in reverse order: if you own myurl.com, for example, you could type com.myurl
    • Define value for artifactId: cocoon
    • Define value for version: 1.0-SNAPSHOT: type 1.0.0
    • Define value for package: your groupId followed by .cocoon (e.g. com.myurl.cocoon)
  4. After everything's done installing, you should see [INFO] BUILD SUCCESSFUL
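  5. Make sure you're in your cocoon directory in Terminal (does it say ~/cocoon$ right before the cursor?), and type mvn jetty:run. (archetype:generate creates the new project in a subfolder named after the artifactId, so if Maven complains that it can't find a pom.xml, type cd cocoon once more and try again.)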
  6. There'll be a lot more installing, but it should conclude with [INFO] Started Jetty Server
  7. Open a browser and go to http://localhost:8888/cocoonTest - you should see a message saying Apache Cocoon: Welcome
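The groupId and package answers in Step 3 follow Java's reverse-domain-name convention. The short Python sketch below derives them from a domain you own. maven_coordinates is a hypothetical helper made up for this example (it doesn't talk to Maven at all), and myurl.com is a placeholder domain. Hyphens are swapped for underscores in the package name because Java package names can't contain hyphens.

```python
def maven_coordinates(domain, artifact_id="cocoon"):
    """Suggest answers for the archetype prompts from a domain you own (e.g. myurl.com)."""
    labels = [label for label in domain.lower().strip(".").split(".") if label]
    if not labels:
        raise ValueError("domain must contain at least one label")
    # Reverse the domain: myurl.com -> com.myurl
    group_id = ".".join(reversed(labels))
    # Package = groupId + artifactId; only hyphens are fixed here, nothing else is validated.
    package = ".".join(part.replace("-", "_") for part in (*reversed(labels), artifact_id))
    return {"groupId": group_id, "artifactId": artifact_id, "package": package}
```

For example, maven_coordinates("myurl.com") gives com.myurl as the groupId and com.myurl.cocoon as the package, matching the values used in the steps above; you still type them at the Maven prompts yourself.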

Step 4: Cocoon Add-ons
There are a couple of add-ons for Cocoon that are essential, such as generators for HTML. If you want to use XSLT 2.0, Saxon 9 is also critical. Possibly less important are the FOP processor (to generate PDFs from XSL-FO), Batik (for SVG), and Forms (to generate forms). If you don't need to use XSLT 2.0, you can skip the first part of this section.

  1. Installing Saxon 9 - a good idea
    1. Open your cocoon directory and navigate to src/main/resources/META-INF/cocoon
    2. Create a directory named avalon
    3. Create the following files in Text Editor (Applications > Accessories > Text Editor), and place them in the avalon directory:
      1. File named cocoon-core-xslt-saxon.xconf


      2. File named sitemap-transformers-saxon-transformer.xconf


    4. Download
    5. Extract the zip file to your Home folder; you can delete everything but saxon9.jar
    6. Open a new Terminal and type:
      cd cocoon

      mvn install:install-file -DgroupId=net.sf.saxon -DartifactId=saxon -Dversion= -Dpackaging=jar -Dfile=../saxon9.jar

    7. Go to your cocoon folder and open pom.xml
    8. At the bottom of , add:


    9. If for some reason you only want Saxon 9 and not the ability to generate HTML, skip to the bottom of this section
  2. Installing HTML support
    • Still in pom.xml, at the bottom of , add:



  3. Installing FOP (for PDFs)
    • Still in pom.xml, at the bottom of , add:


  4. Installing Batik (SVG)
    • Still in pom.xml, at the bottom of , add:



  5. Installing Forms
    • Still in pom.xml, at the bottom of , add:


  6. There's a list of all Cocoon blocks, and the dependency code for each block can be found there.
  7. Once you're done adding dependencies:
    1. If you have a Terminal open with [INFO] Started Jetty Server, close it.
    2. Open a new Terminal and type:
      cd cocoon

      mvn compile

      After it's done, type:
      mvn jetty:run
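Each add-on above comes down to one more dependency element (groupId, artifactId, version) at the bottom of the project's dependencies element in pom.xml. Editing by hand in Text Editor works fine; the Python sketch below is just an illustration of that structure, assuming the usual Maven POM namespace. add_dependency and update_pom_file are hypothetical helpers made up for this example, and the version you pass should be whatever you gave to -Dversion when installing saxon9.jar.

```python
import shutil
import xml.etree.ElementTree as ET

MAVEN_NS = "http://maven.apache.org/POM/4.0.0"
# Write the Maven namespace back out as the default namespace (avoids "ns0:" prefixes).
ET.register_namespace("", MAVEN_NS)


def add_dependency(pom_text, group_id, artifact_id, version):
    """Return pom_text with a <dependency> appended to the project's top-level <dependencies>."""
    # Keep XML comments instead of silently dropping them (needs Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(pom_text, parser=parser)
    # Reuse the namespace of the root <project> element, if it has one.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    deps = root.find(f"{ns}dependencies")
    if deps is None:
        raise ValueError("no top-level <dependencies> element in pom.xml")
    dep = ET.SubElement(deps, f"{ns}dependency")
    for tag, value in (("groupId", group_id), ("artifactId", artifact_id), ("version", version)):
        ET.SubElement(dep, f"{ns}{tag}").text = value
    return ET.tostring(root, encoding="unicode")


def update_pom_file(path, group_id, artifact_id, version):
    """Back up the file to path + '.bak', then write the updated POM in place."""
    shutil.copyfile(path, path + ".bak")
    with open(path, encoding="utf-8") as f:
        updated = add_dependency(f.read(), group_id, artifact_id, version)
    with open(path, "w", encoding="utf-8") as f:
        f.write(updated)
```

Something like update_pom_file("pom.xml", "net.sf.saxon", "saxon", "YOUR_SAXON_VERSION") (the version string is a placeholder) would add the Saxon entry. ElementTree drops the <?xml ...?> declaration and doesn't indent the new element, which is why the sketch keeps a .bak copy; run mvn compile afterwards as described above.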

Redirecting the Sitemap
You can add your pipelines to the sitemap.xmap in cocoon/src/main/resources/COB-INF, or (more conveniently) you can tell that base sitemap to look elsewhere for your files.
I'm assuming here that you have a folder called myproject in your Home folder where you have all your files and your sitemap. Please change that, and your user name, accordingly.

Included here is also the code to generate more useful error messages than a blank page.
In sitemap.xmap in cocoon/src/main/resources/COB-INF, at the bottom of the

src="/home/YOUR_USER_NAME/myproject/sitemap.xmap" reload-method="synchron"/>

In this case, your project will be found at http://localhost:8888/cocoonTest/myproject/[things that match your pipelines]. But the URL doesn't have to match the name of the folder with your files. You can change the URL by changing to

Hints and Tips
Every time you restart Ubuntu, you have to restart Cocoon:
cd cocoon

mvn jetty:run
Be sure to keep that Terminal window open while you're working with Cocoon. You can always check if Cocoon is working by going to: http://localhost:8888/cocoonTest.
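Instead of reloading the browser until Jetty finishes starting, you could poll the test URL from a script. This is a minimal Python sketch using only the standard library; cocoon_is_up and wait_for_cocoon are hypothetical names made up for this example, and the only assumption about Cocoon is the http://localhost:8888/cocoonTest address used throughout this guide.

```python
import http.client
import time
import urllib.request

COCOON_URL = "http://localhost:8888/cocoonTest"  # the test URL from this guide


def cocoon_is_up(url=COCOON_URL, timeout=2.0):
    """True if something answers the URL with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses, as are refused connections and timeouts.
        return False


def wait_for_cocoon(url=COCOON_URL, attempts=30, delay=2.0):
    """Poll the URL until Cocoon answers or the attempts run out."""
    for attempt in range(attempts):
        if cocoon_is_up(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Since mvn jetty:run keeps its Terminal busy, you would call wait_for_cocoon() from a second Terminal (for instance in an interactive python3 session); it returns True once the welcome page answers.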


Bulgarian Dialect Atlas at the 17th Meeting of the Balkan and South Slavic Conference

Andy and I will be demoing the Bulgarian Dialect Atlas on April 17th at the 17th Balkan and South Slavic Conference at the Ohio State University. The slides and handout will be posted after the talk.

While there hasn't been much additional development on the project since its presentation at Balisage last summer, this will be the first presentation for a Slavist audience.

Below is the abstract we submitted for the talk:

An XML-Based Approach to Dialectological Data: The Development of Syllabic Liquids in Bulgarian

The reflexes of syllabic liquids (hereafter CrC) in East South Slavic are strikingly diverse and therefore of interest for linguists working on a wide range of topics. In particular, the distribution of CrC reflexes in standard Bulgarian has been a recurring topic in the phonological literature, due to the empirical observation that the place of the vowel (/ăr/ versus /ră/ or /ăl/ versus /lă/) is conditioned by the syllable structure (Scatton 1974, Scatton 1976, Petrova 1993, Barnes 1997). In this paper, we present a tool to facilitate the examination and analysis of CrC reflexes across the dialects of Bulgarian.

This tool builds upon the word lists in the Bulgarian Dialect Atlas (BDA) by providing more accessible interfaces to the data. The words have been transcribed and marked up using XML to indicate lexeme, reflex, and place of stress (where applicable). Each site is listed with its associated words and geographic coordinates. This metadata is leveraged using XSLT stylesheets to generate views onto the data that would not previously have been possible. Each site has its own profile that shows what percentage of the tokens have which reflex, lists all tokens, and notes tokens of the same lexeme that have different reflexes. The profile for each reflex shows what percentage of sites have that reflex, which reflexes co-occur with it, and which lexemes have the given reflex and a different reflex within a single site. One of the views onto the lexemes is a sort based on how many reflexes are attested for a given lexeme, which provides insight into the lexical diffusion of reflexes. The token view identifies where a token is the unique carrier of its reflex. Dynamically generated maps are provided for most views, using color-coded location markers that better capture the nuances of the data than those found in the printed atlas.

This allows for an extremely detailed micro-analysis of the dynamics of lexical diffusion involved in the development of Bulgarian CrC reflexes, while providing macro-analytic tools that facilitate the identification of larger-scale trends in the data. The enhanced ability that this tool provides to identify locally divergent geographical points enables the easier identification of areas that may be of interest for more in-depth research. The ability to compare CrC reflexes in different environments makes it more feasible to track regional variation not just in the specific tokens attested in the BDA, but also, when multiple reflexes are found, to characterize the functioning of each reflex within the overall grammatical structure of any given dialect. These features will be of use in future research on this topic by enabling the inclusion of Bulgarian dialect data to an extent that was previously not feasible. We will also discuss the applicability of similar markup schemes to other types of data sets.


  • Barnes, Jonathan. 1997. “Bulgarian Liquid Metathesis and Syllabification in OT.” in Bošković, Željko, Steven Franks, and William Snyder, eds. Annual Workshop on Formal Approaches to Slavic Linguistics: the Connecticut Meeting: 38-53.
  • Petrova, Rossina. 1993. “Prosodic Theory and Schwa Metathesis in Bulgarian.” in Avrutin, Sergey, Steven Franks, and Ljiljana Progovac, eds. Annual Workshop on Formal Approaches to Slavic Linguistics: the MIT Meeting: 319-340.
  • Scatton, Ernest. 1974. “Metathesis of Liquids and [Ъ] and the Bulgarian Verb.” in V Pamet na Prof. Dr. St. Stojkov – Ezikovedski Izsledvanija: 87-90.
  • Scatton, Ernest. 1976. “Liquids, schwa, and vowel-zero alternations in modern Bg.” in Butler, ed. Bulgaria Past and Present. Columbus: 323-327.
  • Stojkov et al., ed. 1964-1975. Bălgarski dialekten atlas. BAN: Sofia.
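The abstract describes the tool's views in terms of simple counts over (site, lexeme, reflex) records. The tool itself builds these views with XSLT over the XML word lists; the Python sketch below is only a swapped-in illustration of the same counting logic, run on hypothetical placeholder records (not real BDA data). The function names are made up for this example, and unique_carriers reflects one reading of "unique carrier": the only token at a given site that has its reflex.

```python
from collections import Counter, defaultdict

# Hypothetical placeholder records (site, lexeme, reflex); NOT real BDA data.
TOKENS = [
    ("Site A", "lexeme 1", "ăr"),
    ("Site A", "lexeme 2", "ăr"),
    ("Site A", "lexeme 2", "ră"),
    ("Site B", "lexeme 1", "ră"),
    ("Site B", "lexeme 3", "lă"),
    ("Site C", "lexeme 1", "ăr"),
    ("Site C", "lexeme 3", "lă"),
]


def _group(tokens, key_index, value_index):
    """Map each key (site or lexeme) to the set of values (e.g. reflexes) seen with it."""
    groups = defaultdict(set)
    for token in tokens:
        groups[token[key_index]].add(token[value_index])
    return groups


def site_profile(tokens, site):
    """Percentage of a site's tokens with each reflex, plus lexemes with more than one reflex there."""
    at_site = [t for t in tokens if t[0] == site]
    if not at_site:
        return {}, []
    counts = Counter(reflex for _, _, reflex in at_site)
    percentages = {reflex: 100 * n / len(at_site) for reflex, n in counts.items()}
    by_lexeme = _group(at_site, 1, 2)
    divergent = sorted(lex for lex, refls in by_lexeme.items() if len(refls) > 1)
    return percentages, divergent


def reflex_profile(tokens, reflex):
    """Percentage of sites attesting a reflex, and the other reflexes found at those sites."""
    by_site = _group(tokens, 0, 2)
    sites_with = [site for site, refls in by_site.items() if reflex in refls]
    share = 100 * len(sites_with) / len(by_site) if by_site else 0.0
    co_occurring = set().union(*(by_site[s] for s in sites_with)) - {reflex}
    return share, sorted(co_occurring)


def lexemes_by_reflex_count(tokens):
    """Lexemes sorted by number of distinct attested reflexes, most variable first."""
    by_lexeme = _group(tokens, 1, 2)
    return sorted(by_lexeme, key=lambda lex: (-len(by_lexeme[lex]), lex))


def unique_carriers(tokens, site):
    """Tokens that are the only carrier of their reflex at the given site."""
    at_site = [t for t in tokens if t[0] == site]
    counts = Counter(reflex for _, _, reflex in at_site)
    return [(lexeme, reflex) for _, lexeme, reflex in at_site if counts[reflex] == 1]
```

With these toy records, site_profile(TOKENS, "Site A") reports roughly 66.7% /ăr/ and 33.3% /ră/ and flags "lexeme 2" as having two reflexes at that site, and unique_carriers(TOKENS, "Site A") returns the single /ră/ token.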


