Early results from a birchbark letter proof-of-concept
The frustrating thing about attempting a proof-of-concept for the birchbark letters is that many of the calculations I find most interesting need to be run over the entire corpus-- which means a lot more data preparation than I currently have finished. The sample "names in context" (NIC) XML only has entries for 42 documents, so there's a lot of work yet to be done.
Characterizing the context for different kinds of names
I've written the XSLT to do a number of calculations-- how often are women mentioned in the context of some kind of financial transaction? When women are mentioned in a financial context, are they more likely to be referred to by a derivative of their husband's name than when they are mentioned in other contexts? Are there contexts where certain kinds of names-- "church" names, non-Slavic names, female names-- appear with particular frequency, or not at all? Unfortunately, the output is meaningless until the full NIC XML is ready.
Building charts and graphs
I can see writing code to enable the user to generate-- and, more importantly, compare-- charts and graphs like those found on p. 26-33 of Zaliznjak 2004, but with greater detail. Within a given date range, is there a statistically significant difference between how frequently the jer merger, or cokan'e, occurs when the writer has a non-Slavic name, vs. writers with Slavic names? It didn't seem worth doing for this particular proof-of-concept, though; the only data set I could draw on for this would be the index of all the documents and their dates, and a chart showing how many documents were found in each 50-year time span or what-have-you doesn't seem worth the effort.
Actual results: names and genders
For some usable-- if not the most exciting-- results, I've turned to the fully-prepared index of names, which does allow for certain kinds of calculations. Keep in mind, all of these below refer to unique names, not instances of use ("Khrestina" appears 5 times, but is counted once for these calculations; I don't have the NIC data ready yet to account for multiple instances.)
- 10.7% of the names with an identified gender are female (87 names).
- 35.6% of the women's names that are used are derived from a male name, presumably their husband (31 names). There's an opportunity here to go fishing for the equivalent male names from the same time period and location, but my initial guess at reconstructing the male names suggests that there's no attestation-- contemporary with the woman's name or not-- of the male name in about 25% of the cases.
- 12% of all names have no identified gender (110 names). One of my goals is to assign probable genders to as many of these as possible, after establishing the contexts where men and women are more likely to appear.
Actual results: Boris
When deciding how to build the sample NIC data set, I chose to include all documents that reference the name Boris. I wanted to see how I could organize the available data to facilitate the task of determining how many Borises are represented in the birchbark letter corpus, and my initial pass at 12th century documents surfaced 5 documents mentioning "Boris".
What information is useful to help differentiate one Boris from another? I've listed each document where the name occurs, and sorted them using the earliest conditional date proposed. I've included the location where the document was found, but to put that information into the right context, it's important to know Boris's role in the document. Particularly where he's a sender or 3rd party, documents referring to the same Boris could easily be found in different locations. The value of the additional information concerning Boris's role varies; one could argue that whether he owes money or not is fairly inconsequential, but information indicating that the reference is to Saint Boris, or that the Boris in question has recently died are crucial the task at hand.
- Борисъ (m)
- 906
- 1075 - 1100
- 3p: Троицк, Е
- personal: religious
- 742
- 1100 - 1120
- to: Троицк, К
- financial: owes
- orders: receiving
- 237
- 1160 - 1180
- from: Нерев, И
- personal: complaint
- 806
- 1160 - 1180
- 3p: Троицк, Е
- 581
- 1180 - 1200
- to: Троцк, Е
- financial: gen
- 671
- 1180 - 1200
- 3p: Троцк, Г
- financial: gen
- 819
- 1180 - 1200
- to: Троцк, Е
- financial: owed
- orders: receiving
- 935
- 1180 - 1200
- 3p: Троицк, Е
- financial: gen
- 714
- 1200 - 1220
- from: Троицк, К
- personal: gen
- 343
- Борисъ Милославовъ
- 1280 - 1300
- 3p: Нерев, Д
- 263
- Борисъ Пѧнтелѣѥвъ
- 1360 - 1380
- 3p: Нерев, Е
- financial: owes
- 579
- 1360 - 1380
- from: Нутн
- orders: giving
- 701
- Борисъ Петаревъ
- 1360 - 1380
- from: Троицк, П
- orders: giving
- 744
- 1360 - 1380
- 3p: Федоровск
- 43
- 1380 - 1400
- from: Нерев, А
- orders: giving
- 49
- 1410 - 1420
- 3p: Нерев, Г
- personal: death
- 906
So how many Borises are there? Probably somewhere between 10 and 12.
- Saint Boris - BBL 906
- Boris of the early 12th century - BBL 742; 40-60 years between this Boris reference and the next one make it unlikely that it's the same Boris
- Boris(es) of the late 12th century - BBL 237, 806, 581, 671, 819, 935 all range from 1160-1200. Two letters to a Boris (581, 819) and two where Boris was a third party (806, 935) were found in Troick. E, which may be a reason to connect those with a single individual. Other than BBL 237, from Nerev., all the other documents from this time period are from Troick.
- Boris of the turn of the 13th century - BBL 935 could plausibly be grouped in with the above if positioned on the earlier side of its 1200-1220 date range.
- Boris Miloslavov - BBL 343, 1280-1300
- Boris Pjantelejev - BBL 263, 1360-1380
- Boris Petarev - BBL 701, 1360-1380
- Boris(es) of the late 14th century - BBL 579 (Nutn., from), 744 (Feodorovsk., 3rd person); could be any of the two Borises above, the one below, or someone else.
- Boris and Nastas'ja - BBL 43 (1380-1400) and BBL 49 (1410-1420); BBL 43 features him giving orders to his wife, and BBL 49 is his wife complaining about his death.
The next step-- assuming we had a full data set-- would be to look at the names that co-occur in documents, to try to build up a "social network". (The results may also help fine-tune our identification of individual Borises.) Some kind of "point system" would likely be involved to weight the connections between people, say, 5 points between a writer and an addressee, 3 points between the writer/addressee and each of the third parties in a document, and 1 point between each of the 3rd parties. I haven't by any means worked out the details, and am enthusiastically open to ideas and suggestions for how to handle that part, but there's a lot of data left to be entered before the project reaches that point.