Bulgarian Dialect Atlas

This project built upon the word lists in the Bulgarian Dialect Atlas (BDA) by providing more accessible interfaces to the data. The words were transcribed and marked up in XML to indicate lexeme, reflex, and place of stress (where applicable). Each site was listed with its associated words and geographic coordinates. XSLT stylesheets drew on this markup and metadata to generate views of the data that had not previously been possible. Each site had its own profile, which showed what percentage of the tokens had each reflex, listed all tokens, and noted tokens of the same lexeme that had different reflexes. The profile for each reflex showed what percentage of sites had that reflex, which reflexes co-occurred with it, and which lexemes had both the given reflex and a different reflex within a single site. One of the lexeme views sorted the lexemes by the number of reflexes attested for each, which provided insight into the lexical diffusion of reflexes. The token view identified where a token was the unique carrier of its reflex. Dynamically generated maps were provided for most views, using color-coded location markers that captured the nuances of the data better than those found in the printed atlas.
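The site profile described above can be illustrated with a minimal sketch. The project itself used XSLT stylesheets; the Python below is only an approximation of the same idea, not the project's implementation. The element and attribute names (`atlas`, `site`, `token`, `lexeme`, `reflex`), the site names, and the reflex labels `rV` and `Vr` are all hypothetical placeholders, and the sample data is invented for illustration rather than taken from the BDA.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Invented sample data; the schema is an assumption, not the project's actual markup.
SAMPLE = """
<atlas>
  <site name="SiteA" lat="42.0" lon="24.0">
    <token lexeme="L1" reflex="rV">form1</token>
    <token lexeme="L1" reflex="Vr">form2</token>
    <token lexeme="L2" reflex="rV">form3</token>
    <token lexeme="L3" reflex="rV">form4</token>
  </site>
  <site name="SiteB" lat="43.0" lon="25.0">
    <token lexeme="L1" reflex="Vr">form5</token>
    <token lexeme="L2" reflex="Vr">form6</token>
  </site>
</atlas>
"""


def site_profile(site):
    """Summarize one <site>: reflex percentages and lexemes with mixed reflexes."""
    tokens = site.findall("token")
    total = len(tokens)

    # Share of the site's tokens that carry each reflex.
    counts = Counter(t.get("reflex") for t in tokens)
    percentages = {r: 100.0 * n / total for r, n in counts.items()} if total else {}

    # Lexemes attested with more than one reflex at this site.
    reflexes_by_lexeme = defaultdict(set)
    for t in tokens:
        reflexes_by_lexeme[t.get("lexeme")].add(t.get("reflex"))
    mixed = sorted(lex for lex, refl in reflexes_by_lexeme.items() if len(refl) > 1)

    return {
        "tokens": [t.text for t in tokens],
        "percentages": percentages,
        "mixed_lexemes": mixed,
    }


def build_profiles(xml_text):
    """Return a profile for every site in the document, keyed by site name."""
    root = ET.fromstring(xml_text)
    return {s.get("name"): site_profile(s) for s in root.findall("site")}


profiles = build_profiles(SAMPLE)
for name, profile in profiles.items():
    print(name, profile["percentages"], profile["mixed_lexemes"])
```

In this invented sample, the hypothetical lexeme `L1` is reported as mixed at `SiteA` because it occurs there with both placeholder reflexes, which is the kind of within-site variation the site profiles were meant to surface.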

In principle, these views allowed for an extremely detailed micro-analysis of the dynamics of lexical diffusion involved in the development of Bulgarian CrC reflexes, while also providing macro-analytic tools to facilitate the identification of larger-scale trends in the data. The tool was intended to help identify locally divergent geographical points that might merit more in-depth research. The ability to compare CrC reflexes in different environments was meant to make it more feasible not only to track regional variation in the specific tokens attested in the BDA but also, where multiple reflexes were found, to characterize how each reflex functioned within the overall grammatical structure of a given dialect.

Ultimately, the project could not continue because of deficiencies in the BDA source data. There was no common questionnaire, so each site had a haphazard assortment of tokens, and this inconsistency made it impossible to draw meaningful conclusions across the corpus.