Google Ngrams and religion
Google Ngrams has been the death of my productivity since Friday morning. Google Ngrams are the new word clouds, but more exciting because the output is a chart-- and, as I discovered when I did a mock scientific analysis of university library graffiti, people take charts seriously. It's a first step towards a potentially insightful analysis, but by itself it's not much more than eye candy.
That said, I've found some great eye candy.
It's important to not forget that the corpus is books. As I was using it, I found myself thinking of it as if it were the entirety of Google's "corpus"-- all scanned books + the entire internet. I tried pirate vs ninja vs zombie (1900-2008); I tried World Wide Web vs the Web vs information superhighway (1990-2008); I tried comparing a variety of web standards and programming languages (1992-2008); on a whim, I tried salad bar vs. wine bar vs. sushi bar (1965-2008), and the less-healthy X-and-Y food combos, like "bacon and eggs" (1950-2008). But I began to wonder if these popular culture and technology topics are optimal for a cross-linguistic corpus of books spanning a couple of centuries. A topic that spans both time and language would be ideal.
I decided on religion.
(If you want to jump to a particular section, you can choose from soul/faith/church, New vs. Old Testament, Heavenly Virtues, Deadly Sins, beyond Christianity, of nomenclature and spelling, or other religious texts).
Soul, faith, church
I checked out the words "soul" (blue), "faith" (red) and "church" (green), from 1800-2008 in English and Russian, and 1900-2008 in Spanish, French, and German. A few conclusions:
- In English, "church" was used significantly more than "soul" or "faith" in the early 19th century, but by 1860 it was only slightly more common than "faith" and "soul". "Soul" was more common than "faith" from approximately 1890-1940. Since ~2005, "soul" has regained some ground, and they're now very close-- and on the rise.
- In Russian, "church" dominated "soul" (which, itself, dominated "faith") until the Russian Revolution*. During the Soviet Union, "church" and "soul" were neck-and-neck, until the fall of communism when "church" shot ahead again. Since 2000, "church" has been declining more rapidly than "faith" and "soul", which have largely leveled off.
- In Spanish, at the start of the 20th century, "soul" shot past "faith". "Soul" began plummeting in the mid-1960's, bottoming out significantly lower than "faith" (but still higher than "church", which has been consistently lower during the whole 20th century) in the late 1980's. As of about 2005, "faith" and "soul" are about even.
- In French, like in Spanish, "church" has always been lower than "soul" or "faith" (though it's had more ups and downs over the 20th century). "Soul" was more common until about 1970, with peaks around 1925 and 1945. Since 1970, "soul" and "faith" have been going back and forth in popularity.
- In German, "soul" had two distinct peaks in the early 1920's and mid-1940's. But other than during the first peak of "soul", "church" has always dominated. "Church" and "faith" shared the post-WWII peak with "soul", and after "soul" began to decline in the late 1960's, it's been almost as infrequent as "faith". "Church" had another peak in the mid 1990's, but has been declining since.
Old Testament vs. New Testament
In these corpora from countries with traditions of Christianity, is the Old Testament (blue) or the New Testament (red) referenced more often?
- English: In the 19th century, the New Testament was referenced much more often than the Old Testament. As the absolute number of references started to decline in the late 19th century, the discrepancy between the number of references to the two Testaments decreased. From about 1950 to the late 1970's, the number of references was almost identical. Since then, the number of New Testament references has been increasing more quickly than the number of Old Testament references.
- Russian: References to the New Testament have always been more numerous than references to the Old Testament-- though not always by much. When communism fell, the number of New Testament references increased much faster than references to the Old Testament; the absolute number of references has been declining since 2000, and the gap is narrowing.
- Spanish: Unlike in the English, German, and Russian corpora, where the absolute number of references to the Testaments has risen and fallen at various points, the trend has consistently been upward for both since about 1905. What's more, there has never been much of a discrepancy between the two, except perhaps from around 1975-1990.
- French: As in Spanish, there's a general upward trend and the two Testaments are fairly close to each other. The New Testament pulled ahead of the Old Testament in 1930-1945 and 1980-1995.
- German: The absolute numbers of references are more like the English and Russian corpora, with highs and lows throughout the 20th century. However, the New Testament and Old Testament have largely the same ups and downs, with references to the New Testament being significantly higher except towards the end of WWII.
Heavenly virtues
The early 19th century was a good time for talking about virtues. The absolute number of references has been declining steadily since the 1840's, and while it bottomed out around 2000, virtues have been making a comeback since around 2002.
- Chastity has always been very unpopular.
- Temperance has been almost as unpopular as chastity, except during the temperance movement in the late 19th century and during Prohibition.
- Diligence has been about as unpopular as chastity and temperance since the 1940's, although it has been referred to more commonly since 2003-- perhaps due to the phrase "due diligence"?
- Humility matched the decline of diligence from the 1840's to the 1920's, but has remained largely steady since then, instead of declining to the level of chastity and temperance.
- Kindness was the most commonly referenced virtue until about 1920. By 1940, it settled firmly into third place. In mid-2006, it overtook charity.
- Charity always played second fiddle to kindness (even during a spike in the 1870's), but from the 1940's to the 1960's it was tied with patience for first place. Since then, it has trailed only slightly behind patience.
- Patience was solidly #3 for a long time, but by 1940 it was sharing first place with charity, and has pulled ahead slightly since the 1960's.
Deadly sins
The seven deadly sins have, overall, experienced a pattern similar to that of the heavenly virtues, albeit with a smaller post-2002 comeback.
- Gluttony has never been referenced much.
- Sloth was a greater concern from 1800-1830, before declining to a gluttony-like level of unpopularity.
- Until about 1860, greed was the least referenced deadly sin. Since 1880, it's been #5-- noticeably more referenced than sloth and gluttony.
- Lust has been a pretty middle-of-the-road sin, staying fairly steady since about 1880.
- Envy was reliably #3 before converging with wrath around 1940. It pulled ahead around the 1970's, but has experienced a smaller post-2000 bump than wrath.
- Wrath was #2 until the 1940's, fell behind envy in the 1970's, and is making one of the strongest post-2000 comebacks.
- Pride is quite literally off the chart. (If you want a chart that includes pride, here it is.) Its comeback started earlier, in 2000, even though phrases that commonly come to mind involving pride (American pride, national pride, ethnic pride) have been steady or declining since then.
Beyond Christianity
Even in corpora from traditionally Christian countries, there's plenty to be said about religions other than Christianity.
- Hinduism: From 1800-1840, both Hinduism and Buddhism were referenced very rarely. While Buddhism took off, Hinduism has been increasing slowly but steadily, and has lagged behind the other religions.
- Buddhism: Interest in Buddhism developed steadily from around 1835 onward, but after the publication of Edwin Arnold's book The Light of Asia (a poetic depiction of the life of the Buddha) in 1879, references to Buddhism surpassed references to Islam until the late 1910's. From the late 1950's until the mid-1980's, Buddhism was referenced more often than Judaism.
- Judaism: Until 1900, it was generally the most commonly-referenced non-Christian religion. Since then, it's been going back-and-forth with Buddhism. Since 1984, it's been ahead.
- Islam: Other than a period of 30 years when Buddhism was (at times, only barely) ahead, Islam has always been the most referenced major non-Christian religion. Since the 1950's, it's been far ahead of all the others.
Of nomenclature and spelling
Transliteration is a tricky business-- particularly, it seems, from Arabic. There have been two major spelling variations for the holy book of Islam (note: I did try a number of other ones, but their results were fairly negligible):
This pales in comparison to the variations in terminology (and spelling) for adherents of that faith; the standard term used today only had the majority of references after 1940:
Today's spelling for someone who follows the predominant religious tradition of South Asia only began to dominate in the 1870's; before that, you're more likely to see references to Hindoos:
Karma, Dharma and Nirvana provide a good example of how capitalization can have a huge effect on the general shape of the graph:
Most striking here is karma, which has increasingly been used in a more general way, not particularly tied to Buddhist doctrine:
Other religious texts
Let me conclude with the frequency of references over time for three religious texts that are dwarfed by the Upanishads, let alone the Koran and Bible: the Kama Sutra, the Tao Te Ching, and the Mahayana Sutras:
All three have generally increased over time, but the periods of particularly sharp increase for the Kama Sutra align with periods of expanding sexual freedom.
* One other interesting effect of the Russian Revolution: "god" dominated "God" for much of the 20th century. Compare to the English equivalent.