Investigating the Ancient Meroitic Language Using Statistical Natural Language Techniques: Zipf’s Law and Word Co-Occurrences Reginald Smith August 10,


1 Investigating the Ancient Meroitic Language Using Statistical Natural Language Techniques: Zipf’s Law and Word Co-Occurrences Reginald Smith August 10, 2006 Sudan Studies Association Conference Rhode Island College

2 Meroitic is the language of the ancient kingdom of Kush
Used for almost six hundred years, from the 2nd century BCE to the 4th century CE
Phonetic language written right to left (like Arabic)
Transliteration made possible by the work of British archaeologist F. L. Griffith around 1910

3 Meroitic remains largely undeciphered and an enigma
No complete vocabulary is available
Some words, such as place names, loan words, or simple concepts, are known
–For example, “qore” means king
–Perhaps “qes” is Kush
Many attempts have been made to understand Meroitic using phonology or comparative linguistics
–Scholars have tried in vain to find a known language that is a relative (see sources in paper)
–We wish we had a bilingual text like the Rosetta Stone to guide us

4 A new method could use mathematics and linguistics
Statistical natural language processing analyzes the properties of language using a mix of statistics and linguistics
Several statistical properties are the same in all human languages
Certain techniques may also help us infer the meanings of words (by relating them to other known words)

5 Zipf’s Law: Frequencies of Words
If you rank the words in a text by how frequent they are (# of times a word appears, #1 being the most frequent) and then relate each word’s rank to its frequency, you get Zipf’s Law
Zipf’s Law: F = C / R^α, where F is the frequency of a word, C is a constant, R is the rank, and α is known as the power law exponent
For all languages α ≈ 1

6 Zipf Law Graphs
When you graph the frequency vs. the rank on a log-log graph (graphing the logarithm of frequency vs. the logarithm of rank) you get a straight line whose slope is −α
[Figure: Zipf line fit on data. The red line is the fitted slope on the data points. Picture source: University of Helsinki CS department]
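The log-log fit described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for the talk: the function names `rank_frequency` and `fit_loglog_slope` are made up here, the fit is an ordinary least-squares line, and the check uses synthetic frequencies rather than Meroitic data.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return word frequencies sorted from most to least frequent (rank 1 first)."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)

def fit_loglog_slope(frequencies):
    """Least-squares slope of log10(frequency) against log10(rank).

    Needs at least two ranks; Zipf's Law predicts a slope near -alpha.
    """
    xs = [math.log10(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log10(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic check (not real data): frequencies that follow F = 1000 / R exactly.
ideal = [1000 / rank for rank in range(1, 51)]
slope = fit_loglog_slope(ideal)
print(round(slope, 3))
```

For frequencies that obey F = C / R exactly, the fitted slope comes out at -1 up to floating-point error, which is the α ≈ 1 case.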

7 Does Meroitic follow Zipf’s Law?
The two graphs below show log-log plots of frequency vs. rank for the Meroitic words in 69 texts. The slope is shown for each
–The normal plot counts the words as is. The morpheme out plot splits out suffixes like –lowi as the separate words “lo” and “wi”
–Since it has a slope of nearly -1, the morpheme out model of Meroitic seems to follow Zipf’s Law
[Figure: Normal plot, slope = -0.81. Morpheme out plot, slope = -1.03]
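The difference between the two counting schemes can be illustrated as follows. This is a rough sketch under one reading of the slide: a word ending in “lowi” is counted as its stem plus the separate words “lo” and “wi”. The names `SUFFIX_SPLITS`, `split_morphemes`, and `count_words` are hypothetical, and the tokens are placeholders, not real Meroitic words.

```python
from collections import Counter

# Assumption: each compound suffix maps to the pieces counted as separate words.
SUFFIX_SPLITS = {"lowi": ["lo", "wi"]}

def split_morphemes(token, suffix_splits=SUFFIX_SPLITS):
    """Split a token into stem plus suffix pieces if it ends in a listed suffix."""
    for suffix, pieces in suffix_splits.items():
        if token.endswith(suffix):
            stem = token[: -len(suffix)]
            return ([stem] if stem else []) + list(pieces)
    return [token]

def count_words(tokens, split=False):
    """Count words as is ("normal") or after splitting suffixes ("morpheme out")."""
    counts = Counter()
    for token in tokens:
        counts.update(split_morphemes(token) if split else [token])
    return counts

# Placeholder tokens, NOT real Meroitic words.
tokens = ["stemAlowi", "stemBlowi", "stemA"]
normal = count_words(tokens)
morpheme_out = count_words(tokens, split=True)
print(normal)
print(morpheme_out)
```

Splitting merges inflected forms back onto their stems and adds the suffix pieces as frequent words of their own, which changes the shape of the rank-frequency curve.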

8 So what does this show us (besides graphs)?
Despite the apparently small number of texts available, our sample of Meroitic is structured just like all other human languages (English, Chinese, etc.)
Therefore, even though we don’t know the meaning of the words, we know that the sample of the language we have is representative
–Even though most of our samples are redundant funeral stelae
We can then proceed to use other statistical techniques on Meroitic and also compare its statistical features to other languages

9 Step Two: Word Co-occurrence
When words occur together in a text, they are said to co-occur
–“I am here” has co-occurrence between “I-am” and “am-here”
Co-occurrences can tell us about the words if we have enough of them
–Words that co-occur with the same words often have similar parts of speech or even meanings
–Can we use word co-occurrence in Meroitic to analyze classes of words?
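The “I am here” example can be written out as code. This is a minimal sketch that assumes co-occurrence means adjacency within a whitespace-separated sentence, as in the slide’s example; the function names are illustrative.

```python
from collections import Counter

def adjacent_pairs(sentence):
    """Return the ordered pairs of neighboring words in a sentence."""
    words = sentence.split()
    return list(zip(words, words[1:]))

def cooccurrence_counts(sentences):
    """Count how often each adjacent word pair appears across sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(adjacent_pairs(sentence))
    return counts

pairs = adjacent_pairs("I am here")
print(pairs)
```

Here `pairs` holds the two co-occurrences named on the slide: (“I”, “am”) and (“am”, “here”).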

10 What I did with Meroitic
I analyzed Meroitic by matching together words that co-occurred with the same types of words
For example, if you have two sentences: “I eat horses” and “We eat lizards”
–I match “I” and “We” because they both co-occur with “eat”
–I also match “horses” and “lizards” because they also co-occur with “eat” (in the opposite direction*)
I then graph connected words together and analyze them with software
–What happens?
*Technical note: I actually used undirected edges for co-occurring words in the graph shown on the next page
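The matching step in the “eat” example can be sketched as below. This is an illustration, not the software used for the talk: it keeps track of which side of the shared word each neighbor appears on, as in the slide’s example, whereas the technical note says the actual graph used undirected edges. The function name `match_by_shared_neighbor` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def match_by_shared_neighbor(sentences):
    """Pair up words that appear on the same side of the same neighboring word."""
    contexts = defaultdict(set)
    for sentence in sentences:
        words = sentence.split()
        for left, right in zip(words, words[1:]):
            contexts[(right, "before")].add(left)   # words seen just before `right`
            contexts[(left, "after")].add(right)    # words seen just after `left`
    matches = set()
    for group in contexts.values():
        # Every pair of words sharing a context is a match (an edge in the graph).
        for a, b in combinations(sorted(group), 2):
            matches.add((a, b))
    return matches

matches = match_by_shared_neighbor(["I eat horses", "We eat lizards"])
print(sorted(matches))
```

On the two example sentences this yields exactly the two matches from the slide: “I” with “We”, and “horses” with “lizards”. The resulting pairs can be treated as graph edges and handed to clustering or graph-layout software.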

11 Meroitic Words Graph
Four main groups of words form that correspond well to Meroitic categories, including positions and titles, verbs, places, and miscellaneous nouns
[Figure: word graph with clusters labeled Group 1, Group 2, Group 3, and Group 4]

12 Results
Techniques like word co-occurrence matching can help us categorize Meroitic words whose categories we previously guessed at, by mapping them against words whose part of speech we already know
Similar statistical techniques may allow us to match words with a similar “meaning” to infer the meanings of some words
–This is still speculative though

13 Conclusion
Statistical natural language processing is a new approach to Meroitic that could supplement other current efforts in the language
Much more work remains to be done, but this new avenue may help us move closer to the goal of understanding this beautiful and mysterious language
Acknowledgements: I give my boundless appreciation to Dr. Richard Lobban and Dr. Laurance Doyle for the help and advice they gave me on this paper’s topics

