The Markov Model of Indus Script

There is a school of thought which believes that the Harappan seals convey something linguistic; after all, the Harappans had extensive trade contacts with literate Mesopotamia. Thus it is possible that these symbols, found on seals, pottery, and terracotta tablets, convey data regarding the owner or the origin of the consignment. Then there is another school which believes that, yes, the seals had some meaning, but definitely not a linguistic one; maybe they were political or religious symbols.

As this battle continues, much like the one over the Aryan homeland, a new paper has been published that analyzes the sequential dependencies between the symbols. In English, we know that the letter “s” is more likely to be followed by “e”, “o”, or “u” than by “x” or “z”. Similarly, in the Harappan seals it was found that, given a symbol, only a subset of the symbols could follow it. Such order can arise only if there are rules governing placement.
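To make the idea of "which symbol can follow which" concrete, here is a minimal sketch of a pairwise (first-order) Markov model in Python. It is only an illustration, not the authors' code: the sign labels "A" to "D", the toy corpus, and the function name `bigram_model` are all made up for this example.

```python
from collections import Counter, defaultdict

START = "<START>"  # hypothetical marker for the beginning of an inscription


def bigram_model(sequences):
    """Estimate P(next sign | current sign) from pairwise counts."""
    counts = defaultdict(Counter)
    for seq in sequences:
        prev = START
        for sign in seq:
            counts[prev][sign] += 1
            prev = sign
    # Turn raw counts into conditional probabilities.
    return {
        prev: {s: c / sum(followers.values()) for s, c in followers.items()}
        for prev, followers in counts.items()
    }


# Toy corpus with invented sign labels; real inscriptions are not used here.
inscriptions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "C", "D"],
    ["B", "C", "D"],
]

model = bigram_model(inscriptions)
print(model[START])  # which signs tend to open an inscription
print(model["A"])    # which signs tend to follow "A"
```

In this toy corpus "A" opens three of the four inscriptions and is never followed by "D", which is the kind of pattern the paper reports for the real signs: some signs favor the initial position, and each sign admits only a subset of successors.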

To test whether this model could predict missing or illegible symbols on a damaged seal, a known data set was intentionally damaged. The model predicted the missing symbol with 74% accuracy. An analysis of Harappan seals found in Mesopotamia and West Asia also found that they followed a different encoding; maybe they represent different subject matter.
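The restoration test can be sketched in the same spirit. The snippet below assumes a pairwise model and fills a single gap by picking the sign that fits best between its left and right neighbours. The corpus, the sign labels, and the helper names `train`, `prob`, and `restore` are hypothetical, and the toy data is far too small to say anything about the 74% figure.

```python
from collections import Counter, defaultdict

START, END = "<START>", "<END>"  # hypothetical boundary markers


def train(sequences):
    """Count how often each sign (or boundary) is followed by each other one."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = [START, *seq, END]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def prob(counts, prev, nxt):
    """Unsmoothed estimate of P(nxt | prev); 0.0 if prev was never seen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[nxt] / total if total else 0.0


def restore(counts, seq, gap):
    """Guess the sign at index `gap` from its left and right neighbours."""
    padded = [START, *seq, END]
    left, right = padded[gap], padded[gap + 2]
    candidates = {s for followers in counts.values() for s in followers} - {END}
    # Score each candidate by P(candidate | left) * P(right | candidate).
    return max(
        sorted(candidates),
        key=lambda s: prob(counts, left, s) * prob(counts, s, right),
    )


# Toy corpus with invented sign labels.
inscriptions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "B", "D"],
    ["B", "C", "D"],
]
counts = train(inscriptions)

damaged = ["A", None, "D"]  # the middle sign is "illegible"
print(restore(counts, damaged, 1))  # prints B
```

To measure accuracy the way the post describes, one would hide one sign at a time from inscriptions held out of training and count how often the guess matches the original. A real experiment would also need smoothing for sign pairs never seen in training, which this sketch leaves out.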

Our results appear to favor the hypothesis that the Indus script represents a linguistic writing system. Our Markov analysis of sign sequences, although restricted to pairwise statistics, makes it clear that the signs do not occur in a random manner within inscriptions but appear to follow certain rules: (i) some signs have a high probability of occurring at the beginning of inscriptions whereas others almost never occur at the beginning; and (ii) for any particular sign, there are signs that have a high probability of occurring after that sign and other signs that have negligible probability of occurring after the same sign. Furthermore, signs appear to fall into functional classes in terms of their position within an Indus text, where a particular sign can be replaced by another sign in its equivalence class. Such rich syntactic structure is hard to reconcile with a nonlinguistic system. Additionally, our finding that the script may have been versatile enough to represent different subject matter in West Asia argues against the claim that the script merely represents religious or political symbols [A Markov model of the Indus script]

Now it turns out that Soviets and Finns had done such studies in the 60s and reached the same conclusion: there is a positional order in Indus symbols. What’s unique about this new study is that it uses the Markov model for the first time.

This paper does not decipher the script, but it is work that will hopefully lead to an acceptable decipherment. The word “acceptable” is used because there are many decipherments right now, but none with scholarly consensus. What the paper does suggest is that the symbols most likely encode a linguistic system and are not religious or political symbols.

This work, like the previous one, has received extensive media coverage, with even Time writing about it. But anything connected to the Indus is controversial and this paper is no different: first the authors were accused of being Tamil/Dravidian nationalists, and once that was found to be incorrect, the complaint shifted to the deteriorating editorial standards of various journals. In response Prof. Dilip K Chakrabarti wrote, “There is a conscious attempt in certain quarters to disassociate this civilisation from the later mainstream tradition of Indian/ Vedic culture.”

Historically, the beginning of this attempt can be traced to the period around India’s Independence when Mortimer Wheeler proposed that the impetus for this civilisation came from Mesopotamia. Earlier, when India was a jewel in the British crown, there was no compulsion to depict it as an offshoot of Mesopotamian or other contemporary civilisations. The early excavators had no problem hypothesising that this civilisation was deeply rooted in the Indian soil and that many of its features could be explained with reference to the later Indian civilisation. [From Indus to India]

Besides this political sideshow, there is a serious question: is this method sufficient to show that the Indus seals represent a linguistic system? Can such statistical studies prove or disprove that the symbols represent a language? The answer depends on whom you ask.

On the other hand, if you believe that the symbols are non-linguistic, there is another question: why would the Harappans send non-linguistic symbols on seals, made from hard-to-work materials and arranged with a certain syntax, to their trading partners to the West and North? What was the relevance, and what non-linguistic information did they convey to someone in Mesopotamia?

    Very nice and interesting post. Thanks.

    Fascinating! Thanks for all the link references. Funny T-shirt, I gotta get one of those…

    One would think deciphering the Indus script is the most interesting problem in Indian Archeology. But I don’t think enough resources especially govt/academic resources are being channeled in this direction.

    The later part of your post points to vested interests at work. A shame indeed.

      Balaji, there have been some Indian decipherments also, but none of them have generated the excitement the new ones have. These new researchers are media savvy and that helps.

    “Can such statistical studies prove or disprove that the symbols represent a language?”

    The cited paper offers empirical evidence (based on Mahadevan’s data of 1977 vintage) for patterns in the sequence of symbols, and rigorous empirical studies are statistical, what else? And, I suppose the reviewers of the paper at the Proceedings of the National Academy of Sciences would have made sure that the modeling was grounded in theory, well-specified, and the authors have interpreted the results conservatively.

    Whether the statistically significant patterns that the authors found in the inscriptions reflect an underlying language or not, is the job of a linguist. I am not one, but it appears from the references in the paper that Hidden Markov Models are quite legit in this kind of application.

    Why would the Harappans send non-linguistic symbols on seals to their trading partners in Mesopotamia? Perhaps to convey royal patronage or the position of the trader in the political/social hierarchy? My limited reading of the literature on the Harappan civilization, however, had me on the side of the linguistic interpretation of the symbols, and this paper adds to my confidence in the position. Thank you for pointing me to it.

    The rest of the stuff in the babllosphere is political drivel, as usual :(

      The Rational Fool,

      It seems there is another paper coming out soon, which is going to argue that such statistical methods are insufficient to prove that the symbols represent a language.

    Hidden Markov Models, eh? Interesting. I use Markov models a lot in my research — I’ll have to look into this.

      Hari, so here is the question to ponder: the Markov Model can prove that the symbols have a positional order. But can it take it from there to prove whether the system is linguistic or not?

    The Markov Model demonstrates that the Indus symbols have a more or less standardized position, relative to one another. The same can be said of a previous paper by the same authors (in Science magazine) using a somewhat different statistical test. However, language is not the only human-contrived set of symbols that has positional regularities. That’s the real point of the whole exercise, not the political/nationalist sideshow. For example, Farmer, Witzel, and Sproat (the defenders of the nonlinguistic theory), have produced the same sort of statistical “proof” that medieval heraldic symbols are “language.” That is, the statistics show positional regularities in these symbols, which we KNOW are NOT linguistic. The same can be said of the symbols used in Navaho sand paintings. These are not linguistic but are religious. Positional regularities by themselves demonstrate very little.

    Witzel has published a paper showing that the Dravidian loans in Sanskrit derive from a later time than other loans, most often from the Munda family. Thus, there is indirect evidence that the Indo-European branch of Indian languages, found mostly in the north, first encountered speakers of Munda languages, only later encountering Dravidian languages when they spread farther south. This proves nothing except that Dravidian may or may NOT be the language encoded in Indus script — if any language is indeed encoded there.

    Research on very early Egyptian seals suggests to some researchers a nonlinguistic function, essentially an elaborate anti-pilfering mechanism. These symbols have positional regularity, may or may not encode any linguistic information, may or may not identify an owner or a location of origin of the goods upon which they are found. The same might have been true of Indus seals and their “script.”

