Greenberg, J. H., “A Quantitative Approach to the Morphological Typology of Language,” International Journal of American Linguistics 26 (1960): 178-94.
Greenberg suggests that for a discipline to properly be considered “science”, it must move beyond mere description to comparison and classification. In linguistics this has been achieved through the respected discipline of “comparative linguistics” using the historical-genetic method to group languages into families with common ancestors. However, there is another method called “typological”, which (up to Greenberg’s day) was not so well received. Greenberg argues that the typological method has been damaged by confusion between the two, especially where typological criteria have been used to establish historical-genetic relationships.
The historical-genetic method classifies languages into families based on shared features, usually forms that correspond in both sound and meaning. If two languages agree in a considerable number of such forms (and the agreement is not due to borrowing), they can be traced to a common ancestor. However, one can also compare languages that cannot be said to have a common ancestor. For instance, all languages must express comparison, but the number of options for doing so is actually limited: a special comparative inflection of the adjective (English great-er), use of a preposition meaning “from” (Semitic), use of a verb meaning “surpass” (some African languages), etc. Some of these are more common than others, and their geographic distribution crosses genetic boundaries. Thus typological comparison also groups languages into “families”, but these are based only on the presence or absence of a given feature. Such groupings are arbitrary, however, and will change depending on which features are selected.
Some of these classification schemes are useless, but others can be very useful, such as the classic nineteenth-century division of languages into isolating, agglutinative, and inflecting. While the scheme itself has many weaknesses, it hit upon something of fundamental importance to language: the morphological structure of the word. Sapir took up this classification in his book Language and attempted to define its terms more carefully. He arrived at a basic model of linguistic structure: every language has a stock of roots with concrete meanings (e.g., table, eat) as well as forms that express abstract relations between the concrete terms (such as case endings). This opposition of concrete (I) and abstract (IV) marks the two ends of the scale. Between them are derivational elements (II), which modify the meaning of the root but do not relate syntactically to the rest of the sentence (such as the –er in farmer). Lastly, concrete-relational elements (III) primarily serve sentence syntax but also carry a concrete meaning. All languages use I and IV, but II and III are dispensable.
Sapir describes the technique by which concepts II, III, and IV (the concepts other than the concrete roots) are expressed as a) isolating (based on the significant order of elements: in John hit Bill, order marks John as subject and Bill as object), b) agglutinative (as in good + ness > goodness), c) fusional (as in deep + th > depth), or d) symbolic (internal changes such as drink/drank/drunk). Note that the contrast between agglutinative and fusional is itself somewhat superficial. In Sapir’s scheme the Semitic languages are labeled Complex Mixed-relational because all four types of concept are present. For concept II they use techniques d) and then c); for concept III, c) and then d); for IV, a), though this is only weakly developed.
Greenberg adapts Sapir’s model and develops quantitative measures for it. First, he introduces an index of synthesis, the ratio of morphemes to words (M/W) in a sample text. Since every word contains at least one morpheme, the lower limit is 1.00; the upper limit is theoretically unbounded but in practice seems to lie around 3.00. Analytic languages give low values, synthetic languages higher ones, and polysynthetic languages the highest of all.
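To make the arithmetic of the index of synthesis concrete, here is a minimal Python sketch. It assumes the analyst has already segmented each word by hand, marking morpheme boundaries with hyphens (a convention adopted here, not Greenberg’s), and the toy sentence is invented for illustration, not taken from his samples. The cut-off values in the hypothetical `rough_label` function are likewise illustrative assumptions.

```python
def segment(text: str) -> list[list[str]]:
    """Split a hand-segmented text into words, and each word into morphemes.

    Assumed convention: words are separated by whitespace and
    morpheme boundaries inside a word are marked with '-'.
    """
    return [word.split("-") for word in text.split()]


def synthesis_index(words: list[list[str]]) -> float:
    """Index of synthesis M/W: total morphemes divided by total words."""
    if not words:
        raise ValueError("the sample must contain at least one word")
    morphemes = sum(len(word) for word in words)
    return morphemes / len(words)


def rough_label(index: float) -> str:
    """Map M/W to a rough type; the cut-offs are illustrative assumptions."""
    if index < 2.0:
        return "analytic"
    if index < 3.0:
        return "synthetic"
    return "polysynthetic"


# Toy, hand-segmented sample (hypothetical, not Greenberg's data)
sample = segment("the farm-er walk-ed slow-ly")
m_over_w = synthesis_index(sample)  # 7 morphemes / 4 words
print(m_over_w, rough_label(m_over_w))  # 1.75 analytic
```

Because every word yields at least one morpheme, `synthesis_index` can never return less than 1.00, which is the lower limit noted above.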
Second, Greenberg measures agglutination by the ratio of agglutinative constructions to morph junctures (A/J). A construction is considered agglutinative when both of its morphs are automatic, which is to say that any allomorphic variation is predictable and regular. In leaves, for example, the plural /z/ is an automatic variant of /s/, but the stem alternant /leav/ is not predictable from the phonological environment (compare chiefs), so this juncture does not count as agglutinative; in cats, with an invariant stem and an automatic /s/, it does, so that word alone would yield a ratio of 1.00. A language with a high value for this index is agglutinative, one with a low value fusional.
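The index of agglutination can be sketched the same way. This minimal Python version assumes the hard linguistic work has already been done: for every morph juncture the analyst supplies a flag saying whether the construction is agglutinative (both morphs automatic). The `Word` class and its flags are hypothetical; the code only does the counting. The examples reuse goodness (agglutinative) and depth (fusional) from the discussion of Sapir’s techniques.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A hand-analyzed word: its morphs plus one flag per juncture."""
    morphs: list[str]
    # agglutinative[i] is True if the juncture between morphs[i] and
    # morphs[i + 1] is judged agglutinative (both morphs automatic).
    agglutinative: list[bool]

    def __post_init__(self) -> None:
        if len(self.agglutinative) != len(self.morphs) - 1:
            raise ValueError("expected exactly one flag per morph juncture")


def agglutination_index(words: list[Word]) -> float:
    """Index of agglutination A/J: agglutinative constructions per juncture."""
    junctures = sum(len(w.agglutinative) for w in words)
    if junctures == 0:
        raise ValueError("the sample contains no morph junctures")
    agglutinative = sum(sum(w.agglutinative) for w in words)
    return agglutinative / junctures


sample = [
    Word(["good", "ness"], [True]),  # goodness: agglutinative
    Word(["dep", "th"], [False]),    # depth: fusional (deep > dep-)
    Word(["the"], []),               # a single morph has no juncture
]
print(agglutination_index(sample))  # 0.5
```

Keeping the automaticity judgment outside the code is deliberate: deciding whether an alternation is predictable is exactly the part that requires a linguist.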
The third group of indices counts each type of morpheme (root, derivational, and inflectional) relative to the number of words. These reflect the basic contrast between Sapir’s categories I and IV. A language whose words can contain more than one root is a compounding language, an idea that Sapir nowhere mentions. The fourth group measures the number of prefixes and of suffixes per word. The last group measures how far word order, inflection, and concord (agreement in gender, number, etc.) are used to indicate syntactic relationships. Note that a morpheme such as the Latin case ending –um, which indicates case, number, and gender, is counted separately for each function it performs.
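The per-word counts behind the morpheme-type and affix indices can be sketched in Python as well. This is a minimal illustration under stated assumptions: each morph is tagged by hand as a root ("R"), derivational ("D"), or inflectional ("I") morpheme, and a non-root morph counts as a prefix if it precedes the word’s first root and as a suffix if it follows its last root (other placements are simply not counted). The tagging scheme and the `morph_indices` function are hypothetical, and the word-order, inflection, and concord measures are omitted because they require a syntactic analysis of the text.

```python
# A morph is (form, kind): kind is "R" (root), "D" (derivational),
# or "I" (inflectional). This tagging scheme is an assumption.
Morph = tuple[str, str]


def morph_indices(words: list[list[Morph]]) -> dict[str, float]:
    """Compute root, derivation, inflection, prefix and suffix counts per word."""
    if not words:
        raise ValueError("the sample must contain at least one word")
    counts = {"R": 0, "D": 0, "I": 0, "prefix": 0, "suffix": 0}
    for word in words:
        roots = [i for i, (_, kind) in enumerate(word) if kind == "R"]
        if not roots:
            raise ValueError(f"word without a root: {word}")
        first_root, last_root = roots[0], roots[-1]
        for i, (_, kind) in enumerate(word):
            counts[kind] += 1
            # Non-root morphs before the first root are prefixes,
            # those after the last root are suffixes.
            if kind != "R" and i < first_root:
                counts["prefix"] += 1
            elif kind != "R" and i > last_root:
                counts["suffix"] += 1
    n = len(words)
    return {
        "compounding (R/W)": counts["R"] / n,
        "derivation (D/W)": counts["D"] / n,
        "inflection (I/W)": counts["I"] / n,
        "prefixing (P/W)": counts["prefix"] / n,
        "suffixing (S/W)": counts["suffix"] / n,
    }


# Toy, hand-tagged sample (hypothetical)
sample = [
    [("un", "D"), ("kind", "R"), ("ness", "D")],  # unkindness
    [("black", "R"), ("bird", "R"), ("s", "I")],  # blackbirds: a compound
    [("farm", "R"), ("er", "D"), ("s", "I")],     # farmers
]
for name, value in morph_indices(sample).items():
    print(f"{name}: {value:.2f}")
```

For these three toy words R/W comes out at 4/3, because the compound blackbirds contributes two roots; that surplus over 1.00 is what marks compounding.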
Greenberg next discusses the problem of finding suitable definitions for concepts such as morph, morpheme, and root. Some sequences are clearly morphemes, others clearly are not, and some fall in between: is /deceive/ a single morpheme, or is it made up of /de/ + /ceive/? Greenberg proposes the model of a square: if a language has four meaningful sequences AC, BC, AD, and BD, then A, B, C, and D may each be considered a morpheme. An example is English eating : sleeping :: eats : sleeps, where A is eat-, B is sleep-, C is –ing, and D is –s.
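The formal half of the square test is easy to state in code. This minimal Python sketch assumes a lexicon of attested forms given as plain strings and simply checks whether all four combinations AC, BC, AD, and BD occur; whether each sequence is also meaningful in the required way is a judgment the code cannot make and is left to the analyst. The function name `forms_square` and the sample lexicon are hypothetical.

```python
def forms_square(a: str, b: str, c: str, d: str, lexicon: set[str]) -> bool:
    """Return True if AC, BC, AD and BD are all attested in the lexicon.

    This checks form only; the analyst must still confirm that the
    four sequences are meaningful, as the square requires.
    """
    return all(x + y in lexicon for x in (a, b) for y in (c, d))


lexicon = {"eating", "sleeping", "eats", "sleeps",
           "deceive", "receive", "detain", "retain"}

# The example from the text: eat-, sleep-, -ing, -s
print(forms_square("eat", "sleep", "ing", "s", lexicon))  # True
# A candidate analysis of deceive as de- + -ceive,
# tested against receive, detain and retain
print(forms_square("de", "re", "ceive", "tain", lexicon))  # True
```

Note that the purely formal check also succeeds for de- + -ceive, since receive, detain, and retain are all English words, so string matching alone cannot settle the borderline question the paragraph raises.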
Lastly, Greenberg calculates the indices for eight languages: Sanskrit, Anglo-Saxon, Persian, English, Yakut (related to Turkish, but with less borrowing from Arabic), Swahili, Annamite (Vietnamese), and Eskimo. Although the sample is admittedly small, his results seem to corroborate the usual non-quantitative descriptions of these languages as analytic, synthetic, or polysynthetic and as agglutinative or non-agglutinative.