Some Preliminary Results

April 6, 2010

My biggest fear for my dissertation was to get a year into the project and find that the data does not support my hypothesis at all. Thankfully, it seems like that will not be the case. After some preliminary research, everything seems to be lining up quite nicely.

As you may recall, I am interested in definiteness, information structure, and the particle את in Biblical Hebrew (BH). Largely following Christopher Lyons, I have described definiteness as a semantic and pragmatic category that has to do with identifiability. As I detailed here, identifiability can be considered a scalar category, which Ellen Prince has described with a four-part scale:

evoked > unused > inferrable > new

The identifiability of a referent can be encoded by the use of pronouns, proper nouns, and determiners, which can be arranged into an implicational hierarchy:

pronoun > proper noun > definite NP > indefinite NP

Identifiability especially interacts with grammar in the expression of subjects and objects, and the object marker את is an interesting example. Scholars have long noted that the particle את is used with objects that can be considered definite, but beyond this, its distribution has been perplexing. As I mentioned in the first post cited above, however, את is a very typical example of the phenomenon termed differential object marking (DOM). In DOM languages the frequency of marking is generally tied to the parameters of animacy and definiteness. Subjects tend to be high in animacy and definiteness, so the thinking is that object marking is motivated by the desire to distinguish the more subject-like objects.

There seems to be a general correlation of את-marking with the implicational hierarchy listed above. Pronouns are obligatorily marked in BH, and in his dissertation on valency, Michael Malessa found that proper nouns were marked 97% of the time. Further, while definite objects were marked 73% of the time, those with human and animate referents were marked 90% and 83% respectively, so there was an obvious effect of animacy on marking. Malessa, however, did not explore the impact of identifiability on marking.

In my preliminary study, I began by randomly pulling 650 finite verbs from the corpus of Standard Biblical Hebrew (SBH). From these, I whittled down a group of 291 simple monotransitive clauses (no compound objects). After removing the proper nouns, I had the following results:

Identifiability        Animacy     et    no et   total   proportion marked
evoked                 animate     25    1       26      0.96
evoked                 inanimate   41    7       49      0.84
unused or inferrable   animate     16    3       19      0.84
unused or inferrable   inanimate   57    33      91      0.63
Totals:                            139   44      185     0.75

The sample size is small at this point, but the results are in line with what I would expect. Objects that are most topic-worthy, being both animate and evoked, are marked 96% of the time. Objects that are high in animacy but low in identifiability (or vice versa) are marked about 84% of the time, while objects low in both identifiability and animacy are marked only 63% of the time. Overall, definite noun phrases were marked 75% of the time as objects, similar to Malessa’s finding of 73%.
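As a quick check, the rates quoted above can be recomputed from the table. This is only a minimal sketch: it assumes each rate is the et count divided by the total column, and the names `rows` and `marking_rate` are made up for the illustration.

```python
# Counts copied from the table: (identifiability, animacy, marked with et, total).
rows = [
    ("evoked", "animate", 25, 26),
    ("evoked", "inanimate", 41, 49),
    ("unused or inferrable", "animate", 16, 19),
    ("unused or inferrable", "inanimate", 57, 91),
]


def marking_rate(marked, total):
    """Proportion of objects carrying et, rounded to two places as in the table."""
    return round(marked / total, 2)


# One rate per combination of identifiability and animacy.
rates = {(ident, anim): marking_rate(marked, total)
         for ident, anim, marked, total in rows}

# Overall rate across all four groups.
overall = marking_rate(sum(r[2] for r in rows), sum(r[3] for r in rows))

for (ident, anim), rate in rates.items():
    print(f"{ident:22} {anim:10} {rate:.2f}")
print(f"{'overall':33} {overall:.2f}")
```

Laying the counts out this way should also make it easy to add rows as the sample grows.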

Now I need to scale up the data and investigate some of the other interesting relationships. I will also be investigating the more complex cases such as ditransitive verbs, compound objects, and the use of את with the subject of passives.


Additional Bibliography

Malessa, Michael. Untersuchungen Zur Verbalen Valenz Im Biblischen Hebräisch. Assen: Van Gorcum, 2006.

A definiteness scale

November 12, 2009

Consider the set of sentences in example 1:

1a. I went to see a doctor today.

1b. I went to see the doctor today.

1c. I went to see your doctor today.

1d. I went to see Dr. Brown today.

1e. I went to see that doctor today.

1f. I went to see her today.

All of the phrases following “see” share the same referent, but notice that there are many more choices for referring expressions beyond indefinite and definite. This is because these other types of expression also align with definiteness.

While the binary indefinite-definite opposition can be used to code the basic distinction between identifiable and unidentifiable referents, these other types of noun phrases can be used to code other points on the scale in the form of an implication hierarchy. Sticking with Prince’s identifiability scale, a general definiteness hierarchy can be proposed such as 2:

(2) evoked > unused > inferable > brand new
proper noun: unused
definite noun phrase: evoked, unused, or inferable
possessive phrase: inferable
indefinite noun phrase: brand new

On an implication hierarchy, if a form can be used for one point on the scale, this implies that it can also be used for any higher point on the scale. On the other hand, conversational implicature suggests that if there is a more informative grammatical form for a higher point on the scale, this will be used instead. This tends to limit a particular form to a particular range on the scale, though there can also be reasons for over- or under-specifying a discourse referent.
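The interplay of these two principles can be sketched in a few lines of Python. This is a toy illustration, not a claim about how any grammar works: the `FORMS` table is a simplified assignment of three forms to the lowest point each can reach, and the function names are my own.

```python
# Prince's scale from most to least identifiable, as in (2).
SCALE = ["evoked", "unused", "inferable", "brand new"]

# Simplified, illustrative assignment of each form to the lowest point it can code.
FORMS = {"pronoun": "evoked", "definite NP": "inferable", "indefinite NP": "brand new"}


def possible_points(lowest_point):
    """Implication: a form usable at `lowest_point` is usable at every higher point."""
    return SCALE[: SCALE.index(lowest_point) + 1]


def preferred_form(point, forms=FORMS):
    """Implicature: among the usable forms, prefer the most informative one,
    i.e. the form whose range covers the fewest points on the scale."""
    usable = [(len(possible_points(low)), name)
              for name, low in forms.items()
              if point in possible_points(low)]
    return min(usable)[1]


for point in ["evoked", "inferable", "brand new"]:
    print(point, "->", preferred_form(point))
```

A definite noun phrase is possible for an evoked referent here, but the pronoun wins because its range is narrower, which is the limiting effect described above.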

Personal and demonstrative pronouns are usually taken to be highest on the scale because they refer directly and always express evoked referents. Pronouns can refer either through anaphora, which Prince labels textually evoked, or deixis, in which case the referent is situationally evoked. Demonstratives can also function as determiners when used together with a noun phrase. They can be either deictic (Look at that guy!) or anaphoric (I read that article too.).

Proper nouns also refer directly and are therefore considered to be uniquely identifiable as in Russell’s analysis. Certainly proper nouns can be ambiguous, e.g., “John Smith”, but unmodified proper nouns tend only to be used to introduce a discourse referent if it is one that the hearer is assumed to be familiar with, and therefore they correlate to the unused category.

Noun phrases marked with the definite article generally cover all discourse referents that can be considered identifiable. Definite noun phrases that are co-referential to an already introduced discourse referent are anaphoric and can be considered textually evoked as in 3:

(3) The neighbor’s dog got out last night. The cur knocked over our garbage cans.

More complicated is the case of first-mention definite noun phrases. In some situations a definite article can function similarly to a demonstrative as in 4 (assuming that a hammer is present in the immediate situation):

(4) Pass me the hammer please.

In this case the definite phrase could be considered situationally evoked. Definites which have unique referents within the relevant social context are comparable to proper nouns and correlate to the unused category. Examples include the sun, the president, the river, etc.  However, most first-mention definites are inferables as I discussed in the last post (like the driver).

Possessives are a more complicated case which I will discuss in my next post. Usually, however, they can be considered an inferrable that includes its anchor.

Additional Bibliography

Fraurud, Kari. “Definiteness and the processing of NPs in natural discourse.” Journal of Semantics 7 (1990): 395-433.

The identifiability scale

November 11, 2009

In my last post, I noted that the choice between “a car” and “the car” has to do with the assumption of whether the particular car is identifiable by the hearer. Thus the use of an indefinite phrase tips the hearer that they need to create a new “record”, while the definite noun phrase causes them to search for an existing record in their mental database.

However, identifying a referent is not always as simple as whether a record exists or not. Though some modify it slightly, the identifiability scale suggested by Ellen Prince remains the starting point for most studies (see also Gundel et al. for a more complicated scale). Prince multiplies the given-new distinction into four basic categories similar to 1:

(1) evoked > unused > inferable > brand new

An evoked discourse referent correlates basically to a given status. It has either been mentioned already or is self-evident from the extra-linguistic situation and can therefore be referred to through anaphora or deixis. For instance, if we are standing in a museum looking at a painting, I might say, “I like the painting.” I don’t need to introduce the painting into the discourse, but can evoke it directly.

On the other end of the scale, brand new discourse referents correspond to what is normally termed new information since they are unfamiliar to the hearer before being mentioned in the discourse. Brand new referents can be made slightly more identifiable by anchoring them. For instance, I might say, “a man I work with” instead of just “a man”.

The middle categories cover referents which have not been introduced into the discourse, but are identifiable based on the hearer’s broader knowledge, so-called first-mention definites. An unused referent is new to the discourse, but it is already familiar to the receiver. This can also be termed discourse-new and hearer-old information. Rather than creating a new referent from scratch, the existing record can be copied from long-term storage directly into the discourse model, along with any existing attributes or links. For instance, take sentence 2:

(2) I went down to the river yesterday.

Presumably, there is only one prominent river within the relevant speech community so that it can be introduced with a definite noun phrase.

Inferrables deal with cases like the cover in my previous post. An entity is inferable “if the speaker assumes that the hearer can infer it via logical or plausible reasoning based on other evoked or inferable entities.” Thus, because it can be inferred that books have covers, there is no reason to introduce a cover as an indefinite noun with a third sentence. A more efficient discourse would be 3:

(3) I bought a book yesterday. The cover was torn.

Givón notes that this can also be called “double-grounding”, since it requires both association and anaphora, or “frame-based reference”. A frame is the set of general knowledge that can be connected to a particular entity or situation.  Often this type of reference is based on whole-part relations or possession as in the case of the book. However, Chafe also gives the example of a classroom which can be inferred to have a teacher, blackboard, and students, but can also be connected to things like homework, books, quizzes, etc., and all of these can be introduced with definites.
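To make the four categories concrete, here is a rough Python sketch of how a referent might be sorted onto the scale. It is a simplification under stated assumptions: the function name and its four inputs are hypothetical, and inference is limited to a single step from an evoked entity, whereas Prince also allows inference from other inferables.

```python
def identifiability(referent, discourse, situation, hearer_knowledge, frames):
    """Return the referent's status on the four-part scale.

    discourse:        set of referents already mentioned in the text
    situation:        set of referents present in the extra-linguistic setting
    hearer_knowledge: set of referents the hearer is assumed to know already
    frames:           dict mapping an entity to what its frame makes available
    """
    evoked = discourse | situation
    if referent in evoked:
        return "evoked"        # reachable by anaphora or deixis
    if referent in hearer_knowledge:
        return "unused"        # discourse-new but hearer-old
    if any(referent in frames.get(anchor, ()) for anchor in evoked):
        return "inferable"     # reachable through the frame of an evoked entity
    return "brand new"


# Toy inputs based on the examples in this post.
frames = {"book": {"cover"}, "classroom": {"teacher", "blackboard", "students"}}
print(identifiability("cover", {"book"}, set(), {"river"}, frames))
print(identifiability("river", {"book"}, set(), {"river"}, frames))
print(identifiability("painting", set(), {"painting"}, {"river"}, frames))
print(identifiability("man", {"book"}, set(), {"river"}, frames))
```

The order of the checks mirrors the scale itself: the most identifiable status that applies is the one returned.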

In this post I have continued to give examples with indefinite or definite noun phrases, but in my next post I will explore the options for tipping off the hearer more precisely as to where a referent falls on the identifiability scale.

Additional Bibliography

Givón, Talmy. Syntax: An Introduction. Amsterdam: Benjamins, 2001.

Prince, Ellen. “Toward a Taxonomy of Given-New Information.” Pages 223-55 in Radical Pragmatics. Edited by Peter Cole. New York: Academic Press, 1981.

Prince, Ellen. “The ZPG Letter: Subjects, Definiteness, and Information-status.” Pages 295-325 in Discourse Description: Diverse Analyses of a Fund Raising Text. Edited by S. Thompson, and W. Mann. Philadelphia & Amsterdam: John Benjamins, 1992.

Definiteness within Discourse

November 9, 2009

The idea of definiteness as a scalar rather than binary category comes from its correlation with identifiability. As mentioned in my previous post, identifiability has to do with the hearer’s ability to identify the referent of a particular expression. Thus the choice between “a car” and “the car” by a speaker has to do with their assumptions about whether this particular car can (or should) be identified by the hearer.

First, I need to introduce the concept of a discourse model. The processing of a text by the hearer can be conceptualized as the creation of a temporary discourse model in the hearer’s mind. This can be conceived of as a relational database consisting of the discourse referents represented by noun phrases in the text (Heim actually uses the analogy of file cards, but I took the liberty of updating it a bit). Discourse referents can be individuals, classes, concepts, etc., and can be given attributes and links to other discourse referents based on the information in the text.

So, for instance, in our car example the hearer would have a discourse referent labeled Car to which they could add the attribute that it was bought by me. There can also be a link to me, since I am another discourse referent, with the attribute that I bought the car.

Referent:    Pete          Car
Attribute:   bought car    bought by Pete
Links:       Car           Pete

The most important clue to a hearer when processing a noun phrase in a text is whether they should create a new record for the discourse referent or find an existing record to update. This distinction between new referents and old referents is also termed givenness, expressed as a distinction between given and new information. Irene Heim noted that there is a basic relationship here with definiteness. An indefinite noun phrase triggers the creation of a new discourse referent, while a definite noun phrase usually implies that a discourse referent is given.

So, if the car is unfamiliar to my hearer, then I introduce it as an indefinite, but if I want to refer to it again later I switch to a definite phrase to tip the hearer that they already have a record created as in 1:

(1) I bought a car today…. The car is a metallic black color.

The Car record can now be updated with the attribute metallic black.
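Since I have been describing the discourse model as a database, the car example can be written out as a small Python sketch. The class and method names (`Record`, `DiscourseModel`, `new_record`, `find_record`) are invented for this illustration and are not drawn from Heim or anyone else.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One discourse referent with its attributes and links to other referents."""
    label: str
    attributes: list = field(default_factory=list)
    links: set = field(default_factory=set)


class DiscourseModel:
    """The hearer's temporary store of discourse referents."""

    def __init__(self):
        self.records = {}

    def new_record(self, label):
        """An indefinite noun phrase tells the hearer to create a record."""
        self.records[label] = Record(label)
        return self.records[label]

    def find_record(self, label):
        """A definite noun phrase tells the hearer to find an existing record."""
        return self.records[label]


model = DiscourseModel()
pete = model.new_record("Pete")            # the speaker
car = model.new_record("Car")              # "I bought a car today"
car.attributes.append("bought by Pete")
car.links.add("Pete")
pete.attributes.append("bought car")
pete.links.add("Car")

# "The car is a metallic black color": look up the record and update it.
model.find_record("Car").attributes.append("metallic black")
print(model.find_record("Car"))
```

Note that `find_record` simply fails if no record exists, which is exactly the gap that first-mention definites expose.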

This view of definiteness tends to highlight the anaphoric use of definiteness – the car is identifiable because it was mentioned previously. However, if identifiability were only a binary category, we would have discourses such as in 2 (from Prince and Walker):

(2) I bought a book yesterday. The book had a cover. The cover was torn.

This sequence feels redundant because it seems likely that the speaker can assume that the hearer knows that books have covers, thus it violates Grice’s Maxim of Quantity:

1. Make your contribution to the conversation as informative as necessary.
2. Do not make your contribution to the conversation more informative than necessary.

However, this sort of general and situational knowledge – that books have covers – is not captured in the basic given-new or definite-indefinite distinctions. Therefore, in my next post I will look at the idea of an identifiability scale and how definiteness can also be considered a scalar.

Additional Bibliography

Heim, Irene. “File Change Semantics and the Familiarity Theory of Definiteness.” Pages 164-89 in Meaning, Use, and Interpretation of Language. Edited by R. Bäuerle, C. Schwarze, and A. von Stechow. Berlin: Walter de Gruyter, 1983.

Karttunen, Lauri. “Discourse Referents.” Pages 363-85 in Syntax and Semantics 7: Notes from the Linguistic Underground. Edited by J. McCawley. New York: Academic Press, 1976.

Lambrecht, Knud. Information Structure and Sentence Form: Topic, Focus, and the Mental Representations of Discourse Referents. Cambridge; New York, NY, USA: Cambridge University Press, 1994.

Walker, Marilyn, and Ellen Prince. “A Bilateral Approach to Givenness: A Hearer-Status Algorithm and a Centering Algorithm.” Pages 291-306 in Reference and Referent Accessibility. Edited by Thorstein Fretheim, and Jeanette K. Gundel. Amsterdam & Philadelphia: John Benjamins, 1996.

More definiteness – the familiarity approach

November 6, 2009

As I mentioned in my last post, in studying definiteness, logical and semantic approaches tend to concentrate on issues such as existence and uniqueness and the truth or falsehood of a proposition in the real world. Discourse approaches, however, have been more interested in the pragmatics of definiteness, particularly the dynamic between the speaker and the hearer. For instance, I began my last post with the following pair of sentences:

(1a) I bought the car today.

(1b) I bought a car today.

Russell was primarily concerned with naming expressions such as “The King of France” or “Mr Jones” which have only one possible referent, but the expression “the car” in the sentence above can have an almost limitless number of possible referents. On the other hand, in the context, the phrases “the car” and “a car” have the same unique referent. What influences the choice of one phrase over the other?

The familiarity theory is usually traced to Paul Christophersen, who argued that the distinction between definite and indefinite noun phrases has to do with whether the hearer is presumed to be acquainted with the referent. Thus the difference between the two sentences in example 1 is that the car under discussion is known to the hearer in the first, but not in the second.

Note that Chafe has suggested that the term identifiability is preferable to familiarity. The distinction is that the hearer may not necessarily know the referent, but definiteness signals that they are in a position to identify it. Identifiability can be related to the prior introduction of a referent in the discourse (anaphora), the presence of the referent in the immediate situation (deixis), or the general knowledge of the hearer.

However, identifiability may not always be an adequate explanation of definiteness either. Lyons notes that so-called associative uses are the most problematic. In the associative use, a noun phrase is considered definite by its relation to a previous referent as in 2:

(2) I took a taxi to the airport, but the driver got lost.

Here it is understood that the driver is connected to the previously mentioned taxi. However, other than linking him to the taxi, the hearer is in no position to identify the particular driver. In such a case, definiteness may indeed be more about quantification than identifiability. That is, the sentence merely expresses that the taxi had a driver.

Fraurud has suggested, however, that the individuation of the referent may also be a factor. The identifiability approach tends to treat individuals as the prototypical referent, but individuals are identifiable in a different way from other kinds of entities such as classes and types, of which we may have general-lexical, but not personal, knowledge. Thus all that is necessary is to identify the driver as the taxi driver (there will be more on this in a future post).

Identifiability seems to be the prototypical use of definiteness cross-linguistically, and therefore, Lyons suggests that in general definiteness grammaticalizes this pragmatic category. In fact, in the majority of cases, definite articles develop from demonstratives. However, it is reasonable to assume that over time in some languages the category of definiteness could be extended to other related uses such as inclusiveness as we saw earlier with plural and mass nouns. In this regard, it is interesting to note that indefinite articles tend to develop later than definite articles, and from quantifiers such as one.

Additional Bibliography

Chafe, Wallace. “Givenness, Contrastiveness, Definiteness, Subjects, Topics, and Point of View.” Pages 25-55 in Subject and Topic. Edited by Charles N. Li. New York: Academic Press, 1976.

Christophersen, Paul. The Articles: A Study of Their Theory and Use in English. Copenhagen: Munksgaard, 1939.

Fraurud, Kari. “Cognitive Ontology and NP Form.” Pages 65-88 in Reference and Referent Accessibility. Edited by Thorstein Fretheim, and Jeanette K. Gundel. Amsterdam: Benjamins, 1996.

A closer look at definiteness – the uniqueness approach

November 5, 2009

In languages with the grammatical category of definiteness, the prototypical definite noun phrase is one marked with the definite article. This is usually contrasted with an indefinite article or bare noun phrase such as in example 1:

(1a) I bought the car today.

(1b) I bought a car today.

Explanations for the various uses of the definite article are complex, and the subject has attracted the attention of philosophers and logicians besides linguists and grammarians. The two most common explanations are the uniqueness and familiarity theories.

The uniqueness theory has its roots in the logical tradition and is usually traced to Bertrand Russell, who argued that the definite article requires existence and uniqueness as in example 2:

(2) The King of France is bald.

According to Russell this sentence implies three things:

(i) There is a King of France.

(ii) There is only one King of France.

(iii) This individual is bald.

Thus the use of the indefinite article, as in (i), merely asserts the existence of an individual meeting the description King of France, but the definite article also asserts his uniqueness.
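For readers who prefer notation, the three implications are usually combined into a single formula along the following lines. The predicate letters are my own shorthand: K(x) stands for “x is a King of France” and B(x) for “x is bald”.

```latex
\exists x \, \bigl( K(x) \land \forall y \, ( K(y) \rightarrow y = x ) \land B(x) \bigr)
```

The first conjunct corresponds to (i), the universally quantified conjunct to (ii), and the last to (iii). Dropping the middle conjunct leaves only the existence claim associated with the indefinite article.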

Hawkins extended the uniqueness theory by arguing that the definite article actually expresses inclusiveness. His argument is that the referent of a definite description must be part of a shared set. In the case of an individual entity, it can be considered a singleton which is realized as uniqueness, but for plurals and mass nouns it includes everything that meets the description. For instance, consider the sentences in example 3:

(3a) We put the beer in the cooler.

(3b) We put beer in the cooler.

(3c) We put a beer in the cooler.

What is implied by sentence 3a is that all of the beer is now in the cooler. Here the difference between the definite, bare, and indefinite clearly has to do with quantification. Sentence 3b can be read as some beer was put in the cooler, while 3c implies that a certain unit of beer is meant.

In this approach, definite descriptions are not semantically referring, but only quantificational. This contrasts with proper nouns which have no “sense” but are merely pointers to the referent which they name. This follows Frege and Quine; see also Saul Kripke on naming.

However, James McCawley pointed out exceptions such as example 4 that don’t seem to be explained by uniqueness or quantification:

(4) The dog got into a fight with another dog.

In this example there are two fighting dogs involved, but nothing particularly unique is expressed about the first dog. Therefore, David Lewis has argued that definiteness must relate to salience here rather than uniqueness; that is, the first dog must be somehow more prominent in the discourse than the second.

One weakness of the uniqueness approach is that its logical roots were only concerned with the truth or falsehood of a statement, which should remain the same regardless of where or when it is expressed. Thus the approach only treats the noun phrase at the sentence level, rather than considering the larger discourse context. In contrast, discourse approaches tend to focus on the anaphoric use of definiteness, largely relying on the familiarity theory which I will summarize in the next post.


Additional Bibliography

Hawkins, John A. Definiteness and Indefiniteness: A Study in Reference and Grammaticality Prediction. Atlantic Highlands, N.J: Croom Helm Humanities Press, 1978.

Kripke, Saul A. Naming and Necessity. Cambridge, Mass: Harvard University Press, 1980.

Lewis, David. “Scorekeeping in a Language Game.” Journal of Philosophical Logic 8 (1979): 339-59.

Lyons, Christopher. Definiteness. Cambridge textbooks in linguistics. Cambridge: Cambridge University Press, 1999.

McCawley, James D. “Presupposition and Discourse Structure.” Pages 371-88 in Syntax and Semantics 11: Presupposition. Edited by David Dinneen, and Choon-kyu Oh. New York: Academic Press, 1979.

Russell, Bertrand. “On Denoting.” Mind 14 (1905): 479-93.


