Presidential and Vice-presidential Candidates: Language beyond the Word

October 20, 2008

Art Graesser, Moongee Jeon, and Zhiqiang Cai, University of Memphis

It is popular these days to analyze the language of candidates. We treat language as a signal of their persuasive impact, their entertainment value, and eventually the votes they win. This makes sense because language is a window into the thoughts and values of the candidates.

One popular recent approach is to analyze the words used by the candidates. Our colleague James Pennebaker at the University of Texas has made the persuasive case that pronouns are important. Another colleague, Jeff Hancock at Cornell University, has made the case that cognitive words signal deception. These are all valid analyses. But the point we wish to make is that it is also important to dig deeper into the language, into sentence composition and the coherence of the message. It is time to move beyond the word and into deep meaning.

We recently ran deeper computer analyses on the language of the candidates’ nomination acceptance speeches. We used Coh-Metrix, the only computer tool free to the public that analyzes language on sentence composition and discourse coherence (that is, how ideas in sentences connect with other sentences in meaning). Coh-Metrix can be found via Google. It was developed at the University of Memphis (with Danielle McNamara, Art Graesser, and Max Louwerse) on a large grant from the Institute of Education Sciences to analyze the language and coherence of textbooks.

We analyzed the nomination acceptance speeches of the four nominees: Obama, McCain, Biden, and Palin. We selected these speeches because they were on an even playing field in importance and potential impact on voters. It is also perfectly obvious that the speeches are products of speech writers, so we don’t know whether the patterns we found reflect the nominees or their speech writers. However, it is the candidate who is ultimately responsible for the message. We used Coh-Metrix to see how the speeches differ. So what did we learn?

Length Matters

The acceptance speeches differed in overall length and in sentence length. The speeches were approximately 3500 words long, with the two presidential speeches about 50% longer than those of the vice-presidential nominees. Sentence length is an important consideration, and Obama was the obvious leader on this dimension: his sentences averaged approximately 20 words, whereas the rest of the pack averaged about 15. The grade level of a message is determined by the length of its sentences and the length of its words: longer words and longer sentences translate to a higher grade level (that is, greater difficulty). Obama also led in grade level according to the Flesch-Kincaid scale (the most popular and accepted measure of the readability of messages). Obama’s speech was at a 10th-grade level, whereas the rest of the pack were at a 7th-grade level.
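
For readers who want to see how sentence length and word length combine into a grade level, here is a minimal Python sketch of the standard Flesch-Kincaid grade formula: 0.39 x (words per sentence) + 11.8 x (syllables per word) - 15.59. It is not the Coh-Metrix implementation. The sentence splitter and the vowel-group syllable counter are crude assumptions made for illustration, and the function names are hypothetical.

```python
import re

def count_syllables(word):
    """Rough estimate: count runs of vowels (an assumption, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def split_sentences(text):
    """Naive split on '.', '!' and '?'; real tools segment sentences more carefully."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level from words per sentence and syllables per word."""
    sentences = split_sentences(text)
    words = re.findall(r"[A-Za-z'’]+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
```

Calling flesch_kincaid_grade on the full text of a speech returns a single grade estimate. Because the syllable counter is approximate, the result will differ somewhat from the values a dedicated tool reports.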

Content Words Don’t Matter Much

Pennebaker made the case that pronouns and function words are important indicators that differ among candidates. True enough. But what about content words? These are nouns, main verbs, and adjectives. We found that content words did not differ much among candidates. Consider the four measures in the figure below: there are clearly no differences among candidates. The words did not differ in concreteness, imageability, familiarity, or age of acquisition (defined as the age when most people learned the words) according to Coltheart’s MRC Psycholinguistic Database. We suspect that nominees are coached on the words they use, which might explain why there is an even playing field in the selection of content words in their speeches.

Sentences Differ Somewhat Among Candidates

Let’s go beyond the words into sentences. We analyzed the syntactic complexity of sentences and the noun-phrase complexity. These are shown below. The syntactic complexity was approximately the same across candidates, except that McCain’s speech was a bit lower. Palin’s speech showed a slight advantage in noun-phrase complexity, measured as the number of adjectives that modify the nouns. The “hockey moms” and the “six pack dads” are richer noun-phrases than those of the male candidates.

We found that the Democrats had a higher incidence of questions in their speeches than the Republicans. The presidential candidates, in turn, had a higher incidence of negations than the vice-presidential candidates. Questions and negations are flags of uncertainty, openness, skepticism, and other dimensions of complexity.
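
To make "incidence" concrete, the following Python sketch counts questions and negations and scales the counts to a rate per 1000 words. The per-1000-words scaling, the short negation list, and the function name are all assumptions made for illustration; the post does not spell out how Coh-Metrix computes these scores.

```python
import re

# Hypothetical, deliberately short list of negation words.
NEGATIONS = {"not", "no", "never", "none", "nothing", "nobody", "neither", "nor"}

def incidence_per_1000(text):
    """Return question and negation counts scaled per 1000 words."""
    words = re.findall(r"[a-z'’]+", text.lower())
    total = len(words)
    # Count listed negation words plus contractions such as "can't".
    negations = sum(
        1 for w in words
        if w in NEGATIONS or w.endswith("n't") or w.endswith("n’t")
    )
    # Approximate the number of questions by the number of question marks.
    questions = text.count("?")
    scale = 1000 / total if total else 0.0
    return {"negations": negations * scale, "questions": questions * scale}
```

Scaling by text length matters here because the presidential speeches were longer than the vice-presidential ones, so raw counts alone would not be comparable.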

Coherence of the Messages Differs among Candidates

Coh-Metrix analyzes the coherence of messages on dozens of measures. Each measure analyzes the extent to which ideas are connected to each other logically and conceptually. One measure assesses the extent to which adjacent sentences have content words in common. Another assesses the extent to which adjacent sentences are semantically related; it is based on latent semantic analysis, a statistical computation that is based on hundreds of dimensions of meaning (developed by Landauer, Dumais, and Kintsch). On these coherence measures, the speeches of the two presidential candidates were more coherent than those of the vice-presidential candidates.
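
The first of these measures, content-word overlap between adjacent sentences, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the Coh-Metrix algorithm: the small stop-word list stands in for real part-of-speech tagging, the sentence splitter is naive, and the function names are hypothetical. The latent semantic analysis measure needs a trained semantic space and is not attempted here.

```python
import re

# Tiny illustrative stop list (hypothetical); a real analysis would use
# part-of-speech tagging to pick out nouns, main verbs, and adjectives.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "for",
    "with", "is", "are", "was", "were", "be", "it", "this", "that", "we",
    "i", "you", "he", "she", "they", "our", "their", "not", "will",
    "have", "has",
}

def content_words(sentence):
    """Lower-cased words in the sentence that are not on the stop list."""
    tokens = re.findall(r"[a-z'’]+", sentence.lower())
    return {t for t in tokens if t not in STOP_WORDS}

def adjacent_overlap(text):
    """Proportion of adjacent sentence pairs sharing at least one content word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    shared = sum(1 for a, b in pairs if content_words(a) & content_words(b))
    return shared / len(pairs)
```

A value near 1 means that almost every sentence repeats a content word from the sentence before it, and a value near 0 means that adjacent sentences rarely share vocabulary.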





So what might we conclude from all of this? One conclusion is that there is much more going on than words. It is easy to think about words because they are simple, easy to train, and sometimes flashy. However, we live in a complex world of ideas and solutions to complex problems. It is important to also consider levels of language and discourse that move beyond the word and into deeper levels of meaning.

A second conclusion is that the nomination speeches of the presidential candidates are a notch above those of the vice-presidential candidates. They are longer and more coherent, perhaps thanks to the coaching of the speech writers. It will be interesting to see how unprepared discourse differs among the candidates. This will be our next question, with the assistance of Coh-Metrix.

A third conclusion is that the complexity of Obama’s language tends to rise to the top. On speech length, sentence length, grade level, sentence syntactic complexity, noun-phrase complexity, questions, negations, and coherence, his speech ranked at the top or among the top two of the four nominees.

We will continue to analyze the speech of the four nominees.  Stay tuned.



3 Responses to “Presidential and Vice-presidential Candidates: Language beyond the Word”

  1. I would like to comment on the notion, expressed elsewhere on this site and repeated here, that “cognitive words signal deception.”

    Without being an expert on the methods of analysis used here (I am not a linguist — just a high school English teacher), I have to take issue in my own unscientific way with the idea that “deception” is the accurate term to use to describe the factor underlying cognitive language. In particular, the high incidence of cognitive words seems to be associated with Obama’s word choice, but might this be a reflection of carefully measured diction rather than outright deception? Obama is known for his pragmatism, and he seems to many supporters genuinely to be crafting his words and ideas to appeal to a broad audience; this involves in some cases forsaking one’s own personal attitudes in favor of attitudes that will reflect the values of the audience. This kind of pragmatic approach is different from mere pandering to the audience because the pragmatic approach represents an integral aspect of the speaker’s political stance, not a base compromising of that stance.

    “Deception” is a word loaded with negative connotations, but the intent with some speakers may not necessarily be dishonesty. It is true that something may be concealed by cognitive language in cases like this, but the word “deception” is too negatively loaded to be used without qualification if this kind of pragmatic concealment is the underlying factor.

    My overall point, I suppose, is that context and “deep meaning,” as this post suggests, need to be considered before the label of “deception” can legitimately be applied.

  2. Jeff Hancock Says:

    First, great post Memphis group. Very impressive, with some expected observations (Obama’s language is more complex) and some unexpected (Palin’s syntax seems intact!). How much do these more complex, discourse level analyses gain us over a word count approach?

    For Matt and your comment, I agree with your point about deception. I don’t think that cognitive words always signal deception. Far from it. But they do seem to be important when we know that the person is lying (because we made them in a lab). Like all issues with deception, I think more than one cue or aspect of a case needs to be considered.


  3. This is an emerging field of study, and one would do well to be extremely tentative about any conclusions. Along those lines, Matt Patterson’s comments are wise and instructive. Are they scientific? I don’t know. Is this field of study scientific? Again, I don’t know. I can tell you this. Some of the premises might be suspect. How? I’ll use readability as an example. More than 20 years ago, I was an editor at an educational publisher. We used various readability scales (Flesch’s, among them) for textbook manuscripts. I remember staff members saying, “Well, what grade level do you want?” because we knew we could find particular candidate passages to skew the grade level to where the publisher intended it. My point is: it was a highly unreliable gauge. I suspect the same of some of these parameters, even (perhaps more so) with the help of advanced computer programs. Let’s go the other way: what “score” do you get if the candidate uttered all monosyllabic words? (More radically, have your programs run tests using the rousing speeches of some of the last century’s most heinous dictators? With what findings?)

