States of the Union: Truman to Obama
January 28, 2010
by James W. Pennebaker
Most years since George Washington, the President of the United States has addressed a joint session of Congress, along with leaders in the military, judiciary, and other parts of government, in a public speech. The purpose of the address is to summarize the accomplishments and problems of the nation and to lay out plans and expectations for the coming years. Although the tone of State of the Union addresses changes from year to year, the occasion is generally a mixture of sober analysis and political undertones.
The address is typically written, at least in part, by the president with help from experts, speechwriters, and aides. Nevertheless, it generally reflects the leader’s intentions, values, emotional and thinking styles, and personality. Unlike the inaugural address, which is delivered to the nation once every four years, State of the Union (SOU) addresses are delivered annually to the country’s governing body. The SOU, then, is a more business-like and detail-oriented communication intended to direct Congress to move in specified directions.
As has been discussed elsewhere, the words people use reflect their social and psychological states. When analyzing people’s communication, it is possible to separate what they are saying from how they are saying it. That is, some words reflect the content of the communication and others reveal the style of the message. Very broadly, linguistic content is conveyed through the use of nouns, regular verbs, and some adjectives and adverbs. Language style is apparent through a group of words variously referred to as function, style, or junk words. These style-related words include pronouns, prepositions, articles, conjunctions, and auxiliary verbs.
Style or function words are quite different from content words in that there are very few of them, they are used at high rates, they are processed differently in the brain, and they are quite social. For example, of the 50,000 to 100,000 words most English speakers have in their vocabulary, only about 500 are function words. Despite the small number of these words, we use them in almost every sentence. In fact, 50-60 percent of all the words we use are style words. Of particular significance, these style words are social in the sense that they require a shared understanding between speaker and listener.
Over the last several years, multiple studies have found that the analysis of function words can reflect psychological dimensions of speakers. Laboratory and real world studies indicate that pronouns and other style words predict a speaker’s honesty, social status, emotional state, social connections with others, dominance, and thinking style. Function words are linked to people’s immediate psychological state within a given context and also can provide a broader view of their personality across situations and time.
SOU addresses are a perfect opportunity to study the psychological features of the nation’s leaders within relatively formal contexts. Unlike most speeches, SOUs are generally given in the same location, to the same types of dignitaries, at the same time of the year. Although the speeches themselves have undoubtedly been shaped by others, they continue to reflect the personality and thinking of the president and his staff.
The Current Analyses – with special attention on Obama
All of the SOU addresses from Truman to Obama spanning from 1946 through 2010 were analyzed using the computerized text analysis program LIWC (Pennebaker, Booth, & Francis, 2007). LIWC analyzes each speech, calculating the rates at which over 70 categories of words are used. In addition, six broader categories of language are calculated based on previous research.
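To make the idea of a word-category "rate" concrete, here is a minimal sketch of how such a percentage might be computed. It is not LIWC itself: the three tiny category lists and the function names are hypothetical stand-ins chosen for illustration, and the real program uses much larger dictionaries.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries for illustration only; the real LIWC
# category lists are far larger and are not reproduced here.
CATEGORIES = {
    "articles": {"a", "an", "the"},
    "negations": {"no", "not", "never"},
    "first_person_singular": {"i", "me", "my"},
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def category_rates(text, categories=CATEGORIES):
    """Return each category's share of all words, as a percentage."""
    words = tokenize(text)
    if not words:
        return {name: 0.0 for name in categories}
    counts = Counter()
    for word in words:
        for name, members in categories.items():
            if word in members:
                counts[name] += 1
    return {name: 100.0 * counts[name] / len(words) for name in categories}

rates = category_rates("The plan is not a cure, and I know it.")
print(rates)
```

Expressing each category as a percentage of total words is what lets speeches of different lengths be compared on the same scale.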
Social-emotional style. Many speakers work to establish a close personal relationship between themselves and their audience. Markers of this warm interpersonal style include the use of personal pronouns, high rates of positive and negative emotion words, and references to other people. In general, people scoring higher on the social-emotional style dimension are individuals who truly enjoy talking and connecting with others. As can be seen in Figure 1, there has been a fascinating evolution in social-emotional language over the last 65 years – from very low social-emotional language to the second Bush’s peak. Obama is reversing this trend. He is not as emotionally or socially detached as Nixon and earlier presidents; his style is comparable to Reagan’s.
Figure 1. Social-emotional style. Higher numbers reflect use of more personal pronouns, references to other people, and emotional words.
Positive emotionality. Speakers differ in the degree to which they convey positive and negative feelings in their speeches. An overall positive emotionality index was computed by subtracting the percentage of negative emotion words from the percentage of positive emotion words. The higher the number, the more the speaker conveys optimism and the less he uses words that convey feelings of sadness, anxiety, or anger. As can be seen in the second figure, Eisenhower, Carter, Reagan, and Clinton were consistently the most positive in their SOU addresses. Obama is striking in being the least positive.
Figure 2. Positive emotionality. The higher the number, the more the person uses positive emotion words relative to negative emotion words.
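The positive emotionality index described above is a simple subtraction, and a rough sketch may help show how it behaves. The word lists below are hypothetical (borrowing a few example words of the kind LIWC uses) and are far too small for real analysis; the function name is also an assumption.

```python
import re

# Hypothetical word lists for illustration; not LIWC's actual dictionaries.
POSITIVE = {"happy", "nice"}
NEGATIVE = {"sad", "nasty", "bad"}

def positive_emotionality(text):
    """Percentage of positive emotion words minus percentage of negative ones."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos_pct = 100.0 * sum(w in POSITIVE for w in words) / len(words)
    neg_pct = 100.0 * sum(w in NEGATIVE for w in words) / len(words)
    return pos_pct - neg_pct

print(positive_emotionality("a nice day and a happy crowd but bad news"))
```

A speaker who uses many emotion words of both kinds can still score near zero, so the index captures the balance of tone rather than how emotional the speech is overall.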
Complex thinking. An SOU address requires a certain degree of finesse to be effective. The president needs to convey complex ideas in ways that a broad audience can understand. Most issues facing a country – such as health care, national security, immigration – are composed of multiple dimensions that are often difficult to discuss in a simple way. Since Truman, presidents have varied tremendously in their attempts to talk about large issues in complex ways. Most opt to define problems simply and propose relatively straightforward solutions.
Function words allow for a nice metric to capture complexity of thinking. When people are dealing with complicated problems they must acknowledge multiple sides to an issue. Certain exclusive words – including but, except, without, or – signal that the speaker is making a distinction between what is and what is not included in the idea he is conveying. Similarly, other word categories such as negations (e.g., no, not, never) and causal words (e.g., because, cause, effect) also reflect more complex thinking.
Figure 3 is striking in suggesting that two presidents have been extraordinarily high in complex thinking – John F. Kennedy and Obama. Nixon and George H. W. Bush are a distant third and fourth. It is also interesting that Bush-2 and Clinton are two of the least complex thinkers in their SOU addresses.
As a side note, the complex thinking dimension simply reflects the language that the president uses in the SOU address. He may actually be a very complex thinker in general, so these numbers merely tell us how he is presenting ideas to Congress and the American people.
Figure 3. Complexity of thinking. The higher the number, the more complex and nuanced the language in the presentation of arguments.
Categorical versus dynamic thinking. There are multiple ways to break down a complex problem. Perhaps the most traditional method is to try to categorize the issue. For example, if asked to evaluate the current economy, a categorical thinker would likely identify the various components, then the subcomponents. In other words, the categorical thinker sees the first issue in approaching a new task as creating the relevant categories and then breaking down the problem to fit into the boxes that have been constructed. People who are high in categorical thinking tend to use a high rate of concrete nouns, articles, and prepositions.
A very different approach is called dynamic thinking. Dynamic thinking involves evaluating a new problem from a historical or developing perspective. Instead of first evaluating the categories or dimensions associated with the problem, the dynamic thinker tracks how we have arrived at the problem, thereby tracking the problem over time. If asked to evaluate the economy, the dynamic thinker may start with a point in the past and trace how historical forces have brought us to today’s economy. Dynamic thinking is generally measured by the high use of verbs. Interestingly, the more that people use verbs, the less they use nouns – suggesting that people tend to be either categorical or dynamic thinkers but not both.
Figure 4 reveals two fascinating trends. The first is the evolution of dynamic thinking over time. In the last 65 years, a striking shift in thinking emerged beginning in the 1980s. With the election of Reagan, presidents moved from displaying categorical thinking to being more dynamic in the ways they discussed complex issues. Every president since then has followed this trend. Obama is striking in being by far the most dynamic and least categorical thinker in the modern presidency.
Figure 4. Categorical versus dynamic thinking. Higher scores reflect categorical thinking whereas lower (or more negative) scores indicate dynamic thinking.
The Language and Personality of Obama’s State of the Union Addresses
Barack Obama thinks and relates to people differently from most of his predecessors. His thinking style is both highly complex and, at the same time, dynamic. Socially and emotionally, he is surprisingly cool and distant. The word “cool” is not ill-advised. In his SOU addresses, as well as his press conferences, he is detached. His use of both positive emotion and negative emotion words is much lower than that of recent presidents. Although his use of personal pronouns in his SOUs is slightly above average, it is actually quite low when he talks informally in interviews or press conferences. His is the language of the confident leader as opposed to the close buddy.
Obama has now delivered two SOU addresses. Has his language changed much from a year ago? Very broadly, no. If anything, he is becoming more dynamic in his thinking and slightly less positive in his emotional tone. Overall, however, he maintains a remarkably even style in the ways he talks to his audiences.
References
Chung, C.K., & Pennebaker, J.W. (2007). The psychological functions of function words. In K. Fiedler (Ed.), Social communication (pp. 343-359). New York: Psychology Press.
Pennebaker, J.W. (August 9, 2009). What is “I” saying? (guest post). The Language Log. http://languagelog.ldc.upenn.edu/nll/?p=1651
Pennebaker, J.W., Mehl, M.R., & Niederhoffer, K.G. (2003). Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology, 54, 547-577.
Slatcher, R.B., Chung, C.K., Pennebaker, J.W., & Stone, L.D. (2007). Winning words: Individual differences in linguistic style among U.S. presidential and vice presidential candidates. Journal of Research in Personality, 41, 63-75.
Relevant Websites
http://www.psy.utexas.edu/Pennebaker
Language of the Media — I
November 2, 2008
by Vera Vine and James W. Pennebaker
An important part of the 2008 election is the language of the mainstream media. Accusations of media bias fly from both sides of the aisle, including the supposed deep-seated liberal (or, sometimes, conservative) bias of television and newspaper reporting. But without a concrete metric for assessing media bias, arguments about it often descend into partisan maneuvering. Our text analysis software program, Linguistic Inquiry and Word Count (LIWC; see liwc.net), can help to quantify some of the media’s language. We focused on three major newspapers, the New York Times, the Washington Post, and the Wall Street Journal.
What we did:
Overall, 138 news reports were collected, comprising 46 topics covered by each of three newspapers, The New York Times (NYT), The Wall Street Journal (WSJ), and the Washington Post (WP), spanning the period beginning with the formation of the first presidential ticket on August 22, 2008, through the launching of the final week of campaigning on October 27, 2008. These newspapers were chosen because of their independence (each is owned by a different company), large readership, and reputations for influential and exemplary reporting. As of November 2, Obama has been endorsed by the NYT and WP; the WSJ has not endorsed anyone – although it has a conservative reputation.
To make comparison possible, news reports were selected so that each news story had counterparts with identical topics and similar dates to the other two newspapers. Thirteen articles from each paper were about Barack Obama’s campaign, thirteen were about John McCain’s campaign, eleven covered the U.S. economic crisis, and nine covered general election news concerning both parties equally (e.g., debates, shifts in polls).
What we found: Comparing the campaign coverage within each newspaper:
The New York Times:
The NYT articles about the McCain campaign were longer than those about the Obama campaign (on average almost 250 words longer). Pronoun use also differed: the NYT used significantly more impersonal pronouns when covering the McCain campaign, and more “you” when covering the Obama campaign.
The Washington Post:
The WP used shorter sentences when covering the Obama campaign than it did with the McCain campaign. When covering Obama, the WP also used more personal pronouns, particularly “I” and “you,” and more verbs. The WP’s coverage of the Obama campaign is also nearly significantly higher on the index of “immediacy,” a factor thought to indicate informal style (Pennebaker & King, 1999).
The Wall Street Journal:
The WSJ had the fewest differences between coverage of the two campaigns. Although no differences reached the level of significance, some trends suggest that the WSJ’s language when covering McCain’s campaign contains more negations, more anxiety words, more certainty words (“absolute,” “certainly”), and more exclusive words (“except,” “but”).
Taken together, these results suggest that the WSJ may actually be less biased than the NYT and WP in its political news reporting, despite a more conservative reputation. These results may be consistent with another study of media bias conducted by a group of political scientists (Groseclose, T. & Milyo, J. (2005). A Measure of Media Bias. The Quarterly Journal of Economics, 120, 1191-1237).
Comparing the mentions of candidates’ names:
Not unexpectedly, news stories in all papers said “Obama” and “Biden” more when reporting on the Obama campaign, and “Palin” more when reporting on McCain’s. What is somewhat surprising is that the newspapers referenced McCain much more freely, regardless of which campaign was the focus of the news report, which might suggest a preoccupation with McCain, or a tendency to consider news about Obama in light of McCain’s activities.
As for mentions of George W. Bush, considered by many to be the specter haunting this election, there was a trend suggesting higher rates of use of “Bush” when covering Barack Obama’s campaign, but only in the NYT. Obama sought to link McCain to the Bush administration, so perhaps the NYT has more coverage of these Obama talking points than the other newspapers do. Or perhaps this difference suggests that McCain’s attempts to distance himself from Bush may have been somewhat successful.
Comparing the newspapers with each other:
Despite the differences in language between the coverage of the campaigns, the overall styles of the three newspapers were fairly similar when news reports on all 4 topics were taken together. When the language did differ, it tended to be in the expected directions based on the papers’ respective areas of expertise. For example, the WSJ articles were the least personal in their writing style, using fewer social words and more quantifiers (e.g., “much,” “fewer”) and impersonal pronouns (“it,” “that,” “those”). The WSJ language also included shorter sentences, fewer function words (i.e., non-content words including pronouns, prepositions, and particles), less use of “we” and “they,” fewer verbs of almost all types, fewer exclusive words (such as “except,” “but”), and fewer cognitive mechanism words (“think,” “know”).
Long considered the “writers’ newspaper,” the WP used longer sentences, more “we,” more present tense and less past tense, fewer quantifiers, and somewhat fewer cognitive mechanism words.
The take-away:
The emotional tone of the coverage of the two candidates was surprisingly even handed across all three newspapers. There was a weak trend suggesting a more personal tone in reporting on Obama’s campaign by the WP and NYT.
The next step will be to tease apart the linguistic styles of the reporters. For example, does the more personal and dynamic quality of reporting on Obama come from the language the reporters bring to the table, or from the oratorical style of things Obama is quoted as saying? This is ultimately the dilemma in understanding any translation: Is the message an accurate account of the original speaker or does it reflect the psychological makeup of the translator?
by Molly Ireland
Most of us can probably recall times when we felt powerfully in sync with a person during a conversation, for better or worse. While in friendly situations synchrony often translates to simultaneous laughter and increased rapport, in less friendly contexts synchrony might take the form of synchronized suspicion and mutual outrage that the other person refuses to bend to our will.
Language Style Matching. In our lab at the University of Texas at Austin, we’ve been studying a specific kind of verbal synchrony which we call Language Style Matching, or LSM. Style matching is measured by comparing the way two sides of a conversation or two texts use function words, like pronouns, articles, and conjunctions. If two people use similar proportions of, for example, personal pronouns, then their LSM score in that individual LIWC category (see liwc.net) will be high. The comprehensive LSM metric we use to assess overall synchrony is the average of nine function word categories’ matching scores. As other entries on this site discuss, the way a person uses function words is often the key to predicting what they’re thinking and feeling at the time and how they are likely to behave in the future. So if LSM for a conversation is high, then odds are good that everybody’s in the same mindset – perhaps even if they don’t agree with or like each other.
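For readers who want to see the arithmetic, here is a rough sketch of how a style matching score could be computed from two speakers' function word rates. The per-category formula (one minus the absolute difference divided by the sum) is an assumption taken from published descriptions of LSM rather than from this post, the small constant only guards against division by zero, and the category names and rates in the example are made up.

```python
def category_match(rate_a, rate_b):
    """Matching score for one category, given each speaker's usage rate (%).

    Assumed formula: 1 - |a - b| / (a + b + 0.0001).
    """
    return 1.0 - abs(rate_a - rate_b) / (rate_a + rate_b + 0.0001)

def lsm(rates_a, rates_b):
    """Average the per-category matching scores over the shared categories."""
    shared = rates_a.keys() & rates_b.keys()
    if not shared:
        raise ValueError("the two speakers share no categories")
    return sum(category_match(rates_a[c], rates_b[c]) for c in shared) / len(shared)

# Made-up rates (percent of total words) for two speakers.
interviewer = {"articles": 6.0, "prepositions": 13.0, "negations": 2.0}
candidate = {"articles": 6.5, "prepositions": 12.0, "negations": 1.5}
print(round(lsm(interviewer, candidate), 2))
```

Under this formula, identical rates give a score of essentially 1, and a category used by only one speaker gives a score close to 0.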
Lab studies. In collaboration with Amy Gonzales and Jeff Hancock at Cornell, we found that in certain cooperative settings LSM is positively correlated with how much group members like each other and can help predict how well a group performs on a task. Recently we analyzed language in a more competitive setting, in transcripts from negotiation studies conducted at the University of Chicago by UT’s Marlone Henderson. Preliminary evidence indicates that LSM is not always a good thing. Higher LSM predicts poorer overall negotiation outcomes for participants who have been experimentally manipulated to approach the negotiation less objectively. For people with objective distance from the negotiation, LSM didn’t predict performance. (In general, objectivity leads to better performance, and being too close to an issue or a negotiation partner leads to poorer negotiation outcomes.)
As with all of the language research discussed here and elsewhere, it’s probably a bad idea to jump to conclusions about what, taken together, these two sets of findings mean for style matching that occurs in real life. But we do know that LSM is a reliable measure of function word synchrony, and we know that function words are themselves reliable predictors of psychology and behavior both in and out of experimental labs. Beyond that, we can safely guess that while style matching often leads to rapport, it can also indicate mutual stubbornness or distrust that, ironically, makes it harder to find common ground. And, more speculatively, to the degree that we can control when and how we style match, good communicators probably know when to follow another’s conversational lead and when to step out of sync.
The Candidates. Using online transcripts from news media websites, I looked at how the presidential candidates match their interviewers’ function word use. Hypothetically, LSM should be highest both when interviewer and interviewee are trying to make each other look good and when the two are at loggerheads. Low LSM might mean that the interviewer is trying to find the truth and the interviewee is focused on misdirection, or vice versa. Here’s how each presidential candidate matched with his interviewers (LSM scores in parentheses; 0 is perfectly out of sync, 1 is perfectly matched):

Barack Obama
1. Larry King (CNN host) (0.93)
2. Katie Couric (CBS anchor) (0.92)
3. Bill O’Reilly (conservative FOX News host) (0.90)
4. Michael R. Gordon and Jeff Zeleny (New York Times staff) (0.90)
5. Amanda Griscom Little (staff for Grist, environmental newspaper) (0.89)
6. Terry Moran (ABC News reporter) (0.89)
7. Chicago Sun-Times staff (0.87)
8. Cathleen Falsani (Chicago Sun-Times religious columnist) (0.86)
9. Jeffrey Goldberg (The Atlantic staff) (0.80)
10. Rick Stengel (TIME magazine editor) (0.76)

John McCain
1. Adam Nagourney and Michael Cooper (New York Times staff writers) (0.94)
2. Sean Hannity (conservative FOX reporter) (0.92)
3. Michael Reagan (conservative talk radio host) (0.91)
4. Pittsburgh Tribune staff (0.91)
5. Peter Jennings (ABC reporter) (0.91)
6. Larry King (CNN host) (0.91)
7. Military Times staff (US Army newspaper) (0.90)
8. Martin Wisckol (Orange Co. online news reporter) (0.90)
9. Pastor Rick Warren (Evangelical minister) (0.90)
10. Larry Kudlow (economist, conservative CNBC host) (0.89)
11. George Stephanopoulos (liberal ABC reporter) (0.89)
12. Tim Russert (liberal NBC reporter) (0.86)
13. Financial Times (British financial newspaper) (0.85)
What these numbers might mean. On average, Obama matches slightly less than McCain, although both generally are highly synchronized with their interviewers. This could reflect Obama’s tendency to be more cool-headed and distant than McCain. Interestingly, both McCain and Obama matched with interviewers whose opinions were most diametrically opposed to their own as much as they matched with staunch allies. For example, one of Obama’s highest matches was Bill O’Reilly. The O’Reilly interview was not smooth: both often talked over each other and little headway was made by either side. Perhaps O’Reilly represents one of Obama’s rare failures to step back and regain objectivity when faced with conflict. Here’s an illustration from the September 4th interview (arguing about the success of the surge in Iraq):
SEN. OBAMA: … It has gone very well, partly because of the Anbar situation and the Sunni –
MR. O’REILLY: The awakening, right.
SEN. OBAMA: — awakening, partly because the Shi’a –
MR. O’REILLY: But if it were up to you, there wouldn’t have been a surge.
SEN. OBAMA: Well, look –
MR. O’REILLY: No, no, no, no.
SEN. OBAMA: No, no, no, no, no, no, no.
MR. O’REILLY: If it were up to you, there wouldn’t have been a surge.
SEN. OBAMA: No, no, no, no. Hold on.
MR. O’REILLY: You and Joe Biden — no surge.
SEN. OBAMA: No. Hold on a second, Bill.
McCain matched most with his interviewers from the New York Times, a newspaper frequently cited by conservatives as liberally biased. The New York Times recently officially endorsed Barack Obama for president. McCain also matched very highly (nonsignificantly lower than his most synchronized interview) with Michael Reagan, a radio talk show host who, despite his conservatism, managed to outrage McCain via a telephone interview on January 31st of this year. Here’s an example from that interview:
REAGAN: Senator, Senator, Senator, Senator, Senator…
MCCAIN (talking over Reagan): …well worth talking about as well…
REAGAN: Senator!
MCCAIN: I’m not…
REAGAN: Senator!
MCCAIN: I asked you, Michael, if I could finish, can I finish?
REAGAN: But you did finish–
MCCAIN: Can I Finish? Can I finish? Yes or no?
REAGAN: What else do you have to say?
MCCAIN: Can I finish or not, I mean otherwise…
REAGAN: Go ahead.
Art Graesser, Moongee Jeon, and Zhiqiang Cai, University of Memphis
It is popular these days to analyze the language of candidates. We treat language as a signal of their persuasive impact, their entertainment value, and, eventually, their votes. This makes sense because language is the window to the thoughts and values of the candidates.
One popular recent approach is to analyze the words used by the candidates. Our colleague James Pennebaker at University of Texas has made the persuasive case that pronouns are important. Another colleague, Jeff Hancock at Cornell University, has made the case that cognitive words signal deception. These are all valid analyses. But the point that we wish to make is that it is also important to dig deeper into the language, into sentence composition and the coherence of the message. It is time to move beyond the word and into deep meaning.
We have recently analyzed the nomination acceptance speeches of candidates to perform deeper computer analyses of language. We used Coh-Metrix, the only computer tool free to the public that analyzes language on sentence composition and discourse coherence (that is, how ideas in sentences connect with other sentences in meaning). Coh-Metrix can be accessed via Google. It was developed at the University of Memphis on a large grant from the Institute of Education Sciences to analyze the language and coherence of textbooks (with Danielle McNamara, Art Graesser, and Max Louwerse).
We analyzed the nomination acceptance speeches of the four nominees: Obama, McCain, Biden, and Palin. We selected these speeches because they were all on an even playing field on importance and potential impact on the voters. It is also perfectly obvious that the speeches are products of speech writers. So we don’t know whether the conclusions are products of the nominees or their speech writers. However, it is the candidate that is ultimately responsible for the messages. We used Coh-Metrix to see how they are different. So what did we learn?
Length Matters
There were differences among the acceptance speeches in the length of the speech and of the sentences. The speeches were approximately 3500 words long, with the two presidential speeches about 50% longer than those of the vice-presidential nominees. The length of the sentences is an important consideration. Obama was the obvious leader on this dimension. His mean number of words per sentence was approximately 20, whereas the rest of the pack averaged about 15. We know that the grade level of messages is determined by the length of sentences and the length of words: greater length of words and sentences translates to a higher grade level (that is, greater difficulty). We found that Obama was the leader in the grade level of the message according to the Flesch-Kincaid scale of readability (the most popular and accepted measure of readability of messages). Obama’s speech had a 10th grade level whereas the rest of the pack had a 7th grade level.
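As a rough illustration of how sentence length and word length feed into a grade level, here is a sketch of the standard Flesch-Kincaid grade formula: 0.39 times words per sentence, plus 11.8 times syllables per word, minus 15.59. The syllable counter is a crude vowel-group heuristic and the sentence splitter is simplistic, both assumptions made for this example, so the output will not exactly match what Coh-Metrix or any other tool reports.

```python
import re

def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level from words per sentence and syllables per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

short_version = "We can win. We will win. We must try."
long_version = "We can win and we will win and we must try."
print(flesch_kincaid_grade(short_version), flesch_kincaid_grade(long_version))
```

The two example texts use nearly the same words, so the higher score for the second comes almost entirely from its longer sentence.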
Content Words Don’t Matter Much
Pennebaker made the case that pronouns and function words are important indicators that differed among candidates. True enough. But what about content words? These are nouns, main verbs, and adjectives. We found that content words did not differ much among candidates. Consider the 4 measures in the figure below – clearly no differences among candidates. There were no differences in the words’ concreteness, imageability, familiarity, and age of acquisition (defined as the age when most people learned the words) according to Coltheart’s MRC Psycholinguistic Database. We suspect that nominees are coached on the words they use, which might explain why there is an even playing field on selection of content words in their speeches.
Sentences Differ Somewhat Among Candidates
Let’s go beyond the words into sentences. We analyzed the syntactic complexity of sentences and the noun-phrase complexity. These are shown below. The syntactic complexity was approximately the same, except that McCain’s speech was a bit lower. Palin’s noun-phrase complexity showed a slight advantage, measured as the number of adjectives that modify the nouns. The “hockey moms” and the “six pack dads” are richer noun-phrases than those of the male candidates.
We found that the Democrats had a higher incidence of questions in their speeches than the Republicans. However, the presidential candidates had a higher incidence of negations. Questions and negations are flags of uncertainty, openness, skepticism, and other dimensions of complexity.
Coherence of the Messages Differs among Candidates
Coh-Metrix analyzes the coherence of messages on dozens of measures. Each measure analyzes the extent to which ideas are connected to each other logically and conceptually. One measure assesses the extent to which adjacent sentences have common content words. The other measure assesses the extent to which adjacent sentences are semantically related. The latter measure is based on latent semantic analysis, a statistical computation that is based on hundreds of dimensions of meaning (developed by Landauer, Dumais, and Kintsch). The two presidential candidates had more coherent messages than the vice-presidential nominees on these coherence measures.
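The first of these measures, shared content words between adjacent sentences, can be sketched in a few lines. This is only an approximation of the idea and not how Coh-Metrix computes it: the short function word list, the sentence splitter, and the choice to score each adjacent pair as either overlapping or not are all assumptions made for this example.

```python
import re

# Small hypothetical list standing in for function words; anything not
# listed here is treated as a content word in this sketch.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for",
    "is", "are", "was", "we", "i", "you", "it", "that", "this", "will", "our",
}

def content_words(sentence):
    """Set of lowercase words in the sentence that are not function words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return {w for w in words if w not in FUNCTION_WORDS}

def adjacent_overlap(text):
    """Share of adjacent sentence pairs with at least one common content word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    shared = sum(1 for a, b in pairs if content_words(a) & content_words(b))
    return shared / len(pairs)

speech = "We will fix the economy. The economy needs jobs. Schools matter too."
print(adjacent_overlap(speech))
```

The second measure, based on latent semantic analysis, needs a trained semantic space and is not attempted here.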
Conclusions
So what might we conclude from all of this? One conclusion is that there is much more going on than words. It is easy to think about words because they are simple, easy to train, and sometimes flashy. However, we live in a complex world of ideas and solutions to complex problems. It is important to also consider levels of language and discourse that move beyond the word and into deeper levels of meaning.
A second conclusion is that the nomination speeches of the presidential candidates are a notch above those of the vice-presidential nominees. They are longer and more coherent, perhaps with the coaching of the speech writers. It will be interesting to see how the unprepared discourse segments differ among candidates. This will be our next question, with the assistance of Coh-Metrix.
A third conclusion is that the complexity of Obama’s language tends to rise to the top. On speech length, sentence length, grade level, sentence syntactic complexity, noun-phrase complexity, questions, negations, and coherence, he was at the top or among the top two nominees.
We will continue to analyze the speech of the four nominees. Stay tuned.
Debate 3: McCain and Obama word usage
October 15, 2008
by James W. Pennebaker
The third and final debate produced language patterns that were remarkably similar to the other two debates. As before, McCain was slightly more personal and emotional than Obama. McCain also used more future tense verbs. Obama used words that suggested he was more cognitively complex, with longer words and more complicated sentences. In addition, he tended to use more exclusive words and tentative words (e.g., perhaps, maybe), which can also signal looking at the world from different perspectives.
As discussed in a previous blog, we have also found evidence to suggest that McCain and Obama have different thinking styles. Whereas McCain tends to be more categorical in his thinking, Obama is more fluid or contextual in the ways he approaches problems. Categorical thinking involves the use of concrete nouns and their associated articles (a, an, the) and suggests that the person is approaching a problem by breaking it down into its component parts and attempting to put it in meaningful categories. Fluid or contextual thinking involves a higher rate of verbs and associated parts of speech (such as gerunds and adverbs).
There were also a few departures in language use by the two candidates compared to their earlier debates. Obama, for example, used more 1st person singular pronouns than his opponent for the first time in any debate we’ve analyzed. This may be due, in part, to the fact that McCain used his “my friends” only once. Obama also used more achievement words than McCain, a category that has typically been reliably high for McCain.
Using the LIWC computer program, the differences in language usage between the two candidates in the third debate were as follows:
| Category | Examples | McCain | Obama | Interpretation |
| Word count | | 6596 | 7339 | Obama talks more |
| Words per sentence | | 13.83 | 18.39 | Obama longer sentences |
| Big words (over 6 letters) | | 17.77 | 18.72 | Obama bigger words |
| Personal pronouns | | 10.22 | 9.22 | McCain more personal in general |
| 1st person singular | I, me, my | 2.99 | 3.08 | |
| 1st person plural | We, our | 2.71 | 3.05 | |
| 2nd person | You, yours | 1.91 | 1.39 | McCain more pointed |
| 3rd person singular | He, she, her | 1.33 | 0.63 | McCain more reference to others |
| 3rd person plural | They, them | 1.27 | 1.08 | |
| Indefinite pronouns | It, those | 6.67 | 7.67 | Obama more vague |
| Articles | A, the | 6.76 | 6.24 | McCain more categorical thinking |
| Verbs | Walk, went | 15.74 | 16.65 | Obama more fluid or contextual |
| Auxiliary verbs | Is, have | 10.29 | 10.40 | |
| Past tense | Was, gave | 3.35 | 2.68 | McCain talks about things in the past |
| Present tense | Am, is | 10.01 | 12.06 | Obama more present oriented |
| Future tense | will | 1.39 | 0.91 | McCain more future oriented |
| Common adverbs | Very, really | 4.05 | 4.39 | |
| Prepositions | To, for, of | 13.22 | 13.35 | |
| Conjunctions | And, or, whereas | 6.55 | 6.21 | |
| Negations | No, not, never | 1.52 | 1.61 | |
| Quantifiers | Much, few | 2.59 | 3.08 | |
| Numbers | Six, 12 | 1.65 | 1.72 | |
| Social references | Friend, we, talk | 11.75 | 10.19 | McCain more references to others |
| Overall emotion words | Happy, hurt, kill | 5.43 | 5.01 | McCain more emotional |
| Positive emotions | Happy, nice | 3.79 | 3.61 | |
| Negative emotions | Sad, nasty, bad | 1.65 | 1.43 | |
| Anxiety, fear | Worry, scared | 0.12 | 0.14 | |
| Anger | Angry, hate | 0.59 | 0.31 | |
| Sadness | Depressed, cry | 0.32 | 0.27 | |
| Cognitive mechanisms | Think, should | 17.39 | 17.89 | |
| Insight | Realize, know | 1.73 | 2.04 | |
| Causal | Because, reason | 1.43 | 2.13 | Obama more causal reasoning |
| Discrepancy | Would, could | 2.11 | 2.00 | |
| Tentative | Maybe, perhaps | 1.52 | 2.15 | Obama perspective difference |
| Certainty | Absolute, certainly | 1.46 | 1.50 | |
| Inhibition | Blocked, stop | 0.64 | 0.60 | |
| Inclusive words | With, and | 6.78 | 6.13 | McCain over inclusive |
| Exclusive words | Except, but | 2.24 | 2.47 | |
| Relativity | Times, going, over | 11.61 | 12.11 | |
| Motion | Went, fly | 1.65 | 2.02 | |
| Space | Area, under | 5.81 | 5.78 | |
| Time | Hour, clock | 3.70 | 4.05 | |
| Content Categories | | | | |
| Work | Job, paycheck | 3.99 | 4.86 | Obama more references to work |
| Achievement | Try, succeed | 1.82 | 2.58 | Obama higher in achievement words |
| Leisure | Games, tv | 0.47 | 0.44 | |
| Home | Garage, yard | 0.53 | 0.41 | |
| Money | Cash, debt | 3.21 | 3.19 | |
| Religion | God, church | 0.06 | 0.08 | |
| Death | Dead, cemetery | 0.09 | 0.01 | |
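The categorical-versus-fluid contrast described in this post can be checked directly from the Articles and Verbs rows of the table. The short Python sketch below is only an illustration: the `rates` dictionary copies those two rows, while the helper name `gap` and the choice to summarize each row as a simple difference are assumptions made here, not something LIWC itself reports.

```python
# Percent-of-total-words figures copied from the third-debate table.
rates = {
    "Articles": {"McCain": 6.76, "Obama": 6.24},
    "Verbs": {"McCain": 15.74, "Obama": 16.65},
}


def gap(category):
    """Return McCain's rate minus Obama's rate, in percentage points."""
    row = rates[category]
    return round(row["McCain"] - row["Obama"], 2)


for category in rates:
    difference = gap(category)
    leader = "McCain" if difference > 0 else "Obama"
    print(f"{category}: {leader} higher by {abs(difference):.2f} points")
```

A positive gap for articles together with a negative gap for verbs is the pattern the post reads as McCain being more categorical and Obama more fluid. The sketch says nothing about whether a gap of this size is statistically reliable.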
Debate language
October 15, 2008
by James W. Pennebaker
I will try to post the language variables of tonight’s third debate as soon as the transcripts are available. In the meantime, several comments posted in the last 24 hours point to some misunderstandings:
Speech writers, trainers, and natural language. Some people have noted that we can’t determine if the language used by the candidates reflects their speech writers or the candidates themselves. Indeed, that is why we try to analyze only unscripted language. Debates are particularly good for this. Yes, all the candidates slip into canned phrases with some frequency but, in general, they are likely using more of the words they would naturally use than not.
Some of our other analyses compare the ways candidates talk in one-on-one interviews with debates as well. In general, candidates are fairly consistent in the ways they use words across these contexts. Obama and Biden tend to be more consistent than McCain and Palin but the differences are not striking. See the previous posts by Molly Ireland on this topic.
First person singular pronouns: I versus my versus me. The use of first person singular is psychologically fascinating. When people are engaged in everyday normal conversations, they use the word “I” at quite high rates (about 6% of all words) compared with “me” (about 0.5%) or “my” (about 0.7%). A couple of people have noted that McCain obviously uses first person singular pronouns at such high rates because of his use of “my friends”.
It’s true that McCain has been using “my” at higher rates across the two debates than Obama but he has also been using “I” at these elevated frequencies as well. This has been true for both candidates for interviews and debates for the entire election season. The average pronoun use for the two candidates across the first two debates (as a percentage of total word usage) is as follows:
| Candidate | “I” | “me” | “my” |
| McCain | 2.67 | 0.19 | 0.48 |
| Obama | 1.86 | 0.14 | 0.15 |
It should be noted that the relative rates of all the pronouns were virtually identical from the first to the second debate except for McCain’s higher use of “my” in the second debate (0.20 in Debate 1 and 0.71 in the second).
What does it mean if a person uses high rates of I versus me? One of the founders of modern psychology, William James, made the strong assertion that the use of “I” implied a self in control whereas the use of “me” suggested the self was being acted upon by others. This, of course, makes perfect sense. Empirically, it’s probably wrong. People who are depressed, lower in status, and lower in self-esteem consistently use “I” at higher (not lower) rates than non-depressed, high status, and self-assured people. Ironically, the use of “me” doesn’t seem to be related to any of these qualities — or any qualities that we have studied so far.
It would be misleading to think that the use of “I” always signals depression and low self-esteem. People who use “I” tend to be more honest and are often more socially sensitive. They are more likely to say “I think it’s cold outside” instead of “It’s cold outside.” By using phrases such as “I think” or “I believe,” they subtly indicate that they are aware that other perspectives exist and that theirs is only one of many.
The Meaning of Words: Obama versus McCain
October 12, 2008
By James W. Pennebaker
What does it mean when the candidates use language differently? Here and elsewhere, language experts have argued that word use can be associated with electability, sociability, and thinking patterns of the candidates. Although it is tempting to use text analysis programs to predict who will win, one should be wary. Perhaps a safer bet is to use language markers as correlates of people’s social, cognitive, and personality styles. That is, we should be thinking beyond electability and towards possible governing styles.
Predicting who will win. Over the years, several research teams have found that the degree to which candidates express optimism and positive emotion is linked to electability. We have found this as well. Bill Clinton and George W. Bush used more positive emotion words and future tense verbs than any of their rivals in the presidential debates and interviews. No other language dimensions have predicted voter preferences as well.
This year, in the primaries, John McCain was the most optimistic whereas Hillary Clinton was consistently more positive than Barack Obama. Since the conventions, McCain has continued to be far more positive in his language use than Obama. Interestingly, McCain has expressed more negative emotion as well.
Optimism may have its limits. Unlike virtually every other presidential election in memory, the 2008 contest is taking place in a highly threatening, anxiety-provoking economic time. A less emotional orientation could well be more appealing than it has been in the past.
Predicting how they will govern. Most language dimensions that we study are probably better markers of how people will lead than who will vote for them. Some dimensions that are relevant include:
Cognitive complexity. A particularly reliable marker of cognitive complexity is the exclusive word dimension. Exclusive words such as but, except, without, and exclude signal that the speaker is making an effort to distinguish what is in a category from what is not. Those who use more exclusive words make better grades in college, are more honest in lab studies, and have a more nuanced understanding of events and people. Through the primaries until now, Obama has consistently been the highest in exclusive word use and McCain the lowest.
Categorical versus fluid thinking. Some people naturally approach problems by assigning them to categories. Categorical thinking involves the use of articles (a, an, the) and concrete nouns. Men, for example, use articles at much higher rates than women. Fluid thinking involves describing actions and changes, often in more abstract ways. A crude measure of fluid thinking is the use of verbs. Women use verbs more than men.
McCain and Obama could not be more different in their use of articles and verbs. McCain uses verbs at an extremely low rate and articles at a fairly high rate. Obama, on the other hand, is remarkably high in his use of verbs and low in his use of articles. These patterns suggest that McCain’s natural way of understanding the world is to first label the problem and find a way to put it into a pre-existing category. Obama is more likely to define the world as ongoing actions or processes.
Personal and socially connected. Individuals who think about and try to connect with others tend to use more personal pronouns (I, we, you, she, they) than those who are more socially detached. Bush was higher than Kerry or Gore. McCain has consistently been much higher than any other candidate in this election cycle. His use of 1st person singular (I, me, my) is particularly high which often signals an openness and honesty. Obama uses personal pronouns at moderate levels – similar to Hillary Clinton and most other primary candidates of both parties.
Restrained versus impulsive. People vary in the degree to which they act quickly or shoot from the hip versus stand back and consider their options. Over the last few years, some have argued that the use of negations (e.g., no, not, never) is a sign of inhibition or constraint. Low use of negations may be linked to impulsiveness. Bush was low in negations whereas Kerry was quite high. Across the election cycle, Obama has consistently been the highest user of negations – suggesting a restrained approach – whereas McCain has been the lowest – a more impulsive way of dealing with the world.
The limits of text analysis. Language use is highly dependent on context. Most people use far more personal pronouns when talking with friends than when giving a formal speech. Nationally televised debates, interviews, and stump speeches are highly unusual language settings. To the degree that the candidates are using their own words, they provide us a glimpse of the ways they are thinking and dealing with their worlds.
Because every election is different, it is important to weigh the cultural context of the time. During economic upheavals, wartime, or other unusual periods, normally-abnormal speaking patterns may now be quite normal and vice versa. One can imagine a number of new methodologies growing from this election. For example, it might be instructive to use natural language in blogs or the media as an indicator of cultural context against which to compare future candidates’ word use.
Finally, no one should take any text analysis expert’s opinions too seriously. The art of computer-based language analysis is in its infancy. We are better than tea-leaf readers but probably not much.
Debate 2: Obama vs McCain
October 7, 2008
Personality and language are consistent across time and context. The ways that Obama and McCain used language in the second debate were remarkably similar to the ways they used language in the first debate. The bottom line continues to be that McCain is more socially connected, impulsive, and emotionally honest whereas Obama comes across as smarter, more cognitively complex, and more emotionally calm and detached.
Using the LIWC computer program, the numbers that came up are:
| Category | Examples | Obama | McCain | | Interpretation |
| Word count | | 7111 | 6520 | * | Obama talks more |
| Words per sentence | | 17.78 | 14.78 | * | Obama longer sentences |
| Big words (over 6 letters) | | 17.17 | 17.88 | | |
| Personal pronouns | | 9.60 | 11.15 | * | McCain more personal in general |
| 1st person singular | I, me, my | 2.03 | 3.24 | ** | McCain more personal |
| 1st person plural | We, our | 4.11 | 4.02 | | |
| 2nd person | You, yours | 1.95 | 1.83 | | |
| 3rd person singular | He, she, her | 0.56 | 0.77 | | |
| 3rd person plural | They, them | 0.96 | 1.30 | | |
| Indefinite pronouns | It, those | 7.88 | 6.72 | * | Obama more vague |
| Articles | A, the | 6.17 | 5.98 | | |
| Verbs | Walk, went | 17.45 | 17.07 | | |
| Auxiliary verbs | Is, have | 11.08 | 10.66 | | |
| Past tense | Was, gave | 3.29 | 2.75 | | |
| Present tense | Am, is | 12.36 | 11.89 | | |
| Future tense | will | 0.84 | 1.50 | * | McCain more future oriented |
| Common adverbs | Very, really | 4.67 | 3.63 | * | Obama more “flowery” |
| Prepositions | To, for, of | 13.80 | 13.28 | | |
| Conjunctions | And, or, whereas | 7.21 | 7.56 | | |
| Negations | No, not, never | 1.74 | 1.35 | * | Obama censoring himself; McCain more impulsive |
| Quantifiers | Much, few | 2.59 | 2.78 | | |
| Numbers | Six, 12 | 1.95 | 1.23 | | |
| Social references | Friend, we, talk | 11.18 | 11.78 | | |
| Overall emotion words | Happy, hurt, kill | 5.20 | 6.33 | * | McCain more emotional |
| Positive emotions | Happy, nice | 3.78 | 4.49 | * | McCain more positive |
| Negative emotions | Sad, nasty, bad | 1.48 | 1.99 | * | McCain more negative |
| Anxiety, fear | Worry, scared | 0.21 | 0.31 | | |
| Anger | Angry, hate | 0.53 | 0.84 | | |
| Sadness | Depressed, cry | 0.17 | 0.18 | | |
| Cognitive mechanisms | Think, should | 18.39 | 19.49 | * | McCain more social thinking |
| Insight | Realize, know | 2.01 | 2.27 | | |
| Causal | Because, reason | 2.48 | 1.67 | * | Obama more causal reasoning |
| Discrepancy | Would, could | 1.73 | 1.47 | | |
| Tentative | Maybe, perhaps | 2.21 | 2.12 | | |
| Certainty | Absolute, certainly | 1.43 | 1.66 | | |
| Inhibition | Blocked, stop | 0.75 | 0.94 | | |
| Inclusive words | With, and | 6.37 | 7.81 | ** | McCain over inclusive |
| Exclusive words | Except, but | 2.88 | 2.29 | * | Obama more cognitively complex |
| Relativity | Times, going, over | 13.02 | 12.01 | * | |
| Motion | Went, fly | 2.52 | 2.10 | | |
| Space | Area, under | 6.09 | 5.92 | | |
| Time | Hour, clock | 4.15 | 3.96 | | |
| Content Categories | | | | | |
| Work | Job, paycheck | 3.68 | 3.44 | | |
| Achievement | Try, succeed | 2.94 | 2.79 | | |
| Leisure | Games, tv | 0.38 | 0.25 | | |
| Home | Garage, yard | 0.32 | 0.48 | | |
| Money | Cash, debt | 2.66 | 2.15 | | |
| Religion | God, church | 0.11 | 0.05 | | |
| Death | Dead, cemetery | 0.18 | 0.18 | | |
The numbers refer to the percentage of total words. So, for example, 3.24% of all of McCain’s words were 1st person singular pronouns. These numbers were generated by the LIWC text analysis program (see www.liwc.net).
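To make the "percentage of total words" idea concrete, here is a minimal Python sketch of dictionary-based counting. It is not the LIWC program: the `CATEGORIES` word lists are toy stand-ins built from the example words in the table, and the function names `tokenize` and `category_percentages` are hypothetical names chosen for this illustration.

```python
import re

# Toy word lists built from the example words in the table.
# Assumption: these stand in for LIWC's far larger category dictionaries.
CATEGORIES = {
    "1st person singular": {"i", "me", "my"},
    "Negations": {"no", "not", "never"},
    "Exclusive words": {"except", "but"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def category_percentages(text):
    """Return each category's share of all tokens, as a percentage."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100.0 * sum(token in words for token in tokens) / len(tokens)
        for name, words in CATEGORIES.items()
    }


sample = "I did not say that, my friends, but I will never give up."
for name, pct in category_percentages(sample).items():
    print(f"{name}: {pct:.2f}%")
```

The sample sentence has 13 tokens, three of which are 1st person singular pronouns, so that category comes out at roughly 23%. Contractions such as "I'm" are kept as single tokens and would not match the toy lists, which is one of many simplifications relative to a full text analysis program.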
by Molly Ireland
Over the last couple of weeks, I’ve been analyzing the way the candidates speak in interviews and speeches using our text analysis program, LIWC. Now that the individual analyses are finished, here’s a summary of how the candidates differ from their running mates and the opposing team.
Averaged across interviews and speeches, McCain and Palin are less linguistically in sync than Obama and Biden, with 19 significant differences between the Republicans and 13 differences between the Democrats. As McCain pointed out, “What do you expect of two mavericks, to agree on everything? Eh?”
Looking at the candidates’ language from a broader perspective, there are 16 major differences between the Democrats and Republicans. In the table below you can see how the two tickets differ (word categories in each column were used significantly more by those candidates):
“I” and “we.” All four nominees come across as high status and somewhat distant, using “I” at relatively low rates (about 3%) and “we” at relatively high rates (also about 3%). The Republicans appear more personable than the Democrats, using more I-words on average. In terms of “I” use in interviews and speeches, Biden is the least approachable and McCain is the most. In terms of “we” use, Palin sounds more like Joe Six-Pack than the other candidates in speeches. She comes across as colder (more “we”) than the others in interviews, however. Obama, Biden, and McCain all use “we” at about the same rate.
Conjunctions and negations. Negations and conjunctions are both markers of self-restraint: negations indicate self-control and inhibition, and a high number of conjunctions is a hallmark of rambling. Palin appears to be the least restrained of the four candidates, linguistically and otherwise. She uses far more conjunctions and far fewer negations than the others, particularly in interviews. Biden, on the other hand, clearly likes to talk, but his language actually shows the most restraint: he uses the fewest conjunctions and the most negations. He says a lot, but he structures his sentences normally and is relatively self-controlled.
Cognitive mechanisms. Palin’s thought processes seem to be the least complicated of the four: she uses fewer words that refer to cognitive processes (insight, cause and effect relationships, inclusion, exclusion, and so on) than the other nominees. The only cognitive words she uses frequently are inclusive words (plus, and, with), which, like conjunctions, can indicate rambling. She uses particularly few inhibition and exclusive words. Using few exclusive words sometimes indicates dishonesty. Obama uses the most exclusive words of the four nominees, although the differences between Obama, Biden, and McCain are subtle.
Emotions. The Republican ticket is generally sunnier than the Democratic team (more positive emotion words, fewer mentions of negative emotions). McCain uses the most positive emotion words (happy, excited) of the four nominees. He also talks about sadness the most, however. Palin is the most unilaterally cheerful candidate: she talks about positive emotions more than both Democrats and refers to anxiety, anger, and sadness less than McCain, Obama, and Biden. Biden’s language is somewhat gloomy: he talks about positive emotions the least and negative emotions the most. Obama talks the most about anxiety; Biden’s language is the angriest.
Achievement. McCain appears to be more ambitious and focused on success than either Palin or his opponents. He uses words that refer to need for achievement (failure, win, success) much more than the others. In speeches, 4% of his words have to do with need for achievement. That’s very high. (Compare with Eliot Spitzer’s achievement language in his resignation speech.) Obama and Palin refer to achievement least often, and Biden is somewhere in the middle. All candidates used more achievement words in speeches. Similarly, McCain talks about money the most, nearly three times as much as Biden and Palin, while Obama talks about money a moderate amount.
The VP Debate: Biden vs Palin
October 2, 2008
It’s official. Biden and Palin speak differently — but not in the ways many people think. Biden uses language in a way that suggests he is more personal, honest (higher rates of I), and socially engaged (third person pronouns) whereas Palin is surprisingly emotionally distant (more “we” words). Palin uses more positive emotion words than Biden but doesn’t differ from him in the use of negative emotions.
The most striking differences appeared for a variety of cognitive dimensions. As a thinker, Biden proved to be more specific and concrete (higher use of articles and nouns), concerned with specific numbers, and showed signs of being more cognitively complex in talking about issues (exclusive words). Palin, on the other hand, proved to be someone who thinks more about the perspectives of others — especially her audience. Through her use of cognitive mechanism words (e.g., words like realize, think, believe), she was subtly acknowledging that there were different answers or approaches to the issues. She is not the narrow-minded true believer many of her critics were hoping to see. Not surprisingly, Palin was not as crisp or polished in her thinking and in her answers as Biden. The best evidence for this was her complicated sentence structure (prepositions) and high rates of conjunctions and inclusive words — general markers of rambling and vagueness.
More to follow. But in the meantime, here are the numbers:
| Category | Examples | Biden | Palin | | Interpretation |
| Word count | | 7372 | 7741 | * | Palin talks more |
| Words per sentence | | 16.13 | 19.50 | ** | Palin longer sentences |
| Big words (over 6 letters) | | 16.05 | 16.81 | | Palin bigger words |
| Personal pronouns | | 9.06 | 8.68 | | |
| 1st person singular | I, me, my | 3.02 | 2.31 | * | Biden more personal |
| 1st person plural | We, our | 2.18 | 3.51 | ** | Palin more formal, distant |
| 2nd person | You, yours | 1.25 | 1.51 | | Palin more aggressive, pointed |
| 3rd person singular | He, she, her | 1.34 | 0.89 | * | Biden more social |
| 3rd person plural | They, them | 1.26 | 0.45 | * | Biden more social |
| Indefinite pronouns | It, those | 6.12 | 7.76 | ** | Palin more vague |
| Articles | A, the | 7.21 | 5.65 | ** | Biden more concrete, less abstract |
| Verbs | Walk, went | 16.02 | 15.22 | | Biden more dynamic |
| Auxiliary verbs | Is, have | 10.46 | 10.09 | | |
| Past tense | Was, gave | 3.89 | 3.02 | | |
| Present tense | Am, is | 9.63 | 9.97 | | |
| Future tense | will | 1.28 | 0.98 | | |
| Common adverbs | Very, really | 4.07 | 6.19 | ** | Palin more “flowery” |
| Prepositions | To, for, of | 13.51 | 14.66 | * | Palin more detailed |
| Conjunctions | And, or, whereas | 5.64 | 8.16 | ** | Palin more extended sentences |
| Negations | No, not, never | 2.20 | 1.51 | * | Biden censoring himself |
| Quantifiers | Much, few | 2.22 | 2.61 | | |
| Numbers | Six, 12 | 2.43 | 0.92 | ** | Biden more specific |
| Social references | Friend, we, talk | 10.66 | 10.75 | | |
| Overall emotion words | Happy, hurt, kill | 4.69 | 5.58 | | |
| Positive emotions | Happy, nice | 3.35 | 4.25 | * | Palin more positive |
| Negative emotions | Sad, nasty, bad | 1.45 | 1.34 | | |
| Anxiety, fear | Worry, scared | 0.12 | 0.13 | | |
| Anger | Angry, hate | 0.64 | 0.78 | | |
| Sadness | Depressed, cry | 0.15 | 0.14 | | |
| Cognitive mechanisms | Think, should | 16.26 | 18.41 | ** | Palin more social thinking |
| Insight | Realize, know | 1.55 | 1.87 | | |
| Causal | Because, reason | 2.02 | 2.07 | | |
| Discrepancy | Would, could | 1.56 | 1.69 | | |
| Tentative | Maybe, perhaps | 1.75 | 1.38 | | |
| Certainty | Absolute, certainly | 1.84 | 1.49 | | |
| Inhibition | Blocked, stop | 0.56 | 0.44 | | |
| Inclusive words | With, and | 5.41 | 7.67 | ** | Palin over inclusive |
| Exclusive words | Except, but | 2.62 | 2.36 | * | Biden more cognitively complex |
| Relativity | Times, going, over | 12.09 | 12.58 | | |
| Motion | Went, fly | 2.16 | 2.20 | | |
| Space | Area, under | 5.83 | 6.72 | | |
| Time | Hour, clock | 3.95 | 3.46 | | |
| Content Categories | | | | | |
| Work | Job, paycheck | 3.54 | 3.95 | * | Palin focuses on work and jobs |
| Achievement | Try, succeed | 2.03 | 2.79 | * | Palin higher in achievement motives |
| Leisure | Games, tv | 0.20 | 0.68 | | |
| Home | Garage, yard | 0.53 | 0.37 | | |
| Money | Cash, debt | 2.18 | 1.89 | | |
| Religion | God, church | 0.20 | 0.09 | | |
| Death | Dead, cemetery | 0.39 | 0.25 | | |
The numbers represent the percentage of total words that were spoken by the candidate. So, for example, 2.79% of all the words used by Sarah Palin were associated with achievement (words like try, succeed, win) compared with 2.03% of Biden’s words. These numbers were generated by the computerized text analysis program LIWC, or Linguistic Inquiry and Word Count (www.liwc.net).
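The first three rows of the table (word count, words per sentence, and big words) are summary statistics, not dictionary categories. A rough Python sketch of how such figures could be computed follows. The function name `summary_stats` and the sentence-splitting rule are assumptions made for this illustration; LIWC's own tokenizing and sentence rules may differ.

```python
import re


def summary_stats(text):
    """Return word count, words per sentence, and percent of big words."""
    # Assumption: a sentence ends at ., !, or ?; LIWC's own rules may differ.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    big = [w for w in words if len(w) > 6]  # "over 6 letters"
    return {
        "word_count": len(words),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "big_word_pct": 100.0 * len(big) / len(words) if words else 0.0,
    }


sample = "We need change. Americans deserve better leadership now!"
print(summary_stats(sample))
```

In the sample, three of the eight words have more than six letters ("Americans", "deserve", "leadership"), and the two sentences average four words each. Apostrophes count toward word length in this sketch, which a more careful version would handle separately.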
James W. Pennebaker












