Language of the Media — I
November 2, 2008
by Vera Vine and James W. Pennebaker
An important part of the 2008 election is the language of the mainstream media. Accusations of media bias fly from both sides of the aisle, including charges of a supposed deep-seated liberal (or, sometimes, conservative) bias in television and newspaper reporting. But without a concrete metric for assessing media bias, arguments about it often descend into partisan maneuvering. Our text analysis program, Linguistic Inquiry and Word Count (LIWC; see liwc.net), can help quantify some features of the media’s language. We focused on three major newspapers: the New York Times, the Washington Post, and the Wall Street Journal.
What we did:
Overall, we collected 138 news reports covering 46 topics, each covered by all three newspapers: The New York Times (NYT), The Wall Street Journal (WSJ), and the Washington Post (WP). The reports span the period from the formation of the first presidential ticket on August 22, 2008, to the start of the final week of campaigning on October 27, 2008. These newspapers were chosen because of their independence (each is owned by a different company), large readership, and reputations for influential and exemplary reporting. As of November 2, Obama has been endorsed by the NYT and WP; the WSJ, despite its conservative reputation, has not endorsed anyone.
To make comparison possible, news reports were selected so that each story had counterparts on the same topic, from similar dates, in the other two newspapers. Thirteen articles from each paper were about Barack Obama’s campaign, thirteen were about John McCain’s campaign, eleven covered the U.S. economic crisis, and nine covered general election news concerning both parties equally (e.g., debates, shifts in polls).
What we found, comparing campaign coverage within each newspaper:
The New York Times:
The NYT articles about the McCain campaign were longer than those about the Obama campaign (on average almost 250 words longer). Pronoun use also differed: the NYT used significantly more impersonal pronouns when covering the McCain campaign, and more “you” when covering the Obama campaign.
The Washington Post:
The WP used shorter sentences when covering the Obama campaign than when covering the McCain campaign. When covering Obama, the WP also used more personal pronouns, particularly “I” and “you,” and more verbs. The WP’s coverage of the Obama campaign was also higher, falling just short of statistical significance, on the index of “immediacy,” a factor thought to indicate an informal style (Pennebaker & King, 1999).
The Wall Street Journal:
The WSJ had the fewest differences between coverage of the two campaigns. Although no differences reached the level of significance, some trends suggest that the WSJ’s language, when covering McCain’s campaign, contained more negations, more anxiety words, more certainty words (“absolute,” “certainly”), and more exclusive words (“except,” “but”).
Taken together, these results suggest that the WSJ may actually be less biased than the NYT and WP in its political news reporting, despite its more conservative reputation. These results may be consistent with another study of media bias conducted by a group of political scientists (Groseclose & Milyo, 2005, “A Measure of Media Bias,” The Quarterly Journal of Economics, 120, 1191-1237).
Comparing the mentions of candidates’ names:
Not unexpectedly, news stories in all papers said “Obama” and “Biden” more when reporting on the Obama campaign, and “Palin” more when reporting on McCain’s. What is somewhat surprising is that the newspapers referenced McCain much more freely, regardless of which campaign was the focus of the news report, which might suggest a preoccupation with McCain, or a tendency to consider news about Obama in light of McCain’s activities.
As for mentions of George W. Bush, considered by many to be the specter haunting this election, there was a trend toward more frequent use of “Bush” when covering Barack Obama’s campaign, but only in the NYT. Obama sought to link McCain to the Bush administration, so perhaps the NYT carried more of these Obama talking points than the other newspapers did. Or perhaps this difference suggests that McCain’s attempts to distance himself from Bush have been somewhat successful.
Comparing the newspapers with each other:
Despite the differences in language between the coverage of the campaigns, the overall styles of the three newspapers were fairly similar when news reports on all four topics were taken together. When the language did differ, it tended to be in the expected directions based on the papers’ respective areas of expertise. For example, the WSJ articles were the least personal in their writing style, using fewer social words and more quantifiers (e.g., “much,” “fewer”) and impersonal pronouns (“it,” “that,” “those”). The WSJ’s language also had shorter sentences, fewer function words (i.e., non-content words including pronouns, prepositions, and particles), less use of “we” and “they,” fewer verbs of almost all types, fewer exclusive words (such as “except,” “but”), and fewer cognitive mechanism words (“think,” “know”).
Long considered the “writers’ newspaper,” the WP used longer sentences, more “we,” more present tense and less past tense, fewer quantifiers, and somewhat fewer cognitive mechanism words.
The take-away:
The emotional tone of the coverage of the two candidates was surprisingly even-handed across all three newspapers. There was a weak trend suggesting a more personal tone in the WP’s and NYT’s reporting on Obama’s campaign.
The next step will be to tease apart the linguistic styles of the reporters. For example, does the more personal and dynamic quality of reporting on Obama come from the language the reporters bring to the table, or from the oratorical style of things Obama is quoted as saying? This is ultimately the dilemma in understanding any translation: Is the message an accurate account of the original speaker or does it reflect the psychological makeup of the translator?
by Molly Ireland
Most of us can probably recall times when we felt powerfully in sync with a person during a conversation, for better or worse. While in friendly situations synchrony often translates to simultaneous laughter and increased rapport, in less friendly contexts synchrony might take the form of synchronized suspicion and mutual outrage that the other person refuses to bend to our will.
Language Style Matching. In our lab at the University of Texas at Austin, we’ve been studying a specific kind of verbal synchrony which we call Language Style Matching, or LSM. Style matching is measured by comparing the way two sides of a conversation or two texts use function words, like pronouns, articles, and conjunctions. If two people use similar proportions of, for example, personal pronouns, then their LSM score in that individual LIWC category (see liwc.net) will be high. The comprehensive LSM metric we use to assess overall synchrony is the average of nine function word categories’ matching scores. As other entries on this site discuss, the way a person uses function words is often the key to predicting what they’re thinking and feeling at the time and how they are likely to behave in the future. So if LSM for a conversation is high, then odds are good that everybody’s in the same mindset – perhaps even if they don’t agree with or like each other.
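The matching computation described above can be sketched in a few lines. This is a minimal sketch, not the lab’s code: it assumes the per-category score takes the form 1 - |p1 - p2| / (p1 + p2 + eps), where p1 and p2 are the two speakers’ percentages of words in a category and eps is a small constant that prevents division by zero. The category keys in `LSM_CATEGORIES` and the example profiles are hypothetical.

```python
# Minimal Language Style Matching (LSM) sketch.
# Assumption: per-category score = 1 - |p1 - p2| / (p1 + p2 + eps),
# and overall LSM = mean of the nine category scores.

# Hypothetical keys for nine function-word categories.
LSM_CATEGORIES = [
    "personal_pronouns", "impersonal_pronouns", "articles",
    "prepositions", "auxiliary_verbs", "adverbs",
    "conjunctions", "negations", "quantifiers",
]


def category_lsm(p1: float, p2: float, eps: float = 0.0001) -> float:
    """Matching score for one category, given each side's percentage of words."""
    return 1 - abs(p1 - p2) / (p1 + p2 + eps)


def lsm(profile_a: dict, profile_b: dict) -> float:
    """Average the per-category scores across the nine categories."""
    scores = [category_lsm(profile_a[c], profile_b[c]) for c in LSM_CATEGORIES]
    return sum(scores) / len(scores)


# Hypothetical percentages for an interviewer and an interviewee.
interviewer = dict(zip(LSM_CATEGORIES, [10.0, 6.5, 6.8, 13.0, 10.2, 4.0, 6.5, 1.5, 2.6]))
interviewee = dict(zip(LSM_CATEGORIES, [9.0, 7.5, 6.2, 13.4, 10.4, 4.4, 6.2, 1.6, 3.1]))

print(round(lsm(interviewer, interviewee), 2))
```

Identical profiles score exactly 1; as two speakers’ rates in a category diverge, that category’s score falls toward 0, and the overall score is a simple unweighted average.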
Lab studies. In collaboration with Amy Gonzales and Jeff Hancock at Cornell, we found that in certain cooperative settings LSM is positively correlated with how much group members like each other and can help predict how well a group performs on a task. Recently we analyzed language in a more competitive setting, in transcripts from negotiation studies conducted at the University of Chicago by UT’s Marlone Henderson. Preliminary evidence indicates that LSM is not always a good thing. Higher LSM predicts poorer overall negotiation outcomes for participants who have been experimentally manipulated to approach the negotiation less objectively. For people with objective distance from the negotiation, LSM didn’t predict performance. (In general, objectivity leads to better performance, and being too close to an issue or a negotiation partner leads to poorer negotiation outcomes.)
As with all of the language research discussed here and elsewhere, it’s probably a bad idea to jump to conclusions about what, taken together, these two sets of findings mean for style matching that occurs in real life. But we do know that LSM is a reliable measure of function word synchrony, and we know that function words are themselves reliable predictors of psychology and behavior both in and out of experimental labs. Beyond that, we can safely guess that while style matching often leads to rapport, it can also indicate mutual stubbornness or distrust that, ironically, makes it harder to find common ground. And, more speculatively, to the degree that we can control when and how we style match, good communicators probably know when to follow another’s conversational lead and when to step out of sync.
The Candidates. Using online transcripts from news media websites, I looked at how the presidential candidates match their interviewers’ function word use. Hypothetically, LSM should be highest both when interviewer and interviewee are trying to make each other look good and when the two are at loggerheads. Low LSM might mean that the interviewer is trying to find the truth and the interviewee is focused on misdirection, or vice versa. Here’s how each presidential candidate matched with his interviewers (LSM scores in parentheses; 0 is perfectly out of sync, 1 is perfectly matched):

Barack Obama
1. Larry King (CNN host) (0.93)
2. Katie Couric (CBS anchor) (0.92)
3. Bill O’Reilly (conservative FOX News host) (0.90)
4. Michael R. Gordon and Jeff Zeleny (New York Times staff) (0.90)
5. Amanda Griscom Little (Grist staff, environmental news outlet) (0.89)
6. Terry Moran (ABC News reporter) (0.89)
7. Chicago Sun-Times staff (0.87)
8. Cathleen Falsani (Chicago Sun-Times religious columnist) (0.86)
9. Jeffrey Goldberg (The Atlantic staff) (0.80)
10. Rick Stengel (TIME magazine editor) (0.76)

John McCain
1. Adam Nagourney and Michael Cooper (New York Times staff writers) (0.94)
2. Sean Hannity (conservative FOX reporter) (0.92)
3. Michael Reagan (conservative talk radio host) (0.91)
4. Pittsburgh Tribune staff (0.91)
5. Peter Jennings (ABC reporter) (0.91)
6. Larry King (CNN host) (0.91)
7. Military Times staff (US Army newspaper) (0.90)
8. Martin Wisckol (Orange Co. online news reporter) (0.90)
9. Pastor Rick Warren (Evangelical minister) (0.90)
10. Larry Kudlow (economist, conservative CNBC host) (0.89)
11. George Stephanopoulos (liberal ABC reporter) (0.89)
12. Tim Russert (liberal NBC reporter) (0.86)
13. Financial Times (British financial newspaper) (0.85)
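Whether Obama matches slightly less than McCain on average can be checked directly from the two lists. This sketch simply averages the LSM scores exactly as listed; it weights every interview equally, which is an assumption, since the post does not say how interviews were weighted.

```python
from statistics import mean, median

# LSM scores copied from the two lists above, in the order listed.
obama = [0.93, 0.92, 0.90, 0.90, 0.89, 0.89, 0.87, 0.86, 0.80, 0.76]
mccain = [0.94, 0.92, 0.91, 0.91, 0.91, 0.91, 0.90, 0.90, 0.90,
          0.89, 0.89, 0.86, 0.85]

for name, scores in [("Obama", obama), ("McCain", mccain)]:
    # Each interview counts once (unweighted mean and median).
    print(name, round(mean(scores), 3), round(median(scores), 3))
# Obama 0.872 0.89
# McCain 0.899 0.9
```

The gap between the means is about 0.03, which fits the description of “slightly less”; with ten and thirteen interviews, it is a description of these samples rather than a statistical test.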
What these numbers might mean. On average, Obama matches slightly less than McCain, although both are generally highly synchronized with their interviewers. This could reflect Obama’s tendency to be more cool-headed and distant than McCain. Interestingly, both McCain and Obama matched the interviewers whose opinions were most diametrically opposed to their own as closely as they matched staunch allies. For example, one of Obama’s highest matches was with Bill O’Reilly. The O’Reilly interview was not smooth: the two often talked over each other, and little headway was made by either side. Perhaps O’Reilly represents one of Obama’s rare failures to step back and regain objectivity when faced with conflict. Here’s an illustration from the September 4th interview (arguing about the success of the surge in Iraq):
SEN. OBAMA: … It has gone very well, partly because of the Anbar situation and the Sunni –
MR. O’REILLY: The awakening, right.
SEN. OBAMA: — awakening, partly because the Shi’a –
MR. O’REILLY: But if it were up to you, there wouldn’t have been a surge.
SEN. OBAMA: Well, look –
MR. O’REILLY: No, no, no, no.
SEN. OBAMA: No, no, no, no, no, no, no.
MR. O’REILLY: If it were up to you, there wouldn’t have been a surge.
SEN. OBAMA: No, no, no, no. Hold on.
MR. O’REILLY: You and Joe Biden — no surge.
SEN. OBAMA: No. Hold on a second, Bill.
McCain matched most closely with his New York Times interviewers, who write for a newspaper frequently cited by conservatives as liberally biased. The New York Times recently officially endorsed Barack Obama for president. McCain also matched very highly (not significantly lower than in his most synchronized interview) with Michael Reagan, a radio talk show host who, despite his conservatism, managed to outrage McCain in a telephone interview on January 31st of this year. Here’s an example from that interview:
REAGAN: Senator, Senator, Senator, Senator, Senator…
MCCAIN (talking over Reagan): …well worth talking about as well…
REAGAN: Senator!
MCCAIN: I’m not…
REAGAN: Senator!
MCCAIN: I asked you, Michael, if I could finish, can I finish?
REAGAN: But you did finish–
MCCAIN: Can I Finish? Can I finish? Yes or no?
REAGAN: What else do you have to say?
MCCAIN: Can I finish or not, I mean otherwise…
REAGAN: Go ahead.
Art Graesser, Moongee Jeon, and Zhiqiang Cai, University of Memphis
It is popular these days to analyze the language of candidates. Their language is treated as a signal of persuasive impact, entertainment value, and, eventually, votes. This makes sense because language is a window into the thoughts and values of the candidates.
One popular recent approach is to analyze the words used by the candidates. Our colleague James Pennebaker at the University of Texas has made a persuasive case that pronouns are important. Another colleague, Jeff Hancock at Cornell University, has made the case that cognitive words signal deception. These are all valid analyses. But the point we wish to make is that it is also important to dig deeper into the language: into sentence composition and the coherence of the message. It is time to move beyond the word and into deep meaning.
We have recently analyzed the candidates’ nomination acceptance speeches using deeper computer analyses of language. We used Coh-Metrix, the only computer tool free to the public that analyzes language on sentence composition and discourse coherence (that is, how the ideas in one sentence connect in meaning with those in other sentences). Coh-Metrix can be found through a web search. It was developed at the University of Memphis under a large grant from the Institute of Education Sciences to analyze the language and coherence of textbooks (with Danielle McNamara, Art Graesser, and Max Louwerse).
We analyzed the nomination acceptance speeches of the four nominees: Obama, McCain, Biden, and Palin. We selected these speeches because they were on an even playing field in importance and potential impact on voters. It is also obvious that the speeches are products of speech writers, so we don’t know whether our conclusions reflect the nominees or their speech writers. However, the candidate is ultimately responsible for the message. We used Coh-Metrix to see how the speeches differ. So what did we learn?
Length Matters
The acceptance speeches differed in the length of both the speech and its sentences. The speeches averaged approximately 3,500 words, with the two presidential nominees’ speeches about 50% longer than the vice-presidential nominees’. Sentence length is an important consideration, and Obama was the obvious leader on this dimension: his sentences averaged approximately 20 words, whereas the rest of the pack averaged about 15. We know that the grade level of a message is determined by the length of its sentences and the length of its words: longer words and sentences translate to a higher grade level (that is, greater difficulty). Obama led in grade level according to the Flesch-Kincaid scale of readability (the most popular and accepted measure of the readability of messages). Obama’s speech was at a 10th-grade level, whereas the rest of the pack was at a 7th-grade level.
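To see how sentence length feeds into grade level, here is the standard Flesch-Kincaid grade-level formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59. The post does not report syllable counts, so the counts below are hypothetical. They hold syllables per word fixed at 1.5 and vary only sentence length, from the pack’s roughly 15 words to Obama’s roughly 20.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid grade level."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical counts: 1.5 syllables per word in both cases,
# 15 vs. 20 words per sentence.
shorter = flesch_kincaid_grade(words=1500, sentences=100, syllables=2250)
longer = flesch_kincaid_grade(words=2000, sentences=100, syllables=3000)
print(round(shorter, 2), round(longer, 2), round(longer - shorter, 2))
# 7.96 9.91 1.95
```

Holding word length fixed, each extra word per sentence adds 0.39 grade levels, so five extra words add roughly two grades; any remaining difference between the speeches would presumably come from word length, which the post does not report.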
Content Words Don’t Matter Much
Pennebaker has made the case that pronouns and other function words differ among candidates in important ways. True enough. But what about content words, that is, nouns, main verbs, and adjectives? We found that content words did not differ much among candidates. Consider the four measures in the figure below: there are clearly no differences among candidates. The words did not differ in concreteness, imageability, familiarity, or age of acquisition (the age at which most people learned the words), according to Coltheart’s MRC Psycholinguistic Database. We suspect that nominees are coached on the words they use, which might explain the even playing field in their selection of content words.
Sentences Differ Somewhat Among Candidates
Let’s go beyond the words into sentences. We analyzed the syntactic complexity of sentences and the complexity of noun phrases. These are shown below. Syntactic complexity was approximately the same across candidates, except that McCain’s speech was a bit lower. Palin had a slight advantage in noun-phrase complexity, measured as the number of adjectives that modify the nouns. Her “hockey moms” and “six pack dads” are richer noun phrases than those of the male candidates.
We found that the Democrats had a higher incidence of questions in their speeches than the Republicans, while the presidential candidates had a higher incidence of negations than the vice-presidential nominees. Questions and negations are flags of uncertainty, openness, skepticism, and other dimensions of complexity.
Coherence of the Messages Differs among Candidates
Coh-Metrix analyzes the coherence of messages on dozens of measures, each assessing the extent to which ideas are connected to each other logically and conceptually. One measure assesses the extent to which adjacent sentences share content words. Another assesses the extent to which adjacent sentences are semantically related; it is based on latent semantic analysis, a statistical computation that represents meaning along hundreds of dimensions (developed by Landauer, Dumais, and Kintsch). On these coherence measures, the two presidential candidates’ speeches were more coherent than those of the vice-presidential nominees.
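The first coherence measure, content-word overlap between adjacent sentences, can be approximated with a minimal sketch. This is not Coh-Metrix’s implementation: the tokenizer and the tiny `STOPWORDS` list are hypothetical stand-ins, and the score here is simply the proportion of adjacent sentence pairs that share at least one content word. The latent-semantic-analysis measure would additionally require a trained semantic space and is not shown.

```python
import re

# Hypothetical, deliberately tiny function-word list; real tools use far larger lists.
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for",
    "with", "is", "are", "was", "were", "it", "that", "this", "we", "i",
    "you", "he", "she", "they", "be", "have", "has", "will", "not", "our", "my",
}


def content_words(sentence: str) -> set:
    """Lowercase word tokens, minus the stopword list."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return {t for t in tokens if t not in STOPWORDS}


def adjacent_overlap(sentences: list) -> float:
    """Proportion of adjacent sentence pairs sharing at least one content word."""
    if len(sentences) < 2:
        return 0.0
    pairs = zip(sentences, sentences[1:])
    shared = sum(1 for s1, s2 in pairs if content_words(s1) & content_words(s2))
    return shared / (len(sentences) - 1)


speech = [
    "We need jobs in this country.",
    "Those jobs must pay a fair wage.",
    "Hockey season starts soon.",
]
print(adjacent_overlap(speech))  # 0.5
```

The first pair shares “jobs” and the second pair shares nothing, so half of the adjacent pairs overlap.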
Conclusions
So what might we conclude from all of this? One conclusion is that there is much more going on than words. It is easy to think about words because they are simple, easy to train, and sometimes flashy. However, we live in a complex world of ideas and of solutions to complex problems. It is important to also consider levels of language and discourse that move beyond the word and into deeper levels of meaning.
A second conclusion is that the nomination speeches of the presidential candidates are a notch above those of the vice-presidential candidates. They are longer and more coherent, perhaps thanks to the coaching of speech writers. It will be interesting to see how the candidates differ in unprepared speech. This will be our next question, with the assistance of Coh-Metrix.
A third conclusion is that the complexity of Obama’s language tends to rise to the top. On speech length, sentence length, grade level, sentence syntactic complexity, noun-phrase complexity, questions, negations, and coherence, Obama ranked first or second among the four nominees.
We will continue to analyze the speech of the four nominees. Stay tuned.
Debate 3: McCain and Obama word usage
October 15, 2008
by James W. Pennebaker
The third and final debate produced language patterns remarkably similar to those of the other two debates. As before, McCain was slightly more personal and emotional than Obama. McCain also used more future tense verbs. Obama’s language suggested greater cognitive complexity: he used longer words and more complicated sentences. In addition, he tended to use more exclusive words (e.g., but, except) and tentative words (e.g., perhaps, maybe), which can also signal looking at the world from different perspectives.
As discussed in a previous blog, we have also found evidence to suggest that McCain and Obama have different thinking styles. Whereas McCain tends to be more categorical in his thinking, Obama is more fluid or contextual in the ways he approaches problems. Categorical thinking involves the use of concrete nouns and their associated articles (a, an, the) and suggests that the person is approaching a problem by breaking it down into its component parts and attempting to put it in meaningful categories. Fluid or contextual thinking involves a higher rate of verbs and associated parts of speech (such as gerunds and adverbs).
There were also a few departures in the two candidates’ language use compared to their earlier debates. Obama, for example, used more 1st person singular pronouns than his opponent for the first time in any debate we’ve analyzed. This may be due, in part, to the fact that McCain used his signature “my friends” only once. Obama also used more achievement words than McCain, a category in which McCain has typically scored reliably high.
Using the LIWC computer program, we obtained the following values for each candidate in the third debate (word count is a raw count and words per sentence an average; all other values are percentages of total words):
| Category | Examples | McCain | Obama | Interpretation |
|---|---|---|---|---|
| Word count | | 6596 | 7339 | Obama talks more |
| Words per sentence | | 13.83 | 18.39 | Obama longer sentences |
| Big words (over 6 letters) | | 17.77 | 18.72 | Obama bigger words |
| Personal pronouns | | 10.22 | 9.22 | McCain more personal in general |
| 1st person singular | I, me, my | 2.99 | 3.08 | |
| 1st person plural | We, our | 2.71 | 3.05 | |
| 2nd person | You, yours | 1.91 | 1.39 | McCain more pointed |
| 3rd person singular | He, she, her | 1.33 | 0.63 | McCain more reference to others |
| 3rd person plural | They, them | 1.27 | 1.08 | |
| Indefinite pronouns | It, those | 6.67 | 7.67 | Obama more vague |
| Articles | A, the | 6.76 | 6.24 | McCain more categorical thinking |
| Verbs | Walk, went | 15.74 | 16.65 | Obama more fluid or contextual |
| Auxiliary verbs | Is, have | 10.29 | 10.40 | |
| Past tense | Was, gave | 3.35 | 2.68 | McCain talks about things in the past |
| Present tense | Am, is | 10.01 | 12.06 | Obama more present oriented |
| Future tense | Will | 1.39 | 0.91 | McCain more future oriented |
| Common adverbs | Very, really | 4.05 | 4.39 | |
| Prepositions | To, for, of | 13.22 | 13.35 | |
| Conjunctions | And, or, whereas | 6.55 | 6.21 | |
| Negations | No, not, never | 1.52 | 1.61 | |
| Quantifiers | Much, few | 2.59 | 3.08 | |
| Numbers | Six, 12 | 1.65 | 1.72 | |
| Social references | Friend, we, talk | 11.75 | 10.19 | McCain more references to others |
| Overall emotion words | Happy, hurt, kill | 5.43 | 5.01 | McCain more emotional |
| Positive emotions | Happy, nice | 3.79 | 3.61 | |
| Negative emotions | Sad, nasty, bad | 1.65 | 1.43 | |
| Anxiety, fear | Worry, scared | 0.12 | 0.14 | |
| Anger | Angry, hate | 0.59 | 0.31 | |
| Sadness | Depressed, cry | 0.32 | 0.27 | |
| Cognitive mechanisms | Think, should | 17.39 | 17.89 | |
| Insight | Realize, know | 1.73 | 2.04 | |
| Causal | Because, reason | 1.43 | 2.13 | Obama more causal reasoning |
| Discrepancy | Would, could | 2.11 | 2.00 | |
| Tentative | Maybe, perhaps | 1.52 | 2.15 | Obama more tentative (different perspectives) |
| Certainty | Absolute, certainly | 1.46 | 1.50 | |
| Inhibition | Blocked, stop | 0.64 | 0.60 | |
| Inclusive words | With, and | 6.78 | 6.13 | McCain more inclusive |
| Exclusive words | Except, but | 2.24 | 2.47 | |
| Relativity | Times, going, over | 11.61 | 12.11 | |
| Motion | Went, fly | 1.65 | 2.02 | |
| Space | Area, under | 5.81 | 5.78 | |
| Time | Hour, clock | 3.70 | 4.05 | |
| Content categories | | | | |
| Work | Job, paycheck | 3.99 | 4.86 | Obama more references to work |
| Achievement | Try, succeed | 1.82 | 2.58 | Obama higher in achievement words |
| Leisure | Games, tv | 0.47 | 0.44 | |
| Home | Garage, yard | 0.53 | 0.41 | |
| Money | Cash, debt | 3.21 | 3.19 | |
| Religion | God, church | 0.06 | 0.08 | |
| Death | Dead, cemetery | 0.09 | 0.01 | |
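The first three rows of the table (word count, words per sentence, and the percentage of words longer than six letters) are simple counts. Here is a rough sketch of how they could be computed from a transcript. It is an approximation, not LIWC itself: LIWC’s own tokenizer and sentence splitter may treat punctuation, numbers, and contractions differently than these regular expressions do.

```python
import re


def summary_stats(text: str):
    """Return (word count, words per sentence, % of words over six letters)."""
    words = re.findall(r"[A-Za-z']+", text)
    # Split on runs of sentence-ending punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    wc = len(words)
    wps = wc / len(sentences) if sentences else 0.0
    big = 100 * sum(1 for w in words if len(w) > 6) / wc if wc else 0.0
    return wc, wps, big


print(summary_stats("I believe the economy is struggling. We will fix it."))
# (10, 5.0, 30.0)
```

In the example, “believe,” “economy,” and “struggling” are the three words longer than six letters, so they make up 30% of the ten words.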
Debate language
October 15, 2008
by James W. Pennebaker
I will try to post the language variables from tonight’s third debate as soon as the transcripts are available. In the meantime, several comments posted in the last 24 hours point to some misunderstandings:
Speech writers, trainers, and natural language. Some people have noted that we can’t determine whether the language used by the candidates reflects their speech writers or the candidates themselves. Indeed, that is why we try to analyze only unscripted language. Debates are particularly good for this. Yes, all the candidates slip into canned phrases fairly often but, in general, most of the words they use are probably the ones they would naturally use.
Some of our other analyses also compare the ways candidates talk in one-on-one interviews and in debates. In general, candidates are fairly consistent in the ways they use words across these contexts. Obama and Biden tend to be more consistent than McCain and Palin, but the differences are not striking. See the previous posts by Molly Ireland on this topic.
First person singular pronouns: I versus my versus me. The use of first person singular is psychologically fascinating. In everyday conversation, people use the word “I” at quite high rates (about 6% of all words) compared with “me” (about 0.5%) or “my” (about 0.7%). A couple of people have suggested that McCain uses first person singular pronouns at such high rates only because of his frequent “my friends.”
It’s true that McCain has used “my” at higher rates than Obama across the two debates, but he has also used “I” at elevated rates. This has held true in both interviews and debates for the entire election season. The average pronoun use for the two candidates across the first two debates (as a percentage of total words) is as follows:
| Candidate | “I” | “me” | “my” |
|---|---|---|---|
| McCain | 2.67 | 0.19 | 0.48 |
| Obama | 1.86 | 0.14 | 0.15 |
It should be noted that the relative rates of all the pronouns were virtually identical from the first debate to the second, except for McCain’s higher use of “my” in the second debate (0.20 in the first debate and 0.71 in the second).
What does it mean if a person uses high rates of “I” versus “me”? One of the founders of modern psychology, William James, made the strong assertion that the use of “I” implied a self in control, whereas the use of “me” suggested the self was being acted upon by others. Intuitively, this makes perfect sense. Empirically, it’s probably wrong. People who are depressed, lower in status, and lower in self-esteem consistently use “I” at higher (not lower) rates than non-depressed, high-status, and self-assured people. Ironically, the use of “me” doesn’t seem to be related to any of these qualities, or to any qualities that we have studied so far.
It would be misleading to think that the use of “I” always signals depression and low self-esteem. People who use “I” tend to be more honest and are often more socially sensitive. They are more likely to say “I think it’s cold outside” instead of “It’s cold outside.” Phrases such as “I think” and “I believe” subtly indicate that the speaker is aware that other perspectives exist and that theirs is only one of many.
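The pronoun rates above are percentages of all words that fall in a category. A minimal sketch of dictionary-based counting follows. The mini-dictionary in `CATEGORIES` is hypothetical and far smaller than LIWC’s, which also matches word stems and assigns words to several categories at once, so this sketch would not reproduce the table’s numbers.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only.
CATEGORIES = {
    "i": {"i"},
    "me": {"me"},
    "my": {"my"},
    "negations": {"no", "not", "never"},
    "exclusive": {"but", "except", "without"},
}


def category_rates(text: str) -> dict:
    """Percentage of all words in `text` that fall in each category."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {cat: 0.0 for cat in CATEGORIES}
    counts = Counter(words)
    return {
        cat: 100 * sum(counts[w] for w in vocab) / total
        for cat, vocab in CATEGORIES.items()
    }


print(category_rates("I think my plan works but it is not easy"))
# {'i': 10.0, 'me': 0.0, 'my': 10.0, 'negations': 10.0, 'exclusive': 10.0}
```

The same counting approach underlies the other function-word rates discussed on this page, such as exclusive words and negations; only the word lists change.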
The Meaning of Words: Obama versus McCain
October 12, 2008
By James W. Pennebaker
What does it mean when the candidates use language differently? Here and elsewhere, language experts have argued that word use can be associated with electability, sociability, and thinking patterns of the candidates. Although it is tempting to use text analysis programs to predict who will win, one should be wary. Perhaps a safer bet is to use language markers as correlates of people’s social, cognitive, and personality styles. That is, we should be thinking beyond electability and towards possible governing styles.
Predicting who will win. Over the years, several research teams have found that the degree to which candidates express optimism and positive emotion is linked to electability. We have found this as well. Bill Clinton and George W. Bush used more positive emotion words and future tense verbs than any of their rivals in the presidential debates and interviews. No other language dimensions have predicted voter preferences as well.
This year, in the primaries, John McCain was the most optimistic candidate, and Hillary Clinton was consistently more positive than Barack Obama. Since the conventions, McCain has continued to be far more positive in his language use than Obama. Interestingly, McCain has expressed more negative emotion as well.
Optimism may have its limits. Unlike virtually every other presidential election in memory, the 2008 contest is taking place in a highly threatening, anxiety-provoking economic time. A less emotional orientation could well be more appealing than it has been in the past.
Predicting how they will govern. Most language dimensions that we study are probably better markers of how people will lead than of who will vote for them. Some relevant dimensions include:
Cognitive complexity. A particularly reliable marker of cognitive complexity is the exclusive word dimension. Exclusive words such as but, except, without, and exclude signal that the speaker is making an effort to distinguish what is in a category from what is not. Those who use more exclusive words make better grades in college, are more honest in lab studies, and have a more nuanced understanding of events and people. From the primaries until now, Obama has consistently been the highest in exclusive word use and McCain the lowest.
Categorical versus fluid thinking. Some people naturally approach problems by assigning them to categories. Categorical thinking involves the use of articles (a, an, the) and concrete nouns. Men, for example, use articles at much higher rates than women. Fluid thinking involves describing actions and changes, often in more abstract ways. A crude measure of fluid thinking is the use of verbs. Women use verbs more than men.
McCain and Obama could not be more different in their use of articles and verbs. McCain uses verbs at an extremely low rate and articles at a fairly high rate. Obama, on the other hand, is remarkably high in his use of verbs and low in his use of articles. These patterns suggest that McCain’s natural way of understanding the world is to first label the problem and find a way to put it into a pre-existing category. Obama is more likely to define the world as ongoing actions or processes.
Personal and socially connected. Individuals who think about and try to connect with others tend to use more personal pronouns (I, we, you, she, they) than those who are more socially detached. Bush was higher than Kerry or Gore. McCain has consistently been much higher than any other candidate in this election cycle. His use of 1st person singular (I, me, my) is particularly high, which often signals openness and honesty. Obama uses personal pronouns at moderate levels, similar to Hillary Clinton and most other primary candidates of both parties.
Restrained versus impulsive. People vary in the degree to which they act quickly and shoot from the hip versus stand back and consider their options. Over the last few years, some have argued that the use of negations (e.g., no, not, never) indicates inhibition or constraint. Low use of negations may be linked to impulsiveness. Bush was low in negations, whereas Kerry was quite high. Across the election cycle, Obama has consistently been the highest user of negations, suggesting a restrained approach, whereas McCain has been the lowest, suggesting a more impulsive way of dealing with the world.
The limits of text analysis. Language use is highly dependent on context. Most people use far more personal pronouns when talking with friends than when giving a formal speech. Nationally televised debates, interviews, and stump speeches are highly unusual language settings. To the degree that the candidates are using their own words, they provide us a glimpse of the ways they are thinking and dealing with their worlds.
Because every election is different, it is important to weigh the cultural context of the time. During economic upheavals, wartime, or other unusual periods, speaking patterns that would normally seem abnormal may be quite normal, and vice versa. One can imagine a number of new methodologies growing from this election. For example, it might be instructive to use the natural language of blogs or the media as a baseline of cultural context against which to compare future candidates' word use.
Finally, no one should take any text analysis expert's opinions too seriously. The art of computer-based language analysis is in its infancy. We are better than tea-leaf readers, but probably not by much.
Debate 2: Obama vs McCain
October 7, 2008
Personality and language are consistent across time and context. The ways that Obama and McCain used language in the second debate were remarkably similar to the ways they used language in the first debate. The bottom line continues to be that McCain is more socially connected, impulsive, and emotionally honest whereas Obama comes across as smarter, more cognitively complex, and more emotionally calm and detached.
The numbers generated by the LIWC computer program are:
| Category | Examples | Obama | McCain | Sig. | Interpretation |
|---|---|---|---|---|---|
| Word count | | 7111 | 6520 | * | Obama talks more |
| Words per sentence | | 17.78 | 14.78 | * | Obama longer sentences |
| Big words (over 6 letters) | | 17.17 | 17.88 | | |
| Personal pronouns | | 9.60 | 11.15 | * | McCain more personal in general |
| 1st person singular | I, me, my | 2.03 | 3.24 | ** | McCain more personal |
| 1st person plural | We, our | 4.11 | 4.02 | | |
| 2nd person | You, yours | 1.95 | 1.83 | | |
| 3rd person singular | He, she, her | 0.56 | 0.77 | | |
| 3rd person plural | They, them | 0.96 | 1.30 | | |
| Indefinite pronouns | It, those | 7.88 | 6.72 | * | Obama more vague |
| Articles | A, the | 6.17 | 5.98 | | |
| Verbs | Walk, went | 17.45 | 17.07 | | |
| Auxiliary verbs | Is, have | 11.08 | 10.66 | | |
| Past tense | Was, gave | 3.29 | 2.75 | | |
| Present tense | Am, is | 12.36 | 11.89 | | |
| Future tense | Will | 0.84 | 1.50 | * | McCain more future oriented |
| Common adverbs | Very, really | 4.67 | 3.63 | * | Obama more “flowery” |
| Prepositions | To, for, of | 13.80 | 13.28 | | |
| Conjunctions | And, or, whereas | 7.21 | 7.56 | | |
| Negations | No, not, never | 1.74 | 1.35 | * | Obama censoring himself; McCain more impulsive |
| Quantifiers | Much, few | 2.59 | 2.78 | | |
| Numbers | Six, 12 | 1.95 | 1.23 | | |
| Social references | Friend, we, talk | 11.18 | 11.78 | | |
| Overall emotion words | Happy, hurt, kill | 5.20 | 6.33 | * | McCain more emotional |
| Positive emotions | Happy, nice | 3.78 | 4.49 | * | McCain more positive |
| Negative emotions | Sad, nasty, bad | 1.48 | 1.99 | * | McCain more negative |
| Anxiety, fear | Worry, scared | 0.21 | 0.31 | | |
| Anger | Angry, hate | 0.53 | 0.84 | | |
| Sadness | Depressed, cry | 0.17 | 0.18 | | |
| Cognitive mechanisms | Think, should | 18.39 | 19.49 | * | McCain more social thinking |
| Insight | Realize, know | 2.01 | 2.27 | | |
| Causal | Because, reason | 2.48 | 1.67 | * | Obama more causal reasoning |
| Discrepancy | Would, could | 1.73 | 1.47 | | |
| Tentative | Maybe, perhaps | 2.21 | 2.12 | | |
| Certainty | Absolute, certainly | 1.43 | 1.66 | | |
| Inhibition | Blocked, stop | 0.75 | 0.94 | | |
| Inclusive words | With, and | 6.37 | 7.81 | ** | McCain over-inclusive |
| Exclusive words | Except, but | 2.88 | 2.29 | * | Obama more cognitively complex |
| Relativity | Times, going, over | 13.02 | 12.01 | * | |
| Motion | Went, fly | 2.52 | 2.10 | | |
| Space | Area, under | 6.09 | 5.92 | | |
| Time | Hour, clock | 4.15 | 3.96 | | |
| Content categories | | | | | |
| Work | Job, paycheck | 3.68 | 3.44 | | |
| Achievement | Try, succeed | 2.94 | 2.79 | | |
| Leisure | Games, TV | 0.38 | 0.25 | | |
| Home | Garage, yard | 0.32 | 0.48 | | |
| Money | Cash, debt | 2.66 | 2.15 | | |
| Religion | God, church | 0.11 | 0.05 | | |
| Death | Dead, cemetery | 0.18 | 0.18 | | |
The numbers refer to the percentage of total words. So, for example, 3.24% of all of McCain’s words were 1st person singular pronouns. These numbers were generated by the LIWC text analysis program (see www.liwc.net).
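Each cell in the table is a share of the speaker's total words. LIWC works from category dictionaries, and the arithmetic behind a cell can be sketched as: count the words that fall into a category, divide by the total word count, and multiply by 100. The sketch below is a minimal illustration under stated assumptions, not LIWC's implementation: the word lists in `CATEGORIES` are tiny, hypothetical stand-ins for the much larger real dictionaries, and the regex tokenizer is a simplification.

```python
import re
from collections import Counter

# Hypothetical, tiny stand-in word lists. The real LIWC dictionaries are far
# larger and also match word stems; these are for illustration only.
CATEGORIES = {
    "1st person singular": {"i", "me", "my", "mine"},
    "1st person plural": {"we", "us", "our", "ours"},
    "negations": {"no", "not", "never"},
    "articles": {"a", "an", "the"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens.

    Contractions such as "i've" stay single tokens, so they are only
    counted if they appear in a category's word list.
    """
    return re.findall(r"[a-z']+", text.lower())


def category_percentages(text: str) -> dict[str, float]:
    """Return, for each category, the percentage of all words it accounts for."""
    words = tokenize(text)
    if not words:
        return {name: 0.0 for name in CATEGORIES}
    counts = Counter(words)
    total = len(words)
    return {
        name: 100.0 * sum(counts[w] for w in vocab) / total
        for name, vocab in CATEGORIES.items()
    }


if __name__ == "__main__":
    sample = "I think we should not wait. We never get a second chance."
    for name, pct in category_percentages(sample).items():
        print(f"{name}: {pct:.2f}%")
```

The 12-word sample contains one "I" and two "we", so it scores about 8.33% on 1st person singular and 16.67% on 1st person plural. This is the same kind of percentage-of-total-words figure reported in the table.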
by Molly Ireland
Over the last couple of weeks, I’ve been analyzing the way the candidates speak in interviews and speeches using our text analysis program, LIWC. Now that the individual analyses are finished, here’s a summary of how the candidates differ from their running mates and the opposing team.
Averaged across interviews and speeches, McCain and Palin are less linguistically in sync than Obama and Biden, with 19 significant differences between the Republicans and 13 differences between the Democrats. As McCain pointed out, “What do you expect of two mavericks, to agree on everything? Eh?”
Looking at the candidates’ language from a broader perspective, there are 16 major differences between the Democrats and Republicans. In the table below you can see how the two tickets differ (word categories in each column were used significantly more by those candidates):
“I” and “we.” All four nominees come across as high status and somewhat distant, using “I” at relatively low rates (about 3%) and “we” at relatively high rates (also about 3%). The Republicans appear more personable than the Democrats, using more I-words on average. In terms of “I” use in interviews and speeches, Biden is the least approachable and McCain is the most. In terms of “we” use, Palin sounds more like Joe Six-Pack than the other candidates in speeches. She comes across as colder (more “we”) than the others in interviews, however. Obama, Biden, and McCain all use “we” at about the same rate.
Conjunctions and negations. Negations and conjunctions both track self-restraint: negations indicate self-control and inhibition, whereas a high rate of conjunctions is a hallmark of rambling. Palin appears to be the least restrained of the four candidates, linguistically and otherwise. She uses far more conjunctions and far fewer negations than the others, particularly in interviews. Biden, on the other hand, clearly likes to talk, but his language actually shows the most restraint: he uses the fewest conjunctions and the most negations. He says a lot, but he structures his sentences normally and is relatively self-controlled.
Cognitive mechanisms. Palin’s thought processes seem to be the least complicated of the four: she uses fewer words that refer to cognitive processes (insight, cause and effect relationships, inclusion, exclusion, and so on) than the other nominees. The only cognitive words she uses frequently are inclusive words (plus, and, with), which, like conjunctions, can indicate rambling. She uses particularly few inhibition and exclusive words. Using few exclusive words sometimes indicates dishonesty. Obama uses the most exclusive words of the four nominees, although the differences between Obama, Biden, and McCain are subtle.
Emotions. The Republican ticket is generally sunnier than the Democratic team (more positive emotion words, fewer mentions of negative emotions). McCain uses the most positive emotion words (happy, excited) of the four nominees. He also talks about sadness the most, however. Palin is the most uniformly cheerful candidate: she talks about positive emotions more than both Democrats and refers to anxiety, anger, and sadness less than McCain, Obama, and Biden. Biden's language is somewhat gloomy: he talks about positive emotions the least and negative emotions the most. Obama talks the most about anxiety, and Biden's language is the angriest.
Achievement. McCain appears to be more ambitious and focused on success than either Palin or his opponents. He uses words that refer to need for achievement (failure, win, success) much more than the others. In speeches, 4% of his words have to do with need for achievement. That’s very high. (Compare with Eliot Spitzer’s achievement language in his resignation speech.) Obama and Palin refer to achievement least often, and Biden is somewhere in the middle. All candidates used more achievement words in speeches. Similarly, McCain talks about money the most, nearly three times as much as Biden and Palin, while Obama talks about money a moderate amount.
The VP Debate: Biden vs Palin
October 2, 2008
It’s official. Biden and Palin speak differently, but not in the ways many people think. Biden uses language in a way that suggests he is more personal and honest (higher rates of “I”) and more socially engaged (third person pronouns), whereas Palin is surprisingly emotionally distant (more “we” words). Palin uses more positive emotion words than Biden but doesn’t differ from him in the use of negative emotions.
The most striking differences appeared for a variety of cognitive dimensions. As a thinker, Biden proved to be more specific and concrete (higher use of articles and nouns) and more concerned with specific numbers, and he showed signs of being more cognitively complex in talking about issues (exclusive words). Palin, on the other hand, proved to be someone who thinks more about the perspectives of others, especially her audience. Through her use of cognitive mechanism words (e.g., realize, think, believe), she was subtly acknowledging that there were different answers or approaches to the issues. She is not the narrow-minded true believer many of her critics were hoping to see. Not surprisingly, Palin was not as crisp or polished in her thinking and in her answers as Biden. The best evidence for this was her complicated sentence structure (prepositions) and high rates of conjunctions and inclusive words, which are general markers of rambling and vagueness.
More to follow. But in the meantime, here are the numbers:
| Category | Examples | Biden | Palin | Sig. | Interpretation |
|---|---|---|---|---|---|
| Word count | | 7372 | 7741 | * | Palin talks more |
| Words per sentence | | 16.13 | 19.50 | ** | Palin longer sentences |
| Big words (over 6 letters) | | 16.05 | 16.81 | | Palin bigger words |
| Personal pronouns | | 9.06 | 8.68 | | |
| 1st person singular | I, me, my | 3.02 | 2.31 | * | Biden more personal |
| 1st person plural | We, our | 2.18 | 3.51 | ** | Palin more formal, distant |
| 2nd person | You, yours | 1.25 | 1.51 | | Palin more aggressive, pointed |
| 3rd person singular | He, she, her | 1.34 | 0.89 | * | Biden more social |
| 3rd person plural | They, them | 1.26 | 0.45 | * | Biden more social |
| Indefinite pronouns | It, those | 6.12 | 7.76 | ** | Palin more vague |
| Articles | A, the | 7.21 | 5.65 | ** | Biden more concrete, less abstract |
| Verbs | Walk, went | 16.02 | 15.22 | | Biden more dynamic |
| Auxiliary verbs | Is, have | 10.46 | 10.09 | | |
| Past tense | Was, gave | 3.89 | 3.02 | | |
| Present tense | Am, is | 9.63 | 9.97 | | |
| Future tense | Will | 1.28 | 0.98 | | |
| Common adverbs | Very, really | 4.07 | 6.19 | ** | Palin more “flowery” |
| Prepositions | To, for, of | 13.51 | 14.66 | * | Palin more detailed |
| Conjunctions | And, or, whereas | 5.64 | 8.16 | ** | Palin more extended sentences |
| Negations | No, not, never | 2.20 | 1.51 | * | Biden censoring himself |
| Quantifiers | Much, few | 2.22 | 2.61 | | |
| Numbers | Six, 12 | 2.43 | 0.92 | ** | Biden more specific |
| Social references | Friend, we, talk | 10.66 | 10.75 | | |
| Overall emotion words | Happy, hurt, kill | 4.69 | 5.58 | | |
| Positive emotions | Happy, nice | 3.35 | 4.25 | * | Palin more positive |
| Negative emotions | Sad, nasty, bad | 1.45 | 1.34 | | |
| Anxiety, fear | Worry, scared | 0.12 | 0.13 | | |
| Anger | Angry, hate | 0.64 | 0.78 | | |
| Sadness | Depressed, cry | 0.15 | 0.14 | | |
| Cognitive mechanisms | Think, should | 16.26 | 18.41 | ** | Palin more social thinking |
| Insight | Realize, know | 1.55 | 1.87 | | |
| Causal | Because, reason | 2.02 | 2.07 | | |
| Discrepancy | Would, could | 1.56 | 1.69 | | |
| Tentative | Maybe, perhaps | 1.75 | 1.38 | | |
| Certainty | Absolute, certainly | 1.84 | 1.49 | | |
| Inhibition | Blocked, stop | 0.56 | 0.44 | | |
| Inclusive words | With, and | 5.41 | 7.67 | ** | Palin over-inclusive |
| Exclusive words | Except, but | 2.62 | 2.36 | * | Biden more cognitively complex |
| Relativity | Times, going, over | 12.09 | 12.58 | | |
| Motion | Went, fly | 2.16 | 2.20 | | |
| Space | Area, under | 5.83 | 6.72 | | |
| Time | Hour, clock | 3.95 | 3.46 | | |
| Content categories | | | | | |
| Work | Job, paycheck | 3.54 | 3.95 | * | Palin focuses on work and jobs |
| Achievement | Try, succeed | 2.03 | 2.79 | * | Palin higher in achievement motives |
| Leisure | Games, TV | 0.20 | 0.68 | | |
| Home | Garage, yard | 0.53 | 0.37 | | |
| Money | Cash, debt | 2.18 | 1.89 | | |
| Religion | God, church | 0.20 | 0.09 | | |
| Death | Dead, cemetery | 0.39 | 0.25 | | |
The numbers represent the percentage of total words that were spoken by the candidate. So, for example, 2.79% of all the words used by Sarah Palin were associated with achievement (words like try, succeed, win) compared with 2.03% of Biden’s words. These numbers were generated by the computerized text analysis program LIWC, or Linguistic Inquiry and Word Count (www.liwc.net).
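Because every figure is a percentage of the speaker's total words, it can be converted back into a rough word count by multiplying by that speaker's word count from the first row of the table. The short sketch below does this for the achievement row using the debate figures reported above. The helper name `approx_count` is hypothetical, and the results are approximations because the published percentages are rounded to two decimals.

```python
# Word counts and achievement percentages taken from the VP debate table.
WORD_COUNTS = {"Biden": 7372, "Palin": 7741}
ACHIEVEMENT_PCT = {"Biden": 2.03, "Palin": 2.79}


def approx_count(pct: float, total_words: int) -> int:
    """Convert a percentage-of-total-words figure back into an approximate word count."""
    return round(pct / 100 * total_words)


for speaker, total in WORD_COUNTS.items():
    n = approx_count(ACHIEVEMENT_PCT[speaker], total)
    print(f"{speaker}: about {n} achievement words out of {total}")
```

This works out to roughly 150 achievement words for Biden and roughly 216 for Palin. The gap in raw counts is larger than the gap in percentages, partly because Palin spoke more words overall.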
James W. Pennebaker
Language in Speeches vs. Interviews, Part 4: Sarah Palin
October 2, 2008
by Molly Ireland
As we’ve done for the other presidential and vice presidential candidates, I used our text analysis program, LIWC (www.liwc.net), to analyze about 20,000 words of Sarah Palin’s language in speeches and interviews. While the other three candidates’ language changed in most categories when they left the stage, Palin remains the same more often than not. It’s tempting to call this another example of Palin’s folksy, frontierswoman honesty, but the evidence suggests that may not always be the case.
Here’s a quick summary of the major differences between Palin’s language in interviews and speeches (words in each column are those that were used significantly more in that context; asterisks denote a nonsignificant but meaningful trend):
Exclusive words. People use exclusive words (versus, but, either) to divide the world into clear and distinct categories: right vs. wrong, us but not them, a burger without mustard, and so on. In speeches, Palin uses exclusive words at an extremely low rate, 2.1% — much lower than the average for spoken conversation, 3.3%.
But how exclusive is she when she’s being herself? The other three candidates this year all spoke more like normal people in interviews than in speeches. If Palin’s language follows the same pattern, she will use more exclusive words in interviews. Palin’s political positions (she’s for banning gay marriage and excluding polar bears from the endangered species list) give us even more reason to expect high – or at least normal – levels of exclusiveness in interviews.
That reasonable prediction, it turns out, is wrong. Palin — unlike Obama, Biden, and her running mate — uses slightly fewer exclusive words in interviews than in speeches. The other presidential and vice presidential candidates all use significantly more exclusive words in interviews than in speeches. Most people, in fact, use exclusive words more often in conversation than in formal contexts, like speeches. In general, Palin uses fewer exclusive words than Obama, Biden, and McCain in interviews (2.1% vs. 2.8%). She also uses exclusive words at about half the normal rate (Palin uses 1.9%; 3.3% is average).
What does this mean? It could simply mean that Sarah Palin doesn’t think about the world in terms of divisions and distinct categories. Exclusive words don’t only indicate exclusiveness, however. They also indicate cognitive complexity, and help us estimate the truthfulness of a statement or story. It’s easier to keep your story straight if it’s simple, so people tend to use fewer exclusive words and speak in simple sentences when they lie.
So which is she, deceptive or just not very exclusive? Looking at her debates and interviews, her lack of complexity seems more evasive than non-exclusive. For example, in the following excerpt from an interview with Katie Couric on September 29th, Palin responds to Couric’s uncontroversial question about where she gets her news:
PALIN: I’ve read most of them, again with a great appreciation for the press, for the media
COURIC: What, specifically?
PALIN: Um, all of them, any of them that have been in front of me all these years.
COURIC: Can you name a few?
PALIN: I have a vast variety of sources where we get our news. Alaska isn’t a foreign country, where it’s kind of suggested, “Wow, how could you keep in touch with what the rest of Washington, D.C. may be thinking when you live up there in Alaska?” Believe me, Alaska is like a microcosm of America.
In this excerpt and on average across all of her interviews, Palin’s language fits the basic deceptive profile. She doesn’t use a single exclusive word in the response above, and she uses “I” fairly rarely (4%; 6.3% is average), both indicators of dishonesty.
Logic backs up the linguistic evidence. The only reasonable explanation for her response is that Palin did not, for whatever reason, want to reveal her specific news sources. It is almost inconceivable that a vice presidential candidate – with a BA in Journalism, no less – could not recall a single specific news source. Palin’s response clearly shows some degree of dishonesty. While evasive non-responses aren’t the same as outright lies, they do fall under the broad umbrella of deception.
First person singular and tentativeness: “I’ll get back to you.” For Palin, the similarities between speeches and interviews are as revealing as the differences. Interviews are usually a chance for politicians to step down from their pedestals and reveal to their constituency that they’re just like them. Obama, McCain, and Biden all take advantage of this opportunity by using first person singular pronouns (I-words like I, me, my, and mine) and tentative words (maybe, guess, careful) more often in interviews than in speeches.
Contrary to what you might expect for a relatively green politician in a series of difficult interviews, Palin used “I” and tentative words rarely in both interviews and speeches. In other words, Palin has been as unblinkingly confident in face-to-face conversations as she has been in speeches written by people who are paid to make Palin appear vice presidential. Using “I” an average amount communicates honesty and makes candidates seem more approachable; higher than average I-word use is associated with negative emotions, and extremely high “I” use is a sign of depression and neuroticism. Palin resists the pressure to become self-focused or tentative even during embarrassing exchanges like the following (first person singular pronouns in bold; there are no tentative words):
GIBSON: But it’s now pretty clearly documented. You supported that bridge before you opposed it. You were wearing a t-shirt in the 2006 campaign, showed your support for the bridge to nowhere.
PALIN: I was wearing a t-shirt with the zip code of the community that was asking for that bridge. Not all the people in that community even were asking for a $400 million or $300 million bridge.
Her low use of I-words is especially striking given that she’s a relatively young woman. Women and younger people tend to use “I” significantly more than men and older people, yet Palin uses fewer I-words in interviews than either Obama or McCain (3.5% vs. 4.5% for Obama and 5.1% for McCain). Unusually low I-word use is usually interpreted as a sign of high status, deception, or both. Very high status people rarely refer to themselves because they’re primarily focused on managing subordinates. Deceptive people avoid using “I” in an attempt to psychologically distance themselves from their lie.
Whether Palin is more deceptive than other candidates this year is anyone’s educated guess. What is clearer is that failing to use “I” more often, especially in interviews, is a missed opportunity. More importantly, it’s an opportunity that Obama and Biden both seized. Using “I” less often in interviews has probably isolated her further from supporters who have been disappointed by Palin’s recent interview fumbles. Palin electrified audiences at the Republican National Convention, and her audience identified with her at least in part because she used “I” slightly more often and “we” less often in her speech. In other words, she was more conversational, average, and approachable when she was reading a teleprompter. If she can somehow communicate in interviews the way she has in speeches, she may regain some of her lost ground.
First person plural: “We want to see that drilling.” Strikingly, Palin – unlike Obama, Biden, and McCain – used first person plural (we, our) significantly more in interviews than in speeches. Using “we” more often than “I” tends to put a chilly distance between a speaker and their audience. In speeches the royal “we” is sometimes warranted, but choosing “we” over “I” in less formal interview situations tends to alienate audiences. In her interview with CNBC’s Maria Bartiromo, Palin used we-words a whopping five times as often as she used I-words. In this excerpt from Bartiromo’s August 29th interview with Palin, “we,” the Alaskan people, approve of drilling on the Arctic National Wildlife Refuge (ANWR) (first person plural in bold):
No one but Alaskans will care more to make sure that we are preserving that pristine environment that is ANWR … And with Alaskans’ love and care for our environment and our lands and our wildlife, Alaskans are saying, “Yes, because we believe that it can be done safely, prudently, and it had better be done ethically also. Yes, we want to see that drilling.”
Rather than recalling specific people she has personally spoken with, she speaks for the citizens of Alaska in nonspecific blanket statements. Rather than citing evidence, she makes the simple argument that Alaskans wouldn’t support drilling in ANWR if it were unethical. The case is hardly closed, and it’s partly due to her ineffective use of pronouns.
Although Palin’s language changes very little when she transitions from speeches to interviews, she is making some progress. In the first of her three interviews with ABC’s Charlie Gibson, Palin used “we” much more often than “I” (5% vs. 2.8%). Palin was widely criticized for her rambling, scripted responses and inability to answer questions about the Bush Doctrine in that first interview. In the second and third installments she appeared more comfortable and less speechy. As she seemed to relax into her role, she adopted a warmer and more personal speaking style, using “I” about as often as “we.” Luckily for the McCain camp, her “we” use dropped from 5% of her words to a more human 2.3% in the third Gibson interview and was 2.2% on average in the recent series of Couric interviews. Even with this improvement, in interviews she still uses “we” more than twice as often as average people (2.9%; 1.1% is average).
Summary. Looking at the candidates’ first person pronoun use we can see that Palin, unlike each of the other candidates we’ve analyzed, is more formal in face-to-face interviews than she is on stage. This pronoun pattern – using “we” more in conversation than in formal settings – is the opposite of what we find in the general population. In interviews, Palin uses “I” half as often as an average person and “we” nearly three times more than average. In speeches, on the other hand, her “I” use is fairly effective, and is similar to that of Obama, Biden, and McCain. Overall she uses very few exclusive words, which indicates less cognitive complexity, less exclusiveness, and, possibly, deception. Using less “I” and fewer exclusive words is a hallmark of deceptive language or spin. She might find more empathy than pity in her audience if, when cornered, she admitted her shortcomings rather than unsuccessfully evading the truth.