Henry Kissinger’s Sentiments Are Not An Exception!

Posted on September 12, 2012 by Walid Saba






Background

I have received many comments on the short article “Henry Kissinger vs. Sentiment Analysis” (Saba, 2012).  The main point of that article was that the sentence in (1) conveys an extremely positive sentiment about the United States, although a purely quantitative (statistical or machine learning) approach cannot make this inference due to the surrounding negative words in the context.

1. The US is the worst place to live in, until you try living anywhere else.
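To make the failure mode concrete, here is a minimal sketch of a word-counting scorer. The lexicon and its weights are invented for illustration and are not taken from any real system; actual statistical or machine learning systems are far more elaborate, but the sketch shows how surface polarity alone pulls sentence (1) toward the negative label.

```python
import re

# Hypothetical toy lexicon (an assumption for this sketch, not a real resource).
TOY_LEXICON = {
    "worst": -2,
    "bad": -1,
    "expensive": -1,
    "curse": -1,
    "great": 2,
    "best": 2,
    "love": 2,
}


def lexicon_score(text):
    """Sum the polarity weights of the words in `text`, ignoring word order and context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(token, 0) for token in tokens)


def label_for(score):
    """Map a numeric score to a coarse sentiment label."""
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


sentence = "The US is the worst place to live in, until you try living anywhere else."
score = lexicon_score(sentence)
label = label_for(score)
print(score, label)
```

The only word the toy lexicon recognizes is "worst", so the scorer reports a negative sentiment; nothing in a bag of words captures the reversal introduced by "until you try living anywhere else".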

The argument made in (Saba, 2012) was this: sentiment analysis presupposes understanding ordinary spoken language, and is in fact much harder, as it requires understanding metaphor, sarcasm, irony, etc. Since we do not yet have systems that can understand even simple and ordinary spoken language, there cannot be any system that does serious and meaningful sentiment analysis. While most of the comments received were in agreement, some questioned the percentage of such examples in everyday language use. These (noteworthy) comments can be summarized as follows:

If the (Henry Kissinger, the corner table, and the iPhone) examples I provided are not an exception but actually constitute a large percentage of everyday language use, then the argument that there cannot be sentiment analysis before we have full natural language understanding is accepted. If, on the other hand, these examples are a small percentage of everyday language use, then statistical and machine learning approaches can indeed do a decent job in inferring positive and negative sentiment in text.

This, I concede, is a sound observation that is worthy of discussion.

In what follows I hope to show that examples involving sarcasm, metonymy, irony, metaphor, etc. are not rare or exceptional, but are in fact quite common in everyday language use. Programs that process and comprehend such sentences must therefore have access to a massive amount of “commonsense knowledge” and will need to make very complex inferences to decode what is hidden or missing from the utterances people make.

Once the argument is made that such utterances are not an exception in everyday language use, but are in fact quite common, there remains one argument that can support the claim that there are currently systems that can do meaningful sentiment analysis: to prove that there are systems that can understand one simple question posed in ordinary spoken language, which is a challenge that I will gladly accept.

Non-Literal Meaning In Everyday Language Use

The sentence in (1), which expresses a positive sentiment towards the US despite the apparent use of negative terminology, uses what is technically referred to in linguistics as “sarcasm”. Other examples that require deep natural language understanding (involving the use of commonsense and logical reasoning) involve the use of “metonymy”. For example, consider the following utterances, which are quite common and not at all exotic or rare:

2. I don’t like Barcelona; Real Madrid was always my favorite.

3. Americans do not like to talk about Vietnam; it brings back so many bad memories.

Clearly, the (very ordinary and common!) utterance in (2) is not in any way conveying a negative sentiment about the city of Barcelona, but about the football (soccer!) team of that city. The speaker could very well like the city of Barcelona, but it happens that his favorite team is Real Madrid. Similarly for (3). The negative context surrounding ‘Vietnam’ is no indication at all of American sentiment towards the country, but towards a particular event, namely the long and very bloody Vietnam War.

This use of one entity to refer to another is quite common. In fact, some researchers claim that the percentage of metonymy in ordinary language use is (alone!) somewhere between 17% and 20% (for an example, see Markert and Nissim, 2006). This percentage is much higher in other forms of text, such as political talk, poetry, the arts, etc. The same applies to sarcasm and irony, which Chin (2011) says is “practically the primary language” in modern society. Like metonymy, sarcasm is quite common, yet it presents sentiment analysis with a monumental challenge. Consider these very ordinary and quite common utterances:

4. Yes, Porsche is too expensive, but let’s face it, it is one hell of a car.

5. There’s so much to curse and nag about in New York, but if I leave for one week, I miss it.

The context surrounding ‘Porsche’ and ‘New York’ in (4) and (5) seems to be quite negative, yet a full understanding that relies on background and commonsense knowledge would allow us to make the right inferences – that Porsche is a great car and that, despite all the negatives, New York City has a lot to offer!

Besides sarcasm, irony and metonymy, which alone account for more than 30% of ordinary language use, metaphor is also quite common in text, amounting to more than 50% of ordinary language use, according to some researchers (see, for example, Lakoff and Johnson, 1980; Paloma Úbeda Mansilla, 2003)[1]. Sentences that are full of metaphorical use, such as that in (6), are also quite common and are also beyond any quantitative and machine learning approaches to sentiment analysis.

6. Man, look at that wild and crazy thing, she is a knockout!

Concluding Remarks

To summarize, studies indicate that, collectively, metonymy, metaphor, sarcasm, irony, and other forms of non-literal use of words are not an exception but the norm in ordinary spoken language. In fact, ordinary, simple and straight-to-the-point language is the exception (unless the target audience is young children who have not yet mastered the interpretation of such language). This is especially the case when it comes to social media jargon, where sarcasm, irony and other forms of non-literal language are the norm. Assuming these forms of language use are an integral part of ordinary text (say 60% of language use), a sentiment analysis system built on statistical and machine learning methods with 80% accuracy can at best make around 60% accurate inferences regarding sentiment. This is only slightly better than making random picks between heads (negative sentiment) and tails (positive sentiment)!
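One way to arrive at a figure of roughly 60% is sketched below. It rests on two assumptions that the text does not spell out: that the 80% accuracy holds only on literal text, and that on non-literal text the system does no better than a coin flip (50%). The function name and its parameters are illustrative only.

```python
def blended_accuracy(nonliteral_share, literal_accuracy, nonliteral_accuracy=0.5):
    """Overall accuracy as a weighted average over literal and non-literal text.

    Assumption (not stated in the article): the system reaches `literal_accuracy`
    only on literal text and `nonliteral_accuracy` (chance, by default) elsewhere.
    """
    literal_share = 1.0 - nonliteral_share
    return literal_share * literal_accuracy + nonliteral_share * nonliteral_accuracy


# 60% non-literal text, 80% accuracy on the remaining literal 40%.
estimate = blended_accuracy(nonliteral_share=0.60, literal_accuracy=0.80)
print(f"{estimate:.2f}")
```

Under these assumptions the estimate is 0.40 × 0.80 + 0.60 × 0.50 = 0.62, which is in line with the "around 60%" figure and only 12 points above pure chance.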

One final word regarding this subject: as we mentioned in an earlier article, we are not in any way questioning the value of work in natural language processing. We ourselves are actively working in this field and we have recently developed a semantic engine that is excellent at inferring the “aboutness” of a piece of text, producing an intelligent summary, identifying the key topics, extracting entities and inferring their type, as well as relating textual objects semantically.

Inferring the aboutness of a piece of text, and semantically/topically relating text, can be done, although it has not been perfected. However, saying that there are currently systems that can infer from what we write how we ‘feel’ about certain entities is not only inaccurate, but is harmful. When expectations are not met, it will not be easy to recover when the time comes to do “real” sentiment analysis.

References

Chin, R. (2011), The Science of Sarcasm? Yeah, Right, Science and Nature, Nov. 4, 2011

Gibbs, R. W. and Colston, H. L. (2001), The Risks and Rewards of Ironic Communication, In L. Anolli, R. Ciceri and G. Riva (Eds.), New perspectives on miscommunication, IOS Press, 2001

Lakoff, G. and Johnson, M. (1980), Metaphors we Live by, University of Chicago Press.

Markert, K. and Nissim, M. (2006), Metonymic Proper Names: A Corpus-based Account, In A. Stefanowitsch, editor, Corpora in Cognitive Linguistics. Vol. 1: Metaphor and Metonymy. Mouton de Gruyter, 2006.

Paloma Úbeda Mansilla (2003), Metaphor at work: a study of metaphors used by European architects when talking about their projects, IBÉRICA 5

Saba, W. (2012), Henry Kissinger vs. Sentiment Analysis, SEO Journalist, September 2012, http://goo.gl/1eQqJ


[1] Note that some forms of metonymy are special cases of metaphor, so these two sets are not mutually exclusive.


Written by Walid Saba

Currently the Chief Information Officer of Pragmatech, I received my PhD in Computer Science from Carleton University in 1999. I spent over 18 years in industry working at such places as the American Institutes for Research, AT&T Bell Labs, MetLife, Nortel Networks, IBM and Cognos.

I also taught Computer Science at the University of Windsor, the New Jersey Institute of Technology (NJIT), the American University of Beirut, and the American University of Technology. My research interests are in computational semantics, ontology and intelligent agents. I have over 30 publications, including an award-winning paper that was presented at the 31st Annual German Conference on Artificial Intelligence (KI-2008).
walid.saba@pragma-tech.com

Comments (1)

 

  1. Mike McMaster says:

    This is an interesting article, and does indeed flag up valid concerns about automated sentiment analysis. However, let’s not be too hasty in writing off the whole thing, just because it is difficult…
    “Yes, Porsche is too expensive, but let’s face it, it is one hell of a car”
    So the argument is that you need lots of context to understand what the comment says, and automated Sentiment Analysis can’t handle it.
    Hmm, I’m not so sure.
    This comment seems to be saying two things:
    a) Porsche is too expensive
    b) it is one hell of a car
    So the comment is pretty positive about the car, but it is ALSO pretty negative about the price.
    Where most sentiment analysis breaks down is that it is trying to assess whether the comment is positive or negative overall, when this comment is clearly BOTH.
    At Rapide, we call these “Insights”, and we analyse verbatims to find them.
    Because even your customers who love you may have niggles and specifics that they dislike, and even your customers who are unhappy may actually admit that your staff (for example) were great, but it is the policy they are forced to implement which is hopeless.
    Sentiment analysis is powerful when it digs deeper than an overall score and goes looking for actionable specifics, and it is dynamite when you combine it with human knowledge, experience and context.
    We use our Sentiment Engine (Rant and Rave) to do the heavy lifting and process huge volumes of real-time feedback, so that your team can focus on tackling the actions and driving value out of all these interactions.
