SocialSim researchers demonstrate deep learning model capable of accurately classifying sarcasm in textual communications, addressing online sentiment analysis roadblock

OUTREACH@DARPA.MIL
5/6/2021

Computational Simulation of Online Social Behavior (SocialSim)

Sentiment analysis – the process of identifying positive, negative, or neutral emotion in text – has become a growing focus for both the commercial and defense communities. Understanding the sentiment of online conversations can help businesses process customer feedback and gather insights to improve their marketing efforts. From a defense perspective, sentiment can be an important signal in online information operations, helping to identify topics of concern or the possible actions of bad actors. The presence of sarcasm – a linguistic expression often used to communicate the opposite of what is said, typically with the intention to insult or ridicule – in online text is a significant hindrance to sentiment analysis. Detecting sarcasm is difficult largely because of the inherent ambiguity of sarcastic expressions.

“Sarcasm has been a major hurdle to increasing the accuracy of sentiment analysis, especially on social media, since sarcasm relies heavily on vocal tones, facial expressions, and gestures that cannot be represented in text,” said Brian Kettler, a program manager in DARPA’s Information Innovation Office (I2O). “Recognizing sarcasm in textual online communication is no easy task as none of these cues are readily available.”

Researchers from the University of Central Florida working on DARPA’s Computational Simulation of Online Social Behavior (SocialSim) program are developing a solution to this challenge in the form of an AI-enabled “sarcasm detector.” The researchers have demonstrated an interpretable deep learning model that identifies words from input data – such as Tweets or online messages – that exhibit crucial cues for sarcasm, including sarcastic connotations or negative emotions. Using recurrent neural networks and attention mechanisms, the model tracks dependencies between the cue-words and then generates a classification score, indicating whether or not sarcasm is present.
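The core idea – score each word's relevance with an attention mechanism, then combine the weighted words into a single classification score – can be sketched in a few lines of Python. The sketch below is illustrative only: the vocabulary, embeddings, and weights are random placeholders (a trained model would learn them, and the published model also uses recurrent layers to track dependencies between cue-words, omitted here for brevity). It shows how the attention weights double as per-word cue scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary with random word embeddings; a real system would use
# pretrained embeddings or a language model. All values are placeholders.
EMB_DIM = 8
vocab = ["oh", "great", "another", "monday", "love", "traffic"]
embeddings = {w: rng.normal(size=EMB_DIM) for w in vocab}

# Illustrative parameters; in the actual model these are learned.
W_attn = rng.normal(size=EMB_DIM)   # attention scoring vector
W_out = rng.normal(size=EMB_DIM)    # classifier weights
b_out = 0.0

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sarcasm_score(tokens):
    """Score a sentence and return per-word attention weights.

    The attention weights highlight which words the classifier relied
    on, which is the source of the model's interpretability."""
    X = np.stack([embeddings[t] for t in tokens])  # (n_words, EMB_DIM)
    weights = softmax(X @ W_attn)                  # one weight per word
    context = weights @ X                          # attention-weighted sum
    score = 1.0 / (1.0 + np.exp(-(context @ W_out + b_out)))  # sigmoid
    return float(score), dict(zip(tokens, weights))

score, attn = sarcasm_score(["oh", "great", "another", "monday"])
print(f"sarcasm probability: {score:.2f}")
print("cue-word weights:", {w: round(float(a), 2) for w, a in attn.items()})
```

With trained parameters, a sarcastic message like "oh great, another monday" would concentrate attention weight on cue-words such as "great," making the model's decision directly inspectable.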

“Essentially, the researchers’ approach is focused on discovering patterns in the text that indicate sarcasm. It identifies cue-words and their relationship to other words that are representative of sarcastic expressions or statements,” noted Kettler.

The researchers’ approach is also highly interpretable, making it easier to understand what’s happening under the hood. Many deep learning models are regarded as “black boxes,” offering few clues to explain their outputs or predictions. Explainability is key to building trust in AI-enabled systems and enabling their use across an array of applications. Existing deep learning network architectures often require additional visualization techniques to provide a certain level of interpretability. To avoid this, the SocialSim researchers employed an inherently interpretable self-attention mechanism that makes it easy to identify which elements of the input data are crucial for a given task. The approach is also language agnostic, as it can work with any language model that produces word embeddings.

The team demonstrated the effectiveness of their approach by achieving state-of-the-art results on multiple datasets from social networking platforms and online media. The model successfully predicted sarcasm, achieving a nearly perfect detection score on a major Twitter benchmark dataset as well as state-of-the-art results on four other significant datasets. The team leveraged publicly available datasets for this demonstration, including the Sarcasm Corpus V2 Dialogues dataset, part of the Internet Argument Corpus, as well as a news headline dataset drawn from The Onion and HuffPost.
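Benchmark results like those above are typically reported with metrics such as the F1 score, which balances precision (how many predicted-sarcastic items really are sarcastic) against recall (how many sarcastic items are caught). The article does not specify the exact metric used, so the following is a generic sketch of how a sarcasm detector's predictions might be scored, with made-up toy labels:

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 = sarcastic and 0 = not sarcastic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative labels only, not real benchmark data.
y_true = [1, 0, 1, 1, 0, 0]  # ground-truth annotations
y_pred = [1, 0, 1, 0, 0, 1]  # detector output
print(f"F1 = {f1_score(y_true, y_pred):.2f}")  # → F1 = 0.67
```

A "nearly perfect" detection score would correspond to an F1 approaching 1.0 on the held-out benchmark.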

DARPA’s SocialSim program is focused on developing innovative technologies for high-fidelity computational simulation of online social behavior. A simulation of the spread and evolution of online information could enable a deeper and more quantitative understanding of adversaries’ use of the global information environment. It could also aid in efforts to deliver critical information to local populations during disaster relief operations, or contribute to other critical missions in the online information domain. Accurately detecting sarcasm in text is only a small part of developing these simulation capabilities due to the extremely complex and varied linguistic techniques used in human communication. However, knowing when sarcasm is being used is valuable for teaching models what human communication looks like, and subsequently simulating the future course of online content.