Machine learning can measure mood on social media

Mood offers researchers a distinctive way to gauge the impact of natural or human-made disasters on people. However, it is impossible to ask every person in the world how they feel after a major event.

But scientists from the Massachusetts Institute of Technology, the Chinese Academy of Sciences and the Max Planck Institute for Human Development have found a workaround. They used machine learning techniques to analyze social media for shifts in sentiment after the first wave of COVID-19 in 100 different countries, and used the results to get real-time readings on how pandemic-related events made people happier or sadder across the world. Think of the process as an AI-powered mood ring, but for millions of people. Their findings were published last week in the journal Nature Human Behaviour.

Not surprisingly, the researchers found that the onset of the pandemic precipitated a dramatic drop in happiness. To put this plunge into perspective, consider that in a normal week, people tend to feel happier on the weekends and less happy on Mondays. The drop in happiness at the start of the pandemic, around March 2020, was four to five times greater than the average drop in happiness from a normal weekend to Monday. The overall mood shift due to the pandemic was also greater than the mood shifts previously seen in response to a natural disaster like a hurricane or a sharp rise in temperatures. The countries that experienced the biggest mood swings were Australia, Spain, the UK and Colombia, while Bahrain, Botswana, Greece, Oman and Tunisia appeared to be the least affected by the pandemic, according to the researchers’ observations on social media.

How did machines learn to rate posts by mood?

For this study, the team used social media data from Twitter and Weibo collected by the Harvard Center for Geographic Analysis Geotweet Archive and the MIT Sustainable Urbanization Lab. In total, their dataset contained 654 million geotagged posts from 10.56 million individuals in the first five months of 2020.

To teach a machine to measure mood, the researchers started by creating a sentiment index, much like a facial pain scale at the doctor’s office. This sentiment index ranges from 0 (very unhappy) to 100 (very happy). Every post the team collected from Twitter and Weibo was scored on this index. The researchers could then aggregate the post-level scores into a sentiment profile for an individual, neighborhood, city, or country.
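The aggregation step can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual pipeline: the post records, the user and country codes, and the choice to average each person's posts before averaging across people are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical posts: (user_id, country, sentiment score on the 0-100 index).
posts = [
    ("u1", "AU", 40.0),
    ("u1", "AU", 60.0),
    ("u2", "AU", 80.0),
    ("u3", "ES", 30.0),
]

def aggregate(posts):
    """Average post scores per user, then average user means per country."""
    scores_by_user = defaultdict(list)
    country_of_user = {}
    for user, country, score in posts:
        if not 0 <= score <= 100:
            raise ValueError(f"score outside the 0-100 index: {score}")
        scores_by_user[user].append(score)
        country_of_user[user] = country

    user_mean = {user: mean(scores) for user, scores in scores_by_user.items()}

    means_by_country = defaultdict(list)
    for user, value in user_mean.items():
        means_by_country[country_of_user[user]].append(value)
    country_mean = {c: mean(values) for c, values in means_by_country.items()}
    return user_mean, country_mean

user_mean, country_mean = aggregate(posts)
print(user_mean)     # per-person profile
print(country_mean)  # per-country profile
```

Averaging per person first keeps a handful of very active accounts from dominating a country's score. Whether the researchers weighted their data this way is not stated here.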

Unlike with a pain scale, individuals do not rate their own posts or respond to satisfaction surveys. Instead, the researchers used a machine learning method to assign each post a topic and a sentiment score.

Specifically, they used a machine learning-based natural language processing technique called BERT, or Bidirectional Encoder Representations from Transformers, to categorize posts by topic and sentiment. (BERT was developed by Google engineers.)

“We wanted to do this global study to compare different countries because they were hit by the pandemic at different times, and they have different cultures, different political systems, and different healthcare systems,” says MIT professor Siqi Zheng. All of these factors could play a role in how people’s moods have been influenced by the pandemic.

Because they wanted to do a multilingual analysis, they couldn’t use their previous dictionary-based approach, which they had used in a 2019 study to quantify the emotional toll of air pollution in China. The dictionary approach assumes that words have connotations associated with a particular emotion. It relies on tools such as LIWC (the Linguistic Inquiry and Word Count software) and emoji dictionaries. The downside of this approach is that researchers have to compile long lists of words, and they have to come up with a different list for each language they want to study.
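To see why the dictionary approach scales poorly across languages, consider a toy version in Python. The word lists, the scoring rule and the function name `dictionary_score` are invented for illustration; real tools such as LIWC rely on much larger, curated lexicons and more careful text handling.

```python
# Toy lexicons, one per language. Each word carries +1 (positive) or -1 (negative).
LEXICONS = {
    "en": {"happy": 1, "great": 1, "sad": -1, "afraid": -1},
    "es": {"feliz": 1, "genial": 1, "triste": -1},
}

def dictionary_score(text, lang):
    """Map a post to the 0-100 index from the average polarity of known words."""
    lexicon = LEXICONS[lang]  # raises KeyError if no word list exists for lang
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return 50.0  # no known words: treat the post as neutral
    return 50.0 + 50.0 * sum(hits) / len(hits)

print(dictionary_score("So sad today.", "en"))
print(dictionary_score("Estoy feliz!", "es"))
```

A post in a language without an entry in `LEXICONS` cannot be scored at all until someone compiles a new word list, which is the limitation described above.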

The advantage of using machine learning is that it is not language specific. Before applying the technique to the entire dataset, the researchers trained the model on a small sample of posts. Human researchers then verified its work: the model predicted the sentiment of randomly selected posts, and its accuracy was compared with that of the dictionary model.
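A comparison of this kind comes down to checking each method's predictions against human judgments on the same posts. The sketch below assumes simple positive/negative labels and uses made-up values purely to show the calculation; it says nothing about the accuracy the researchers actually observed.

```python
def accuracy(predicted, human):
    """Share of posts where the predicted label matches the human label."""
    if len(predicted) != len(human):
        raise ValueError("label lists must have the same length")
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)

# Made-up labels for five randomly chosen posts (illustrative only).
human_labels      = ["pos", "neg", "neg", "pos", "neg"]
model_labels      = ["pos", "neg", "neg", "pos", "pos"]
dictionary_labels = ["pos", "pos", "neg", "neg", "pos"]

print("model accuracy:", accuracy(model_labels, human_labels))
print("dictionary accuracy:", accuracy(dictionary_labels, human_labels))
```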

This paper on social media responses to COVID-19 is just one result of a long-term project in Zheng’s lab called “Global Sentiment,” which aims to use natural language processing techniques to extract information about subjective well-being from social media posts. Her lab uses this analysis of social media mood to examine responses to various events, including wildfires, environmental hazards, natural disasters and political news.

“It’s a way of providing a unique angle, a different dimension to quantify the impact of shocks,” she says. Zheng and her colleagues have posted more detailed descriptions of the code and methods used in their studies on the Global Sentiment website.

Sherry J. Basler