What Social Media Analysis can tell us about Early Detection of COVID-19, Legionnaires’ Disease and other Anomalies.
The analysis of social media data is a powerful tool for the early detection of key events. Studying specific keywords on Twitter suggests an unusual increase in pneumonia cases in late 2019, most likely linked to the COVID-19 pandemic (an interesting idea outlined in a Nature paper in January 2021).
The analysis focuses on the number of tweets and users mentioning specific keywords during a given period of time. To showcase the power of our tools, we have reproduced part of that analysis and extended it with our natural language processing tools.
We have focused on two countries, Italy and France, with a total of about 14,000 unique tweets. For those interested in the statistical methods and datasets, we refer readers to the original paper.
Let’s first have a look at the (normalized) cumulative number of tweets reported in Italy and France from January 1, 2018 to April 1, 2020. Steep spikes are seen in early 2020 for both countries, while Italy also shows some strange knees, with sudden increases in the number of tweets during the summers of 2018 and 2019. Nothing like that is seen in France. We will come back to those knees later on, but for now let’s focus on the spike in early 2020.
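To make the counting step concrete, here is a minimal sketch of how tweets and users mentioning a keyword could be tallied for a given period. The record layout (`id`, `user`, `date`, `text`), the function name `count_mentions` and the toy tweets are assumptions made for illustration; they are not the format or code used in the original analysis.

```python
import re
from datetime import date

def count_mentions(tweets, keyword, start, end):
    """Return (unique tweets, unique users) mentioning `keyword` in [start, end].

    `tweets` is an iterable of dicts with hypothetical keys:
    "id", "user", "date" (datetime.date) and "text".
    """
    # Whole-word, case-insensitive match of the keyword.
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    seen_ids, users = set(), set()
    for tweet in tweets:
        if tweet["id"] in seen_ids:
            continue  # skip duplicates so each tweet is counted once
        if not (start <= tweet["date"] <= end):
            continue
        if pattern.search(tweet["text"]):
            seen_ids.add(tweet["id"])
            users.add(tweet["user"])
    return len(seen_ids), len(users)

# Toy records, invented for this example.
toy_tweets = [
    {"id": 1, "user": "a", "date": date(2019, 12, 1), "text": "Ho la polmonite"},
    {"id": 1, "user": "a", "date": date(2019, 12, 1), "text": "Ho la polmonite"},
    {"id": 2, "user": "a", "date": date(2019, 12, 2), "text": "Polmonite, di nuovo"},
    {"id": 3, "user": "b", "date": date(2019, 12, 3), "text": "Oggi piove"},
    {"id": 4, "user": "c", "date": date(2021, 1, 1), "text": "polmonite"},
]
n_tweets, n_users = count_mentions(
    toy_tweets, "polmonite", date(2018, 1, 1), date(2020, 4, 1)
)
print(n_tweets, n_users)  # 2 1
```

Counting tweets and users separately matters because a single very active account can inflate the tweet count without indicating that more people are talking about the topic.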
Cumulative Tweets: Italy
The plot shows the cumulative number of tweets reporting the word “polmonite” (pneumonia) from January 1, 2018, until April 1, 2020. Two knees are seen in summer 2018 and 2019, plus a steep spike in early 2020.
Cumulative Tweets: France
The plot shows the cumulative number of tweets reporting the word “pneumonie” (pneumonia) from January 1, 2018 until April 1, 2020. A steep spike is seen in early 2020 with no other anomaly.
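Curves like the ones described above can be built from nothing more than the date of each tweet. The sketch below assumes that "normalized" means dividing the running total by the total number of tweets in the period, so that every curve ends at 1; this is our reading, and the function name `normalized_cumulative` and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate

def normalized_cumulative(tweet_dates, start, end):
    """Return (days, curve), where curve[i] is the fraction of all tweets
    in [start, end] posted on or before days[i]."""
    counts = Counter(d for d in tweet_dates if start <= d <= end)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    running = list(accumulate(counts.get(day, 0) for day in days))
    total = running[-1]
    if total == 0:
        return days, [0.0] * len(days)  # no tweets in the period
    return days, [value / total for value in running]

# Toy dates, invented for this example.
sample = [date(2020, 1, 1), date(2020, 1, 1), date(2020, 1, 3), date(2020, 1, 4)]
days, curve = normalized_cumulative(sample, date(2020, 1, 1), date(2020, 1, 4))
print(curve)  # [0.5, 0.5, 0.75, 1.0]
```

With steady tweeting the curve grows roughly linearly, so a knee or a spike shows up as a sudden change in slope.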
To better understand what is happening, we first look at the number of unique users tweeting in Italy and France during the considered timeline. If nothing relevant is happening, we should expect a random fluctuation in the number of users. Also, we should see no specific difference between the number of users in 2018-2019 and 2019-2020. However, we detect a clear difference between the total number of users in early 2020 and those in early 2019. As early as November 2019, France actually shows a significant increase in the number of users mentioning pneumonia. Furthermore, something strange is detected again in Italy during the summers of 2018 and 2019 with two bumps (see figures below) that match the knees seen in the cumulative number of tweets in the plots above.
Unique Twitter Users: Italy
The number of unique users tweeting in Italian and mentioning the keyword “polmonite”. The x-axis indicates the month. The red line covers the period from 03-2019 until the end of 02-2020; the black line covers 03-2018 until the end of 02-2019.
Unique Twitter Users: France
The number of unique users tweeting in French and mentioning the keyword “pneumonie”. The x-axis indicates the month. The red line covers the period from 03-2019 until the end of 02-2020; the black line covers 03-2018 until the end of 02-2019.
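The two lines in each figure compare the same calendar months across consecutive March-to-February periods. A minimal sketch of that grouping is shown here; the `(user_id, date)` input format and the function name `unique_users_by_month` are assumptions for illustration, and the toy records are invented.

```python
from collections import defaultdict
from datetime import date

# Months in plotting order: March of the first year to February of the next.
MONTH_ORDER = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2]

def unique_users_by_month(tweets, first_year):
    """Count distinct users per month from 03-first_year to 02-(first_year + 1).

    `tweets` is an iterable of (user_id, date) pairs (hypothetical format).
    """
    users = defaultdict(set)
    for user_id, day in tweets:
        in_period = (day.year == first_year and day.month >= 3) or (
            day.year == first_year + 1 and day.month <= 2
        )
        if in_period:
            users[day.month].add(user_id)  # a set counts each user once per month
    return [len(users[month]) for month in MONTH_ORDER]

# Toy records, invented for this example.
toy = [
    ("a", date(2018, 8, 1)), ("a", date(2018, 8, 15)), ("b", date(2018, 8, 20)),
    ("c", date(2019, 1, 5)), ("a", date(2019, 8, 2)), ("d", date(2020, 2, 10)),
    ("e", date(2020, 3, 1)),  # outside both periods
]
black_line = unique_users_by_month(toy, 2018)
red_line = unique_users_by_month(toy, 2019)
print(black_line)  # [0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0]
print(red_line)    # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
```

Plotting the two lists against the same month axis gives the year-over-year comparison described in the captions.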
According to the method outlined in the Nature paper (Lopreite et al. 2021), one should be able to detect these anomalies by looking at the p-values obtained when comparing the cumulative distributions of the number of tweets. We have repeated their analysis, and our result is compatible with the one reported by the authors. Note that up to this point we have not changed their method of analysis in any way; this step was carried out only to reproduce their results.
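For readers who want a feel for what a p-value on two cumulative distributions looks like, here is a rough sketch. It is not the test used in the paper (we refer to the paper for that); as a stand-in, it assumes a permutation test on the Kolmogorov-Smirnov statistic, that is, the largest gap between two empirical cumulative distributions. The function names, parameters and toy samples are all hypothetical.

```python
import random
from bisect import bisect_right

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Share of random relabelings whose gap is at least as large as the
    observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if ks_statistic(pooled[: len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Toy samples: the day index (within a period) on which each tweet was posted.
baseline = list(range(0, 100, 5))      # tweets spread evenly over the period
anomalous = [80 + i for i in range(20)]  # tweets bunched at the end

statistic = ks_statistic(baseline, anomalous)
p_value = permutation_p_value(baseline, anomalous)
print(statistic, p_value)
```

A small p-value says that the two periods are unlikely to share the same timing of tweets, which is the kind of signal that flags a knee or a spike as an anomaly rather than a random fluctuation.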
So far this seems to qualitatively reproduce the results of the paper. However, we want to go a step further and investigate what happened in Italy during the summers of 2018 and 2019. The first interesting result is that between July and October 2018 there were multiple Legionnaires’ disease outbreaks in Italy, widely reported by the press. The first knee and bump seen on Twitter for Italy roughly coincide with the occurrence of those cases.
What about 2019? An inspection of the tweets from August 2019 shows that the second knee and bump coincide with the time when Maurizio Sarri, then coach of the Italian football club Juventus F.C., was reported to have a severe case of pneumonia. A comparison of the cumulative distributions of tweets in the two years considered also shows an anomaly around the end of August 2019, in agreement with this finding.
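A quick way to see what is driving a bump is to look at which words appear most often alongside the keyword in the tweets from that month. The sketch below is a deliberately simple word count, not the natural language processing tools mentioned earlier; the function name `top_terms`, the stopword list and the example texts are made up for illustration.

```python
import re
from collections import Counter

def top_terms(texts, keyword, stopwords=frozenset(), n=5):
    """Most frequent words co-occurring with `keyword`, counted once per tweet."""
    counts = Counter()
    for text in texts:
        words = set(re.findall(r"\w+", text.lower()))
        if keyword in words:
            counts.update(words - {keyword} - stopwords)
    return counts.most_common(n)

# Toy texts, invented for this example.
toy_texts = [
    "Sarri ha la polmonite",
    "Polmonite per Sarri, salta la partita",
    "Piove a Torino",
]
toy_stopwords = frozenset({"ha", "la", "per", "a"})
leading = top_terms(toy_texts, "polmonite", toy_stopwords, n=1)
print(leading)  # [('sarri', 2)]
```

When a single name or place dominates the list for the month of the bump, the anomaly is more likely tied to one widely reported event than to a rise in cases.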