The Strange Case of Pneumonia in 2019

PART I

What Social Media Analysis can tell us about Early Detection of COVID-19, Legionnaires’ Disease and other Anomalies.

The analysis of social media data is a very powerful tool for the early detection of key events. Studying specific keywords on Twitter suggests the presence of a strange increase in cases of pneumonia in late 2019, most likely linked with the COVID-19 pandemic (an interesting idea outlined in a Nature paper in January 2021).

The analysis focuses on the number of tweets and users mentioning specific keywords during a given period of time. To showcase the power of our tools, we have reproduced part of that analysis and extended it with our natural language processing tools.
We have focused on two countries: Italy and France, with a total of about 14,000 unique tweets. For those interested in the statistical methods and datasets, we refer to the original paper.

Let’s first have a look at the (normalized) cumulative number of tweets reported in Italy and France from January 1, 2018 to April 1, 2020. Steep spikes are seen in early 2020 for both countries, while Italy also shows some strange knees, with sudden increases in the number of tweets during the summers of 2018 and 2019. Nothing like that is seen in France. We will come back to those knees later, but for now let’s focus on the spike in early 2020.

Cumulative Tweets: Italy

The plot shows the cumulative number of tweets reporting the word “polmonite” (pneumonia) from January 1, 2018, until April 1, 2020. Two knees are seen in summer 2018 and 2019, plus a steep spike in early 2020.

Cumulative Tweets: France

The plot shows the cumulative number of tweets reporting the word “pneumonie” (pneumonia) from January 1, 2018 until April 1, 2020. A steep spike is seen in early 2020 with no other anomaly.
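As a rough illustration of how a normalized cumulative curve like the ones plotted above can be computed, here is a minimal sketch. The function name and the sample dates are hypothetical and are not taken from the original analysis; it assumes each tweet has already been filtered by keyword and reduced to its posting date.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def normalized_cumulative(tweet_dates):
    """Return the sorted days and the cumulative tweet count scaled to end at 1.0."""
    if not tweet_dates:
        return [], []
    per_day = Counter(tweet_dates)          # tweets per calendar day
    days = sorted(per_day)
    running = list(accumulate(per_day[d] for d in days))
    total = running[-1]
    return days, [count / total for count in running]


# Hypothetical posting dates, for illustration only.
sample_dates = [
    date(2018, 8, 1),
    date(2018, 8, 1),
    date(2019, 8, 20),
    date(2020, 2, 25),
]
days, curve = normalized_cumulative(sample_dates)
```

A knee in such a curve corresponds to a short period in which many tweets arrive at once, which is why the summer bursts and the early 2020 spike stand out.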

To better understand what is happening, we first look at the number of unique users tweeting in Italy and France over the period considered. If nothing relevant is happening, we should expect only random fluctuations in the number of users, and no systematic difference between the number of users in 2018-2019 and in 2019-2020. However, we detect a clear difference between the total number of users in early 2020 and in early 2019. As early as November 2019, France actually shows a significant increase in the number of users mentioning pneumonia. Furthermore, something strange is detected again in Italy during the summers of 2018 and 2019, with two bumps (see figures below) that match the knees seen in the cumulative number of tweets in the plots above.

Unique Twitter Users: Italy

The number of unique users tweeting in Italian and mentioning the keyword “polmonite”. The x-axis indicates the month. The red line refers to the period from 03-2019 until the end of 02-2020. The black line refers to the period from 03-2018 until the end of 02-2019.

Unique Twitter Users: France

The number of unique users tweeting in French and mentioning the keyword “pneumonie”. The x-axis indicates the month. The red line refers to the period from 03-2019 until the end of 02-2020. The black line refers to the period from 03-2018 until the end of 02-2019.
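The year-over-year comparison in the two figures above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `(user_id, day)` record layout and the sample records are hypothetical, and the tweets are assumed to be already filtered by language and keyword.

```python
from collections import defaultdict
from datetime import date


def unique_users_per_month(tweets, start, end):
    """Count distinct user ids per (year, month) for tweets with start <= day < end."""
    users = defaultdict(set)
    for user_id, day in tweets:
        if start <= day < end:
            users[(day.year, day.month)].add(user_id)
    return {month: len(ids) for month, ids in sorted(users.items())}


# Hypothetical (user_id, day) records, for illustration only.
tweets = [
    ("u1", date(2018, 11, 3)),
    ("u1", date(2018, 11, 9)),   # same user twice in one month counts once
    ("u2", date(2019, 11, 2)),
    ("u3", date(2019, 11, 5)),
    ("u3", date(2019, 11, 6)),
]

# Black line: 03-2018 until the end of 02-2019; red line: 03-2019 until the end of 02-2020.
baseline = unique_users_per_month(tweets, date(2018, 3, 1), date(2019, 3, 1))
recent = unique_users_per_month(tweets, date(2019, 3, 1), date(2020, 3, 1))
```

Counting users rather than tweets reduces the weight of a few very active accounts, so a bump in this measure suggests that more people, not just more messages, were talking about pneumonia.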

According to the method outlined in the Nature paper (Lopreite et al. 2021), one should be able to detect these anomalies by looking at the p-values obtained when comparing the cumulative distributions of the number of tweets. We have repeated their analysis, and the result we obtained is compatible with that reported by the authors. Note that up to this point we have not made any changes to their method of analysis; this step was done only to reproduce their results.
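For readers who want a feel for how two cumulative distributions can be compared, one standard choice is the two-sample Kolmogorov-Smirnov test. Whether this matches the exact procedure of the paper is an assumption on our side, so the sketch below should be read as a generic illustration rather than as the authors' method. The function name and the sample values are hypothetical, and the p-value uses the asymptotic formula, which is only approximate for small samples.

```python
import math


def ks_two_sample(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D and its asymptotic p-value."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    # Walk both sorted samples and track the largest gap between the two
    # empirical cumulative distribution functions.
    i = j = 0
    d_max = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d_max = max(d_max, abs(i / n - j / m))

    lam = math.sqrt(n * m / (n + m)) * d_max
    if lam == 0.0:
        return d_max, 1.0

    # Kolmogorov series: p = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    total, k = 0.0, 1
    while True:
        term = math.exp(-2.0 * (k * lam) ** 2)
        total += term if k % 2 else -term
        if term < 1e-12:
            break
        k += 1
    return d_max, min(1.0, max(0.0, 2.0 * total))


# Hypothetical inputs: day index within the season of each tweet, one list per year.
season_2018_2019 = [1, 2, 3, 4]
season_2019_2020 = [3, 4, 5, 6]
d_stat, p_value = ks_two_sample(season_2018_2019, season_2019_2020)
```

A small p-value indicates that the timing of tweets in the two seasons is unlikely to come from the same distribution, which is the kind of signal an early-warning analysis looks for.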

The news reported in Italy about the Legionnaires’ disease outbreak in September 2018.

So far this seems to qualitatively reproduce the results of the paper. However, we want to go a step further and investigate what happened in Italy during the summers of 2018 and 2019. The first interesting result is that between July and October 2018 there were multiple Legionnaires’ disease outbreaks in Italy, widely reported by the press. The first knee and bump seen on Twitter for Italy seem to roughly coincide with the occurrence of those cases.

What about 2019? An inspection of the tweets in August 2019 shows that the second knee and bump coincide with the time when Maurizio Sarri, then coach of the Italian football club Juventus F.C., was reported to have a severe case of pneumonia. The comparison of the cumulative distributions of tweets in the two years considered also shows an anomaly around the end of August 2019, in agreement with this finding.
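One simple way to inspect what drives a bump is to list the words that most often appear next to the keyword within a time window. The sketch below is only an illustration and not a description of our production tools: the function name, the tiny stopword list and the sample tweets are all hypothetical.

```python
import re
from collections import Counter
from datetime import date


def top_cooccurring_terms(tweets, keyword, start, end, stopwords=frozenset(), k=3):
    """Most frequent words seen alongside `keyword` in tweets with start <= day < end."""
    counts = Counter()
    for text, day in tweets:
        if not (start <= day < end):
            continue
        words = set(re.findall(r"\w+", text.lower()))
        if keyword in words:
            # Count each word once per tweet, ignoring the keyword and stopwords.
            counts.update(words - {keyword} - set(stopwords))
    return counts.most_common(k)


# Hypothetical tweets, for illustration only.
tweets = [
    ("Sarri ha la polmonite", date(2019, 8, 19)),
    ("Polmonite per Sarri, salta la prima", date(2019, 8, 20)),
    ("Vaccino contro la polmonite", date(2019, 1, 10)),
]
top = top_cooccurring_terms(
    tweets,
    "polmonite",
    date(2019, 8, 1),
    date(2019, 9, 1),
    stopwords={"la", "ha", "per"},  # illustrative, far from a complete Italian list
    k=1,
)
```

When a single name or topic dominates such a list, the bump is more likely explained by a news event than by a rise in actual cases.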

Manifesto Z01

Artificial intelligence is one of the most powerful tools invented by humanity. If used correctly, it has tremendous potential to improve people’s lives in many respects. AI can accelerate human progress by orders of magnitude, from faster diagnoses of illnesses to smarter homes, while increasing safety and relieving people of the burden of repetitive tasks. AI embodies our deepest hopes for human dignity, freedom and equality, pushing the human condition towards broader horizons.

Over the last decade AI has started to show its potential and is now having an increasing and tangible impact on our lives. We have also seen its power used for darker purposes that conflict with the idea of progress and freedom. When AI technology falls into the wrong hands, it can be used to manipulate reality, accelerate the spread of misinformation, create malicious synthetic media (deepfakes) and generate more efficient cyber-attacks.

Here we set out our ambitions as a startup entering the world of AI.


We are committed to fighting the misuse of AI technology with countermeasures that reach the root of the problem.


We are committed to investing our time in finding AI solutions that better society.


We are committed to improving existing technologies by exploring new tools together with diverse research communities.


We are committed to an ethical AI that improves the quality of life, respects privacy and promotes knowledge.
