How to Fight COVID-19 with Data

21 April 2020

Each day during the Coronavirus lockdown, most of us listen to the news and the daily statistics on case numbers and death rates. We’ve also heard about the need to “flatten the curve”, i.e. to take protective measures that slow the spread of COVID-19 so that the healthcare system can cope with the outbreak at its “peak”.

Our lives are currently dictated by data modelling, data visualisation, and a data-led decision making process that needs to follow a few basic principles to get the most from data.

1. Set clear goals

Every organisation is inundated with data and can easily get swamped unless data is used smartly to drive a clear purpose. For example, to reduce costs, increase sales, increase supply chain visibility, or expand into new sectors / geographic markets.

The clear COVID-19 goal has been to “flatten the curve” so that the healthcare system can function within its current capacity at the peak of the pandemic. With that goal fixed, the relevant datasets can be collated, managed, analysed and visualised to give one true picture focused on it.
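The “flatten the curve” goal itself comes from epidemic modelling. As a purely illustrative sketch (all parameters here are hypothetical, not fitted to any real data), a minimal SIR model shows how reducing the transmission rate, which is what protective measures aim to do, lowers the peak number of simultaneous infections:

```python
# Minimal, illustrative SIR epidemic sketch (all parameters hypothetical).
# Lowering the transmission rate beta -- the effect of protective
# measures -- reduces the peak of infections: the "flattened curve".

def sir_peak(beta, gamma=0.1, days=300, n=1_000_000, i0=10):
    s, i, r = n - i0, i0, 0   # susceptible, infected, recovered
    peak = i
    for _ in range(days):
        new_inf = beta * s * i / n   # new infections this day
        new_rec = gamma * i          # recoveries this day
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        peak = max(peak, i)
    return peak

unmitigated = sir_peak(beta=0.30)   # no measures
mitigated = sir_peak(beta=0.15)     # measures halve transmission
print(f"peak infections: {unmitigated:,.0f} vs {mitigated:,.0f}")
```

The same total outbreak can unfold with a far lower peak, which is precisely what lets a fixed-capacity healthcare system cope.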

Clear communication of that goal also means the restrictive measures imposed are understood by the public, who can, and must, see for themselves through the data visualisation whether the measures taken by the Government are delivering success.

2. Know your data sources

The Government is often an authoritative source of information. This remains the case for COVID-19, but reliability depends on a number of factors, including which statistics are collected, who collects them, what method is used for collation, and the level of inter-Department collaboration within Government.

Take, for example, the number of reported cases. The reality is that no country knows the true number of COVID-19 cases without comprehensive testing, and test coverage has differed significantly between countries.

Without widespread testing, other forms of data input have to be used, including the number of calls to 111 or 999. Even then, we know that the number of people showing symptoms is not the same as the total number of people infected by the virus (i.e., it excludes those not showing symptoms and not being tested).

The COVID-19 death rate only includes data from NHS hospitals; it does not include deaths in social care institutions, which do not come under NHS England but are covered by the Department of Health & Social Care. The Office for National Statistics has recently reported that 1 in 10 deaths occur outside the NHS and estimated that (as of 14th April) there were approximately 2,100 more deaths due to COVID-19 than reported by the Department of Health. The sombre milestone of 12th April, when the UK passed 10,000 deaths, was therefore a significant undercount. Imagine the scale of inaccuracy at a global level.

In the future, we need a single body within each country that is accountable for coordinating and holding near-real-time national statistics, acting as the single point of truth for the entire country and collaborating with international bodies. There should also be a recognised international body or academic institution tasked with maintaining the global statistics and sharing datasets with partner institutions and Governments.

We also need to change how we approach public health data. Crowd-sourced datasets will become more powerful and can enhance traditional sources. For example, mobile apps such as the C-19 COVID Symptom Tracker app (see below; 750,000 downloads in 24 hours when launched) and individual health testing devices could play a greater role in gathering data from individuals across the country. These apps can help scientists conduct research as well as report cases and the geographic locations of outbreaks.

3. Cleanse your data

Whilst not the sexiest of subjects, poor data quality is the number-one cause of failure in data projects. If data is being used to drive decisions it must be accurate, free of inconsistencies, and give the decision maker confidence in it. Fake news and media / social media sensationalism will create panic, but so will the use of poor-quality data by public health officials.
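To illustrate what basic cleansing means in practice (the feed and its fields are invented for this example), a few lines of pandas can strip the most common defects, duplicates, gaps and impossible values, before any analysis begins:

```python
import pandas as pd

# Hypothetical reported-cases feed with common quality problems:
# an exact duplicate row, a missing value, and a negative count.
df = pd.DataFrame({
    "area": ["North", "North", "South", "East", "West"],
    "cases": [42, 42, None, -5, 17],
})

clean = (
    df.drop_duplicates()            # remove exact duplicate rows
      .dropna(subset=["cases"])     # drop rows missing the key figure
      .query("cases >= 0")          # impossible negative counts out
      .reset_index(drop=True)
)
print(clean)
```

Real pipelines add many more rules, but the principle is the same: defects are removed or flagged before the data reaches a decision maker.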

Academics, such as those at Johns Hopkins University (JHU), have been aggregating international COVID-19 datasets (available from the JHU GitHub page) to provide consistency and maintain currency through daily updates. Those datasets have been taken by Tableau and cleansed, reshaped, visualised and made ready for analysis.
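The JHU time-series files are published in a wide layout, one row per region with a column per date of cumulative cases. A small sketch of the kind of reshaping involved (using invented figures rather than the live files) converts that layout into one row per region and date, and derives the daily new cases that most charts actually plot:

```python
import pandas as pd

# Synthetic rows in the wide, JHU-style time-series layout:
# one row per region, one column per date of cumulative cases.
raw = pd.DataFrame({
    "Country/Region": ["A-land", "B-land"],
    "4/12/20": [100, 50],
    "4/13/20": [130, 50],
    "4/14/20": [180, 80],
})

# Reshape wide -> long so each row is (region, date, cumulative cases).
long = raw.melt(id_vars="Country/Region",
                var_name="date", value_name="cumulative")
long["date"] = pd.to_datetime(long["date"], format="%m/%d/%y")

# Derive daily new cases per region from the cumulative counts.
long = long.sort_values(["Country/Region", "date"])
long["new_cases"] = long.groupby("Country/Region")["cumulative"].diff().fillna(0)
print(long)
```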

Data quality is so often ignored because it is hard to measure the value of inaccurate or missing data, but there is a cost. If testing had been available earlier or conducted at scale, and the real daily number of cases had been known to the Government, how much earlier could preventative measures have been put in place, and how many lives would have been saved?

Decision makers can only make decisions based on the data available to them at any one point in time. There must now be an obligation on organisations to provide those decision makers with the best data and to ensure that proper data cleansing is conducted. This should be subject to a future kitemark for data quality.

4. Visualise near real-time data

We’ve all seen from the COVID-19 crisis that yesterday’s figures are largely irrelevant except as an input for trends. What matters is near-real-time data to feed the predictive models on which policy decisions are based and scarce resources, both people and equipment, are prioritised.
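One common way dashboards turn noisy daily figures into the trend that matters is a rolling average. A small sketch with hypothetical counts (the numbers below are invented to mimic typical weekday/weekend reporting swings):

```python
import pandas as pd

# Hypothetical daily case counts with reporting noise:
# weekend figures dip, weekday figures spike.
daily = pd.Series(
    [120, 150, 90, 200, 210, 80, 60, 140, 170, 110, 230, 240, 100, 70],
    index=pd.date_range("2020-04-01", periods=14),
)

# A 7-day rolling mean smooths the reporting noise so the underlying
# trend, rather than any single day's figure, drives the picture.
trend = daily.rolling(window=7).mean()
print(trend.round(1))
```

A single day’s figure can mislead in either direction; the smoothed series is what tells a decision maker whether the curve is actually flattening.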

We can only use data when the data sources have been fused together, cleansed so the data is accurate and complete, and visualised in a way that can be easily interpreted in order to make decisions.

There are many ways to visualise data simply and clearly through software such as Power BI or Tableau, or through bespoke technology such as Envitia’s Data Discovery Platform and visualisation toolkit. Many of these tools also offer powerful analytics functions to really focus on insight.

The Johns Hopkins University (JHU) near-real-time dashboard (pictured below) uses ESRI’s ArcGIS, with the map pulling in information from a wide range of sources to provide daily updates.

This means that everyone should be looking at the same data from the same platform, which can zoom in for local detail and zoom out to gather national data, as well as slicing the data to generate analytics such as age demographics. This gives us a true picture, insight and trends (see below), rather than just numbers.

This will be the benchmark for many organisations post-COVID-19: relevant data available to the decision maker, easily viewable in near real time, drawn from trustworthy sources and of the right quality to enhance the decision-making process.


written by Nabil, CEO

Give us a call on +44 (0)1403 273 173 to see how we can help
