Why are data and analytics important for understanding outbreaks?

Editor’s note: Since this post was published, we’ve updated the data source to make sure everyone has uninterrupted access to the latest data. Learn more about the latest data source.

Image credit: Anya A’Hearn/Tableau Public

As the COVID-19 outbreak progresses, people are looking for information they can trust. They want to know where the disease is spiking and how widely it's spreading; if they’ll be told to work from home or limit travel, as many around the world already have been; if their symptoms warrant getting tested; if they should stock up on supplies.

As public health officials develop guidance on these questions, and so many more, they are analyzing constantly evolving data about the disease. Analytics is at the core of the response to COVID-19. To explain how public health professionals are using data analytics to inform their recommendations and courses of action around the virus, Barry Chaiken, MD, MPH, the clinical lead for Tableau’s healthcare division, shares his insights from over 25 years of experience in the fields of medical research, epidemiology, clinical information technology, and analytics. Prior to joining Tableau, among other roles, Chaiken spent two years at the CDC working as an Epidemic Intelligence Service officer on the front lines of gathering information about the spread of diseases. As the COVID-19 epidemic unfolds, he’s been analyzing the data pouring into sources like Johns Hopkins to understand what information exists and how it’s shaping the public response to the outbreak.

Tableau: Why are data and analytics important for understanding outbreaks, past and current?

Chaiken: Anytime you have the emergence of a novel disease—in this case it’s a novel virus—it’s important to understand not only the detailed genetic makeup of the disease and where the outbreak is occurring, but also how transmission is occurring, how the virus is acting, what types of people are being affected, and what their characteristics and symptoms are.

If you think about this disease in contrast to Ebola, it’s much more complicated. Ebola was really non-discriminatory in the sense that if you contracted Ebola, regardless of your characteristics or how healthy you were before, you would get very sick. With things like the seasonal flu and with this new COVID-19 outbreak, that’s not the case. We think the disease has a lower infection rate among people under the age of 30, and we think that might have to do with some cross-immunity with the common cold, but is that proven yet? No. And we are seeing, at least in the U.S., that the mortality is associated with older people and those who have some other health condition, but we don’t know if that will hold as the disease spreads through the community.

The bottom line is we have to keep collecting data. We need to be able to identify and track the people who've been infected, what happened to them, how sick they were, what their symptoms were, whether they died, and what type of treatment they obtained. But it’s important to keep in mind that even when we collect this data, we can’t count on it to give us a definitive picture of what’s going on. There will be biases in how the data is collected based on where we think the disease is most prevalent. We have to test much more than we have been, and we have to make more tests available.

Image credit: Kaiser Family Foundation/Tableau Public

Tableau: How has data collection evolved recently to make this type of wide-scale analysis more possible?

Chaiken: Thankfully, data collection today is much easier than it was 20 years ago. Before, people would have to hand-distribute paper forms to individuals, who would have to fill them out and send them back for the information to be collected. Now, take the city of Boston: it could send out an email or text message to every single resident to gather an enormous amount of data. So the reason this outbreak seems to be moving so quickly is not just that the disease is infectious, but also that we are so interconnected digitally and are kept more up-to-date on the current state of the pandemic. The upside is that if we get the data we need, and get testing going, we can respond much more quickly to an outbreak than we ever could before because we know who is in our communities and how they’re being affected.

Tableau: With all this data being collected, why is visualization so important?

Chaiken: It's one thing to distribute data across rows and columns to try to make something meaningful out of it. It's a whole other thing to take the data and put it into a dashboard that tells a story. Right now, if you look at data on the outbreak in a spreadsheet, you’ll see lists of points like the location of the case, its latitude and longitude, the status of the person (whether they’ve recovered, died, or are still in active treatment), and the date recorded. Just looking at the spreadsheet, you would have a hard time figuring out how many deaths and illnesses there are and where the cases are located. There are thousands of records, and you just couldn’t discern any pattern in what’s going on. But with a visualization, you can.
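As a rough illustration of the point above, the following Python sketch rolls a handful of case records up into counts by status and by location, then prints a crude text bar chart. The record layout mirrors the fields described in the interview, but the rows themselves are invented placeholder values, not real case data, and the `summarize` helper is a hypothetical name used only for this example.

```python
from collections import Counter

# Made-up placeholder rows in the shape described above:
# (location, latitude, longitude, status, date recorded).
# These are NOT real case data.
records = [
    ("Boston", 42.36, -71.06, "active", "2020-03-10"),
    ("Boston", 42.36, -71.06, "recovered", "2020-03-11"),
    ("Seattle", 47.61, -122.33, "died", "2020-03-10"),
    ("Seattle", 47.61, -122.33, "active", "2020-03-11"),
    ("Seattle", 47.61, -122.33, "active", "2020-03-11"),
]


def summarize(rows):
    """Count cases per status and per location (hypothetical helper)."""
    by_status = Counter(status for _, _, _, status, _ in rows)
    by_location = Counter(location for location, *_ in rows)
    return by_status, by_location


by_status, by_location = summarize(records)

# A crude text "bar chart": one '#' per case, largest location first.
for location, count in by_location.most_common():
    print(f"{location:<10} {'#' * count} {count}")
```

Even this small roll-up answers questions (how many active cases, which location has the most) that are hard to read off the raw rows. A dashboard does the same thing at the scale of thousands of records.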

Tableau: Right now, we’re seeing a number of different responses to the virus: some communities are under quarantine, and companies are choosing to implement work-from-home policies. How is data impacting these types of decisions?

Chaiken: There are no federal standards or guidelines set out by the CDC or other agencies that states, cities, or towns can follow to make decisions about what should or should not be done. There are so many questions along those lines. When should we recommend people work from home? At what point in the spread of the disease in a community do government officials take a particular action? When should we suspend transit service or close schools? Without standards to support those types of decisions, it’s very hard. But data and analytics are what back up the decisions that are made. Officials are constantly analyzing data on the number of cases, how quickly they’re growing, and how fast they’re spreading through a community. If somebody makes a recommendation, it’s based on the type of data we have available at the moment. That’s why it’s so important to have up-to-date data: as it changes, the recommendations will change too. And visualizations allow the decision-maker to quickly understand the information contained in the data and to take action.

Tableau: What would public health officials need to see in the data in order to begin to roll back some of the preventative measures that have currently been implemented?

Chaiken: First off, you’d want to have a sense of the distribution of the disease, because that would help you determine whether it’s localized in one very small area or has spread. And if it has spread, what you need to look for is the trajectory of new cases: whether they’re rising or tapering off. Assuming that there is adequate testing, an increase in positive tests indicates that the outbreak is spreading, and a decrease indicates that the outbreak might be dying out. What you’re looking for is the point at which testing is widespread enough to capture the majority of cases and we begin to see the number of new cases drop. At that point you can begin to see that the epidemic is on a strong enough trajectory toward being over, even if the number of cases isn’t at zero, that it might be time to reverse the steps taken to mitigate disease spread, such as school closures. It’s like the aftermath of a hurricane: it takes a while for things to get going again.
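One simple way to look at the trajectory described above is to turn cumulative case counts into daily new cases, smooth them with a moving average, and check whether the latest smoothed value is above or below the previous one. The Python sketch below does that. The function names, the three-day window, and the rising/falling rule are assumptions chosen for illustration, not a method attributed to Chaiken or to any public health agency, and the numbers are invented. As the interview notes, the result only means something if testing is adequate.

```python
def daily_new_cases(cumulative):
    """Convert cumulative case counts into daily new cases."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def moving_average(values, window):
    """Trailing moving average; yields one value per full window."""
    return [
        sum(values[i - window:i]) / window
        for i in range(window, len(values) + 1)
    ]


def trend(cumulative, window=3):
    """Label the most recent change in smoothed new cases.

    Illustrative rule only: compares the last two smoothed values.
    """
    smoothed = moving_average(daily_new_cases(cumulative), window)
    if len(smoothed) < 2:
        return "insufficient data"
    if smoothed[-1] > smoothed[-2]:
        return "rising"
    if smoothed[-1] < smoothed[-2]:
        return "falling"
    return "flat"


# Invented cumulative counts for a hypothetical community (not real data).
cumulative_cases = [10, 12, 16, 24, 40, 50, 56, 59, 60]
print(trend(cumulative_cases[:5]))  # early in the outbreak
print(trend(cumulative_cases))      # after new cases start to taper
```

Comparing only the last two smoothed values is deliberately simplistic. A real analysis would use a longer window and account for reporting delays and changes in testing volume.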

Image credit: CNBC/Tableau Public

Tableau: What would you say, looking at the available information and given your background in this field, is the number one takeaway people should have about where we’re at right now?

Chaiken: The main thing for people to know is not to panic. We are in the midst of a major health crisis, but it’s important that we do what we can to keep ourselves and communities safe.

So, what can we do? Wash your hands and practice social distancing. I know, it sounds so un-technical, but these are basic tenets of addressing any epidemic or pandemic. They’re easy, and they work. Especially for people who have other health concerns and comorbidities or who are in the groups we believe to be especially vulnerable, these practices are essential.

And the last thing, which is incredibly important: our health care workers need their supplies, such as masks, gowns, and gloves. If the general public over-purchases them, our health care workers won’t have enough to protect themselves. If our health care workers cannot protect themselves, and they get sick, there will not be enough of these courageous and selfless professionals to take care of any of us if we are sick.

Tableau: Finally, what do you think we might learn from this outbreak, in terms of data and in terms of best practices, that we might be able to apply to future scenarios?

Chaiken: There is nothing more important in public health than case definitions and case identification. If you don’t know what you’re looking for, and you don’t have a test or a process to look for it, you won’t find it. What we’re seeing in this outbreak is that if we believe there’s a threat, we need to go out there and immediately figure out how we’re going to identify those cases, track them, and collect as much information as possible on them so we can use analytics to get a handle on what's going on. We need to collect as much data as we possibly can, and we should use the technologies that we have to do that.

We also need to use the fact that we can share information quickly around the world to our advantage. The U.S., for instance, was incredibly slow in responding to the warnings coming from China. I’m not saying we could have contained the virus or stopped its spread in the U.S. ... We could have gotten the tests in place and built up the capacity of health facilities to do testing before the disease began spreading. That would mean we would already have much more data and information to work with. Being better prepared would also enable us to flatten the peak of the outbreak so we don’t overburden our medical facilities with patients all at once. The most valuable thing we have is time to respond to an epidemic. The sooner we get prepared, the sooner we can collect information, and the better off we all will be.

To help further understanding of the rapidly evolving COVID-19 situation, Tableau has developed a free resource hub with connections to data from Johns Hopkins and jump-start visualizations to support analysis. Access the hub here.