Over the course of the pandemic, social media sleuths, epidemiologists and health nerds alike began noticing an interesting trend in the review section for Yankee Candles on Amazon.
Whenever there was an influx of negative reviews citing no smell, there was usually a spike in COVID cases to go along with it.
Losing your sense of smell is one of the more recognized symptoms of a COVID infection. After noticing this trend, people began to ask: could the reviews themselves be a reliable indicator of a surge in the virus?
That theory was put under the microscope, and has taken on new relevance amid concern at the lack of official data tracking infections across the U.S. heading into another winter.
How a review became a warning sign
Nick Beauchamp is an associate professor of political science at Northeastern University and first caught wind of the Yankee Candle theory late last year.
He decided it wouldn't be too difficult to find out if there was actually a link. And having focused on previous projects that attempted to predict COVID cases using social media data, he sought to create a model to test it.
"I just thought, well, it's easy enough to do. Maybe I'll just try scraping some Amazon reviews and see what the actual trends are, as opposed to just cutting and pasting a few reviews that mention a lack of smell," Beauchamp said.
To his surprise, the relationship was clear; COVID cases followed a very similar pattern to the frequency of the reviews.
Beauchamp's initial tweets on the findings in December 2021 went viral, and he scrambled to add more data to find a definitive answer. By mid-January, he had written a paper and submitted it to a journal, and by June of this year it had been published.
"It's a very small paper, but it's one that I think has caught a lot of people's interest, particularly because it's trying to do slightly more carefully something that a lot of people have been noticing qualitatively on Twitter," he said.
Ultimately, the results from the paper showed that COVID cases were predictive of the reviews, meaning that if there was a recorded surge in COVID cases, there would likely be an increase in the negative reviews. But could it work the other way around?
"The other thing that I was trying to find was, 'Can we predict COVID cases using the reviews?' And what we found was that at least up through December of 2021, not really. Using the past COVID cases to predict future COVID cases is pretty good, and you can't really do any better using the reviews."
But then something happened. After adding more months of data to his model in June this year, he found that the relationship between the reviews and COVID rates had flipped: the reviews were now predictive of COVID rates.
In other words, the rise in negative reviews might actually be an earlier warning sign than the official COVID data.
"That is either due to lack of measurement of COVID, or worse measurements of COVID, or maybe something else changing. I presume the reviews themselves weren't changing very much," Beauchamp said.
One interesting reaction Beauchamp observed was that the tweets and the study itself have evolved into their own meta data sets, gaining popularity again whenever users notice a surge in COVID cases.
Some researchers refer to these trends as "digital breadcrumbs," because online activity, like searches, interacting with old Twitter threads, or in this case, leaving a review, can give unique insight into a person's real-life circumstances.
As for Beauchamp, he maintains a healthy level of skepticism about the study, even with all of his controls.
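The question at the center of the study, whether past reviews improve a forecast that already uses past case counts, can be illustrated with a small sketch. This is not Beauchamp's model or data: the numbers below are synthetic and deliberately built so that reviews lead cases by one week, and every name (`fit_ols`, `reviews`, `cases`) is hypothetical. It fits two simple regressions on the first 29 weeks and compares their errors on the weeks held out.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) w = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def mse(coefs, rows, targets):
    """Mean squared error of the linear predictions on the given rows."""
    errors = [
        (sum(c * v for c, v in zip(coefs, r)) - y) ** 2
        for r, y in zip(rows, targets)
    ]
    return sum(errors) / len(errors)


# Synthetic weekly data (an assumption for illustration only):
# "no smell" review counts, and cases driven by last week's cases and reviews.
weeks = 40
reviews = [(7 * t * t + 3 * t) % 11 for t in range(weeks)]
cases = [100.0]
for t in range(1, weeks):
    cases.append(50 + 0.3 * cases[t - 1] + 40 * reviews[t - 1])

targets = cases[1:]
# Baseline model: this week's cases from last week's cases only.
base_rows = [[1.0, cases[t - 1]] for t in range(1, weeks)]
# Augmented model: last week's cases plus last week's review count.
full_rows = [[1.0, cases[t - 1], reviews[t - 1]] for t in range(1, weeks)]

split = 29  # fit on the first 29 weeks, evaluate on the rest
coef_base = fit_ols(base_rows[:split], targets[:split])
coef_full = fit_ols(full_rows[:split], targets[:split])

mse_base = mse(coef_base, base_rows[split:], targets[split:])
mse_full = mse(coef_full, full_rows[split:], targets[split:])

print(f"held-out error, cases only:      {mse_base:.3f}")
print(f"held-out error, cases + reviews: {mse_full:.3f}")
```

Because the synthetic cases were generated from the previous week's reviews, the second model wins here by construction. On real data the result could go either way, which is the point of the comparison: if the two held-out errors are about the same, the reviews add nothing beyond what past case counts already provide.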
Why some believe the official data is a "big mess"
These days, the quality of COVID tracking has become a cause for concern for Beauchamp and other experts working with public health data, especially as President Joe Biden declared the pandemic "over."
"The traditional data sources are getting worse. The CDC is sort of cutting back on its measurements. Everybody's measuring themselves less frequently. They're reporting these things to government agencies less frequently," Beauchamp said.
He also cited reduced wastewater measurements, and said the frequent attention on the Yankee Candle reviews was an example of how many people were still invested in tracking COVID numbers.
"Those of us who sort of still care about and worry about the pandemic, and don't think that it's over, are grasping around for other sources of data that can be used to track new waves and that sort of thing," he added.
Abraar Karan is an infectious disease doctor and researcher at Stanford University and said the evolving nature of the virus had made it difficult to pinpoint and sustain the most efficient ways of collecting and analyzing COVID data, especially three years into the pandemic.
"If we look back to the beginning of the epidemic, every case that we were documenting mattered a lot. And we were trying to figure out what to do with that data," Karan said.
As time passed, new issues presented themselves, like reinfections and how to document them. Karan also cited the reduction of testing and its decentralization as other hurdles. Many people have stopped testing frequently, if at all, and those choosing to test at home often do not report their results to public health departments.
But at this point in the pandemic, Karan said tracking some key sources, even if they were less robust than years prior, had proved to be an effective strategy, given the breadth of data that is available from past years.
He said observing trends in reported cases was the clearest method, as long as there was no recent shift in the amount of testing available.
"The most relevant question I get asked, as a doctor or epidemiologist, is, 'What is the risk of me doing X activity to contract SARS-CoV-2?' And frankly, you can really largely at this point, answer that based on the [trend] activity, and less so on what's going on around you, because the data is a big mess," he said.
Karan also noted that wastewater data could be immensely useful, even if it was not very precise in measuring case numbers.
Ultimately, Karan said a combination of data sources could help experts and regular folks make the best decisions for themselves regarding their COVID safety.
"People are constantly weighing these risks and benefits based on limited data, but data nonetheless. So you can triangulate a lot of things, like all the things we just talked about to get somewhat of an assessment of where we are with new variants," he said.
And when it comes to including the Yankee Candle data in the mix?
"These kinds of things are used in public health more for research. But at this point in COVID, I don't think candle reviews are going to change our public health strategy," he said.
Instead, it could be an indication that there is more untouched data online that could be useful for the common good. And if there is, Beauchamp is all for it.
"It's better to join together in some sort of movement here, if we can," he said. "So I'm happy to be a small part of that."