
How to read a controversial preprint paper on Covid’s origins

A trio of researchers claimed they found likely evidence that the virus that causes Covid-19 was synthetic. And then scientists went to work picking the theory apart.

[Image: A magnified coronavirus. Credit: BSIP/Universal Images Group via Getty Images]
Kelsey Piper is a senior writer at Future Perfect, Vox’s effective altruism-inspired section on the world’s biggest challenges. She explores wide-ranging topics like climate change, artificial intelligence, vaccine development, and factory farms, and also writes the Future Perfect newsletter.

On October 27, Valentin Bruttel, a molecular immunologist at the University of Würzburg, Germany, and co-authors Alex Washburne and Antonius VanDongen released a preprint (a scientific paper that has not yet gone through peer review) with a shocking claim: The original SARS-CoV-2 coronavirus that caused the Covid-19 pandemic did not emerge in animals and jump over to humans, as most scientists assume, but had most likely been synthesized in a lab. And they had developed statistical tests that they said backed up their claim.

This was an all-time “big if true” claim. The ultimate origins of Covid are one of the biggest open questions in science, and if clear evidence emerged that a pandemic that has killed millions began with the work of researchers in a lab, the ramifications for science would be unimaginable.

I have now spent much of the last week looking into the work of Bruttel and his colleagues, and I think their analysis doesn’t hold up — meaning the paper doesn’t help resolve the question of how SARS-CoV-2 originated.

But while usually I wouldn’t bother writing about a preprint that doesn’t stand up to in-depth scrutiny and may never be peer reviewed and fully published at all — for one thing, there have been tens of thousands of preprints on Covid alone — in this case I think it’s worth it. That’s because the researchers’ original claim circulated widely and is worth thoughtfully answering, and because the preprint and the response represent both the best and the worst in how our scientific institutions and processes converge on truth.

First, some biology

Everything made out of RNA — including SARS-CoV-2 — is made up of strings of four nucleotides: adenine (A), uracil (U), guanine (G), and cytosine (C). There are about 30,000 of these nucleotides in the genome for SARS-CoV-2. Given how small the genetic alphabet is, that means lots of short strings of nucleotides will recur over and over just by coincidence.
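To put a rough number on "recur just by coincidence," here is a minimal Python sketch. It assumes a simplified model in which every position is equally likely to be A, C, G, or U, independent of its neighbors; real genomes are not that uniform, so treat the result as a ballpark figure. The function names and the six-letter example string are illustrative and do not come from the preprint.

```python
import random

ALPHABET = "ACGU"  # the four RNA nucleotides


def expected_count(genome_length: int, k: int) -> float:
    """Expected occurrences of one specific k-letter string in a
    uniformly random sequence (simplifying assumption)."""
    positions = genome_length - k + 1   # possible start positions
    return positions / len(ALPHABET) ** k


def count_occurrences(sequence: str, motif: str) -> int:
    """Count occurrences of motif in sequence, overlaps included."""
    return sum(
        1
        for i in range(len(sequence) - len(motif) + 1)
        if sequence.startswith(motif, i)
    )


def random_genome(length: int, seed: int = 0) -> str:
    """Build a random sequence for comparison (not a real genome)."""
    rng = random.Random(seed)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Under the uniform model, a given six-letter string is expected
    # roughly seven times in a 30,000-letter genome.
    print(expected_count(30_000, 6))
    # Arbitrary six-letter string, chosen only for illustration.
    print(count_occurrences(random_genome(30_000), "GAAUUC"))
```

Under that model, any particular six-letter string would be expected about seven times in a genome of 30,000 nucleotides, which is why finding a handful of copies of a short string is not surprising on its own.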

When researchers are conducting lab work on viruses, they take advantage of certain short strings, called restriction sites, that appear repeatedly in the genome. Certain enzymes recognize those strings and cut the genome there, and researchers use them to cut and glue segments of the virus, which enables them to assemble whole genomes from short sequences, swap out different sequences to study different things, and more.

There are many different ways to do cutting and gluing work in viruses — different enzymes to use, different details of how to employ them. The researchers in the preprint argue that coronavirus research at labs — including the Wuhan Institute of Virology in 2017, 2018, and 2019 — involved working with two specific enzymes and adding restriction sites in strategic locations in order to enable further work. They then go on to claim that the genome for SARS-CoV-2 has unusually evenly spaced restriction sites, along with a few other characteristics, that statistical analysis suggests would be more likely to be the product of lab synthesis rather than the randomness of natural evolution.
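What "evenly spaced" could mean in practice can be sketched in a few lines of Python. This is only an illustration of the general idea, not the preprint's actual statistical test: it takes a list of hypothetical site positions, cuts a linear genome at each one, and reports the resulting fragment lengths. The function names and the example positions are made up for this sketch.

```python
def fragment_lengths(genome_length, site_positions):
    """Lengths of the pieces left after cutting a linear genome
    at each of the given positions."""
    edges = [0] + sorted(site_positions) + [genome_length]
    return [end - start for start, end in zip(edges, edges[1:])]


def longest_fragment_fraction(genome_length, site_positions):
    """Share of the genome taken up by the longest piece.
    Smaller values mean the cuts are spread more evenly."""
    return max(fragment_lengths(genome_length, site_positions)) / genome_length


if __name__ == "__main__":
    # Hypothetical positions, chosen only to contrast the two cases.
    even = [5_000, 10_000, 15_000, 20_000, 25_000]
    clustered = [1_000, 2_000, 3_000]
    print(fragment_lengths(30_000, even))
    print(fragment_lengths(30_000, clustered))
    print(longest_fragment_fraction(30_000, clustered))
```

Evenly spread sites leave pieces of similar size, while clustered sites leave one very long piece. A statistical argument like the one in the preprint then has to ask how often natural genomes produce spacing that looks just as tidy.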

This is an interesting approach, and I’m excited about the general concept of identifying the genetic fingerprints of synthesized viruses, given that the risk from engineered viruses will only rise in the future. But after talking to researchers in virology, microbiology, and comparative genomics, I don’t think the restriction site patterns in SARS-CoV-2 are suspicious.

Viruses recombine with each other constantly, sharing chunks of their genetic code each time. Each of the restriction sites identified in the paper is present in other SARS-CoV-2-like coronaviruses researchers have identified in the last few years. Critically, the sequences around each restriction site in SARS-CoV-2 also tend to match the surrounding RNA in the other coronaviruses, suggesting that the whole segment was lifted into SARS-CoV-2 all at once.

“This would have to imply that someone not only modified the RE [restriction] sites to match natural viruses, but also unrelated nearby sites as well — but it is unclear to me why anyone would do such a thing,” Alex Crits-Christoph, a genomics researcher at Johns Hopkins University, told me. And even if some researcher did do that for some reason, the statistical analyses in this paper wouldn’t detect it — they only make sense if you assume a different virus-synthesis strategy.

That doesn’t settle the question of the origins of Covid, of course. When I talked to Washburne, he pointed out that it’s possible these coronaviruses could have been recombined in a lab. But it means that this paper — which drew statistical inferences from the idea that a particular cloning strategy was used to modify the virus — doesn’t explain anything that needs explaining.

Even scientists who think a synthetic origin for Covid-19 is a very real possibility told me they thought this paper couldn’t prove the strong claim it was making. Among them is Alina Chan, a molecular biologist at the Broad Institute of MIT and Harvard who has made the case that we need a full investigation to determine whether Covid had a natural or a synthetic origin.

Next, some meta-science

So, that’s the answer — which scientific debate on the open internet arrived at quite quickly. At times, though, the process of getting there was a bit ugly.

While some researchers started out intrigued by the paper, others argued it was not just wrong but obviously, blatantly wrong — “poppycock dressed up as science, with a heavy dose of technobabble on the side,” Kristian G. Andersen, an immunologist at the Scripps Research Institute in San Diego, tweeted, adding, “it wouldn’t pass kindergarten molecular biology.” (As the parent of a kindergartener, I think he may be overestimating the rigor of our molecular biology curriculum.)

One of the few well-respected scientists who defended the paper deleted his Twitter account after an uproar. Washburne, one of the authors, posted his entire academic history in response to allegations he’s a fraud who didn’t actually have a PhD. (He does, in fact, from Princeton.)

Obviously, it’s frustrating to be a scientist when a flawed preprint is released and circulates widely, including a skeptical but serious treatment in the Economist, while the details of why it’s unconvincing are hard to explain. It’s understandably exasperating to scientists that it takes far more effort to refute work than to put it out there in the first place, and it’s shocking how far a piece can travel by the time researchers look into it more closely.

The Covid origins debate in particular has been suffused with such advocacy through preprints. This spring, the same thing happened for the other side: A New York Times feature article was timed to the release of a preprint making the case for natural origins, to the chagrin of scientists who felt the New York Times should have waited until there’d been more scientific engagement with the study. Substantial changes were made to the scientific study in the course of the peer review process, and the peer commentary process is still ongoing, but most of the audience doesn’t follow the peer review process. They just read the New York Times.

Does that mean we should give up on preprints (or at least that we, the media, should refrain from publishing articles about them)? I’m not quite sure. Firstly, peer-reviewed articles can also be blatantly flawed and full of holes. To write about science, you have to do due diligence; whether an article is a preprint or not can affect your calculus, but it shouldn’t be the sole determinant.

Secondly, the conversation about lab origins has taken on a distinctly conspiratorial bent. Many of those who believe that Covid originated in a lab also believe that the scientific community has organized to suppress the proof and punish anyone who speaks it. To counter that, sunlight really is the best disinfectant.

I reached out to many different researchers to understand this paper, and I think the people worried that lab origins are being unduly suppressed would be cheered by what I heard. Researchers gave initial impressions, read the paper, and refined their initial impressions. Some of them developed their own quick tests of the robustness of the statistical conclusions. People tweeted graphs, arguments, counterarguments, and yes, the occasional insult. We all got to see how the sausage got made, and honestly, it wasn’t that gross.

For all the social media furor, in this case I think that our scientific process basically did its job — which meant determining that this paper doesn’t really move our understanding of Covid origins. In the ages before preprint servers, all this anger and all the insults — but also all of this genuine truth-seeking and intellectual curiosity — would have happened behind closed doors. I’m not very sorry it now happens in the open instead — though I do think journalists have a lot of work to do to make sure that the truth can catch up with rumors.

A version of this story was initially published in the Future Perfect newsletter.
