The Rise of Preprint Archives

As medical research continues to evolve, so do publication platforms.

The traditional peer review publishing model has operated in much the same way since its inception in the 18th century: Authors submit their research, anonymous reviewers critique the work, and the manuscript is ultimately either rejected or deemed worthy of publication. In the latter case, the work often goes through several revisions before being accepted by the journal.

This process ensures the integrity of the published research, but lengthy delays in publication due to the review and revision process have led critics to search for alternative avenues for disseminating research results.

“About 30 years ago, a physicist who was trying to figure out how to get comments on his work started taping [his papers] outside of his office door for people to come by and provide feedback,” recounted Harlan Krumholz, MD, of Yale University and Yale New Haven Hospital. “With the advent of the internet, the idea evolved first to sharing papers using email, and eventually to the creation of arXiv.”

The preprint server arXiv is operated and maintained by staff at Cornell University in Ithaca, New York, and encompasses articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering, systems science, and economics.1 In 1991, arXiv launched as a central repository mailbox where scientists could upload their papers and colleagues could access the files from any computer. As of July 2020, it hosts more than 1.7 million submitted papers.

Following the model of arXiv, several other servers have emerged for sharing research results prior to publication in a peer-reviewed journal, including bioRxiv (a preprint server for the biological sciences operated by Cold Spring Harbor Laboratory) and ChemRxiv (a platform operated by five national chemistry professional societies).

“Preprint publishing started in fields where it can take years to get work published,” said Nancy Berliner, MD, of Brigham and Women’s Hospital and Dana-Farber Cancer Institute, and Editor-in-Chief of the American Society of Hematology’s Blood journal. “It was a way of establishing priority for the person who was first to put something out there, so they can take credit for it without waiting for it to be published by a journal months to years later. If such papers are then submitted, peer-reviewed, and published, the authors may have benefited both from suggestions made while on the server and also by having been established as the first investigators to report the work.”

Preprint publishing was slower to take hold in the area of medical literature, in part because of the perception that erroneous, non–peer-reviewed information might influence patient care. However, in June 2019, Cold Spring Harbor Laboratory, The BMJ, and Yale University partnered to launch medRxiv, a preprint server for medical, clinical, and health sciences research.2 MedRxiv is the latest in a group of free online archives designed to display and distribute unpublished scientific manuscripts.

Recently, preprint servers have gained prominence as scientists conducting research on the novel coronavirus rush to share knowledge about this global health crisis. Typically, early findings published on preprint servers would go through a vetting process by the scientific community, but with the world waiting anxiously for any information about the COVID-19 pandemic, the preliminary results from small studies are being amplified on a larger stage before researchers have had a chance to offer comments and criticisms.

According to Dr. Berliner, this scenario highlights one of the pitfalls of the preprint model – and the importance of traditional peer-reviewed publishing. “The problem arises with the work that never gets peer-reviewed and never gets formally published,” Dr. Berliner said. “Peer review serves to identify weaknesses and errors and facilitates putting the new data in context. That is why it is irreplaceable.”

ASH Clinical News recently spoke with several scientists and physicians about these and other potential advantages and disadvantages of preprint platforms, as well as how the model is affecting traditional research.

What Is a Preprint?

Preprint papers are preliminary reports of work that have not undergone formal peer review.

At its core, the idea behind preprint servers is to get the written version of science out to the public as it is being developed, explained Ross L. Levine, MD, Chief of the Molecular Cancer Medicine Service at Memorial Sloan Kettering Cancer Center in New York City and a contributor to bioRxiv.

“It is a way for the community to see your work, engage with it, and comment on it,” Dr. Levine said.

The submission process to a platform like bioRxiv or medRxiv is simple: Authors must upload their manuscript as a Word or PDF file.3 Acceptable content includes original research articles, systematic reviews and meta-analyses, data articles, and articles describing methodological research/investigations, or clinical research design protocols. The work can be submitted to a preprint server concurrently with submission to a journal, but the preprint will not be posted if it has already been accepted for publication, published, or posted elsewhere. Authors also must declare at submission that “all relevant ethical guidelines have been followed, all necessary Institutional Review Board and/or ethics committee approvals have been obtained, all necessary patient/participant consent has been obtained and the appropriate institutional forms archived.”

Per both platforms’ websites, all papers are screened by a group of volunteers in the scientific community for “offensive and/or nonscientific content” and for material that might pose a health or biosecurity risk. Papers also are checked for plagiarism. Once work is deposited on the server, it is citable and cannot be removed, but a revised version can be submitted at any time.

“In the clinical realm, we don’t necessarily have hard and fast rules when screening the papers, but we do look for several things,” explained Dr. Krumholz, who is a cofounder and reviewer for medRxiv. “We look to see if people have some publishing experience, if they have a medical degree, and are from a known institution. If we come across certain papers making questionable medical claims, we may decide that it needs to be vetted more before posting on the server, or that it is not appropriate for the preprint platform.”

There is no assessment of whether the work is “well done,” Dr. Levine added. “There are examples of preprints out there that the public immediately critiqued, and a few have been pulled because of concerns about methodology,” Dr. Levine said. “That is the nature of preprinting. It should be self-correcting.”

Preprint Versus Peer Review

The fundamental difference between preprint servers and traditional journals, even open-access journals, is that a study published in a traditional journal has some sort of seal of approval, explained Richard Sever, PhD, Assistant Director of Cold Spring Harbor Laboratory Press and a cofounder of bioRxiv and medRxiv.

“That seal comes with different levels of review,” he said, noting that journals such as the New England Journal of Medicine (NEJM) or Nature may have more stringent review and higher standards for publication, compared with a lesser-known journal. “The key thing, though, is that the journal is putting its name on the work and saying, ‘We have evaluated this work, sent it for peer review, and we think it is good enough to be in our vessel.’”

For example, Blood’s peer review policy details that manuscripts are judged on quality, novelty, and scientific importance.4 In addition, all manuscripts are “judged in relation to other submissions currently under consideration.”

Blood’s policy also allows for certain manuscripts reporting “exceptional findings that merit rapid publication” to receive Fast Track peer review. Similarly, the Journal of Clinical Oncology (JCO) has a Rapid Review process for manuscripts deemed “timely and late breaking.” In these cases, peer review could be completed within 72 hours of reviewer assignment, JCO’s website states.5

Despite these efforts to expedite the process, the aggregate effect of peer review is a delay in dissemination of information, according to Dr. Sever. “The delay is 8 months on average, but the range can go to 2 or 3 years,” he said.

In 2018, Dr. Krumholz and colleagues published a study examining the age of clinical trial data at the time of publication.6 Among 341 trials published across six journals in 2015, the median time from the completion of data collection to publication in a high-impact journal was almost 3 years. Long publication times are not the only factor preventing the rapid dissemination of trial results, the authors concluded, but represent an area with great opportunity for improvement.

A 2019 study looking at factors associated with publication speed for general medical journals showed a shorter, but still significant, time from submission to acceptance (4 months) and acceptance to publication (2 months).7

“All these delays occur at times when other scientists could be working on building off of the results,” Dr. Sever said. “The cumulative effect [of preprints] is that everybody has a head start, which has the potential to make science move faster.”

Dr. Krumholz compared the process to submitting data to a medical meeting. Research is submitted in the form of abstracts, except that with online prepublication platforms, scientists do not even have to wait the few months or more until the next major medical meeting.

The Pros of Preprints

Reducing the time to publication is just one of the advantages of preprint servers, according to advocates. Preprints also aid researchers in the grant application process, Dr. Levine said.

“Instead of writing an application and saying you have a paper in review at a journal, you can provide a link to the preprint,” he explained. “The grant reviewer can see the paper has been completed and look at the details of your work. Often, publications that provide important background are not yet through the traditional peer-review process when grants are reviewed.”

Another advantage: keeping up with scientists who are doing similar work.

“For many of us, reading preprints is how we pay attention to the state-of-the-art science,” Dr. Levine said. “Preprints influence us and our work before a published paper ever comes out. It allows us to adapt and change our science.”

In addition, scientists who post their work to preprint servers have access to instant feedback, which may help them respond to peer reviewers from traditional journals or optimize design of the next phase of their research.

Within 3 months of its June 2019 launch, medRxiv had more than 200 preprints available on its server; at the beginning of August 2020, it had more than 9,400.

“There is no doubt that bioRxiv for laboratory-based science and medRxiv for clinical work have become dominant vehicles,” Dr. Levine said. “They are very user-friendly, they link well to social media, and people have found them to be great venues to get science out.”

When viewing a paper on the website, readers can see comments and related Twitter posts just by scrolling down on the page where the article appears.

“When I am working on something, I want to know if someone else has done the same thing,” Dr. Krumholz said. “Now, by posting on medRxiv, I can see other people who are working on the same thing. Maybe I can collaborate with them or learn from them about what works and what doesn’t.”

Perilous Preprints

Critics of preprint servers, on the other hand, worry that this easy access makes the platforms dangerous – especially when it comes to preprints that tackle clinical practice issues.

David M. Maslove, MD, of Queen’s University in Ontario, addressed this concern in a 2018 editorial published in JAMA. “Clinicians know well the exuberance with which patients search the internet for information regarding health conditions, digesting reports from all manner of sources,” he wrote. “Added to this is that the technical barriers to reading research reports may be lower in the clinical sciences than the physical sciences, making the former more accessible. Patients may be exposed to early, unsubstantiated claims relevant to their conditions, while lacking the necessary context in which to interpret these.”8

Dr. Maslove also worries that authors will begin to cite preprints in other articles, which could “overburden editors, reviewers, and readers, who may need to parse reference lists to determine if non–peer-reviewed work has been cited.”

An analysis posted on bioRxiv looked at all preprints posted on the server between November 2013 and December 2017, then matched them to articles subsequently published in peer-reviewed journals. The authors found “empirical evidence that journal articles which have previously been posted as a preprint on bioRxiv receive more citations and more online attention than articles published in the same journals which were not deposited, even when controlling for multiple explanatory variables.”9

Dr. Sever said that identifying which studies have been through peer review when citing published work is important. For example, readers may see a citation list that includes a preprint paper, with “PREPRINT” appearing next to the reference to draw people’s attention to the fact that it is not peer-reviewed data.

“I always point out too, though, that there are loads of other things we cite that are not peer-reviewed, like books, editorials, review articles, lab manuals, and more,” Dr. Krumholz said.

He also pointed out that even peer-reviewed reports should be read with a critical eye.

“Just because something has gone through peer review does not mean it is true,” Dr. Krumholz said, citing articles that have been published in peer-reviewed journals but subsequently withdrawn after publication due to errors or scientific fraud. “Peer review is another layer of caution, but readers need to always ask the hard questions.”

Dr. Berliner agreed that preprint servers play an important role in biomedical research and can substantially aid researchers by providing feedback that may be helpful when finalizing a manuscript for publication. In that case, the final product is the published manuscript.

“However, you can also put things on preprint servers that will never be published or have expert peer-review critique,” Dr. Berliner said. “That is the tension that exists with preprint servers and peer-reviewed research. That is why we are extremely nervous about clinical results that may have practice-changing impact being published on a server without peer review.”

She offered this example: “If you put science experiments on a preprint server and they remain unpublished and the data are not correct, the worst that might happen is someone else may waste their time trying to repeat the experiment,” she said. “Putting clinical recommendations on preprint servers is potentially much more dangerous, as it could be lethal to patients.”

This worry is amplified in the current time, when the COVID-19 pandemic has made health-care providers desperate for guidance. “Untested therapies and unsubstantiated observations that appear on preprint servers without oversight and peer review could lead to patients having adverse outcomes,” Dr. Berliner added.

MedRxiv has a prominent cautionary statement on its website’s home page indicating that data found there “should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.”

Proponents of preprint servers, Dr. Berliner said, often say that people are smart enough to make their own judgments about the validity of the results, but that may be putting too much faith in the end user.

“As a primary source of information that allows for the bypass of true scientific review, [preprints] are dangerous,” Dr. Berliner said. “Especially in the clinical realm.”

Preprints in a Pandemic

Nothing has illustrated the growing use of preprint servers more clearly than the COVID-19 global pandemic. The first manuscript on the novel coronavirus that was uploaded to bioRxiv appeared on January 19, 2020.10 Since then, more than 950 articles related to the virus have been published. In March 2020 alone, more than 850 preprints were uploaded to medRxiv, a large portion of which were related to COVID-19. In fact, on March 30 alone, more than 90 articles related to COVID-19 or the SARS-CoV-2 virus were posted on medRxiv and bioRxiv.

Still, most COVID-19–related research is being published in traditional journals, as journals institute special procedures to expedite the review of this material. For example, Blood has waived submission and publication fees for COVID-19–related submissions and granted them fast-track review and special staff attention throughout the peer review and publication processes. The journal also allows authors who have been affected by the pandemic extra time to submit revisions to their manuscripts.

Even before the global pandemic hit, it seemed that many major medical publishers were getting on board with the preprint model. For example, Blood allows submission of manuscripts that have been posted on preprint servers, but with certain stipulations: The authors must disclose preprint publication, the preprint cannot be updated while the manuscript is under review, the authors must retain the copyright for transfer to the American Society of Hematology, and the preprint comments will be considered in the evaluation of the manuscript.

“Because of the potential impact on patient care, we discourage preprints of a clinical nature in principle, but that doesn’t prevent us from considering publication,” Dr. Berliner said.

A Permanent Presence

NEJM allows submission of manuscripts that have been posted to nonprofit preprint servers, provided the authors notify the journal upon submission. JAMA notes on its website that posting to a preprint server will require the publication to determine whether its publishing of the manuscript “will add meaningful new information to the medical literature or will be redundant.” Similarly, the Journal of Clinical Oncology considers manuscript overlaps and novelty of findings when reviewing manuscripts for publication.

Some journals, like The Lancet, have taken it a step further. In 2018, The Lancet’s publishing company, Elsevier, launched preprints on the Social Science Research Network, or SSRN. Authors of all papers submitted to the family of Lancet journals are asked if they want their submission published first as a preprint.

Dr. Levine mentioned hearing that some journal editors even spend time scanning preprint submissions for exciting papers and contacting authors to encourage them to submit the manuscript to their journal.

In the wake of COVID-19, Eric J. Rubin, MD, PhD, Editor-in-Chief of NEJM, and several deputy editors posted an editorial discussing the importance of sharing information with public health authorities as early as possible and encouraged “authors to submit their work for posting on preprint servers.”11 Joerg Heber, Editorial Director of PLOS and Editor-in-Chief of PLOS One, posted a blog post that encouraged “all researchers to also consider posting a preprint.”12

Pushing Pause on Preprints?

While the rise of preprint servers has undeniable advantages for the scientific community, there are a huge number of uncontrolled studies being submitted to preprint servers that are no more than case reports, Dr. Berliner pointed out. For example, a preprint posted to medRxiv on March 27 suggested that people with blood type A have a higher risk of acquiring COVID-19, compared with people with other blood types.13 The article, which reviewed data from 2,173 patients with COVID-19 in three Chinese hospitals, was picked up by the mainstream media. Not long after, critics pointed out that, although the findings may be of interest to researchers, they had no real effect on the public except possibly causing undue worry.

Subsequently, two studies from Massachusetts General Hospital in Boston and Columbia Presbyterian Hospital in New York City debunked the association between blood type and COVID-19 risk – asserting that blood type alone does not offer meaningful protection.14,15

“No one should think they’re protected [on the basis of their blood type],” said Nicholas Tatonetti, PhD, who led the Columbia study, which was also published on medRxiv.

This type of interaction between preprints is an example of the system working, proponents of preprint servers allege. This also played out in the retraction of an article posted to bioRxiv claiming to have found similarities between COVID-19 and HIV.16 However, the retraction has not stopped conspiracy theorists from picking up on the research and spreading it through social media.

“The media have jumped on results from preprints a little bit too early,” Dr. Levine said. “[Information in a preprint] shouldn’t be reported as fact in an article. It is a risk.”

Scientists and researchers are eager to see how the use of preprint servers – and traditional journals’ acceptance of them – will have changed when the COVID-19 pandemic passes. In recent months, many traditional journals have not only encouraged posting to preprint servers, but also have reduced the length of their period from manuscript submission to publication.

In fact, in May, studies related to COVID-19 published in The Lancet and NEJM came under scrutiny for what seemed like inconsistent data.17,18 The Lancet study, which evaluated hydroxychloroquine for the treatment of COVID-19, was retracted when the company that provided data to the researchers would not provide full access for a third-party review. A month later, authors of the NEJM study, which suggested that underlying cardiovascular disease increased one’s risk of in-hospital death from COVID-19, called for its retraction after some of its authors and a third-party auditor were denied access to the raw data used in the study.

As Dr. Krumholz pointed out, none of the critics of preprint servers are being critics for criticism’s sake. “Everyone is just trying to do the right thing,” he said.

“Some people see the preprint model as a threat to the publication model, but I view it as complementary to it,” Dr. Levine said. “There is no doubt that the nature of scientific publishing and access is a rapidly evolving landscape, and preprint will be a part of that dialogue.” —By Leah Lawrence

References

  1. arXiv.org. About arXiv. Accessed March 30, 2020, from https://arxiv.org/about.
  2. Yale University press release. Preprint server for health sciences will ‘accelerate’ research. Accessed August 1, 2020, from https://news.yale.edu/2020/06/06/preprint-server-health-sciences-will-accelerate-research.
  3. medRxiv.org. Submission Guide. Accessed July 31, 2020, from https://www.medrxiv.org/submit-a-manuscript.
  4. Blood. Peer Review. Accessed July 31, 2020, from https://ashpublications.org/blood/pages/peer-review.
  5. Journal of Clinical Oncology. Peer Review Process. Accessed July 31, 2020, from https://ascopubs.org/jco/authors/peer-review-process.
  6. Welsh J, Lu Y, Dhruva S, et al. Age of data at the time of publication of contemporary clinical trials. JAMA Netw Open. 2018;1:e181065.
  7. Sebo P, Fournier JP, Ragot C, et al. Factors associated with publication speed in general medical journals: a retrospective study of bibliometric data. Scientometrics. 2019;119:1037-1058.
  8. Maslove DM. Medical preprints—a debate worth having. JAMA. 2018;319:443-444.
  9. Fraser N, Momeni F, Mayr P, Peters I. The effect of bioRxiv preprints on citations and altmetrics. bioRxiv preprint. June 22, 2019.
  10. Chen T, Rui J, Wang Q, et al. A mathematical model for simulating the transmission of Wuhan novel coronavirus. bioRxiv preprint. January 19, 2020.
  11. Rubin EJ, Baden LR, Morrissey S, Campion EW. Medical journals and the 2019-nCoV outbreak. N Engl J Med. 2020;382:866.
  12. Heber J. A message to our community regarding COVID-19. PLOS Blogs. Accessed July 31, 2020, from https://blogs.plos.org/plos/2020/03/a-message-to-our-community-regarding-covid-19/.
  13. Zhao J, Yang Y, Huang H, et al. Relationship between the ABO blood group and the COVID-19 susceptibility. medRxiv preprint. March 27, 2020.
  14. Tatonetti N, Zietz M. Testing the association between blood type and COVID-19 infection, intubation, and death. medRxiv preprint. July 21, 2020.
  15. Latz CA, DeCarlo C, Boitano L, et al. Blood type and outcomes in patients with COVID-19. Ann Hematol. 2020;99:2113-2118.
  16. Pradhan P, Pandey AK, Mishra A, et al. Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag. bioRxiv preprint. January 31, 2020.
  17. Mehra MR, Desai SS, Ruschitzka F, Patel AN. RETRACTED: hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis. Lancet. 2020 May 22.
  18. Mehra MR, Desai SS, Kuy S, et al. RETRACTED: Cardiovascular disease, drug therapy, and mortality in Covid-19. N Engl J Med. 2020;382:e102.