COVID-19 impact on research, lessons learned from COVID-19 research, implications for pediatric research
- Debra L. Weiner1,2,
- Vivek Balasubramaniam3,
- Shetal I. Shah4 &
- Joyce R. Javier5,6
on behalf of the Pediatric Policy Council
Pediatric Research volume 88, pages 148–150 (2020)
The COVID-19 pandemic has resulted in unprecedented research worldwide. The impact on research in progress at the time of the pandemic, the importance and challenges of real-time pandemic research, and the importance of a pediatrician-scientist workforce are all highlighted by this epic pandemic. As we navigate through and beyond this pandemic, which will have a long-lasting impact on our world, including research and the biomedical research enterprise, it is important to recognize and address the opportunities, strategies, and challenges of research, and to strengthen the pediatrician-scientist workforce.
The first cases of what is now recognized as SARS-CoV-2 infection, termed COVID-19, were reported in Wuhan, China in December 2019 as cases of fatal pneumonia. By February 26, 2020, COVID-19 had been reported on all continents except Antarctica. As of May 4, 2020, 3.53 million cases and 248,169 deaths had been reported from 210 countries. 1
Impact of COVID-19 on ongoing research
The impact on research in progress prior to COVID-19 was rapid and dramatic, and will no doubt be long lasting. The pandemic curtailed most academic, industry, and government basic science and clinical research, or redirected research to COVID-19. Most clinical trials, except those testing life-saving therapies, have been paused, and most continuing trials are now closed to new enrollment. Ongoing clinical trials have been modified to enable home administration of treatment and virtual monitoring to minimize participant risk of COVID-19 infection, and to avoid diverting healthcare resources from pandemic response. In addition to short- and long-term patient impact, these research disruptions threaten the careers of physician-scientists, many of whom have had to shift efforts from research to patient care. To protect research in progress, as well as physician-scientist careers and the research workforce, ongoing support is critical. NIH ( https://grants.nih.gov/policy/natural-disasters/corona-virus.htm ), PCORI ( https://www.pcori.org/funding-opportunities/applicant-and-awardee-faqs-related-covid-19 ), and other funders acted swiftly to provide guidance on proposal submission and award management, and to implement allowances that enable grant personnel to be paid and timelines to be relaxed. Research institutions have also implemented strategies to mitigate the long-term impact of research disruptions. Support throughout and beyond the pandemic to retain currently well-trained research personnel and research support teams, and to accommodate loss of research assets, including laboratory supplies and study participants, will be required to complete disrupted research and ultimately enable new research.
In the long term, it is likely that the pandemic will force reallocation of research dollars at the expense of research areas funded prior to the pandemic. It will be more important than ever for the pediatric research community to engage in discussion and decisions regarding prioritization of funding goals for dedicated pediatric research and meaningful inclusion of children in studies. The recently released 2020 strategic plan of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), which engaged stakeholders, including scientists and patients, to shape the goals of the Institute, will require modification to best chart a path toward restoring normalcy within pediatric science.
COVID-19 research
This global pandemic once again highlights the importance of research, stable research infrastructure, and funding for public health emergency (PHE)/disaster preparedness, response, and resiliency. The stakes in this worldwide pandemic have never been higher as lives are lost, economies falter, and life has radically changed. Ultimate COVID-19 mitigation and crisis resolution is dependent on high-quality research aligned with top priority societal goals that yields trustworthy data and actionable information. While the highest priority goals are treatment and prevention, biomedical research also provides data critical to manage and restore economic and social welfare.
Scientific and technological knowledge and resources have never been greater and have been leveraged globally to perform COVID-19 research at warp speed. The number of studies related to COVID-19 increases daily, the scope and magnitude of engagement is stunning, and the extent of global collaboration unprecedented. On January 5, 2020, just weeks after the first cases of illness were reported, the genetic sequence, which identified the pathogen as a novel coronavirus, SARS-CoV-2, was released, providing information essential for identifying and developing treatments, vaccines, and diagnostics. As of May 3, 2020, 1,133 COVID-19 studies, including 148 related to hydroxychloroquine, 13 to remdesivir, 50 to vaccines, and 100 to diagnostic testing, were registered on ClinicalTrials.gov, and 980 different studies on the World Health Organization’s International Clinical Trials Registry Platform (WHO ICTRP), made possible, at least in part, by use of data libraries to inform development of antivirals, immunomodulators, antibody-based biologics, and vaccines. On April 7, 2020, the FDA launched the Coronavirus Treatment Acceleration Program (CTAP) ( https://www.fda.gov/drugs/coronavirus-covid-19-drugs/coronavirus-treatment-acceleration-program-ctap ). On April 17, 2020, NIH announced a partnership with industry to expedite vaccine development ( https://www.nih.gov/news-events/news-releases/nih-launch-public-private-partnership-speed-covid-19-vaccine-treatment-options ). As of May 1, 2020, remdesivir (Gilead), granted FDA emergency use authorization, was the only therapeutic authorized for COVID-19. 2
The pandemic has intensified research challenges. In a rush for data, thousands of manuscripts, news reports, and blogs have already been published, but to date there is little scientifically robust data. Some studies do not meet published clinical trial standards, which now include FDA’s COVID-19-specific standards, 3,4,5 and/or are published without peer review. Misinformation from studies diverts resources from development and testing of more promising therapeutic candidates and has endangered lives. Initial reports that ibuprofen was unsafe for patients with COVID-19 resulted in a shortage of acetaminophen, endangering individuals for whom ibuprofen is contraindicated. Initial reports that hydroxychloroquine was potentially effective for treatment of COVID-19 resulted in shortages for patients with autoimmune diseases. In rigorous trials, remdesivir decreased the duration of COVID-19, with greater effect when given early. 6 Given the limited availability and safety data, its use outside clinical trials is currently authorized only for severe disease. Vaccines typically take 10–15 years to develop. As of May 3, 2020, of nearly 100 vaccines in development, 8 were in trials. Several vaccines are projected to receive emergency approval within 12–18 months, possibly as early as the end of the year, 7 still an eternity for this pandemic, yet too soon for long-term effectiveness and safety data. Antibody testing, necessary for diagnosis, therapeutics, and vaccine testing, has presented some of the greatest research challenges, including validation, timing, availability and prioritization of testing, interpretation of test results, and appropriate patient and societal actions based on results. 8 Relaxing physical distancing without data regarding test validity and the duration and strength of immunity to different strains of SARS-CoV-2 could have catastrophic results. Understanding population differences and disparities, which have been further exposed during this pandemic, is critical for response and long-term pandemic recovery. The “Equitable Data Collection and Disclosure on COVID-19 Act” calls for the CDC (Centers for Disease Control and Prevention) and other HHS (United States Department of Health & Human Services) agencies to publicly release racial and demographic information ( https://bass.house.gov/sites/bass.house.gov/files/Equitable%20Data%20Collection%20and%20Dislosure%20on%20COVID19%20Act_FINAL.pdf ).
Trusted sources of up-to-date, easily accessible information must be identified (e.g., WHO https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-research-on-novel-coronavirus-2019-ncov , CDC https://www.cdc.gov/coronavirus/2019-nCoV/hcp/index.html , and for children AAP (American Academy of Pediatrics) https://www.aappublications.org/cc/covid-19 ) and should comment on quality of data and provide strategies and crisis standards to guide clinical practice.
In the long term, lessons learned from research during this pandemic could benefit the research enterprise worldwide, both beyond the pandemic and during other PHE/disasters, with strategies for balancing multiple novel approaches with high-quality, time-efficient, cost-effective research. This challenge can be met, at least in part, by appropriate study design, collaboration, patient registries, automated data collection, artificial intelligence, data sharing, and ongoing consideration of appropriate regulatory approval processes. In addition, research to develop and evaluate innovative strategies and technologies to improve access to care, management of health and disease, and quality, safety, and cost effectiveness of care could revolutionize healthcare and healthcare systems. During PHE/disasters, crisis standards for research should be considered, along with ongoing and just-in-time PHE/disaster training for researchers willing to share information that could be leveraged at a time of crisis. A dedicated, funded core workforce of PHE/disaster researchers, supported by funded infrastructure, should be considered, potentially as a consortium of networks that includes physician-scientists, basic scientists, social scientists, mental health providers, global health experts, epidemiologists, public health experts, engineers, information technology experts, economists, and educators to strategize, consult, review, monitor, and interpret studies, guide appropriate clinical use of data, and inform decisions regarding effective use of resources for PHE/disaster research.
Differences between adult and pediatric COVID-19, the need for pediatric research
As reported by the CDC, from February 12 to April 2, 2020, of 149,760 cases of confirmed COVID-19 in the United States, 2572 (1.7%) were children aged <18 years, similar to published rates in China. 9 Severe illness has been rare. Of 749 children for whom hospitalization data are available, 147 (20%) required hospitalization (5.7% of total children), and 15 of 147 required ICU care (2.0%, 0.58% of total). Of the 95 children aged <1 year, 59 (62%) were hospitalized, and 5 (5.3%) required ICU admission. Among children there were three deaths. Despite children being relatively spared by COVID-19, the spread of disease by children and the consequences for their health and for pediatric healthcare are potentially profound, with immediate and long-term impact on all of society.
We have long been aware of the importance and value of pediatric research for children and society. COVID-19 is no exception and highlights the imperative need for a pediatrician-scientist workforce. Understanding differences in epidemiology, susceptibility, manifestations, and treatment of COVID-19 in children can provide insights into this pathogen, pathogen–host interactions, pathophysiology, and host response for the entire population. Pediatric clinical registries of COVID-infected and COVID-exposed children can provide data and specimens for immediate and long-term research. Of the 1133 COVID-19 studies on ClinicalTrials.gov, 202 include children aged ≤17 years. Sixty-one of the 681 interventional trials include children. With less diagnostic testing and less pediatric research, we endanger not only children but also adults, by failing to identify infected children and to limit spread by children.
Pediatric considerations and challenges related to treatment and vaccine research for COVID-19 include appropriate dosing, pediatric formulation, and pediatric-specific short- and long-term effectiveness and safety. Typically, initial clinical trials exclude children until safety has been established in adults. But with time of the essence, deferring pediatric research risks the health of children, particularly those with special needs. Considerations specific to pregnant women, fetuses, and neonates must also be addressed. Children’s mental health, already in crisis prior to COVID-19, is now further challenged by social disruption, food and housing insecurity, loss of loved ones, isolation from friends and family, and exposure to an infodemic of pandemic-related information. Interestingly, at present, mental health visits, along with all visits to pediatric emergency departments across the United States, have decreased dramatically. Understanding factors that mitigate and worsen psychiatric symptoms should be a focus of research, and ideally will result in strategies for prevention and management in the long term, including beyond this pandemic. The social well-being of children must also be studied. Experts note that the pandemic is a perfect storm for child maltreatment given that vulnerable families are now socially isolated, facing unemployment, and stressed, and that children are not under the watch of mandated reporters in schools, daycare, and primary care. 10 Many states have observed a decrease in child abuse reports and an increase in severity of emergency department abuse cases. In both the short term and the long term, it will be important to study the impact of access to care, missed care, and disrupted education during COVID-19 on physical and cognitive development.
Training and supporting pediatrician-scientists at all stages of career, such as through NIH physician-scientist research training and career development programs ( https://researchtraining.nih.gov/infographics/physician-scientist ), as well as fostering research by fellows, residents, and medical students willing to dedicate their research careers to, or at least understand the implications of their research for, PHE/disasters, is important for maintaining both an ongoing and a just-in-time surge pediatric-focused PHE/disaster workforce. In addition to including pediatric experts in collaborations and consortiums with a broader population focus, consideration should be given to pediatric-focused multi-institutional, academic, industry, and/or government consortiums with infrastructure and ongoing funding for virtual training programs, research teams, and multidisciplinary oversight.
The impact of the COVID-19 pandemic on research, and research in response to the pandemic, once again highlights the importance of research, the challenges of research particularly during PHE/disasters, and the opportunities and resources for making research more efficient and cost effective. New paradigms and models for research will hopefully emerge from this pandemic. Building sustained PHE/disaster research infrastructure and a research workforce that includes training and funding for pediatrician-scientists, and integrating the pediatrician research workforce into high-quality research across demographics, will support the pediatrician-scientist workforce and pipeline and benefit society.
Johns Hopkins Coronavirus Resource Center. Covid-19 Case Tracker. Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). https://coronavirus.jhu.edu/map.html (2020).
US Food and Drug Administration. Coronavirus (COVID-19) update: FDA issues emergency use authorization for potential COVID-19 treatment. FDA News Release . https://www.fda.gov/news-events/press-announcements/coronavirus-covid-19-update-fda-issues-emergency-use-authorization-potential-covid-19-treatment (2020).
Evans, S. R. Fundamentals of clinical trial design. J. Exp. Stroke Transl. Med. 3 , 19–27 (2010).
Antman, E. M. & Bierer, B. E. Standards for clinical research: keeping pace with the technology of the future. Circulation 133 , 823–825 (2016).
Food and Drug Administration. FDA guidance on conduct of clinical trials of medical products during COVID-19 public health emergency. Guidance for Industry, Investigators and Institutional Review Boards . https://www.fda.gov/regulatory-information/search-fda-guidance-documents/fda-guidance-conduct-clinical-trials-medical-products-during-covid-19-public-health-emergency (2020).
National Institutes of Health. NIH clinical trial shows remdesivir accelerates recovery from advanced COVID-19. NIH News Releases. https://www.nih.gov/news-events/news-releases/nih-clinical-trial-shows-remdesivir-accelerates-recovery-advanced-covid-19 (2020).
Radcliffe, S. Here’s exactly where we are with vaccines and treatments for COVID-19. Health News . https://www.healthline.com/health-news/heres-exactly-where-were-at-with-vaccines-and-treatments-for-covid-19 (2020).
Abbasi, J. The promise and peril of antibody testing for COVID-19. JAMA . https://doi.org/10.1001/jama.2020.6170 (2020).
CDC COVID-19 Response Team. Coronavirus disease 2019 in children—United States, February 12–April 2, 2020. Morb. Mortal. Wkly. Rep. 69, 422–426 (2020).
Agarwal, N. Opinion: the coronavirus could cause a child abuse epidemic. The New York Times . https://www.nytimes.com/2020/04/07/opinion/coronavirus-child-abuse.html (2020).
Author information
Authors and Affiliations
Department of Pediatrics, Division of Emergency Medicine, Boston Children’s Hospital, Boston, MA, USA
Debra L. Weiner
Harvard Medical School, Boston, MA, USA
Department of Pediatrics, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
Vivek Balasubramaniam
Department of Pediatrics and Division of Neonatology, Maria Fareri Children’s Hospital at Westchester Medical Center, New York Medical College, Valhalla, NY, USA
Shetal I. Shah
Division of General Pediatrics, Children’s Hospital Los Angeles, Los Angeles, CA, USA
Joyce R. Javier
Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Contributions
All authors made substantial contributions to conception and design, data acquisition and interpretation, drafting the manuscript, and providing critical revisions. All authors approve this final version of the manuscript.
Pediatric Policy Council
Scott C. Denne, MD, Chair, Pediatric Policy Council; Mona Patel, MD, Representative to the PPC from the Academic Pediatric Association; Jean L. Raphael, MD, MPH, Representative to the PPC from the Academic Pediatric Association; Jonathan Davis, MD, Representative to the PPC from the American Pediatric Society; DeWayne Pursley, MD, MPH, Representative to the PPC from the American Pediatric Society; Tina Cheng, MD, MPH, Representative to the PPC from the Association of Medical School Pediatric Department Chairs; Michael Artman, MD, Representative to the PPC from the Association of Medical School Pediatric Department Chairs; Shetal Shah, MD, Representative to the PPC from the Society for Pediatric Research; Joyce Javier, MD, MPH, MS, Representative to the PPC from the Society for Pediatric Research.
Corresponding author
Correspondence to Debra L. Weiner .
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Members of the Pediatric Policy Council are listed in the Author contributions section.
Weiner, D.L., Balasubramaniam, V., Shah, S.I. et al. COVID-19 impact on research, lessons learned from COVID-19 research, implications for pediatric research. Pediatr Res 88, 148–150 (2020). https://doi.org/10.1038/s41390-020-1006-3
Received: 07 May 2020
Accepted: 21 May 2020
Published: 16 June 2020
Issue Date: August 2020
The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape
Nicholas Fraser, Liam Brierley, Jessica K. Polka, Federico Nanni, Jonathon Alexis Coates
I have read the journal’s policy and the authors of this manuscript have the following competing interests: JP is the executive director of ASAPbio, a non-profit organization promoting the productive use of preprints in the life sciences. GD is a bioRxiv Affiliate, part of a volunteer group of scientists that screen preprints deposited on the bioRxiv server. MP is the community manager for preLights, a non-profit preprint highlighting service. GD and JAC are contributors to preLights and ASAPBio fellows.
* E-mail: [email protected]
Contributed equally.
Received: October 8, 2020; Accepted: March 8, 2021; Collection date: April 2021.
This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The world continues to face a life-threatening viral pandemic. The virus underlying the Coronavirus Disease 2019 (COVID-19), Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), has caused over 98 million confirmed cases and 2.2 million deaths since January 2020. Although the most recent respiratory viral pandemic swept the globe only a decade ago, the way science operates and responds to current events has experienced a cultural shift in the interim. The scientific community has responded rapidly to the COVID-19 pandemic, releasing over 125,000 COVID-19–related scientific articles within 10 months of the first confirmed case, of which more than 30,000 were hosted by preprint servers. We focused our analysis on bioRxiv and medRxiv, 2 growing preprint servers for biomedical research, investigating the attributes of COVID-19 preprints, their access and usage rates, as well as characteristics of their propagation on online platforms. Our data provide evidence for increased scientific and public engagement with preprints related to COVID-19 (COVID-19 preprints are accessed more, cited more, and shared more on various online platforms than non-COVID-19 preprints), as well as changes in the use of preprints by journalists and policymakers. We also find evidence for changes in preprinting and publishing behaviour: COVID-19 preprints are shorter and reviewed faster. Our results highlight the unprecedented role of preprints and preprint servers in the dissemination of COVID-19 science and the impact of the pandemic on the scientific communication landscape.
An analysis of bioRxiv and medRxiv during the first 10 months of the COVID-19 pandemic reveals that the pandemic has resulted in a cultural shift in the use of preprints for disseminating pandemic-related science.
Introduction
Since January 2020, the world has been gripped by the Coronavirus Disease 2019 (COVID-19) outbreak, which has escalated to pandemic status, and caused over 98 million cases and 2.1 million deaths (43 million cases and 1.1 million deaths within 10 months of the first reported case) [ 1 – 3 ]. The causative pathogen was rapidly identified as a novel virus within the family Coronaviridae and was named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) [ 4 ]. Although multiple coronaviruses are ubiquitous among humans and cause only mild disease, epidemics of newly emerging coronaviruses were previously observed with SARS in 2002 [ 5 ] and Middle East Respiratory Syndrome (MERS) in 2012 [ 6 ]. The unprecedented extent and rate of spread of COVID-19 has created a critical global health emergency, and academic communities have raced to respond through research developments.
New scholarly research has traditionally been communicated via published journal articles or conference presentations. The traditional journal publishing process involves the submission of manuscripts by authors to an individual journal, which then organises peer review, the process in which other scientists (“peers”) are invited to scrutinise the manuscript and determine its suitability for publication. Authors often conduct additional experiments or analyses to address the reviewers’ concerns in 1 or more revisions. Even after this lengthy process is concluded, almost half of submissions are rejected and require resubmission to a different journal [ 7 ]. The entire publishing timeline from submission to acceptance is estimated to take approximately 6 months in the life sciences [ 8 , 9 ]; the median time between the date a preprint is posted and the date on which the first DOI of a journal article is registered is 166 days in the life sciences [ 8 ].
Preprints are publicly accessible scholarly manuscripts that have not yet been certified by peer review and have been used in some disciplines, such as physics, for communicating scientific results for over 30 years [ 10 ]. In 2013, 2 new preprint initiatives for the biological sciences launched: PeerJ Preprints, from the publisher PeerJ, and bioRxiv, from Cold Spring Harbor Laboratory (CSHL). The latter established partnerships with journals that enabled simultaneous preprint posting at the time of submission [ 11 ]. More recently, CSHL, in collaboration with Yale and BMJ, launched medRxiv, a preprint server for the medical sciences [ 12 ]. Preprint platforms serving the life sciences have subsequently flourished, and preprint submissions continue to grow year on year; two-thirds of these preprints are eventually published in peer-reviewed journals [ 8 ].
While funders and institutions explicitly encouraged prepublication data sharing in the context of the recent Zika and Ebola virus disease outbreaks [ 13 ], usage of preprints remained modest through these epidemics [ 14 ]. The COVID-19 crisis represents the first time that preprints have been widely used outside of specific communities to communicate during an epidemic.
We assessed the role of preprints in the communication of COVID-19 research in the first 10 months of the pandemic, between January 1 and October 31, 2020. We found that preprint servers hosted almost 25% of COVID-19–related science, that these COVID-19 preprints were being accessed and downloaded in far greater volume than other preprints on the same servers, and that these were widely shared across multiple online platforms. Moreover, we determined that COVID-19 preprints are shorter and are published in journals with a shorter delay following posting than their non-COVID-19 counterparts. Taken together, our data demonstrate the importance of rapidly and openly sharing science in the context of a global pandemic and the essential role of preprints in this endeavour.
COVID-19 preprints were posted early in the pandemic and represent a significant proportion of the COVID-19 literature
The COVID-19 pandemic has rapidly spread across the globe, from 3 patients in the city of Wuhan on December 27, 2019 to over 46.1 million confirmed cases worldwide by the end of October 2020 ( Fig 1A ). The scientific community responded rapidly as soon as COVID-19 emerged as a serious threat, with publications appearing within weeks of the first reported cases ( Fig 1B ). By the end of April 2020, over 19,000 scientific publications had appeared, published both in scientific journals (12,679; approximately 65%) and on preprint servers (6,710; approximately 35%) ( Fig 1B ); in some cases, preprints had already been published in journals during this time period and thus contribute to the counts of both sources. Over the following months, the total number of COVID-19–related publications increased approximately linearly, although the proportion of these which were preprints fell: by the end of October, over 125,000 publications on COVID-19 had appeared (30,260 preprints; approximately 25%). Given an output of approximately 5 million journal articles and preprints in the entirety of 2020 (according to data from Dimensions; https://dimensions.ai ), the publication response to COVID-19 represented >2.5% of outputs during our analysis period. In comparison to other recent outbreaks of global significance caused by emerging RNA viruses, the preprint response to COVID-19 has been much larger; 10,232 COVID-19–related preprints were posted to bioRxiv and medRxiv in the first 10 months of the pandemic; in comparison, only 78 Zika virus–related and 10 Ebola virus–related preprints were posted to bioRxiv during the entire duration of the respective Zika virus epidemic (2015 to 2016) and Western African Ebola virus epidemic (2014 to 2016) ( S1A Fig ). This surge in COVID-19 preprints is not explained by general increases in preprint server usage; considering counts of outbreak-related and non-outbreak–related preprints for each outbreak (COVID-19, Ebola, or Zika virus), preprint type was significantly associated with outbreak (chi-squared, χ2 = 2559.2, p < 0.001), with the proportion of outbreak-related preprints being greatest for COVID-19.
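As a rough illustration of the test described above, the sketch below runs a chi-squared test of association between outbreak and preprint type using scipy. The outbreak-related counts are those quoted in the text; the non-outbreak-related totals are hypothetical placeholders, since the exact denominators are not reproduced here.

```python
# Chi-squared test of association: outbreak (rows) x preprint type (columns).
# Outbreak-related counts are from the text; the non-outbreak-related
# counts are hypothetical placeholders for illustration only.
import numpy as np
from scipy.stats import chi2_contingency

counts = np.array([
    [10232, 34271],  # COVID-19: outbreak-related, non-outbreak-related
    [78,    19000],  # Zika (2015-2016): second column is a placeholder
    [10,     9000],  # Ebola (2014-2016): second column is a placeholder
])

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
```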
Fig 1. Development of COVID-19 and publication response from January 1 to October 31, 2020.
(A) Number of COVID-19 confirmed cases and reported deaths. Data are sourced from https://github.com/datasets/covid-19/ , based on case and death data aggregated by the Johns Hopkins University Center for Systems Science and Engineering ( https://systems.jhu.edu/ ). Vertical lines labelled (i) and (ii) refer to the date on which the WHO declared COVID-19 outbreak a Public Health Emergency of International Concern, and the date on which the WHO declared the COVID-19 outbreak to be a pandemic, respectively. (B) Cumulative growth of journal articles and preprints containing COVID-19–related search terms. (C) Cumulative growth of preprints containing COVID-19–related search terms, categorised by individual preprint servers. Journal article data in (B) are based upon data extracted from Dimensions ( https://www.dimensions.ai ; see Methods section for further details), and preprint data in (B) and (C) are based upon data gathered by Fraser and Kramer (2020). The data underlying this figure may be found in https://github.com/preprinting-a-pandemic/pandemic_preprints and https://zenodo.org/record/4587214#.YEN22Hmnx9A . COVID-19, Coronavirus Disease 2019; WHO, World Health Organization.
The 30,260 manuscripts posted as preprints were hosted on a range of preprint servers covering diverse subject areas not limited to biomedical research ( Fig 1C , data from [ 15 ]). It is important to note that this number includes preprints that may have been posted on multiple preprint servers simultaneously; however, by considering only preprints with unique titles (case insensitive), it appears that this applies to only a small proportion of preprint records (<5%). The total number of preprints is nevertheless likely an underestimate of the true volume posted, as a number of preprint servers and other repositories (e.g., institutional repositories) that could be expected to host COVID-19 research are not included [ 15 ]. Despite being one of the newest preprint servers, medRxiv hosted the largest number of preprints (7,882); the next largest were SSRN (4,180), Research Square (4,089), RePEc (2,774), arXiv (2,592), bioRxiv (2,328), JMIR (1,218), and Preprints.org (1,020); all other preprint servers were found to host <1,000 preprints ( Fig 1C ).
One of the most frequently cited benefits of preprints is that they allow free access to research findings [ 16 ], while a large proportion of journal articles often remain behind subscription paywalls. In response to the pandemic, a number of journal publishers began to alter their open-access policies in relation to COVID-19 manuscripts. One such change was to make COVID-19 literature temporarily open access (at least for the duration of the pandemic), with over 80,000 papers in our dataset being open access ( S1B Fig ).
Attributes of COVID-19 preprints posted between January and October 2020
To explore the attributes of COVID-19 preprints in greater detail, we focused our following investigation on two of the most popular preprint servers in the biomedical sciences: bioRxiv and medRxiv. We compared attributes of COVID-19–related preprints posted within our analysis period between January 1 and October 31, 2020 against non-COVID-19–related preprints posted in the same time frame. In total, 44,503 preprints were deposited to bioRxiv and medRxiv in this period, of which the majority (34,271, 77.0%) were non-COVID-19–related preprints ( Fig 2A , S1 Table ). During the early phase of the pandemic, the monthly volume of non-COVID-19 preprints posted was relatively constant, while the monthly volume of COVID-19 preprints increased, peaking at 1,967 in May, and subsequently decreased month by month. These patterns persisted when the 2 preprint servers were considered independently ( S2A Fig ). Moreover, COVID-19 preprints have represented the majority of preprints posted to medRxiv each month after February 2020.
Fig 2. Comparison of the properties of COVID-19 and non-COVID-19 preprints deposited on bioRxiv and medRxiv between January 1 and October 31, 2020.
(A) Number of new preprints deposited per month. (B) Preprint screening time in days. (C) License type chosen by authors. (D) Number of versions per preprint. (E) Boxplot of preprint word counts, binned by posting month. (F) Boxplot of preprint reference counts, binned by posting month. Boxplot horizontal lines denote lower quartile, median, upper quartile, with whiskers extending to 1.5*IQR. All boxplots additionally show raw data values for individual preprints with added horizontal jitter for visibility. The data underlying this figure may be found in https://github.com/preprinting-a-pandemic/pandemic_preprints and https://zenodo.org/record/4587214#.YEN22Hmnx9A . COVID-19, Coronavirus Disease 2019.
The increase in the rate of preprint posting poses challenges for their timely screening. A minor but detectable difference was observed between screening time for COVID-19 and non-COVID-19 preprints ( Fig 2B ), although this difference appeared to vary with server (2-way ANOVA, interaction term; F 1,83333 = 19.22, p < 0.001). Specifically, screening was marginally slower for COVID-19 preprints than for non-COVID-19 preprints deposited to medRxiv (mean difference = 0.16 days; Tukey honest significant difference [HSD] test, p < 0.001), but not to bioRxiv ( p = 0.981). The slower screening time for COVID-19 preprints was a result of more of these preprints being hosted on medRxiv, which had slightly longer screening times overall; bioRxiv screened preprints approximately 2 days quicker than medRxiv independent of COVID-19 status (both p < 0.001; S2B Fig , S1 Table ).
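A minimal sketch of this style of analysis, a 2-way ANOVA (server × preprint type, with interaction) followed by Tukey HSD post hoc comparisons, is shown below using statsmodels. The per-preprint screening times are synthetic placeholders, not the study's data.

```python
# 2-way ANOVA with interaction, then Tukey HSD, on synthetic screening times.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "server": rng.choice(["bioRxiv", "medRxiv"], size=n),
    "covid": rng.choice(["covid", "other"], size=n),
})
# Synthetic effect: medRxiv screens roughly 2 days slower, as reported above
base = np.where(df["server"] == "medRxiv", 4.0, 2.0)
df["screening_days"] = rng.gamma(shape=base, scale=1.0)

model = smf.ols("screening_days ~ C(server) * C(covid)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # F tests, including the interaction

groups = df["server"] + "_" + df["covid"]
print(pairwise_tukeyhsd(df["screening_days"], groups))
```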
Preprint servers offer authors the opportunity to post updated versions of a preprint, enabling them to incorporate feedback, correct mistakes, or add additional data and analysis. The majority of preprints existed as only a single version for both COVID-19 and non-COVID-19 works, with very few preprints existing in more than 2 versions ( Fig 2D ). This may somewhat reflect the relatively short time span of our analysis period. Although distributions were similar, COVID-19 preprints appeared to have a slightly greater number of versions (median, 1 [IQR 1] versus 1 [IQR 0]; Mann–Whitney test, p < 0.001). The choice of preprint server did not appear to impact the number of versions ( S2C Fig , S1 Table ).
bioRxiv and medRxiv allow authors to select from a number of different Creative Commons ( https://creativecommons.org/ ) license types when depositing their work: CC0 (No Rights Reserved), CC-BY (Attribution), CC BY-NC (Attribution, Noncommercial), CC-BY-ND (Attribution, No Derivatives), and CC-BY-NC-ND (Attribution, Noncommercial, No Derivatives). Authors may also select to post their work without a license (i.e., All Rights Reserved) that allows text and data mining. A previous analysis has found that bioRxiv authors tend to post preprints under the more restrictive license types [ 17 ], although there appears to be some confusion among authors as to the precise implications of each license type [ 18 ]. License choice was significantly associated with preprint category (chi-squared, χ2 = 336.0, df = 5, p < 0.001); authors of COVID-19 preprints were more likely to choose the more restrictive CC-BY-NC-ND or CC-BY-ND than those of non-COVID-19 preprints and less likely to choose CC-BY ( Fig 2C ). Again, the choice of preprint server did not appear to impact the type of license selected by the authors ( S2D Fig ).
Given the novelty of the COVID-19 research field and the rapid speed at which preprints are being posted, we hypothesised that researchers may be posting preprints in a less mature state, or based on a smaller literature base, than for non-COVID-19 preprints. To investigate this, we compared the word counts and reference counts of COVID-19 preprints and non-COVID-19 preprints from bioRxiv (at the time of data extraction, HTML full texts, from which word and reference counts were derived, were not available for medRxiv) ( Fig 2E ). We found that COVID-19 preprints are on average 32% shorter in length than non-COVID-19 preprints (median, 3,965 [IQR 2,433] versus 5,427 [IQR 2,790]; Mann–Whitney test, p < 0.001) ( S1 Table ). Although the length of preprints gradually increased over the analysis period, COVID-19 preprints remained shorter than non-COVID-19 preprints with a similar difference in word count, even when adjusted for factors such as authorship team size and bioRxiv subject categorisation ( S1 Model , S2 and S3 Tables). COVID-19 preprints also contain fewer references than non-COVID-19 preprints ( Fig 2F ), although not fewer than expected relative to overall preprint length, as little difference was detected in reference:word count ratios (median, 1:103 versus 1:101; p = 0.052). As word counts increased over time, the reference counts per preprint also steadily increased.
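The comparison above is a Mann–Whitney U test on per-preprint word counts. A self-contained sketch with scipy, using synthetic counts centred near the medians quoted above purely for illustration:

```python
# Mann-Whitney U test on synthetic word counts (illustrative values only).
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
covid_words = rng.normal(3965, 1200, size=500).round()  # placeholder data
other_words = rng.normal(5427, 1400, size=500).round()  # placeholder data

stat, p = mannwhitneyu(covid_words, other_words, alternative="two-sided")
print(f"U = {stat:.0f}, p = {p:.3g}")
print(f"medians: {np.median(covid_words):.0f} vs {np.median(other_words):.0f}")
```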
Scientists turned to preprints for the first time to share COVID-19 science
The number of authors per preprint may give an additional indication as to the amount of work, resources used, and the extent of collaboration in a manuscript. Although little difference was seen in number of authors between preprint servers ( S1 Table ), COVID-19 preprints had a marginally higher number of authors than non-COVID-19 preprints on average (median, 7 [IQR 8] versus 6 [IQR 5]; p < 0.001), due to the greater likelihood of large (11+) authorship team sizes ( Fig 3A ). However, single-author preprints were approximately 2.6 times more common for COVID-19 (6.1% of preprints) than non-COVID-19 preprints (2.3% of preprints) ( Fig 3A ).
Fig 3. Properties of authors of COVID-19 and non-COVID-19 preprints deposited on bioRxiv and medRxiv between January 1 and October 31, 2020.
(A) Proportion of preprints with N authors. (B) Proportion of preprints deposited by country of corresponding author (top 15 countries by total preprint volume are shown). (C) Proportions of COVID-19 and non-COVID-19 corresponding authors from each of the top 15 countries shown in (B) that had previously posted a preprint (darker bar) or were posting a preprint for the first time (lighter bar). (D) Correlation between date of the first preprint originating from a country (according to the affiliation of the corresponding author) and the date of the first confirmed case from the same country for COVID-19 preprints. (E) Change in bioRxiv/medRxiv preprint posting category for COVID-19 preprint authors compared to their previous preprint (COVID-19 or non-COVID-19), for category combinations with n > = 5 authors. For all panels containing country information, labels refer to ISO 3166 character codes. The data underlying this figure may be found in https://github.com/preprinting-a-pandemic/pandemic_preprints and https://zenodo.org/record/4587214#.YEN22Hmnx9A . COVID-19, Coronavirus Disease 2019.
The largest proportion of preprints in our dataset were from corresponding authors in the United States, followed by significant proportions from the United Kingdom and China ( Fig 3B ). It is notable that China is overrepresented in terms of COVID-19 preprints compared to its non-COVID-19 preprint output: 39% of preprints from Chinese corresponding authors were COVID-19 related, compared to 16.5% of the US output and 20.1% of the UK output. We also found a significant association for corresponding authors between preprint type (COVID-19 or non-COVID-19) and whether this was the author’s first bioRxiv or medRxiv preprint (chi-squared, χ 2 = 840.4, df = 1, p < 0.001). Among COVID-19 corresponding authors, 85% were posting a preprint for the first time, compared to 69% of non-COVID-19 corresponding authors in the same period. To further understand which authors have been drawn to begin using preprints since the pandemic began, we stratified these groups by country ( S4 Table ) and found significant associations for the US, UK, Germany, India (Bonferroni adjusted p < 0.001), France, Canada, Italy ( p < 0.01), and China ( p < 0.05). In all cases, a higher proportion were posting a preprint for the first time among COVID-19 corresponding authors than non-COVID-19 corresponding authors. Moreover, we found that most countries posted their first COVID-19 preprint close to the time of their first confirmed COVID-19 case ( Fig 3D ), with weak positive correlation considering calendar days of both events (Spearman rank; ρ = 0.54, p < 0.001). Countries posting a COVID-19 preprint in advance of their first confirmed case were mostly higher-income countries (e.g., US, UK, New Zealand, and Switzerland). COVID-19 preprints were deposited from over 100 countries, highlighting the global response to the pandemic.
There has been much discussion regarding the appropriateness of researchers switching to COVID-19 research from other fields [ 19 ]. To quantify whether this phenomenon was detectable within the preprint literature, we compared the bioRxiv or medRxiv category of each COVID-19 preprint to the most recent previous non-COVID-19 preprint (if any) from the same corresponding author. Most corresponding authors were not drastically changing fields, with category differences generally spanning reasonably related areas. For example, some authors that previously posted preprints in evolutionary biology have posted COVID-19 preprints in microbiology ( Fig 3E ). This suggests that—at least within the life sciences—principal investigators are utilising their labs’ skills and resources in an expected manner in their contributions to COVID-19 research.
COVID-19 preprints were published quicker than non-COVID-19 preprints
Critics have previously raised concerns that by forgoing the traditional peer-review process, preprint servers could be flooded by poor-quality research [ 20 , 21 ]. Nonetheless, earlier analyses have shown that a large proportion of preprints (approximately 70%) in the biomedical sciences are eventually published in peer-reviewed scientific journals [ 8 ]. We assessed differences in publication outcomes for COVID-19 versus non-COVID-19 preprints during our analysis period, which may be partially related to differences in preprint quality. Published status (published/unpublished) was significantly associated with preprint type (chi-squared, χ 2 = 186.2, df = 1, p < 0.001); within our time frame, 21.1% of COVID-19 preprints were published in total by the end of October, compared to 15.4% of non-COVID preprints. As expected, greater proportions published were seen among preprints posted earlier, with over 40% of COVID-19 preprints submitted in January published by the end of October and less than 10% for those published in August or later ( Fig 4A ). Published COVID-19 preprints were distributed across many journals, with clinical or multidisciplinary journals tending to publish the most COVID-19 preprints ( Fig 4B ). To determine how publishers were prioritising COVID-19 research, we compared the time from preprint posting to publication in a journal. The time interval from posting to subsequent publication was significantly reduced for COVID-19 preprints by a difference in medians of 48 days compared to non-COVID-19 preprints posted in the same time period (68 days [IQR 69] versus 116 days [IQR 90]; Mann–Whitney test, p < 0.001). This did not appear to be driven by any temporal changes in publishing practices, as the distribution of publication times for non-COVID-19 preprints was similar to our control time frame of January to December 2019 ( Fig 4C ). This acceleration additionally varied between publishers (2-way ANOVA, interaction term preprint type*publisher; F 9,5273 = 6.58, p < 0.001) and was greatest for the American Association for the Advancement of Science (AAAS) at an average difference of 102 days (Tukey HSD; p < 0.001) ( Fig 4D ).
Fig 4. Publication outcomes of COVID-19 and non-COVID-19 preprints deposited on bioRxiv and medRxiv between January 1 and October 31, 2020.
(A) Percentage of COVID-19 versus non-COVID-19 preprints published in peer-reviewed journals, by preprint posting month. (B) Destination journals for COVID-19 preprints that were published within our analysis period. Shown are the top 10 journals by publication volume. (C) Distribution of the number of days between posting a preprint and subsequent journal publication for COVID-19 preprints (red), non-COVID-19 preprints posted during the same period (January to October 2020) (green), and non-COVID-19 preprints posted between January and December 2019 (grey). (D) Time from posting on bioRxiv or medRxiv to publication categorised by publisher. Shown are the top 10 publishers by publication volume. Boxplot horizontal lines denote lower quartile, median, upper quartile, with whiskers extending to 1.5*IQR. All boxplots additionally show raw data values for individual preprints with added horizontal jitter for visibility. The data underlying this figure may be found in https://github.com/preprinting-a-pandemic/pandemic_preprints and https://zenodo.org/record/4587214#.YEN22Hmnx9A . COVID-19, Coronavirus Disease 2019.
Extensive access of preprint servers for COVID-19 research
At the start of our time window, COVID-19 preprints received abstract views at a rate over 18 times that of non-COVID-19 preprints ( Fig 5A ) (time-adjusted negative binomial regression; rate ratio = 18.2, z = 125.0, p < 0.001) and downloads at a rate of almost 30 times ( Fig 5B ) (rate ratio = 27.1, z = 124.2, p < 0.001). Preprints posted later displayed lower usage rates, in part due to the reduced length of time they were online and able to accrue views and downloads. However, decreases in both views and downloads by posting date was stronger for COVID-19 preprints versus non-COVID-19 preprints (preprint type*calendar day interaction terms, both p < 0.001); each additional calendar month in posting date resulted in an estimated 24.3%/7.4% reduction in rate of views and an estimated 28.5%/12.0% reduction in rate of downloads for COVID-19/non-COVID-19 preprints, respectively. Similar trends of decrease were observed when restricting view and download data to the first respective month of each preprint, with highest rates of usage for those posted in January ( S3A and S3B Fig ). The disparity between COVID-19 and non-COVID-19 preprints suggests that either COVID-19 preprints continued to slowly accumulate total usage well beyond their first month online ( Fig 5 ) and/or they received a more diluted share of relative initial interest as larger volumes of preprints (and publications) were available by later months ( Fig 1B ).
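The "time-adjusted negative binomial regression" reported here can be sketched as a negative binomial model of view counts with preprint type, posting date, and their interaction as covariates; exponentiated coefficients are then rate ratios. The snippet below is a minimal illustration on synthetic, overdispersed data, not a reproduction of the study's model.

```python
# Negative binomial regression of views on preprint type and posting date.
# Data are synthetic; exponentiated coefficients are interpretable as
# rate ratios, as in the results quoted above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({
    "covid": rng.integers(0, 2, size=n),           # 1 = COVID-19 preprint
    "calendar_day": rng.integers(0, 300, size=n),  # posting day in 2020
})
mu = np.exp(4 + 2.0 * df["covid"] - 0.004 * df["calendar_day"])
df["views"] = rng.poisson(mu * rng.gamma(2.0, 0.5, size=n))  # overdispersed

model = smf.negativebinomial("views ~ covid * calendar_day", data=df).fit()
print("COVID-19 rate ratio (at day 0):", np.exp(model.params["covid"]))
```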
Fig 5. Access statistics for COVID-19 and non-COVID-19 preprints posted on bioRxiv and medRxiv.
(A) Boxplots of abstract views, binned by preprint posting month. (B) Boxplots of PDF downloads, binned by preprint posting month. Boxplot horizontal lines denote lower quartile, median, upper quartile, with whiskers extending to 1.5*IQR. All boxplots additionally show raw data values for individual preprints with added horizontal jitter for visibility. The data underlying this figure may be found in https://github.com/preprinting-a-pandemic/pandemic_preprints and https://zenodo.org/record/4587214#.YEN22Hmnx9A . COVID-19, Coronavirus Disease 2019.
To confirm that usage of COVID-19 and non-COVID-19 preprints was not an artefact of differing preprint server reliance during the pandemic, we compared usage rates during the pandemic period with those from the previous year (January to December 2019), as a non-pandemic control period. Beyond the expected effect of fewer views/downloads of preprints that have been uploaded for a shorter time, the usage data did not differ from that prior to the pandemic ( S3C and S3D Fig ).
Secondly, we investigated usage across additional preprint servers (data kindly provided by each of the server operators). We found that COVID-19 preprints were consistently downloaded more than non-COVID-19 preprints during our time frame, regardless of which preprint server hosted the manuscript ( S3E Fig ), although the gap in downloads varied between servers (2-way ANOVA, interaction term; F 3,89990 = 126.6, p < 0.001). Server usage differences were more pronounced for COVID-19 preprints; multiple post hoc comparisons confirmed that bioRxiv and medRxiv received significantly higher usage per COVID-19 preprint than all other servers for which data were available (Tukey HSD; all p values < 0.001). However, for non-COVID-19 preprints, the only observed pairwise differences between servers indicated greater bioRxiv and medRxiv usage than Research Square (Tukey HSD; p < 0.001). This suggests that specific attention has been given disproportionately to bioRxiv and medRxiv as repositories for COVID-19 research.
COVID-19 preprints were shared and cited more widely than non-COVID-19 preprints
We quantified the citation and online sharing behaviour of COVID-19 preprints using citation count data from Dimensions ( https://dimensions.ai ) and counts of various altmetric indicators using data from Altmetric ( https://altmetric.com ) ( Fig 6 ; further details on data sources in Methods section). In terms of citations, we found higher proportions overall of COVID-19 preprints that received at least a single citation (57.9%) than non-COVID-19 preprints (21.5%) during our study period of January 1 to October 31, 2020, although the citation coverage expectedly decreased for both groups for newer posted preprints ( Fig 6A ). COVID-19 preprints also have greater total citation counts than non-COVID-19 preprints (time-adjusted negative binomial regression; rate ratio = 13.7, z = 116.3, p < 0.001). The highest cited COVID-19 preprint had 652 citations, with the 10th most cited COVID-19 preprint receiving 277 citations ( Table 1 ); many of the highest cited preprints focussed on the viral cell receptor, angiotensin converting enzyme 2 (ACE2), or the epidemiology of COVID-19.
Fig 6. Usage of COVID-19 and non-COVID-19 preprints posted on bioRxiv and medRxiv between January 1 and October 31, 2020.
Panels (A)–(F) show the proportion of preprints receiving at least 1 citation or mention in a given source, with the exception of panel (B) which shows the proportion of preprints receiving at least 2 tweets (to account for the fact that each preprint is tweeted once automatically by the official bioRxiv/medRxiv Twitter accounts). The inset in each panel shows a boxplot comparing citations/mentions for all COVID-19 and non-COVID-19 preprints posted within our analysis period. Boxplot horizontal lines denote lower quartile, median, upper quartile, with whiskers extending to 1.5*IQR. All boxplots additionally show raw data values for individual preprints with added horizontal jitter for visibility. Data are plotted on a log-scale with +1 added to each count for visualisation. (G) Proportion of preprints included in reference lists of policy documents from 3 sources: the ECDC, UK POST, and WHO SB. (H) Spearman correlation matrix between indicators shown in panels (A)–(F), as well as abstract views and PDF downloads for COVID-19 preprints. (I) Spearman correlation matrix between indicators shown in panels (A)–(F), in addition to abstract views and PDF downloads for non-COVID-19 preprints. The data underlying this figure may be found in https://github.com/preprinting-a-pandemic/pandemic_preprints and https://zenodo.org/record/4587214#.YEN22Hmnx9A . COVID-19, Coronavirus Disease 2019; ECDC, European Centre for Disease Prevention and Control; UK POST, United Kingdom Parliamentary Office of Science and Technology; WHO SB, World Health Organization Scientific Briefs.
Table 1. Top 10 cited COVID-19 preprints.
COVID-19, Coronavirus Disease 2019.
Sharing of preprints on Twitter may provide an indicator of the exposure of wider public audiences to preprints. COVID-19 preprints received greater Twitter coverage (98.9% received >1 tweet) than non-COVID-19 preprints (90.7%) (note that the threshold for Twitter coverage was set at 1 rather than 0, to account for automated tweets by the official bioRxiv and medRxiv Twitter accounts) and were tweeted at an overall greater rate than non-COVID-19 preprints (rate ratio = 7.6, z = 135.7, p < 0.001) ( Fig 6B ). The most tweeted non-COVID-19 preprint received 1,656 tweets, whereas 8 of the top 10 tweeted COVID-19 preprints were tweeted over 10,500 times each ( Table 2 ). Many of the top 10 tweeted COVID-19 preprints were related to transmission, reinfection, or seroprevalence. The most tweeted COVID-19 preprint (26,763 tweets) was a study investigating antibody seroprevalence in California [ 22 ]. The fourth most tweeted COVID-19 preprint was a widely criticised (and later withdrawn) study linking the SARS-CoV-2 spike protein to HIV-1 glycoproteins [ 23 ].
Table 2. Top 10 tweeted COVID-19 preprints.
To better understand the discussion topics associated with highly tweeted preprints, we analysed the hashtags used in original tweets (i.e., excluding retweets) mentioning the top 100 most tweeted COVID-19 preprints ( S4A Fig ). In total, we collected 30,213 original tweets containing 11,789 hashtags; we filtered these hashtags for those occurring more than 5 times and removed a selection of generic or overused hashtags directly referring to the virus (e.g., “#coronavirus” and “#covid-19”), leaving a final set of 2,981 unique hashtags. While many of the top-used hashtags were direct, neutral references to the disease outbreak such as “#coronavirusoutbreak” and “#wuhan,” we also found a large proportion of politicised tweets using hashtags associated with conspiratorial ideologies (e.g., “#qanon” and “#wwg1wga,” an abbreviation of “Where We Go One, We Go All,” a tag commonly used by QAnon supporters), xenophobia (e.g., “#chinazi”), or US-specific right-wing populism (e.g., “#maga”). Other hashtags referred to topics directly associated with controversial preprints, e.g., “#hydroxychloroquine” and “#hiv,” both of which were major controversial topics associated with several of the top 10 most tweeted preprints.
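A minimal sketch of the hashtag extraction and filtering described here, assuming a hypothetical list of tweet texts; generic virus-related tags are dropped and only tags occurring more than 5 times are kept:

```python
# Extract hashtags from tweet texts, drop generic tags, keep frequent ones.
import re
from collections import Counter

tweets = [
    "New seroprevalence preprint #covid-19 #coronavirusoutbreak",
    "Another take on this drug #hydroxychloroquine #covid-19",
]  # hypothetical placeholder data

generic = {"#coronavirus", "#covid-19", "#covid19", "#sarscov2"}

hashtags = Counter(
    tag.lower()
    for text in tweets
    for tag in re.findall(r"#[\w-]+", text)
    if tag.lower() not in generic
)

frequent = {tag: n for tag, n in hashtags.items() if n > 5}
print(frequent)
```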
As well as featuring heavily on social media, COVID-19 research has also pervaded print and online news media. In terms of coverage, 28.7% of COVID-19 preprints were featured in at least a single news article, compared to 1.0% of non-COVID-19 preprints ( Fig 6C ), and were used overall in news articles at a rate almost 100 times that of non-COVID-19 preprints (rate ratio = 92.8, z = 83.3, p < 0.001). The top non-COVID-19 preprint was reported in 113 news articles, whereas the top COVID-19 preprints were reported in over 400 news articles ( Table 3 ). Similarly, COVID-19 preprints were also used more in blogs (coverage COVID-19/non-COVID-19 preprints = 14.3%/9.1%, rate ratio = 3.73, z = 37.3, p < 0.001) and Wikipedia articles (coverage COVID-19/non-COVID-19 preprints = 0.7%/0.2%, rate ratio = 4.47, z = 7.893, p < 0.001) at significantly greater rates than non-COVID-19 preprints ( Fig 6D and 6E , Table 4 ). We noted that several of the most widely disseminated preprints that we classified as being non-COVID-19 related featured topics nonetheless relevant to generalised infectious disease research, such as human respiratory physiology and personal protective equipment.
Table 3. Top 10 COVID-19 preprints covered by news organisations.
Table 4. Top 10 blogged COVID-19 preprints.
A potential benefit of preprints is that they allow authors to receive and incorporate feedback from the wider community prior to journal publication. To investigate feedback and engagement with preprints, we quantified the number of comments received by preprints directly via the commenting system on the bioRxiv and medRxiv platforms. We found that non-COVID-19 preprints were commented upon less frequently than COVID-19 preprints (coverage COVID-19/non-COVID-19 preprints = 15.9%/3.1%, time-adjusted negative binomial regression; rate ratio = 11.0, z = 46.5, p < 0.001) ( Fig 6F ); the most commented non-COVID-19 preprint received only 68 comments, whereas the most commented COVID-19 preprint had over 580 comments ( Table 5 ). One preprint, which had 129 comments, was retracted within 3 days of being posted following intense public scrutiny ( Table 4 , doi: 10.1101/2020.01.30.927871 ). As the pandemic progressed, fewer preprints were commented upon. Collectively, these data suggest that the most discussed or controversial COVID-19 preprints are rapidly and publicly scrutinised, with commenting systems being used for direct feedback and discussion of preprints.
Table 5. Top 10 commented COVID-19 preprints.
Within a set of 81 COVID-19 policy documents (which were manually retrieved from the European Centre for Disease Prevention and Control (ECDC), United Kingdom Parliamentary Office of Science and Technology (UK POST), and World Health Organization Scientific Briefs (WHO SB)), 52 documents cited preprints ( Fig 6G ). However, these citations occurred at a relatively low frequency, typically constituting less than 20% of the total citations in these 52 documents. Among 255 instances of citation to a preprint, medRxiv was the dominant server cited ( n = 209, 82%), with bioRxiv receiving a small number of citations ( n = 21) and 5 other servers receiving ≤10 citations each (arXiv, OSF, preprints.org , Research Square, and SSRN). In comparison, only 16 instances of citations to preprints were observed among 38 manually collected non-COVID-19 policy documents from the same sources.
To understand how different usage and sharing indicators may represent the behaviour of different user groups, we calculated the Spearman correlation between the indicators presented above (citations, tweets, news articles, blog mentions, Wikipedia citations, and comment counts), as well as with abstract views and download counts as previously presented ( Fig 6H and 6I ). Overall, we found stronger correlations between all indicators for COVID-19 preprints than for non-COVID-19 preprints. For COVID-19 preprints, we found an expectedly strong correlation between abstract views and PDF downloads (Spearman ρ = 0.91, p < 0.001), but only weak to moderate correlations between the numbers of citations and Twitter shares (Spearman ρ = 0.48, p < 0.001) and between the numbers of citations and news articles (Spearman ρ = 0.33, p < 0.001), suggesting that the preprints cited extensively within the scientific literature were not necessarily those most shared by the wider public on online platforms. There was a slightly stronger correlation between the COVID-19 preprints that were most blogged and those receiving the most attention in the news (Spearman ρ = 0.54, p < 0.001), and a moderate correlation between the COVID-19 preprints that were most tweeted and those receiving the most attention in the news (Spearman ρ = 0.51, p < 0.001), suggesting similarity between preprints shared on social media and in news media. Finally, there was a weak correlation between the number of tweets and the number of comments received by COVID-19 preprints (Spearman ρ = 0.36, p < 0.001). Taking the top 10 COVID-19 preprints by each indicator, there was substantial overlap between all indicators except citations ( S4B Fig ).
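The correlation matrices in Fig 6H and 6I reduce to a single call in R; the sketch below shows the general form, assuming a data frame `covid` with one row per COVID-19 preprint and one numeric column per indicator (the column names are placeholders, not the repository's actual variable names).

```r
# Pairwise Spearman correlations across all usage and sharing indicators
inds <- covid[, c("citations", "tweets", "news", "blogs",
                  "wikipedia", "comments", "views", "downloads")]
rho <- cor(inds, method = "spearman", use = "pairwise.complete.obs")
round(rho, 2)
```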
In summary, our data reveal that COVID-19 preprints received a significant amount of attention from scientists, news organisations, the general public, and policy-making bodies, representing a departure from how preprints are normally shared (as judged by the observed patterns for non-COVID-19 preprints).
The usage of preprint servers within the biological sciences has been rising since the inception of bioRxiv and other platforms [ 10 , 25 ]. The urgent threat of a global pandemic has catapulted the use of preprint servers as a means of quickly disseminating scientific findings into the public sphere, supported by funding bodies encouraging preprinting for COVID-19 research [ 26 , 27 ]. Our results show that preprints have been widely adopted for the dissemination and communication of COVID-19 research, and in turn, the pandemic has greatly impacted the preprint and science publishing landscape [ 28 ].
Changing attitudes towards, and growing acceptance of, preprint servers within the life sciences may be one reason why COVID-19 research is being shared more readily as preprints than research during previous epidemics. In addition, the need to rapidly communicate findings prior to a lengthy review process might be responsible for this observation ( Fig 3 ). A recent study involving qualitative interviews of multiple research stakeholders found “early and rapid dissemination” to be among the most often cited benefits of preprints [ 16 ]. These findings were echoed in a survey of approximately 4,200 bioRxiv users [ 10 ] and are underscored by the 6-month median lag between the posting of a preprint and its subsequent journal publication [ 8 , 16 ]. Such timelines for disseminating findings are clearly incompatible with the lightning-quick progression of a pandemic. An analysis of publication timelines for 14 medical journals has shown that some publishers have taken steps to accelerate their publishing processes for COVID-19 research, reducing the time for the peer-review stage (submission to acceptance) on average by 45 days and the editing stage (acceptance to publication) by 14 days [ 29 ], yet this still falls some way short of the approximately 1 to 3 days of screening time for bioRxiv and medRxiv preprints ( Fig 2B ). This advantage may influence the dynamics of preprint uptake: As researchers in a given field begin to preprint, their colleagues may feel pressure to also preprint in order to avoid being scooped. Further studies on the motivations behind posting preprints, for example, through quantitative and qualitative author surveys, may help funders and other stakeholders that support the usage of preprints to address some of the social barriers to their uptake [ 30 ].
One of the primary concerns among authors around posting preprints is premature media coverage [ 16 , 31 ]. Many preprint servers created highly visible collections of COVID-19 work, potentially amplifying its visibility. From mid-March 2020, bioRxiv and medRxiv included a banner to explain that preprints should not be regarded as conclusive or reported in the news media as established information [ 32 ]. Despite this warning message, COVID-19 preprints have received unprecedented coverage on online media platforms ( Fig 6 ). Indeed, even before this warning message was posted, preprints were receiving significant amounts of attention. Twitter has been a particularly notable outlet for the communication of preprints, a finding echoed by a recent study on the spread of the wider COVID-19 research field (i.e., not limited to preprints) on Twitter, which found that COVID-19 research was being widely disseminated, driven largely by academic Twitter users [ 33 , 34 ]. Nonetheless, the relatively weak correlation found between citations and other indicators of online sharing ( Fig 6H ) suggests that the interests of scientists and the broader public differ significantly: Of the articles in the top 10 most shared on Twitter, in news articles, or on blogs, only one is ranked among the top 10 most cited articles ( S4B Fig ). Hashtags associated with individual, highly tweeted preprints reveal some emergent themes that suggest communication of certain preprints can also extend well beyond scientific audiences ( S4A Fig ) [ 34 ]. These range from good public health practice (“#washyourhands”) to right-wing philosophies (“#chinalies”), conspiracy theories (“#fakenews” and “#endthelockdown”), and xenophobia (“#chinazi”). Many of the negative hashtags have been perpetuated by public figures such as the former President of the United States and right-wing media outlets [ 35 , 36 ]. Following President Trump’s diagnosis of COVID-19, one investigation found a wave of anti-Asian sentiment and conspiracy theories across Twitter [ 37 ]. This type of misinformation is common to new diseases, and social media platforms have recently released a statement outlining their plans to combat this issue [ 38 ]. An even greater adoption of open science principles has recently been suggested as one method to counter the misuse of preprints and peer-reviewed articles [ 24 ]; this remains an increasingly important discourse.
The fact that news outlets are reporting extensively on COVID-19 preprints ( Fig 6C and 6D ) represents a marked change in journalistic practice: Pre-pandemic, bioRxiv preprints received very little news coverage in comparison to journal articles [ 25 ]. This cultural shift provides an unprecedented opportunity to bridge the scientific and media communities and build a consensus on the reporting of preprints [ 21 , 39 ]. Another marked change was observed in the use of preprints in policy documents ( Fig 6G ). Preprints were remarkably underrepresented in non-COVID-19 policy documents, yet present, albeit at relatively low levels, in COVID-19 policy documents. In a larger dataset, two of the top 10 “journals” cited in policy documents were found to be preprint servers (medRxiv and SSRN, in fifth and eighth position, respectively) [ 40 ]. This suggests that preprints are being used to directly inform policymakers and decision-making. We only investigated a limited set of policy documents, largely restricted to Europe; whether this pattern extends more globally remains to be explored [ 41 ]. In the near future, we aim to examine the use of preprints in policy in more detail to address these questions.
As most COVID-19 preprints have not yet been published, concerns regarding their quality will persist [ 20 ]. This is partially addressed by prominent scientists using social media platforms such as Twitter to publicly share concerns about poor-quality COVID-19 preprints or to amplify high-quality preprints [ 42 ]. The use of Twitter to “peer review” preprints provides additional public scrutiny of manuscripts that can complement the more opaque and slower traditional peer-review process. In addition to Twitter, the comments section of preprint servers can be used as a public forum for discussion and review. However, an analysis of all bioRxiv comments up to September 2019 found a very limited number of peer-review-style comments [ 43 ]. Despite increased publicity for established preprint review services (such as PREreview [ 44 , 45 ]), there has been limited use of these platforms [ 46 ]. However, independent preprint review projects have arisen whereby reviews are posted in the comments section of preprint servers or hosted on independent websites [ 47 , 48 ]. These more formal projects partly account for the increased commenting on the most high-profile COVID-19 preprints ( Fig 4 ). Although these new review platforms partially combat poor-quality preprints, there is a clear need to better understand the general quality and trustworthiness of preprints compared to peer-reviewed articles. Recent studies have suggested that the quality of reporting in preprints differs little from that of their later peer-reviewed versions [ 49 ], and we ourselves are currently undertaking a more detailed analysis. However, the problem of poor-quality science is not unique to preprints, and ultimately a multipronged approach is required. First, scientists must engage more responsibly with journalists and the public, in addition to upholding high standards when sharing research; more significant consequences for academic misconduct and the swift removal of problematic articles will be essential in aiding this. Second, the politicisation of public health research has become a polarising issue, and more must be done to combat it: scientific advice should be objective and supported by robust evidence, and media outlets and politicians should not use falsehoods or poor-quality science to further a personal agenda. Third, transparency within the scientific process is essential for improving the understanding of its internal dynamics and providing accountability.
Our data demonstrate the indispensable role that preprints, and preprint servers, are playing during a global pandemic. By communicating science through preprints, researchers are sharing their work at a faster rate and with greater transparency than the current journal infrastructure allows. Furthermore, we provide evidence to inform important future discussions around scientific publishing and the use of preprint servers.
Preprint metadata for bioRxiv and medRxiv
We retrieved basic preprint metadata (DOIs, titles, abstracts, author names, corresponding author name and institution, dates, versions, licenses, categories, and published article links) for bioRxiv and medRxiv preprints via the bioRxiv Application Programming Interface (API; https://api.biorxiv.org ). The API accepts a “server” parameter to enable retrieval of records for both bioRxiv and medRxiv. We initially collected metadata for all preprints posted from the time of each server’s launch (November 2013 for bioRxiv and June 2019 for medRxiv) until the end of our analysis period on October 31, 2020 ( N = 114,214). Preprint metadata, and metadata related to their linked published articles, were collected in the first week of December 2020. Note that where multiple preprint versions existed, we included only the earliest version and recorded the total number of subsequent revisions. Preprints were classified as “COVID-19 preprints” or “non-COVID-19 preprints” on the basis of the following terms contained within their titles or abstracts (case insensitive): “coronavirus,” “covid-19,” “sars-cov,” “ncov-2019,” “2019-ncov,” “hcov-19,” “sars-2.” For comparison of preprint behaviour between the COVID-19 outbreak and previous viral epidemics, namely Western Africa Ebola virus and Zika virus ( S1 Fig ), the same procedure was applied using the keywords “ebola” or “zebov” and “zika” or “zikv,” respectively.
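The retrieval and classification steps can be illustrated with the short R sketch below. This is a simplified sketch rather than the pipeline code in the linked repository; the endpoint format and the 100-records-per-page paging behaviour are assumptions based on the public documentation at https://api.biorxiv.org.

```r
library(httr)
library(jsonlite)

# Page through the bioRxiv/medRxiv "details" endpoint, 100 records at a time
fetch_preprints <- function(server = "biorxiv", from = "2020-01-01",
                            to = "2020-10-31") {
  cursor <- 0
  pages <- list()
  repeat {
    url <- sprintf("https://api.biorxiv.org/details/%s/%s/%s/%d",
                   server, from, to, cursor)
    res <- fromJSON(content(GET(url), as = "text", encoding = "UTF-8"))
    if (is.null(res$collection) || nrow(res$collection) == 0) break
    pages[[length(pages) + 1]] <- res$collection
    if (nrow(res$collection) < 100) break  # final (partial) page reached
    cursor <- cursor + 100
  }
  do.call(rbind, pages)
}

# Case-insensitive keyword classification of titles and abstracts
covid_terms <- "coronavirus|covid-19|sars-cov|ncov-2019|2019-ncov|hcov-19|sars-2"
preprints <- fetch_preprints("biorxiv")
preprints$is_covid <- grepl(covid_terms,
                            paste(preprints$title, preprints$abstract),
                            ignore.case = TRUE)
```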
For a subset of preprints posted between September 1, 2019 and April 30, 2020 ( N = 25,883), we enhanced the basic preprint metadata with data from a number of other sources, as outlined below. Note that this time period was chosen to encapsulate a 10-month analysis period from January 1 to October 31, 2020, in which we make comparative analyses between COVID-19– and non-COVID-19–related preprints ( N = 44,503), as well as the preceding year from January 1 to December 31, 2019 ( N = 30,094), used as a pre-COVID-19 control group. Of the preprints contained in the 10-month analysis period, 10,232 (23.0%) contained COVID-19–related keywords in their titles or abstracts.
For all preprints contained in the subset, disambiguated author affiliation and country data for corresponding authors were retrieved by querying raw affiliation strings against the Research Organisation Registry (ROR) API ( https://github.com/ror-community/ror-api ). The API provides a service for matching affiliation strings against institutions contained in the registry, on the basis of multiple matching types (named “phrase,” “common terms,” “fuzzy,” “heuristics,” and “acronyms”). The service returns a list of potential matched institutions and their country, as well as the matching type used, a confidence score with values between 0 and 1, and a binary “chosen” indicator relating to the most confidently matched institution. A small number (approximately 500) of raw affiliation strings returned from the bioRxiv API were truncated at 160 characters; for these records, we conducted web scraping using the rvest package for R [ 50 ] to retrieve the full affiliation strings of corresponding authors from the bioRxiv public web pages, prior to matching. For the purposes of our study, we aimed for higher precision than recall, and thus only included matched institutions where the API returned a confidence score of 1. A manual check of a sample of returned results also suggested higher precision for results returned using the “phrase” matching type, and thus we only retained results using this matching type. In a final step, we applied manual corrections to the country information for a small subset of records where false positives would be most likely to influence our results by (a) iteratively examining the chronologically first preprint associated with each country following affiliation matching and applying manual rules to correct mismatched institutions until no further errors were detected ( n = 8 institutions); and (b) examining the top 50 most common raw affiliation strings and applying manual rules to correct any mismatched or unmatched institutions ( n = 2 institutions). In total, we matched 54,289 preprints to a country (72.8%); for COVID-19 preprints alone, 6,692 preprints (65.4%) were matched to a country. Note that a similar, albeit more sophisticated method of matching bioRxiv affiliation information with the ROR API service was recently documented by Abdill and colleagues [ 51 ].
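A minimal sketch of this affiliation-matching step is shown below. The response field names (`items`, `chosen`, `score`, `matching_type`, and the nested country fields) follow the public ROR API documentation and should be checked against the live service; this is an illustration, not the production pipeline.

```r
library(httr)
library(jsonlite)

# Match a raw affiliation string against the ROR affiliation endpoint,
# retaining only "chosen" matches with confidence 1 and the "phrase"
# matching type (the precision-over-recall rule described above)
match_country <- function(raw_affiliation) {
  res <- GET("https://api.ror.org/organizations",
             query = list(affiliation = raw_affiliation))
  items <- fromJSON(content(res, as = "text", encoding = "UTF-8"))$items
  if (is.null(items) || nrow(items) == 0) return(NA_character_)
  idx <- which(items$chosen & items$score == 1 &
                 tolower(items$matching_type) == "phrase")
  if (length(idx) == 0) return(NA_character_)
  items$organization$country$country_name[idx[1]]
}

match_country("Department of Biology, University of Cambridge, Cambridge, UK")
```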
Word counts and reference counts for each preprint were also added to the basic preprint metadata via scraping of the bioRxiv public web pages (medRxiv currently does not display full HTML texts, and so calculating word and reference counts was limited to bioRxiv preprints). Web scraping was conducted using the rvest package for R [ 50 ]. Word counts refer to words contained only in the main body text, after removing the abstract, figure captions, table captions, acknowledgements, and references. In a small number of cases, word counts could not be retrieved because no full text existed; this occurs as we targeted only the first version of a preprint, but in cases where a second version was uploaded very shortly (i.e., within a few days) after the first version, the full-text article was generated only for the second version. Word and reference counts were retrieved for 61,397 of 61,866 bioRxiv preprints (99.2%); for COVID-19 preprints alone, word and reference counts were retrieved for 2,314 of 2,333 preprints (99.2%). Word counts ranged from 408 to 49,064 words, while reference counts ranged from 1 to 566 references.
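As an illustration of the scraping approach (not the exact code used), the sketch below fetches a bioRxiv full-text page with rvest and counts body words. The URL pattern and CSS selector are assumptions that would need verifying against the live page markup.

```r
library(rvest)
library(stringr)

# Count words in the main body of a bioRxiv full-text page
count_body_words <- function(doi) {
  url   <- paste0("https://www.biorxiv.org/content/", doi, "v1.full")
  page  <- read_html(url)
  paras <- html_elements(page, "div.fulltext-view p")  # hypothetical selector
  text  <- paste(html_text2(paras), collapse = " ")
  str_count(text, boundary("word"))
}
```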
Our basic preprint metadata retrieved from the bioRxiv API also contained DOI links to published versions (i.e., a peer-reviewed journal article) of preprints, where available. In total, 22,151 records in our preprint subset (29.7%) contained links to published articles, although only 2,164 COVID-19 preprints contained such links (21.1%). It should be noted that COVID-19 articles are heavily weighted towards the most recent months of the dataset and have thus had less time to progress through the journal publication process. Links to published articles are likely an underestimate of the total proportion of articles that have been subsequently published in journals—both as a result of the delay between an article being published in a journal and being detected by bioRxiv, and because bioRxiv misses some links to published articles when, e.g., titles change significantly between the preprint and published version [ 25 ]. Published article metadata (titles, abstracts, publication dates, journal, and publisher name) were retrieved by querying each DOI against the Crossref API ( https://api.crossref.org ), using the rcrossref package for R [ 52 ]. With respect to publication dates, we used the Crossref “created” field, which represents the date on which metadata were first deposited and has been suggested as a good proxy for the first online availability of an article [ 53 , 54 ]. When calculating the delay from preprint posting to publication, erroneous negative values (i.e., preprints posted after their published versions) were ignored. We also retrieved data regarding the open access status of each article by querying each DOI against the Unpaywall API ( https://unpaywall.org/products/api ) via the roadoi package for R [ 55 ].
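Both lookups use documented package functions, as the sketch below shows (`cr_works()` and `oadoi_fetch()` are the real rcrossref and roadoi entry points; the DOI and email address are placeholders).

```r
library(rcrossref)
library(roadoi)

doi <- "10.1056/NEJMoa2001017"  # placeholder published-article DOI

# Crossref metadata; the "created" field proxies first online availability
pub <- cr_works(dois = doi)$data
pub$created

# Open access status via Unpaywall (the API requires a contact email)
oa <- oadoi_fetch(dois = doi, email = "name@example.com")
oa$is_oa
```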
Usage, altmetrics, and citation data
For investigating the rates at which preprints are used, shared, and cited, we collected detailed usage, altmetrics, and citation data for all bioRxiv and medRxiv preprints posted between January 1, 2019 and October 31, 2020 (i.e., for every preprint where we collected detailed metadata, as described in the previous section). All usage, altmetrics, and citation data were collected in the first week of December 2020.
Usage data (abstract views and PDF downloads) were scraped from the bioRxiv and medRxiv public web pages using the rvest package for R [ 50 ]. bioRxiv and medRxiv web pages display abstract views and PDF downloads on a calendar-month basis; for subsequent analysis (e.g., Fig 4 ), these were summed to generate total abstract views and downloads since the time of preprint posting. In total, usage data were recorded for 74,461 preprints (99.8%)—a small number were not recorded, possibly due to server issues during the web scraping process. Note that bioRxiv web pages also display counts of full-text views, although we did not include these data in our final analysis. This was partially to ensure consistency with medRxiv, which currently does not display full HTML texts, and partially due to ambiguities in the timeline of full-text publishing—the full text of a preprint is added several days after the preprint first becomes available, and the exact delay appears to vary from preprint to preprint. We also compared rates of PDF downloads for bioRxiv and medRxiv preprints with those of other preprint servers (SSRN and Research Square) ( S3C Fig ); these data were provided directly by representatives of each of the respective preprint servers.
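The summation step can be sketched as follows. The URL argument and the assumption that monthly counts are exposed as a simple HTML table are placeholders; the live pages may require a different selector or additional numeric cleaning.

```r
library(rvest)

# Sum monthly abstract views and PDF downloads from a preprint metrics page
total_usage <- function(metrics_url) {
  page  <- read_html(metrics_url)
  usage <- html_table(html_element(page, "table"))  # month | abstract | pdf
  colSums(usage[, -1])  # drop the month column, sum the remaining counts
}
```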
Counts of multiple altmetric indicators (mentions in tweets, blogs, and news articles) were retrieved via Altmetric ( https://www.altmetric.com ), a service that monitors and aggregates mentions of scientific articles on various online platforms. Altmetric provides a free API ( https://api.altmetric.com ) against which we queried each preprint DOI in our analysis set. Importantly, Altmetric only contains records where an article has been mentioned in at least one of the sources tracked; thus, if our query returned an invalid response, we recorded counts for all indicators as 0. Coverage of each indicator (i.e., the proportion of preprints receiving at least a single mention in a particular source) was 99.3%, 10.3%, 7.4%, and 0.33% for mentions in tweets, blogs, news, and Wikipedia articles, respectively. The high coverage on Twitter is likely driven, at least in part, by automated tweeting of preprints by the official bioRxiv and medRxiv Twitter accounts. For COVID-19 preprints, coverage was 99.99%, 14.3%, 28.7%, and 0.76% for mentions in tweets, blogs, news, and Wikipedia articles, respectively.
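A sketch of this query logic is given below. The v1 endpoint and the `cited_by_*_count` field names follow Altmetric's public documentation; a non-200 response is treated as "no record", i.e., zero counts for all indicators, mirroring the rule described above.

```r
library(httr)
library(jsonlite)

`%||%` <- function(a, b) if (is.null(a)) b else a  # default for missing fields

altmetric_counts <- function(doi) {
  res <- GET(paste0("https://api.altmetric.com/v1/doi/", doi))
  if (status_code(res) != 200) {
    return(c(tweets = 0, blogs = 0, news = 0, wikipedia = 0))
  }
  d <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
  c(tweets    = d$cited_by_tweeters_count  %||% 0,
    blogs     = d$cited_by_feeds_count     %||% 0,
    news      = d$cited_by_msm_count       %||% 0,
    wikipedia = d$cited_by_wikipedia_count %||% 0)
}
```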
To quantitatively capture how high-usage preprints were being received by Twitter users, we retrieved all tweets linking to the top 10 most-tweeted preprints. Tweet IDs were retrieved via the Altmetric API service and then queried against the Twitter API using the rtweet package [ 56 ] for R, to retrieve full tweet content.
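The hydration step reduces to a single rtweet call, sketched below. A configured Twitter API token is assumed; `lookup_statuses()` was the rtweet 0.x name and was later renamed `lookup_tweets()`.

```r
library(rtweet)

# Tweet IDs as returned by the Altmetric API (placeholder value shown)
tweet_ids <- c("1223598414493077505")

tweets <- lookup_statuses(tweet_ids)  # hydrate IDs into full tweet objects
tweets$text                           # text used for the hashtag analysis
```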
Citation counts for each preprint were retrieved from the scholarly indexing database Dimensions ( https://dimensions.ai ). An advantage of using Dimensions over more traditional citation databases (e.g., Scopus, Web of Science) is that Dimensions also indexes preprints from several sources (including bioRxiv and medRxiv), together with their respective citation counts. When a preprint was not found, we recorded its citation count as 0. Of all preprints, 13,298 (29.9%) recorded at least a single citation in Dimensions. For COVID-19 preprints, 5,294 preprints (57.9%) recorded at least a single citation.
bioRxiv and medRxiv HTML pages feature a Disqus ( https://disqus.com ) comment platform that allows readers to post text comments. Comment counts for each bioRxiv and medRxiv preprint were retrieved via the Disqus API service ( https://disqus.com/api/docs/ ). Where multiple preprint versions existed, comments were aggregated over all versions. The text content of comments on COVID-19 preprints was provided directly by the bioRxiv development team.
Screening time for bioRxiv and medRxiv
To calculate screening time, we followed the method outlined by Steve Royle [ 57 ]. In short, we calculated the screening time as the difference in days between the preprint posting date and the date stamp of submission approval contained within bioRxiv and medRxiv DOIs (only available for preprints posted after December 11, 2019). bioRxiv and medRxiv preprints were filtered to those posted between January 1 and October 31, 2020, considering only the first version of each posted preprint.
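Concretely, the calculation can be sketched as below, assuming a data frame `preprints` with `doi` and `date` columns (where `date` is the posting date); the yyyy.mm.dd stamp embedded in post-December 2019 DOIs is parsed out with a regular expression.

```r
library(stringr)

# Extract the submission-approval date stamp from the DOI suffix
stamp    <- str_match(preprints$doi, "10\\.1101/(\\d{4}\\.\\d{2}\\.\\d{2})")[, 2]
approved <- as.Date(stamp, format = "%Y.%m.%d")

# Screening time: days between submission approval and public posting
preprints$screening_days <- as.numeric(as.Date(preprints$date) - approved)
```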
Policy documents
To describe the level of reliance upon preprints in policy documents, a set of policy documents was manually collected from the following institutional sources: the ECDC (including rapid reviews and technical reports), UK POST, and WHO SB ( n = 81 COVID-19–related policy documents, n = 38 non-COVID-19–related policy documents). COVID-19 policy documents were selected from the period January 1, 2020 to October 31, 2020. Due to the limited number of non-COVID-19 policy documents from the same time period, these documents were selected dating back to September 2018. The reference lists of each policy document were then text mined and manually verified to calculate the proportion of references that were preprints.
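The automated part of this screening step can be sketched as follows, assuming `refs` is a character vector holding one reference string per element; in practice, each flagged reference was also verified manually.

```r
# Flag references that point to a preprint server
servers <- "medrxiv|biorxiv|arxiv|\\bosf\\b|preprints\\.org|research square|ssrn"
is_preprint <- grepl(servers, refs, ignore.case = TRUE)

mean(is_preprint)  # proportion of references that are preprints
```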
Journal article data
To compare posting rates of COVID-19 preprints against publication rates of articles published in scientific journals ( Fig 1B ), we extracted a dataset of COVID-19 journal articles from Dimensions ( https://www.dimensions.ai ), via the Dimensions Analytics API service. Journal articles were extracted based on presence of the following terms (case insensitive) in their titles or abstracts: “coronavirus,” “covid-19,” “sars-cov,” “ncov-2019,” “2019-ncov,” “hcov-19,” and “sars-2.” Data were extracted in the first week of December 2020 and covered the period January 1, 2020 to October 31, 2020. To ensure consistency of publication dates with our dataset of preprints, journal articles extracted from Dimensions were matched with records in Crossref on the basis of their DOIs (via the Crossref API using the rcrossref package for R [ 52 ]), and the Crossref “created” field was used as the publication date. The open access status of each article ( S1B Fig ) was subsequently determined by querying each DOI against the Unpaywall API via the roadoi package for R [ 55 ].
Statistical analyses
Preprint counts were compared across categories (e.g., COVID-19 or non-COVID-19) using chi-squared tests. Quantitative preprint metrics (e.g., word count and comment count) were compared across categories using Mann–Whitney tests and correlated with other quantitative metrics using Spearman rank tests for univariate comparisons.
For time-variant metrics (e.g., views and downloads, which may be expected to vary with the length of preprint availability), we analysed the difference between COVID-19 and non-COVID-19 preprints using generalised linear regression models with calendar days since January 1, 2020 as an additional covariate and negative binomially distributed errors. This allowed estimates of time-adjusted rate ratios comparing COVID-19 and non-COVID-19 preprint metrics. Negative binomial regressions were constructed using the function “glm.nb” in the R package MASS [ 58 ]. For multivariate categorical comparisons of preprint metrics (e.g., screening time between preprint type and preprint server, or publication delay between preprint type and publisher for the top 10 publishers), we constructed 2-way factorial ANOVAs, testing for interactions between both category variables in all cases. Pairwise post hoc comparisons of interest were tested using Tukey HSD, correcting for multiple testing, via the function “glht” with multiple comparisons set to “Tukey” in the R package multcomp [ 53 ].
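For illustration, the sketch below shows the general shape of these models in R (`glm.nb` from MASS and `glht` from multcomp are the functions cited above; the data frame `dat` and its columns `tweets`, `covid_preprint`, `days`, `screening_days`, and `server` are assumed placeholders, not the repository's actual variable names).

```r
library(MASS)      # glm.nb
library(multcomp)  # glht, mcp

# Time-adjusted comparison of an indicator (here, tweet counts) between
# COVID-19 and non-COVID-19 preprints, with days since January 1, 2020
# as a covariate and negative binomially distributed errors
m <- glm.nb(tweets ~ covid_preprint + days, data = dat)
exp(coef(m)["covid_preprintTRUE"])  # rate ratio, COVID-19 vs non-COVID-19

# 2-way factorial ANOVA with Tukey HSD post hoc comparisons
m2 <- aov(screening_days ~ covid_preprint * server, data = dat)
summary(glht(m2, linfct = mcp(server = "Tukey")))
```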
Parameters and limitations of this study
We acknowledge a number of limitations in our study. Firstly, to assign a preprint as COVID-19–related or not, we applied keyword matching to the title/abstract of the preprint version available at the time of our data extraction. This means we may have captured some early preprints, posted before the pandemic, that were subsequently revised to include a keyword relating to COVID-19. Secondly, our data collection period was a tightly defined window (January to October 2020), which may affect the altmetric and usage data we collected, as preprints posted at the end of October would have had less time to accrue these metrics.
Supporting information
(A) Total number of preprints posted on bioRxiv and medRxiv during multiple epidemics: Western Africa Ebola virus, Zika virus, and COVID-19. The number of preprints posted that were related to the epidemic and the number that were posted but not related to the epidemic in the same time period are shown. Periods of data collection for Western Africa Ebola virus (January 24, 2014 to June 9, 2016) and Zika virus (March 2, 2015 to November 18, 2016) correspond to the periods between the first official medical report and WHO end of Public Health Emergency of International Concern declaration. The period of data collection for COVID-19 refers to the analysis period used in this study, January 1, 2020 to October 31, 2020. (B) Comparison of COVID-19 journal article accessibility (open versus closed access) according to data provided by Unpaywall ( https://unpaywall.org ). The data underlying this figure may be found in https://github.com/preprinting-a-pandemic/pandemic_preprints and https://zenodo.org/record/4587214#.YEN22Hmnx9A . COVID-19, Coronavirus Disease 2019; WHO, World Health Organization.
(A) Number of new preprints posted to bioRxiv versus medRxiv per month. (B) Preprint screening time in days for bioRxiv versus medRxiv. (C) Number of preprint versions posted to bioRxiv versus medRxiv. (D) License type chosen by authors for bioRxiv versus medRxiv. The data underlying this figure may be found in https://github.com/preprinting-a-pandemic/pandemic_preprints and https://zenodo.org/record/4587214#.YEN22Hmnx9A . COVID-19, Coronavirus Disease 2019.
(A) Boxplots of abstract views received by COVID-19 and non-COVID-19 preprints in the same calendar month in which they were posted, binned by preprint posting month. (B) Boxplots of PDF downloads received by COVID-19 and non-COVID-19 preprints in the same calendar month in which they were posted, binned by preprint posting month. (C) Boxplots of total abstract views for non-COVID-19 preprints between January 2019 and October 2020, binned by preprint posting month. (D) Boxplots of total PDF downloads for non-COVID-19 preprints between January 2019 and October 2020, binned by preprint posting month. (E) Comparison of PDF downloads for COVID-19 and non-COVID-19 preprints across multiple preprint servers. Red shaded areas in (C) and (D) represent our analysis time period, concurrent with the COVID-19 pandemic. Boxplot horizontal lines denote the lower quartile, median, and upper quartile, with whiskers extending to 1.5*IQR. All boxplots additionally show raw data values for individual preprints with added horizontal jitter for visibility. The data underlying this figure may be found in https://github.com/preprinting-a-pandemic/pandemic_preprints and https://zenodo.org/record/4587214#.YEN22Hmnx9A . COVID-19, Coronavirus Disease 2019.
(A) Wordcloud of hashtags for the 100 most tweeted COVID-19 preprints. The size of the word reflects the hashtag frequency (larger = more frequent). Only hashtags used in at least 5 original tweets (excluding retweets) were included. Some common terms relating directly to COVID-19 were removed for visualisation (“covid19,” “coronavirus,” “ncov2019,” “covid,” “covid2019,” “sarscov2,” “2019ncov,” “hcov19,” “19,” “novelcoronavirus,” “corona,” “coronaovirus,” “coronarovirus,” and “coronarvirus”). (B) Euler diagram showing overlap between the 10 most tweeted COVID-19 preprints, the 10 most covered COVID-19 preprints in the news, the 10 most blogged about preprints, the 10 most commented-upon preprints, and the 10 most cited COVID-19 preprints. The data underlying this figure may be found in https://github.com/preprinting-a-pandemic/pandemic_preprints and https://zenodo.org/record/4587214#.YEN22Hmnx9A . COVID-19, Coronavirus Disease 2019.
Acknowledgments
The authors would like to thank Ted Roeder, John Inglis, and Richard Sever from bioRxiv and medRxiv for providing information relating to comments on Coronavirus Disease 2019 (COVID-19) preprints. We would also like to thank Martyn Rittman ( preprints.org ), Shirley Decker-Lucke (SSRN), and Michele Avissar-Whiting (Research Square) for kindly providing usage data. Further thanks to Helena Brown and Sarah Bunn for conversations regarding media usage and government policy.
Abbreviations
- AAAS: American Association for the Advancement of Science
- ACE2: angiotensin converting enzyme 2
- API: Application Programming Interface
- COVID-19: Coronavirus Disease 2019
- CSHL: Cold Spring Harbor Laboratory
- ECDC: European Centre for Disease Prevention and Control
- HSD: honest significant difference
- MERS: Middle East Respiratory Syndrome
- ROR: Research Organisation Registry
- SARS-CoV-2: Severe Acute Respiratory Syndrome Coronavirus 2
- UK POST: United Kingdom Parliamentary Office of Science and Technology
- WHO SB: World Health Organization Scientific Briefs
Data Availability
All data and code used in this study are available on GitHub ( https://github.com/preprinting-a-pandemic/pandemic_preprints ) and Zenodo (DOI: 10.5281/zenodo.4501924 ).
Funding Statement
NF acknowledges funding from the German Federal Ministry for Education and Research, grant numbers 01PU17005B (OASE) and 01PU17011D (QuaMedFo). LB acknowledges funding from a Medical Research Council Skills Development Fellowship award, grant number MR/T027355/1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- 1. WHO. COVID-19 Weekly Epidemiological Update—11. 2020 Oct. Available from: https://www.who.int/docs/default-source/coronaviruse/situation-reports/weekly-epi-update-11.pdf
- 2. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med. 2020;382:727–33. doi: 10.1056/NEJMoa2001017
- 3. WHO. Coronavirus Disease (COVID-19) Weekly Epidemiological Update—24. 2021 Jan. Available from: https://www.who.int/docs/default-source/coronaviruse/situation-reports/20210127_weekly_epi_update_24.pdf
- 4. Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nat Microbiol. 2020;5:536–44. doi: 10.1038/s41564-020-0695-z
- 5. Ksiazek TG, Erdman D, Goldsmith CS, Zaki SR, Peret T, Emery S, et al. A Novel Coronavirus Associated with Severe Acute Respiratory Syndrome. N Engl J Med. 2003;348:1953–66. doi: 10.1056/NEJMoa030781
- 6. Zaki AM, van Boheemen S, Bestebroer TM, Osterhaus ADME, Fouchier RAM. Isolation of a Novel Coronavirus from a Man with Pneumonia in Saudi Arabia. N Engl J Med. 2012;367:1814–20. doi: 10.1056/NEJMoa1211721
- 7. Wallach JD, Egilman AC, Gopal AD, Swami N, Krumholz HM, Ross JS. Biomedical journal speed and efficiency: a cross-sectional pilot survey of author experiences. Res Integr Peer Rev. 2018;3:1. doi: 10.1186/s41073-017-0045-8
- 8. Abdill RJ, Blekhman R. Tracking the popularity and outcomes of all bioRxiv preprints. Elife. 2019;8:e45133. doi: 10.7554/eLife.45133
- 9. Björk B-C, Solomon D. The publishing delay in scholarly peer-reviewed journals. J Informet. 2013;7:914–23. doi: 10.1016/j.joi.2013.09.001
- 10. Sever R, Roeder T, Hindle S, Sussman L, Black K-J, Argentine J, et al. bioRxiv: the preprint server for biology. bioRxiv. 2019;833400. doi: 10.1101/833400
- 11. Kaiser J. BioRxiv at 1 year: A promising start. Science | AAAS [Internet]. 11 Nov 2014 [cited 13 May 2020]. Available from: https://www.sciencemag.org/news/2014/11/biorxiv-1-year-promising-start
- 12. Rawlinson C, Bloom T. New preprint server for medical research. BMJ. 2019;365:l2301. doi: 10.1136/bmj.l2301
- 13. Wellcome Trust. Sharing data during Zika and other global health emergencies. In: Wellcome.ac.uk [Internet]. 10 Feb 2016 [cited 13 May 2020]. Available from: https://wellcome.ac.uk/news/sharing-data-during-zika-and-other-global-health-emergencies
- 14. Johansson MA, Reich NG, Meyers LA, Lipsitch M. Preprints: An underutilized mechanism to accelerate outbreak science. PLoS Med. 2018;15:e1002549. doi: 10.1371/journal.pmed.1002549
- 15. Fraser N, Kramer B. covid19_preprints. 2020. doi: 10.6084/m9.figshare.12033672.v16
- 16. Chiarelli A, Johnson R, Pinfield S, Richens E. Preprints and Scholarly Communication: An Exploratory Qualitative Study of Adoption, Practices, Drivers and Barriers. F1000Res. 2019;8:971. doi: 10.12688/f1000research.19619.2
- 17. Himmelstein D. The licensing of bioRxiv preprints. Satoshi Village. 2016 [cited 19 May 2020]. Available from: https://blog.dhimmel.com/biorxiv-licenses/
- 18. ASAPbio. asapbio/licensing. ASAPbio; 2018. Available from: https://github.com/asapbio/licensing
- 19. Gog JR. How you can help with COVID-19 modelling. Nat Rev Phys. 2020:1–2. doi: 10.1038/s42254-020-0175-7
- 20. Bagdasarian N, Cross GB, Fisher D. Rapid publications risk the integrity of science in the era of COVID-19. BMC Med. 2020;18:192. doi: 10.1186/s12916-020-01650-6
- 21. Sheldon T. Preprints could promote confusion and distortion. Nature. 2018;559:445–6. doi: 10.1038/d41586-018-05789-4
- 22. Bendavid E, Mulaney B, Sood N, Shah S, Ling E, Bromley-Dulfano R, et al. COVID-19 Antibody Seroprevalence in Santa Clara County, California. medRxiv. 2020;2020.04.14.20062463. doi: 10.1101/2020.04.14.20062463
- 23. Pradhan P, Pandey AK, Mishra A, Gupta P, Tripathi PK, Menon MB, et al. Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag. bioRxiv. 2020;2020.01.30.927871. doi: 10.1101/2020.01.30.927871
- 24. Besançon L, Peiffer-Smadja N, Segalas C, Jiang H, Masuzzo P, Smout C, et al. Open Science Saves Lives: Lessons from the COVID-19 Pandemic. bioRxiv. 2020;2020.08.13.249847. doi: 10.1101/2020.08.13.249847
- 25. Fraser N, Momeni F, Mayr P, Peters I. The relationship between bioRxiv preprints, citations and altmetrics. Quant Sci Stud. 2020:1–21. doi: 10.1162/qss_a_00043
- 26. Wellcome Trust. Coronavirus (COVID-19): sharing research data. 31 Jan 2020 [cited 21 May 2020]. Available from: https://wellcome.ac.uk/coronavirus-covid-19/open-data
- 27. Wellcome Trust. Publishers make coronavirus (COVID-19) content freely available and reusable. 16 Mar 2020 [cited 21 May 2020]. Available from: https://wellcome.ac.uk/press-release/publishers-make-coronavirus-covid-19-content-freely-available-and-reusable
- 28. Ioannidis JPA, Salholz-Hillel M, Boyack KW, Baas J. The rapid, massive infection of the scientific literature and authors by COVID-19. bioRxiv. 2020;2020.12.15.422900. doi: 10.1101/2020.12.15.422900
- 29. Horbach SPJM. Pandemic publishing: Medical journals strongly speed up their publication process for COVID-19. Quant Sci Stud. 2020;1:1056–67. doi: 10.1162/qss_a_00076
- 30. Penfold NC, Polka JK. Technical and social issues influencing the adoption of preprints in the life sciences. PLoS Genet. 2020;16:e1008565. doi: 10.1371/journal.pgen.1008565
- 31. ASAPbio. Preprint authors optimistic about benefits: preliminary results from the #bioPreprints2020 survey. In: ASAPbio [Internet]. 27 Jul 2020 [cited 1 Feb 2021]. Available from: https://asapbio.org/biopreprints2020-survey-initial-results
- 32. Inglis J. “We’ve just put an additional, cautionary note about the use of preprints on every @biorxivpreprint https://t.co/08eSXL4dDi.” In: Twitter [Internet]. 1 Feb 2020 [cited 22 May 2020]. Available from: https://twitter.com/johnringlis/status/1223598414493077505
- 33. Fang Z, Costas R. Tracking the Twitter attention around the research efforts on the COVID-19 pandemic. arXiv:2006.05783 [cs]. 2020 [cited 16 Sep 2020]. Available from: http://arxiv.org/abs/2006.05783
- 34. Carlson J, Harris K. Quantifying and contextualizing the impact of bioRxiv preprints through automated social media audience segmentation. PLoS Biol. 2020;18:e3000860. doi: 10.1371/journal.pbio.3000860
- 35. Yaqub U. Tweeting During the Covid-19 Pandemic: Sentiment Analysis of Twitter Messages by President Trump. Digit Gov Res Pract. 2021;2:1–7. doi: 10.1145/3428090
- 36. Gruzd A, Mai P. Going viral: How a single tweet spawned a COVID-19 conspiracy theory on Twitter. Big Data Soc. 2020;7:2053951720938405. doi: 10.1177/2053951720938405
- 37. Anti-Defamation League. At the Extremes: The 2020 Election and American Extremism | Part 3 [Internet]. 10 Aug 2020 [cited 27 Jan 2021]. Available from: https://www.adl.org/blog/at-the-extremes-the-2020-election-and-american-extremism-part-3
- 38. Lally C, Christie L. COVID-19 misinformation. UK Parliament POST. 2020 [cited 21 May 2020]. Available from: https://post.parliament.uk/analysis/covid-19-misinformation/
- 39. Fleerackers A, Riedlinger M, Moorhead L, Ahmed R, Alperin JP. Communicating Scientific Uncertainty in an Age of COVID-19: An Investigation into the Use of Preprints by Digital Media Outlets. Health Commun. 2021:1–13. doi: 10.1080/10410236.2020.1864892
- 40. Adie E. COVID-19-policy dataset. 2020. doi: 10.6084/m9.figshare.12055860.v2
- 41. Yin Y, Gao J, Jones BF, Wang D. Coevolution of policy and science during the pandemic. Science. 2021;371:128–30. doi: 10.1126/science.abe3084
- 42. Markus A, Oransky I. Eye for Manipulation: A Profile of Elisabeth Bik. In: The Scientist Magazine [Internet]. 7 May 2019 [cited 21 May 2020]. Available from: https://www.the-scientist.com/news-opinion/eye-for-manipulation—a-profile-of-elisabeth-bik-65839
- 43. Malički M, Costello J, Alperin JP, Maggio LA. From amazing work to I beg to differ—analysis of bioRxiv preprints that received one public comment till September 2019. bioRxiv. 2020;2020.10.14.340083. doi: 10.1101/2020.10.14.340083
- 44. OASPA. COVID-19 Publishers Open Letter of Intent—Rapid Review. In: OASPA [Internet]. 27 May 2020 [cited 13 May 2020]. Available from: https://oaspa.org/covid-19-publishers-open-letter-of-intent-rapid-review/
- 45. Johansson MA, Saderi D. Open peer-review platform for COVID-19 preprints. Nature. 2020;579:29. doi: 10.1038/d41586-020-00613-4
- 46. Brierley L. The role of research preprints in the academic response to the COVID-19 epidemic. 2020. doi: 10.22541/au.158516578.89167184
- 47. Vabret N, Samstein R, Fernandez N, Merad M. Advancing scientific knowledge in times of pandemics. Nat Rev Immunol. 2020:1. doi: 10.1038/s41577-019-0258-9
- 48. MIT Press. The MIT Press and UC Berkeley launch Rapid Reviews: COVID-19. In: MIT News | Massachusetts Institute of Technology [Internet]. 29 Jun 2020 [cited 13 Sep 2020]. Available from: https://news.mit.edu/2020/mit-press-and-uc-berkeley-launch-rapid-reviews-covid-19-0629
- 49. Carneiro CFD, Queiroz VGS, Moulin TC, Carvalho CAM, Haas CB, Rayêe D, et al. Comparing quality of reporting between preprints and peer-reviewed articles in the biomedical literature. Res Integr Peer Rev. 2020;5:16. doi: 10.1186/s41073-020-00101-3
- 50. Wickham H, RStudio. rvest: Easily Harvest (Scrape) Web Pages. 2019. Available from: https://CRAN.R-project.org/package=rvest
- 51. Abdill RJ, Adamowicz EM, Blekhman R. International authorship and collaboration in bioRxiv preprints. bioRxiv. 2020;2020.04.25.060756. doi: 10.7554/eLife.58496
- 52. Chamberlain S, Zhu H, Jahn N, Boettiger C, Ram K. rcrossref: Client for Various “CrossRef” APIs. 2020. Available from: https://CRAN.R-project.org/package=rcrossref
- 53. Fang Z, Costas R. Studying the accumulation velocity of altmetric data tracked by Altmetric.com. Scientometrics. 2020;123:1077–1101. doi: 10.1007/s11192-020-03405-9
- 54. Haustein S, Bowman TD, Costas R. When is an article actually published? An analysis of online availability, publication, and indexation dates. arXiv:1505.00796 [cs]. 2015 [cited 22 Jan 2021]. Available from: http://arxiv.org/abs/1505.00796
- 55. Jahn N. roadoi: Find Free Versions of Scholarly Publications via Unpaywall. 2019. Available from: https://CRAN.R-project.org/package=roadoi
- 56. Kearney M. rtweet: Collecting and analyzing Twitter data. J Open Source Softw. 2019;4:1829. doi: 10.21105/joss.01829
- 57. Royle S. Screenager: screening times at bioRxiv. In: quantixed [Internet]. 30 Mar 2020 [cited 22 May 2020]. Available from: https://quantixed.org/2020/03/30/screenager-screening-times-at-biorxiv/
- 58. Venables WN, Ripley BD. Modern Applied Statistics with S. 4th ed. New York: Springer-Verlag; 2002. doi: 10.1007/978-0-387-21706-2
Decision Letter 0
Roland G Roberts
22 Oct 2020
Dear Dr Coates,
Thank you for submitting your manuscript entitled "Preprinting the COVID-19 pandemic" for consideration as a Meta-Research Article by PLOS Biology.
Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I'm writing to let you know that we would like to send your submission out for external peer review.
However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.
Please re-submit your manuscript within two working days, i.e. by Oct 26 2020 11:59PM.
Login to Editorial Manager here: https://www.editorialmanager.com/pbiology
During resubmission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF when you re-submit.
Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. Once your manuscript has passed all checks it will be sent out for review.
Given the disruptions resulting from the ongoing COVID-19 pandemic, please expect delays in the editorial process. We apologise in advance for any inconvenience caused and will do our best to minimize impact as far as possible.
Feel free to email us at [email protected] if you have any queries relating to your submission.
Kind regards,
Roli Roberts
Roland G Roberts, PhD,
Senior Editor
PLOS Biology
Decision Letter 1
16 Nov 2020
Thank you very much for submitting your manuscript "Preprinting the COVID-19 pandemic" for consideration as a Meta-Research Article at PLOS Biology. Your manuscript has been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by four independent reviewers.
You'll see that all four reviewers are broadly positive about your study, but each of them raises a number of concerns, some of which will need additional analyses (and in some cases data) to address.
a) Three of the four reviewers request that you update your article by considering data for preprints beyond April 30th. Given the potential for further interesting trends in preprint deposition and sharing after this date, we think that this is important to address.
b) Given some of the political insights that you discuss, you might find the following recent article about preprint audience segmentation useful. It's by Carlson and Harris, and was published in PLOS Biology just two weeks before you submitted, so you may not have been aware of it: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000860
In light of the reviews (below), we will not be able to accept the current version of the manuscript, but we would welcome re-submission of a much-revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent for further evaluation by the reviewers.
We expect to receive your revised manuscript within 3 months.
Please email us ( [email protected] ) if you have any questions or concerns, or would like to request an extension. At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may end consideration of the manuscript at PLOS Biology.
**IMPORTANT - SUBMITTING YOUR REVISION**
Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:
1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.
*NOTE: In your point by point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.
You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.
2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Related" file type.
*Re-submission Checklist*
When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist
To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.
Please make sure to read the following important policies and guidelines while preparing your revision:
*Published Peer Review*
Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:
https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/
*PLOS Data Policy*
Please note that as a condition of publication PLOS' data policy ( http://journals.plos.org/plosbiology/s/data-availability ) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5
*Blot and Gel Data Policy*
We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements
*Protocols deposition*
To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosbiology/s/submission-guidelines#loc-materials-and-methods
Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.
Senior Editor,
*****************************************************
REVIEWERS' COMMENTS:
Reviewer #1:
In "Preprinting the COVID-19 pandemic," Fraser et al. present a useful and timely analysis on the fate of preprints during the COVID-19 pandemic. The paper reports that preprints investigating COVID-19 are published at a higher rate than preprints on other topics, with much shorter delay between preprint and publication. The study also shows that COVID-19 preprints are much shorter and highlight interesting differences between papers at bioRxiv and medRxiv, particularly regarding screening time. This work represents an important contribution to our understanding of the role of preprints in the current pandemic and what we might expect from a future crisis. The data collection approach is logical, organized and thorough. However, there are several major issues that I recommend addressing.
MAJOR POINTS
1. The manuscript's primary dataset is preprints submitted to bioRxiv and medRxiv between 1 Jan 2020 and 30 Apr 2020. The work asks important questions, but, entering the 11th month of a pandemic in which conditions continue to change quickly, the answers presented in the paper may no longer be representative of the pandemic, as trends may have shifted since April. The manuscript would greatly benefit from analyzing preprints from January through at least the summer, when COVID-19 preprint submissions began to decline again: https://github.com/nicholasmfraser/covid19_preprints
2. The results regarding publication rate (and publication delay) would be much more interpretable if the deadline for publication were extended significantly past the deadline for posting the preprint. Currently, the 30 Apr cutoff for preprints appears to be identical to the cutoff for publication—i.e. preprints posted on 27 Apr still count toward the tally of total preprints, but would only count toward the published preprints if they were published within 3 days of appearing online. This has at least two direct effects on the results: First, it artificially deflates the COVID-19 publication rate, particularly relative to the non-COVID-19 publication rate, which is based on a corpus that is growing at a slower (relative) pace. Second, and more importantly, it skews the "time to publication" measurement in favor of preprints that are published quickly: The paper reports that many COVID-19 preprints are published in less than 30 days, but, given the growth pattern of COVID-19 preprints, most papers in their dataset could only have been published in less than 30 days. If the same preprints were evaluated, but the publication cutoff was extended for another 4 months, would the distribution even out? One way to examine this would be to re-evaluate publication-related outcomes after excluding preprints posted close to the publication cutoff. For example, if publication data is analyzed as of 30 Sep, it would be helpful if publication results only considered preprints posted before, say, 1 Aug, even if the other analyses consider preprints posted all the way through September.
3. Several calculations described in the paper appear to be incorrect. While these would almost certainly be caught during re-analysis of an expanded dataset, I wanted to highlight them here as well. I couldn't find code for the analyses, so I made my best guess for how to replicate the tests given the data in the paper's associated GitHub repository.
Line 144: The paper states, "single-author preprints were almost three times more common among COVID-19 than non-COVID-19 preprints." By my count, 205 out of 2527 COVID preprints (8.1%) listed only one author, compared with 288 out of 12285 non-COVID preprints (2.3%). Dividing the COVID proportion by the non-COVID proportion gives 3.46, not "almost three."
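For concreteness, a minimal sketch of how this check can be reproduced (I assume the repository CSV and a boolean covid_preprint flag; the author-count column name n_authors is my placeholder, not necessarily the name in the data):

    import pandas as pd

    df = pd.read_csv("preprints_full_20190901_20200430.csv")
    df["covid_preprint"] = df["covid_preprint"].astype(bool)

    covid = df[df["covid_preprint"]]
    non_covid = df[~df["covid_preprint"]]

    p_covid = (covid["n_authors"] == 1).mean()          # 205/2527 = 0.081
    p_non_covid = (non_covid["n_authors"] == 1).mean()  # 288/12285 = 0.023
    print(p_covid / p_non_covid)                        # 3.46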
Line 191: Using the data in "preprints_full_20190901_20200430.csv," I can't reproduce the reported medians here. When comparing COVID and non-COVID preprints, the text states it's 3432 vs 6143, but I get 3472 vs 6174.
Lines 202-204: The text says that 4% of COVID preprints were published, compared to 3% of non-COVID preprints. However, Figure 2I indicates the COVID publication rate is somewhere around 12 percent, with the non-COVID rate around 6 or 7 percent. The provided data supports the version in the text, so it would be helpful to fix (or explain) the discrepancy here.
Line 210: The paper reports a mean publication delay of 22.5 days for COVID preprints and 48.7 days for non-COVID preprints, but from their data, I get means of 21.0 and 32.7, a much smaller gap. The difference in means is tested using a two-way ANOVA, but only one factor (COVID vs non-COVID) is clear from the description. If "source" (bioRxiv vs. medRxiv) is used as the second factor, an ANOVA for "delay_in_days ~ covid_preprint+source" returns non-significant F-values for both source and covid_preprint, suggesting COVID status may not actually affect publication time—a big change from the stated results.
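For transparency, this is the model I fit (a sketch in Python/statsmodels; the column names delay_in_days, covid_preprint and source come from the repository data, but the two-factor specification itself is my best guess at the authors' intent):

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.read_csv("preprints_full_20190901_20200430.csv")
    df = df.dropna(subset=["delay_in_days"])  # only published preprints have a delay

    model = smf.ols("delay_in_days ~ covid_preprint + source", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))  # F and p for each main effect

If the second factor was actually journal rather than source, that would also be consistent with the degrees of freedom I note in the next point.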
Line 214: The paper describes an ANOVA using publishers, but the degrees of freedom listed (283) suggest they actually used journals, not publishers. It could well just be a typo, but there are 284 journals and only 62 publishers in the dataset. Clarification would be helpful.
Line 270: The text states that 6.7% of non-COVID preprints were mentioned in a news article. Of the 12,285 non-COVID preprints in the analysis period, I only see 83 with at least one mention in a news article, which is 0.68%.
Figure 2: It appears panel 2J has incorrect data in it: It caught my eye because there are papers in the "140 days" bucket, even though the analysis period is shorter than 140 days. Reproducing the panel using the data in the repository shows a different distribution that doesn't go as far to the right.
Figure S2: I'm unable to reproduce panels E and F from this figure using the data provided. As submitted, the panel says Science has published 15 COVID preprints, for example, but I can only find 4. It says the Journal of Infectious Diseases has published 10, but I can only find 4, and so on. This may be caused by the same issue present in Figure 2J. (Incidentally, the bars in panel E are organized alphabetically, which doesn't seem like the order that would be most relevant to readers. Ordering them by value may present the information in a way that's more easily interpretable.)
4. I defer to the editor on whether this is a major issue, but I wanted to highlight several ways in which the current submission doesn't meet my understanding of the PLOS Biology requirements on data availability. First, the referenced GitHub repository provides thorough access to most of the data, but GitHub isn't intended for archival use and isn't included on the journal's list of recommended repositories:
https://journals.plos.org/plosbiology/s/recommended-repositories
The paper would benefit from depositing the code somewhere that it could be easily cited and reliably preserved. Personally, I've had a great experience with Zenodo, which has a feature that enables a direct transfer between GitHub and a free Zenodo repo:
https://guides.github.com/activities/citable-code/
Figure S3: I believe the data used for panel C is missing from the dataset. The PLOS guidelines state that "all numerical values used to generate graphs must be provided": https://journals.plos.org/plosbiology/s/submission-guidelines
Similarly, there is no data available to reproduce the results described on lines 233-243. While it looks like the data was not licensed for public release, is it possible anonymized data (e.g. lists of downloads, without any metadata attached) would be allowed, since that's effectively what appears in the figure?
Lines 215-217: The statement that "non-COVID-19 preprints had a 10.6% higher acceptance rate than COVID-19 manuscripts" is poorly supported here. While it's not hard to believe this is an accurate characterization of the data that was provided to the authors, there is no information more specific than "several publishers" about what journals this refers to. In addition, readers are not provided with any data to support the findings, nor, as required in the PLOS guidelines, are they given "All necessary contact information others would need to apply to gain access to the data."
https://journals.plos.org/plosbiology/s/data-availability
MINOR POINTS
Line 86: The paper states that COVID-19 preprints "are reviewed faster than their non-COVID-19 counterparts," but that isn't the only explanation for the observed differences. Given the stakes, it's not impossible that preprint authors were just less likely to post a preprint until they knew they had a done deal at a journal. For authors scared (rightly or not) of getting scooped, posting a preprint 24 hours before your peer-reviewed article goes live may be a way to drum up attention and appear "open" without any risk. Unless there is a way to demonstrate that authors deposit COVID and non-COVID preprints at the same point in the publication process, the statement that they are "reviewed faster" seems to make a large interpretive leap when a phrase like "spend less time on bioRxiv prior to publication" is better justified. This may be a moot point if changes are made regarding Major Issue 2 above.
Line 98: In multiple places (lines 31 and 109, Figure 1B), the text references the number of published papers related to COVID-19, but it's unclear where this information comes from. The legend for figure 1 says "Journal data in (B) is based upon data extracted from Dimensions," but the paper would benefit from elaboration in the Methods section regarding the search strategy and when the search was performed.
Lines 185-187: The paper states that "COVID-19 preprints did not discernibly differ in number of versions compared with non-COVID-19 preprints," using as evidence that both categories have a median of 1. However, Figure 2C shows a noticeable difference in the distributions. Testing the difference between groups using something like the Mann-Whitney test would enable a definitive statement.
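Something along these lines would do (a sketch; n_versions is a hypothetical column name for the version count):

    import pandas as pd
    from scipy.stats import mannwhitneyu

    df = pd.read_csv("preprints_full_20190901_20200430.csv")
    df["covid_preprint"] = df["covid_preprint"].astype(bool)

    covid = df.loc[df["covid_preprint"], "n_versions"]
    non_covid = df.loc[~df["covid_preprint"], "n_versions"]
    print(mannwhitneyu(covid, non_covid, alternative="two-sided"))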
Line 192: The paper states that the difference in preprint length between COVID and non-COVID papers "supports the anecdotal observations that preprints are being used to share more works-in-progress rather than complete stories." However, given the accelerated publication rate of COVID preprints, it seems likely that this could also just indicate that for COVID, the bar for a "complete story" is lower. This isn't a necessary analysis, but if the authors are interested, this section would be improved by an analysis of preprint length among PUBLISHED preprints: Do shorter preprints have a longer delay before publication? If so, I think that would be much more supportive of the idea that people are sharing results as they work. However, if short COVID-19 preprints are published just as quickly as longer ones, that suggests a different story.
Line 195: The text states that the difference in total references between COVID and non-COVID papers reflects "the new, emerging COVID-19 field and dearth of prior literature to reference." However, particularly given the dramatic length difference, shorter, more straightforward COVID preprints may simply require fewer supporting references—if an average non-COVID preprint reports the results of, say, 3 major experiments, it would probably require more background and support than a COVID preprint that reports only one experiment. The most straightforward fix for this is to remove some of the over-interpretation, but it may be testable by evaluating something like "references per word."
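A sketch of that control, again with hypothetical column names (n_references, word_count):

    import pandas as pd

    df = pd.read_csv("preprints_full_20190901_20200430.csv")
    df["refs_per_word"] = df["n_references"] / df["word_count"]
    print(df.groupby("covid_preprint")["refs_per_word"].median())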
Line 486: More documentation would be helpful regarding how the preprint publication dates were determined. The text specifies that they were retrieved from Crossref, but the Crossref API provides several different "publication" dates that do not always match. Since there is not an explicit connection between publication date and the date Crossref receives the data, it would also be helpful if the paper specified the dates that the publication data was pulled.
OTHER SUGGESTIONS
The notes below are only intended as friendly suggestions and are not critical to the integrity of the paper.
Lines 27-37: I found it striking that the abstract does not actually describe any results. Readers may be more likely to read on if they're given a better idea of what kind of analysis is to come.
Line 109: The word "published" here is ambiguous: It sounds like the 6,753 preprints included in this total were preprints that were all subsequently published in journals, but, if I'm reading it correctly, this conflates posting to a preprint server with "publishing." Given that the paper deals with preprints that later appeared in journals, it would be beneficial to rephrase this.
Line 155: The phrase "as an expectation" tripped me up here—perhaps "as expected" would be more clear?
Line 337: This sentence suggests the use of preprint servers has been "encouraged by funding bodies requiring COVID-19 research to be open access," but the call to make research open access seems almost separate from the push for people to post preprints—that is, it's not clear to me that posting a preprint prior to publication would satisfy Wellcome's commitment that "all peer-reviewed research publications relevant to the outbreak are made immediately open access." Given that both cited examples strongly encourage preprints but only Wellcome's mentions open-access publication, it may be better to edit this sentence to remove the phrase "requiring COVID-19 research to be open access."
Lines 402-404: It seems this manuscript is citing its own preprint. This is the first time I've seen this, and I'm not sure what the rationale is. If there is an analysis that was included in the preprint version but not the current version, yet is still important enough to cite, it might make sense to include the analysis in the current manuscript.
Lines 529-531: The analysis of citations of preprints seems very relevant and is important enough to move up to the results section. In addition, the phrase "all preprints" is unclear here, since the manuscript includes references to multiple preprint servers. It appears this refers to the bioRxiv and medRxiv preprints posted in September 2019 or later; it would be helpful to clarify that.
Line 573: It's not clear what method was used for multiple test correction here. The impression that I get from the multcomp documentation is that the glht.summary() function has multiple sophisticated options. It's possible I'm misreading this, but more clarity would be appreciated.
Figure 1: The y-axis label in panel B looks at first glance like it may refer to a ratio, though I think it actually means "articles OR preprints." Would "Manuscripts" be more clear as a label?
Figure 2: Panel B has been effectively scrambled by the lopsided number of preprints processed between medRxiv and bioRxiv, data that is much better visualized in Figure S2 (panel B). Right now, Figure 2B makes it look as if COVID preprints took far longer than usual to be screened, while the results (and Figure S2B) show that there wasn't a big COVID effect; it's just that COVID preprints were more likely to show up on medRxiv, where screening always takes longer. I suggest replacing Figure 2B with Figure S2B, which is a little more cluttered but far more interpretable.
Figure 2D contains so much interesting data, but is almost impossible to read because of the delineation between first-time authors and returning ones. This panel may be better as a supplementary figure. At the authors' discretion, I'd suggest making a scatter plot similar to the one in Figure S2C, plotting COVID vs. non-COVID percentages, leaving the distinction of first-time authors to a supplement.
Panel 2I seems to be an unnecessary use of space—I think readers can conceptually compare one number against another, slightly smaller number without a picture.
Reviewer #2:
[identifies himself as Euan Adie]
Fraser et al. have produced a thorough, detailed analysis of preprints relating to COVID-19 in 2020 and compared them to non-COVID preprints in the same period and earlier.
I'm impressed by the range of the data examined and by the quality of the associated, documented code for the analysis the authors have placed on github, though I didn't re-run the analysis for myself.
We've unquestionably seen a change in the use of preprints during the pandemic, and the breadth and scope of this study make it novel and significant enough for publication.
That said, I did find myself wanting a clearer picture of what some of the data means, particularly around how many COVID-19 preprints are genuine work of a standard equivalent to what would normally be submitted to preprint servers, vs. more lightweight but still high-quality articles, vs. opportunistic "spam" articles. The manuscript touches on some of these points but left me unclear.
My comments are in order of appearance rather than importance; (6) is the only revision I'd consider essential:
1) COMMENT Line 70: preprints aren't only -scientific- manuscripts; you mention humanities etc. preprints later on
2) COMMENT Line 80: first time preprints have been widely used to communicate during epidemic: "widely" is doing a lot of work here… as we don't know e.g. what proportion of all researchers working on Zika used preprint servers. Maybe better to say widely used outside of specific communities?
3) COMMENT Line 109 - more than 16,000 articles published, a large proportion of them preprints: as a number of preprints go on to become published articles, it'd be good here to highlight the number that are preprints OR published versions of preprints rather than just the former.
4) QUESTION Line 112 -SSRN - I'm not familiar with the SSRN workflow but do notice that The Lancet "First Look" papers live there https://www.ssrn.com/index.cfm/en/the-lancet/ … were these treated as preprints, and did any other medical journals introduce "one click" ways to deposit papers as preprints in 2020? Basically were any new workflows introduced that might have influenced author choices?
5) COMMENT Line 181 - different types of license being adopted for COVID-19 work: I'd be interested in some brief discussion of why this might be, e.g. are there pharma collaborations, or is it connected to how new the authors are to preprints? It doesn't gel 100% with authors wanting to be as open as possible - maybe it's just speed that's important to them?
6) ESSENTIAL SUGGESTED REVISION Line 193: support anecdotal evidence that preprints are being used to share works in progress: I'd really like to see this expanded upon here or in the discussion, as it relates also to the quality question you raise around line 201 and later on too… it seems like it would make interpreting other aspects of the data easier. Specifically, does the data suggest that (a) COVID-19 preprints are mostly opportunistic short works or genuinely works in progress, with only one version, that then get submitted to journals with few changes (which may explain the lower acceptance rate, and is what's implied at first)? Or (b) do they usually undergo significant changes between preprint and final published version (on line 401 it's asserted that the preprints are of relatively good quality, because the acceptance rate is only a little lower)? In the latter case, authors may not be engaging with versioning, or perhaps publishers have lowered standards for COVID-related work, which would be good to know. You start addressing the opportunistic part by looking at changing fields, which I think is a good start. I realize that, as you say, it's very difficult to assess the "quality" of a preprint… perhaps you could get a feel for things by assessing the number of changes in the final published version, for a random subset of the articles?
7) SUGGESTED REVISION: Line 194: COVID-19 preprints have fewer references: are you controlling for number of words? If the articles are 44% shorter, it stands to reason that they should also have fewer references; it may be that the articles are more focused or just don't have scope for many citations. I don't think we can say that it reflects only the dearth of prior literature.
8) SUGGESTED REVISION: Line 208: we see faster publication times for COVID-19 preprints: again, it would be interesting to see this controlled for article length. Shorter articles with clearer, short hypotheses, and opinion pieces, will be easier to review and, to a certain extent, copyedit than longer, data-heavy papers.
9) QUESTION: Line 219: was there any anecdotal evidence from medRxiv / bioRxiv about where the abstract views were coming from? For example, direct links from the websites of public health bodies.
10) COMMENT: Line 339: I think it's too early to say if the change is permanent or related to the specific circumstances of the pandemic.
11) SUGGESTED REVISION: Line 376: Marked change in journalistic practice: I suspect this is correct, but it's hard to say without data on *why* papers were picked up more by the news: it could also be because university press offices that previously only worked with high-impact journals have suddenly become interested in COVID-19 preprints and so scan bioRxiv / medRxiv (or perhaps medRxiv has started reaching out to journalists directly?), and because some researchers are very keen to see public engagement around their COVID work. There are few science journalists and it is rare for journalists
Reviewer #3:
The manuscript by Fraser et al. is very interesting inasmuch as it documents how many manuscripts on COVID-19 have been shared during the early months of the pandemic. However, the manuscript by Fraser et al. could easily be misread as suggesting that COVID-19 is special and that preprints fulfill a unique need of the scientific community. While the authors are transparent in communicating their potential conflicts of interest, the latter framing would not seem necessary for the manuscript to be interesting.
My main concerns are well encapsulated in the statement of the abstract: "Although the last pandemic occurred only a decade ago, the way science operates and responds to current events has experienced a paradigm shift in the interim. The scientific community responded rapidly to the COVID-19 pandemic, releasing over 16,000 COVID-19 scientific articles within 4 months of the first confirmed case, of which 6,753 were hosted by preprint servers." This statement and similar implications throughout the manuscript suggest that the way science operated changed and that this was tied to preprints.
However, I do not see the manuscript of Fraser et al. as providing evidence for a change in the way science is conducted, aside from acknowledging that the volume of research dedicated to COVID-19 is large, and larger than for past pandemics. On a philosophical note, the manuscript does not follow Kuhn's definition of a paradigm shift (what is described would be more similar to a declination). On a scientometric level, biomedical scientists have already been very responsive toward epidemics and have given SARS, MERS and other global threats a disproportional share of their attention. This could be seen, for instance, by determining the citation metrics (controlled for year) of MEDLINE publications carrying each individual MeSH term. Such an analysis places MeSH terms corresponding to emergent pathogens among the most-cited MeSH terms of at least the last ~15 years, generally even at the peak position of all MeSH terms (if one excludes MeSH terms occurring only in a handful of manuscripts). One may interpret this as no single set of topics in biology having received as much interest from scientists as pathogens causing pandemics. An alternative reading of the manuscript of Fraser et al. could thus be that scientists did - and do - redirect their attention toward emergent pathogens and pandemics, but that the volume of research on COVID-19 is higher than for other emergent pathogens/epidemics due to other reasons (e.g.: the fraction of people in research-heavy countries that are affected?). Likely, one may conclude that preprints have not been necessary for emerging pathogens to draw more attention among scientists than any other topic.
Other comments:
When comparing COVID-19 preprints against others, it remains unclear from the methods section and text whether all comparisons (e.g.: also license type, number of versions of preprints, lengths of texts) are restricted to preprints posted in the same observational period as COVID-19 preprints (this restriction appears to be introduced only in the context of Fig 2I). If the range of dates allowed for non-COVID-19 preprints differed, the interpretation surrounding many elements of Figure 2 might change.
The authors show differences in the reception of COVID-19 vs non-COVID-19 manuscripts (Figure 3), which they interpret as "extensive access". While they rule out some possible alternative scenarios, it remains unclear to what extent this reception is driven by scientists using preprints differently in the context of COVID-19 vs. preprint servers (or secondary sites, such as covidpreprints, run by one of the authors) prioritizing the visibility of COVID-19 manuscripts over others (e.g.: bioRxiv has a bold red link to "COVID-19 SARS-CoV-2 preprints", suggesting that there are active efforts to help COVID-19 preprints gain more visibility). Maybe there is data to make a more stringent statement; otherwise, I would recommend rephrasing.
The data shown in panels Fig 4A-E provides very little information not contained in the text, since the distributions are similar, sometimes have the same median, and there are many dots - and it remains unintuitive for the reader whether there is a relative change (as the total number of dots could be dominated by the total number of preprints in each category). Possibly more information could be conveyed by making a cumulative plot (or survival analysis) with increasing values (now on y) as thresholds (on x), and plotting two different lines (COVID-19 and non-COVID-19). As an additional, related challenge around these panels, the statistical tests given in the main text should likely be replaced by non-parametric tests, or by tests that do not test differences in centrality (as the dynamic range is small) but instead test (e.g.: via Fisher's exact test) whether the proportions of preprints with at least one y-value (e.g.: one citation) differ between COVID-19 and non-COVID-19 preprints.
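To illustrate the suggested proportion test, a sketch (I use Python purely for illustration; the citation column name n_citations is my placeholder):

    import pandas as pd
    from scipy.stats import fisher_exact

    df = pd.read_csv("preprints_full_20190901_20200430.csv")
    has_citation = df["n_citations"] >= 1
    table = pd.crosstab(df["covid_preprint"], has_citation)  # 2x2 contingency table
    odds_ratio, p = fisher_exact(table)
    print(odds_ratio, p)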
For Figure 4G,H it is unclear whether the inferred statement on higher correlations among COVID-19 preprints truly reflects different correlations, or a higher number of preprints with non-zero values in the COVID-19 group.
The discussion spares possible criticisms around preprints. First, the tournament-type economics of science, which requires scientists to accumulate reputation, may force researchers to publish on preprint servers due to the risk of being scooped rather than because of their perception of the usefulness of preprint servers, and may thus contribute to a research culture that could be perceived unfavorably (e.g.: along the lines of "publish-or-perish"). Second, scientific disciplines can slow down in their rate of innovation if they grow too big (Chu and Evans, 2018, SocArXiv). In this sense, a larger volume of publications (in manuscripts, but further increased by preprints) may be expected to lead to more conservative research and rather limit the overall progress of scientific fields (besides size).
The discussion section misses the possibility that preprint servers are not neutral services, but themselves act in a way that could prioritize COVID-19 (e.g.: by highlighting COVID-19 publications as bioRxiv does).
Extending the analysis beyond April 30th would be interesting, as journals and the scientific community have had more time to adapt. Based on Figures 3A and 3B, which show diminishing differences between COVID-19 and non-COVID-19 preprints over time, it remains unclear whether the findings reported by the authors throughout the manuscript only refer to the first weeks of a pandemic (a time period that would be very important, and where preprints might be particularly relevant), or to a "shift" as they imply in the discussion section.
"Escalating demands made by reviewers and editors are lengthening the publication process still further [8,9]." isn't backed up by the references, and might rather mirror a common perception, and topic of further study.
While it would seem unlikely to change the overall findings, the treatment of other RNA viruses might be incomplete, as the keyword-based matching used by the authors uses fewer synonyms than for COVID-19 and excludes most synonyms provided by NCBI Taxonomy, which is a reference database for the nomenclature of organisms.
The analysis of the paragraph between lines 135-141 visually appears at odds with the referenced panels, Figure 2B and S2B, where non-COVID-19 preprints appear to have been screened more rapidly than COVID-19 preprints. In particular, there appear to be more non-COVID-19 preprints with a screening time of 0 or 1 days. As the distributions of screening times are not normally distributed (and are bounded at 0, since there is no negative screening time), providing the median rather than the mean - and doing so separately for bioRxiv and medRxiv - together with a fitting non-parametric test (e.g.: rank-sum) could more accurately describe the overall trends in the data.
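A sketch of this summary (source and covid_preprint are as in the repository data; screening_time_days is my placeholder for the screening-time column):

    import pandas as pd
    from scipy.stats import ranksums

    df = pd.read_csv("preprints_full_20190901_20200430.csv")
    df["covid_preprint"] = df["covid_preprint"].astype(bool)

    # medians per server and COVID status
    print(df.groupby(["source", "covid_preprint"])["screening_time_days"].median())

    # rank-sum test within each server
    for server, sub in df.groupby("source"):
        covid = sub.loc[sub["covid_preprint"], "screening_time_days"]
        other = sub.loc[~sub["covid_preprint"], "screening_time_days"]
        print(server, ranksums(covid, other))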
For the analysis in the subsequent paragraph, between lines 142-146, on the variability in team size, a half sentence separating bioRxiv vs. medRxiv may help to clarify how much of the variability stems from different team sizes in different academic fields (e.g.: molecular biology vs. clinical research).
"Additionally, India had a higher representation among COVID-19 authors specifically using preprints for the first time compared to non-COVID-19 posting patterns." The visual impression from the figure is that the absolute numbers for India are very small, opening the possibility that the difference is not statistically significant. Adding a statistic test to the statement would prevent those thoughts.
Within Figure S2, panel D appears on the left-hand side from panel C, which is opposite to the reading direction (left to right) common in most scientific publications.
The discussion section claims a cultural shift regarding media. However, it remains unclear whether there was a long-lasting effect, as implied by "shift", that remained after April 2020. Further, it remains unclear whether there was a conceptual change in the way media operate that has been enabled by preprints, or whether, also in the early days of other epidemics, media used essentially any source outside of scientific journals to obtain information for their coverage (e.g.: interviews with scientists, reports of local health agencies...).
The statement on the politicization of science needing to be "prevented at all costs" could be a little more specific, to avoid it being read in a possibly unintended manner. For instance, one could argue that arguments among politicians should be based on science, that scientists should focus their research on problems identified by societies and their representatives, and that academics should also care about gender or racial injustice present in their societies, to avoid these being manifested through educational systems.
A very, very minor point of curiosity - please fully ignore for any practical concerns unless already considered somehow by the authors: The manuscript very understandably focuses on academic science. Would the findings also extend to non-academic preprint servers that scientists should ignore for many good reasons, such as viXra.org?
Reviewer #4:
This paper focuses on summarizing various attributes, bibliometrics, and altmetrics of preprints pertaining to the COVID-19 pandemic. Overall, I think the manuscript is thoughtful and thorough, and provides a timely overview of how the pandemic has impacted scientific publishing and vice versa. Even beyond the pandemic, this should prove to be a useful point of reference in the ongoing debate surrounding preprints, open science, and peer review.
While the descriptive statistics and univariate tests provide a nice backdrop for thinking about the unprecedented changes to publishing practices induced by COVID-19, I'd like to see the authors attempt to address some more challenging hypotheses and apply some slightly more sophisticated multivariate statistical analyses to support their conclusions.
MAJOR COMMENTS
The authors allude to "anecdotal observations that preprints are being used to share more works-in-progress" (line 192) as the reason COVID-19 preprints tend to be shorter in length--is this based on the authors' own anecdotal evidence, or are there references that can be cited here? "Works-in-progress" implies the research published was not as rigorous as in non-COVID-19 research--although there are certainly examples where results in a preprint were "half-baked", this scenario should be differentiated from studies in which authors are simply sharing results incrementally in short reports rather than waiting to accumulate multiple results before sharing. This practice is something various publishers have tried to promote over the years (e.g., Cell Reports https://www.cell.com/cell-reports/aims )--perhaps the authors could tease out the "work in progress" versus "short report" hypotheses by testing if the word count of COVID-19 preprints is associated with higher rates of publication or faster turnaround in peer-reviewed journals.
I can think of a few other explanations for the shorter length of COVID-19 preprints that could be tested, e.g., perhaps epidemiology papers tend to be shorter than papers in other fields of study and epidemiological studies are overrepresented among the COVID-19 preprints. Similarly, the authors mention elsewhere that relatively more COVID-19 preprints tend to have only a single author, which could also partially explain the shorter length, since there are ostensibly fewer person-hours invested in the writing than in a multi-author study. There might also be cultural differences that contribute to paper length--do authors from China or India (the two countries noted as having the greatest increase in representation among COVID-19 preprints) tend to write shorter papers overall? Later, the authors recognize that COVID-19 preprints contain fewer references, which could itself contribute to the shorter length, as there is less need to situate new results against existing literature (in which case we might expect these preprints to have gotten longer as the pandemic has progressed). It should be straightforward to apply a regression model to assess the relationship between paper length and COVID-19 focus, adjusting for topic, author count, author country, date posted, etc.
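As a sketch of such a model (Python/statsmodels; apart from covid_preprint, all covariate column names here are hypothetical placeholders):

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("preprints_full_20190901_20200430.csv")
    model = smf.ols(
        "word_count ~ covid_preprint + n_authors + C(category)"
        " + C(author_country) + days_since_jan1",
        data=df,
    ).fit()
    print(model.summary())  # coefficient on covid_preprint, adjusted for the rest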
In the section "Extensive access of preprint servers for COVID-19 research", how much are the average number of abstract views/downloads influenced by outliers like the Santa Clara seroprevalence study and the withdrawn "uncanny inserts" study? More generally, are abstract views/downloads strongly correlated with attention on social media?
Lines 267-270: it would be good to provide some numbers here on the total number of original tweets and hashtags analyzed. Also, take some space to elaborate on why hydroxychloroquine is a controversial topic and why certain conspiracy theories have latched onto these top 10 most tweeted preprints. The wordcloud in Supplemental Fig 4A shows some extraordinary evidence of politicization that isn't mentioned, including the QAnon conspiracy theory ("qanon" and "wwg1wga"), xenophobia ("chinazi"), and US-specific right-wing populism ("maga", "foxnews", "firefauci") (I also think this figure could be moved to the main text). Given that the authors don't shy away from denouncing the politicisation of science as "a polarising issue [that] must be prevented at all costs" (line 409), this section feels much too short in its current form.
The paper makes several references to research that is "poor-quality" or "controversial" but does not rigorously define or classify such preprints. With all of the data at hand, a cool deliverable might be to isolate particular attributes associated with low quality or propensity for controversy. Even some simple descriptives of a curated subset of such preprints would be interesting.
I understand the following request might not be feasible, so consider it optional, but it would be great to see the results of this paper updated to include preprints published more recently than April 30--there's a full 6 months of data that are ignored (spanning the first big peak of cases in the US in July and the ongoing second wave), and there are potentially some really interesting stories to tell--not just about the overall characteristics of COVID-19 preprints, but how they have evolved over time.
MINOR COMMENTS
Fig 1a: since case and death counts are shown together on the same panel, this figure would be more readable with the y-axis on a log scale
Fig 1b: Since all of the other figures use the same color scheme for COVID-19 preprints vs non-COVID preprints, it would be better to use a different color scheme to describe preprints vs. journal articles here.
Line 55: closing parenthesis should come after "case"
Line 70: "certified" has strong connotations--better to just say preprints have not gone through formal peer review yet
Line 98: Maybe say "*at least* 186 articles," unless the authors are certain this is an exhaustive count
Line 109: Were any preprints that went on to be published in peer-reviewed journals double-counted among the 16,000 COVID-19 articles mentioned here?
Line 121: spell out OASPA acronym
Author response to Decision Letter 1
Submitted filename: Response to reviewers.docx
Decision Letter 2
Thank you for submitting your revised Research Article entitled "Preprinting the COVID-19 pandemic" for publication in PLOS Biology. I have now obtained advice from three of the original reviewers and have discussed their comments with the Academic Editor.
Based on the reviews, we will probably accept this manuscript for publication, provided you satisfactorily address the remaining points raised by the reviewers. Please also make sure to address the following data and other policy-related requests.
a) Please attend to the remaining requests from reviewers #1 and #3.
b) Please could you choose a more informative Title. We suggest something like "Analysing the role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape," but please see the relevant comment by reviewer #1 and feel free to choose something that you think reflects the analysis and findings.
c) Please supply a blurb in the box in the submission form.
d) Many thanks for providing the data and code so fully in Github and Zenodo. Please could you cite the URLs/DOIs clearly in all relevant main and supplementary Figure legends (e.g. "The data underlying this Figure may be found in https://github.com/preprinting-a-pandemic/pandemic_preprints and https://zenodo.org/record/4501924 ").
As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.
We expect to receive your revised manuscript within two weeks.
To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:
- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list
- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable)
- a track-changes file indicating any changes that you have made to the manuscript.
NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:
https://journals.plos.org/plosbiology/s/supporting-information
*Published Peer Review History*
Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:
*Early Version*
Please note that an uncorrected proof of your manuscript will be published online ahead of the final version, unless you opted out when submitting your manuscript. If, for any reason, you do not want an earlier version of your manuscript published online, uncheck the box. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.
To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosbiology/s/submission-guidelines#loc-materials-and-methods
Please do not hesitate to contact me should you have any questions.
------------------------------------------------------------------------
DATA NOT SHOWN?
- Please note that per journal policy, we do not allow the mention of "data not shown", "personal communication", "manuscript in preparation" or other references to data that is not publicly available or contained within this manuscript. Please either remove mention of these data or add figures presenting the results and the data underlying the figure(s).
This revision by Fraser et al. is an interesting analysis and a valuable contribution to the field. They have addressed all of my major concerns, and the expansion of the dataset through October has enabled them to provide a much more comprehensive characterization of the relevant patterns. There are a few minor issues left:
MINOR NOTES:
I defer to the editor on whether this is a concern, but the current title, "Preprinting the pandemic," does not seem as descriptive as it could be. It reads as a feature headline but doesn't reveal any of the findings, nor does it describe the analysis in a useful way.
Line 218: The paper states, "Critics have previously raised concerns that by forgoing the traditional peer-review process, preprint servers could be flooded by poor-quality research." A citation would be helpful here.
Lines 245-253: A lot of space and effort is spent here explaining that downloads for individual preprints taper off over time—this would be a valuable visualization to add, perhaps as a panel in Figure 5. It would help evaluate whether the average "debut month" was shrinking over time, which could indicate popular interest in COVID is waning. An example of a "downloads in first month" figure is available as Figure 2, figure supplement 3(a) in Abdill & Blekhman 2019 [1]—the x-axis could be something like "Month posted," and the y-axis would be "Downloads in first month." Using a visualization such as a box-and-whisker plot could illustrate how many downloads were received in, say, March, for preprints posted in March, followed by downloads in April for preprints posted in April, and so on.
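A sketch of what such a panel could look like (the input table and its column names month_posted and downloads_first_month are hypothetical):

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("preprint_downloads.csv")
    ax = df.boxplot(column="downloads_first_month", by="month_posted")
    ax.set_xlabel("Month posted")
    ax.set_ylabel("Downloads in first month")
    plt.suptitle("")  # drop pandas' automatic grouping title
    plt.show()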
Figure 1: In panel C, it's difficult to see differences between the smaller servers, particularly because the number of segments means the colors are all very similar. It might be better to push the 7 smallest ones into the "other" bucket. This is not a critical issue and we leave it to the authors and editors whether to make this change.
Figure 3: In panel B, it's difficult to compare the two categories because of the dramatically different counts. The distributions (as in panel A) would be much more informative, particularly because the difference in distributions seems to be the most relevant result.
Figure 6: The content of Panel E is limited by the poor match between the data and the scale of the y-axis. While it's logical that all panels would have the same y-axis, the primary comparison for readers doesn't seem to be between panels, but within them. The authors might consider altering the y-axis of this panel to make the differences easier to see.
Several panels in figures 2, 4, 5 and 6 appear to use German abbreviations for month names, while others use English abbreviations. I'm not aware of specific language requirements from the journal, but consistency would be helpful.
References:
[1] Abdill & Blekhman 2019. https://doi.org/10.7554/eLife.45133
Fraser et al. present a series of good improvements for an already interesting manuscript.
I hope that, prior to publication, they can remove the last remnant of my prior main criticism - namely the reference to a "paradigm shift" contained in the abstract. The phrase "cultural shift", which they now use in the discussion section, is very appropriate and fitting, as their findings document an (important) shift in publication practices. In contrast, the original meaning of "paradigm shift" within studies of science refers to the way scientists probe phenomena. Kuhn emphasized this point in the foreword of later editions of The Structure of Scientific Revolutions, as he noted that his phrase "paradigm shift" had already come to be read in a more general and unintended manner by some. Similarly - at least for genes - research on COVID-19 appears to often use research patterns of the past (Stoeger and Amaral, eLife 2020). This would argue against a "paradigm shift" in its original sense.
Figure labels for months use German abbreviations.
While the authors have now greatly clarified points about the disproportional usage of preprint servers for COVID-19, this argument could possibly be extended in a half sentence beyond the share of COVID-19 within preprint servers relative to other pandemics, toward a comparison of the share of COVID-19 in preprint servers against its share in journals. Such an option would seem supported by an approximate estimate: there are around 100,000 COVID-19 publications according to LitCovid, and around 2.5 million papers are added to MEDLINE every year (thus the ~25% share of COVID-19 publications in preprint servers would exceed the ~4% anticipated for journals).
The reformulated text around lines 251 now seems to yield the originally intended message of the authors, which I had not noted in the initial review. As the analysis of this paragraph now stands, I would see their current suggestions as only one of two different options. The other option would be that COVID-19, but not non-COVID-19, preprints are subjected to an additional time-dependent factor (which one could suspect to be the changes of the number of publications available in the literature through journals). I believe that this point could be clarified by formulating the sentence again in a more general manner, or by doing an additional analysis (e.g. as the authors will have at least two time-stamped data queries, with one corresponding to the original submission, and the other corresponding to the update of the revision).
The authors have thoroughly addressed all of my previous comments.
Author response to Decision Letter 2
Submitted filename: Reviewer comments_Resubmission.docx
Decision Letter 3
On behalf of my colleagues and the Academic Editor, Ulrich Dirnagl, I'm pleased to say that we can in principle offer to publish your Research Article "The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape" in PLOS Biology, provided you address any remaining formatting and reporting issues. These will be detailed in an email that will follow this letter and that you will usually receive within 2-3 business days, during which time no action is required from you. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have made the required changes.
Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/ , click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.
PRESS: We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with [email protected] . If you have not yet opted out of the early version process, we ask that you notify us immediately of any press plans so that we may do so on your behalf.
We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/ .
Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Biology.
Sincerely,
Roland G Roberts, PhD
Senior Editor