Open source, open science: the coronavirus crisis is when openness comes into its own

Posted on Apr 15, 2020 by Glyn Moody

Open source figures frequently on this blog. That’s in part because Private Internet Access is a long-time supporter of free software, and is in the process of open-sourcing its own software. But more generally, privacy is deeply bound up with open source, for reasons a recent post explained. The importance of open source in the context of privacy is underlined by developments in the fast-moving world of the coronavirus pandemic. Many governments want to use smartphone apps to help trace people who have been in close proximity to those infected with Covid-19. That’s a laudable intention, but privacy organizations are rightly worried that this new form of surveillance might become a permanent addition to the authorities’ toolkit for controlling citizens. Hence a new emphasis on building privacy safeguards into such tracing apps from the start. One essential prerequisite for that is releasing the code as open source. Closed-source software cannot be trusted, which means that “black box” tracing apps need to be viewed with suspicion and avoided.

It’s not just for tracing apps that people are realizing the benefits of openness. With a completely new virus infecting millions, speed is of the essence, and so researchers around the world are radically changing how they work, and embracing open science. This emerging movement promotes the rapid, free and public posting of scientific findings, in contrast to traditional ways of disseminating information. As early as 31 January, around a hundred organisations working in fields related to Covid-19 called for researchers, journals and funders to ensure that research findings and data relevant to this outbreak are shared rapidly and openly to inform the public health response and help save lives. It’s already starting to happen. As a post on the Fred Hutchinson Cancer Research Center Web site notes:

Scientists throughout the world are publicly sharing brief clinical reports and gene sequences of the novel coronavirus, bypassing the careful curation and peer review that still dominates the distribution of most scientific information.

In practice, that means that virus genomes – the subtly varying genetic code of the virus samples collected around the world – are being released after just a few days, rather than hoarded for personal use by the researchers who obtain them. Sequences are posted to GISAID, which has become an important open resource for coronavirus data. As the GISAID site explains:

Prior to the birth of the GISAID Initiative, many scientists hesitated to share influenza data through traditional public-domain data archives, in part due to their legitimate concern about being scooped, a term frequently used when peers using data, are able to publish scholarly articles more quickly than they themselves are able to. In some cases, their scientific contributions would also fail to be properly acknowledged, or recognized.

This, in its turn, allows the real-time tracking of viral evolution thanks to sites like Nextstrain, “an open-source project to harness the scientific and public health potential of pathogen genome data”.

As well as core genomic data, scientists are sharing their Covid-19 experiences, results and analyses in the form of papers. Because traditional academic publishing is too slow, typically taking weeks or even months from submission to publication, scientists are turning to a different form of open publishing: preprints. Rather than undergoing peer review before publishing, as with traditional journals, preprints are subject to a kind of crowdsourced peer review after they have appeared. By their nature they are works in progress, rather than definitive results. Although they have been used for decades, there’s no doubt that the coronavirus pandemic has given an extra impetus to their adoption, since they allow every kind of result and experience to be shared quickly and globally. Two of the most important sites for preprints dealing with Covid-19 are medRxiv and bioRxiv. At the time of writing, they hosted over 1500 papers on the new coronavirus, all freely available. The sites were inspired by the first preprint server, arXiv, set up in 1991, which has had close links with free software since its earliest days.

The unprecedented availability of huge numbers of related research papers has led the Allen Institute for AI to create the Covid-19 Open Research Dataset, a free resource of over 51,000 scholarly articles, including over 40,000 with full text, about Covid-19 and the coronavirus family of viruses for use by the global research community. The hope is that researchers can apply the latest natural language processing techniques to glean new information that will help people working on coronavirus topics. That wouldn’t be possible if the articles were not made freely available for this purpose.

The call for openness is now being taken up by global organizations such as UNESCO, which held a meeting with 122 nations to promote the use of open science. The Director-General of the World Health Organization, Dr Tedros Adhanom Ghebreyesus, went even further, writing on Twitter:

I call on all countries, companies and research institutions to support open data, open science and open collaboration so that all people can enjoy the benefits of science and research

In another sign of a move towards openness, an international coalition of scientists and lawyers has called on organizations to make their patents freely available for the fight against Covid-19 by signing the Open COVID Pledge, and using a new Open COVID license.

It’s great news that so many leading organizations and companies are waking up to the power of openness, and promoting open source, open data and open science. They know that doing so will speed up the process of discovering ways to tackle the global pandemic, which will, in turn, save thousands of lives. But given that’s the case, the question has to be: why isn’t openness the norm that applies all the time, rather than the exception that is only adopted in times of emergency?

Featured image by Beko.