Privacy and population-wide whole-genome sequencing in the age of Google
Back in May, Privacy News Online noted that the legal basis of a data-sharing agreement between Google’s AI subsidiary DeepMind and the UK’s National Health Service (NHS) was suspected to be “inappropriate”. That rather vague judgement has now been clarified, with a ruling from the UK’s independent body created to uphold information rights, the Information Commissioner’s Office (ICO), that the hospital supplying the data had failed to comply with the UK’s Data Protection Act when it provided patient details to DeepMind.
Specifically, the ICO found that patients were not adequately informed that their records would be processed for the purpose of clinical safety testing, and it was unconvinced that analyzing 1.6 million patient records was “necessary and proportionate.” Despite the clear data processing breach, the NHS hospital and DeepMind were let off with little more than a slap on the wrist: they were simply required to comply with the current UK law in the future, and also to carry out a privacy impact assessment. DeepMind’s response raises a number of interesting points, which apply more generally to projects involving highly personal patient data:
“In our determination to achieve quick impact when this work started in 2015, we underestimated the complexity of the NHS and of the rules around patient data, as well as the potential fears about a well-known tech company working in health. We were almost exclusively focused on building tools that nurses and doctors wanted, and thought of our work as technology for clinicians rather than something that needed to be accountable to and shaped by patients, the public and the NHS as a whole. We got that wrong, and we need to do better.”
Those concerns are even more pertinent in the light of a major announcement this week that the UK’s NHS intends to make sequencing a patient’s complete DNA a routine part of its treatment. This will extend existing UK research with the self-explanatory name of the “100,000 Genomes Project.” As well as allowing individual problems to be spotted and dealt with, another key driver for the latest move that will see millions of genomes fully sequenced is to allow big data analysis:
“[England’s chief medical officer, Sally Davies,] says that individual patients have everything to gain from the pooling of data which allows scientists to compare hundreds of thousands of genomes, to find out why some have small mutations or errors in the code that lead to illness. She talked of a new “social contract”, in which the public recognises that they and everybody else will benefit if they allow data about their own genome to be studied.”
However, the main 256-page report acknowledges that there are new privacy issues that will need to be addressed:
“The success of genomic medicine will depend on patients having confidence that the way genomic information is generated, held and used will properly protect their interests. This requires re-examining the traditional rules around confidentiality, which focused on secrecy and the keeping of information as separate and private. Such a rigid separation cannot operate in genomics, which requires clinicians to consider the patient’s specific genetic situation in comparison with knowledge gleaned from others.”
There is a fundamental problem here. To realize the full promise of large stores of genomic information, DNA needs to be pooled so that complex comparative analyses can be conducted. The dataset must be as complete as possible, since it is not known in advance where the important differences are to be found – the danger is that they could be lurking in any portion not included. The only way to find out is to look at everything. But the full set of 3 billion or so DNA “letters” that make up our genome also specifies us uniquely. It is not possible to strip out our identity from such full genome DNA data, because the digital data itself is effectively a very large reference number.
The inherently personal nature of the genome makes sharing it highly problematic, since traditional assurances that sensitive data will be anonymized cannot be given. That being the case, there are (at least) two key issues.
The first is which organizations or companies will be given access to the data, and what they will do with it. DeepMind mentioned this aspect in their blog post on the ICO’s ruling against them, where the Google subsidiary noted “the potential fears about a well-known tech company working in health.” That’s particularly the case when something as personal and revealing as DNA is concerned. After all, your DNA is not just about you: it contains within it important details about your parents and more distant ancestry, as well as about other near relations. So-called “familial searches” of DNA have been routine for many years. Given DeepMind’s awareness of the problem, it is rather ironic that just this week we learn that Google has already met with Genomics England – the company set up to run the 100,000 Genomes Project – to discuss whether DeepMind could become involved with the analysis of the DNA.
The other issue raised by the fact that whole genome sequences cannot be anonymized is security. DNA is digital data, encoded in base 4 – the four chemical “letters” A, C, G and T – rather than base 2, using 0s and 1s. That makes it extremely easy to store – and extremely easy to copy. According to the relevant FAQ: “Genomics England will be using industry-standard tools and techniques to prevent unauthorised access.” That’s hardly comforting, since “industry-standard tools” routinely fail to protect sensitive data. Indeed, earlier this week it was revealed that personal health data of any Australian citizen, supposedly held securely, was available for sale online. It would be naive to expect that, once gathered, genomic information will not also be sold in the same way.
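To make the “base 4” point concrete, here is a minimal Python sketch of how a DNA sequence could be packed at two bits per letter. It is an illustration only: the function names (`pack`, `unpack`, `packed_size_bytes`) are invented for this example, the bit assignment for each letter is an arbitrary assumption, and it does not describe how Genomics England or any real sequencing pipeline actually stores data.

```python
# Illustrative sketch only: names and bit assignments are assumptions made
# for this example, not a description of any real genomic file format.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}  # each letter is one base-4 digit
BITS_TO_BASE = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four letters (2 bits each) per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so unpacking stays uniform.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` letters from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 3])
    return "".join(bases[:length])


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed for n_bases letters at 2 bits each, rounded up."""
    return (n_bases + 3) // 4


if __name__ == "__main__":
    packed = pack("GATTACA")
    print(unpack(packed, 7))
    print(packed_size_bytes(3_000_000_000))
```

Under these assumptions, the 3 billion or so letters of a single genome fit in about 750 million bytes, small enough to copy to an ordinary USB stick, which is exactly why the security of such data matters so much.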
Moreover, criminals gaining access is not the only problem here. Once these vast stores of genomic data exist, governments’ black ops teams will doubtless be keen to use them for nefarious purposes like blackmail. That’s much easier when you have someone’s complete DNA readout. Imagine, for example, being able to threaten to leak before a key election that a leading politician’s genome showed a predisposition to certain mental disorders.
In its FAQ, Genomics England insists: “No data held by Genomics England will be accessible to other government agencies.” But it also goes on to qualify that with “In the unusual situation that a request for data is made by a court order then this will be referred to Genomics England’s Legal Counsel as promptly as possible so that all representations may be made to the court, for example, to limit the information requested being released.” Once governments start invoking “national security” or “terrorism”, as they do so frequently elsewhere, you can be sure that they will gain access to the DNA databases without any problems.
This is not to argue against population-wide whole-genome sequencing, or to suggest that health institutions should never work with companies like Google. But the fact that this week alone saw the announcement of the UK’s major genomics project, the news that DeepMind is already talking about getting involved in the analysis of data that will be generated, and the discovery that the sensitive medical details of any Australian citizen can be bought online, underlines the extremely rapid pace of developments in this sector. It also makes clear the pressing need for an informed public debate about mass genome sequencing and privacy in the age of Google.
Featured image by Michael Ströck.