Code is law: why software openness and algorithmic transparency are vital for privacy
This blog has written a number of times about the growing threat that low-cost, rapid DNA sequencing represents for privacy. The increased use of genetic material by the police to identify suspects poses particular problems. A recent case in the US involving a DNA sample raises a new issue. Because of its importance, both the EFF and ACLU have been actively involved. An EFF blog post explains the background:
The case of New Jersey v. Pickett involves complex DNA analysis using TrueAllele software. The software analyzed a DNA sample obtained by swabbing a weapon, a sample that likely contained the DNA of multiple people. It then asserted that it was likely that the defendant, Corey Pickett, had contributed DNA to that sample, implicating him in the crime.
That might look like a routine application of DNA matching in order to pinpoint an individual allegedly involved in a crime. But in this case, something interesting happened. The legal defense team wanted to analyze how the TrueAllele software had arrived at the conclusion that Pickett’s DNA was present in the sample. The reasoning was that without checking the underlying software code, it was impossible to know whether that implicit accusation was valid. However, both the prosecutors and the software vendor claimed this code was a trade secret. The vendor had a commercial interest in preventing competitors from understanding and copying its approach, and claimed that this outweighed the right of the accused to check the inner logic of the program. Fortunately, an appeals court in New Jersey agreed with the defendant:
As technology proliferates, so does its use in criminal prosecutions. Courts must endeavor to understand new technology – here, probabilistic genotyping – and allow the defense a meaningful opportunity to examine it. Without scrutinizing its software’s source code – a human-made set of instructions that may contain bugs, glitches, and defects – in the context of an adversarial system, no finding that it properly implements the underlying science could realistically be made. Consequently, affording meaningful examination of the source code, which compels the critical independent analysis necessary for a judge to make a threshold determination as to reliability at a Frye hearing, is imperative.
The “Frye hearing” refers to the process of determining whether scientific evidence is admissible. In this case, the issue is whether the TrueAllele software provides valid evidence. The judge ruled that defense expert witnesses need access to the software code in order to offer their opinions on the matter.
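To see why access to the source matters, it helps to understand roughly what probabilistic genotyping does: it compares how likely the observed DNA mixture is under competing hypotheses about who contributed to it. TrueAllele's actual model is proprietary, which is the whole point of the dispute, but the general idea can be sketched in a few lines of Python. The allele frequencies, the single-locus model and the no-dropout assumption below are all illustrative simplifications, not a description of the real software:

```python
from itertools import combinations_with_replacement

# Illustrative allele frequencies at a single hypothetical locus.
# Real systems model many loci, peak heights, allele dropout and more.
FREQS = {"A": 0.1, "B": 0.3, "C": 0.4, "D": 0.2}

def genotype_prob(gt):
    """Hardy-Weinberg probability of an unordered genotype (pair of alleles)."""
    a, b = gt
    return FREQS[a] ** 2 if a == b else 2 * FREQS[a] * FREQS[b]

def mixture_likelihood(observed, fixed_genotypes, n_unknown):
    """P(observed alleles | contributors), summing over unknown contributors.

    Simplified model: every contributor allele is detected (no dropout),
    so the observed set must be exactly the union of contributor alleles.
    """
    genotypes = list(combinations_with_replacement(FREQS, 2))

    def recurse(chosen, remaining):
        if remaining == 0:
            alleles = {a for gt in fixed_genotypes + chosen for a in gt}
            return 1.0 if alleles == observed else 0.0
        return sum(genotype_prob(gt) * recurse(chosen + [gt], remaining - 1)
                   for gt in genotypes)

    return recurse([], n_unknown)

observed = {"A", "B", "C"}   # alleles seen in the crime-scene sample
suspect = ("A", "B")         # the suspect's genotype

# Hp: suspect plus one unknown contributor; Hd: two unknown contributors.
lr = (mixture_likelihood(observed, [suspect], 1)
      / mixture_likelihood(observed, [], 2))
print(f"Likelihood ratio: {lr:.1f}")  # prints "Likelihood ratio: 4.2"
```

Even in this toy version, the reported likelihood ratio depends entirely on modelling choices buried in the code: which alleles can drop out, how frequencies are estimated, how many contributors are posited. A bug or a questionable assumption in any of those steps changes the number presented to the court, which is precisely why the defense wanted to examine the real thing.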
For readers of this blog, this will be an obvious conclusion. After all, it was over 20 years ago that Lawrence Lessig pointed out in his seminal book, Code and Other Laws of Cyberspace, that “code is law”. That is, the details of how software is coded effectively create laws of their own, separate from traditional ones. Without access to the underlying code of the TrueAllele program, it is hard, or even impossible, to establish the assumptions – and errors – that have been written into it. And yet those assumptions and errors may play a crucial role when programs are used to identify suspects, potentially accusing the innocent and absolving the guilty.
In the current case, the code will not be publicly disclosed, but made available to the defense team only. Arguably any code that has such important consequences for people’s lives should be publicly accessible to allow the fullest possible scrutiny. That’s true not just for specialised programs analyzing genetic material, but also for the important new class of systems that involve automated decision making (ADM). These typically use some form of artificial intelligence, for example machine learning. As a major new report on the area from Algorithm Watch underlines, the biggest problem with such systems is the fact that it is extremely hard to scrutinize their inner workings:
The message for policy-makers couldn’t be clearer. If we truly want to make the most of their potential, while at the same time respecting human rights and democracy, the time to step up, make those systems transparent, and put ADM wrongs right, is now.
The report makes a number of policy recommendations designed to increase the transparency of ADM systems. Although they apply specifically to the EU legal context, there are general points that are relevant globally. For example, Algorithm Watch calls for a public register of ADM systems used by the public sector. It wants a legal obligation to explain the purpose of the system, its underlying logic, and information about who developed it. Since many of these ADM systems rely on training data, Algorithm Watch suggests that this should be made available to researchers, journalists and civil society organizations so that they can check that it contains no hidden biases that might skew the results. It would also like to see a more thoroughgoing approach to auditing ADM systems, but admits that it doesn’t have any easy answers to the question of how that should be done.
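As a concrete illustration of the kind of scrutiny that access to training data makes possible, here is a minimal sketch of one very simple bias probe: comparing the rate of positive outcomes across groups in a training set. The records, group names and metric are entirely hypothetical, and real audits would use far more sophisticated fairness measures, but the point is that none of this is possible while the data stays locked away:

```python
# Hypothetical training records as (group, label) pairs -- purely
# illustrative; a real audit would use the actual released data.
records = [
    ("group_x", 1), ("group_x", 1), ("group_x", 0), ("group_x", 1),
    ("group_y", 0), ("group_y", 0), ("group_y", 1), ("group_y", 0),
]

def positive_rate(records, group):
    """Share of positive labels among the records for one group."""
    labels = [label for g, label in records if g == group]
    return sum(labels) / len(labels)

groups = sorted({g for g, _ in records})
rates = {g: positive_rate(records, g) for g in groups}

# A large gap between groups is a red flag worth investigating,
# though on its own it does not prove the data is biased.
disparity = max(rates.values()) - min(rates.values())
print(rates, f"disparity={disparity:.2f}")
```

This only surfaces one narrow kind of imbalance, but it shows why Algorithm Watch wants researchers, journalists and civil society to be able to run such checks themselves rather than take a vendor's word for it.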
As ADM systems are increasingly applied to personal data, with inevitable implications for privacy, the importance of transparency will increase. Opening up the black boxes of ADM is not going to be easy, but the case of the TrueAllele software shows why it is something that must be done sooner or later. Thinking about the issues now will make it easier to come up with solutions as the need grows more urgent.
Featured image by JJ Jones.