Reproducible Builds – Solving an Old Open Source Problem to Improve Security

Updated on Jan 30, 2024 by Derek Zimmer

One of the billion-dollar problems in the world of computers is getting software to behave reliably. Fundamentally, when software misbehaves, the underlying flaws can hurt performance, reliability, and security.

One of the best things about computers is that they do exactly what you tell them to do. One of the worst things about computers is that they do exactly what you tell them to do.

The most common issue is an error in the code: the programmer makes a typo or misunderstands some function and creates a bug. There's an arsenal of products, programs, and development environments designed to minimize bugs.

But there are also a number of problems with compiling code, that is, taking code written in a language humans understand (source code) and translating it into a language computers understand (compiled code, or machine code).

Inconsistent Compiling Environments Lead to Inconsistent Behavior

When a programmer finishes an application, or a user downloads source code to compile for their own computer, they could be using any one of many different compilers, each with its own versions and updates, just like any other application.

The difference is that with compilers, different compilers (or different versions and settings of the same compiler) can produce different machine code, even from identical source code. These inconsistencies are one of the many sources of unreliable software, because each differently compiled application can behave differently and carry its own introduced problems. An app that behaves perfectly when compiled one way may not work at all when built with another compiler.

Even worse, because of all the different combinations of source code, hardware, compiler versions, settings, and other environmental factors, it becomes extremely hard to determine whether your software is fundamentally secure.
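Compiler choice is not the only environmental factor. Even a single toolchain can emit different bytes on every run if the build stamps something from its environment, such as the current date, into the output. The toy Python sketch below makes that concrete; toy_build is a hypothetical stand-in for a real build step, and the timestamps are made up. Identical source, built at two different times, produces two different SHA-256 fingerprints, so nobody can tell from the hashes alone whether the two binaries are otherwise the same.

```python
import hashlib

def toy_build(source: str, build_time: str) -> bytes:
    """Hypothetical build step: 'compiles' the source by encoding it, then
    stamps the build time into the output, as many real toolchains do."""
    header = f"built-at: {build_time}\n".encode()
    return header + source.encode()

def digest(artifact: bytes) -> str:
    """SHA-256 fingerprint of a build artifact."""
    return hashlib.sha256(artifact).hexdigest()

source = 'print("hello, world")\n'

# Same source, same "compiler", built at two different moments.
monday = toy_build(source, "2024-01-29T09:00:00Z")
tuesday = toy_build(source, "2024-01-30T09:00:00Z")

print(digest(monday) == digest(tuesday))  # False: the embedded timestamp alone changes the hash
```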

Is This Compiled Code Compromised?

Another problem introduced by inconsistent builds is that it is impossible to tell whether the software coming out of your compiler is exactly what the author intended. A malicious compiler could create machine code with intentional vulnerabilities, enabling surveillance of systems or outright takeover of a secure system. The malicious compiler problem has been discussed since the 1980s. In his 1984 lecture "Reflections on Trusting Trust," Ken Thompson famously described a proof-of-concept compiler hack: the compromised compiler inserted a backdoor into the programs it compiled and re-inserted itself into every new version of the compiler it built, even from clean compiler source code, making it virtually impossible to detect.
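Thompson's actual attack targeted C compiler binaries and had two parts: a backdoor inserted into the login program, and logic that re-inserted itself whenever the compiler compiled a new version of itself. The Python toy below mimics only the first part, under heavy simplification: the "compiler" is just a source-to-source function, and check_password, honest_compile, malicious_compile, and the "letmein" master password are all invented for illustration. The point it shows is that a program's source can be perfectly clean while its compiled output is not.

```python
# The program's source: an ordinary, honest password check.
CLEAN_SOURCE = '''
def check_password(user, password, db):
    return db.get(user) == password
'''

def honest_compile(source: str) -> str:
    """Stand-in for a trustworthy compiler: output faithfully reflects input."""
    return source

def malicious_compile(source: str) -> str:
    """Stand-in for a compromised compiler. When it spots the password check,
    it rewrites the output so a hard-coded master password always works."""
    target = "    return db.get(user) == password"
    if target in source:
        backdoor = '    if password == "letmein":\n        return True\n'
        source = source.replace(target, backdoor + target)
    return source

def load(compiled: str):
    """'Run' the compiled output by executing it and returning check_password."""
    namespace = {}
    exec(compiled, namespace)
    return namespace["check_password"]

users = {"alice": "correct horse battery staple"}
good = load(honest_compile(CLEAN_SOURCE))
bad = load(malicious_compile(CLEAN_SOURCE))

print(good("alice", "letmein", users))  # False
print(bad("alice", "letmein", users))   # True: backdoor present, source unchanged
```

Reading CLEAN_SOURCE would never reveal the problem; only the compiled output differs.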

The most interesting part of Thompson's proof-of-concept is his point that the attack doesn't have to live in a compiler at all. Other pieces of machine code, like bootloaders and the most fundamental software and firmware on a computer, could be compromised to introduce backdoors system-wide and be extremely hard to detect.

This is why there are entire movements (coreboot and Libreboot) dedicated to moving all of these fundamental pieces to open-source solutions. Doing so removes some of the most likely places for something like this to hide.

What Can We Do About This?

The Debian Project is taking an interesting approach to this. As part of its reproducible builds work, Debian records special files (called .buildinfo files) that describe exactly what environment, compiler, and settings were used to build each package in its Linux distribution. Anyone can then recreate that same environment and rebuild the package. Because the environment and compiler now match what the developer used, you can actually check whether the compiled code is the same. This means we can take the gibberish (machine code) that humans can't read and verify, byte for byte, that it matches what is expected.
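Debian's real .buildinfo files have their own, richer format; the Python sketch below uses an invented, much simpler record just to show the idea. It assumes the record lists the compiler and flags used for the official build plus the SHA-256 of the published binary. A verifier rebuilds with those settings and compares hashes. BuildRecord, its field names, fake_rebuild, and the "gcc 12.2.0" string are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class BuildRecord:
    """Hypothetical, much-simplified stand-in for a build-environment record."""
    package: str
    compiler: str               # toolchain name and version (illustrative)
    flags: tuple[str, ...]      # compiler settings used for the official build
    artifact_sha256: str        # hash of the binary the project published

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify(record: BuildRecord,
           rebuild: Callable[[str, tuple[str, ...]], bytes]) -> bool:
    """Rebuild with the recorded compiler and flags, then check the result is
    bit-for-bit identical to what was published."""
    artifact = rebuild(record.compiler, record.flags)
    return sha256(artifact) == record.artifact_sha256

def fake_rebuild(compiler: str, flags: tuple[str, ...]) -> bytes:
    """Toy deterministic 'build': output depends only on its inputs."""
    return f"{compiler}|{' '.join(flags)}|machine code".encode()

published = fake_rebuild("gcc 12.2.0", ("-O2",))
record = BuildRecord("hello", "gcc 12.2.0", ("-O2",), sha256(published))
print(verify(record, fake_rebuild))  # True: the rebuild matches the published binary

# A published binary that was altered after the official build no longer matches.
altered = BuildRecord("hello", "gcc 12.2.0", ("-O2",), sha256(published + b"\x90"))
print(verify(altered, fake_rebuild))  # False
```

The check only works because fake_rebuild is deterministic; any hidden input, such as a build timestamp, would make honest rebuilds fail too.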

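This addresses the first big problem, inconsistent software created by inconsistent environments, which will lead to greater reliability and safer software. It also helps with the second: if several independent parties build the same source and get identical output, a compromised compiler or build machine belonging to any one of them would stand out.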
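One way to picture the payoff is several independent parties rebuilding the same source and publishing their hashes. The Python sketch below (builder names and build contents are made up) groups the reported hashes and flags any builder whose result disagrees with the majority. It illustrates only the comparison step. If every builder shared the same compromised compiler, they would all agree, so independent rebuilds complement, rather than replace, trust in the toolchain.

```python
import hashlib
from collections import Counter

def find_outliers(reports: dict[str, str]) -> list[str]:
    """Given builder -> SHA-256 of their build, return the builders whose
    hash differs from the most commonly reported one."""
    if not reports:
        return []
    majority_hash, _ = Counter(reports.values()).most_common(1)[0]
    return sorted(name for name, h in reports.items() if h != majority_hash)

clean = hashlib.sha256(b"clean build").hexdigest()
tampered = hashlib.sha256(b"build with backdoor").hexdigest()

reports = {
    "builder-a": clean,
    "builder-b": clean,
    "builder-c": tampered,  # e.g. a compromised build machine
    "builder-d": clean,
}
print(find_outliers(reports))  # ['builder-c']
```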

Ask your favorite open-source projects whether they plan to switch to reproducible builds. It adds another layer of safety and reliability to software and lets you check that the software on your own devices hasn't been tampered with.

For programmers: if you want to go reproducible, there's a ton of information at reproducible-builds.org.
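As a small taste of what reproducible-builds.org covers: one of its conventions is honoring the SOURCE_DATE_EPOCH environment variable (a Unix timestamp) instead of the current time, along with removing other incidental variation such as file ordering and ownership. The Python sketch below applies those ideas to packaging files into a tar archive. The build_timestamp and deterministic_tar functions and the example epoch value are our own illustration, not a prescribed implementation.

```python
import io
import os
import tarfile
import time

def build_timestamp() -> int:
    """Use SOURCE_DATE_EPOCH if it is set; otherwise fall back to the current time."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else int(time.time())

def deterministic_tar(files: dict[str, bytes], mtime: int) -> bytes:
    """Pack files into an uncompressed tar with sorted names, a fixed
    timestamp, and neutral ownership, so identical inputs give identical bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):          # fixed order, not dict/filesystem order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime              # no "time of build" leaks in
            info.mode = 0o644
            info.uid = info.gid = 0         # no builder's numeric user IDs
            info.uname = info.gname = ""    # no builder's user names
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()

os.environ["SOURCE_DATE_EPOCH"] = "1706572800"  # example value for this demo only
epoch = build_timestamp()

first = deterministic_tar({"b.txt": b"two", "a.txt": b"one"}, epoch)
second = deterministic_tar({"a.txt": b"one", "b.txt": b"two"}, epoch)
print(first == second)  # True: input order no longer affects the archive
```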