China is quietly building a national voiceprint database to allow automated speaker recognition

Updated on Jul 27, 2020 by Glyn Moody

It’s hardly a secret that China conducts massive surveillance of all kinds, as Privacy News Online has reported many times. And yet it seems that the authorities there are still coming up with new ways to check on their 1.4 billion citizens. For example, Human Rights Watch (HRW) has just written a fascinating description of moves to extend surveillance to include voiceprint recognition.

The Chinese voiceprint project has been underway for some years, and the HRW post pulls together the scant evidence of what has been happening. For example, in 2012, China’s Ministry of Public Security announced that construction of a national voice pattern database had begun, and designated Anhui province, in the east of the country, as one of the areas where pilot schemes would be run. Anhui’s leading role in the project is confirmed by subsequent orders issued by the provincial police bureau to accelerate construction of the database, and by tender documents from other police stations across the region seeking bids to install voice pattern collection systems locally.

Similar purchases were made in 2016 by the police bureaus in Xinjiang, a vast region with around 10 million ethnic minority Uyghurs, following a “Notice to Fully Carry Out the Construction of Three-Dimensional Portraits, Voice Pattern, and DNA Fingerprint Biometrics Collection System”. A local police station reported that front-line officers are given monthly quotas for biometric collection.

Along with Tibet, the Turkic-speaking Xinjiang is one of the most sensitive regions for China, so the roll-out of voice pattern surveillance there is particularly noteworthy. The HRW report lists other ways in which the local Uyghur population are now routinely required to provide voiceprints to the authorities, including as part of passport applications.

In April of this year, the state-owned flagship news site “The Paper” provided a glimpse of how the systems were being used in Anhui. It described how police there were running an automated speaker recognition system to monitor phone conversations in real time, picking out the targeted voice patterns of known scammers and sending an alert:

“A woman in Huainan, Anhui, received a scam call … just as the scammer was instructing her, step-by-step, how to transfer her money … the voice pattern recognition system, recognizing the scammers’ voice patterns, alerted the police; the police then directly cut off the phone conversation.”

That anecdote suggests that the voice pattern recognition system is being applied to all phone calls in the region, since there is no indication that the woman in question had been picked out for special monitoring. That’s significant, but what happened next is also telling. China Digital Times reports that the authorities issued a censorship order to local media, instructing them to “Find and delete The Paper’s Article ‘Voiceprint Analysis Can Recognize Swindlers: Causes 80% Drop in Fake Legal Case Phone Scams in Anhui’; do not hype related technical content.” The hurried take-down of an official story suggests that, despite the large scale of the project, the Chinese authorities are trying to roll out the new surveillance system without the public becoming aware of its far-reaching capabilities.

The same China Digital Times post offers further information about the voice pattern recognition technology and the company behind it:

“The technology involved is reported to have been developed by the Intelligent Speech Technology Public Security Key Laboratory, established in 2012 by Anhui public security authorities and the University of Science and Technology of China’s Xunfei Information Technology, also known as iFlyTek. Its new Telephone Fraud Monitoring and Interception Platform can identify known scammers based on the voiceprint created by their unique biometric and behavioral characteristics. iFlyTek chairman Liu Qingfeng has claimed that the system, now integrated with local phone networks, reduced phone scams involving fake legal cases by 80% in Anhui in 2015, even as they rose by almost 70% nationally. The company claims that its voiceprint identification is more than 95% accurate, and besides fighting phone fraud can offer an extra layer of security in contexts like credit cards, remote stock trading, and social security.”
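Neither report explains how the matching works internally. As a rough illustration only, watchlist-style speaker identification is commonly described as comparing a numeric “voiceprint” vector taken from a live call with vectors enrolled earlier, and raising an alert when the similarity crosses a threshold. The Python sketch below assumes that general approach and is not iFlyTek’s method; the function names, the toy three-number vectors and the 0.9 threshold are all hypothetical, and the step that turns audio into a vector is not shown.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no speaker information
    return dot / (norm_a * norm_b)


def match_watchlist(
    call_vector: Sequence[float],
    watchlist: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off, chosen for illustration
) -> Optional[Tuple[str, float]]:
    """Return (speaker_id, score) for the best match at or above the
    threshold, or None if no enrolled voiceprint is similar enough."""
    best: Optional[Tuple[str, float]] = None
    for speaker_id, enrolled in watchlist.items():
        score = cosine_similarity(call_vector, enrolled)
        if score >= threshold and (best is None or score > best[1]):
            best = (speaker_id, score)
    return best


# Toy data: real voiceprints would be much longer vectors produced by a
# trained model from audio; these values are invented.
watchlist = {
    "enrolled-speaker-A": [0.9, 0.1, 0.3],
    "enrolled-speaker-B": [0.1, 0.8, 0.5],
}

hit = match_watchlist([0.88, 0.12, 0.31], watchlist)   # close to speaker A
miss = match_watchlist([0.5, 0.5, 0.5], watchlist)     # close to nobody

print("alert:", hit)
print("no alert:", miss)
```

The threshold is where the privacy trade-off sits in a design like this: lowering it catches more enrolled voices but also flags more callers who merely sound similar.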

iFlyTek’s Web site makes the following points about its product:

“Voiceprint recognition delivers high security performance comparable to other biological recognition technologies (such as fingerprint, palm print, and iris identification). In addition, the technology requires only a telephone set and a microphone, and does not need any special equipment. The data collection is convenient. Therefore, it is the simplest, securest, and most reliable and cost-effective identity recognition method. The speaker’s voice can be recognized securely at any time based on the unique voiceprint. It is the only contactless biological recognition technology which can be used for remote control over a telephone channel.”

That “contactless” power to identify a speaker without them being aware becomes particularly worrying when you combine it with the fact that iFlyTek’s technology is ubiquitous in China. According to an article in Technology Review, iFlyTek’s developer product, called iFlytek Open Platform, provides “voice-based AI technologies to over 400,000 developers in various industries such as smart home and mobile Internet.” Over 500 million people have iFlytek Input installed on their smartphones. It’s an AI-based program that can translate and act on spoken input. iFlyTek’s technology is found widely in a range of environments:

“Court systems use its voice-recognition technology to transcribe lengthy proceedings; business call centers use its voice synthesis technology to generate automated replies; and Didi, a popular Chinese ride-hailing app, also uses iFlytek’s technology to broadcast orders to drivers.”

Like many modern AI-based systems, iFlyTek’s voice recognition technology draws on real-world data to improve its performance. According to the company: “The total number of users of the iFLYTEK voice cloud exceeds 0.89 billion, and the average number of daily interactions exceeding 3 billion.” In other words, the company’s AI system is constantly listening to what all its users are saying. Couple that with the voiceprint product, and the company potentially knows not only what is being said, but also who is saying it – the perfect, automated and scalable phone surveillance system for a government.

As the HRW post points out, other countries are already using voiceprint technologies, even if they lag behind China in terms of scale:

“Other governments have used automated speech recognition programs, including the United States for monitoring prison calls and Australia for verifying callers accessing social services; the Spanish police have more than 3,500 voice samples from people convicted of crimes.”

As the technology advances, and hardware costs continue to fall, we can expect to see more Western governments experimenting with voiceprint recognition for monitoring phone calls. The usual justifications will be trotted out – that this additional surveillance is needed to combat terrorism or organized crime. But once in place, voiceprint databases will doubtless be applied much more widely, just as other surveillance technologies have spread beyond their original narrow uses. And as voiceprints start to threaten privacy, there will presumably be technical solutions to mitigate their effect, followed by government pushback against their availability. But perhaps the real problem is not so much this uptake by the authorities, but the public’s naïve enthusiasm for new technologies – whether it’s Facebook or AI-based voice recognition – which they embrace without much thought about the possible privacy downsides.

Featured image by Henri Adolphe Laissement/Hampel Kunstauktionen.