The first computing era was based on the keyboard. Mainframes, minicomputers, personal computers – they were all controlled using fingers typing out commands. Later on, the graphical front end of the Macintosh and Windows allowed people to point and click, but the keyboard was still there for text to be entered at some point.
The big shift away from that keyboard-based world – with a little graphical user interface (GUI) decoration on top – came with smartphones, where a mouse-based GUI is not an option. Instead, the primary means of entering data and controlling the operation of the device is through the touchscreen. Direct entry of text is minimized, since it is a clumsy operation on a small screen. It’s no accident that Twitter and Instagram are among the most popular uses of smartphones – neither requires much writing.
The next shift beyond touchscreen control is already underway. People are starting to use voice commands on their phones for some operations. But the smartphone is not the best device for voice commands. We often use our phones in public, where background noise may be a problem, and lack of privacy certainly is. The natural domain for voice commands is the home, which is usually quiet, and generally private.
The main platform for voice-based systems is the so-called “smart” speaker, for example, Amazon’s Echo, or Google’s Home. It might seem an exaggeration to call these devices a platform, since their sales are still relatively low, and their capabilities are quite limited. But that’s only because the West is not in the vanguard here: China is leading the way. Smart speakers are taking off rapidly there, for a number of reasons. For example, they sidestep the issue of how best to input Chinese characters – certainly possible, but less convenient than inputting Western letters. They are also good first devices for older users who may not have computers or the keyboard skills to use them – a huge potential market in China. Another crucial factor is that privacy issues arising from products that eavesdrop on everything we say in our homes are less to the fore in a country where the government has built the world’s most complete surveillance society.
As an article in the South China Morning Post explains, the main high-tech companies in China – Alibaba, Baidu and Tencent – are pouring money into this sector, which they see as the next big battlefield in the digital world. So great is the desire to build market share quickly that some devices are being sold for as little as $15 each. Companies are willing to offer models at these knock-down prices because what they want for tomorrow is more important than a few dollars more today. High-volume sales of smart speakers will give them huge quantities of voice data for training and improving their back-end AI systems, and a major share of the new platform that is already making money in China:
Baidu is counting on smart speakers to generate income from paid content and co-branding efforts with online merchants. “There are over 30,000 paid items accessible via Xiaodu speakers, like Meituan food delivery and a kids story app,” Baidu’s Jing said.
Previous posts on this blog have already noted that smart speakers pose serious threats to privacy. If, as seems likely, they become one of the main ways that people control and interact with technology, a key question becomes: how can we ensure that privacy is not jettisoned in the process?
At the heart of the problem with smart speakers is that they depend on huge cloud-based computing facilities to analyze and respond to spoken commands. This means users have to trust what companies do with all the highly personal information sent from the speaker unit, because there is no way to check what is happening, or to control it. Open source software has obvious advantages here, since in general it allows people – well, experts – to inspect the code running on a computer and to check, for example, how personal data is being used. But that approach doesn’t work well in the cloud. If modifications are made to a free software program running in the cloud, but those changes are not distributed, there is no requirement to make the modifications available for scrutiny. The special Affero General Public License (AGPL) was created to address this loophole in other free software licenses. Unfortunately, the AGPL has not been widely adopted, and so is not much help in practice.
If open source in the cloud is not going to help protect privacy on smart speakers, another option is to use free software on the device itself, and to carry out local analysis of personal information. That way, there is no risk that sensitive data will be misused, because it remains under the control of the user. Voice-driven AI systems are about to invade not just homes, but also offices, factories and transport systems, and creating a free version of them is challenging. However, one project trying to do just that is Mycroft. Here’s how the original Kickstarter page for the project described the initiative:
Mycroft is the world’s first open source, open hardware home A.I. platform. It is a state of the art A.I. based on Raspberry Pi 2 and Arduino – two of the world’s most popular open development platforms.
Mycroft’s voice stack uses the following open source elements: Mimic, a machine learning Text-to-Speech engine; Precise, a Wake Word listener; Adapt, a library for converting natural language into machine-readable data structures; and Padatious, a neural network intent parser. The team is working with Mozilla to build DeepSpeech, an open Speech-to-Text technology, and supporting Mozilla’s WebThings to make IoT control systems that are both easy to use and easy to set up.
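To make the division of labour in a voice stack like this more concrete, here is a toy sketch in Python of the step that turns a transcribed sentence into a machine-readable structure, running entirely on the local device. It is not Mycroft's code and does not use the Adapt or Padatious APIs: the wake phrase, the `Intent` class, the `PATTERNS` table and the `parse_intent` function are all hypothetical names invented for illustration, and the input string stands in for whatever a speech-to-text engine would produce.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Machine-readable result of parsing one spoken command."""
    name: str
    slots: dict = field(default_factory=dict)


# Assumed wake phrase and hypothetical command patterns, for illustration only.
WAKE_WORD = "hey mycroft"
PATTERNS = [
    ("set_timer", re.compile(r"set (?:a )?timer for (?P<minutes>\d+) minutes?")),
    ("play_music", re.compile(r"play (?P<artist>.+)")),
]


def parse_intent(transcript):
    """Return an Intent for a transcript, or None if the wake word is absent.

    Nothing leaves the process: the text is matched against local patterns.
    """
    text = transcript.lower()
    if not text.startswith(WAKE_WORD):
        return None  # ignore speech that is not addressed to the assistant
    # Drop the wake word and any surrounding spaces or punctuation.
    command = text[len(WAKE_WORD):].strip(" ,.!?")
    for name, pattern in PATTERNS:
        match = pattern.fullmatch(command)
        if match:
            return Intent(name, match.groupdict())
    return Intent("unknown", {"text": command})


if __name__ == "__main__":
    print(parse_intent("Hey Mycroft, set a timer for 10 minutes"))
    print(parse_intent("Hey Mycroft play Miles Davis."))
```

A real intent parser handles far more variation than two regular expressions can, but the shape of the result is the point: once the sentence has become a named intent with a few slots, no audio or transcript needs to be sent anywhere for the device to act on it.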
The Mycroft project has been underway for a while, and the team is currently working on the Mycroft Mark II, but it has recently hit some problems. An obvious issue is that the project’s resources are dwarfed by those of Amazon, Google and the Chinese giants. The open source community urgently needs to rally around Mycroft, which is notable for its respect for user privacy: “We promise to never sell your data or give you advertisements on our technology.” Support can be provided in a number of ways. Alongside obvious ones like helping to produce code and documentation, writing blog posts and making video tutorials, there are others that involve extending the voice features of the project through recording new voices and new languages.
In addition to bolstering Mycroft, it would also be good to help advance the other free software voice assistant projects that are at various stages of development. Time is running out, though. The big companies are staking territorial claims to large swathes of the third platform, and will soon have impressive installed bases that will make disseminating free software versions harder. If more is not done in this area soon, the battle to preserve privacy in a world full of eavesdropping smart speakers will be lost, maybe forever.
Featured image by Mycroft.