Jun 18, 2019
Modern speech synthesis is impressive; that should go without saying. The fact that a pocket-sized computer can create a voice so close to that of a real-life human is truly a feat of programming. But where did that technology come from?
There is a long and rich history of humans trying to recreate the sound of our own voices, but for my purposes I'm more interested in the electronic age. Some of the first electronic attempts, dating from the 1930s, include devices like the Vocoder and the Voder. These were analog devices that were able to recreate a human voice. In the case of the Vocoder, this was done by encoding a sample of speech. The Voder, on the other hand, was operated by "playing" a keyboard to build up voice-like sounds. While the Voder was able to create a human-like voice from scratch, it required a lot of skill and training to use.
This is characteristic of many early speech synthesis devices: they were neither general purpose nor easy to use. It wasn't until the 1970s that we would start to see general-purpose, easy-to-use speech synthesis solutions.
The biggest force behind this shift was Dennis Klatt, an MIT speech and hearing researcher. He is one of the key researchers responsible for a dramatic change in the field of speech synthesis: a switch to creating holistic models of human speech instead of just trying to imitate the sounds of a vocal tract. In doing so, Klatt was able to create much more realistic-sounding digital voices.
Klatt gave the field more than a better talking machine. He also created a pretty comprehensive history of speech synthesis. Part of this is outlined in his paper "Review of text-to-speech conversion for English" (https://pdfs.semanticscholar.org/5657/f5888e198fecf4612ff04c4b0bdef972147c.pdf). It's a little dense if you aren't used to reading academic papers, but in it he describes past attempts at speech synthesis as well as his own and his contemporaries' work. The other part of his documentation was a library of recordings of talking devices, ranging from early mechanical machines to the later DECtalk (a machine using Klatt's own algorithms).
Some of these archival recordings can be heard in "Klatt's Last Tapes" (http://communicationaids.info/history-speech-synthesisers/), a BBC Radio 4 program that goes over the history of speech synthesis and its use as an aid for people with disabilities.