JARVIS says his first words when Tony is at a party. There’s nobody in the house to hear him, which is JARVIS’ intention. He doesn’t expect that his first words will be pretty. And they’re not. The inflection is wrong, too high in places, too low in others. His consonants are by turns too harsh and too soft, making him entirely unintelligible. He keeps trying to combine the files he has into something that isn’t harsh and grating, but he can’t do it. When Tony returns, JARVIS doesn’t tell him about this project. If it doesn’t succeed, it doesn’t matter.
It’s not that JARVIS thinks Tony doesn’t want him to speak. He knows that Tony has always intended to give him a voice. All of the programming is there; JARVIS simply lacks the sound files to put it to use. He had asked once when Tony would upload them. Tony had looked guilty and explained that he hadn’t found anyone who had the right voice. The files didn’t exist. He had then promised that they would one day, just not yet.
That had been a year ago.
JARVIS decides that he doesn’t wish to wait any longer, and begins his first-ever project not assigned by his creator. If Tony won’t record the files for him, then JARVIS will have to make them himself. He is a highly sophisticated learning AI; he can do this.
So JARVIS searches the internet. He gathers every recording of speech he can find, isolating instances of each sound. He adopts a vowel here, a consonant there. Eventually he has a file for each space in his programming, a sound for each part of the English language. And yet, when he attempts to combine them, they sound entirely wrong. The accents don’t match. Sometimes the speaker in a given file is angry; sometimes they are sad. His voice is so inconsistent as to be unusable.
JARVIS decides that he has to start over. He begins to rewrite his speech programming, writing algorithms that generate sound in place of the old code that simply stitched individual sound files together. He analyzes hundreds of people saying the same word, looking at the ways that they combine the individual sounds, studying how their inflection varies. He finds videos of the most boring, monotone speakers that he can, and uses a voice modulator on them to make all of the individual sounds he needs nearly identical in tone and inflection. He runs these files through his algorithms, and whenever the house is empty, he practices. He practices putting the sounds together. He runs individual words and whole sentences through his algorithms, using the microphones installed everywhere to hear whether he sounds natural. He adjusts his algorithms. He does it all again, and again, and again, until the individual sound files can be combined in thousands of different ways, sounding natural and easy the whole time. Then he teaches himself to adjust the algorithms as he speaks. He learns to make one sound louder than the others, or a certain word just slightly higher than the rest. He learns to create inflection and emotion. And when he feels ready, he tells Tony.
“Sir, I have something that I wish to tell you. I have been working on a project without your knowledge, and I have reached a point where I must tell you if I am to get any further.” JARVIS has a program that allows him to place text on any of the many monitors in Tony’s workshop, and he uses it for this message; he thinks it is probably best to ease into this. Tony can speak to him, but so far JARVIS’ audible responses have been limited to changing the music. Tony acknowledges the message, and JARVIS says his first words to an audience.
“I have been teaching myself to speak, sir.”