We call Jaime Bosch, CEO and co-founder of Voicemod, to ask about the company’s recent funding round, but first we pose a question that may become increasingly necessary in a world of advancing generative AI: Is this actually your real voice?
Bosch’s business has been experimenting with sound effects for almost 10 years, using digital signal processing (DSP). Initially, the focus was on making fun “sound emoji” effects and reactions for gamers to liven up their voice chats. Gamers are still its main user base today. But AI is driving developments in the audio field, and Voicemod’s team intends to use it to create new use cases and attract many more users for its tools.
Where DSP modifies a person’s actual voice, modern advances in AI allow companies like Voicemod to create synthetic voices. They also let the user “steal” those voices in real time, so that they appear to be speaking with a voice that isn’t theirs. Think of it as an audio version of Snapchat lenses, TikTok’s teenage filter, or Reface’s celebrity face-swaps.
AI can let someone alter the sound of their voice so that it appears as if somebody else is speaking. That applies not only to casual conversation but, more importantly, to what is generally referred to as “sing-to-sing voice conversion.” Open karaoke night up to a host of incredible new possibilities with the ability to sing in Freddie Mercury’s voice. You could even go one step further and sing Bohemian Rhapsody as Freddie, Brian May and Roger Taylor, if you have enough AI models and microphones. Oh my goodness!
AI is the driving force behind this potential development, yet legal and ethical considerations could prevent it from being widely used. Banks that ask customers to provide a voice signature to unlock their accounts should pay attention.
Last year, Voicemod purchased Voctro Labs, a startup focusing on audio effects. According to Bosch, they are hoping to combine their competitive advantages to develop a powerful platform. This pairing has enabled the launch of a novel text-to-song feature in December, which utilizes AI to create vocal music from custom lyrics. Even more capabilities are to come, with the highlight being a sing-to-sing offering.
Voctro’s technology was used to create a virtual replica of the singing voice of Holly Herndon, which featured in a well-known TED Talk last year. In the video, Herndon’s AI clone and her actual voice sang simultaneously, making for an impressive audio-visual show. This is a small sample of what users can expect from Voicemod in the near future.
Bosch informs us that Voctro Labs will be introducing more products to enhance self-expression through artificial intelligence. In addition to music-related technology, such as converting text to song and linking two people together as they sing in real-time, there are numerous projects and new items in development.
Bosch says Voicemod wants to bolster its real-time speech-to-speech AI software by combining its technology with Voctro’s, on the view that the combined technology will be more powerful than either one alone. The plan is to join Voctro’s sing-to-sing technology with Voicemod’s DSP tech to create an autotune feature that could help musicians adjust the tone of their voice, which he expects to be quite intriguing.
Besides offering audio tools directly to consumers and developers, Voicemod makes its technologies available through SDKs and APIs, which other companies can use to add them to their own products, such as games, apps and hardware. As a result, its tech is widely spread across the video gaming and creator industries, with demand forming at the source.
The use of generative AI to bring about disruption in the audio field is reflective of changes taking place in other areas such as graphics and illustration due to deep learning and the introduction of image-generation tools. Examples of this include DALL-E and Stable Diffusion. Advancements are being made in the written word, too, with large language models that are being used to develop AI chatbots such as ChatGPT which can compose song lyrics or a full essay when prompted. Google recently exhibited their AI music composer which can make tunes that align with the character of the music being requested. Though Google stated they will not be publicly releasing this system, other developers may make it available.
It is evident that AI is revolutionizing what one person is capable of creating. This is both exhilarating and worrying because how it is utilized matters.
The near future will likely be about understanding how people use such powerful AI programs once they are readily available to them.
Voicemod is building a toolkit for content creators so that they can not only survive in an ever-changing world but thrive in it. It aims to create a sonic identity and voice avatars for the virtual universe, but it will also work to make sure you have great audio during video conferences. It’s similar to putting on make-up: use only when necessary.
Bosch believes that with AI, anyone can be a creator regardless of their expertise. They can create music and even generate voices.
This software could become immensely popular on social media platforms like TikTok, YouTube Shorts, and Instagram. It may even turn into something like karaoke and be available on game consoles. As the technology improves, it could become a resource for professionals who want to create music. Those wishing to develop voices for films or role-playing game characters could benefit too.
Bosch says the company firmly believes that users should generate their own content, which is why it is building tools to help them create their own voices and sounds. With these tools, and with time, their skills could reach a professional level.
Bosch predicts that in the near future, individuals will have the capability to create their own voice using generative AI, which presently requires a team of sound engineers and designers to accomplish.
He is unsure whether user experiences in the near future will rely mainly on prompts or on tools with AI technology embedded. Either way, he is excited that AI is allowing more people to become creators.
Although advances in technology may make some roles obsolete, voice actors may still have a practical role in the development of AI. Machines cannot yet accurately convey pitch, intonation, or emotion through speech, so humans may be needed to fill in these gaps. Without them, the result is a robotic copy of a voice with no emotion. Nick Cave might say that these AI voices lack any connection to reality; they are not familiar with the difficulties of being alive.
Bosch suggests that, however good the voice quality, some human element is still required when sampling voices. It is not simply a matter of talking naturally and sounding like a famous person: the speaker has to control the cadence, rhythm and tone of the words to create an impression. Some acting is therefore necessary. He believes humans still have a key role to play in expression.
Could artificial intelligence with generative capabilities be taught to express emotion, through access to the right datasets provided by humans, and further its mimicry to evoke certain emotions like joy, sadness, love, or hatred when requested?
Bosch replied, “We’ll have to wait and see. As of now, from my perspective, AI is just something that humans use. However, we don’t know where it will end up.”
Voicemod is preparing for whatever the future holds with a new injection of financing. The company, founded in 2014, has long generated revenue from paid products such as its flagship, Voicemod for Desktop, which Bosch says has more than 40 million downloads to date and 3.3 million monthly active users. It has now raised $14.5 million in expansion capital, following its $8 million Series A round in the summer of 2020. Leadwind, a growth fund managed out of Madrid by Kfund, made the largest investment in the round, with Minifund (Eros Resmini, formerly the CMO at Discord) and Bitkraft Ventures taking part as well.
Kfund partner Jaime Novoa is thrilled by the potential of generative AI in the creative sector, particularly when it comes to building on the work of creatives in the audio industry. Generative AI has surged in the last few months, but Novoa expects even greater things to come.
Novoa notes that many of the newest technologies have been unable to establish successful business models, but that Voicemod stands out from the crowd: it is used daily by millions of people and has generated satisfactory revenues. He says the fund is eager to see what Jaime and his Voicemod team will achieve in the future.
Voicemod says the additional funds will go toward advancing the technology behind its real-time AI voice recognition features and making its services more attractive to younger users, gamers, content creators, and people of any level of expertise who want help expressing themselves through their voices online.
According to Bosch, one reason the company needed more funding was its acquisition of Voctro Labs. Ultimately, he says, it was a matter of seizing the growing opportunities stemming from the surge in AI tools.
He says the world is in the throes of a huge transformation driven by advances in AI, and that the team needs to be sufficiently funded to create and offer new technology. This is where Voicemod’s edge lies: because it has an established presence in the market and a good following, it can make this technology accessible to users quickly. He wants to ensure the team has the resources it needs to do this, even if current market conditions are not ideal. The primary goal is to develop the most cutting-edge AI technology and provide it to consumers, while also building applications that let them create their own content.
The first new software is due to arrive next month and will be a desktop version for Mac computers (it is currently only available on PCs). The eventual goal is to create a product that can be used on all platforms. The company plans to launch a mobile app at the start of the following quarter. Lastly, the company’s CEO commented that more is likely to come.
He also shares that the startup has been developing a watermarking system, which it expects to make available in the second quarter of this year, giving websites a way to detect AI-generated audio that appears online.
A feature like this could be invaluable in stopping the various illicit activities (fraud, cheating, manipulation, abuse, harassment, provocation, etc.) that people could engage in with voice-altering tools that let them sound like someone else.
Bosch explains that the company has created an algorithm to watermark audio. Moderating audio is complex because it depends on the platform the audio is played on. For this reason, he argues, it is better for the channel to be responsible for moderation. Voicemod’s watermarking system will be provided to channels so that they can determine whether audio is real or was created by a synthetic voice.
He adds that technology can be used for good or bad ends, which is why the company has put measures in place to combat misuse.
When it comes to obtaining permission to use training data, the laws covering AI and generative AI have not kept up with the technology. That leaves startups working in this area to decide whether to take advantage of the legal gaps or to be more cautious and deliberate in their decisions. Other companies in this area include Voice AI, Koe and ElevenLabs.
Bosch says Voicemod hires (paid) voice actors to build the datasets used to develop its AI models. If it wants to use any original content, it has to reach an agreement with the IP owner and settle on licensing terms. It is, in short, an exciting time to be an IP lawyer.
He says the company is pioneering in territory the law has not yet reached, and that it tries to act ethically and do the right thing with its data. There is currently no legal ownership of a person’s voice, or of a speaker’s voiceprint. As of now.
It seems a bit like something out of a futuristic novel, yet perhaps one day we will possess something associated with our voice.
For the record, Bosch was speaking to me in his actual voice. The company’s real-time voice-altering technology isn’t yet ready for use on phones, though he says that will be available in the near future. Hold on tight, because the future of synthetic voices is sure to be an exhilarating ride.