Meta Unveils Voicebox, Text-to-Speech Generative AI Tool

Voicebox is Meta’s breakthrough in generative speech AI, which transforms text into realistic and expressive speech. The AI tool, which works similarly to ChatGPT or Dall-E, is an advanced AI model capable of performing speech generation tasks such as content editing, sampling, and style conversion, even without specific training, thanks to in-context learning.

Published: 19 June 2023, 12:00 pm Updated: 19 Jun 2023, 11:32 am

It sets itself apart from other text-to-speech models by excelling in various tasks such as noise removal, text-to-speech synthesis, and cross-lingual style transfer, pushing the boundaries of synthetic speech generation. Voicebox also surpasses existing models in speed, running up to 20 times faster.

Voicebox underwent extensive training using a dataset comprising over 50,000 hours of unfiltered audio. The AI model was trained using Meta’s innovative “Flow Matching” method, a versatile alternative to the diffusion-based learning methods employed by other generative models.

Meta’s training dataset includes recorded speech and transcripts from public-domain audiobooks in multiple languages, such as English, French, Spanish, German, Polish, and Portuguese.

According to Mark Zuckerberg, Voicebox is “the first ever generative AI speech model that can do tasks it wasn’t specifically trained on.”

Source: Mark Zuckerberg

In the future, Voicebox and similar AI models could provide natural-sounding voices for virtual assistants and non-player characters in the metaverse. They could also enable visually impaired people to hear written messages in familiar voices through AI and offer creators easy tools for editing audio tracks in videos.

Voicebox and the Risks of Deepfakes

However, Voicebox might pose some ethical and social challenges, especially in the context of deepfakes. Deepfakes, created by AI models, are synthetic media that manipulate a person’s voice, often maliciously. Voicebox could create convincing deepfakes that impersonate someone’s voice or make them say things they never said. This could have serious implications for privacy, security, and trust.

Microsoft’s president Brad Smith raised concerns last month about the harm caused by deepfakes. He emphasised the need for mechanisms to differentiate between genuine and AI-generated material, particularly in cases of malicious intent. He called for accountability and safety measures to maintain human control over critical infrastructure governed by AI systems. Additionally, he proposed a system in which developers monitor usage and provide transparency to identify manipulated videos, similar to a Know Your Customer (KYC) approach.

Meta says that it is aware of the potential harm that Voicebox could cause and that the company is working on an effective way to distinguish between authentic speech and audio generated by Voicebox. While Voicebox is still under development and not currently available to the public, Meta acknowledges the potential risks associated with advanced AI technology.
