Meta and Google lead the way to the pinnacle of AI translation

As Meta AI celebrates its 10th anniversary, it has announced a breakthrough: the open-sourcing of its ‘Seamless Communication’ model. At the same time, Google has launched Translatotron 3, a major milestone in unsupervised speech translation. These innovations breathe new life into speech translation technology and widen the path toward bridging languages across the globe.

Meta’s ‘Seamless Communication’ model: the new pinnacle of speech translation

Meta’s Seamless model is more than an incremental improvement; it is a major step forward for speech translation. This open-source “Grand Unified Model” integrates the features of three SOTA models, SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2, into a single speech translation tool. SeamlessExpressive preserves the nuances of speech, making translation sound more human, while SeamlessStreaming’s low latency and high accuracy open a new chapter in the AI version of ‘simultaneous translation’. With Seamless as its flagship, Meta AI helps users communicate by voice more naturally and efficiently in a multilingual environment.

Meta’s technological innovations: the power to change hidden in the details

Seamless’s strength lies in how it combines these technologies. SeamlessExpressive introduces an expression encoder that captures nuances such as pauses and speech rate while also preserving the speaker’s voice style, for a more human-sounding translation. SeamlessStreaming’s adaptive read/write strategy lets the model decide when it has taken in enough source speech to output the next target text or speech segment, which makes for a smoother voice translation experience. In addition, the upgraded SeamlessM4T v2 adds more data for low-resource languages through the new SeamlessAlign, enabling the model to outperform the previous SOTA model on multiple tasks.
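To make the read/write idea concrete, here is a minimal toy sketch of a simultaneous decoding loop. It is not Meta’s implementation: the names `simultaneous_decode`, `lagging_policy` and `toy_next_token` are hypothetical, the policy is a simple fixed-lag rule rather than a learned one, and the "translation" is just upper-casing each chunk. It only illustrates how a policy can alternate between reading more source and writing the next target segment.

```python
from typing import Callable, List, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence marker used by this sketch


def simultaneous_decode(
    source_chunks: List[str],
    should_write: Callable[[List[str], List[str]], bool],
    next_token: Callable[[List[str], List[str]], str],
    max_target_len: int = 50,
) -> Tuple[List[str], List[str]]:
    """Interleave READ and WRITE actions over a stream of source chunks."""
    read: List[str] = []      # source chunks consumed so far
    written: List[str] = []   # target tokens emitted so far
    actions: List[str] = []   # log of the decisions taken
    pos = 0
    while len(written) < max_target_len:
        source_left = pos < len(source_chunks)
        # READ while source remains and the policy wants more context.
        if source_left and not should_write(read, written):
            read.append(source_chunks[pos])
            pos += 1
            actions.append("READ")
            continue
        # Otherwise WRITE the next target token, stopping at end of sequence.
        token = next_token(read, written)
        if token == EOS:
            break
        written.append(token)
        actions.append("WRITE")
    return written, actions


def lagging_policy(read: List[str], written: List[str], min_lag: int = 2) -> bool:
    """Toy policy: write only once the source is at least `min_lag` chunks ahead."""
    return len(read) - len(written) >= min_lag


def toy_next_token(read: List[str], written: List[str]) -> str:
    """Stand-in 'translator': upper-case the next untranslated source chunk."""
    if len(written) < len(read):
        return read[len(written)].upper()
    return EOS


if __name__ == "__main__":
    tokens, log = simultaneous_decode(["a", "b", "c", "d"], lagging_policy, toy_next_token)
    print(tokens)
    print(log)
```

In a real streaming system the decision to write would come from the model itself rather than a fixed lag, which is what makes the strategy adaptive.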

The Other Side of Technology: The Debut of SeamlessAlignExpressive

In addition to the core Seamless model, Meta AI has also introduced SeamlessAlignExpressive, described as the first expressive speech alignment procedure. It improves the efficiency of speech alignment by automatically discovering, in raw data, audio segments that share the same overall expressiveness. To give users a better multilingual translation experience, Meta AI has also released SeamlessAlignExpressive as a large benchmark dataset, which opens up more possibilities for future speech alignment technologies.

Google’s ‘Translatotron 3’: a whole new chapter in unsupervised voice translation

Following Meta AI’s announcement, Google is also pushing the field forward with the launch of Translatotron 3. This approach to unsupervised speech translation combines several techniques, including SpecAugment, MUSE embeddings, and back-translation. It handles not only the translated words but also non-textual speech nuances such as pauses, speaking rate, and speaker identity. What sets Translatotron 3 apart is that it requires no direct supervision in the target language: it learns directly from monolingual data, which frees it from reliance on parallel data. As a result, the model is reported to outperform traditional systems in translation quality, speaker similarity, and speech naturalness.

Three key aspects behind the technology: the mystery of Translatotron 3

Behind the success of Translatotron 3 are three key technical innovations. First, pre-training with SpecAugment improves the encoder’s ability to generalize. Second, unsupervised embedding mapping based on MUSE lets the model learn a shared embedding space between the source and target languages, which broadens its prospects for multilingual translation. Finally, a reconstruction loss based on back-translation lets the encoder learn more meaningful multilingual representations. Together, these three innovations form the foundation of Translatotron 3’s unsupervised speech translation.
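As an illustration of the first ingredient, the sketch below shows the masking idea behind SpecAugment: random frequency bands and time spans of a spectrogram are blanked out so the encoder cannot rely on any single region of the input. It is a simplified, pure-Python sketch and not Google’s code. The function name `spec_augment`, the mask widths, and the list-of-lists spectrogram layout are assumptions made here, and the time-warping step of the original SpecAugment method is omitted.

```python
import random
from typing import List, Optional

Spectrogram = List[List[float]]  # assumed layout: spec[time][frequency]


def spec_augment(
    spec: Spectrogram,
    max_freq_width: int = 2,
    max_time_width: int = 2,
    mask_value: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Spectrogram:
    """Return a copy of `spec` with one frequency band and one time span masked."""
    rng = rng or random.Random()
    n_time, n_freq = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Frequency mask: blank a random band of channels in every frame.
    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    for row in out:
        for f in range(f_start, f_start + f_width):
            row[f] = mask_value

    # Time mask: blank a random run of consecutive frames.
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)
    for t in range(t_start, t_start + t_width):
        out[t] = [mask_value] * n_freq

    return out


if __name__ == "__main__":
    clean = [[1.0] * 4 for _ in range(6)]  # 6 frames, 4 frequency bins
    for frame in spec_augment(clean, rng=random.Random(0)):
        print(frame)
```

Training on inputs corrupted this way pushes the encoder to infer the missing regions from context, which is one reason the technique tends to help generalization.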

Conclusion: Opening a whole new era of AI voice translation

Meta’s “Seamless Communication” model and Google’s “Translatotron 3” mark a new era in AI voice translation technology. Together they build a smoother bridge for communication between the world’s languages and give users a more natural and intelligent voice translation experience. The emergence of these two technologies is likely to have far-reaching social, commercial and cultural impact, and to help build a more closely connected world.

