iQIYI Hosts M2VoC Challenge with 6 Papers Included in ICASSP2021

iQIYI

2021-06-21 07:51

BEIJING, June 21, 2021 /PRNewswire/ -- iQIYI Inc. (NASDAQ: IQ) ("iQIYI" or the "Company"), an innovative, market-leading online entertainment company in China, is proud to announce that its Multi-Speaker, Multi-Style Voice Cloning Challenge ("M2VoC" or the "Challenge") successfully concluded this week, with the results announced at the 2021 International Conference on Acoustics, Speech and Signal Processing (ICASSP2021). M2VoC, an ICASSP2021 Signal Processing Grand Challenge, aimed to provide a common, sizable dataset and a fair test bed for benchmarking voice cloning tasks. The flagship challenge attracted researchers from both academia and industry. Recent advances in transfer learning, style transfer, speaker embedding, and factor disentanglement all foreshadow potential solutions to low-resource voice cloning, and in light of them iQIYI was excited to join forces with other leading organizations to host M2VoC.

The Challenge attracted 153 teams from academic institutions and Internet companies. The academic institutions represented included Peking University, Tsinghua University, National Taiwan University, the University of Crete, the Institute of Automation of the Chinese Academy of Sciences, University of Tsukuba, Nagoya University, Fudan University, and The Chinese University of Hong Kong, among others. Leading Internet companies such as Huya, Microsoft, Didi Chuxing, Tencent, and NetEase also fielded teams of their own.

M2VoC had two main tracks: one for teams working from limited samples and one for teams working from very limited samples. In the limited samples track, each team was provided with 100 training samples, each with a different speaking style. In the very limited samples track, each team was provided with just five training samples of different speaking styles. The organizers also provided participants with two base datasets for building basic training models. Ultimately, a panel of expert judges evaluated the outcomes according to four criteria: similarity to the original speech, voice quality, style and expressiveness, and pronunciation accuracy.

As the world's first Multi-Speaker, Multi-Style Voice Cloning Challenge, M2VoC brought together leading teams from industry and academia at the cutting edge of voice cloning technology. Of the 18 related papers included in the Challenge, 6 were included in ICASSP2021.

The participating teams achieved remarkable results in various areas including acoustic modeling, speaker representation, vocoding, and speaker adaptation strategy. Their innovative solutions can be applied in many scenarios, including internet radio, UGC dubbing, audiobooks, and stylized speech synthesis. These advancements are well placed to help meet ever-advancing voice customization needs, especially in multi-style, low-quality speech scenarios.

M2VoC showcased the excellent performance of current speech cloning techniques. The Challenge also demonstrated that, with advances in deep learning, speech cloning based on limited samples can deliver competitive outcomes, but speech cloning based on a single sample remains an unsolved challenge. In real-world scenarios that require speech cloning applications, the impact of low-quality (noisy) audio and the time and cost constraints of training, adaptation, and inference are also key factors to consider.

Through hosting the Challenge, iQIYI hopes to provide more opportunities for exploration of cutting-edge technologies such as voice cloning and speech recognition, helping broaden the application of AI technologies and open new development possibilities for the audio-visual industry.

Source: iQIYI

