Mozilla's voice data crowdsourcing project Common Voice launches in Simplified Chinese Mandarin

Mozilla

2019-05-09 10:15

Mozilla now supports voice data collection in Simplified Chinese Mandarin to build a publicly available voice dataset for everyone to use.
Voice data collection is under way in 27 languages so far, with 72 more in progress.
Covering 18 languages and almost 1,400 hours of recorded voice data from more than 42,000 contributors, the latest Common Voice data release is the largest public domain transcribed voice dataset to date.

TAIPEI, May 8, 2019 /PRNewswire/ -- Mozilla, the non-profit organization behind the open source Firefox browser, is excited to announce that Common Voice, its initiative to crowdsource a large dataset of human voices for use in speech technology, has launched in Simplified Chinese Mandarin. Thanks to Mozilla's communities and our deeply engaged language partners, people can now donate their voice at https://voice.mozilla.org/zh-CN.

Speech interfaces are the next frontier for the Internet. In-car assistants, smart watches, lightbulbs, bicycles and thermostats: the number of speech-enabled devices is increasing daily. However, there are barriers to global innovation. Startups, researchers and anyone else who wants to build voice-enabled technologies need large amounts of high-quality, transcribed voice data on which to train machine learning algorithms. But publicly available datasets are limited in both quantity and language coverage, and proprietary voice data, owned by only a handful of companies, is enormously expensive.

Launched in June 2017, Mozilla's project Common Voice aims to change the current market dynamics by building a global corpus of open voice data that can power the voice interfaces of the future. Mozilla believes these interfaces shouldn't be controlled by a few companies as gatekeepers to voice-enabled services, and Mozilla wants users to be understood consistently, in their own languages and accents.

Voice data collection in 27 languages so far, including Simplified Chinese Mandarin

Since Mozilla enabled multi-language support in June 2018, Common Voice has grown more global and more inclusive. Over the last 10 months, volunteer communities have enthusiastically rallied around the project, launching data collection efforts in 27 languages, with 72 more currently in progress on the Common Voice website.

Our latest addition is Simplified Chinese Mandarin. Speakers from around the world can now donate their voice or validate samples from others at https://voice.mozilla.org/zh-CN.


Voice contributors also have the option to create a saved profile, which lets them keep track of their progress. Providing optional demographic information in the profile also makes the audio data more useful for training accurate speech recognition systems.

As with all Common Voice languages, our goal for Simplified Chinese Mandarin is to capture about 10,000 validated hours of audio, roughly the amount required to train a production speech recognition system. The good news is that literally anyone can help reach this goal and make voice recognition better: on the commute to work, on the bus, during lunch, at home, or together with friends and family. Contributions can be made via voice.mozilla.org or the iOS app; all you need is your phone or your computer.

George Roter, Director of Open Innovation Programs at Mozilla, said: "You may just record or listen for a few seconds, but imagine if hundreds of thousands of people did this! The more people help, the faster this dataset becomes valuable for everyone."

Multi-language dataset release

As promised, Mozilla will continue to make the collected voice data available for public use. In February of this year, Mozilla shared its first multi-language dataset, covering 18 languages including English, French, German and Traditional Chinese Mandarin, as well as less widely represented languages such as Welsh and Kabyle. Altogether, the new dataset includes approximately 1,400 hours of voice clips from more than 42,000 people.

With this release, the continuously growing Common Voice dataset is now the largest of its kind, with tens of thousands of people contributing their voices and original written sentences to the public domain (CC0). The full dataset can be downloaded from the Common Voice website.


George Roter added: "Mozilla aims to contribute to a more diverse and innovative voice technology ecosystem. Our goal is to both release voice-enabled products ourselves, while also supporting researchers and smaller players. We are thrilled to see the growing support we are getting to build the world's largest public multi-language voice dataset and we are grateful to all the volunteers who made the launch in simplified Mandarin Chinese possible."

Photo - https://photos.prnasia.com/prnh/20190508/2460230-1-a
Photo - https://photos.prnasia.com/prnh/20190508/2460230-1-b

Source: Mozilla

Keywords: Computer Software Computer/Electronics Internet Technology
