Instant Voice-to-Text: Transforming Work Styles with the Power of AI

SINGAPORE, Jan. 8, 2020 /PRNewswire/ -- While Robotic Process Automation (RPA), which uses robots to automate work processes, has produced great results in automating tasks like document creation and data entry, certain tasks like recording meeting minutes and transcribing speeches still need to be carried out manually. Learn how Toshiba's newly developed speech recognition AI converts speech to text with high accuracy and contributes to increased productivity in the workplace and beyond.

Taira Ashikawa, Head of Research, Media AI Laboratory, Toshiba Corporate R&D Center

Hiroshi Fujimura, Lead Researcher, Media AI Laboratory, Toshiba Corporate R&D Center

Photo (automatic speech subtitling system (left) and image of displayed subtitles (right))

The technology behind the accuracy in speech recognition

In 2015, when Toshiba first began developing this form of AI, there was increasing momentum around the world in the field of information accessibility, which aims to create environments that enable the hearing-impaired to access and input information.

With insights from hearing-impaired employees wanting to participate in meetings and lectures in real time, Toshiba's development of speech recognition AI started with two goals in mind: to expand information accessibility for the hearing-impaired and to increase productivity.

Algorithms form the core of AI, and the development team explored a variety of approaches to increase accuracy. Toshiba's speech recognition AI not only recognizes speech with high accuracy, but also detects fillers and hesitation markers. Using Long Short-Term Memory (LSTM), an increasingly popular model, together with Connectionist Temporal Classification (CTC), the team trained the AI on speech peculiarities such as fillers and hesitation markers that are characteristic of human speech.
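For readers curious about what CTC contributes, the following is a minimal illustrative sketch of the standard CTC greedy decoding step, not Toshiba's implementation. It assumes a model (for example, an LSTM) has already produced one most-likely label per audio frame. The blank symbol "_", the word-level labels, the filler list, and the function names are all hypothetical choices made for readability.

```python
from itertools import groupby

BLANK = "_"              # hypothetical symbol for the CTC "blank" label
FILLERS = {"um", "uh"}   # hypothetical filler tokens, for illustration only


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Apply the CTC collapsing rule to per-frame labels.

    Step 1: merge runs of identical consecutive labels into one.
    Step 2: drop the blank label.
    A blank between two identical labels keeps them as separate tokens.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def tag_fillers(tokens, fillers=FILLERS):
    """Pair each token with a flag saying whether it is a known filler."""
    return [(token, token in fillers) for token in tokens]


# Assumed per-frame output of a recognizer (one label per audio frame).
frames = ["_", "um", "um", "_", "the", "the", "_", "_", "plan", "_", "plan"]

tokens = ctc_greedy_collapse(frames)
print(tokens)
print(tag_fillers(tokens))
```

In this sketch, the repeated "um" and "the" frames collapse into single tokens, while the two "plan" labels separated by a blank remain distinct. Once fillers are recognized as tokens in their own right, a subtitling system could choose to hide or de-emphasize them. That last step is an assumption about how such output might be used, not a description of the product.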

In verification testing conducted during lectures, the speech recognition AI achieved an average speech recognition rate of 85%, capturing the content of speech above a certain level without editing or advance learning. Toshiba will continue to improve this technology toward fully accurate speech recognition, with the goal of creating an environment where speakers of different languages can enjoy a smooth conversation with one another.

Toshiba also sees potential in applying speech recognition AI to the manufacturing sector, where there is a need for hands-free voice collection and recording in factories during maintenance and inspections. In the future, Toshiba aims to use its accumulated knowledge and know-how on manufacturing facilities to seamlessly integrate speech recognition into their operations.
