When dialects meet AI: will intelligent voice assistants be defeated by accents?


Speech recognition technology has come a long way since IBM's Shoebox and Worlds of Wonder's Julie doll. By the end of 2018, Google Assistant supported more than 30 languages.

Qualcomm, for its part, has developed a speech recognition system that identifies words and phrases with 95 percent accuracy, and Microsoft's intelligent voice service is more accurate and efficient than human call-center transcription.

Yet for all the progress machine learning has driven, these speech recognition systems remain imperfect, and their most serious shortcoming is a form of geographic discrimination: they serve speakers from some regions far better than others.


According to a recent Washington Post study, the popular smart voice assistants developed by Google and Amazon recognize non-native accents with roughly 30 percent less accuracy than native US accents.

Companies such as IBM and Microsoft use the Switchboard corpus to benchmark and drive down the error rates of their voice models. But it turns out that the corpus alone cannot solve voice assistants' accent recognition problem.
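To make the benchmarking concrete, here is a minimal sketch of how an error rate is measured against a reference corpus such as Switchboard. Word error rate (WER) is the standard metric; the code is a generic illustration, not any vendor's evaluation pipeline.

# A minimal sketch of computing word error rate (WER) against a
# reference transcript, the standard way systems are scored on a
# benchmark corpus. Illustrative only.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("turn on the kitchen lights",
                      "turn on the kitten likes"))
# 0.4 -- two of the five reference words were misrecognized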

“Data is messy because it reflects humanity,” said Rumman Chowdhury, Accenture's global lead for responsible AI. “That is what algorithms do best: finding patterns in human behavior.”

Algorithmic bias is the degree to which a machine learning model reflects biases in its data or design. News reports have exposed a great deal of bias in facial recognition systems, most notably Amazon Web Services' image recognition service Rekognition.

Algorithmic bias also occurs in other areas, such as automated systems that predict whether defendants will commit future crimes, and the content recommendation algorithms behind apps such as Google News.

Microsoft and other AI industry leaders, including IBM, Qualcomm, and Facebook, have developed automated tools to detect and reduce bias in AI algorithms, but few have offered concrete solutions to the accent recognition problem.
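As an illustration of the kind of check such a bias-detection tool might run, the sketch below groups recognition results by accent and flags any group whose average error rate drifts too far from the best-served group. The data, group names, and tolerance threshold are all invented.

# A hedged sketch of a disparity audit: compute per-accent average WER
# and flag groups outside a tolerance band. Entirely illustrative.

from statistics import mean

# Hypothetical per-utterance results: (accent group, WER for that utterance).
results = [
    ("us_western", 0.05), ("us_western", 0.07),
    ("us_southern", 0.09), ("us_southern", 0.11),
    ("indian_english", 0.16), ("indian_english", 0.14),
]

def audit_by_group(results, max_relative_gap=0.25):
    by_group = {}
    for group, wer in results:
        by_group.setdefault(group, []).append(wer)
    averages = {g: mean(ws) for g, ws in by_group.items()}
    best = min(averages.values())
    for group, avg in sorted(averages.items(), key=lambda kv: kv[1]):
        gap = (avg - best) / best  # gap relative to the best-served group
        flag = "  <-- exceeds tolerance" if gap > max_relative_gap else ""
        print(f"{group:16s} WER={avg:.3f}  gap=+{gap:.0%}{flag}")

audit_by_group(results)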

Only two companies have really come forward with a solution: Speechmatics and Nuance.

Solving the accent gap
Speechmatics, a Cambridge technology company specializing in enterprise speech recognition software, embarked on an ambitious program 12 years ago: to develop a language recognition system more accurate and comprehensive than any product on the market.

The company began with statistical language modeling and recurrent neural networks, a class of machine learning models that use internal memory to process sequences. In 2014, it reached its first milestone, accelerating its statistical language modeling with a gigabyte-scale corpus.
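To make the "memory" idea concrete, here is a toy Elman-style recurrent cell: a hidden state is carried from step to step, so each prediction is conditioned on everything seen so far. This is a generic illustration, not Speechmatics' actual model.

# A toy recurrent cell showing how hidden state carries memory across
# a sequence. Sizes and weights are arbitrary placeholders.

import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 10, 16                       # toy sizes
W_xh = rng.normal(0, 0.1, (hidden, vocab))   # input -> hidden
W_hh = rng.normal(0, 0.1, (hidden, hidden))  # hidden -> hidden (the memory)
W_hy = rng.normal(0, 0.1, (vocab, hidden))   # hidden -> output

def rnn_forward(token_ids):
    h = np.zeros(hidden)
    outputs = []
    for t in token_ids:
        x = np.zeros(vocab)
        x[t] = 1.0                        # one-hot input token
        h = np.tanh(W_xh @ x + W_hh @ h)  # new input mixed with prior memory
        logits = W_hy @ h
        outputs.append(np.exp(logits) / np.exp(logits).sum())
    return outputs

probs = rnn_forward([3, 1, 4, 1, 5])
print(probs[-1].round(3))  # next-token distribution, conditioned on the whole sequence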

In 2017, it hit another milestone: in collaboration with the Qatar Computing Research Institute (QCRI), it launched an Arabic speech-to-text service.

“We realized we needed a single speech recognition system that works across every accent of a language, with no accent problem, so that it recognizes an Australian accent as accurately as a Scottish one,” said Speechmatics chief executive officer Benedikt von Thüngen.

That effort paid off this July with Global English, a speech recognition system trained on thousands of hours of speech data and tens of billions of words from more than 40 countries, supporting speech-to-text for all English accents.

Global English is also inseparable from Speechmatics' Automatic Linguist, an artificial intelligence framework that learns the linguistic foundations of a new language by leveraging patterns recognized in languages it already knows.
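The sketch below illustrates that transfer idea under loose assumptions: statistics pooled from well-resourced languages act as a prior for a new language with scarce data. Here a toy character-bigram model is interpolated with cross-language counts; it is entirely illustrative, not Speechmatics' actual method.

# A hedged sketch of cross-language transfer: scarce new-language
# bigram counts are smoothed with counts pooled from known languages.

from collections import Counter

def bigram_counts(corpus):
    counts = Counter()
    for text in corpus:
        for a, b in zip(text, text[1:]):
            counts[(a, b)] += 1
    return counts

known = bigram_counts(["the cat sat", "a dog ran", "the sun set"])  # rich data
new = bigram_counts(["de kat"])                                     # scarce data

def bigram_prob(a, b, new, prior, alpha=0.5):
    # Interpolate the new language's counts with the cross-language prior.
    num = new[(a, b)] + alpha * prior[(a, b)]
    den = (sum(v for (x, _), v in new.items() if x == a)
           + alpha * sum(v for (x, _), v in prior.items() if x == a)) or 1
    return num / den

print(bigram_prob("a", "t", new, known))  # benefits from "at" seen in English data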

"Suppose you want to talk to Americans on the other side, and you have to communicate with Australians on the other side, and this American once lived in Canada, so there is a Canadian accent, and most speech recognition systems will have a hard time recognizing this difference. The language of the accent, but our speech recognition system does not have to worry about this problem," Ian Firth, vice president of products at Speechmatics, said in an interview.

In tests, Global English outperformed Google's Cloud Speech API and IBM's cloud offering at recognizing specific accents. According to von Thüngen, at the high end it is 23 to 55 percent more accurate than competing products.

Speechmatics is not the only company that wants to solve the accent identification problem.

Nuance, based in Burlington, Massachusetts, says it uses a variety of methods to ensure that its speech recognition system handles nearly 80 languages with consistently high accuracy.

For its English language model, it collects speech and text data from 20 specific dialect regions, including words unique to each dialect (such as "cob" for a bread roll) and their pronunciations. As a result, Nuance's speech recognition system recognizes 52 different ways of saying "Heathrow".
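One plausible way to represent such dialect data is a pronunciation lexicon that maps each written form to many accepted, dialect-tagged pronunciations, as sketched below. The phoneme strings and region labels are invented; Nuance's internal format is not public.

# A hedged sketch of a dialect-aware pronunciation lexicon: one word,
# many dialect-tagged pronunciations. All entries are invented.

lexicon = {
    "heathrow": [
        ("general_american", "HH IY TH R OW"),
        ("received_pron",    "HH IY TH R OW"),
        ("cockney",          "IY F R OW"),   # h-dropping, th-fronting
        # ...a production lexicon could hold dozens of variants
    ],
    "cob": [
        ("east_midlands", "K OH B"),  # regional word for a bread roll
    ],
}

def pronunciations(word, dialect=None):
    """All pronunciations of a word, optionally filtered by dialect."""
    variants = lexicon.get(word.lower(), [])
    if dialect is not None:
        variants = [(d, p) for d, p in variants if d == dialect]
    return [p for _, p in variants]

print(pronunciations("heathrow"))             # every variant the decoder accepts
print(pronunciations("cob", "east_midlands"))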

Nuance's speech recognition has since taken another step. The newest version of Dragon, the company's custom speech-to-text software suite, uses a machine learning model that automatically switches between several dialect models based on the user's accent.
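The sketch below shows one plausible shape for such dialect auto-switching: a lightweight accent classifier scores the audio, and the best-matching dialect model decodes it. The classifier, models, and scores are placeholders, not Nuance's implementation.

# A hedged sketch of accent-based model routing. Stand-in functions
# make the example run end to end; nothing here is Nuance's code.

from typing import Callable, Dict

def pick_dialect_model(audio,
                       classify_accent: Callable,
                       models: Dict[str, Callable]):
    """Score the audio under each accent and decode with the best match."""
    scores = classify_accent(audio)  # e.g. {"us_south": 0.72, ...}
    best = max(scores, key=scores.get)
    return best, models[best](audio)

# Toy stand-ins so the sketch is self-contained.
fake_classifier = lambda audio: {"us_south": 0.72, "us_general": 0.21, "uk": 0.07}
fake_models = {
    "us_south":   lambda audio: "y'all want the lights on",
    "us_general": lambda audio: "you all want the lights on",
    "uk":         lambda audio: "you lot want the lights on",
}

dialect, transcript = pick_dialect_model(b"...", fake_classifier, fake_models)
print(dialect, "->", transcript)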

Compared with older versions lacking automatic dialect switching, the new speech recognition system is 22.5 percent more accurate for English spoken with a Spanish accent, 16.5 percent more accurate for Southern US dialects, and 17.4 percent more accurate for Southeast Asian English.

The more data, the better
Ultimately, the accent gap in speech recognition comes down to insufficient data. The higher the quality of the corpus and the more diverse the language model, the higher, at least in theory, the accuracy of the speech recognition system.

In the Washington Post study, the Google Home smart voice assistant was 3 percent less accurate at recognizing Southern US accents than Western US accents, and Amazon's Echo was 2 percent less accurate with Midwestern accents.

An Amazon spokesperson told the Washington Post that Alexa's speech recognition will keep improving as more users speak to it in different accents, and Google said in a statement that it will continue to improve Google Assistant's speech recognition by expanding its own datasets.

As more and more users adopt voice recognition systems, their capabilities will be further enhanced. According to market research firm Canalys, nearly 100 million smart voice devices will have been sold globally by 2019, and by 2022 about 55 percent of American households are expected to have one.

Don't expect a solution that eliminates the accent problem entirely, though. "With current technology, you can't develop one speech recognition system with top accuracy for users all over the world," Firth said. "The best you can do is ensure that these systems accurately recognize the accents of the people actually using them."

Follow Me
Link: Tenco

                                                             ——END——
