What technology area does this patent fall under?

Primary CPC classification G10L15/187. Mapped technology areas include Physics.

When was this patent published?

Publication date Nov 29, 2016 (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Active learning for lexical annotations

US9508341B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9508341-B1
Application number	US-201414476075-A
Country	US
Kind code	B1
Filing date	Sep 3, 2014
Priority date	Sep 3, 2014
Publication date	Nov 29, 2016
Grant date	Nov 29, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Features are disclosed for active learning to identify the words which are likely to improve the guessing and automatic speech recognition (ASR) after manual annotation. When a speech recognition system needs pronunciations for words, a lexicon is typically used. For unknown words, pronunciation-guessing (G2P) may be included to provide pronunciations in an unattended (e.g., automatic) fashion. However, having manually (e.g., by a human) annotated pronunciations provides better ASR than having automatic pronunciations that may, in some instances, be wrong. The included active learning features help to direct these limited annotation resources.
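The selection idea in the abstract can be illustrated with a minimal sketch. It is not the patented method: the `Candidate` fields, the priority heuristic, and the `budget` parameter are all assumptions introduced here to show how a limited annotation budget might be steered toward words where the automatic guess is least trusted and the word is seen most often.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    g2p_confidence: float  # hypothetical G2P confidence score in [0, 1]
    usage_frequency: int   # hypothetical count of how often the ASR system sees the word


def annotation_priority(c: Candidate) -> float:
    # Assumed heuristic: low G2P confidence and high usage both raise priority.
    return (1.0 - c.g2p_confidence) * c.usage_frequency


def select_for_annotation(candidates, budget):
    """Return the `budget` words with the highest priority for manual annotation."""
    ranked = sorted(candidates, key=annotation_priority, reverse=True)
    return [c.word for c in ranked[:budget]]


if __name__ == "__main__":
    words = [
        Candidate("table", 0.95, 200),
        Candidate("Siobhan", 0.20, 50),
        Candidate("Nguyen", 0.30, 80),
    ]
    print(select_for_annotation(words, budget=2))  # ['Nguyen', 'Siobhan']
```

Words left out of the selection would keep their automatically guessed pronunciations; only the top-ranked ones consume annotator time.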

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a computer-readable memory storing executable instructions; and one or more physical computer processors in communication with the computer-readable memory, wherein the one or more physical computer processors are programmed by the executable instructions to at least: automatically predict pronunciations using a grapheme-to-phoneme model for a list of words having manually provided pronunciations, wherein the grapheme-to-phoneme model is adapted to generate a predicted pronunciation and a confidence score for the predicted pronunciation based upon an input word; generate a prediction performance model for the grapheme-to-phoneme model based on: a comparison of automatically predicted pronunciations of the list of words with the manually provided pronunciations for the list of words, and confidence scores for the automatically predicted pronunciations, wherein the prediction performance model is adapted to generate performance information for the grapheme-to-phoneme model, the performance information indicating a degree of confidence in the grapheme-to-phoneme model to predict a new word added to the system based upon an input predicted pronunciation generated by the grapheme-to-phoneme model for the new word and the confidence score for the input predicted pronunciation generated by the grapheme-to-phoneme model; receive an electronic record including candidate words to be added to an automatic speech recognition lexicon, the automatic speech recognition lexicon including words having the automatically predicted pronunciations generated by the grapheme-to-phoneme model; generate, from the grapheme-to-phoneme model, a predicted pronunciation for a candidate word and a confidence score for a candidate word, wherein the candidate word is included in the candidate words; generate, from the prediction performance model, an annotation priority for the candidate word based on the confidence score for the predicted pronunciation of the candidate word and the predicted pronunciation of the candidate word; determine that the annotation priority for the candidate word exceeds a priority threshold; and route the candidate word to a manual pronunciation generator.

2. The system of claim 1, wherein a confidence score for a word is determined using one or more of: a character language model score, a phoneme language model score, a length of the word, or a predicted pronunciation length.

3. The system of claim 1, wherein the one or more physical computer processors are further programmed by the executable instructions to generate the prediction performance model by at least generating a performance regression model based on the manually provided pronunciation for the list of words, the predicted pronunciation for the list of words, and confidence scores for the list of words.

4. The system of claim 1, wherein the one or more physical computer processors are further programmed by the executable instructions to at least: generate, from the prediction performance model, annotation priorities for predicted pronunciations of other candidate words based on the confidence scores for the predicted pronunciations of the other candidate words and the predicted pronunciations of the other candidate words; and determine that the annotation priority of the candidate word exceeds the annotation priorities of the other candidate words.

5. The system of claim 1, the one or more physical computer processors are further programmed by the executable instructions to at least receive automatic speech recognition metrics for the candidate word, wherein generating the annotation priority for the candidate word is further based on the received automatic speech recognition metrics for the candidate word.

6. The system of claim 5, wherein the automatic speech recognition metrics for a word includes linguistic frequency of the word within the lexicon, presentation frequency of the word to an automatic speech recognition system, skip rate of the word, or a correction rate for the word.

7. The system of claim 1, wherein the one or more physical computer processors are programmed by the executable instructions to route the candidate word by at least: obtaining manual annotation resource information identifying an annotation resource; and generating a manual annotation route for the candidate word to the annotation resource based on the manual annotation resource information.

8. The system of claim 7, wherein the one or more physical computer processors are further programmed by the executable instructions to at least: receive manually provided pronunciation information for the candidate word; and retrain the grapheme-to-phoneme model based on the generated manual pronunciation.

9. A computer-implemented method comprising: under control of one or more computing devices configured with specific computer-executable instructions, generating a prediction performance model for a pronunciation prediction model based on: a comparison of an automatically predicted pronunciation of a word with a manually provided pronunciation of the word, and confidence scores for the automatically predicted pronunciations, wherein the prediction performance model is adapted to generate performance information for a grapheme-to-phoneme model, the performance information indicating a degree of confidence in the grapheme-to-phoneme model to predict a new word added to a lexicon based upon an input predicted pronunciation generated by the grapheme-to-phoneme model for the new word and the confidence score for the input predicted pronunciation generated by the grapheme-to-phoneme model; generating, from the pronunciation prediction model, a pronunciation prediction for a candidate word to be added to the lexicon and a confidence score for the candidate word; generating, from the prediction performance model, an annotation priority for the candidate word based on the confidence score for the pronunciation prediction of the candidate word and the predicted pronunciation of the candidate word; generating a value identifying a relationship between the annotation priority for the candidate word and a threshold; and routing the candidate word to one of a manual pronunciation generator or an automatic pronunciation generator based on the value.

10. The computer-implemented method of claim 9, wherein the pronunciation prediction model comprises a grapheme-to-phoneme model.

11. The computer-implemented method of claim 9, wherein a confidence score for a word is determined using one or more of: a character language model score, a phoneme language model score, a length of the word, or a predicted pronunciation length.

12. The computer-implemented method of claim 9, wherein generating the prediction performance model comprises generating a regression model based on the manually provided pronunciation for the word, the predicted pronunciation for the word, and a confidence score for the word.

13. The computer-implemented method of claim 9, wherein routing the candidate word comprises: generating, from the prediction performance model, annotation priorities for predicted pronunciations of other candidate words based on the confidence scores for the predicted pronunciations of the other candidate words and the predicted pronunciations of the other candidate words; and determining that the annotation priority of the candidate word exceeds the annotation priorities of the other candidate words.

14. The computer-implemented method of claim 9, further comprising receiving automatic speech recognition metrics for the candidate word, wherein sel
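As a rough illustration of the prediction performance model and routing steps recited in claims 1, 3, 9 and 12, the sketch below fits a regression of observed pronunciation error on G2P confidence and routes a candidate word by comparing the predicted error to a threshold. Several things here are assumptions and not taken from the patent: phoneme edit distance as the comparison measure, a one-variable least-squares line as the regression, the function names, and the simplification of deriving priority from the confidence score alone.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def phoneme_error_rate(predicted, manual):
    # Assumed comparison: edit distance normalised by the manual pronunciation length.
    return edit_distance(predicted, manual) / max(len(manual), 1)


def fit_performance_model(training):
    """training: list of (predicted_phonemes, manual_phonemes, confidence).

    Returns a function mapping a confidence score to a predicted error rate,
    fitted by ordinary least squares on a single feature.
    """
    xs = [conf for _, _, conf in training]
    ys = [phoneme_error_rate(p, m) for p, m, _ in training]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("confidence scores must not all be identical")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda confidence: slope * confidence + intercept


def route(confidence, model, threshold):
    # The predicted error serves as the annotation priority in this sketch.
    return "manual" if model(confidence) > threshold else "automatic"


if __name__ == "__main__":
    lexicon = [
        (["K", "AE", "T"], ["K", "AE", "T"], 0.9),
        (["D", "AO", "G"], ["D", "AO", "G"], 0.8),
        (["K", "OW", "L", "AH", "N", "EH", "L"], ["K", "ER", "N", "AH", "L"], 0.2),
    ]
    model = fit_performance_model(lexicon)
    print(route(0.30, model, threshold=0.3))  # manual
    print(route(0.85, model, threshold=0.3))  # automatic
```

A fuller version would add the features named in claims 2 and 11 (character and phoneme language model scores, word length, predicted pronunciation length) as regression inputs; they are omitted here to keep the sketch short.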

Assignees

Amazon Tech Inc

Inventors

Classifications

G10L15/187 · Primary
Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title
G10L13/00
Speech synthesis; Text to speech systems · CPC title
G10L15/18 · Primary
using natural language modelling · CPC title

Patent family

Related publications grouped by family.

View patent family 57352001

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9508341B1 cover?: Features are disclosed for active learning to identify the words which are likely to improve the guessing and automatic speech recognition (ASR) after manual annotation. When a speech recognition system needs pronunciations for words, a lexicon is typically used. For unknown words, pronunciation-guessing (G2P) may be included to provide pronunciations in an unattended (e.g., automatic) fashion.…
Who is the assignee on this patent?: Amazon Tech Inc