Active learning for lexical annotations

US9508341B1 · US · B1

Patent metadata
Field: Value
Publication number: US-9508341-B1
Application number: US-201414476075-A
Country: US
Kind code: B1
Filing date: Sep 3, 2014
Priority date: Sep 3, 2014
Publication date: Nov 29, 2016
Grant date: Nov 29, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Features are disclosed for active learning to identify the words which are likely to improve the guessing and automatic speech recognition (ASR) after manual annotation. When a speech recognition system needs pronunciations for words, a lexicon is typically used. For unknown words, pronunciation-guessing (G2P) may be included to provide pronunciations in an unattended (e.g., automatic) fashion. However, having manually (e.g., by a human) annotated pronunciations provides better ASR than having automatic pronunciations that may, in some instances, be wrong. The included active learning features help to direct these limited annotation resources.
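The selection idea in the abstract can be illustrated with a minimal sketch. It assumes each candidate word already carries a G2P confidence score in [0, 1] and that annotators can handle only a fixed number of words. The `Candidate` type, the `select_for_annotation` function, the `budget` parameter, and the sample pronunciations and scores are all hypothetical, chosen for illustration; the patent does not prescribe this particular ranking.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    word: str
    predicted_pronunciation: str  # space-separated phonemes from a G2P model
    confidence: float             # hypothetical G2P confidence in [0, 1]


def select_for_annotation(candidates: List[Candidate], budget: int) -> List[Candidate]:
    """Return up to `budget` candidates whose automatic pronunciation is least trusted."""
    # Lowest confidence first: these are the words where a human annotation
    # is assumed to help the most.
    ranked = sorted(candidates, key=lambda c: c.confidence)
    return ranked[:budget]


# Illustrative data only; the scores are made up.
candidates = [
    Candidate("cat", "K AE T", 0.95),
    Candidate("colonel", "K OW L AH N", 0.20),
    Candidate("yacht", "Y AA CH T", 0.40),
    Candidate("dog", "D AO G", 0.90),
]

for c in select_for_annotation(candidates, budget=2):
    print(c.word, c.predicted_pronunciation, c.confidence)
```

Words left out of the selection would keep their automatically generated pronunciations.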

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a computer-readable memory storing executable instructions; and one or more physical computer processors in communication with the computer-readable memory, wherein the one or more physical computer processors are programmed by the executable instructions to at least: automatically predict pronunciations using a grapheme-to-phoneme model for a list of words having manually provided pronunciations, wherein the grapheme-to-phoneme model is adapted to generate a predicted pronunciation and a confidence score for the predicted pronunciation based upon an input word; generate a prediction performance model for the grapheme-to-phoneme model based on: a comparison of automatically predicted pronunciations of the list of words with the manually provided pronunciations for the list of words, and confidence scores for the automatically predicted pronunciations, wherein the prediction performance model is adapted to generate performance information for the grapheme-to-phoneme model, the performance information indicating a degree of confidence in the grapheme-to-phoneme model to predict a new word added to the system based upon an input predicted pronunciation generated by the grapheme-to-phoneme model for the new word and the confidence score for the input predicted pronunciation generated by the grapheme-to-phoneme model; receive an electronic record including candidate words to be added to an automatic speech recognition lexicon, the automatic speech recognition lexicon including words having the automatically predicted pronunciations generated by the grapheme-to-phoneme model; generate, from the grapheme-to-phoneme model, a predicted pronunciation for a candidate word and a confidence score for a candidate word, wherein the candidate word is included in the candidate words; generate, from the prediction performance model, an annotation priority for the candidate word based on the confidence score for the predicted pronunciation of the candidate word and the predicted pronunciation of the candidate word; determine that the annotation priority for the candidate word exceeds a priority threshold; and route the candidate word to a manual pronunciation generator.

2. The system of claim 1, wherein a confidence score for a word is determined using one or more of: a character language model score, a phoneme language model score, a length of the word, or a predicted pronunciation length.

3. The system of claim 1, wherein the one or more physical computer processors are further programmed by the executable instructions to generate the prediction performance model by at least generating a performance regression model based on the manually provided pronunciation for the list of words, the predicted pronunciation for the list of words, and confidence scores for the list of words.

4. The system of claim 1, wherein the one or more physical computer processors are further programmed by the executable instructions to at least: generate, from the prediction performance model, annotation priorities for predicted pronunciations of other candidate words based on the confidence scores for the predicted pronunciations of the other candidate words and the predicted pronunciations of the other candidate words; and determine that the annotation priority of the candidate word exceeds the annotation priorities of the other candidate words.

5. The system of claim 1, the one or more physical computer processors are further programmed by the executable instructions to at least receive automatic speech recognition metrics for the candidate word, wherein generating the annotation priority for the candidate word is further based on the received automatic speech recognition metrics for the candidate word.

6. The system of claim 5, wherein the automatic speech recognition metrics for a word includes linguistic frequency of the word within the lexicon, presentation frequency of the word to an automatic speech recognition system, skip rate of the word, or a correction rate for the word.

7. The system of claim 1, wherein the one or more physical computer processors are programmed by the executable instructions to route the candidate word by at least: obtaining manual annotation resource information identifying an annotation resource; and generating a manual annotation route for the candidate word to the annotation resource based on the manual annotation resource information.

8. The system of claim 7, wherein the one or more physical computer processors are further programmed by the executable instructions to at least: receive manually provided pronunciation information for the candidate word; and retrain the grapheme-to-phoneme model based on the generated manual pronunciation.

9. A computer-implemented method comprising: under control of one or more computing devices configured with specific computer-executable instructions, generating a prediction performance model for a pronunciation prediction model based on: a comparison of an automatically predicted pronunciation of a word with a manually provided pronunciation of the word, and confidence scores for the automatically predicted pronunciations, wherein the prediction performance model is adapted to generate performance information for a grapheme-to-phoneme model, the performance information indicating a degree of confidence in the grapheme-to-phoneme model to predict a new word added to a lexicon based upon an input predicted pronunciation generated by the grapheme-to-phoneme model for the new word and the confidence score for the input predicted pronunciation generated by the grapheme-to-phoneme model; generating, from the pronunciation prediction model, a pronunciation prediction for a candidate word to be added to the lexicon and a confidence score for the candidate word; generating, from the prediction performance model, an annotation priority for the candidate word based on the confidence score for the pronunciation prediction of the candidate word and the predicted pronunciation of the candidate word; generating a value identifying a relationship between the annotation priority for the candidate word and a threshold; and routing the candidate word to one of a manual pronunciation generator or an automatic pronunciation generator based on the value.

10. The computer-implemented method of claim 9, wherein the pronunciation prediction model comprises a grapheme-to-phoneme model.

11. The computer-implemented method of claim 9, wherein a confidence score for a word is determined using one or more of: a character language model score, a phoneme language model score, a length of the word, or a predicted pronunciation length.

12. The computer-implemented method of claim 9, wherein generating the prediction performance model comprises generating a regression model based on the manually provided pronunciation for the word, the predicted pronunciation for the word, and a confidence score for the word.

13. The computer-implemented method of claim 9, wherein routing the candidate word comprises: generating, from the prediction performance model, annotation priorities for predicted pronunciations of other candidate words based on the confidence scores for the predicted pronunciations of the other candidate words and the predicted pronunciations of the other candidate words; and determining that the annotation priority of the candidate word exceeds the annotation priorities of the other candidate words.

14. The computer-implemented method of claim 9, further comprising receiving automatic speech recognition metrics for the candidate word, wherein sel
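The flow recited in claims 1, 3, and 9 (compare predicted and manual pronunciations, fit a performance model on confidence scores, then route by threshold) might be sketched as follows. This is an illustrative simplification, not the patented implementation: it assumes the performance model is a one-variable least-squares regression of phoneme error rate on the G2P confidence score, and it ignores the predicted pronunciation itself as a feature. All function names, the sample pronunciations, the confidence values, and the threshold of 0.3 are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def phoneme_error_rate(predicted, manual):
    """Edit distance normalized by the length of the manual pronunciation."""
    return edit_distance(predicted, manual) / max(len(manual), 1)


def fit_performance_model(confidences, error_rates):
    """Least-squares line: error_rate ~ slope * confidence + intercept."""
    n = len(confidences)
    mean_x = sum(confidences) / n
    mean_y = sum(error_rates) / n
    var_x = sum((x - mean_x) ** 2 for x in confidences)
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(confidences, error_rates))
    slope = cov / var_x if var_x else 0.0
    return slope, mean_y - slope * mean_x


def annotation_priority(model, confidence):
    """Expected error rate for a new word, clamped to [0, 1]."""
    slope, intercept = model
    return min(1.0, max(0.0, slope * confidence + intercept))


def route(model, confidence, threshold):
    """Send the word to manual annotation when its priority exceeds the threshold."""
    if annotation_priority(model, confidence) > threshold:
        return "manual"
    return "automatic"


# Words that already have manual pronunciations (made-up data):
# (predicted phonemes, manual phonemes, G2P confidence)
annotated = [
    ("K AE T".split(), "K AE T".split(), 0.9),
    ("D AO G".split(), "D AO G".split(), 0.8),
    ("K OW L AH N".split(), "K ER N AH L".split(), 0.2),
    ("Y AA CH T".split(), "Y AA T".split(), 0.4),
]
confidences = [conf for _, _, conf in annotated]
error_rates = [phoneme_error_rate(pred, man) for pred, man, _ in annotated]
model = fit_performance_model(confidences, error_rates)

print(route(model, confidence=0.30, threshold=0.3))
print(route(model, confidence=0.85, threshold=0.3))
```

A fuller version could add the features named in claims 2 and 6 (word length, predicted pronunciation length, ASR metrics such as correction rate) as extra regression inputs, and could rank candidates against one another as in claims 4 and 13 instead of using a fixed threshold.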

Assignees

Amazon Tech Inc

Inventors

Classifications

  • G10L15/187 (Primary)

    Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title

  • Speech synthesis; Text to speech systems · CPC title

  • G10L15/18 (Primary)

    using natural language modelling · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9508341B1 cover?
Features are disclosed for active learning to identify the words which are likely to improve the guessing and automatic speech recognition (ASR) after manual annotation. When a speech recognition system needs pronunciations for words, a lexicon is typically used. For unknown words, pronunciation-guessing (G2P) may be included to provide pronunciations in an unattended (e.g., automatic) fashion.…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/187. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 29, 2016 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).