System and method for crowdsourcing of word pronunciation verification
US-2015095031-A1 · Apr 2, 2015 · US
US9508341B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9508341-B1 |
| Application number | US-201414476075-A |
| Country | US |
| Kind code | B1 |
| Filing date | Sep 3, 2014 |
| Priority date | Sep 3, 2014 |
| Publication date | Nov 29, 2016 |
| Grant date | Nov 29, 2016 |
Official abstract text for this publication.
Features are disclosed for active learning to identify the words which are likely to improve the guessing and automatic speech recognition (ASR) after manual annotation. When a speech recognition system needs pronunciations for words, a lexicon is typically used. For unknown words, pronunciation-guessing (G2P) may be included to provide pronunciations in an unattended (e.g., automatic) fashion. However, having manually (e.g., by a human) annotated pronunciations provides better ASR than having automatic pronunciations that may, in some instances, be wrong. The included active learning features help to direct these limited annotation resources.
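The routing idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Candidate` type, the `priority_from_confidence` placeholder (which simply treats low confidence as high priority), and the threshold value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """A word awaiting a pronunciation (hypothetical container type)."""
    word: str
    predicted_pronunciation: str  # output of some G2P model
    confidence: float             # assumed to lie in [0, 1]


def priority_from_confidence(candidate: Candidate) -> float:
    # Placeholder assumption: the less confident the guess, the more
    # valuable a human annotation is likely to be.
    return 1.0 - candidate.confidence


def route(candidates: List[Candidate], threshold: float) -> Tuple[List[str], List[str]]:
    """Split words into (manual, automatic) lists by annotation priority."""
    manual: List[str] = []
    automatic: List[str] = []
    for candidate in candidates:
        if priority_from_confidence(candidate) > threshold:
            manual.append(candidate.word)      # send to a human annotator
        else:
            automatic.append(candidate.word)   # keep the guessed pronunciation
    return manual, automatic
```

With a threshold of 0.5, a word guessed at confidence 0.2 would go to manual annotation, while a word guessed at confidence 0.95 would keep its automatic pronunciation.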
Opening claim text (preview).
What is claimed is:

1. A system comprising:
a computer-readable memory storing executable instructions; and
one or more physical computer processors in communication with the computer-readable memory, wherein the one or more physical computer processors are programmed by the executable instructions to at least:
automatically predict pronunciations using a grapheme-to-phoneme model for a list of words having manually provided pronunciations, wherein the grapheme-to-phoneme model is adapted to generate a predicted pronunciation and a confidence score for the predicted pronunciation based upon an input word;
generate a prediction performance model for the grapheme-to-phoneme model based on: a comparison of automatically predicted pronunciations of the list of words with the manually provided pronunciations for the list of words, and confidence scores for the automatically predicted pronunciations, wherein the prediction performance model is adapted to generate performance information for the grapheme-to-phoneme model, the performance information indicating a degree of confidence in the grapheme-to-phoneme model to predict a new word added to the system based upon an input predicted pronunciation generated by the grapheme-to-phoneme model for the new word and the confidence score for the input predicted pronunciation generated by the grapheme-to-phoneme model;
receive an electronic record including candidate words to be added to an automatic speech recognition lexicon, the automatic speech recognition lexicon including words having the automatically predicted pronunciations generated by the grapheme-to-phoneme model;
generate, from the grapheme-to-phoneme model, a predicted pronunciation for a candidate word and a confidence score for a candidate word, wherein the candidate word is included in the candidate words;
generate, from the prediction performance model, an annotation priority for the candidate word based on the confidence score for the predicted pronunciation of the candidate word and the predicted pronunciation of the candidate word;
determine that the annotation priority for the candidate word exceeds a priority threshold; and
route the candidate word to a manual pronunciation generator.

2. The system of claim 1, wherein a confidence score for a word is determined using one or more of: a character language model score, a phoneme language model score, a length of the word, or a predicted pronunciation length.

3. The system of claim 1, wherein the one or more physical computer processors are further programmed by the executable instructions to generate the prediction performance model by at least generating a performance regression model based on the manually provided pronunciation for the list of words, the predicted pronunciation for the list of words, and confidence scores for the list of words.

4. The system of claim 1, wherein the one or more physical computer processors are further programmed by the executable instructions to at least:
generate, from the prediction performance model, annotation priorities for predicted pronunciations of other candidate words based on the confidence scores for the predicted pronunciations of the other candidate words and the predicted pronunciations of the other candidate words; and
determine that the annotation priority of the candidate word exceeds the annotation priorities of the other candidate words.

5. The system of claim 1, the one or more physical computer processors are further programmed by the executable instructions to at least receive automatic speech recognition metrics for the candidate word, wherein generating the annotation priority for the candidate word is further based on the received automatic speech recognition metrics for the candidate word.

6. The system of claim 5, wherein the automatic speech recognition metrics for a word includes linguistic frequency of the word within the lexicon, presentation frequency of the word to an automatic speech recognition system, skip rate of the word, or a correction rate for the word.

7. The system of claim 1, wherein the one or more physical computer processors are programmed by the executable instructions to route the candidate word by at least:
obtaining manual annotation resource information identifying an annotation resource; and
generating a manual annotation route for the candidate word to the annotation resource based on the manual annotation resource information.

8. The system of claim 7, wherein the one or more physical computer processors are further programmed by the executable instructions to at least:
receive manually provided pronunciation information for the candidate word; and
retrain the grapheme-to-phoneme model based on the generated manual pronunciation.

9. A computer-implemented method comprising:
under control of one or more computing devices configured with specific computer-executable instructions,
generating a prediction performance model for a pronunciation prediction model based on: a comparison of an automatically predicted pronunciation of a word with a manually provided pronunciation of the word, and confidence scores for the automatically predicted pronunciations, wherein the prediction performance model is adapted to generate performance information for a grapheme-to-phoneme model, the performance information indicating a degree of confidence in the grapheme-to-phoneme model to predict a new word added to a lexicon based upon an input predicted pronunciation generated by the grapheme-to-phoneme model for the new word and the confidence score for the input predicted pronunciation generated by the grapheme-to-phoneme model;
generating, from the pronunciation prediction model, a pronunciation prediction for a candidate word to be added to the lexicon and a confidence score for the candidate word;
generating, from the prediction performance model, an annotation priority for the candidate word based on the confidence score for the pronunciation prediction of the candidate word and the predicted pronunciation of the candidate word;
generating a value identifying a relationship between the annotation priority for the candidate word and a threshold; and
routing the candidate word to one of a manual pronunciation generator or an automatic pronunciation generator based on the value.

10. The computer-implemented method of claim 9, wherein the pronunciation prediction model comprises a grapheme-to-phoneme model.

11. The computer-implemented method of claim 9, wherein a confidence score for a word is determined using one or more of: a character language model score, a phoneme language model score, a length of the word, or a predicted pronunciation length.

12. The computer-implemented method of claim 9, wherein generating the prediction performance model comprises generating a regression model based on the manually provided pronunciation for the word, the predicted pronunciation for the word, and a confidence score for the word.

13. The computer-implemented method of claim 9, wherein routing the candidate word comprises:
generating, from the prediction performance model, annotation priorities for predicted pronunciations of other candidate words based on the confidence scores for the predicted pronunciations of the other candidate words and the predicted pronunciations of the other candidate words; and
determining that the annotation priority of the candidate word exceeds the annotation priorities of the other candidate words.

14. The computer-implemented method of claim 9, further comprising receiving automatic speech recognition metrics for the candidate word, wherein sel
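Claims 3 and 12 describe the prediction performance model as a regression built from manual pronunciations, predicted pronunciations, and confidence scores. The sketch below is one possible reading, not the patent's disclosed method: it assumes the regression target is a phoneme error rate (edit distance between predicted and manual phoneme sequences), uses a single confidence feature with ordinary least squares, and adds a hypothetical `frequency_weight` to stand in for the ASR metrics of claims 5 and 6. All function names are illustrative.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def phoneme_error_rate(predicted: Sequence[str], manual: Sequence[str]) -> float:
    """Edit distance normalized by the length of the manual pronunciation."""
    return edit_distance(predicted, manual) / max(len(manual), 1)


def fit_performance_model(confidences: List[float],
                          errors: List[float]) -> Tuple[float, float]:
    """Least-squares line mapping G2P confidence to expected error."""
    n = len(confidences)
    mean_x = sum(confidences) / n
    mean_y = sum(errors) / n
    var_x = sum((x - mean_x) ** 2 for x in confidences)
    if var_x == 0:
        return 0.0, mean_y  # no spread in confidence: predict the mean error
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(confidences, errors))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def annotation_priority(model: Tuple[float, float], confidence: float,
                        frequency_weight: float = 1.0) -> float:
    """Expected error for a new word, scaled by a hypothetical usage weight."""
    slope, intercept = model
    expected_error = max(0.0, slope * confidence + intercept)
    return expected_error * frequency_weight
```

In this reading, the model is fitted once on words that already have manual pronunciations, and `annotation_priority` is then evaluated for each candidate word and compared against a threshold or against the priorities of other candidates.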
Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title
Speech synthesis; Text to speech systems · CPC title
using natural language modelling · CPC title