Wake word evaluation
US-9275637-B1 · Mar 1, 2016 · US
US2016155437A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016155437-A1 |
| Application number | US-201414557751-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 2, 2014 |
| Priority date | Dec 2, 2014 |
| Publication date | Jun 2, 2016 |
| Grant date | — |
Official abstract text for this publication.
Methods, systems, and apparatus are described for inducing a user of a speech recognition system to adjust their own behavior. For example, in one implementation, a speech recognition system that allows children to control electronic devices can improve the child's speech development, by encouraging the child to speak more clearly. To do so, the speech recognition system can generate a phonetic representation of a term spoken by the child, and can determine whether the phonetic representation matches a particular canonical pronunciation of the particular term that is deemed age-appropriate for the child. Upon determining that the particular canonical pronunciation that matches the phonetic representation of the term spoken by the child is not age-appropriate, the speech recognition system can select and implement a variety of remediation strategies for inducing the child to repeat the term using a pronunciation that is considered age-appropriate.
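The decision flow in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes age-appropriateness is stored as a maximum age (one option the claims mention), and the names `CanonicalPronunciation`, `is_age_appropriate`, and `select_remediation`, the phoneme symbols, and the "wabbit"/"rabbit" data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CanonicalPronunciation:
    """One dictionary entry (hypothetical structure for illustration)."""
    term: str
    phonemes: Tuple[str, ...]
    # Assumption: age-appropriateness is a maximum age in years.
    # None means the pronunciation is acceptable at any age.
    max_age: Optional[int] = None


def is_age_appropriate(entry: CanonicalPronunciation, user_age: int) -> bool:
    """A pronunciation is not age-appropriate once the user's age
    exceeds the entry's maximum age."""
    return entry.max_age is None or user_age <= entry.max_age


def select_remediation(entry: CanonicalPronunciation, user_age: int) -> str:
    """Return a label for the next step. Only one of the several
    remediation strategies (prompting the user to repeat) is modeled."""
    if is_age_appropriate(entry, user_age):
        return "initiate_action"
    return "prompt_repeat"


# Invented example data: a childlike and a standard pronunciation of "rabbit".
wabbit = CanonicalPronunciation("rabbit", ("W", "AE", "B", "IH", "T"), max_age=4)
rabbit = CanonicalPronunciation("rabbit", ("R", "AE", "B", "IH", "T"))

if __name__ == "__main__":
    print(select_remediation(wabbit, 3))
    print(select_remediation(wabbit, 7))
    print(select_remediation(rabbit, 7))
```

A real system would choose among several strategies, for example replaying the child's audio or playing an age-appropriate pronunciation; the single `"prompt_repeat"` label stands in for that choice.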
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: receiving audio data corresponding to a user speaking a particular term; generating a phonetic representation of the particular term based on the audio data; determining that the phonetic representation matches a particular canonical pronunciation of a particular term, wherein the particular canonical pronunciation is associated with an indication of age-appropriateness; obtaining data that indicates an age of the user; determining, based on a comparison of (i) the data that indicates the age of the user and (ii) indication of age-appropriateness that is associated with the particular canonical pronunciation of the particular term, that the pronunciation of the particular term by the user is not age-appropriate; based on determining that the pronunciation of the particular term by the user is not age appropriate, selecting a remediation strategy for inducing the user to speak the particular term using an age-appropriate pronunciation.

2. The computer-implemented method of claim 1, comprising: selecting, from among a plurality of canonical pronunciations stored in a phonetic dictionary, the particular canonical pronunciation as a best match of the phonetic representation generated of the particular term.

3. The computer-implemented method of claim 2, comprising: storing, in the phonetic dictionary, a plurality of canonical pronunciations associated with the particular term, wherein the plurality of canonical pronunciations includes the particular canonical pronunciation selected for the particular term, and wherein two or more of the plurality of canonical pronunciations include an indication of age-appropriateness.

4. The computer-implemented method of claim 1, wherein the indication of age-appropriateness comprises a maximum age, and wherein, determining that the pronunciation of the particular term by the user is not age-appropriate comprises determining that the age of the user is greater than the maximum age.

5. The computer-implemented method of claim 1, wherein the remediation strategy for inducing the user to speak the particular term using an age-appropriate pronunciation involves prompting the user to speak the particular term again.

6. The computer-implemented method of claim 1, wherein the remediation strategy for inducing the user to speak the particular term using an age-appropriate pronunciation involves outputting audio data corresponding to a pronunciation of the particular term that is not age-appropriate.

7. The computer-implemented method of claim 6, wherein outputting audio data corresponding to a pronunciation of the particular term that is not age-appropriate comprises outputting the received audio data corresponding to the user speaking the particular term.

8. The computer-implemented method of claim 6, wherein outputting audio data corresponding to a pronunciation of the particular term that is not age-appropriate comprises generating a text-to-speech output using the particular canonical representation that matches the phonetic representation.

9. The computer-implemented method of claim 1, wherein the remediation strategy for inducing the user to speak the particular term using an age-appropriate pronunciation involves (i) selecting another canonical pronunciation of the particular term that is determined to be age-appropriate, and (ii) outputting audio data corresponding to the selected other canonical pronunciation.

10. The computer-implemented method of claim 1, wherein the remediation strategy for inducing the user to speak the particular term using an age-appropriate pronunciation involves initiating an action associated with the particular term despite the determination that the pronunciation of the particular term by the user is not age-appropriate.

11. The computer-implemented method of claim 1, comprising: before selecting a remediation strategy, obtaining biometric data associated with the user; and determining that the biometric data satisfies a predetermined emotional threshold, wherein the remediation strategy is selected based on determining that the biometric data satisfies the predetermined emotional threshold.

12. The computer-implemented method of claim 1, wherein the remediation strategy for inducing the user to speak the particular term using an age-appropriate pronunciation involves (i) detecting another person within a predetermined distance of the user, and (ii) sending a message to the other person indicating that the pronunciation of the particular term by the user is not age-appropriate.

13. The computer-implemented method of claim 1, comprising: after selecting the remediation strategy, receiving additional audio data corresponding to the user speaking the particular term again; generating a phonetic representation of the particular term based on the additional audio data; determining that the phonetic representation of the particular term in the additional audio data matches an age-appropriate canonical pronunciation of the particular term; and based on determining the phonetic representation of the particular term in the additional audio data matches an age-appropriate canonical pronunciation of the particular term, initiating an action associated with the particular term.

14. A system, comprising: one or more computers programmed to perform operations comprising: receiving audio data corresponding to a user speaking a particular term; generating a phonetic representation of the particular term based on the audio data; determining that the phonetic representation matches a particular canonical pronunciation of a particular term, wherein the particular canonical pronunciation is associated with an indication of age-appropriateness; obtaining data that indicates an age of the user; determining, based on a comparison of (i) the data that indicates the age of the user and (ii) indication of age-appropriateness that is associated with the particular canonical pronunciation of the particular term, that the pronunciation of the particular term by the user is not age-appropriate; based on determining that the pronunciation of the particular term by the user is not age appropriate, selecting a remediation strategy for inducing the user to speak the particular term using an age-appropriate pronunciation.

15. The system of claim 14, wherein the operations further comprise selecting from among a plurality of canonical pronunciations stored in a phonetic dictionary, the particular canonical pronunciation as a best match of the phonetic representation generated of the particular term.

16. The system of claim 14, wherein the operations further comprise storing in the phonetic dictionary, a plurality of canonical pronunciations associated with the particular term, wherein the plurality of canonical pronunciations includes the particular canonical pronunciation selected for the particular term, and wherein two or more of the plurality of canonical pronunciations include an indication of age-appropriateness.

17. The system of claim 14, wherein the indication of age-appropriateness comprises a maximum age, and wherein determining that the pronunciation of the particular term by the user is not age-appropriate comprises determining that the age of the user is greater than the maximum age.

18. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to
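
The dictionary lookup in claims 2 and 3 and the alternate-pronunciation strategy in claim 9 can be illustrated as follows. This is a sketch under stated assumptions: the claims do not say how a "best match" is scored, so Levenshtein edit distance over phoneme sequences is swapped in here as one plausible metric. The function names, the dictionary layout, and the phoneme data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed layout: term -> list of (phoneme sequence, maximum age or None).
Entry = Tuple[Tuple[str, ...], Optional[int]]

PHONETIC_DICTIONARY: Dict[str, List[Entry]] = {
    # Invented example entries for the term "rabbit".
    "rabbit": [
        (("R", "AE", "B", "IH", "T"), None),  # acceptable at any age
        (("W", "AE", "B", "IH", "T"), 4),     # acceptable up to age 4
    ],
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def best_match(heard: Sequence[str], entries: List[Entry]) -> Entry:
    """Pick the stored canonical pronunciation closest to what was heard."""
    return min(entries, key=lambda e: edit_distance(heard, e[0]))


def age_appropriate_alternative(entries: List[Entry],
                                user_age: int) -> Optional[Entry]:
    """Return the first stored pronunciation suitable for the user's age,
    which a text-to-speech step could then play back."""
    for entry in entries:
        max_age = entry[1]
        if max_age is None or user_age <= max_age:
            return entry
    return None


if __name__ == "__main__":
    heard = ("W", "AE", "B", "IH", "D")  # slightly noisy "wabbit"
    matched = best_match(heard, PHONETIC_DICTIONARY["rabbit"])
    print(matched)
    print(age_appropriate_alternative(PHONETIC_DICTIONARY["rabbit"], 7))
```

Storing the maximum age next to each pronunciation keeps the age check a simple comparison once the best match is known; a production recognizer would more likely score candidates with acoustic or probabilistic models than with plain edit distance.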
Feature extraction for speech recognition; Selection of recognition unit · CPC title
Phonemes, fenemes or fenones being the recognition units · CPC title
for comparison or discrimination · CPC title
Speaking (with audible presentation of the material to be studied G09B5/04) · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title