System and method for user-specified pronunciation of words for speech synthesis and recognition

US9966060B2 · US · B2

Patent metadata
Publication number: US-9966060-B2
Application number: US-201715445863-A
Country: US
Kind code: B2
Filing date: Feb 28, 2017
Priority date: Jun 7, 2013
Publication date: May 8, 2018
Grant date: May 8, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The method is performed at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors. A first speech input including at least one word is received. A first phonetic representation of the at least one word is determined, the first phonetic representation comprising a first set of phonemes selected from a speech recognition phonetic alphabet. The first set of phonemes is mapped to a second set of phonemes to generate a second phonetic representation, where the second set of phonemes is selected from a speech synthesis phonetic alphabet. The second phonetic representation is stored in association with a text string corresponding to the at least one word.
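The mapping and storage steps described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the phoneme symbols, the `RECOGNITION_TO_SYNTHESIS` table, and the function names are all hypothetical, and the patent does not say which phonetic alphabets are used.

```python
# Hypothetical one-to-one table from a recognizer phoneme inventory
# (ARPAbet-like symbols) to a synthesizer inventory (SAMPA-like symbols).
# Both inventories are assumptions made for this example.
RECOGNITION_TO_SYNTHESIS = {
    "F": "f",
    "IY": "i:",
    "L": "l",
    "P": "p",
}


def map_phonemes(recognition_phonemes):
    """Map each recognition phoneme to its synthesis counterpart."""
    try:
        return [RECOGNITION_TO_SYNTHESIS[p] for p in recognition_phonemes]
    except KeyError as exc:
        raise ValueError(f"no synthesis phoneme for {exc.args[0]!r}") from exc


def store_pronunciation(lexicon, text, recognition_phonemes):
    """Store the second (synthesis) representation keyed by the text string."""
    synthesis_phonemes = map_phonemes(recognition_phonemes)
    lexicon[text] = synthesis_phonemes
    return synthesis_phonemes


# Usage: the first representation would come from a recognizer; it is
# hard-coded here for illustration.
lexicon = {}
store_pronunciation(lexicon, "Philippe", ["F", "IY", "L", "IY", "P"])
```

A real system would likely need a many-to-many mapping with context rules rather than a flat dictionary; the table is used here only to keep the example short.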

First claim

Opening claim text (preview).

What is claimed is:

1. A method for learning word pronunciations, comprising: at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors: detecting an error in a speech based interaction with a digital assistant based on detecting a user input other than the speech based interaction; in response to detecting the error, receiving a speech input from a user, the speech input including a pronunciation of one or more words; determining, based on the pronunciation of the one or more words, a first set of phonemes from a speech recognition phonetic alphabet and a second set of phonemes from a speech synthesis phonetic alphabet; updating one or more databases to include the first set of phonemes and the second set of phonemes in association with a text string corresponding to the one or more words; and performing speech recognition or speech synthesis using the updated one or more databases.

2. The method of claim 1, wherein the one or more words were received in a prior speech input provided by the user, and wherein the error is an error in speech recognition of the one or more words.

3. The method of claim 1, wherein the one or more words were output in a speech output by the electronic device, and wherein the error is an error in speech synthesis of the one or more words.

4. The method of claim 1, further comprising: receiving the speech input including the one or more words; performing speech recognition on the speech input to generate the text string corresponding to the one or more words; determining a confidence metric of the text string; and detecting the error based on a determination that the confidence metric does not meet a predetermined threshold.

5. The method of claim 1, further comprising: synthesizing a speech output including the one or more words; and detecting the error based on an indication from the user that the one or more words were pronounced incorrectly.

6. The method of claim 1, further comprising: performing speech recognition on the speech input to generate the text string corresponding to the one or more words; and wherein updating the one or more databases comprises updating a speech recognizer to associate the first set of phonemes with the text string.

7. The method of claim 6, further comprising: receiving a second speech input including the one or more words; determining a third set of phonemes for the one or more words; determining that the one or more words in the second speech input correspond to the text string based on comparing the first set of phonemes and the third set of phonemes.

8. The method of claim 1, further comprising: prior to receiving the speech input from the user and after detecting the error, prompting the user to provide the speech input, the speech input including a preferred pronunciation of the one or more words.

9. The method of claim 1, further comprising: synthesizing a speech output including the one or more words; and displaying a textual version of the speech output on a display of the electronic device.

10. A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device, cause the device to: detect an error in a speech based interaction with a digital assistant based on detecting a user input other than the speech based interaction; in response to detecting the error, receive a speech input from a user, the speech input including a pronunciation of one or more words; determine, based on the pronunciation of the one or more words, a first set of phonemes from a speech recognition phonetic alphabet and a second set of phonemes from a speech synthesis phonetic alphabet; update one or more databases to include the first set of phonemes and the second set of phonemes in association with a text string corresponding to the one or more words; and perform speech recognition or speech synthesis using the updated one or more databases.

11. The computer readable storage medium of claim 10, wherein the one or more words were received in a prior speech input provided by the user, and wherein the error is an error in speech recognition of the one or more words.

12. The computer readable storage medium of claim 10, wherein the one or more words were output in a speech output by the electronic device, and wherein the error is an error in speech synthesis of the one or more words.

13. The computer readable storage medium of claim 10, wherein the instructions further cause the device to: receive the speech input including the one or more words; perform speech recognition on the speech input to generate the text string corresponding to the one or more words; determine a confidence metric of the text string; and detect the error based on a determination that the confidence metric does not meet a predetermined threshold.

14. The computer readable storage medium of claim 10, wherein the instructions further cause the device to: synthesize a speech output including the one or more words; and detect the error based on an indication from the user that the one or more words were pronounced incorrectly.

15. An electronic device, comprising: one or more processors; and memory storing one or more programs, the one or more programs including instructions, which when executed by the one or more processors, cause the one or more processors to: detect an error in a speech based interaction with a digital assistant based on detecting a user input other than the speech based interaction; in response to detecting the error, receive a speech input from a user, the speech input including a pronunciation of one or more words; determine, based on the pronunciation of the one or more words, a first set of phonemes from a speech recognition phonetic alphabet and a second set of phonemes from a speech synthesis phonetic alphabet; update one or more databases to include the first set of phonemes and the second set of phonemes in association with a text string corresponding to the one or more words; and perform speech recognition or speech synthesis using the updated one or more databases.

16. The device of claim 15, wherein the one or more words were received in a prior speech input provided by the user, and wherein the error is an error in speech recognition of the one or more words.

17. The device of claim 15, wherein the one or more words were output in a speech output by the electronic device, and wherein the error is an error in speech synthesis of the one or more words.

18. The device of claim 15, wherein the instructions further cause the one or more processors to: receive the speech input including the one or more words; perform speech recognition on the speech input to generate the text string corresponding to the one or more words; determine a confidence metric of the text string; and detect the error based on a determination that the confidence metric does not meet a predetermined threshold.

19. The device of claim 15, wherein the instructions further cause the one or more processors to: synthesize a speech output including the one or more words; and detect the error based on an indication from the user that the one or more words were pronounced incorrectly.
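As a rough illustration of how the claimed steps fit together, the sketch below combines the error triggers of claims 1 and 4, the database update of claim 1, and the phoneme comparison of claim 7. Everything named here is an assumption made for the example: the `PronunciationStore` class, the function names, the phoneme symbols, and the threshold value of 0.6 (the claims say only "predetermined threshold").

```python
from dataclasses import dataclass, field

# Hypothetical value; the claims do not state a number.
CONFIDENCE_THRESHOLD = 0.6


@dataclass
class PronunciationStore:
    """Two lookups standing in for the 'one or more databases' of claim 1."""
    recognition: dict = field(default_factory=dict)  # text -> recognition phonemes
    synthesis: dict = field(default_factory=dict)    # text -> synthesis phonemes

    def update(self, text, recognition_phonemes, synthesis_phonemes):
        # Store both phoneme sets in association with the same text string.
        self.recognition[text] = list(recognition_phonemes)
        self.synthesis[text] = list(synthesis_phonemes)


def error_detected(confidence=None, non_speech_correction=False):
    """True if the user corrected via non-speech input (claim 1) or the
    recognizer's confidence does not meet the threshold (claim 4)."""
    low_confidence = confidence is not None and confidence < CONFIDENCE_THRESHOLD
    return non_speech_correction or low_confidence


def matches_stored(store, text, candidate_phonemes):
    """Claim 7 style check: compare newly derived phonemes with the stored set."""
    return store.recognition.get(text) == list(candidate_phonemes)


# Usage: a low-confidence recognition triggers a pronunciation update.
store = PronunciationStore()
if error_detected(confidence=0.4):
    store.update("Philippe",
                 ["F", "IY", "L", "IY", "P"],      # recognition alphabet (assumed)
                 ["f", "i:", "l", "i:", "p"])      # synthesis alphabet (assumed)
```

The exact-equality comparison in `matches_stored` is a simplification; a practical recognizer would more plausibly score phoneme sequences by similarity than require an exact match.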

Assignees

Apple Inc

Inventors

Classifications

  • G10L13/027 · Primary

    Concept to speech synthesisers; Generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text G10L13/08) · CPC title

  • Interactive procedures · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Training · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9966060B2 cover?
The method is performed at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors. A first speech input including at least one word is received. A first phonetic representation of the at least one word is determined, the first phonetic representation comprising a first set of phonemes selected from a speech recognitio…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G10L13/027. Mapped technology areas include Physics.
When was this patent published?
Publication date May 8, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).