System and method for out-of-vocabulary phrase support in automatic speech recognition
US-2021343277-A1 · Nov 4, 2021 · US
US11862152B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11862152-B2 |
| Application number | US-202117214462-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 26, 2021 |
| Priority date | Mar 26, 2021 |
| Publication date | Jan 2, 2024 |
| Grant date | Jan 2, 2024 |
Disclosed herein are system, apparatus, article of manufacture, method, and computer program product embodiments for adapting an automated speech recognition system to provide more accurate suggestions in response to voice queries involving media content, including recently created or recently available content. An example computer-implemented method includes transcribing the voice query, identifying respective components of the query such as the media content being requested and the action to be performed, and generating fuzzy candidates that potentially match the media content based on phonetic representations of the identified components. Phonetic representations of domain-specific candidates are stored in a domain entities index, which is continuously updated with new entries so as to maintain the accuracy of speech recognition for voice queries about recently created or recently available content.
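The flow summarized in the abstract (split the transcribed query into an action and requested media content, then look up candidates by phonetic form in an index that grows as new titles appear) can be illustrated with a small sketch. Everything in it is an assumption for illustration, not the patented implementation: the `ACTIONS` list, the names `parse_query`, `phonetic_key`, and `DomainEntityIndex`, and the crude consonant-skeleton key, which stands in for a real phoneme representation. Exact lookup on a lossy key is the simplest possible fuzzy match, but it already shows how a title added to the index independently of the ASR engine can be retrieved from a misrecognized transcription.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical action vocabulary; the patent text shown here does not list one.
ACTIONS = ("search for", "play", "watch", "open", "find")


def parse_query(transcription: str) -> tuple[str | None, str]:
    """Split a transcription into (action, requested media text).

    Assumes the action, when present, is a leading verb phrase.
    """
    text = " ".join(transcription.lower().split())
    for action in sorted(ACTIONS, key=len, reverse=True):
        if text.startswith(action + " "):
            return action, text[len(action) + 1:]
    return None, text


def _word_key(word: str) -> str:
    # Hypothetical lossy key: first letter, then consonants with repeats collapsed.
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    for c in letters[1:]:
        if c not in "aeiouyhw" and c != out[-1]:
            out.append(c)
    return "".join(out)


def phonetic_key(text: str) -> str:
    """Per-word lossy keys joined with spaces."""
    return " ".join(k for k in map(_word_key, text.split()) if k)


class DomainEntityIndex:
    """Maps a lossy phonetic key to the set of titles that share it."""

    def __init__(self) -> None:
        self._by_key: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, title: str) -> None:
        # Titles are added independently of the ASR engine, e.g. whenever
        # new content becomes available.
        self._by_key[phonetic_key(title)].add(title)

    def lookup(self, media_text: str) -> set[str]:
        return set(self._by_key.get(phonetic_key(media_text), ()))


index = DomainEntityIndex()
for title in ("Stranger Things", "Ozark"):
    index.add(title)
action, media = parse_query("Play strainger things")
print(action, index.lookup(media))  # play {'Stranger Things'}
```

Both "strainger things" and "Stranger Things" reduce to the key `strngr tngs`, so the misrecognized spelling still reaches the catalog entry.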
What is claimed is:
1. A computer-implemented method, by a display device, for adapting an automatic speech recognition engine, comprising: receiving a voice query that includes an action and requested media content; generating a transcription of the voice query, wherein the transcription is generated using the automatic speech recognition engine and wherein the transcription includes a textual representation of the requested media content; parsing the transcription to identify an entity corresponding to the textual representation of the media content; generating a phonetic representation of the entity, wherein the phonetic representation includes at least one of a grapheme of the entity, a phoneme of the entity, and an N-gram of the entity; generating, based on the phonetic representation, a fuzzy candidate list comprising a plurality of fuzzy candidates representing potential matches to the requested media content, wherein generating the fuzzy candidate list comprises: utilizing a lossy phonetic form of the phonetic representation to generate a predetermined number of candidates; and utilizing a precise phonetic form of the phonetic representation to generate the fuzzy candidate list by reducing the predetermined number of candidates; ranking the fuzzy candidate list to form a ranked fuzzy candidate list including a highest ranked fuzzy candidate corresponding to a best potential match for the requested media content; displaying, on the display device, the ranked fuzzy candidate list; receiving, from a remote control in communication with the display device, user input for a selected fuzzy candidate from the ranked fuzzy candidate list; and performing the action on the selected fuzzy candidate.
2. The computer-implemented method of claim 1, wherein the highest ranked fuzzy candidate corresponding to the best potential match for the requested media content is determined based on a ranking criteria including at least one of a phonetic edit distance, a popularity score, a match count, a longest common sequence score, and a nospace overlap score.
3. The computer-implemented method of claim 1, wherein the action includes at least one of receiving a selection of the highest ranked fuzzy candidate from the display device, retrieving the highest ranked fuzzy candidate from a database, and sending the ranked fuzzy list including the highest ranked fuzzy candidate to the display device.
4. The computer-implemented method of claim 1, further comprising: receiving, from an entertainment domain entity source, a second media content; converting the second media content into a second phoneme, wherein the second phoneme is a phonetic representation of the second media content; and storing the second media content and the second phoneme as an entry in a domain entity index.
5. The computer-implemented method of claim 1, further comprising: determining an intent of the voice query based on the action and the requested media content.
6. The computer-implemented method of claim 5, wherein the intent of the voice query is a content request and the action is a command to play the requested media content.
7. The computer-implemented method of claim 1, wherein the generating further comprises: performing a grapheme search based on the grapheme of the entity to identify at least one fuzzy grapheme candidate based on a spelling comparison between the grapheme of the entity and the at least one fuzzy grapheme candidate, wherein the plurality of fuzzy candidates comprises the at least one fuzzy grapheme candidate.
8. The computer-implemented method of claim 7, wherein the spelling comparison comprises: using the grapheme of the entity to search for a grapheme candidate in a domain entity index; and identifying the grapheme candidate as the at least one fuzzy grapheme candidate based on matching a spelling of the grapheme to a spelling of the grapheme candidate.
9. The computer-implemented method of claim 8, wherein the domain entity index comprises an entry associated with the grapheme candidate, the computer-implemented method further comprising: populating the entry with the spelling of the grapheme candidate independently of the automatic speech recognition engine; and retrieving, from the entry, the spelling of the grapheme candidate.
10. The computer-implemented method of claim 9, wherein the domain entity index comprises a plurality of entries, including the entry, associated with a plurality of grapheme candidates and wherein the domain entity index is updated on a continuous basis.
11. The computer-implemented method of claim 7, wherein the generating further comprises: performing a phoneme search based on the phoneme of the entity to identify at least one fuzzy phoneme match based on a phonetic comparison between the phoneme of the entity and the at least one fuzzy phoneme candidate, wherein the plurality of fuzzy candidates further comprises the at least one fuzzy phoneme candidate.
12. The computer-implemented method of claim 11, wherein the phonetic comparison comprises: using the phoneme of the entity to search for a phoneme candidate in a domain entity index; and identifying the phoneme candidate as the at least one fuzzy phone candidate based on a phonetic matching between the phoneme of the entity and the phoneme candidate.
13. The computer-implemented method of claim 12, wherein the domain entity index comprises an entry associated with the phoneme candidate, the computer-implemented method further comprising: populating the entry with the phoneme candidate independently of the automatic speech recognition engine; and retrieving, from the entry, the phoneme candidate.
14. The computer-implemented method of claim 11, wherein the generating further comprises: performing an N-gram search based on the N-gram of the entity to identify at least one fuzzy N-gram match based on an N-gram comparison between the entity and the at least one fuzzy N-gram candidate, wherein the plurality of fuzzy candidates further comprises the at least one fuzzy N-gram match.
15. The computer-implemented method of claim 14, wherein the ranking further comprises: ranking the at least one fuzzy grapheme match, the at least one fuzzy N-gram match, the at least one fuzzy phoneme match in the fuzzy candidate list to form the ranked candidate list.
16. The computer-implemented method of claim 14, wherein the N-gram comparison comprises: using the N-gram of the entity to search for an N-gram candidate in a domain entity index; and identifying the N-gram candidate as the at least one fuzzy N-gram candidate based on matching the N-gram of the entity to an N-gram of the N-gram candidate.
17. The computer-implemented method of claim 16, wherein the domain entity index comprises an entry associated with the phoneme candidate, the computer-implemented method further comprising: retrieving, from the entry, the N-gram of the N-gram candidate.
18. An apparatus comprising: a memory; and a processor communicatively coupled to the memory and configured to: receive a voice query including an action and requested media content; generate a transcription of the voice query, wherein the transcription is generated using an automatic speech recognition engine and wherein the transcription includes a textual representation of the requested media content and wherein the textual representation is an imperfect match to the requested media content; generate a phonetic representation of the textual representation of the requested media content; generate,
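The two-stage candidate generation of claim 1 (a lossy phonetic form produces a predetermined number of candidates, then a precise form prunes them) and part of the ranking in claim 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: per-word American Soundex stands in for the lossy phonetic form; the normalized spelling with spaces removed stands in for the precise phonetic form, where a real system would compare phoneme sequences from a pronunciation lexicon or grapheme-to-phoneme model; and `Entity`, `fuzzy_candidates`, `k`, and `max_ratio` are hypothetical names. Ranking models only two of claim 2's criteria, edit distance (here on the stand-in precise form) and popularity score; match count, longest common sequence, and nospace overlap are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass

# American Soundex digit classes; vowels, y, h and w receive no digit.
_SOUNDEX = {ch: digit
            for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                   "4": "l", "5": "mn", "6": "r"}.items()
            for ch in letters}


def soundex(word: str) -> str:
    """Four-character American Soundex code: a lossy phonetic key."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out, prev = [letters[0].upper()], _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX.get(c, "")
        if code and code != prev:
            out.append(code)
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return ("".join(out) + "000")[:4]


def lossy_form(text: str) -> set[str]:
    return {soundex(w) for w in text.split()} - {""}


def precise_form(text: str) -> str:
    # Stand-in for a phoneme string: normalized spelling without spaces.
    return "".join(c for c in text.lower() if c.isalnum())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


@dataclass
class Entity:
    title: str
    popularity: float  # hypothetical popularity score


def fuzzy_candidates(query: str, entities: list[Entity],
                     k: int = 10, max_ratio: float = 0.4) -> list[Entity]:
    # Stage 1: the lossy form keeps at most k entities, ordered by Soundex overlap.
    q_lossy = lossy_form(query)
    scored = sorted(((len(q_lossy & lossy_form(e.title)), e) for e in entities),
                    key=lambda t: -t[0])
    stage1 = [e for overlap, e in scored[:k] if overlap > 0]
    # Stage 2: the precise form drops candidates whose edit distance is too large.
    q_precise = precise_form(query)
    kept = []
    for e in stage1:
        p = precise_form(e.title)
        d = edit_distance(q_precise, p)
        if d <= max_ratio * max(len(q_precise), len(p)):
            kept.append((d, e))
    # Rank: smallest distance first, ties broken by higher popularity.
    kept.sort(key=lambda t: (t[0], -t[1].popularity))
    return [e for _, e in kept]


catalog = [Entity("Stranger Things", 0.9), Entity("Strange World", 0.5), Entity("Ozark", 0.7)]
print([e.title for e in fuzzy_candidates("strainger things", catalog)])  # ['Stranger Things']
```

For the misrecognized query "strainger things", stage one keeps "Stranger Things" and "Strange World" (both share the Soundex code S365) and drops "Ozark"; stage two then discards "Strange World" because its spelling is at least 7 edits away from the query, above the threshold of 6.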
Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title
Parsing · CPC title
Named entity recognition · CPC title
Parsing for meaning understanding · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title