Dynamic domain-adapted automatic speech recognition system

US11862152B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11862152-B2
Application number	US-202117214462-A
Country	US
Kind code	B2
Filing date	Mar 26, 2021
Priority date	Mar 26, 2021
Publication date	Jan 2, 2024
Grant date	Jan 2, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are system, apparatus, article of manufacture, method, and computer program product embodiments for adapting an automated speech recognition system to provide more accurate suggestions to voice queries involving media content including recently created or recently available content. An example computer-implemented method includes transcribing the voice query, identifying respective components of the query such as the media content being requested and the action to be performed, and generating fuzzy candidates that potentially match the media content based on phonetic representations of the identified components. Phonetic representations of domain specific candidates are stored in a domain entities index and is continuously updated with new entries so as to maintain the accuracy of the speech recognition of voice queries for recently created or recently available content.
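The abstract's idea of a domain entities index that stores phonetic representations and keeps accepting new entries can be illustrated with a minimal sketch. This is not the patent's implementation: the page does not say which phonetic encoding is used, so `lossy_key` below is a crude, hypothetical stand-in (drop vowels, collapse repeated letters), and the class name `DomainEntityIndex` and the title "Severance" are illustrative assumptions only.

```python
import re
from collections import defaultdict


def lossy_key(text: str) -> str:
    """Hypothetical stand-in for a phonetic encoder (an assumption, not the patent's).

    Lowercases, keeps letters only, drops vowels and h/w/y after the first
    letter, then collapses runs of the same letter.
    """
    letters = re.sub(r"[^a-z]", "", text.lower())
    if not letters:
        return ""
    tail = re.sub(r"[aeiouhwy]", "", letters[1:])
    return re.sub(r"(.)\1+", r"\1", letters[0] + tail)


class DomainEntityIndex:
    """Maps a phonetic key to the media titles that share it."""

    def __init__(self) -> None:
        self._by_key: dict[str, set[str]] = defaultdict(set)

    def add(self, title: str) -> None:
        # New titles can be added at any time, independently of the ASR engine.
        self._by_key[lossy_key(title)].add(title)

    def lookup(self, transcribed_entity: str) -> list[str]:
        # .get() avoids creating empty entries for unknown keys.
        return sorted(self._by_key.get(lossy_key(transcribed_entity), ()))


if __name__ == "__main__":
    index = DomainEntityIndex()
    print(index.lookup("Severence"))  # title not indexed yet
    index.add("Severance")            # newly available content arrives
    print(index.lookup("Severence"))  # the misspelled transcription now resolves
```

The point of the sketch is the update path: once a new title is indexed, a misspelled transcription can be matched to it without retraining the speech recognition engine.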

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, by a display device, for adapting an automatic speech recognition engine, comprising: receiving a voice query that includes an action and requested media content; generating a transcription of the voice query, wherein the transcription is generated using the automatic speech recognition engine and wherein the transcription includes a textual representation of the requested media content; parsing the transcription to identify an entity corresponding to the textual representation of the media content; generating a phonetic representation of the entity, wherein the phonetic representation includes at least one of a grapheme of the entity, a phoneme of the entity, and an N-gram of the entity; generating, based on the phonetic representation, a fuzzy candidate list comprising a plurality of fuzzy candidates representing potential matches to the requested media content, wherein generating the fuzzy candidate list comprises: utilizing a lossy phonetic form of the phonetic representation to generate a predetermined number of candidates; and utilizing a precise phonetic form of the phonetic representation to generate the fuzzy candidate list by reducing the predetermined number of candidates; ranking the fuzzy candidate list to form a ranked fuzzy candidate list including a highest ranked fuzzy candidate corresponding to a best potential match for the requested media content; displaying, on the display device, the ranked fuzzy candidate list; receiving, from a remote control in communication with the display device, user input for a selected fuzzy candidate from the ranked fuzzy candidate list; and performing the action on the selected fuzzy candidate.

2. The computer-implemented method of claim 1, wherein the highest ranked fuzzy candidate corresponding to the best potential match for the requested media content is determined based on a ranking criteria including at least one of a phonetic edit distance, a popularity score, a match count, a longest common sequence score, and a nospace overlap score.

3. The computer-implemented method of claim 1, wherein the action includes at least one of receiving a selection of the highest ranked fuzzy candidate from the display device, retrieving the highest ranked fuzzy candidate from a database, and sending the ranked fuzzy list including the highest ranked fuzzy candidate to the display device.

4. The computer-implemented method of claim 1, further comprising: receiving, from an entertainment domain entity source, a second media content; converting the second media content into a second phoneme, wherein the second phoneme is a phonetic representation of the second media content; and storing the second media content and the second phoneme as an entry in a domain entity index.

5. The computer-implemented method of claim 1, further comprising: determining an intent of the voice query based on the action and the requested media content.

6. The computer-implemented method of claim 5, wherein the intent of the voice query is a content request and the action is a command to play the requested media content.

7. The computer-implemented method of claim 1, wherein the generating further comprises: performing a grapheme search based on the grapheme of the entity to identify at least one fuzzy grapheme candidate based on a spelling comparison between the grapheme of the entity and the at least one fuzzy grapheme candidate, wherein the plurality of fuzzy candidates comprises the at least one fuzzy grapheme candidate.

8. The computer-implemented method of claim 7, wherein the spelling comparison comprises: using the grapheme of the entity to search for a grapheme candidate in a domain entity index; and identifying the grapheme candidate as the at least one fuzzy grapheme candidate based on matching a spelling of the grapheme to a spelling of the grapheme candidate.

9. The computer-implemented method of claim 8, wherein the domain entity index comprises an entry associated with the grapheme candidate, the computer-implemented method further comprising: populating the entry with the spelling of the grapheme candidate independently of the automatic speech recognition engine; and retrieving, from the entry, the spelling of the grapheme candidate.

10. The computer-implemented method of claim 9, wherein the domain entity index comprises a plurality of entries, including the entry, associated with a plurality of grapheme candidates and wherein the domain entity index is updated on a continuous basis.

11. The computer-implemented method of claim 7, wherein the generating further comprises: performing a phoneme search based on the phoneme of the entity to identify at least one fuzzy phoneme match based on a phonetic comparison between the phoneme of the entity and the at least one fuzzy phoneme candidate, wherein the plurality of fuzzy candidates further comprises the at least one fuzzy phoneme candidate.

12. The computer-implemented method of claim 11, wherein the phonetic comparison comprises: using the phoneme of the entity to search for a phoneme candidate in a domain entity index; and identifying the phoneme candidate as the at least one fuzzy phone candidate based on a phonetic matching between the phoneme of the entity and the phoneme candidate.

13. The computer-implemented method of claim 12, wherein the domain entity index comprises an entry associated with the phoneme candidate, the computer-implemented method further comprising: populating the entry with the phoneme candidate independently of the automatic speech recognition engine; and retrieving, from the entry, the phoneme candidate.

14. The computer-implemented method of claim 11, wherein the generating further comprises: performing an N-gram search based on the N-gram of the entity to identify at least one fuzzy N-gram match based on an N-gram comparison between the entity and the at least one fuzzy N-gram candidate, wherein the plurality of fuzzy candidates further comprises the at least one fuzzy N-gram match.

15. The computer-implemented method of claim 14, wherein the ranking further comprises: ranking the at least one fuzzy grapheme match, the at least one fuzzy N-gram match, the at least one fuzzy phoneme match in the fuzzy candidate list to form the ranked candidate list.

16. The computer-implemented method of claim 14, wherein the N-gram comparison comprises: using the N-gram of the entity to search for an N-gram candidate in a domain entity index; and identifying the N-gram candidate as the at least one fuzzy N-gram candidate based on matching the N-gram of the entity to an N-gram of the N-gram candidate.

17. The computer-implemented method of claim 16, wherein the domain entity index comprises an entry associated with the phoneme candidate, the computer-implemented method further comprising: retrieving, from the entry, the N-gram of the N-gram candidate.

18. An apparatus comprising: a memory; and a processor communicatively coupled to the memory and configured to: receive a voice query including an action and requested media content; generate a transcription of the voice query, wherein the transcription is generated using an automatic speech recognition engine and wherein the transcription includes a textual representation of the requested media content and wherein the textual representation is an imperfect match to the requested media content; generate a phonetic representation of the textual representation of the requested media content; generate,
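The two-stage candidate generation in claim 1 (a lossy phonetic form builds a fixed-size pool, a precise form then reduces it) and the ranking criteria in claim 2 can be sketched roughly as follows. Everything here is an assumption for illustration: the claim text shown on this page does not specify the encodings, so `lossy_form` is a crude vowel-dropping key, `precise_form` is just the normalized spelling standing in for a real phoneme sequence, and the catalog titles, popularity scores, `pool_size`, and `max_distance` are hypothetical. Only two of the claim-2 criteria (edit distance and popularity) are used.

```python
import re


def lossy_form(text: str) -> str:
    """Hypothetical lossy key: drop vowels/h/w/y after the first letter, collapse repeats."""
    letters = re.sub(r"[^a-z]", "", text.lower())
    if not letters:
        return ""
    tail = re.sub(r"[aeiouhwy]", "", letters[1:])
    return re.sub(r"(.)\1+", r"\1", letters[0] + tail)


def precise_form(text: str) -> str:
    """Stand-in for a precise phonetic form: here, only the normalized spelling."""
    return re.sub(r"[^a-z]", "", text.lower())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def fuzzy_candidates(entity: str, catalog: dict[str, int],
                     pool_size: int = 3, max_distance: int = 2) -> list[str]:
    """catalog maps a title to a (hypothetical) popularity score."""
    # Stage 1: the lossy form selects a predetermined number of candidates.
    key = lossy_form(entity)
    pool = sorted(catalog, key=lambda t: edit_distance(key, lossy_form(t)))[:pool_size]

    # Stage 2: the precise form reduces that pool.
    target = precise_form(entity)
    scored = [(edit_distance(target, precise_form(t)), t) for t in pool]
    kept = [(d, t) for d, t in scored if d <= max_distance]

    # Ranking: smaller distance first, higher popularity breaks ties.
    kept.sort(key=lambda pair: (pair[0], -catalog[pair[1]]))
    return [title for _, title in kept]


if __name__ == "__main__":
    catalog = {"Severance": 90, "Severed": 10, "Succession": 95, "Seinfeld": 80}
    print(fuzzy_candidates("severence", catalog))
```

Splitting the work this way keeps the expensive precise comparison off most of the catalog: the cheap lossy pass is deliberately over-inclusive, and the precise pass trims the result before ranking.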

Assignees

Roku Inc

Inventors

Classifications

Code	CPC title
G10L15/187 (primary)	Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
G06F40/205	Parsing
G06F40/295	Named entity recognition
G10L15/1822	Parsing for meaning understanding
G10L15/22	Procedures used during a speech recognition process, e.g. man-machine dialogue

Patent family

Related publications grouped by family.

View patent family 80952365

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11862152B2 cover?: Disclosed herein are system, apparatus, article of manufacture, method, and computer program product embodiments for adapting an automated speech recognition system to provide more accurate suggestions to voice queries involving media content including recently created or recently available content. An example computer-implemented method includes transcribing the voice query, identifying respec…
Who is the assignee on this patent?: Roku Inc
What technology area does this patent fall under?: Primary CPC classification G10L15/187. Mapped technology areas include Physics.
When was this patent published?: Publication date Jan 2, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).