Dynamic domain-adapted automatic speech recognition system
US-11862152-B2 · Jan 2, 2024 · US
US12374328B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12374328-B2 |
| Application number | US-202318511077-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 16, 2023 |
| Priority date | Mar 26, 2021 |
| Publication date | Jul 29, 2025 |
| Grant date | Jul 29, 2025 |
Official abstract text for this publication.
Disclosed herein are system, apparatus, article of manufacture, method, and computer program product embodiments for adapting an automated speech recognition system to provide more accurate suggestions to voice queries involving media content including recently created or recently available content. An example computer-implemented method includes transcribing the voice query, identifying respective components of the query such as the media content being requested and the action to be performed, and generating fuzzy candidates that potentially match the media content based on phonetic representations of the identified components. Phonetic representations of domain-specific candidates are stored in a domain entities index, which is continuously updated with new entries so as to maintain the accuracy of the speech recognition of voice queries for recently created or recently available content.
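The flow described in the abstract (split the transcription into an action and a title, encode the title phonetically, look it up in an index that keeps receiving new entries) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `ACTIONS`, `parse_query`, `phonetic_key`, and `DomainEntitiesIndex` are hypothetical names, the Soundex-style encoding is a stand-in for whatever phonetic representation a real system would use, and the titles are sample data.

```python
from collections import defaultdict

# Hypothetical action vocabulary; the patent does not enumerate one.
ACTIONS = {"play", "watch", "open", "find"}

# Soundex-style consonant classes, used here only as a stand-in
# phonetic encoding (an assumption, not taken from the patent).
_CODES = {
    ch: digit
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in group
}


def parse_query(transcription: str) -> tuple[str, str]:
    """Split a transcription into (action, title text); action may be ''."""
    words = transcription.lower().split()
    if words and words[0] in ACTIONS:
        return words[0], " ".join(words[1:])
    return "", " ".join(words)


def phonetic_key(text: str) -> str:
    """Encode each word as its first letter plus consonant-class digits."""
    keys = []
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        if not letters:
            continue
        key, prev = letters[0], _CODES.get(letters[0], "")
        for c in letters[1:]:
            code = _CODES.get(c, "")
            if code and code != prev:
                key += code
            prev = code
        keys.append(key)
    return " ".join(keys)


class DomainEntitiesIndex:
    """Maps phonetic keys to known titles; new titles can be added any time."""

    def __init__(self) -> None:
        self._by_key = defaultdict(set)

    def add(self, title: str) -> None:
        self._by_key[phonetic_key(title)].add(title)

    def candidates(self, transcribed_title: str) -> list[str]:
        return sorted(self._by_key.get(phonetic_key(transcribed_title), ()))


index = DomainEntitiesIndex()
for title in ("Severance", "Andor"):
    index.add(title)

action, heard = parse_query("play severence")
matches = index.candidates(heard)   # imperfect spelling still finds the title

index.add("Shogun")                 # a newly available title is indexed later
late_matches = index.candidates("showgun")
```

Keying the index by phonetic code means a misspelled transcription such as "severence" lands in the same bucket as the catalog title, and adding a title is a single dictionary update, which is one plausible way to keep the index current.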
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for adapting an automatic speech recognition engine implemented within a multimedia environment, comprising: receiving, at a user device, a voice query that includes identification of requested media content and an action to be performed on the requested media content, wherein the identification of the requested media content comprises a first entity representing a title of the requested media content and a second entity representing at least one metadata associated with the requested media content; generating a transcription of the voice query, wherein the transcription is generated using the automatic speech recognition engine, wherein the transcription includes an imperfect textual representation of the requested media content; parsing the transcription to identify the first entity and the second entity; generating a phonetic representation of the requested media content based on the first entity and the second entity; generating, based on the phonetic representation, a fuzzy candidate list comprising a plurality of fuzzy candidates representing potential matches to the requested media content, wherein each fuzzy candidate of the plurality of fuzzy candidates is associated with a popularity score; ranking the fuzzy candidate list to form a ranked fuzzy candidate list including a highest ranked fuzzy candidate corresponding to a best potential match for the requested media content, wherein the highest ranked fuzzy candidate corresponding to the best potential match for the requested media content is determined based on a comparison of the popularity score of each fuzzy candidate of the plurality of fuzzy candidates; and performing the action associated with the highest ranked fuzzy candidate.
2. The computer-implemented method of claim 1, wherein the user device is one of a remote control, a media device, or a display device.
3. The computer-implemented method of claim 1, wherein a first fuzzy candidate of the plurality of fuzzy candidates is associated with a first popularity score and a second fuzzy candidate of the plurality of fuzzy candidates is associated with a second popularity score.
4. The computer-implemented method of claim 3, wherein each fuzzy candidate is associated with a match count including the first fuzzy candidate being associated with a first match count and the second fuzzy candidate being associated with a second match count, wherein the match count comprises a numerical value indicating a number of matching strategies that indicate each fuzzy candidate as being a quality match to the requested media content.
5. The computer-implemented method of claim 4, wherein the matching strategies comprises at least two of grapheme spelling, grapheme n-gram, and phoneme.
6. The computer-implemented method of claim 3, further comprising: determining the first popularity score of the first fuzzy candidate based on a first frequency of performed actions associated with the first fuzzy candidate within the multimedia environment; and determining the second popularity score of the second fuzzy candidate based on a second frequency of performed actions associated with the second fuzzy candidate with the multimedia environment.
7. The computer-implemented method of claim 6, wherein the performed actions are based on metrics associated with the first fuzzy candidate and the second fuzzy candidate collected within the multimedia environment.
8. The computer-implemented method of claim 6, wherein the first frequency of performed actions comprises at least one of a number of times the first fuzzy candidate was streamed within the multimedia environment or a number of times the first fuzzy candidate was requested within the multimedia environment.
9. An apparatus implemented within a multimedia environment comprising: a memory; and at least one processor communicatively coupled to the memory and configured to: receive a voice query that includes identification of requested media content and an action to be performed on the requested media content, wherein the identification of the requested media content comprises a first entity representing a title of the requested media content and a second entity representing at least one metadata associated with the requested media content; generate a transcription of the voice query, wherein the transcription is generated using an automatic speech recognition engine, wherein the transcription includes an imperfect textual representation of the requested media content; parse the transcription to identify the first entity and the second entity; generate a phonetic representation of the requested media content based on the first entity and the second entity; generate, based on the phonetic representation, a fuzzy candidate list comprising a plurality of fuzzy candidates representing potential matches to the requested media content, wherein each fuzzy candidate is associated with a popularity score; rank the fuzzy candidate list to form a ranked fuzzy candidate list including a highest ranked fuzzy candidate corresponding to a best potential match for the requested media content, wherein the highest ranked fuzzy candidate corresponding to the best potential match for the requested media content is determined based on a comparison of the popularity score of each fuzzy candidate of the plurality of fuzzy candidates; and perform the action associated with the highest ranked fuzzy candidate.
10. The apparatus of claim 9, wherein the apparatus is implemented as one of a remote control, a media device, or a display device.
11. The apparatus of claim 9, wherein a first fuzzy candidate of the plurality of fuzzy candidates is associated with a first popularity score and a second fuzzy candidate of the plurality of fuzzy candidates is associated with a second popularity score.
12. The apparatus of claim 11, wherein each fuzzy candidate is associated with a match count including the first fuzzy candidate being associated with a first match count and the second fuzzy candidate being associated with a second match count, wherein the match count comprises a numerical value indicating a number of matching strategies that indicate each fuzzy candidate as being a quality match to the requested media content.
13. The apparatus of claim 12, wherein the matching strategies comprises at least two of grapheme spelling, grapheme n-gram, and phoneme.
14. The apparatus of claim 11, wherein the at least one processor is further configured to: determine the first popularity score of the first fuzzy candidate based on a first frequency of performed actions associated with the first fuzzy candidate within the multimedia environment; and determine the second popularity score of the second fuzzy candidate based on a second frequency of performed actions associated with the second fuzzy candidate with the multimedia environment.
15. The apparatus of claim 14, wherein the first frequency of performed actions comprises at least one of a number of times the first fuzzy candidate was streamed within the multimedia environment or a number of times the first fuzzy candidate was requested within the multimedia environment.
16. A non-transitory computer-readable medium storing instructions, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform operations comprising: receiving, at a user device, a voice query that includes identification of requested media content and an action to be performed on the requested media content, wherein the identification of the reques
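The match count and popularity ranking recited in the claims can be illustrated with a small sketch. Everything here is an assumption made for illustration: the three matcher functions are simple stand-ins for the grapheme spelling, grapheme n-gram, and phoneme strategies, the thresholds are arbitrary, popularity is taken as the plain sum of stream and request counts, and the catalog figures are sample data. The claim text shown here does not say how match count feeds into ranking, so this sketch uses it only to drop zero-match titles and to break popularity ties.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


def spelling_match(query: str, title: str, threshold: float = 0.8) -> bool:
    """Grapheme spelling: overall string similarity (threshold is assumed)."""
    return SequenceMatcher(None, query, title).ratio() >= threshold


def _bigrams(text: str) -> set:
    return {text[i:i + 2] for i in range(len(text) - 1)}


def ngram_match(query: str, title: str, threshold: float = 0.5) -> bool:
    """Grapheme n-gram: Jaccard overlap of character bigrams."""
    a, b = _bigrams(query), _bigrams(title)
    union = a | b
    return bool(union) and len(a & b) / len(union) >= threshold


def _skeleton(text: str) -> str:
    """Crude phoneme stand-in: drop non-initial vowels, collapse repeats."""
    letters = [c for c in text if c.isalpha()]
    out = []
    for i, c in enumerate(letters):
        if i > 0 and c in "aeiouyhw":
            continue
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)


def phoneme_match(query: str, title: str) -> bool:
    return _skeleton(query) == _skeleton(title)


STRATEGIES = (spelling_match, ngram_match, phoneme_match)


@dataclass
class Candidate:
    title: str
    streams: int    # times streamed in the environment (sample value)
    requests: int   # times requested in the environment (sample value)
    match_count: int = 0

    @property
    def popularity(self) -> int:
        # Assumption: popularity is the plain sum of both frequencies.
        return self.streams + self.requests


def rank(query: str, catalog: list) -> list:
    q = query.lower()
    kept = []
    for cand in catalog:
        cand.match_count = sum(s(q, cand.title.lower()) for s in STRATEGIES)
        if cand.match_count:            # assumption: drop zero-match titles
            kept.append(cand)
    # Popularity decides the order; match count only breaks ties (assumption).
    return sorted(kept, key=lambda c: (c.popularity, c.match_count), reverse=True)


catalog = [
    Candidate("Reverence", streams=40, requests=10),
    Candidate("Andor", streams=5000, requests=500),
    Candidate("Severance", streams=900, requests=100),
]
ranked = rank("severence", catalog)
best = ranked[0]   # the action would be performed on this candidate
```

With this sample data, "Andor" is excluded despite having the largest popularity because no strategy matches it, and "Severance" outranks "Reverence" on popularity among the remaining candidates.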
using fuzzy logic · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Parsing for meaning understanding · CPC title
Named entity recognition · CPC title
Parsing · CPC title