Systems and methods for performing ASR in the presence of heterographs

US9721564B2 · US · B2

Patent metadata
Publication number: US-9721564-B2
Application number: US-201414448308-A
Country: US
Kind code: B2
Filing date: Jul 31, 2014
Priority date: Jul 31, 2014
Publication date: Aug 1, 2017
Grant date: Aug 1, 2017

Abstract

Systems and methods for performing ASR in the presence of heterographs are provided. Verbal input is received from the user that includes a plurality of utterances. A first of the plurality of utterances is matched to a first word. It is determined that a second utterance in the plurality of utterances matches a plurality of words that is in a same heterograph set. It is identified which one of the plurality of words is associated with a context of the first word. A function is performed based on the first word and the identified one of the plurality of words.
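The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The pronunciation lexicon (simplified ARPAbet-style phoneme strings), the `CONTEXTS` table, and the helper names `candidates` and `resolve` are all hypothetical. The toy data treat "sox" and "socks" as a heterograph set: one pronunciation, two spellings.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: simplified ARPAbet-style pronunciation -> words.
# Several words under one pronunciation form a heterograph set.
LEXICON: Dict[str, List[str]] = {
    "Y AE NG K IY Z": ["yankees"],
    "S AA K S": ["sox", "socks"],
}

# Hypothetical word -> context labels, standing in for whatever source
# supplies "the context of the first word".
CONTEXTS: Dict[str, Set[str]] = {
    "yankees": {"baseball"},
    "sox": {"baseball"},
    "socks": {"clothing"},
}


def candidates(utterance: str) -> List[str]:
    """Return every word whose pronunciation matches the utterance."""
    return LEXICON.get(utterance, [])


def resolve(first_utt: str, second_utt: str) -> Tuple[str, str]:
    """Match the first utterance, then pick the heterograph sharing its context."""
    first_matches = candidates(first_utt)
    if len(first_matches) != 1:
        raise ValueError("expected the first utterance to match exactly one word")
    first_word = first_matches[0]
    context = CONTEXTS.get(first_word, set())

    heterograph_set = candidates(second_utt)
    # Keep only the spellings whose context overlaps the first word's context.
    fits = [w for w in heterograph_set if CONTEXTS.get(w, set()) & context]
    if len(fits) != 1:
        raise ValueError(f"could not disambiguate {heterograph_set}")
    return first_word, fits[0]


# Two utterances (e.g. "Yankees ... Sox"); the second is ambiguous by sound alone.
print(resolve("Y AE NG K IY Z", "S AA K S"))  # ('yankees', 'sox')
```

A shared-label test stands in here for whatever association measure a real system would use. The claims describe one such measure based on distances between words in a knowledge graph.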

Claims

What is claimed is:

1. A method for performing automatic speech recognition (ASR) when a heterographic word is present, the method comprising: receiving verbal input from a user that comprises a plurality of utterances; matching a first of the plurality of utterances to a first word; determining a word that describes the context for the first word; determining that a second utterance in the plurality of utterances matches a plurality of words that are in a same heterograph set; combining a second word chosen from the plurality of words with the word that describes the context for the first word to generate a first combined set of words; storing a first value representing a distance between words in the first combined set of words; combining a third word chosen from the plurality of words with the word that describes the context for the first word to generate a second combined set of words; storing a second value representing a distance between words in the second combined set of words; in response to determining that the second value is smaller than the first value, performing a media guidance application function on an available media asset based on the second combined set of words.

2. The method of claim 1 further comprising: storing a knowledge graph of a relationship between words, wherein a distance between words in the knowledge graph is indicative of strength in relationship between the words; and calculating the first value and the second value based on the distance between the words in the first combined set of words and the distance between the words in the second combined set of words.

3. The method of claim 2 further comprising: identifying positions, in the knowledge graph, of the context of the first word and each of the plurality of words; and computing, based on the identified positions, a distance between the context of the first word and each of the plurality of words.

4. The method of claim 1, wherein the first word is a name of a competitor in a sporting event, further comprising: setting the context to be the sporting event; and determining which of the plurality of words corresponds to the sporting event, wherein the third word corresponds to another competitor in the sporting event.

5. The method of claim 1, wherein the plurality of words that are in the same heterograph set are phonetically similar to each other.

6. The method of claim 1 further comprising generating a recommendation based on the first word and the third word.

7. The method of claim 1, wherein matching the first of the plurality of utterances to the first word comprises determining that the first utterance phonetically corresponds to the first word.

8. The method of claim 1, wherein the first word is a name of an actor in a media asset, further comprising: setting the context to be the media asset; and determining which of the plurality of words corresponds to the media asset, wherein the third word corresponds to another actor in the media asset.

9. The method of claim 1 further comprising determining the context based on a conjunction between two of the plurality of utterances.

10. A system for performing automatic speech recognition (ASR) when a heterographic word is present, the system comprising: control circuitry configured to: receive verbal input from a user that comprises a plurality of utterances; match a first of the plurality of utterances to a first word; determine a word that describes the context for the first word; determine that a second utterance in the plurality of utterances matches a plurality of words that are in a same heterograph set; combine a second word chosen from the plurality of words with the word that describes the context for the first word to generate a first combined set of words; store a first value representing a distance between words in the first combined set of words; combine a third word chosen from the plurality of words with the word that describes the context for the first word to generate a second combined set of words; store a second value representing a distance between words in the second combined set of words; and in response to determining that the second value is smaller than the first value, perform a media guidance application function on an available media asset based on the second combined set of words.

11. The system of claim 10, wherein the control circuitry is further configured to: store a knowledge graph of a relationship between words, wherein a distance between words in the knowledge graph is indicative of strength in relationship between the words; and calculate the first value and the second value based on a distance between the words in the first combined set of words and the words in the second combined set of words.

12. The system of claim 11, wherein the control circuitry is further configured to: identify positions, in the knowledge graph, of the first word and each of the plurality of words; and compute, based on the identified positions, a distance between the first word and each of the plurality of words.

13. The system of claim 10, wherein the first word is a name of a competitor in a sporting event, and wherein the control circuitry is further configured to: set the context to be the sporting event; determine which of the plurality of words corresponds to the sporting event, wherein the third word corresponds to another competitor in the sporting event.

14. The system of claim 10, wherein the plurality of words that are in the same heterograph set are phonetically similar to each other.

15. The system of claim 10, wherein the control circuitry is further configured to generate a recommendation based on the first word and the third word.

16. The system of claim 10, wherein the control circuitry is further configured to match the first of the plurality of utterances to the first word by determining that the first utterance phonetically corresponds to the first word.

17. The system of claim 10, wherein the first word is a name of an actor in a media asset, and wherein the control circuitry is further configured to: set the context to be the media asset; and determine which of the plurality of words corresponds to the media asset, wherein the third word corresponds to another actor in the media asset.

18. The system of claim 10, wherein the control circuitry is further configured to determine the context based on a conjunction between two of the plurality of utterances.
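The comparison recited in claims 1 to 3 (two words from the same heterograph set, each combined with the context word, with the smaller knowledge-graph distance winning) can be sketched in Python as follows. This is a hedged illustration, not the patent's implementation. It assumes the distance is the shortest-path hop count in an unweighted, undirected graph found by breadth-first search; the graph contents, the helper names `add_edge`, `distance`, and `choose`, and the tie handling are hypothetical. The claims only recite acting when the second value is smaller than the first; the sketch also returns the first set in the opposite case and `None` on a tie.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def add_edge(graph: Graph, a: str, b: str) -> None:
    """Add an undirected edge; an edge stands for a relationship between words."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def distance(graph: Graph, src: str, dst: str) -> float:
    """Shortest-path hop count via BFS; inf if a word is absent or unreachable."""
    if src not in graph or dst not in graph:
        return float("inf")
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return float(hops)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return float("inf")


def choose(graph: Graph, context_word: str, second: str,
           third: str) -> Optional[Tuple[str, str]]:
    """Build both combined sets and keep the one whose words are closer."""
    first_set = (context_word, second)
    second_set = (context_word, third)
    first_value = distance(graph, *first_set)
    second_value = distance(graph, *second_set)
    if second_value < first_value:
        return second_set
    if first_value < second_value:
        return first_set
    return None  # tie: this sketch leaves it unresolved


# Hypothetical toy knowledge graph; fewer hops = stronger relationship.
kg: Graph = {}
for a, b in [("baseball", "mlb"), ("mlb", "yankees"), ("mlb", "sox"),
             ("baseball", "uniform"), ("uniform", "apparel"),
             ("apparel", "socks")]:
    add_edge(kg, a, b)

print(choose(kg, "baseball", "socks", "sox"))  # ('baseball', 'sox')
```

In the toy graph, "sox" sits two hops from the context word "baseball" and "socks" sits three, so the second combined set is kept; a real system would then run a media guidance function (for example, a listings search) on that set.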

Assignees

Rovi Guides Inc

Inventors

Classifications (CPC)

  • G10L15/187 (primary): Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning

  • Formal grammars, e.g. finite state automata, context free grammars or word networks

  • Semantic analysis

Frequently asked questions

Who is the assignee on this patent?
Rovi Guides Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/187. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 1, 2017 (B2). Legal status and post-grant events are not shown on this page.