System and method of improving speech recognition using context

US9626963B2 · US · B2

Patent metadata
Publication number: US-9626963-B2
Application number: US-201313874304-A
Country: US
Kind code: B2
Filing date: Apr 30, 2013
Priority date: Apr 30, 2013
Publication date: Apr 18, 2017
Grant date: Apr 18, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method are provided for improving speech recognition accuracy. Contextual information about user speech may be received, and then speech recognition analysis can be performed on the user speech using the contextual information. This allows the system and method to improve accuracy when performing tasks like searching and navigating using speech recognition.
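One way to picture the abstract's idea is a short sketch that re-ranks a recognizer's candidate transcriptions using contextual information. It is not the method disclosed in the patent. The `Hypothesis` class, the `rescore` function, the flat per-word `boost`, and the example scores are all hypothetical. The context terms stand in for something like street names derived from the user's location (compare claims 4 and 7).

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    acoustic_score: float  # recognizer log-likelihood; higher is better


def rescore(hypotheses, context_terms, boost=2.0):
    """Re-rank hypotheses, adding `boost` for each word found in context_terms."""
    def total(h):
        words = h.text.lower().split()
        return h.acoustic_score + boost * sum(w in context_terms for w in words)
    return sorted(hypotheses, key=total, reverse=True)


# Hypothetical context: street names near the user's current location.
context = {"fillmore", "street"}
nbest = [
    Hypothesis("navigate to phil more street", -11.0),
    Hypothesis("navigate to fillmore street", -12.5),
]
print(rescore(nbest, context)[0].text)
```

With an empty context set, the "phil more" hypothesis stays on top because its acoustic score is higher. The context bonus is what moves the intended transcription to first place.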

First claim

Opening claim text (preview).

The invention claimed is:

1. A system, comprising: a processor; a single microphone configured to both record user speech and to record ambient sounds; and a speech recognition module configured to: identify that the ambient sounds are of a particular type by comparing the ambient sounds to stored waveforms; select a dictionary based on the identified particular type of ambient sounds; identify, as contextual information, terms related to the identified particular type of ambient sounds based on identification of the identified particular type of ambient sounds, the terms being generated as contextual information; alter, in response to identification of the terms related to the identified particular type of ambient sounds, the dictionary such that the dictionary includes the terms related to the identified particular type of ambient sounds; assign, in the dictionary, score values to the terms related to the identified particular type of ambient sounds based on identifying that the terms are related to the identified particular type of ambient sounds; and analyze the user speech by comparing each potential output word or phoneme in the user speech to waveforms stored for the dictionary to attempt to match the potential output word or phoneme to a waveform corresponding to a particular word or phoneme in the dictionary, an analysis varying based on the assigned scores to the terms identified as contextual information.

2. The system of claim 1, wherein the ambient sounds include music and the identification that the ambient sounds are of the particular type includes identifying that the ambient sounds are music and identifying the music, wherein the speech recognition module is further configured to retrieve identify, as the contextual information, terms related to the identified music.

3. The system of claim 1, further comprising a sensor, and wherein the contextual information includes information identified from sensor information detected by the sensor.

4. The system of claim 3, wherein the sensor is a global positioning system module and the contextual information includes location.

5. The system of claim 3, wherein the sensor is a global positioning system module and the contextual information includes speed.

6. A method comprising: recording sounds using a single microphone; identifying, using one or more processors, potential output words and phonemes as well as ambient sounds in the sounds recorded by the single microphone; identifying that the ambient sounds are of a particular type by comparing the ambient sounds to stored waveforms; selecting a dictionary based on the identified particular type of ambient sounds; identifying, as contextual information, terms related to the identified particular type of ambient sounds based on identification of the identified particular type of ambient sounds, the terms being generated as contextual information; assigning, in the dictionary, score values to the terms related to the identified particular type of ambient sounds based on identifying that the terms are related to the identified particular type of ambient sounds; and analyzing user speech by comparing each potential output word or phoneme in the user speech to waveforms stored for the dictionary to attempt to match the potential output word or phoneme to a waveform corresponding to a particular word or phoneme in the dictionary, the analyzing varying based on the assigned scores to the terms identified as contextual information.

7. The method of claim 6, wherein the contextual information includes user location.

8. The method of claim 6, wherein the contextual information includes speed of movement of a user.

9. The method of claim 6, wherein the ambient sounds include music and the identification that the ambient sounds are of the particular type includes identifying that the ambient sounds are music and identifying the music, wherein the speech recognition module is further configured to identify, as the contextual information, terms related to the identified music.

10. The method of claim 6, further comprising altering the dictionary based on the contextual information such that the dictionary includes the terms related to the identified particular type of ambient sounds.

11. The method of claim 10, wherein the dictionary is altered by replacing the dictionary with a different dictionary.

12. The method of claim 10, wherein the dictionary is altered by adding words pertaining to the contextual information to the dictionary.

13. A non-transitory machine-readable storage medium comprising a set of instructions which, when executed by a processor, causes execution of operations comprising: recording sounds using a single microphone; identifying potential output words and phonemes as well as ambient sounds in the sounds recorded by the single microphone; identifying that the ambient sounds are of a particular type by comparing the ambient sounds to stored waveforms; selecting a dictionary based on the identified particular type of ambient sounds; identifying, as contextual information, terms related to the identified particular type of ambient sounds based on identification of the identified particular type of ambient sounds, the terms being generated as contextual information; assigning, in the dictionary, score values to the terms related to the identified particular type of ambient sounds based on identifying that the terms are related to the identified particular type of ambient sounds; and analyzing the user speech by comparing each potential output word or phoneme in the user speech to waveforms stored for the dictionary to attempt to match the potential output word or phoneme to a waveform corresponding to a particular word or phoneme in the dictionary, the analyzing varying based on the assigned scores to the terms identified as contextual information.

14. The non-transitory machine-readable storage medium of claim 13, wherein the speech recognition analysis includes utilizing a hidden Markov model.

15. The non-transitory machine-readable storage medium of claim 13, wherein the ambient sounds include music and the identification that the ambient sounds are of the particular type includes identifying that the ambient sounds are music and identifying the music, wherein the speech recognition module is further configured to identify, as the contextual information, terms related to the identified music.
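The steps recited in claim 1 can be traced end to end in a toy sketch. It rests on several assumptions and is not the patented implementation. The "waveforms" are short made-up vectors. Similarity is cosine similarity, whereas the claims only say the sounds are compared to stored waveforms. Speech and ambient sound are assumed to be already separated from the single microphone's recording. The names `AMBIENT_TEMPLATES`, `BASE_DICTIONARIES`, `RELATED_TERMS`, `classify_ambient`, `build_dictionary`, and `recognize` are hypothetical. A practical recognizer would usually score acoustic features with a statistical model, such as the hidden Markov model mentioned in claim 14, rather than compare raw samples.

```python
import math

# Hypothetical stored ambient-sound templates (toy 4-sample "waveforms").
AMBIENT_TEMPLATES = {
    "music": [0.9, -0.9, 0.9, -0.9],
    "traffic": [0.2, 0.3, 0.25, 0.3],
}

# Hypothetical base dictionary per ambient type: word -> stored template.
BASE_DICTIONARIES = {
    "music": {"play": [1.0, 0.0, 0.5, 0.0], "stop": [0.0, 1.0, 0.0, 0.5]},
    "traffic": {"route": [0.5, 0.5, 0.0, 0.0], "stop": [0.0, 1.0, 0.0, 0.5]},
}

# Hypothetical terms related to each ambient type, with their templates.
RELATED_TERMS = {
    "music": {"encore": [0.0, 0.9, 0.1, 0.5]},
    "traffic": {"detour": [0.6, 0.4, 0.1, 0.0]},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_ambient(ambient):
    """Identify the ambient-sound type whose stored template is most similar."""
    return max(AMBIENT_TEMPLATES, key=lambda t: cosine(ambient, AMBIENT_TEMPLATES[t]))


def build_dictionary(ambient_type, boost=0.2):
    """Select the type's dictionary, then add related terms with a score bonus."""
    entries = {w: (tpl, 0.0) for w, tpl in BASE_DICTIONARIES[ambient_type].items()}
    for term, tpl in RELATED_TERMS[ambient_type].items():
        entries[term] = (tpl, boost)  # alter the dictionary and assign a score
    return entries


def recognize(segment, dictionary):
    """Match a speech segment to the entry with the best similarity plus score."""
    return max(dictionary, key=lambda w: cosine(segment, dictionary[w][0]) + dictionary[w][1])


ambient_type = classify_ambient([0.8, -0.7, 1.0, -0.8])
dictionary = build_dictionary(ambient_type)
print(ambient_type, recognize([0.0, 1.0, 0.05, 0.5], dictionary))
```

With `boost=0.0`, the same speech segment is matched to "stop", which is acoustically slightly closer. The assigned score is what shifts the match to the context term "encore". This mirrors the claim language that the analysis varies based on the assigned scores.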

Assignees

PayPal, Inc.

Inventors

Classifications

  • Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence)

  • G10L15/063 (Primary)

    Training

  • G10L15/30 (Primary)

    Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

  • Using non-speech characteristics

  • G10L15/22 (Primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9626963B2 cover?
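It covers improving speech recognition accuracy by using contextual information. The independent claims (1, 6, and 13) recite the following steps. The system identifies the type of ambient sound picked up by the same single microphone that records the user's speech, by comparing the sound to stored waveforms. It then selects a dictionary for that type and adds terms related to the sound type, giving those terms score values. Finally, it matches the user's speech against the dictionary's stored waveforms, with the match varying according to those scores.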
Who is the assignee on this patent?
PayPal, Inc.
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
The B2 grant was published on April 18, 2017. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).