Speech recognition using associative mapping

US9299347B1 · US · B1

Patent metadata
Publication number: US-9299347-B1
Application number: US-201514685790-A
Country: US
Kind code: B1
Filing date: Apr 14, 2015
Priority date: Oct 22, 2014
Publication date: Mar 29, 2016
Grant date: Mar 29, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus are described that receive audio data for an utterance. Association data is accessed that indicates associations between data corresponding to uncorrupted audio segments, and data corresponding to corrupted versions of the uncorrupted audio segments, where the associations are determined before receiving the audio data for the utterance. Using the association data and the received audio data for the utterance, data corresponding to at least one uncorrupted audio segment is selected. A transcription of the utterance is determined based on the selected data corresponding to the at least one uncorrupted audio segment.
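The selection step described in the abstract can be sketched as a nearest-key lookup. This is a minimal illustration, not the patented implementation: the two-dimensional feature vectors, the `ASSOCIATIONS` table, the Euclidean distance, and the function names are all assumptions made for the example, and the final speech recognition step is left out.

```python
import math

# Hypothetical association data, prepared before any utterance arrives.
# Each entry pairs a key derived from a corrupted segment with the
# features of the matching uncorrupted segment.
ASSOCIATIONS = [
    ((0.9, 0.2), (1.0, 0.0)),
    ((0.1, 1.1), (0.0, 1.0)),
    ((0.6, 0.7), (0.5, 0.5)),
]


def select_clean(frame):
    """Return the uncorrupted vector whose corrupted key is nearest to frame."""
    _, clean = min(ASSOCIATIONS, key=lambda pair: math.dist(pair[0], frame))
    return clean


def construct_representation(frames):
    """Replace each received frame with its selected uncorrupted counterpart."""
    return [select_clean(frame) for frame in frames]


# Feature vectors for a (noisy) received utterance, invented for the example.
noisy_utterance = [(0.85, 0.25), (0.15, 1.0)]
cleaned = construct_representation(noisy_utterance)
```

A recognizer would then run on `cleaned` rather than on the received frames, which is the role the abstract assigns to the selected uncorrupted data.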

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by data processing apparatus, the method comprising: receiving, by a server system that provides an automated speech recognition service over a computer network, audio data for an utterance detected by a client device in communication with the server system over the computer network; accessing, by the server system, association data that indicates a plurality of associations, each association indicating (i) uncorrupted audio data indicating characteristics of an uncorrupted audio segment, and (ii) a corresponding key based on a corrupted version of the same uncorrupted audio segment, the associations being determined before receiving the audio data for the utterance; selecting, by the server system, uncorrupted audio data based on a comparison of (i) one or more keys based on the audio data for the utterance, with (ii) the keys based on the corrupted audio data; constructing, by the server system, a representation of the utterance comprising the selected uncorrupted audio data; and performing, by the server system, speech recognition on the constructed representation of the utterance to determine a transcription of the utterance.

2. The method of claim 1, wherein selecting the uncorrupted audio data comprises: determining, based on the comparison, that a key corresponding to a particular portion of audio data for the utterance matches a particular key; and based on the determination, selecting the uncorrupted audio data that is associated with the particular key in the association data; wherein constructing the representation of the utterance comprises constructing the representation of the utterance to include the uncorrupted audio data that is associated with the particular key in place of the particular portion of the audio data for the utterance.

3. The method of claim 1, wherein each key based on corrupted audio data is a corrupted feature vector indicating characteristics of a corrupted audio segment, and where selecting the uncorrupted audio data comprises: obtaining first feature vectors of the received audio data for the utterance; comparing the first feature vectors to the corrupted feature vectors; identifying, based on the comparison of the first feature vectors to the corrupted feature vectors, a particular corrupted feature vector for a particular first feature vector; and based on identifying the particular corrupted feature vector, selecting an uncorrupted feature vector that corresponds to the particular corrupted feature vector to include in the constructed representation of the utterance.

4. The method of claim 1, wherein each of the one or more keys based on the audio data for the utterance comprises a hash value for a feature vector that indicates characteristics of the audio data for the utterance; wherein each of the keys based on the corrupted audio data comprises a hash value for a feature vector that indicates characteristics of the corresponding corrupted audio data.

5. The method of claim 1, wherein each corrupted version of an uncorrupted audio segment is a version of an uncorrupted audio segment that has been modified to add noise, reverberation, echo, or distortion after the uncorrupted audio segment has been recorded.

6. The method of claim 1, wherein each corrupted version of an uncorrupted audio segment is a version of an uncorrupted audio segment that has been modified, after the uncorrupted audio segment has been recorded, to include audio characteristics representative of one or more candidate environments.

7. The method of claim 1, comprising: before receiving the audio data for the utterance: accessing data that includes uncorrupted audio segments; adding noise to the uncorrupted audio segments to generate, for each uncorrupted audio segment, one or more corrupted versions of the uncorrupted audio segment, wherein each of the one or more corrupted versions of the uncorrupted audio segment has different noise added to the uncorrupted audio segment; generating, for each of the corrupted versions of the uncorrupted audio segments, association data that indicates an association of (i) uncorrupted audio data indicating characteristics of a particular uncorrupted audio segment, and (ii) a corresponding key based on a corrupted version of the particular uncorrupted audio segment; and storing the association data.

8. The method of claim 1, wherein the uncorrupted audio segments are segments of speech spoken by one or more users.

9. The method of claim 1, wherein the representation of the utterance comprising the selected uncorrupted feature vectors has less noise than the first feature vectors indicating characteristics of the utterance.

10. The method of claim 1, wherein generating the constructed representation comprises generating, as the constructed representation, a series of feature vectors that represents the utterance.

11. The method of claim 1, wherein the utterance is an utterance of a first user, wherein the selected uncorrupted audio data comprises audio segments of utterances of speakers different from the first user; wherein generating the constructed representation comprises generating, as the constructed representation, a series of audio segments that includes the audio segments of utterances of speakers different from the first user.

12. The method of claim 1, wherein the uncorrupted audio data for an uncorrupted audio segment comprises an uncorrupted feature vector that indicates characteristics of the uncorrupted audio segment; wherein the corrupted audio data for a corrupted audio segment comprises a corrupted feature vector indicating characteristics of the corrupted audio segment; wherein each key based on corrupted audio data comprises (i) a corrupted feature vector or (ii) an index value based on the corrupted feature vector, and wherein each of the one or more keys based on the audio data for the utterance comprises (i) a feature vector indicating characteristics of the audio data for the utterance or (ii) an index value based on the feature vector indicating characteristics of the audio data for the utterance.

13. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving, over a computer network, audio data for an utterance detected by a client device in communication with the one or more computers over the computer network; accessing association data that indicates a plurality of associations, each association indicating (i) uncorrupted audio data indicating characteristics of an uncorrupted audio segment, and (ii) a corresponding key based on a corrupted version of the same uncorrupted audio segment, the associations being determined before receiving the audio data for the utterance; selecting uncorrupted audio data based on a comparison of (i) one or more keys based on the audio data for the utterance, with (ii) the keys based on the corrupted audio data; constructing a representation of the utterance comprising the selected uncorrupted audio data; and performing speech recognition on the constructed representation of the utterance to determine a transcription of the utterance.

14. The system of claim 13, wherein selecting the uncorrupted audio data comprises: determining, based on the comparison, that a key corresponding to a particular portion of audio data for the utterance matches a particular key; and based on the determination, selecting the uncorrupted audio data that is associated with the particular k
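Claims 4 and 7 together describe building the association data ahead of time (adding different noise to each uncorrupted segment) and keying it by a hash of the corrupted features. The sketch below is one way that could look, under stated assumptions: the quantization-based hash `key_for`, the `STEP` size, the Gaussian noise model, and every function name are hypothetical choices for illustration, since the claim text shown here does not specify a hash function or noise model.

```python
import random

STEP = 0.5  # hypothetical quantization step used by the illustrative hash


def key_for(vector, step=STEP):
    """Coarse hash: quantize each dimension so nearby vectors share a key."""
    return tuple(round(x / step) for x in vector)


def build_association_data(clean_vectors, noise_levels, versions=3, seed=0):
    """Map keys of corrupted versions to the uncorrupted vector they came from."""
    rng = random.Random(seed)
    table = {}
    for clean in clean_vectors:
        for sigma in noise_levels:
            for _ in range(versions):
                # Each corrupted version gets different (Gaussian) noise.
                corrupted = tuple(x + rng.gauss(0.0, sigma) for x in clean)
                # Keep the first clean vector seen for a key on collisions.
                table.setdefault(key_for(corrupted), clean)
    return table


def lookup(table, frame):
    """Return the uncorrupted vector stored under the frame's key, or None."""
    return table.get(key_for(frame))


clean_vectors = [(1.0, 0.0), (0.0, 1.0)]
table = build_association_data(clean_vectors, noise_levels=(0.02, 0.05))
match = lookup(table, (1.02, -0.03))
```

An exact-match hash like this trades recall for speed: a frame whose key was never generated during the offline step returns `None`, so a practical system would need a fallback such as keeping the original frame.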

Assignees

Google Inc

Inventors

Classifications

  • G10L15/02 (Primary)

    Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • G10L15/26 (Primary)

    Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Speech classification or search · CPC title

  • G10L15/10 (Primary)

    using distance or distortion measures between unknown speech and reference templates · CPC title

  • characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9299347B1 cover?
Methods, systems, and apparatus are described that receive audio data for an utterance. Association data is accessed that indicates associations between data corresponding to uncorrupted audio segments, and data corresponding to corrupted versions of the uncorrupted audio segments, where the associations are determined before receiving the audio data for the utterance. Using the association dat…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/02. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 29, 2016 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).