Cluster specific speech model

US9401143B2 · US · B2

Patent metadata
Publication number: US-9401143-B2
Application number: US-201514663610-A
Country: US
Kind code: B2
Filing date: Mar 20, 2015
Priority date: Mar 24, 2014
Publication date: Jul 26, 2016
Grant date: Jul 26, 2016

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving data representing acoustic characteristics of a user's voice; selecting a cluster for the data from among a plurality of clusters, where each cluster includes a plurality of vectors, and where each cluster is associated with a speech model trained by a neural network using at least one or more vectors of the plurality of vectors in the respective cluster; and in response to receiving one or more utterances of the user, providing the speech model associated with the cluster for transcribing the one or more utterances.
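A minimal sketch of the flow in the abstract, under stated assumptions: the received voice data has already been reduced to a fixed-length vector (the claims name an i-vector), each cluster is summarized by the centroid of its member vectors, and the cluster is chosen by shortest Euclidean distance to that centroid. The nearest-centroid rule follows claim 2; the Euclidean metric, the `Cluster` class, `select_cluster`, `provide_model`, and the string standing in for a trained speech model are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    """Hypothetical cluster record: member vectors plus a handle to its speech model."""
    cluster_id: str
    vectors: List[List[float]]
    model_name: str  # placeholder for the cluster's neural-network speech model

    def centroid(self) -> List[float]:
        # Mean of the member vectors, computed dimension by dimension.
        n = len(self.vectors)
        return [sum(col) / n for col in zip(*self.vectors)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed distance metric; the claims only say "vector distance".
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_cluster(user_vector: Sequence[float], clusters: List[Cluster]) -> Cluster:
    # Choose the cluster whose centroid is closest to the user's vector.
    return min(clusters, key=lambda c: euclidean(user_vector, c.centroid()))


def provide_model(user_vector: Sequence[float], clusters: List[Cluster]) -> str:
    # Return the model tied to the selected cluster, i.e. the model that would
    # be provided for transcribing the user's utterances.
    return select_cluster(user_vector, clusters).model_name


if __name__ == "__main__":
    clusters = [
        Cluster("a", [[0.0, 0.0], [0.2, 0.0]], "model_a"),
        Cluster("b", [[5.0, 5.0], [5.0, 5.2]], "model_b"),
    ]
    # A user vector near cluster "b" is served model_b.
    print(provide_model([4.8, 5.1], clusters))  # prints model_b
```

In this sketch the centroid is recomputed on every call for brevity; a real system would more likely cache centroids and model handles per cluster.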

Claims

What is claimed is:

1. A method comprising: receiving data representing acoustic characteristics of a user's voice; selecting a cluster for the data from among a plurality of clusters, wherein each cluster includes a plurality of vectors, and wherein each cluster is associated with a speech model trained by a neural network using at least one or more vectors of the plurality of vectors in the respective cluster; and in response to receiving one or more utterances of the user, providing the speech model associated with the cluster for transcribing the one or more utterances.

2. The method of claim 1, wherein the plurality of clusters are segmented based on vector distances to centroids of the clusters, and wherein selecting a cluster for the data comprises: determining a vector based on the data; determining that a vector distance between the vector and the cluster is a shortest distance compared to vector distances between the vector and other clusters of the plurality of clusters; and based on determining that the vector distance between the vector and the cluster is the shortest distance, selecting the cluster for the vector.

3. The method of claim 1, wherein selecting a cluster for the data further comprises: receiving data indicative of latent variables of multivariate factor analysis of an audio signal of the user; and selecting an updated cluster using the latent variables.

4. The method of claim 1, comprising: receiving a feature vector that models audio characteristics of a portion of an utterance of the user; and determining, using the feature vector as an input, a candidate transcription for the utterance based on an output of the neural network of the speech model.

5. The method of claim 1, wherein providing the speech model for transcribing the one or more utterances comprises providing the speech model to a computing device of the user.

6. The method of claim 1, wherein the acoustic characteristics of the user includes a gender of the user, an accent of the user, a pitch of an utterance of the user, background noises around the user, or age group of the user.

7. The method of claim 1, wherein the data is an i-vector, and wherein the neural network is trained using the i-vectors in the cluster and one or more i-vectors in one or more neighboring clusters.

8. The method of claim 1, wherein each cluster includes a distinct plurality of vectors, and wherein each cluster is associated with a distinct speech model.

9. A non-transitory computer-readable medium storing software having stored thereon instructions, which, when executed by one or more computers, cause the one or more computers to perform operations of: receiving data representing acoustic characteristics of a user's voice; selecting a cluster for the data from among a plurality of clusters, wherein each cluster includes a plurality of vectors, and wherein each cluster is associated with a speech model trained by a neural network using at least one or more vectors of the plurality of vectors in the respective cluster; and in response to receiving one or more utterances of the user, providing the speech model associated with the cluster for transcribing the one or more utterances.

10. The non-transitory computer-readable medium of claim 9, wherein the plurality of clusters are segmented based on vector distances to centroids of the clusters, and wherein selecting a cluster for the data comprises: determining a vector based on the data; determining that a vector distance between the vector and the cluster is a shortest distance compared to vector distances between the vector and other clusters of the plurality of clusters; and based on determining that the vector distance between the vector and the cluster is the shortest distance, selecting the cluster for the vector.

11. The non-transitory computer-readable medium of claim 9, wherein selecting a cluster for the data further comprises: receiving data indicative of latent variables of multivariate factor analysis of an audio signal of the user; and selecting an updated cluster using the latent variables.

12. The non-transitory computer-readable medium of claim 9, wherein the operations comprise: receiving a feature vector that models audio characteristics of a portion of an utterance of the user; and determining, using the feature vector as an input, a candidate transcription for the utterance based on an output of the neural network of the speech model.

13. The non-transitory computer-readable medium of claim 9, wherein providing the speech model for transcribing the one or more utterances comprises providing the speech model to a computing device of the user.

14. The non-transitory computer-readable medium of claim 9, wherein the data is an i-vector, and wherein the neural network is trained using the i-vectors in the cluster and one or more i-vectors in one or more neighboring clusters.

15. A system comprising: one or more processors and one or more computer storage media storing instructions that are operable, when executed by the one or more processors, to cause the one or more processors to perform operations comprising: receiving data representing acoustic characteristics of a user's voice; selecting a cluster for the data from among a plurality of clusters, wherein each cluster includes a plurality of vectors, and wherein each cluster is associated with a speech model trained by a neural network using at least one or more vectors of the plurality of vectors in the respective cluster; and in response to receiving one or more utterances of the user, providing the speech model associated with the cluster for transcribing the one or more utterances.

16. The system of claim 15, wherein the plurality of clusters are segmented based on vector distances to centroids of the clusters, and wherein selecting a cluster for the data comprises: determining a vector based on the data; determining that a vector distance between the vector and the cluster is a shortest distance compared to vector distances between the vector and other clusters of the plurality of clusters; and based on determining that the vector distance between the vector and the cluster is the shortest distance, selecting the cluster for the vector.

17. The system of claim 15, wherein selecting a cluster for the data further comprises: receiving data indicative of latent variables of multivariate factor analysis of an audio signal of the user; and selecting an updated cluster using the latent variables.

18. The system of claim 15, wherein the operations comprise: receiving a feature vector that models audio characteristics of a portion of an utterance of the user; and determining, using the feature vector as an input, a candidate transcription for the utterance based on an output of the neural network of the speech model.

19. The system of claim 15, wherein providing the speech model for transcribing the one or more utterances comprises providing the speech model to a computing device of the user.

20. The system of claim 15, wherein the data is an i-vector, and wherein the neural network is trained using the i-vectors in the cluster and one or more i-vectors in one or more neighboring clusters.
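Claims 7, 14, and 20 train each cluster's network on the i-vectors in that cluster plus i-vectors from one or more neighboring clusters. The sketch below assembles such a training pool under a hypothetical neighbor rule: the `k` other clusters whose centroids lie closest to the target cluster's centroid. The claims do not define "neighboring" this way; the function names, the Euclidean metric, and the toy data are illustrative assumptions, and the neural-network training step itself is left out.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Vector]) -> Vector:
    # Mean of a cluster's vectors, computed dimension by dimension.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed Euclidean metric between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighboring_clusters(target: str, clusters: Dict[str, List[Vector]], k: int) -> List[str]:
    """Hypothetical neighbor rule: the k other clusters with the nearest centroids."""
    centroids = {cid: centroid(vecs) for cid, vecs in clusters.items()}
    others = [cid for cid in clusters if cid != target]
    others.sort(key=lambda cid: distance(centroids[target], centroids[cid]))
    return others[:k]


def training_vectors(target: str, clusters: Dict[str, List[Vector]], k: int = 1) -> List[Vector]:
    # The target cluster's own i-vectors plus those of its k nearest neighbors
    # form the pool used to train that cluster's speech model.
    pool = list(clusters[target])
    for cid in neighboring_clusters(target, clusters, k):
        pool.extend(clusters[cid])
    return pool


if __name__ == "__main__":
    clusters = {
        "a": [[0.0, 0.0], [0.0, 1.0]],
        "b": [[2.0, 0.0]],
        "c": [[10.0, 10.0]],
    }
    print(neighboring_clusters("a", clusters, k=1))  # prints ['b']
    print(len(training_vectors("a", clusters, k=1)))  # prints 3
```

A larger `k` gives each model more training data at the cost of less cluster-specific behavior; the claims say only "one or more" neighboring clusters, so the right setting is left open here.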

Assignees

Google Inc

Inventors

Classifications

  • G10L15/063 (primary): Training

  • Using context dependencies, e.g. language models

  • Creating reference templates; Clustering

Frequently asked questions

What does patent US9401143B2 cover?
Receiving data on the acoustic characteristics of a user's voice, selecting the matching cluster from a set of vector clusters that each have their own neural-network-trained speech model, and providing that cluster's speech model to transcribe the user's utterances.
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Published July 26, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.