Speaker recognition using neural networks
US-2016293167-A1 · Oct 6, 2016 · US
US11620989B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11620989-B2 |
| Application number | US-201916452959-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 26, 2019 |
| Priority date | Jan 27, 2015 |
| Publication date | Apr 4, 2023 |
| Grant date | Apr 4, 2023 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network. One of the methods includes generating, by a speech recognition system, a matrix from a predetermined quantity of vectors that each represent input for a layer of a neural network, generating a plurality of sub-matrices from the matrix, using, for each of the sub-matrices, the respective sub-matrix as input to a node in the layer of the neural network to determine whether an utterance encoded in an audio signal comprises a keyword for which the neural network is trained.
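The matrix and sub-matrix steps in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names (`stack_frames`, `split_into_submatrices`, `node_output`), the non-overlapping tiling, the tile sizes, and the ReLU activation are all assumptions made for the example.

```python
def stack_frames(frames, count):
    """Build a matrix (list of rows) from the most recent `count` feature vectors."""
    if len(frames) < count:
        raise ValueError("not enough feature vectors to fill the matrix")
    return [list(frame) for frame in frames[-count:]]


def split_into_submatrices(matrix, tile_rows, tile_cols):
    """Cut the matrix into tile_rows x tile_cols patches that do not overlap.

    Rows or columns left over when the sizes do not divide evenly are dropped
    (an assumption made to keep the sketch short).
    """
    rows, cols = len(matrix), len(matrix[0])
    tiles = []
    for r in range(0, rows - tile_rows + 1, tile_rows):
        for c in range(0, cols - tile_cols + 1, tile_cols):
            tiles.append([row[c:c + tile_cols] for row in matrix[r:r + tile_rows]])
    return tiles


def node_output(tile, weights, bias):
    """Apply one node's weights to one flattened sub-matrix (ReLU is assumed)."""
    flat = [value for row in tile for value in row]
    if len(flat) != len(weights):
        raise ValueError("weight count must match sub-matrix size")
    return max(0.0, sum(w * x for w, x in zip(weights, flat)) + bias)


# Hypothetical input: 5 frames of 6 features each; the last 4 form the matrix.
frames = [[float(10 * t + f) for f in range(6)] for t in range(5)]
matrix = stack_frames(frames, count=4)
tiles = split_into_submatrices(matrix, tile_rows=2, tile_cols=3)
# One placeholder weight set per tile; a trained network would supply real,
# distinct values for each node.
outputs = [node_output(tile, [0.1] * 6, 0.0) for tile in tiles]
```

Each entry of `outputs` corresponds to one node of the layer, fed by exactly one sub-matrix.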
Opening claim text (preview).
What is claimed is:

1. A method performed by one or more computing devices, the method comprising: obtaining, by the one or more computing devices, a set of input values indicating acoustic characteristics of an utterance; receiving, by the one or more computing devices, the set of input values as input to a first layer of a neural network, the first layer of the neural network having nodes, each node of the first layer comprising a corresponding set of weights that is different than the corresponding set of weights of each other node of the first layer, and each node of the first layer is configured to receive, as input, a different respective subset of the set of input values, wherein the different respective subsets are non-overlapping; for each respective node of the first layer, generating, by the one or more computing devices, as output, a corresponding initial output value by applying the corresponding set of weights of the respective node to the respective subset of the set of input values; receiving, by the one or more computing devices, each of the initial output values as input to a second layer of the neural network, the second layer of the neural network having nodes, each node of the second layer is configured to receive, as input, a subset of the initial output values and generate, as output, a corresponding final output value; and determining, by the one or more computing devices, whether the utterance includes a keyword based on each of the final output values.

2. The method of claim 1, wherein generating the corresponding initial output value comprises, for each respective node of the first layer, applying a different function to the respective subset of the set of input values.

3. The method of claim 1, wherein one or more of the nodes of the first layer are configured to each receive a respective subset of the set of input values that are localized.

4. The method of claim 1, wherein one or more of the nodes of the first layer are configured to each receive a respective subset of the set of input values that are localized in frequency.

5. The method of claim 1, wherein determining whether the utterance includes the keyword based on each of the final output values comprises determining whether the utterance includes the keyword from among a set of predetermined keywords that are each designated as a signal that a mobile device should activate.

6. The method of claim 1, wherein determining whether the utterance includes the keyword based on each of the final output values comprises determining whether the utterance contains the keyword spoken by a particular user.

7. The method of claim 1, wherein the neural network is trained to determine whether the utterance includes the keyword.

8. The method of claim 1, wherein each one of the final output values comprises a posterior probability score.

9. The method of claim 1, wherein the set of input values comprises audio features derived from audio data of the utterance.

10. The method of claim 1, wherein the first layer of the neural network comprises a first hidden layer of the neural network.

11. The method of claim 1, wherein each node of the second layer corresponds to at least one node of the first layer.

12. A device comprising: one or more hardware processors and one or more data storage devices, the one or more hardware processors and the one or more data storage devices being configured to implement a keyword detection function by causing the device to perform operations comprising: obtaining a set of input values indicating acoustic characteristics of an utterance; receiving the set of input values as input to a first layer of a neural network, the first layer of the neural network having nodes, each node of the first layer comprising a corresponding set of weights that is different than the corresponding set of weights of each other node of the first layer, and each node of the first layer is configured to receive, as input, a different respective subset of the set of input values, wherein the different respective subsets are non-overlapping; for each respective node of the first layer, generating, as output, a corresponding initial output value by applying the corresponding set of weights of the respective node to the respective subset of the set of input values; receiving each of the initial output values as input to a second layer of the neural network, the second layer of the neural network having nodes, each node of the second layer is configured to receive, as input, a subset of the initial output values and generate, as output, a corresponding final output value; and determining whether the utterance includes a keyword based on each of the final output values.

13. The device of claim 12, wherein generating the corresponding initial output value comprises, for each respective node of the first layer, applying a different function to the respective subset of the set of input values.

14. One or more non-transitory data storage devices storing instructions that, when executed by one or more processing devices, cause the one or more processing devices to perform operations comprising: obtaining, by the one or more processing devices, a set of input values indicating acoustic characteristics of an utterance; receiving, by the one or more processing devices, the set of input values as input to a first layer of a neural network, the first layer of the neural network having nodes, each node of the first layer comprising a corresponding set of weights that is different than the corresponding set of weights of each other node of the first layer, and each node of the first layer is configured to receive, as input, a different respective subset of the set of input values, wherein the different respective subsets are non-overlapping; for each respective node of the first layer, generating, by the one or more processing devices, as output, a corresponding initial output value by applying the corresponding set of weights of the respective node to the respective subset of the set of input values; receiving, by the one or more processing devices, each of the initial output values as input to a second layer of the neural network, the second layer of the neural network having nodes, each node of the second layer is configured to receive, as input, a subset of the initial output values and generate, as output, a corresponding final output value; and determining, by the one or more processing devices, whether the utterance includes a keyword based on each of the final output values.
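The layer structure recited in the independent claims can be illustrated with a small forward pass. This is a hedged sketch under stated assumptions, not the claimed implementation: the helper names, the contiguous partitioning, the ReLU and softmax functions, the random weights, and the 0.5 threshold are all hypothetical choices made for the example.

```python
import math
import random


def partition(values, num_nodes):
    """Split values into num_nodes contiguous subsets that do not overlap."""
    size, remainder = divmod(len(values), num_nodes)
    if remainder:
        raise ValueError("input length must divide evenly among the nodes")
    return [values[i * size:(i + 1) * size] for i in range(num_nodes)]


def first_layer(values, weights, biases):
    """Each node applies its own weight set to its own subset (ReLU assumed)."""
    subsets = partition(values, len(weights))
    return [max(0.0, sum(w * x for w, x in zip(ws, xs)) + b)
            for ws, xs, b in zip(weights, subsets, biases)]


def second_layer(initial, connections, weights, biases):
    """Each node reads a subset of the initial outputs, selected by index.

    A softmax turns the node activations into scores that sum to 1; this is
    an assumption used here to obtain posterior-style final output values.
    """
    logits = [sum(w * initial[i] for w, i in zip(ws, idx)) + b
              for idx, ws, b in zip(connections, weights, biases)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def detect_keyword(posteriors, keyword_index, threshold=0.5):
    """Decide on the keyword from the final outputs (threshold is hypothetical)."""
    return posteriors[keyword_index] >= threshold


rng = random.Random(0)
inputs = [rng.uniform(-1, 1) for _ in range(12)]   # stand-in acoustic features
w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 4 nodes x 3 inputs
b1 = [0.0] * 4
connections = [[0, 1], [2, 3]]   # which initial outputs each second-layer node sees
w2 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b2 = [0.0, 0.0]

initial = first_layer(inputs, w1, b1)
posteriors = second_layer(initial, connections, w2, b2)
print(detect_keyword(posteriors, keyword_index=0))
```

Because every first-layer node sees only its own slice of the input, the layer needs 4 x 3 = 12 weights in this example, where a fully connected layer of the same width would need 4 x 12 = 48.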
Supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Artificial neural networks; Connectionist approaches · CPC title
Word spotting · CPC title