Theme detection for object-recognition-based notifications
US-12183330-B2 · Dec 31, 2024 · US
US2020111483A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020111483-A1 |
| Application number | US-201916710005-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 11, 2019 |
| Priority date | Dec 21, 2016 |
| Publication date | Apr 9, 2020 |
| Grant date | — |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex evolution recurrent neural networks. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A first vector sequence comprising audio features determined from the audio data is generated. A second vector sequence is generated, as output of a first recurrent neural network in response to receiving the first vector sequence as input, where the first recurrent neural network has a transition matrix that implements a cascade of linear operators comprising (i) first linear operators that are complex-valued and unitary, and (ii) one or more second linear operators that are non-unitary. An output vector sequence of a second recurrent neural network is generated. A transcription for the utterance is generated based on the output vector sequence generated by the second recurrent neural network. The transcription for the utterance is provided.
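The abstract's distinction between unitary and non-unitary operators can be illustrated with a small sketch. It is not taken from the patent: the state vector, the phase angles, and the decay factors below are hypothetical values chosen only to show that a unitary diagonal operator (entries of modulus 1) preserves the norm of a complex vector, while a non-unitary diagonal operator with entries of modulus below 1 shrinks it.

```python
import cmath
import math

def norm(v):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def diag_apply(d, v):
    """Multiply vector v by the diagonal matrix whose diagonal is d."""
    return [di * vi for di, vi in zip(d, v)]

# Hypothetical state vector, chosen for illustration only.
state = [1 + 2j, -0.5j, 3.0, 0.25 - 1j]

# Unitary diagonal: every entry is a pure phase, so its modulus is 1.
unitary_diag = [cmath.exp(1j * theta) for theta in (0.3, 1.1, -2.0, 0.7)]

# Non-unitary diagonal: entries with modulus below 1 (assumed values).
decay_diag = [0.9, 0.8, 0.95, 0.5]

rotated = diag_apply(unitary_diag, state)
decayed = diag_apply(decay_diag, state)

# The first two norms agree up to rounding; the third is smaller.
print(norm(state), norm(rotated), norm(decayed))
```

Under these assumptions, repeatedly applying the non-unitary diagonal would make the vector's norm decrease step by step, which is one way to picture a recurrent state gradually forgetting older information.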
Opening claim text (preview).
What is claimed is:
1. A method performed by one or more computers, wherein the method comprises: receiving, by the one or more computers, audio data indicating acoustic characteristics of an utterance; generating, by the one or more computers, an output vector by processing information from the audio data using: (1) one or more first neural network layers, (2) a matrix that implements a cascade of linear operators comprising (i) first linear operators that are complex-valued and unitary, and (ii) one or more second linear operators that are non-unitary, and (3) one or more second neural network layers; determining, by the one or more computers, a transcription based on the output vector; and providing, by the one or more computers, the transcription for the utterance.
2. The method of claim 1, wherein the one or more second linear operators that are non-unitary are diagonal matrix multiplication operators, and wherein all other linear operators in the cascade are unitary operators.
3. The method of claim 1, wherein one or more second linear operators that are non-unitary introduce decay of retained data in memory of the first recurrent neural network.
4. The method of claim 1, wherein the cascade of linear operators includes at least one of each of the operators in a set comprising a Fourier transformation, an inverse Fourier transformation, a diagonal matrix multiplication, a column permutation, and a Householder reflection.
5. The method of claim 1, wherein the cascade of linear operators comprises a first diagonal matrix multiplication, a Fourier transform, a first Householder reflection, a column permutation, a second diagonal matrix multiplication, an inverse Fourier transformation, a second Householder reflection, and a third diagonal matrix multiplication.
6. The method of claim 1, wherein the cascade of linear operators comprises a sequence of operators that includes, in the following order, a first diagonal matrix multiplication, a Fourier transform, a first Householder reflection, a column permutation, a second diagonal matrix multiplication, an inverse Fourier transformation, a second Householder reflection, and a third diagonal matrix multiplication.
7. The method of claim 1, wherein the cascade of linear operators is limited to operators selected from a set consisting of Fourier transformations, inverse Fourier transformations, diagonal matrix multiplications, column permutations, and Householder reflections; and wherein the one or more second linear operators that are non-unitary are limited to diagonal matrix multiplications.
8. The method of claim 1, wherein the audio data comprises audio data for the utterance acquired using two or more microphones, wherein the first vector comprises audio features determined using audio data from the two or more microphones, and wherein the first recurrent neural network is configured to perform beamforming processing.
9. The method of claim 1, wherein the first recurrent neural network is configured to perform de-reverberation processing.
10. The method of claim 1, wherein the first recurrent neural network is configured to perform noise reduction processing.
11. The method of claim 1, wherein receiving the audio data comprises receiving the audio data from a client device over a network; and wherein providing the transcription comprises providing the transcription to a client device over a network.
12. The method of claim 1, wherein providing the transcription comprises providing the transcription to a computer system implementing a digital conversational assistant.
13. The method of claim 1, further comprising: determining an action specified by the transcription; and performing the determined action by the one or more computers or instructing a client device or server system to perform the determined action.
14. A system comprising: one or more computers; and one or more computer-readable media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising: receiving, by the one or more computers, audio data indicating acoustic characteristics of an utterance; generating, by the one or more computers, an output vector by processing information from the audio data using: (1) one or more first neural network layers, (2) a matrix that implements a cascade of linear operators comprising (i) first linear operators that are complex-valued and unitary, and (ii) one or more second linear operators that are non-unitary, and (3) one or more second neural network layers; determining, by the one or more computers, a transcription based on the output vector; and providing, by the one or more computers, the transcription for the utterance.
15. The system of claim 14, wherein the one or more second linear operators that are non-unitary are diagonal matrix multiplication operators, and wherein all other linear operators in the cascade are unitary operators.
16. The system of claim 14, wherein one or more second linear operators that are non-unitary introduce decay of retained data in memory of the first recurrent neural network.
17. The system of claim 14, wherein the cascade of linear operators includes at least one of each of the operators in a set comprising a Fourier transformation, an inverse Fourier transformation, a diagonal matrix multiplication, a column permutation, and a Householder reflection.
18. The system of claim 14, wherein the cascade of linear operators comprises a sequence of operators comprising a first diagonal matrix multiplication, a Fourier transform, a first Householder reflection, a column permutation, a second diagonal matrix multiplication, an inverse Fourier transformation, a second Householder reflection, and a third diagonal matrix multiplication.
19. One or more non-transitory computer-readable media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising: receiving, by the one or more computers, audio data indicating acoustic characteristics of an utterance; generating, by the one or more computers, an output vector by processing information from the audio data using: (1) one or more first neural network layers, (2) a matrix that implements a cascade of linear operators comprising (i) first linear operators that are complex-valued and unitary, and (ii) one or more second linear operators that are non-unitary, and (3) one or more second neural network layers; determining, by the one or more computers, a transcription based on the output vector; and providing, by the one or more computers, the transcription for the utterance.
20. The one or more non-transitory computer-readable media of claim 18, wherein the one or more second linear operators that are non-unitary are diagonal matrix multiplication operators, and wherein all other linear operators in the cascade are unitary operators.
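The ordered cascade recited in the claims (diagonal, Fourier transform, Householder reflection, column permutation, diagonal, inverse Fourier transform, Householder reflection, diagonal) can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: the function names, the vector size, all parameter values, the orthonormal DFT scaling, and the choice to make only the second diagonal non-unitary are assumptions. The claims state that the non-unitary operators may be diagonal multiplications but do not say which of the three.

```python
import cmath
import math

def norm(v):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def diag(d, v):
    """Diagonal matrix multiplication: scale each entry of v by d."""
    return [di * vi for di, vi in zip(d, v)]

def dft(v, sign=-1):
    """Discrete Fourier transform with 1/sqrt(n) scaling, so it is unitary."""
    n = len(v)
    scale = 1 / math.sqrt(n)
    return [scale * sum(v[m] * cmath.exp(sign * 2j * math.pi * m * k / n)
                        for m in range(n))
            for k in range(n)]

def idft(v):
    """Inverse of dft(): same scaling, opposite sign in the exponent."""
    return dft(v, sign=+1)

def householder(u, v):
    """Apply the reflection I - 2 u u^H / (u^H u) to v."""
    coeff = (2 * sum(ui.conjugate() * vi for ui, vi in zip(u, v))
             / sum(abs(ui) ** 2 for ui in u))
    return [vi - coeff * ui for ui, vi in zip(u, v)]

def permute(perm, v):
    """Reorder the entries of v (a permutation matrix applied to v)."""
    return [v[p] for p in perm]

def cascade(v, d1, u1, perm, d2, u2, d3):
    """Apply the eight operators to v in the order listed in the claims."""
    v = diag(d1, v)
    v = dft(v)
    v = householder(u1, v)
    v = permute(perm, v)
    v = diag(d2, v)
    v = idft(v)
    v = householder(u2, v)
    return diag(d3, v)

def phases(angles):
    """Unit-modulus diagonal entries built from phase angles."""
    return [cmath.exp(1j * a) for a in angles]

# Hypothetical parameters, chosen only for illustration.
x = [1 + 1j, 2.0, -1j, 0.5]
d1 = phases((0.1, 0.2, 0.3, 0.4))
d3 = phases((-1.0, 0.5, 2.0, 0.0))
u1 = [1, 1j, -1, 0.5]
u2 = [0.3, -2, 1j, 1]
perm = [2, 0, 3, 1]
d2_unitary = phases((0.7, -0.7, 1.5, 3.0))
d2_decay = [0.5 * p for p in d2_unitary]  # modulus 0.5: non-unitary

out_unitary = cascade(x, d1, u1, perm, d2_unitary, u2, d3)
out_decay = cascade(x, d1, u1, perm, d2_decay, u2, d3)
print(norm(x), norm(out_unitary), norm(out_decay))
```

With every diagonal entry of modulus 1, each stage is unitary and the output norm matches the input norm up to rounding. Scaling the second diagonal to modulus 0.5 halves the norm, since the remaining stages leave it unchanged.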
Details of electrophonic musical instruments · CPC title
Neural networks · CPC title
Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation · CPC title
Artificial neural networks; Connectionist approaches · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title