Complex evolution recurrent neural networks

US2020111483A1 · US · A1

Patent metadata
Publication number: US-2020111483-A1
Application number: US-201916710005-A
Country: US
Kind code: A1
Filing date: Dec 11, 2019
Priority date: Dec 21, 2016
Publication date: Apr 9, 2020
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex evolution recurrent neural networks. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A first vector sequence comprising audio features determined from the audio data is generated. A second vector sequence is generated, as output of a first recurrent neural network in response to receiving the first vector sequence as input, where the first recurrent neural network has a transition matrix that implements a cascade of linear operators comprising (i) first linear operators that are complex-valued and unitary, and (ii) one or more second linear operators that are non-unitary. An output vector sequence of a second recurrent neural network is generated. A transcription for the utterance is generated based on the output vector sequence generated by the second recurrent neural network. The transcription for the utterance is provided.
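The abstract distinguishes unitary operators from non-unitary ones. The sketch below is an illustration only, not the patent's implementation: it assumes a purely diagonal transition operator and made-up values (`state`, `unitary_diag`, `decaying_diag`, and the 0.9 factor are all hypothetical). It shows that repeatedly applying a diagonal whose entries have modulus 1 leaves the state norm unchanged, while shrinking those moduli below 1 makes the retained state decay geometrically.

```python
import cmath
import math


def apply_diagonal(state, diag):
    """Multiply a complex state vector elementwise by a diagonal operator."""
    return [d * s for d, s in zip(diag, state)]


def norm(state):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(s) ** 2 for s in state))


def norm_after(diag, state, steps):
    """Apply the same diagonal operator `steps` times and return the final norm."""
    for _ in range(steps):
        state = apply_diagonal(state, diag)
    return norm(state)


# Hypothetical values chosen only for illustration.
state = [1 + 0j, 0.5j, -0.25 + 0.25j]
unitary_diag = [cmath.exp(1j * theta) for theta in (0.3, 1.1, -2.0)]  # modulus 1
decaying_diag = [0.9 * d for d in unitary_diag]                       # modulus 0.9

print("initial norm:        ", norm(state))
print("unitary, 10 steps:   ", norm_after(unitary_diag, state, 10))
print("non-unitary, 10 steps:", norm_after(decaying_diag, state, 10))
```

With the unitary diagonal the norm stays at its initial value up to rounding; with the scaled diagonal it shrinks by a factor of 0.9 per step.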

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by one or more computers, wherein the method comprises: receiving, by the one or more computers, audio data indicating acoustic characteristics of an utterance; generating, by the one or more computers, an output vector by processing information from the audio data using: (1) one or more first neural network layers, (2) a matrix that implements a cascade of linear operators comprising (i) first linear operators that are complex-valued and unitary, and (ii) one or more second linear operators that are non-unitary, and (3) one or more second neural network layers; determining, by the one or more computers, a transcription based on the output vector; and providing, by the one or more computers, the transcription for the utterance.

2. The method of claim 1, wherein the one or more second linear operators that are non-unitary are diagonal matrix multiplication operators, and wherein all other linear operators in the cascade are unitary operators.

3. The method of claim 1, wherein one or more second linear operators that are non-unitary introduce decay of retained data in memory of the first recurrent neural network.

4. The method of claim 1, wherein the cascade of linear operators includes at least one of each of the operators in a set comprising a Fourier transformation, an inverse Fourier transformation, a diagonal matrix multiplication, a column permutation, and a Householder reflection.

5. The method of claim 1, wherein the cascade of linear operators comprises a first diagonal matrix multiplication, a Fourier transform, a first Householder reflection, a column permutation, a second diagonal matrix multiplication, an inverse Fourier transformation, a second Householder reflection, and a third diagonal matrix multiplication.

6. The method of claim 1, wherein the cascade of linear operators comprises a sequence of operators that includes, in the following order, a first diagonal matrix multiplication, a Fourier transform, a first Householder reflection, a column permutation, a second diagonal matrix multiplication, an inverse Fourier transformation, a second Householder reflection, and a third diagonal matrix multiplication.

7. The method of claim 1, wherein the cascade of linear operators is limited to operators selected from a set consisting of Fourier transformations, inverse Fourier transformations, diagonal matrix multiplications, column permutations, and Householder reflections; and wherein the one or more second linear operators that are non-unitary are limited to diagonal matrix multiplications.

8. The method of claim 1, wherein the audio data comprises audio data for the utterance acquired using two or more microphones, wherein the first vector comprises audio features determined using audio data from the two or more microphones, and wherein the first recurrent neural network is configured to perform beamforming processing.

9. The method of claim 1, wherein the first recurrent neural network is configured to perform de-reverberation processing.

10. The method of claim 1, wherein the first recurrent neural network is configured to perform noise reduction processing.

11. The method of claim 1, wherein receiving the audio data comprises receiving the audio data from a client device over a network; and wherein providing the transcription comprises providing the transcription to a client device over a network.

12. The method of claim 1, wherein providing the transcription comprises providing the transcription to a computer system implementing a digital conversational assistant.

13. The method of claim 1, further comprising: determining an action specified by the transcription; and performing the determined action by the one or more computers or instructing a client device or server system to perform the determined action.

14. A system comprising: one or more computers; and one or more computer-readable media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising: receiving, by the one or more computers, audio data indicating acoustic characteristics of an utterance; generating, by the one or more computers, an output vector by processing information from the audio data using: (1) one or more first neural network layers, (2) a matrix that implements a cascade of linear operators comprising (i) first linear operators that are complex-valued and unitary, and (ii) one or more second linear operators that are non-unitary, and (3) one or more second neural network layers; determining, by the one or more computers, a transcription based on the output vector; and providing, by the one or more computers, the transcription for the utterance.

15. The system of claim 14, wherein the one or more second linear operators that are non-unitary are diagonal matrix multiplication operators, and wherein all other linear operators in the cascade are unitary operators.

16. The system of claim 14, wherein one or more second linear operators that are non-unitary introduce decay of retained data in memory of the first recurrent neural network.

17. The system of claim 14, wherein the cascade of linear operators includes at least one of each of the operators in a set comprising a Fourier transformation, an inverse Fourier transformation, a diagonal matrix multiplication, a column permutation, and a Householder reflection.

18. The system of claim 14, wherein the cascade of linear operators comprises a sequence of operators comprising a first diagonal matrix multiplication, a Fourier transform, a first Householder reflection, a column permutation, a second diagonal matrix multiplication, an inverse Fourier transformation, a second Householder reflection, and a third diagonal matrix multiplication.

19. One or more non-transitory computer-readable media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising: receiving, by the one or more computers, audio data indicating acoustic characteristics of an utterance; generating, by the one or more computers, an output vector by processing information from the audio data using: (1) one or more first neural network layers, (2) a matrix that implements a cascade of linear operators comprising (i) first linear operators that are complex-valued and unitary, and (ii) one or more second linear operators that are non-unitary, and (3) one or more second neural network layers; determining, by the one or more computers, a transcription based on the output vector; and providing, by the one or more computers, the transcription for the utterance.

20. The one or more non-transitory computer-readable media of claim 18, wherein the one or more second linear operators that are non-unitary are diagonal matrix multiplication operators, and wherein all other linear operators in the cascade are unitary operators.
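The operator sequence recited in the claims (diagonal, Fourier transform, Householder reflection, column permutation, diagonal, inverse Fourier transform, Householder reflection, diagonal) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the operators are applied to the vector in the listed order, the Fourier transforms use unitary 1/sqrt(n) scaling, the parameters are random rather than learned, and only the third diagonal is made non-unitary via a hypothetical `decay` factor. All function and parameter names are invented for this example.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Unitary DFT (scaled by 1/sqrt(n)); O(n^2) for clarity, not speed."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[k] * cmath.exp(sign * 2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def diagonal(x, d):
    """Diagonal matrix multiplication: elementwise product."""
    return [di * xi for di, xi in zip(d, x)]


def householder(x, v):
    """Householder reflection: x - 2 v (v^H x) / (v^H v)."""
    vhx = sum(vi.conjugate() * xi for vi, xi in zip(v, x))
    vhv = sum(abs(vi) ** 2 for vi in v)
    return [xi - 2 * vi * vhx / vhv for xi, vi in zip(x, v)]


def permute(x, perm):
    """Reorder entries according to a fixed permutation."""
    return [x[p] for p in perm]


def norm(x):
    return math.sqrt(sum(abs(xi) ** 2 for xi in x))


def make_params(n, decay, seed=0):
    """Random illustrative parameters; decay=1.0 makes every operator unitary."""
    rng = random.Random(seed)

    def phases():
        return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(n)]

    def vector():
        return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

    perm = list(range(n))
    rng.shuffle(perm)
    return {
        "d1": phases(), "v1": vector(), "perm": perm,
        "d2": phases(), "v2": vector(),
        "d3": [decay * p for p in phases()],  # assumed non-unitary when decay != 1
    }


def cascade(x, p):
    """Apply the eight operators in the order listed in the claims."""
    x = diagonal(x, p["d1"])
    x = dft(x)
    x = householder(x, p["v1"])
    x = permute(x, p["perm"])
    x = diagonal(x, p["d2"])
    x = dft(x, inverse=True)
    x = householder(x, p["v2"])
    return diagonal(x, p["d3"])


x = [complex(i, -i) for i in range(1, 9)]
for decay in (1.0, 0.9):
    y = cascade(x, make_params(len(x), decay))
    print(f"decay={decay}: output/input norm ratio = {norm(y) / norm(x):.6f}")
```

Because every operator except the last diagonal is unitary in this sketch, the output-to-input norm ratio equals the `decay` factor up to rounding. A practical implementation would presumably use a fast Fourier transform and trainable parameters instead of the O(n^2) loop and random values used here.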

Assignees

Inventors

Classifications

  • Details of electrophonic musical instruments · CPC title

  • Neural networks · CPC title

  • Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation · CPC title

  • Artificial neural networks; Connectionist approaches · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020111483A1 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex evolution recurrent neural networks. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A first vector sequence comprising audio features determined from the audio data is generated. A second vector sequence is …
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 9, 2020 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).