Auditory selection method and device based on memory and attention model

US10818311B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10818311-B2
Application number: US-201816632373-A
Country: US
Kind code: B2
Filing date: Nov 14, 2018
Priority date: Nov 15, 2017
Publication date: Oct 27, 2020
Grant date: Oct 27, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An auditory selection method based on a memory and attention model, including: step S1, encoding an original speech signal into a time-frequency matrix; step S2, encoding and transforming the time-frequency matrix to convert the matrix into a speech vector; step S3, using a long-term memory unit to store a speaker and a speech vector corresponding to the speaker; step S4, obtaining a speech vector corresponding to a target speaker, and separating a target speech from the original speech signal through an attention selection model. A storage device includes a plurality of programs stored in the storage device. The plurality of programs are configured to be loaded by a processor and execute the auditory selection method based on the memory and attention model. A processing unit includes the processor and the storage device.
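Step S1 can be pictured with a small sketch. The abstract does not say how the time-frequency matrix is computed, so the code below assumes a short-time Fourier transform with a Hann window. The function name `stft_magnitude`, the frame length, and the hop size are illustrative choices, not taken from the patent.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=8, hop=4):
    """Return a time-frequency matrix: one row per frame, one column per frequency bin.

    Assumption: a magnitude STFT with a periodic Hann window; the abstract
    only says the signal is encoded into a time-frequency matrix.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    matrix = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of the windowed frame at bin k
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(coeff))
        matrix.append(row)
    return matrix


# Toy input: a pure tone with exactly 2 cycles per 8-sample frame
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
tf_matrix = stft_magnitude(tone)
```

With this toy tone, each row of `tf_matrix` peaks at frequency bin 2. A practical system would use an FFT routine and much longer frames; the naive DFT is used here only to keep the sketch dependency-free.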

First claim

Opening claim text (preview).

What is claimed is:

1. An auditory selection method based on a memory and attention model, comprising:

encoding an original speech signal into a matrix containing time-frequency dimensions;

encoding and transforming the matrix containing the time-frequency dimensions to convert the matrix containing the time-frequency dimensions into a speech vector using a bi-directional long short-term memory (BiLSTM) network model to encode the matrix containing the time-frequency dimensions in a sequential order and in a reverse order, respectively, to obtain a first hidden layer vector and a second hidden layer vector, respectively;

wherein, the BiLSTM network model is configured to encode the matrix containing the time-frequency dimensions to obtain a hidden layer vector, and a formula of the BiLSTM network model comprises:

$$
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \\
c_t &= f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_{t-1} + b_o) \\
h_t &= o_t \tanh(c_t)
\end{aligned}
$$

where, i, f, c, o, and h respectively represent an input gate, a forget gate, a storage unit, an output gate, and the hidden layer vector of the BiLSTM network model, σ represents a Sigmoid function, x represents an input vector, and t represents a time;

where, $W_{xi}$, $W_{hi}$, and $W_{ci}$ respectively represent an encoding matrix parameter of an input vector $x_t$ in the input gate at a current time, an encoding matrix parameter of the hidden layer vector $h_{t-1}$ in the input gate at a previous time, and an encoding matrix parameter of a memory unit $c_{t-1}$ in the input gate at the previous time; $b_i$ represents an information bias parameter in the input gate;

where, $W_{xf}$, $W_{hf}$, and $W_{cf}$ respectively represent an encoding matrix parameter of the input vector $x_t$ in the forget gate at the current time, an encoding matrix parameter of the hidden layer vector $h_{t-1}$ in the forget gate at the previous time, and an encoding matrix parameter of the memory unit $c_{t-1}$ in the forget gate at the previous time; $b_f$ represents an information bias parameter in the forget gate;

where, $W_{xc}$ and $W_{hc}$ respectively represent an encoding matrix parameter of the input vector $x_t$ in the storage unit at the current time and an encoding matrix parameter of the hidden layer vector $h_{t-1}$ in the storage unit at the previous time; $b_c$ represents an information bias parameter in the storage unit; and

where, $W_{xo}$, $W_{ho}$, and $W_{co}$ respectively represent an encoding matrix parameter of the input vector $x_t$ in the output gate at the current time, an encoding matrix parameter of the hidden layer vector $h_{t-1}$ in the output gate at the previous time, and an encoding matrix parameter of the memory unit $c_{t-1}$ in the output gate at the previous time; $b_o$ represents an information bias parameter in the output gate;

storing a speaker and a speech vector corresponding to the speaker in a long-term memory unit;

obtaining a speech vector corresponding to a target speaker from the long-term memory unit; and

according to the speech vector corresponding to the target speaker, separating a target speech from the original speech signal by an attention selection model.

2. The auditory selection method based on the memory and attention model according to claim 1, wherein, before “encoding the original speech signal into the matrix containing the time-frequency dimensions”, the auditory selection method further comprises: resampling the original speech signal to form a resampled speech signal, and filtering the resampled speech signal to reduce a sampling rate of the original speech signal.

3. The auditory selection method based on the memory and attention model according to claim 2, wherein, the step of “encoding and transforming the matrix containing the time-frequency dimensions to convert the matrix containing the time-frequency dimensions into the speech vector” comprises: fusing the first hidden layer vector with the second hidden layer vector at a time corresponding to the first hidden layer vector to obtain a third hidden layer vector; and converting the third hidden layer vector into the speech vector through a fully connected layer; wherein, the matrix containing the time-frequency dimensions is encoded in sequential order at a first time and the matrix containing the time-frequency dimensions is encoded in reverse order at a second time, and the first time corresponds to the second time.

4. The auditory selection method based on the memory and attention model according to claim 3, wherein, the step of “fusing the first hidden layer vector with the second hidden layer vector at the time corresponding to the first hidden layer vector” comprises: adding the first hidden layer vector to the second hidden layer vector, or calculating an average value of the first hidden layer vector and the second hidden layer vector, or splicing the first hidden layer vector and the second hidden layer vector end to end.

5. The auditory selection method based on the memory and attention model according to claim 1, wherein, the step of “storing the speaker and the speech vector corresponding to the speaker in the long-term memory unit” comprises: storing the speaker and the speech vector corresponding to the speaker in the long-term memory unit in a Key-Value form, wherein a Key is configured to store an index of the speaker and a Value is configured to store the speech vector corresponding to the speaker.

6. The auditory selection method based on the memory and attention model according to claim 5, wherein, after “storing the speaker and the speech vector corresponding to the speaker in the long-term memory unit”, the auditory selection method further comprises: when the speaker generates a new speech, extracting a new speech vector of the new speech of the speaker, and updating the speech vector of the speaker stored in the long-term memory unit to replace an original speech vector of the speaker with the new speech vector.

7. The auditory selection method based on the memory and attention model according to claim 6, wherein, the step of “updating the speech vector of the speaker” comprises: after the new speech vector of the speaker is extracted, adding the new speech vector to the original speech vector of the speaker in the long-term memory unit, normalizing amplitudes in an obtained result, wherein a formula of normalizing the amplitudes in the obtained result is as follows:

$$
v = \frac{q + v_1}{\lVert q + v_1 \rVert},
$$

where, $q$ represents a new speech vector generated by the speaker, $v_1$ represents the original speech vector of the speaker, and $v$ represents an updated speech vector of the speaker.

8. The auditory selection method based on the memory and attention model according to claim 1, wherein,
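As a rough illustration of the recurrence in claim 1, the additive fusion option in claim 4, and the Key-Value memory update in claims 5 to 7, the sketch below uses scalar weights so that it runs with the standard library alone. The claimed model uses encoding matrices; the parameter dictionary `p`, the function names, the Euclidean norm, and the handling of a speaker not yet in memory are all assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of the gated recurrence, scalar case (assumption: scalars
    stand in for the encoding matrices named in the claim)."""
    i_t = sigmoid(p["W_xi"] * x_t + p["W_hi"] * h_prev + p["W_ci"] * c_prev + p["b_i"])
    f_t = sigmoid(p["W_xf"] * x_t + p["W_hf"] * h_prev + p["W_cf"] * c_prev + p["b_f"])
    c_t = f_t * c_prev + i_t * math.tanh(p["W_xc"] * x_t + p["W_hc"] * h_prev + p["b_c"])
    o_t = sigmoid(p["W_xo"] * x_t + p["W_ho"] * h_prev + p["W_co"] * c_prev + p["b_o"])
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


def encode(xs, p):
    """Run the recurrence over a sequence and collect the hidden values."""
    h, c, hidden = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        hidden.append(h)
    return hidden


def bidirectional_sum(xs, p_fwd, p_bwd):
    """Encode in sequential and reverse order, then fuse by addition
    (one of the three fusion options listed in claim 4)."""
    fwd = encode(xs, p_fwd)
    bwd = encode(xs[::-1], p_bwd)[::-1]  # re-align reverse pass with forward time
    return [a + b for a, b in zip(fwd, bwd)]


def update_memory(memory, speaker, q):
    """Store v = (q + v1) / ||q + v1|| under the speaker key.

    Assumptions: Euclidean norm; a speaker not yet in memory is stored
    as the normalized q.
    """
    v1 = memory.get(speaker)
    total = list(q) if v1 is None else [a + b for a, b in zip(q, v1)]
    norm = math.sqrt(sum(a * a for a in total))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    memory[speaker] = [a / norm for a in total]
    return memory[speaker]


# Toy usage with hypothetical values
zero_params = {name: 0.0 for name in (
    "W_xi", "W_hi", "W_ci", "b_i", "W_xf", "W_hf", "W_cf", "b_f",
    "W_xc", "W_hc", "b_c", "W_xo", "W_ho", "W_co", "b_o")}
h_t, c_t = lstm_step(1.0, 0.0, 2.0, zero_params)
fused = bidirectional_sum([0.1, 0.2, 0.3], zero_params, zero_params)
memory = {}
first = update_memory(memory, "speaker_1", [3.0, 4.0])
second = update_memory(memory, "speaker_1", [0.6, 0.8])
```

Because the stored vector is renormalized after every update, its length stays at 1 however many utterances are added, so comparisons against it depend on direction rather than on how much speech a speaker has produced.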

Assignees

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • the noise being separate speech, e.g. cocktail party · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10818311B2 cover?
An auditory selection method based on a memory and attention model, including: step S1, encoding an original speech signal into a time-frequency matrix; step S2, encoding and transforming the time-frequency matrix to convert the matrix into a speech vector; step S3, using a long-term memory unit to store a speaker and a speech vector corresponding to the speaker; step S4, obtaining a speech vec…
Who is the assignee on this patent?
Inst Automation Cas
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 27, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).