What technology area does this patent fall under?

Primary CPC classification G10L15/063. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 15, 2021 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Token-wise training for attention based end-to-end speech recognition

Patent metadata
Field	Value
Publication number	US-11037547-B2
Application number	US-201916275971-A
Country	US
Kind code	B2
Filing date	Feb 14, 2019
Priority date	Feb 14, 2019
Publication date	Jun 15, 2021
Grant date	Jun 15, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, determining a posterior probability vector at a time of a first wrong token among one or more output tokens of the model of which the cross-entropy training is performed, and determining a loss of the first wrong token at the time, based on the determined posterior probability vector. The method further includes determining a total loss of a training set of the model of which the cross-entropy training is performed, based on the determined loss of the first wrong token, and updating the model of which the cross-entropy training is performed, based on the determined total loss of the training set.
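The loss described in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the dictionary representation of each posterior probability vector, and the choice to assign zero loss to an error-free hypothesis are all assumptions made for illustration. The sketch also assumes the hypothesis and reference are aligned position by position and have equal length.

```python
import math

def first_wrong_index(hyp, ref):
    """Return the first position where hyp and ref differ, or None if they agree."""
    for t, (y, r) in enumerate(zip(hyp, ref)):
        if y != r:
            return t
    return None

def first_wrong_token_loss(posteriors, hyp, ref):
    """Negative log posterior of the reference token at the first wrong position.

    posteriors[t] is a dict mapping each token to its probability at step t
    (a hypothetical stand-in for the posterior probability vector).
    """
    t = first_wrong_index(hyp, ref)
    if t is None:
        return 0.0  # assumption: a hypothesis with no wrong token adds no loss
    return -math.log(posteriors[t][ref[t]])

def total_loss(examples):
    """Sum the first-wrong-token loss over (posteriors, hyp, ref) triples."""
    return sum(first_wrong_token_loss(p, y, r) for p, y, r in examples)

# Toy data: the model outputs "b" at step 1 where the reference has "a".
posteriors = [{"a": 0.9, "b": 0.1}, {"a": 0.25, "b": 0.75}]
hyp = ["a", "b"]
ref = ["a", "a"]
print(first_wrong_token_loss(posteriors, hyp, ref))
```

Only the step containing the first error contributes to the loss in this sketch; later positions are ignored, which is the point of difference from a cross-entropy loss summed over every step.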

First claim

Opening claim text (preview).

What is claimed is:

1. A method of attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, the method comprising: performing cross-entropy training of a model, based on one or more input features of a speech signal; selecting a hypothesis with a longest correct prefix, from a plurality of hypotheses of the model of which the cross-entropy training is performed; determining a posterior probability vector at a time of a first wrong token included in the selected hypotheses, among one or more output tokens of the model of which the cross-entropy training is performed; determining a loss of the first wrong token at the time, based on the determined posterior probability vector; determining a total loss of a training set of the model of which the cross-entropy training is performed, based on the determined loss of the first wrong token; and updating the model of which the cross-entropy training is performed, based on the determined total loss of the training set.

2. The method of claim 1, wherein the posterior probability vector at the time is determined as follows: $p_t = \mathrm{Decoder}(s_{t-1} \in \{r_{t-1}, y_{t-1}\}, H^{\mathrm{enc}})$, where $t$ denotes the time, $p_t$ denotes the posterior probability vector at the time $t$, $H^{\mathrm{enc}}$ denotes the one or more features that are encoded, $y_{t-1}$ denotes an output token at a previous time $t-1$, $r_{t-1}$ denotes a reference token at the previous time $t-1$, and $s_{t-1}$ denotes a token randomly selected from $\{r_{t-1}, y_{t-1}\}$.

3. The method of claim 1, wherein the total loss of the training set is determined as follows: $L(\theta)_{\mathrm{TWT}} = \sum_{(y,r) \in (Y,R)} l_\theta(y_{t_\omega}, r_{t_\omega})$, where $L(\theta)$ denotes the total loss of the training set, $(Y,R)$ denotes hypothesis-reference pairs in the training set, $t_\omega$ denotes the time, $y_{t_\omega}$ denotes the first wrong token at the time, $r_{t_\omega}$ denotes a reference token at the time, and $l_\theta(y_{t_\omega}, r_{t_\omega})$ denotes the loss of the first wrong token.

4. The method of claim 3, wherein the loss of the first wrong token is determined as follows: $l_\theta(y_{t_\omega}, r_{t_\omega}) = -\log p_{t_\omega, r_{t_\omega}}$, where $p_{t_\omega, r_{t_\omega}}$ denotes a posterior probability of the reference token at the time.

5. The method of claim 3, wherein the loss of the first wrong token is determined as follows: $l_\theta(y_{t_\omega}, r_{t_\omega}) = -\log p_{t_\omega, r_{t_\omega}} + \log p_{t_\omega, y_{t_\omega}}$, where $p_{t_\omega, r_{t_\omega}}$ denotes a posterior probability of the reference token at the time, and $p_{t_\omega, y_{t_\omega}}$ denotes a posterior probability of the first wrong token at the time.

6. The method of claim 1, wherein the total loss of the training set is determined as follows: $L(\theta)_{\mathrm{TWTiB}} = \sum_{(y,r) \in (Y,R)} l_\theta(y^{jl}_{t_{jl,\omega}}, r_{t_{jl,\omega}})$, where $L(\theta)$ denotes the total loss of the training set, $(Y,R)$ denotes hypothesis-reference pairs in the training set, $t_{jl,\omega}$ denotes the time, $y^{jl}_{t_{jl,\omega}}$ denotes the first wrong token at the time, $r_{t_{jl,\omega}}$ denotes a reference token at the time, and $l_\theta(y^{jl}_{t_{jl,\omega}}, r_{t_{jl,\omega}})$ denotes the loss of the first wrong token.

7. An apparatus for attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, the apparatus comprising: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code, the program code including: performing code configured to cause the at least one processor to perform cross-entropy training of a model, based on one or more input features of a speech signal; selecting code configured to cause the at least one processor to select a hypothesis with a longest correct prefix, from a plurality of hypotheses of the model of which the cross-entropy training is performed; first determining
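The selection step of claim 1, the two loss variants of claims 4 and 5, and the random choice of the previous token in claim 2 can be sketched in Python as follows. This is an illustration under assumptions, not the claimed implementation: every function name is hypothetical, the posterior probability vector is modelled as a dictionary, ties between hypotheses are broken by list order, and the `ref_prob` parameter is invented here because the claim does not state how the random selection is weighted.

```python
import math
import random

def correct_prefix_len(hyp, ref):
    """Count leading tokens of hyp that match ref."""
    n = 0
    for y, r in zip(hyp, ref):
        if y != r:
            break
        n += 1
    return n

def select_hypothesis(hyps, ref):
    """Pick the hypothesis with the longest correct prefix (first one wins ties)."""
    return max(hyps, key=lambda h: correct_prefix_len(h, ref))

def loss_reference_only(p_t, y, r):
    """Claim 4 form: -log p(reference token) at the first wrong step."""
    return -math.log(p_t[r])

def loss_reference_vs_wrong(p_t, y, r):
    """Claim 5 form: -log p(reference token) + log p(first wrong token)."""
    return -math.log(p_t[r]) + math.log(p_t[y])

def pick_previous_token(ref_prev, hyp_prev, rng, ref_prob=0.5):
    """Claim 2 idea: feed the decoder either the reference or the output token.

    ref_prob is a hypothetical parameter; the claim only says "randomly selected".
    """
    return ref_prev if rng.random() < ref_prob else hyp_prev

# Toy beam of three hypotheses against one reference.
ref = ["a", "b", "c"]
hyps = [["a", "x", "c"], ["a", "b", "x"], ["x", "b", "c"]]
best = select_hypothesis(hyps, ref)
# Assumes best contains a wrong token, so the prefix length is its index.
t = correct_prefix_len(best, ref)
p_t = {"a": 0.3, "c": 0.2, "x": 0.5}  # made-up posterior at step t
loss4 = loss_reference_only(p_t, best[t], ref[t])
loss5 = loss_reference_vs_wrong(p_t, best[t], ref[t])
prev = pick_previous_token(ref[t - 1], best[t - 1], random.Random(0))
```

In this toy beam the second hypothesis is selected because its first two tokens are correct. The claim 5 form is smaller than the claim 4 form here because the added term, the log probability of the wrong token, is negative.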

Assignees

Tencent America LLC

Inventors

Classifications

G06N3/047
Probabilistic or stochastic networks
G06N7/01
Probabilistic graphical models, e.g. probabilistic networks
G06N3/044
Recurrent networks, e.g. Hopfield networks
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
G06N3/0455
Auto-encoder networks; Encoder-decoder networks

Patent family

Related publications grouped by family.

View patent family 72042335

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11037547B2 cover?: A method of attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, determining a posterior probability vector at a time of a first wrong token among one or more output tokens of the model of which the cross-entropy training is performed, and determining a loss …
Who is the assignee on this patent?: Tencent America LLC

Related patents

Minimum word error rate training for attention-based sequence-to-sequence models

Speech recognition with sequence-to-sequence models

Training sequence generation neural networks using quality scores

Unsupervised learning utilizing sequential output statistics

System and Method for End-to-End speech recognition

Deployed end-to-end speech recognition

Training device, speech detection device, training method, and computer program product
