What technology area does this patent fall under?

Primary CPC classification G10L15/063. Mapped technology areas include Physics.

When was this patent published?

Publication date: Apr 30, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Multi-task training architecture and strategy for attention-based speech recognition system

Patent metadata
Field	Value
Publication number	US-2020135174-A1
Application number	US-201816169512-A
Country	US
Kind code	A1
Filing date	Oct 24, 2018
Priority date	Oct 24, 2018
Publication date	Apr 30, 2020
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatuses are provided for performing sequence to sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
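The abstract says the CTC and attention trainings are performed independently on a shared encoder. One way to picture this is the mini-batch alternation described in the claims, where a single objective is randomly selected for each mini-batch. The sketch below is illustrative only: `alternate_schedule`, `train`, `ctc_step`, `attention_step`, `p_ctc`, and `seed` are hypothetical names, and the selection probability is an assumption, since the page does not state one.

```python
import random


def alternate_schedule(num_batches, p_ctc=0.5, seed=0):
    """Choose one objective per mini-batch ("ctc" or "attention").

    p_ctc is an assumed selection probability; it is not taken from the patent.
    """
    rng = random.Random(seed)
    return ["ctc" if rng.random() < p_ctc else "attention"
            for _ in range(num_batches)]


def train(batches, ctc_step, attention_step, p_ctc=0.5, seed=0):
    """Run exactly one objective's update on each mini-batch.

    ctc_step and attention_step are caller-supplied callables (hypothetical)
    that would minimize the CTC loss and the cross entropy loss respectively.
    """
    schedule = alternate_schedule(len(batches), p_ctc, seed)
    for batch, objective in zip(batches, schedule):
        if objective == "ctc":
            ctc_step(batch)
        else:
            attention_step(batch)
    return schedule
```

In a real system both callables would update the same encoder parameters, so each mini-batch improves the shared hidden-state representation through only one of the two losses.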

First claim

Opening claim text (preview).

What is claimed is:

1. A method of performing sequence to sequence (Seq2Seq) speech recognition training by at least one processor, the Seq2Seq speech recognition training method comprising: acquiring, by the at least one processor, a training set comprising a plurality of pairs of input data and target data corresponding to the input data; encoding, by an encoder implemented by the at least one processor, the input data into a sequence of hidden states; performing, by the at least one processor, a connectionist temporal classification (CTC) model training based on the sequence of hidden states; performing, by the at least one processor, an attention model training based on the sequence of hidden states; and decoding, by a decoder implemented by the at least one processor, the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.

2. The Seq2Seq speech recognition training method of claim 1, further comprising: additionally transforming the sequence of hidden states using additional layers to enable content match between query and context; and performing the attention model training based on the additionally transformed sequence of hidden states.

3. The Seq2Seq speech recognition training method of claim 1, further comprising: performing the CTC model training based on a CTC loss function.

4. The Seq2Seq speech recognition training method of claim 1, further comprising: performing the attention model training based on a cross entropy loss function.

5. The Seq2Seq speech recognition training method of claim 1, wherein the independently performing the CTC model training and the attention model training comprises: performing the CTC model training to minimize CTC loss during a first time period; and performing the attention model training to minimize cross entropy loss during a second period different from the first period.

6. The Seq2Seq speech recognition training method of claim 1, wherein the independently performing the CTC model training and the attention model training comprises: performing a mini-batch based alternate training in which one of the CTC model training and the attention model training is randomly selected for optimization in each mini-batch.

7. The Seq2Seq speech recognition training method of claim 2, wherein the decoding by the decoder comprises: generating a query information based on a previous target label and a previous prediction; generating a context information by calculating a soft alignment over all steps of the additionally transformed sequence of hidden states based on the query; and generating a target label based on the query information and the context information.

8. The Seq2Seq speech recognition training method of claim 7, wherein the context information is a summary of speech signals encoded in hidden layers of the encoder.

9. The Seq2Seq speech recognition training method of claim 7, wherein the context information is generated using scalar energy computed based on content similarity between the additionally transformed sequence of hidden states at each time step and the query information.

10. The Seq2Seq speech recognition training method of claim 3, wherein the CTC loss function is defined as a mean of normalized edit distance between hypothesis H(x) and the corresponding targets,

$$\mathrm{Loss}(H, S) = \frac{1}{|S|} \sum_{(x, t) \in S} \frac{\mathrm{editDistance}(H(x), t)}{|t|}$$

where S=(x, t) is the training set containing all pairs of input x and its corresponding target t.

11. A sequence to sequence (Seq2Seq) speech recognition training apparatus comprising: at least one memory operable to store program code; and at least one processor operable to read said program code and operate as instructed by said program code to: acquire a training set comprising a plurality of pairs of input data and target data corresponding to the input data; encode the input data into a sequence of hidden states; perform a connectionist temporal classification (CTC) model training based on the sequence of hidden states; perform an attention model training based on the sequence of hidden states; and decode the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.

12. The Seq2Seq speech recognition training apparatus of claim 11, wherein the at least one processor is further configured to: additionally transform the sequence of hidden states using additional layers to enable content match between query and context; and perform the attention model training based on the additionally transformed sequence of hidden states.

13. The Seq2Seq speech recognition training apparatus of claim 11, wherein the at least one processor is further configured to: perform the CTC model training based on a CTC loss function.

14. The Seq2Seq speech recognition training apparatus of claim 11, wherein the at least one processor is further configured to: perform the attention model training based on a cross entropy loss function.

15. The Seq2Seq speech recognition training apparatus of claim 11, wherein the independently performing the CTC model training and the attention model training comprises: performing the CTC model training to minimize CTC loss during a first time period; and performing the attention model training to minimize cross entropy loss during a second period different from the first period.

16. The Seq2Seq speech recognition training apparatus of claim 11, wherein the independently performing the CTC model training and the attention model training comprises: performing a mini-batch based alternate training in which one of the CTC model training and the attention model training is randomly selected for optimization in each mini-batch.

17. The Seq2Seq speech recognition training apparatus of claim 12, wherein th
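
Claim 10 describes a loss that is the mean of a normalized edit distance between each hypothesis H(x) and its target t. The sketch below shows one way such a quantity could be computed. It assumes Levenshtein distance with unit insertion, deletion, and substitution costs, and it assumes normalization by the target length; neither detail is spelled out in the preview text. The function names are illustrative and do not come from the patent.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two label sequences (unit costs assumed)."""
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to an empty reference
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def mean_normalized_edit_distance(pairs):
    """Average of editDistance(H(x), t) / len(t) over (hypothesis, target) pairs.

    Each target must be non-empty, otherwise the normalization is undefined.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    total = 0.0
    for hyp, target in pairs:
        if len(target) == 0:
            raise ValueError("targets must be non-empty")
        total += edit_distance(hyp, target) / len(target)
    return total / len(pairs)
```

Sequences can be strings or lists of label IDs, since the functions only rely on iteration, equality, and `len`.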

Assignees

Tencent America LLC

Inventors

Classifications

Code	CPC title
G10L15/063 (Primary)	Training
G10L25/03	characterised by the type of extracted parameters
G10L25/54	for retrieval
G10L15/10	using distance or distortion measures between unknown speech and reference templates
G10L15/16	using artificial neural networks

Patent family

Related publications grouped by family.

View patent family 70327075

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020135174A1 cover?: Methods and apparatuses are provided for performing sequence to sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification …
Who is the assignee on this patent?: Tencent America LLC