What technology area does this patent fall under?

Primary CPC classification G10L15/063. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 30, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Knowledge transfer in permutation invariant training for single-channel multi-talker speech recognition

Patent metadata
Field	Value
Publication number	US-10699697-B2
Application number	US-201815940197-A
Country	US
Kind code	B2
Filing date	Mar 29, 2018
Priority date	Mar 29, 2018
Publication date	Jun 30, 2020
Grant date	Jun 30, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided are a speech recognition training processing method and an apparatus including the same. The speech recognition training processing method includes acquiring a multi-talker mixed speech signal from a plurality of speakers, performing permutation invariant training (PIT) model training on the multi-talker mixed speech signal based on knowledge from a single-talker speech recognition model and updating a multi-talker speech recognition model based on a result of the PIT model training.

First claim

Opening claim text (preview).

What is claimed is:

1. A method of performing speech recognition training performed by at least one processor, the method comprising: acquiring, by the at least one processor, a multi-talker mixed speech signal from a plurality of speakers; performing, by the at least one processor, permutation invariant training (PIT) model training on the multi-talker mixed speech signal based on knowledge from a single-talker speech recognition model; and updating, by the at least one processor, a multi-talker speech recognition model based on a result of the PIT model training.

2. The method of claim 1, wherein the single-talker speech recognition model is a teacher model, and the multi-talker speech recognition model is a student model.

3. The method of claim 1, wherein the PIT model training uses labels from the single-talker speech recognition model, the labels being posteriors from inputting a single-talker data corresponding to one or more of the plurality of speakers into the single-talker speech recognition model.

4. The method of claim 1, further comprises: performing PIT model training on a single talker feature corresponding to one or more of the plurality of speakers; and transferring posteriors from the performing the PIT model training on the single talker feature as soft label input for the multi-talker speech recognition model.

5. The method of claim 1, wherein the performing PIT model training comprises: performing a bidirectional long-short term memory (BLSTM) operation on the multi-talker mixed speech signal by assigning soft labels that are posteriors from inputting a single-talker data corresponding to one or more of the plurality of speakers into the single-talker speech recognition model and generating a plurality of estimated output segments for multi-talker mixed speech signal; and minimizing a minimal average cross entropy (CE) for utterances of all possible assignments between the plurality of estimated output segments and soft labels.

6. The method of claim 1, wherein the minimal average cross entropy (CE) is determined based on equation (1) and (2):

$$J = \frac{1}{S} \min_{s' \in \mathrm{permu}(S)} \sum_{s} \sum_{t} \sum_{y} p'(y \mid o_t^{s'_s}) \log p_\theta^{s}(y \mid o_t) \qquad (1)$$

$$p'(y \mid o_t^{s'_s}) = \lambda \, p_{\mathrm{teacher}}(y \mid o_t^{s'_s} \ \ldots$$
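The minimization described in claim 5 can be illustrated with a small NumPy sketch. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `pit_soft_label_loss` and the array layout (streams, frames, classes) are hypothetical, the soft labels are taken to be the teacher posteriors directly (the interpolation in the truncated second equation is not reproduced), and the standard cross-entropy definition with its leading minus sign is used.

```python
import itertools
import numpy as np


def pit_soft_label_loss(student_log_probs, teacher_posteriors):
    """Minimal average cross entropy over all output-to-label assignments.

    student_log_probs:  (S, T, Y) log-posteriors, one stream per output of
                        the multi-talker (student) model. Hypothetical layout.
    teacher_posteriors: (S, T, Y) soft labels, one stream per speaker, from a
                        single-talker (teacher) model. Hypothetical layout.
    Returns (loss, best_permutation).
    """
    num_streams = student_log_probs.shape[0]
    best_loss, best_perm = np.inf, None
    # Try every assignment of soft-label streams to output streams.
    for perm in itertools.permutations(range(num_streams)):
        total_ce = 0.0
        for s in range(num_streams):
            # Cross entropy summed over frames t and classes y for this pairing.
            total_ce -= np.sum(teacher_posteriors[perm[s]] * student_log_probs[s])
        loss = total_ce / num_streams  # average over the S streams
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy data (assumed): 2 speakers, 5 frames, 4 output classes.
rng = np.random.default_rng(0)
teacher = rng.dirichlet(np.ones(4), size=(2, 5))
# A "student" whose two output streams are the teacher streams in swapped order.
student = np.log(teacher[::-1])
loss, perm = pit_soft_label_loss(student, teacher)
print(perm, loss)
```

In this toy setup the search should recover the swapped assignment `(1, 0)`, since cross entropy is smallest when each output stream is paired with the label stream it matches. Enumerating permutations costs S! evaluations, which is practical only for a small number of speakers.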

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

G10L15/063 (Primary)
Training (CPC title)
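A CPC symbol such as G10L15/063 is hierarchical, and a short sketch can show how it breaks down. The code below is an illustration only: `CpcSymbol`, `parse_cpc`, and the abbreviated `CPC_SECTIONS` table are hypothetical names, and the idea that this page derives its "Physics" technology area from the section letter is an assumption, not something the page states.

```python
import re
from typing import NamedTuple


class CpcSymbol(NamedTuple):
    section: str     # e.g. "G"
    cpc_class: str   # e.g. "G10"
    subclass: str    # e.g. "G10L"
    main_group: str  # e.g. "15"
    subgroup: str    # e.g. "063"


# Abbreviated section titles (section Y is omitted in this sketch).
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}

# Section letter, two-digit class, subclass letter, main group "/" subgroup.
_CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a CPC symbol written without spaces into its hierarchy levels."""
    match = _CPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a recognizable CPC symbol: {symbol!r}")
    section, class_digits, subclass_letter, main_group, subgroup = match.groups()
    cpc_class = section + class_digits
    return CpcSymbol(section, cpc_class, cpc_class + subclass_letter, main_group, subgroup)


parsed = parse_cpc("G10L15/063")
print(parsed)
print(CPC_SECTIONS.get(parsed.section, "unknown section"))
```

Under that assumption, the section letter G accounts for the "Physics" area reported for this patent, while the finer levels (G10L, main group 15, subgroup 063) narrow it down to the "Training" entry listed above.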

Patent family

Related publications grouped by family.

View patent family 68055392

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10699697B2 cover?: Provided are a speech recognition training processing method and an apparatus including the same. The speech recognition training processing method includes acquiring a multi-talker mixed speech signal from a plurality of speakers, performing permutation invariant training (PIT) model training on the multi-talker mixed speech signal based on knowledge from a single-talker speech recognition mod…
Who is the assignee on this patent?: Tencent Tech Shenzhen Co Ltd