What technology area does this patent fall under?

Primary CPC classification G10L15/063. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 19, 2016 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and device for acoustic language model training

US9396723B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9396723-B2
Application number	US-201314109845-A
Country	US
Kind code	B2
Filing date	Dec 17, 2013
Priority date	Feb 1, 2013
Publication date	Jul 19, 2016
Grant date	Jul 19, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and a device for training an acoustic language model, include: conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels; performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels; using the first word segmentation data containing word class labels to train a first language model containing word class labels; using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels; and in accordance with the second word segmentation data meeting one or more predetermined criteria, using the second word segmentation data containing word class labels to train the acoustic language model.
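The steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the unigram model, the Viterbi-style segmenter, the toy corpus, the glossary entries, the `<CITY>` label, and every function name (`train_unigram`, `segment`, `replace_classes`, `label_raw_text`) are assumptions made for the example. The "predetermined criteria" are assumed here to be simple equality between the first and second segmentation data.

```python
import math
from collections import Counter

def train_unigram(segmented):
    """Estimate unigram probabilities from a list of token lists."""
    counts = Counter(tok for sent in segmented for tok in sent)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def segment(text, model, max_len=8, unk_logp=-20.0):
    """Pick the split of `text` with the highest total log-probability.

    Unknown single characters are allowed with a fixed penalty so that
    every position stays reachable.
    """
    n = len(text)
    best = [0.0] + [-math.inf] * n   # best score of any split ending at i
    back = [0] * (n + 1)             # start index of the last token
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in model:
                score = best[start] + math.log(model[piece])
            elif end - start == 1:
                score = best[start] + unk_logp
            else:
                continue
            if score > best[end]:
                best[end], back[end] = score, start
    tokens, end = [], n
    while end > 0:
        tokens.append(text[back[end]:end])
        end = back[end]
    return tokens[::-1]

def replace_classes(tokens, glossary):
    """Swap each token found in the glossary for its class label."""
    return [glossary.get(tok, tok) for tok in tokens]

def label_raw_text(text, glossary):
    """Naive substring replacement of glossary words in unsegmented text."""
    for word, label in glossary.items():
        text = text.replace(word, label)
    return text

# Hypothetical toy data: unsegmented samples and a one-class glossary.
corpus = ["ilikebeijing", "ivisitshanghai"]
glossary = {"beijing": "<CITY>", "shanghai": "<CITY>"}
initial_model = {w: 0.2 for w in ["i", "like", "visit", "beijing", "shanghai"]}

# 1. Segment with the initial model (no class labels).
initial_data = [segment(s, initial_model) for s in corpus]
# 2. Word class replacement gives the first segmentation data.
first_data = [replace_classes(t, glossary) for t in initial_data]
# 3. Train the first language model on class-labelled data.
first_model = train_unigram(first_data)
# 4. Re-segment the (labelled) training samples with the first model.
second_data = [segment(label_raw_text(s, glossary), first_model) for s in corpus]
# 5. If the assumed criterion holds, train the final model on the second data.
approved = second_data == first_data
final_model = train_unigram(second_data) if approved else None
```

In this toy run the two city names collapse into one `<CITY>` token, so the class-labelled model pools their counts instead of treating each name as a separate, sparser word.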

First claim

Opening claim text (preview).

What is claimed is:

1. A method of training an acoustic language model, comprising: at a device having one or more processors and memory: conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels; performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels; using the first word segmentation data containing word class labels to train a first language model containing word class labels; using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels; and in accordance with the second word segmentation data meeting one or more predetermined criteria, using the second word segmentation data containing word class labels to train the acoustic language model.

2. The method of claim 1, wherein performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels further comprises: identifying, in a classification glossary, respective word class labels for one or more respective words in the initial word segmentation data containing no word class labels; and replacing the one or more respective words in the initial word segmentation data containing no word class labels with the identified respective word class labels to obtain the first word segmentation data containing word class labels.

3. The method of claim 1, wherein using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels further comprises: identifying, in a classification glossary, respective word class labels for one or more respective words in the training samples in the training corpus; replacing the one or more respective words in the training samples with the identified respective word class labels to obtain new training samples containing word class labels; and conducting word segmentation for the new training samples using the first language model containing word class labels, to obtain the second word segmentation data containing word class labels.

4. The method of claim 3, further comprising: after obtaining the second word segmentation data containing word class labels: comparing segmentation results of corresponding training samples in the first and the second word segmentation data; and in accordance with a determination that the first word segmentation data is consistent with the second word segmentation data, approving the second word segmentation data for use in the training of the acoustic language model.

5. The method of claim 4, further comprising: after obtaining the second word segmentation data containing word class labels: in accordance with a determination that the first word segmentation data is inconsistent with the second word segmentation data, retrain the first language model using the second word segmentation data.

6. The method of claim 5, further comprising: after the first language model is retrained, repeating the word segmentation for the second training sample using the first language model containing word class labels, to obtain revised second word segmentation data; and in accordance with a determination that the revised second word segmentation data is consistent with the second word segmentation data, approving the revised second word segmentation data for use in the training of the acoustic language model.

7. The method of claim 4, wherein a determining that the first word segmentation data is consistent with the second word segmentation data further comprises a determination that respective word class label replacements in the first word segmentation data are identical to respective word class label replacements in the second word segmentation data.

8. A system for training an acoustic language model, comprising: one or more processors; and memory having instructions stored thereon, the instructions, when executed by the one or more processors, cause the processors to perform operations comprising: conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels; performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels; using the first word segmentation data containing word class labels to train a first language model containing word class labels; using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels; and in accordance with the second word segmentation data meeting one or more predetermined criteria, using the second word segmentation data containing word class labels to train the acoustic language model.

9. The system of claim 8, wherein performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels further comprises: identifying, in a classification glossary, respective word class labels for one or more respective words in the initial word segmentation data containing no word class labels; and replacing the one or more respective words in the initial word segmentation data containing no word class labels with the identified respective word class labels to obtain the first word segmentation data containing word class labels.

10. The system of claim 8, wherein using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels further comprises: identifying, in a classification glossary, respective word class labels for one or more respective words in the training samples in the training corpus; replacing the one or more respective words in the training samples with the identified respective word class labels to obtain new training samples containing word class labels; and conducting word segmentation for the new training samples using the first language model containing word class labels, to obtain the second word segmentation data containing word class labels.

11. The system of claim 10, wherein the operations further comprise: after obtaining the second word segmentation data containing word class labels: comparing segmentation results of corresponding training samples in the first and the second word segmentation data; and in accordance with a determination that the first word segmentation data is consistent with the second word segmentation data, approving the second word segmentation data for use in the training of the acoustic language model.

12. The system of claim 11, wherein the operations further comprise: after obtaining the second word segmentation data containing word class labels: in accordance with a determination that the first word segmentation data is inconsistent with the second word segmentation data, retrain the first language model using the second word segmentation data.

13. The sys
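The compare, retrain, and repeat behaviour described in claims 4 to 7 can be sketched as a small loop. This is a hedged illustration rather than the claimed implementation: the function names (`labels_in`, `consistent`, `refine`), the `<...>` label convention, and the `max_rounds` cap are assumptions, and the `train` and `segment` callables stand in for whatever model and segmenter are actually used. Consistency is assumed here to mean only that the ordered class labels match in corresponding samples, which is one possible reading of claim 7.

```python
from typing import Callable, List, Optional, Tuple

Tokens = List[str]

def is_label(token: str) -> bool:
    """Assumed convention: class labels look like <NAME>."""
    return token.startswith("<") and token.endswith(">")

def labels_in(tokens: Tokens) -> Tokens:
    """Ordered class labels appearing in one segmented sample."""
    return [t for t in tokens if is_label(t)]

def consistent(first: List[Tokens], second: List[Tokens]) -> bool:
    """True when corresponding samples carry identical class labels."""
    return len(first) == len(second) and all(
        labels_in(a) == labels_in(b) for a, b in zip(first, second)
    )

def refine(
    samples: List[str],
    first_data: List[Tokens],
    train: Callable[[List[Tokens]], object],
    segment: Callable[[str, object], Tokens],
    max_rounds: int = 5,
) -> Tuple[Optional[List[Tokens]], int]:
    """Re-segment and retrain until two successive segmentations agree.

    Returns (approved_data, rounds_used); approved_data is None when the
    assumed round limit is reached without agreement.
    """
    previous = first_data
    model = train(previous)              # first language model
    for round_no in range(1, max_rounds + 1):
        current = [segment(s, model) for s in samples]
        if consistent(previous, current):
            return current, round_no     # approved for final training
        model = train(current)           # retrain on the newer segmentation
        previous = current
    return None, max_rounds
```

The approved data returned by `refine` would then be passed to the final training step. Capping the loop at `max_rounds` is a design choice for the sketch; the claim text shown on this page does not state an iteration limit.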

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

Code	Description	Source
G06F40/40	Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30)	CPC title
G10L15/183	using context dependencies, e.g. language models	CPC title
G10L15/063 (primary)	Training	CPC title
G06F17/28	Physics	mapped topic

Patent family

Related publications grouped by family.

View patent family 51260007

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9396723B2 cover?: A method and a device for training an acoustic language model, include: conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels; performing word class replacement for the initial word segmentation data containing no word class labels, to obtain…
Who is the assignee on this patent?: Tencent Tech Shenzhen Co Ltd