What technology area does this patent fall under?

Primary CPC classification G10L15/1815. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 3, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for adding punctuations by detecting silences in a voice using plurality of aggregate weights which obey a linear relationship

US9779728B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9779728-B2
Application number	US-201414160808-A
Country	US
Kind code	B2
Filing date	Jan 22, 2014
Priority date	May 24, 2013
Publication date	Oct 3, 2017
Grant date	Oct 3, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided for adding punctuations. For example, one or more first feature units are identified in a voice file taken as a whole; the voice file is divided into multiple segments by detecting silences in the voice file; one or more second feature units are identified in the voice file; a first aggregate weight of first punctuation states of the voice file and a second aggregate weight of second punctuation states of the voice file are determined, using a language model established based on word separation and third semantic features; a weighted calculation is performed to generate a third aggregate weight based on a linear combination associated with the first aggregate weight and the second aggregate weight; and one or more final punctuations are added to the voice file based on at least information associated with the third aggregate weight.
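The silence-based segmentation step in the abstract (spelled out in claim 5: a silence threshold chosen for the current application scenario, and a new segment wherever a detected silence exceeds it) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a recognizer has already produced per-word start and end times and treats the gap between consecutive words as the silence, whereas the patent detects silences in the voice file itself. The threshold values and the names `TimedWord`, `SILENCE_THRESHOLDS`, and `split_on_silences` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """A recognized word with start/end times in seconds (assumed recognizer output)."""
    text: str
    start: float
    end: float


# Hypothetical scenario-dependent thresholds in seconds; the claims only say the
# silence threshold is determined according to the current application scenario.
SILENCE_THRESHOLDS = {"dictation": 0.5, "conversation": 0.3}


def split_on_silences(words: list[TimedWord], scenario: str) -> list[list[TimedWord]]:
    """Start a new segment wherever the silence between two words exceeds the threshold."""
    threshold = SILENCE_THRESHOLDS[scenario]
    segments: list[list[TimedWord]] = []
    current: list[TimedWord] = []
    for i, word in enumerate(words):
        # Assumption: silence = gap between the previous word's end and this word's start.
        silence = word.start - words[i - 1].end if i > 0 else 0.0
        if silence > threshold:
            segments.append(current)  # close the segment at the silence location
            current = []
        current.append(word)
    if current:
        segments.append(current)
    return segments


words = [
    TimedWord("hello", 0.00, 0.40),
    TimedWord("there", 0.45, 0.80),
    TimedWord("how", 1.60, 1.80),  # 0.8 s pause before this word
    TimedWord("are", 1.85, 2.00),
    TimedWord("you", 2.05, 2.30),
]
print([[w.text for w in seg] for seg in split_on_silences(words, "dictation")])
# [['hello', 'there'], ['how', 'are', 'you']]
```

Making the threshold a per-scenario lookup mirrors the claim's wording; in practice the values would have to be tuned, and the ones above are placeholders.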

First claim

Opening claim text (preview).

What is claimed is:

1. A method for modifying a voice file comprising a plurality of words, the method comprising: applying a language model to the voice file as a whole, the language model comprising a plurality of feature units, a preliminary punctuation state, and a preliminary weight of the preliminary punctuation state, each of the feature units including a word or phrase, a part of speech or sentence element of the word or phrase, the application of the language model to the voice file as a whole identifying in the voice file as a whole one or more first feature units of the plurality of feature units; generating a first aggregate weight R1 based on a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more first feature units; detecting silences in the voice file; dividing the voice file into multiple segments based on at least the detected silences; identifying in the segments one or more second feature units of the plurality of feature units; applying the language model to the segments, the application of the language model to the segments for generating a second aggregate weight R2 including a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more second feature units; generating a third aggregate weight R3 determined according to R3 = a × R1 + (1 − a) × R2, where 0 < a < 1; and modifying the voice file so as to include one or more final punctuations based on at least the third aggregate weight R3.

2. The method of claim 1, wherein the identification of the one or more first feature units and the identification of the one or more second feature units are performed approximately simultaneously with one another.

3. The method of claim 1, wherein: the voice file is processed in parallel for identifying the one or more first feature units and for identifying the one or more second feature units respectively.

4. The method of claim 1, wherein the one or more second feature units include an aggregation of one or more third feature units of the segments.

5. The method of claim 1, wherein the dividing the voice file into multiple segments includes: determining a silence threshold according to a current application scenario; detecting a silence duration in the voice file; and if the silence duration exceeds the silence threshold, generating a segment from the voice file based on at least a silence location corresponding to the silence duration.

6. The method of claim 1, further comprising establishing the language model using steps comprising: performing word separation to divide one or more sentences in a corpus into a plurality of words, the sentences in the corpus including one or more preliminary punctuations; searching for one or more third feature units in the corpus according to a predetermined feature template based on semantic features of the words in the corpus; and for a particular third feature unit including a word or phrase in the corpus: recording a number of occurrences of the preliminary punctuation state associated with the word or phrase of that particular third feature unit; determining the preliminary weight of that preliminary punctuation state based on at least the recorded number of occurrences; and mapping the particular third feature unit to the preliminary weight of that preliminary punctuation state.

7. The method of claim 1, wherein: at least one of the one or more first feature units or at least one of the one or more of the second feature units includes a single-word feature unit, the single-word feature unit being acquired based on at least a single-word feature template; the single-word feature template includes a word satisfying a predetermined relative-position relationship with a present reference position and one or more semantic features of the word; the acquisition of the single-word feature unit includes: selecting a position of another word as the present reference position; determining the word satisfying the predetermined relative-position relationship with the present reference position; and identifying the single-word feature unit based on at least: the one or more semantic features, the single-word feature unit, and the predetermined relative-position relationship between the word and the present reference position.

8. The method of claim 1, wherein: at least one of the one or more first feature units or at least one of the one or more second feature units includes a multi-word feature unit, the multi-word feature unit being acquired based on at least a multi-word feature template; the multi-word feature template includes acquisition of multiple words satisfying a predetermined relative-position relationship with a present reference position and one or more semantic features of the words; the acquisition of the multi-word feature unit includes: selecting a position of another word as the present reference position; determining the words satisfying the predetermined relative-position relationship with the present reference position; and identifying the multi-word feature unit based on at least: the one or more semantic features, the multi-word feature unit, and the predetermined relative-position relationship between the words and the present reference position.

9. The method of claim 1, wherein: determining the first aggregate weight R1 includes: acquiring from the language model a mapping between the one or more first feature units and one or more of the preliminary weights for one or more of the preliminary punctuation states associated with the one or more first feature units; determining one or more word weights related to the one or more preliminary punctuation states based on at least the mapping; and calculating the first aggregate weight R1 based on at least the word weights.

10. The method of claim 1, wherein: determining the second aggregate weight R2 includes: acquiring from the language model a mapping between the one or more second feature units and one or more of the preliminary weights for one or more of the preliminary punctuation states associated with the one or more second feature units; determining one or more word weights related to the one or more preliminary punctuation states based on at least the mapping; and calculating the second aggregate weight R2 based on at least the word weights.

11. A system for modifying a voice file comprising a plurality of words, the system comprising: a silence-detection module configured to detect silences in the voice file and to divide the voice file into multiple segments based on at least the detected silences, an identification module configured to: apply a language model to the voice file as a whole, the language model including a plurality of feature units, a preliminary punctuation state, and a preliminary weight of the preliminary punctuation state, each of the feature units including a word or phrase, a part of speech or sentence element of the word or phrase, the application of the language model to the voice file as a whole identifying in the voice file as a whole one or more first feature units of the plurality of feature units; and identifying in the segments one or more second feature units of the plurality of feature units; and a punctuation-addition module configured to: generate a first aggregate weight R1 based on a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more first feature units; apply the language model to the segments, the application of the language model to the segments to generate a second aggregate weig
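Claims 1, 6, 9 and 10 together describe how preliminary weights are learned from a punctuated corpus and then combined as R3 = a × R1 + (1 − a) × R2. The sketch below is a heavily simplified illustration under stated assumptions, not the patented method. It uses three hypothetical punctuation states (none, comma, period) and a single-word feature template over the previous, current and next word; the part-of-speech and sentence-element features and the multi-word templates of claims 7 and 8 are omitted. It also takes each preliminary weight to be a relative frequency and computes R1 and R2 per word as sums of those weights. R1 sees the whole file, so its context crosses silences; R2 is computed segment by segment, so its context stops at each silence. All names (`STATES`, `OFFSETS`, `features_at`, `build_language_model`, `aggregate_weights`, `punctuate`) are hypothetical.

```python
from collections import Counter, defaultdict

STATES = ("", ",", ".")  # hypothetical punctuation states after a word: none, comma, period
PUNCT = {",", "."}
OFFSETS = (-1, 0, 1)  # hypothetical single-word template: previous, current, next word

Feature = tuple[int, str]  # (position relative to the reference word, word)


def features_at(words: list[str], i: int) -> list[Feature]:
    """Feature units for reference position i, using only words visible in `words`."""
    return [(off, words[i + off]) for off in OFFSETS if 0 <= i + off < len(words)]


def build_language_model(corpus: list[list[str]]) -> dict[Feature, dict[str, float]]:
    """Count punctuation states per feature unit in a punctuated, word-separated corpus."""
    counts: dict[Feature, Counter] = defaultdict(Counter)
    for tokens in corpus:
        words: list[str] = []
        states: list[str] = []
        for tok in tokens:
            if tok in PUNCT:
                if states:
                    states[-1] = tok  # punctuation belongs to the preceding word
            else:
                words.append(tok)
                states.append("")
        for i, state in enumerate(states):
            for feat in features_at(words, i):
                counts[feat][state] += 1
    # Assumed preliminary weight of a state: its relative frequency for that feature unit.
    return {f: {s: c[s] / sum(c.values()) for s in STATES} for f, c in counts.items()}


def aggregate_weights(words: list[str], lm: dict[Feature, dict[str, float]]) -> list[dict[str, float]]:
    """Per word, sum the preliminary weights of every feature unit found around it."""
    return [
        {s: sum(lm.get(f, {}).get(s, 0.0) for f in features_at(words, i)) for s in STATES}
        for i in range(len(words))
    ]


def punctuate(segments: list[list[str]], lm: dict[Feature, dict[str, float]], a: float = 0.5) -> str:
    """Combine whole-file (R1) and per-segment (R2) weights and pick the best state per word."""
    if not 0 < a < 1:
        raise ValueError("a must satisfy 0 < a < 1")
    whole = [w for seg in segments for w in seg]
    r1 = aggregate_weights(whole, lm)  # voice file taken as a whole
    r2 = [r for seg in segments for r in aggregate_weights(seg, lm)]  # context stops at silences
    out = []
    for word, w1, w2 in zip(whole, r1, r2):
        r3 = {s: a * w1[s] + (1 - a) * w2[s] for s in STATES}  # R3 = a*R1 + (1 - a)*R2
        out.append(word + max(STATES, key=r3.__getitem__))  # ties resolve to "no punctuation"
    return " ".join(out)


corpus = [
    ["hello", ",", "how", "are", "you", "."],
    ["thanks", ",", "i", "am", "fine", "."],
]
lm = build_language_model(corpus)
print(punctuate([["hello", "how", "are", "you"], ["thanks", "i", "am", "fine"]], lm))
# hello, how are you. thanks, i am fine.
```

Because R2 cannot look across a pause, a larger `a` trusts whole-file context more and a smaller `a` trusts the silence boundaries more; claim 1 only requires 0 < a < 1.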

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications (CPC code: CPC title)

G10L15/187: Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
G06F40/226: Validation
G10L15/26: Speech to text systems (G10L15/08 takes precedence)
G10L15/04: Segmentation; Word boundary detection
G10L15/183: using context dependencies, e.g. language models

Patent family

Related publications grouped by family.

Patent family ID: 51852489
