Method and system for adding punctuation to voice files
US-9442910-B2 · Sep 13, 2016 · US
US9779728B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9779728-B2 |
| Application number | US-201414160808-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 22, 2014 |
| Priority date | May 24, 2013 |
| Publication date | Oct 3, 2017 |
| Grant date | Oct 3, 2017 |
Abstract
Systems and methods are provided for adding punctuations. For example, one or more first feature units are identified in a voice file taken as a whole; the voice file is divided into multiple segments by detecting silences in the voice file; one or more second feature units are identified in the voice file; a first aggregate weight of first punctuation states of the voice file and a second aggregate weight of second punctuation states of the voice file are determined, using a language model established based on word separation and third semantic features; a weighted calculation is performed to generate a third aggregate weight based on a linear combination associated with the first aggregate weight and the second aggregate weight; and one or more final punctuations are added to the voice file based on at least information associated with the third aggregate weight.
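The weighted calculation in the abstract is stated precisely in claim 1: R3 = a × R1 + (1 − a) × R2, with 0 < a < 1. Below is a minimal Python sketch of that step. It rests on assumptions that do not come from the patent. Each aggregate weight is represented as a per-word mapping from candidate punctuation state to weight, and the final punctuation for each word is simply the highest-weighted state. The patent scores punctuation states of the voice file as a whole, so choosing each position independently is a simplification. `combine_weights`, `choose_punctuation`, and the toy numbers are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation of an aggregate weight: one dict per word,
# mapping a candidate punctuation state ("" means no punctuation) to a weight.
StateWeights = Dict[str, float]


def combine_weights(r1: List[StateWeights], r2: List[StateWeights], a: float) -> List[StateWeights]:
    """Return R3 = a*R1 + (1 - a)*R2, word position by word position."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must satisfy 0 < a < 1")
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 must cover the same word positions")
    r3: List[StateWeights] = []
    for w1, w2 in zip(r1, r2):
        # A state absent from one view contributes weight 0 from that view;
        # sorting keeps tie-breaking deterministic.
        states = sorted(set(w1) | set(w2))
        r3.append({s: a * w1.get(s, 0.0) + (1.0 - a) * w2.get(s, 0.0) for s in states})
    return r3


def choose_punctuation(words: List[str], r3: List[StateWeights]) -> str:
    """Append the highest-weighted punctuation state to each word (simplification)."""
    out = []
    for word, weights in zip(words, r3):
        best = max(weights, key=weights.get) if weights else ""
        out.append(word + best)
    return " ".join(out)


if __name__ == "__main__":
    r1 = [{"": 0.6, ",": 0.4}, {"": 0.2, ".": 0.8}]  # whole-file view (toy numbers)
    r2 = [{"": 0.3, ",": 0.7}, {".": 1.0}]           # segment view (toy numbers)
    print(choose_punctuation(["hello", "world"], combine_weights(r1, r2, a=0.5)))
    # prints: hello, world.
```

Because a multiplies R1, a value closer to 1 leans on the whole-file view and a value closer to 0 leans on the silence-segmented view.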
Claims (preview)
What is claimed is:

1. A method for modifying a voice file comprising a plurality of words, the method comprising: applying a language model to the voice file as a whole, the language model comprising a plurality of feature units, a preliminary punctuation state, and a preliminary weight of the preliminary punctuation state, each of the feature units including a word or phrase, a part of speech or sentence element of the word or phrase, the application of the language model to the voice file as a whole identifying in the voice file as a whole one or more first feature units of the plurality of feature units; generating a first aggregate weight R1 based on a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more first feature units; detecting silences in the voice file; dividing the voice file into multiple segments based on at least the detected silences; identifying in the segments one or more second feature units of the plurality of feature units; applying the language model to the segments, the application of the language model to the segments for generating a second aggregate weight R2 including a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more second feature units; generating a third aggregate weight R3 determined according to R3 = a × R1 + (1 − a) × R2, where 0 < a < 1; and modifying the voice file so as to include one or more final punctuations based on at least the third aggregate weight R3.

2. The method of claim 1, wherein the identification of the one or more first feature units and the identification of the one or more second feature units are performed approximately simultaneously with one another.

3. The method of claim 1, wherein: the voice file is processed in parallel for identifying the one or more first feature units and for identifying the one or more second feature units respectively.

4. The method of claim 1, wherein the one or more second feature units include an aggregation of one or more third feature units of the segments.

5. The method of claim 1, wherein the dividing the voice file into multiple segments includes: determining a silence threshold according to a current application scenario; detecting a silence duration in the voice file; and if the silence duration exceeds the silence threshold, generating a segment from the voice file based on at least a silence location corresponding to the silence duration.

6. The method of claim 1, further comprising establishing the language model using steps comprising: performing word separation to divide one or more sentences in a corpus into a plurality of words, the sentences in the corpus including one or more preliminary punctuations; searching for one or more third feature units in the corpus according to a predetermined feature template based on semantic features of the words in the corpus; and for a particular third feature unit including a word or phrase in the corpus: recording a number of occurrences of the preliminary punctuation state associated with the word or phrase of that particular third feature unit; determining the preliminary weight of that preliminary punctuation state based on at least the recorded number of occurrences; and mapping the particular third feature unit to the preliminary weight of that preliminary punctuation state.

7. The method of claim 1, wherein: at least one of the one or more first feature units or at least one of the one or more of the second feature units includes a single-word feature unit, the single-word feature unit being acquired based on at least a single-word feature template; the single-word feature template includes a word satisfying a predetermined relative-position relationship with a present reference position and one or more semantic features of the word; the acquisition of the single-word feature unit includes: selecting a position of another word as the present reference position; determining the word satisfying the predetermined relative-position relationship with the present reference position; and identifying the single-word feature unit based on at least: the one or more semantic features, the single-word feature unit, and the predetermined relative-position relationship between the word and the present reference position.

8. The method of claim 1, wherein: at least one of the one or more first feature units or at least one of the one or more second feature units includes a multi-word feature unit, the multi-word feature unit being acquired based on at least a multi-word feature template; the multi-word feature template includes acquisition of multiple words satisfying a predetermined relative-position relationship with a present reference position and one or more semantic features of the words; the acquisition of the multi-word feature unit includes: selecting a position of another word as the present reference position; determining the words satisfying the predetermined relative-position relationship with the present reference position; and identifying the multi-word feature unit based on at least: the one or more semantic features, the multi-word feature unit, and the predetermined relative-position relationship between the words and the present reference position.

9. The method of claim 1, wherein: determining the first aggregate weight R1 includes: acquiring from the language model a mapping between the one or more first feature units and one or more of the preliminary weights for one or more of the preliminary punctuation states associated with the one or more first feature units; determining one or more word weights related to the one or more preliminary punctuation states based on at least the mapping; and calculating the first aggregate weight R1 based on at least the word weights.

10. The method of claim 1, wherein: determining the second aggregate weight R2 includes: acquiring from the language model a mapping between the one or more second feature units and one or more of the preliminary weights for one or more of the preliminary punctuation states associated with the one or more second feature units; determining one or more word weights related to the one or more preliminary punctuation states based on at least the mapping; and calculating the second aggregate weight R2 based on at least the word weights.

11. A system for modifying a voice file comprising a plurality of words, the system comprising: a silence-detection module configured to detect silences in the voice file and to divide the voice file into multiple segments based on at least the detected silences, an identification module configured to: apply a language model to the voice file as a whole, the language model including a plurality of feature units, a preliminary punctuation state, and a preliminary weight of the preliminary punctuation state, each of the feature units including a word or phrase, a part of speech or sentence element of the word or phrase, the application of the language model to the voice file as a whole identifying in the voice file as a whole one or more first feature units of the plurality of feature units; and identifying in the segments one or more second feature units of the plurality of feature units; and a punctuation-addition module configured to: generate a first aggregate weight R1 based on a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more first feature units; apply the language model to the segments, the application of the language model to the segments to generate a second aggregate weig…
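Claim 5 splits the voice file by comparing each silence duration with a threshold chosen for the current application scenario. The Python sketch below illustrates that rule under assumptions of its own. The recognizer is assumed to supply word-level start and end times, and the gap between consecutive words is treated as the silence duration; a real system might instead measure silence directly from the audio signal. The scenario names and threshold values in `SILENCE_THRESHOLDS`, and the function `split_on_silence`, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-scenario silence thresholds in seconds. Claim 5 says only
# that the threshold is determined according to the current application scenario.
SILENCE_THRESHOLDS: Dict[str, float] = {"dictation": 0.5, "conversation": 0.3}

Word = Tuple[str, float, float]  # (text, start_time_s, end_time_s)


def split_on_silence(words: List[Word], scenario: str) -> List[List[str]]:
    """Start a new segment wherever the pause before a word exceeds the threshold."""
    threshold = SILENCE_THRESHOLDS[scenario]
    segments: List[List[str]] = []
    current: List[str] = []
    prev_end: Optional[float] = None
    for text, start, end in words:
        # Gap between the previous word's end and this word's start = silence duration.
        if prev_end is not None and start - prev_end > threshold:
            segments.append(current)
            current = []
        current.append(text)
        prev_end = end
    if current:
        segments.append(current)
    return segments
```

With these toy thresholds, a 0.4 s pause splits a "conversation" transcript but not a "dictation" one.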
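Claim 6 builds the language model from a punctuated corpus. It separates words, finds feature units with a template, counts the punctuation state recorded for each unit, derives a weight from those counts, and maps the unit to that weight. The toy version below makes three assumptions not found in the claim. The corpus is already whitespace-tokenized, with punctuation marks as separate tokens (Chinese text would need a real word segmenter). The feature template is simply (current word, next word), with no part-of-speech features. Each weight is the relative frequency of a state, although the claim requires only that the weight be based on at least the recorded number of occurrences. `words_and_states` and `build_language_model` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

PUNCTUATION = {",", ".", "?", "!"}
NO_PUNCT = ""  # punctuation state meaning "no punctuation after this word"


def words_and_states(sentence: str) -> List[Tuple[str, str]]:
    """Split a punctuated sentence into (word, punctuation state after the word) pairs."""
    pairs: List[Tuple[str, str]] = []
    for token in sentence.split():
        if token in PUNCTUATION:
            if pairs:
                word, _ = pairs[-1]
                pairs[-1] = (word, token)
        else:
            pairs.append((token, NO_PUNCT))
    return pairs


def build_language_model(corpus: List[str]) -> Dict[Tuple[str, ...], Dict[str, float]]:
    """Map each feature unit to the relative frequency of each punctuation state."""
    counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for sentence in corpus:
        pairs = words_and_states(sentence)
        for i, (word, state) in enumerate(pairs):
            next_word = pairs[i + 1][0] if i + 1 < len(pairs) else "</s>"
            # Hypothetical feature template: (current word, next word).
            counts[(word, next_word)][state] += 1
    return {
        unit: {state: n / sum(c.values()) for state, n in c.items()}
        for unit, c in counts.items()
    }
```

For example, the corpus `["yes it is .", "yes , it is ."]` maps the unit `("yes", "it")` to equal weights of 0.5 for "no punctuation" and for a comma.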
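Claims 7 and 8 describe single-word and multi-word feature templates. Each template picks out words that stand in a predetermined relative position to a present reference position, together with the semantic features of those words. In the sketch below, a template is assumed to be a tuple of word offsets, and the semantic feature attached to each word is assumed to be a part-of-speech tag. The offsets, the tag names in the example, and `extract_feature_units` are illustrative only.

```python
from typing import List, Sequence, Tuple

# Hypothetical templates: offsets relative to the present reference position.
SINGLE_WORD_TEMPLATES: List[Tuple[int, ...]] = [(-1,), (0,), (1,)]
MULTI_WORD_TEMPLATES: List[Tuple[int, ...]] = [(-1, 0), (0, 1)]

# A feature unit here = (relative offsets, words, part-of-speech tags).
FeatureUnit = Tuple[Tuple[int, ...], Tuple[str, ...], Tuple[str, ...]]


def extract_feature_units(words: Sequence[str], tags: Sequence[str], ref: int,
                          templates: List[Tuple[int, ...]]) -> List[FeatureUnit]:
    """Apply each template at reference position `ref`, skipping ones that fall off the ends."""
    units: List[FeatureUnit] = []
    for offsets in templates:
        positions = [ref + off for off in offsets]
        if all(0 <= p < len(words) for p in positions):
            units.append((offsets,
                          tuple(words[p] for p in positions),
                          tuple(tags[p] for p in positions)))
    return units
```

Keeping the offsets inside the unit preserves the relative-position relationship that both claims list as part of what identifies a feature unit.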
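Claims 9 and 10 compute R1 and R2 in the same way: look up the preliminary weights mapped to each identified feature unit, derive word weights from them, and aggregate. The minimal sketch below makes two assumptions, since the claims do not fix either operation. A word weight is the sum of the mapped preliminary weights, and the aggregate weight of a candidate punctuation-state sequence is the sum of the matching word weights. `word_weights` and `aggregate_weight` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Hashable, List

LanguageModel = Dict[Hashable, Dict[str, float]]  # feature unit -> {state: preliminary weight}


def word_weights(units_per_word: List[List[Hashable]], model: LanguageModel) -> List[Dict[str, float]]:
    """For each word, sum the preliminary weights mapped from its identified feature units."""
    result: List[Dict[str, float]] = []
    for units in units_per_word:
        totals: Dict[str, float] = defaultdict(float)
        for unit in units:
            for state, weight in model.get(unit, {}).items():
                totals[state] += weight
        result.append(dict(totals))
    return result


def aggregate_weight(weights: List[Dict[str, float]], states: List[str]) -> float:
    """Aggregate weight of one candidate sequence of punctuation states (one per word)."""
    if len(weights) != len(states):
        raise ValueError("need exactly one punctuation state per word")
    return sum(w.get(s, 0.0) for w, s in zip(weights, states))
```

Running this once on feature units found in the whole voice file and once on those found in the segments would give the two inputs that claim 1 combines into R3.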
CPC classification titles:
- Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- Validation
- Speech to text systems (G10L15/08 takes precedence)
- Segmentation; Word boundary detection
- using context dependencies, e.g. language models