Method and system for adding punctuation to voice files

US9442910B2 · US · B2

Patent metadata
Field · Value
Publication number · US-9442910-B2
Application number · US-201414219704-A
Country · US
Kind code · B2
Filing date · Mar 19, 2014
Priority date · May 24, 2013
Publication date · Sep 13, 2016
Grant date · Sep 13, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system for adding punctuation to a voice file is disclosed. The method includes: utilizing silence or pause duration detection to divide a voice file into a plurality of speech segments for processing, the voice file includes a plurality of features units; identifying all features units that appear in the voice file according to every term or expression and semantics features of the every term or expression that form each of the plurality of speech segments; using a linguistic model to determine a sum of weight of various punctuation modes in the voice file according to all the feature units, the linguistic model is built upon semantics features of various parsed out terms or expressions from a body text of a spoken sentence according to a language library; and adding punctuations to the voice file based on the determined sum of weight of the various punctuation modes.
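
The first step of the abstract, dividing a voice file at silences or pauses, can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `split_on_silence`, the amplitude-based silence test, the frame size, and every default value are assumptions introduced here. The abstract only states that silence or pause duration detection is used to divide the file into speech segments.

```python
from typing import List, Tuple


def split_on_silence(
    samples: List[float],
    sample_rate: int,
    min_silence_s: float = 0.5,   # assumed scenario-dependent pause threshold
    amp_threshold: float = 0.02,  # assumed: peak |amplitude| below this is silence
    frame_s: float = 0.01,        # assumed analysis frame length in seconds
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of speech separated by long pauses."""
    frame_len = max(1, int(round(sample_rate * frame_s)))
    min_silent_frames = max(1, int(round(min_silence_s / frame_s)))

    # Mark a frame as silent when its peak amplitude is below the threshold.
    silent = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        silent.append(max(abs(x) for x in frame) < amp_threshold)

    segments = []
    seg_start = None    # frame index where the open speech segment began
    last_voiced = None  # frame index of the most recent voiced frame
    silent_run = 0      # length of the current run of silent frames
    for idx, is_silent in enumerate(silent):
        if is_silent:
            silent_run += 1
            # A pause longer than the threshold closes the open segment.
            if seg_start is not None and silent_run > min_silent_frames:
                end = min(len(samples), (last_voiced + 1) * frame_len)
                segments.append((seg_start * frame_len, end))
                seg_start = None
        else:
            silent_run = 0
            if seg_start is None:
                seg_start = idx
            last_voiced = idx

    # Close a segment that is still open when the audio ends.
    if seg_start is not None:
        end = min(len(samples), (last_voiced + 1) * frame_len)
        segments.append((seg_start * frame_len, end))
    return segments
```

Short pauses stay inside a segment and only pauses longer than `min_silence_s` split the audio, so the threshold can be tuned to the application scenario.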

First claim

Opening claim text (preview).

What is claimed is:

1. An improved method for adding punctuations to a voice file, comprising: executing by a processor, program codes stored in a memory to configure a computing device to add punctuations to a voice file, comprising performing the following steps: utilizing silence or pause duration detection to divide the voice file into a plurality of speech segments for processing, wherein respective speech segments form respective sentences within the voice file, and each respective sentence of the voice file comprising a plurality of features units, wherein each feature unit comprises a single term or multi-terms expression having semantic features corresponding to the single term or multi-terms expression; identifying the plurality of features units that appear in the voice file according to every term or expression, and according to the semantic features corresponding to the every single term or multi-terms expression that form each of the plurality of speech segments, the semantic features comprising a word attribute and a composition within each respective sentence and wherein identifying the plurality of feature units is based on taking the respective location of each term as the current reference location, determine a single term whose relative location relationship with the current reference location comprises the semantic features of the single term feature or expression template according to the single term feature template and further continuing the identifying for multi-terms expression comprising the term based on each of the identified feature units; assigning a corresponding weight to each punctuation mode which is associated to the single term or multi-terms expression in each respective identified feature unit, wherein a punctuation mode being either no punctuation used or a particular punctuation being used in the single term or multi-terms expression; using a linguistic model to determine a maximum sum of weight as ultimate punctuation modes for the respective speech segments which form the respective sentences within the voice file, wherein a sum of weight is determined by summing all corresponding weights on occurrences of each of various possible punctuation modes in the voice file and according to all the respective identified feature units, wherein the linguistic model is built upon the semantic features of parsed out various single terms or multi-terms expressions from a body text of a spoken sentence according to a language library; adding respective punctuations to form respective punctuated sentences within the voice file based on the determined maximum sum of weight of the various punctuation modes; and transcribing the voice file with the added respective punctuations to output the punctuated sentences as text.

2. The method according to claim 1, wherein the silence or pause detection comprises: determining a silence or pause duration threshold according to a current application scenario; detecting the silence or pause duration in the voice file to be processed, and when the silence or pause duration is longer than the silence threshold: separating the speech segments in the voice file at locations that correspond to the silence or pause duration.

3. The method according to claim 1, wherein the identifying of the plurality of features units that appears in the voice file, comprising: gathering into a set, the respective identified feature units that appear in the plurality of speech segments.

4. The method according to claim 1, wherein the building of the linguistic model comprises: parsing out the various single terms or multi-terms expressions from the body text of the spoken sentence, wherein punctuations have already been added in advance to the spoken sentence according to the language library; searching the respective identified feature unit according to the semantic features of each parsed out single term or multi-terms expression, and according to a preset feature template; recording a number of occurrences of each punctuation mode in each respective identified feature unit in the body text of the spoken sentence, according to the punctuation mode that follows the single term or multi-terms expression in the respective identified feature unit; determining a corresponding weight of each punctuation mode of each respective identified feature unit according to the number of occurrences of each punctuation mode of each respective identified feature unit; and building the linguistic model which comprises every respective identified feature unit and its respective punctuation mode with a corresponding weight relationship.

5. The method according to claim 1, wherein the single term feature unit is acquired according to a single term feature template, and the multi-term expression feature unit is acquired according to a multi-term expression feature template; the single term or multi-terms expression feature template perform functions, comprising: acquiring the single term or multi-terms expression whose current reference location in relation to its relative location fulfills a predetermined requirement, and the semantic features of the single term or multi-terms expression, and the acquiring of the single term feature unit which is based on the single term feature template, comprising: taking a respective location of each single term as the current reference location, determining the single term whose relative location relationship with the current reference location fulfills the requirements of the single term feature template according to the single term feature template; identifying the single term feature unit according to the semantic features of the single term, the single term feature unit comprises the single term, the semantic features of the single term and the relative location relationship of the single term with the current reference location; and the multi-terms expression feature template includes acquiring the multi-terms expression whose relative location relationship with the current reference location fulfills the predetermined requirements and the semantics features of each of the multi-terms expression, and the multi-terms expression feature units perform functions, comprising: acquiring the respective location of each multi-terms expression as the current reference location, determining the multi-terms expression whose relative location relationship with the current reference location fulfills the requirements of the multi-terms expression feature template according to the multi-terms expression feature template; identifying the multi-terms expression feature unit according to the semantic features of each multi-terms expression, the multi-term expression feature unit comprises the multi-terms expression the semantic features of the multi-terms expression and the relative location relationship of the multi-terms expression with the current reference location.

6. The method according to claim 1, wherein the determining of the maximum sum of weight on each of the various possible punctuation modes in the voice file and according to all the respective identified feature units, comprises: acquiring from the linguistic model corresponding relationships between each respective identified feature unit among all the respective identified feature units and the corresponding weights of the respective various possible punctuation modes; determining the corresponding weight of the punctuation mode of each single term or multi-terms expression in the voice file to be processed according to the acquired corresponding relationships, and determining the maximum sum of weight of the various possible punctuation modes of the voice file to be processed according to the corresponding weight of the punctuation mode of eac
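
The model building described in claim 4 and the maximum-sum selection described in claims 1 and 6 can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the template offsets in `OFFSETS`, the two-term "multi" template, the use of relative frequency as the weight, the per-term scoring loop, and the names `feature_units`, `build_model`, and `punctuate`. The claims only state that weights are determined from the number of occurrences of each punctuation mode and that the mode with the maximum sum of weight is chosen.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (term, word attribute such as a part-of-speech tag)
OFFSETS = (-1, 0, 1)     # assumed single-term template: previous, current, next term


def feature_units(tokens: List[Token], i: int) -> List[tuple]:
    """Feature units seen from reference location i (assumed templates)."""
    feats = []
    for off in OFFSETS:
        j = i + off
        if 0 <= j < len(tokens):
            term, attr = tokens[j]
            feats.append(("single", off, term, attr))
    # Assumed multi-term template: the current term together with the next one.
    if i + 1 < len(tokens):
        feats.append(("multi", tokens[i], tokens[i + 1]))
    return feats


def build_model(corpus: List[List[Tuple[str, str, str]]]) -> Dict[tuple, Dict[str, float]]:
    """corpus: pre-punctuated sentences as lists of (term, attr, punct_after)."""
    counts = defaultdict(Counter)
    for sent in corpus:
        tokens = [(term, attr) for term, attr, _ in sent]
        for i, (_, _, punct) in enumerate(sent):
            # Record which punctuation mode follows the term for each feature unit.
            for feat in feature_units(tokens, i):
                counts[feat][punct] += 1
    # Assumption: weight = relative frequency of each mode per feature unit.
    model = {}
    for feat, modes in counts.items():
        total = sum(modes.values())
        model[feat] = {mode: n / total for mode, n in modes.items()}
    return model


def punctuate(tokens: List[Token], model, modes=("", ",", ".", "?")) -> str:
    """Pick, after each term, the punctuation mode with the largest weight sum."""
    out = []
    for i, (term, _) in enumerate(tokens):
        feats = feature_units(tokens, i)
        sums = {m: sum(model.get(f, {}).get(m, 0.0) for f in feats) for m in modes}
        best = max(modes, key=lambda m: sums[m])  # ties go to the earlier mode
        out.append(term + best)
    return " ".join(out)


corpus = [
    [("hello", "INTJ", ","), ("how", "ADV", ""), ("are", "VERB", ""), ("you", "PRON", "?")],
    [("hello", "INTJ", ","), ("where", "ADV", ""), ("are", "VERB", ""), ("you", "PRON", "?")],
]
model = build_model(corpus)
print(punctuate([("hello", "INTJ"), ("how", "ADV"), ("are", "VERB"), ("you", "PRON")], model))
# prints: hello, how are you?
```

The empty string stands for the "no punctuation" mode, which the claims treat as a punctuation mode in its own right. Listing it first in `modes` means unseen contexts default to no punctuation.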

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

  • Segmentation; Word boundary detection · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • G06F40/166 · Primary

    Editing, e.g. inserting or deleting · CPC title

  • Physics · mapped topic

  • G06F17/24 · Primary

    Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9442910B2 cover?
A method and system for adding punctuation to a voice file is disclosed. The method includes: utilizing silence or pause duration detection to divide a voice file into a plurality of speech segments for processing, the voice file includes a plurality of features units; identifying all features units that appear in the voice file according to every term or expression and semantics features of th…
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 13, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).