Method, and device for matching speech with text, and computer-readable storage medium

US11152007B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11152007-B2
Application number: US-201916543155-A
Country: US
Kind code: B2
Filing date: Aug 16, 2019
Priority date: Dec 7, 2018
Publication date: Oct 19, 2021
Grant date: Oct 19, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of a method and device for matching a speech with a text, and a computer-readable storage medium are provided. The method can include: acquiring a speech identification text by identifying a received speech signal; comparing the speech identification text with multiple candidate texts in a first matching mode to determine a first matching text; and comparing phonetic symbols of the speech identification text with phonetic symbols of the multiple candidate texts in a second matching mode to determine a second matching text, in a case that no first matching text is determined.
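
The fallback order described in the abstract can be sketched as a small cascade. This is a minimal illustration rather than the patented implementation. The names `match_speech_text`, `exact_match`, and `toy_phonetic_match` are hypothetical, the first matching mode is assumed here to be a case-insensitive exact comparison (the abstract does not define it), and the second mode is a toy stand-in that only shows where a phonetic comparison would plug in.

```python
from typing import Callable, Optional, Sequence

# A matcher takes the recognized text and the candidates and returns a match or None.
Matcher = Callable[[str, Sequence[str]], Optional[str]]


def exact_match(recognized: str, candidates: Sequence[str]) -> Optional[str]:
    """Assumed first matching mode: case-insensitive exact comparison."""
    target = recognized.strip().lower()
    for candidate in candidates:
        if candidate.strip().lower() == target:
            return candidate
    return None


def toy_phonetic_match(recognized: str, candidates: Sequence[str]) -> Optional[str]:
    """Hypothetical second mode: treats "ph" and "f" as the same sound.

    A real system would convert both texts to phonetic symbols and score them.
    """
    def key(text: str) -> str:
        return text.strip().lower().replace("ph", "f")

    for candidate in candidates:
        if key(candidate) == key(recognized):
            return candidate
    return None


def match_speech_text(
    recognized: str,
    candidates: Sequence[str],
    first_mode: Matcher = exact_match,
    second_mode: Matcher = toy_phonetic_match,
) -> Optional[str]:
    """Try the first mode; run the second mode only if the first finds nothing."""
    first = first_mode(recognized, candidates)
    if first is not None:
        return first
    return second_mode(recognized, candidates)
```

Passing the two modes as parameters keeps the control flow separate from the scoring, so either stage could be swapped without touching the cascade itself.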

First claim

Opening claim text (preview).

What is claimed is:

1. A method for matching a speech with a text, comprising: acquiring a speech identification text by identifying a received speech signal; comparing the speech identification text with multiple candidate texts in a first matching mode to determine a first matching text; and comparing phonetic symbols of the speech identification text with phonetic symbols of the multiple candidate texts in a second matching mode to determine a second matching text, in response to not determining the first matching text, wherein comparing phonetic symbols of the speech identification text with phonetic symbols of the multiple candidate texts in the second matching mode to determine the second matching text comprises: converting the speech identification text into the phonetic symbols of the speech identification text and converting the multiple candidate texts into the phonetic symbols of the multiple candidate texts; calculating a similarity between the phonetic symbols of the speech identification text and the phonetic symbols of each of the multiple candidate texts; and determining a candidate text with a largest similarity as a matched candidate text in response to determining that the largest similarity is larger than a set threshold; and outputting the matched candidate text, wherein calculating the similarity between the phonetic symbols of the speech identification text and the phonetic symbols of each of the multiple candidate texts is by the following formula:

$$\text{similarity} = \frac{\mathrm{LCS}(s, q)}{\mathrm{len}(s)}$$

wherein s represents phonetic symbols of one of the multiple candidate texts, q represents the phonetic symbols of the speech identification text, LCS(s, q) represents a length of a longest common sequence between the phonetic symbols of the one of the multiple candidate texts and the phonetic symbols of the speech identification text, len(s) represents a length of the phonetic symbols of the one of the multiple candidate texts.

2. The method according to claim 1, further comprising: outputting the first matching text as a matched candidate text, in response to determining the first matching text; and outputting the second matching text as the matched candidate text, in response to determining the second matching text.

3. The method according to claim 1, further comprising: calculating a similarity between a sentence vector of the speech identification text and a sentence vector of each of the multiple candidate texts, in response to not determining the second matching text; and outputting a candidate text with a largest similarity as a matched candidate text.

4. The method according to claim 3, wherein the calculating a similarity between a sentence vector of the speech identification text and a sentence vector of each of the multiple candidate texts comprises: segmenting the speech identification text and the multiple candidate texts into words; acquiring a word vector of each word; adding word vectors of words of the speech identification text to obtain the sentence vector of the speech identification text, and adding word vectors of words of one of the multiple candidate texts to acquire a sentence vector of the one of the multiple candidate texts; and calculating a cosine similarity between the sentence vector of the speech identification text and the sentence vector of the one of the multiple candidate texts, as the similarity between the sentence vector of the speech identification text and the sentence vector of the one of the multiple candidate texts.

5. A device for matching a speech with a text, comprising: one or more processors; and a storage device configured to store one or more programs, that, when executed by the one or more processors, cause the one or more processors to: acquire a speech identification text by identifying a received speech signal; compare the speech identification text with multiple candidate texts in a first matching mode to determine a first matching text; and compare phonetic symbols of the speech identification text with phonetic symbols of the multiple candidate texts in a second matching mode to determine a second matching text, in response to not determining the first matching text, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors further to: convert the speech identification text into the phonetic symbols of the speech identification text and convert the multiple candidate texts into the phonetic symbols of the multiple candidate texts; calculate a similarity between the phonetic symbols of the speech identification text and the phonetic symbols of each of the multiple candidate texts; determine a candidate text with a largest similarity as a matched candidate text in response to determining that the largest similarity is larger than a set threshold; and output the matched candidate text, wherein the similarity between the phonetic symbols of the speech identification text and the phonetic symbols of each of the multiple candidate texts is calculated by the following formula:

$$\text{similarity} = \frac{\mathrm{LCS}(s, q)}{\mathrm{len}(s)}$$

wherein s represents phonetic symbols of one of the multiple candidate texts, q represents the phonetic symbols of the speech identification text, LCS(s, q) represents a length of a longest common sequence between the phonetic symbols of one of the multiple candidate texts and the phonetic symbols of the speech identification text, len(s) represents a length of the phonetic symbols of the one of the multiple candidate texts.

6. The device according to claim 5, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors further to: output the first matching text as a matched candidate text, in response to determining the first matching text; and output the second matching text as the matched candidate text, in response to determining the second matching text.

7. The device according to claim 5, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors further to: calculate a similarity between a sentence vector of the speech identification text and a sentence vector of each of the multiple candidate texts, in response to not determining the second matching text; and output a candidate text with a largest similarity as a matched candidate text.

8. The device according to claim 7, where
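
The scoring steps recited in the claims (the LCS-based phonetic similarity with a threshold, and the summed word vectors compared by cosine similarity) can be illustrated with a short Python sketch. It rests on several assumptions that are not in the claim text: phonetic symbols are represented as lists of strings produced by a converter that is not shown, "longest common sequence" is read as the longest common subsequence, the threshold of 0.8 is arbitrary, word vectors are supplied by the caller, and words without a vector are skipped. All function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def lcs_length(s: Sequence[str], q: Sequence[str]) -> int:
    """Length of the longest common subsequence of s and q (row-by-row DP)."""
    prev = [0] * (len(q) + 1)
    for a in s:
        curr = [0]
        for j, b in enumerate(q, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def phonetic_similarity(s: Sequence[str], q: Sequence[str]) -> float:
    """similarity = LCS(s, q) / len(s), where s is the candidate's phonetic symbols."""
    return lcs_length(s, q) / len(s) if s else 0.0


def best_phonetic_match(
    q: Sequence[str],
    candidates: Dict[str, List[str]],
    threshold: float = 0.8,  # arbitrary value chosen for illustration
) -> Optional[str]:
    """Return the candidate with the largest similarity if it exceeds the threshold."""
    best_text, best_score = None, 0.0
    for text, symbols in candidates.items():
        score = phonetic_similarity(symbols, q)
        if score > best_score:
            best_text, best_score = text, score
    return best_text if best_score > threshold else None


def sentence_vector(words: Sequence[str], word_vectors: Dict[str, List[float]]) -> List[float]:
    """Add the word vectors of the given words; unknown words are skipped (assumption)."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word in words:
        for i, x in enumerate(word_vectors.get(word, [0.0] * dim)):
            total[i] += x
    return total


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Note that the formula divides by the length of the candidate's symbols, not the query's, so a short candidate fully contained in a longer recognized utterance still scores 1.0.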

Assignees

Inventors

Classifications

  • G10L17/14 (Primary)

    Use of phonemic categorisation or speech recognition prior to speaker recognition or verification · CPC title

  • G10L15/10 (Primary)

    using distance or distortion measures between unknown speech and reference templates · CPC title

  • Matching criteria, e.g. proximity measures · CPC title

  • Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title

  • Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11152007B2 cover?
Embodiments of a method and device for matching a speech with a text, and a computer-readable storage medium are provided. The method can include: acquiring a speech identification text by identifying a received speech signal; comparing the speech identification text with multiple candidate texts in a first matching mode to determine a first matching text; and comparing phonetic symbols of the …
Who is the assignee on this patent?
Baidu Online Network Technology Beijing Co Ltd, Baidu Online Network Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L17/14. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 19, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).