US11152007B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11152007-B2 |
| Application number | US-201916543155-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 16, 2019 |
| Priority date | Dec 7, 2018 |
| Publication date | Oct 19, 2021 |
| Grant date | Oct 19, 2021 |
Official abstract text for this publication.
Embodiments of a method and device for matching a speech with a text, and a computer-readable storage medium are provided. The method can include: acquiring a speech identification text by identifying a received speech signal; comparing the speech identification text with multiple candidate texts in a first matching mode to determine a first matching text; and comparing phonetic symbols of the speech identification text with phonetic symbols of the multiple candidate texts in a second matching mode to determine a second matching text, in a case that no first matching text is determined.
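The two-stage fallback described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not in the source: the first matching mode is taken to be exact string equality, "longest common sequence" is read as the classic longest common subsequence, and `toy_phonetic`, `match`, and the `0.8` threshold are hypothetical names and values chosen for illustration. The similarity ratio follows the formula recited in claim 1, LCS(s, q) / len(s).

```python
from typing import Callable, Optional, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def phonetic_similarity(s: Sequence[str], q: Sequence[str]) -> float:
    """similarity = LCS(s, q) / len(s), with s the candidate's phonetic symbols."""
    return lcs_length(s, q) / len(s) if s else 0.0


def toy_phonetic(text: str) -> str:
    """Hypothetical stand-in for a real text-to-phonetic-symbol converter."""
    return text.lower().replace("ph", "f").replace("ck", "k").replace(" ", "")


def match(
    recognized: str,
    candidates: Sequence[str],
    to_phonetic: Callable[[str], Sequence[str]] = toy_phonetic,
    threshold: float = 0.8,  # assumed value; the source only says "a set threshold"
) -> Optional[str]:
    # First matching mode (assumption: exact text comparison).
    for candidate in candidates:
        if candidate == recognized:
            return candidate
    # Second matching mode: compare phonetic symbols, keep the best candidate.
    q = to_phonetic(recognized)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = phonetic_similarity(to_phonetic(candidate), q)
        if score > best_score:
            best, best_score = candidate, score
    # Accept the best candidate only if it clears the threshold.
    return best if best_score > threshold else None
```

For instance, `match("fone home", ["phone home", "call mom"])` finds no exact match, but both "fone home" and "phone home" map to the same toy phonetic string, so the second mode selects "phone home". Dividing by `len(s)` rather than `len(q)` means the score measures how much of the candidate is covered by the recognized text.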
Opening claim text (preview).
What is claimed is:

1. A method for matching a speech with a text, comprising: acquiring a speech identification text by identifying a received speech signal; comparing the speech identification text with multiple candidate texts in a first matching mode to determine a first matching text; and comparing phonetic symbols of the speech identification text with phonetic symbols of the multiple candidate texts in a second matching mode to determine a second matching text, in response to not determining the first matching text, wherein comparing phonetic symbols of the speech identification text with phonetic symbols of the multiple candidate texts in the second matching mode to determine the second matching text comprises: converting the speech identification text into the phonetic symbols of the speech identification text and converting the multiple candidate texts into the phonetic symbols of the multiple candidate texts; calculating a similarity between the phonetic symbols of the speech identification text and the phonetic symbols of each of the multiple candidate texts; and determining a candidate text with a largest similarity as a matched candidate text in response to determining that the largest similarity is larger than a set threshold; and outputting the matched candidate text, wherein calculating the similarity between the phonetic symbols of the speech identification text and the phonetic symbols of each of the multiple candidate texts is by the following formula:

$$\text{similarity} = \frac{\mathrm{LCS}(s, q)}{\mathrm{len}(s)}$$

wherein s represents phonetic symbols of one of the multiple candidate texts, q represents the phonetic symbols of the speech identification text, LCS(s, q) represents a length of a longest common sequence between the phonetic symbols of the one of the multiple candidate texts and the phonetic symbols of the speech identification text, len(s) represents a length of the phonetic symbols of the one of the multiple candidate texts.

2. The method according to claim 1, further comprising: outputting the first matching text as a matched candidate text, in response to determining the first matching text; and outputting the second matching text as the matched candidate text, in response to determining the second matching text.

3. The method according to claim 1, further comprising: calculating a similarity between a sentence vector of the speech identification text and a sentence vector of each of the multiple candidate texts, in response to not determining the second matching text; and outputting a candidate text with a largest similarity as a matched candidate text.

4. The method according to claim 3, wherein the calculating a similarity between a sentence vector of the speech identification text and a sentence vector of each of the multiple candidate texts comprises: segmenting the speech identification text and the multiple candidate texts into words; acquiring a word vector of each word; adding word vectors of words of the speech identification text to obtain the sentence vector of the speech identification text, and adding word vectors of words of one of the multiple candidate texts to acquire a sentence vector of the one of the multiple candidate texts; and calculating a cosine similarity between the sentence vector of the speech identification text and the sentence vector of the one of the multiple candidate texts, as the similarity between the sentence vector of the speech identification text and the sentence vector of the one of the multiple candidate texts.

5. A device for matching a speech with a text, comprising: one or more processors; and a storage device configured to store one or more programs, that, when executed by the one or more processors, cause the one or more processors to: acquire a speech identification text by identifying a received speech signal; compare the speech identification text with multiple candidate texts in a first matching mode to determine a first matching text; and compare phonetic symbols of the speech identification text with phonetic symbols of the multiple candidate texts in a second matching mode to determine a second matching text, in response to not determining the first matching text, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors further to: convert the speech identification text into the phonetic symbols of the speech identification text and convert the multiple candidate texts into the phonetic symbols of the multiple candidate texts; calculate a similarity between the phonetic symbols of the speech identification text and the phonetic symbols of each of the multiple candidate texts; determine a candidate text with a largest similarity as a matched candidate text in response to determining that the largest similarity is larger than a set threshold; and output the matched candidate text, wherein the similarity between the phonetic symbols of the speech identification text and the phonetic symbols of each of the multiple candidate texts is calculated by the following formula:

$$\text{similarity} = \frac{\mathrm{LCS}(s, q)}{\mathrm{len}(s)}$$

wherein s represents phonetic symbols of one of the multiple candidate texts, q represents the phonetic symbols of the speech identification text, LCS(s, q) represents a length of a longest common sequence between the phonetic symbols of one of the multiple candidate texts and the phonetic symbols of the speech identification text, len(s) represents a length of the phonetic symbols of the one of the multiple candidate texts.

6. The device according to claim 5, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors further to: output the first matching text as a matched candidate text, in response to determining the first matching text; and output the second matching text as the matched candidate text, in response to determining the second matching text.

7. The device according to claim 5, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors further to: calculate a similarity between a sentence vector of the speech identification text and a sentence vector of each of the multiple candidate texts, in response to not determining the second matching text; and output a candidate text with a largest similarity as a matched candidate text.

8. The device according to claim 7, where
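The sentence-vector fallback recited in claims 3 and 4 (segment into words, sum the word vectors, compare by cosine similarity) might be sketched as follows. The embedding table `WORD_VECTORS`, the whitespace segmentation, the choice to skip unknown words, and all function names are assumptions for illustration; the claims do not specify a segmenter or an embedding source.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 3-dimensional word vectors; a real system would load trained embeddings.
WORD_VECTORS: Dict[str, List[float]] = {
    "play": [1.0, 0.0, 0.0],
    "start": [0.9, 0.1, 0.0],
    "music": [0.0, 1.0, 0.0],
    "song": [0.0, 0.9, 0.1],
    "weather": [0.0, 0.0, 1.0],
}


def sentence_vector(text: str, vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Sum the word vectors of a text (whitespace segmentation is an assumption)."""
    total = [0.0] * dim
    for word in text.lower().split():
        vec = vectors.get(word)
        if vec is None:
            continue  # assumption: words without a vector are ignored
        total = [t + v for t, v in zip(total, vec)]
    return total


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_semantic_match(
    recognized: str,
    candidates: Sequence[str],
    vectors: Dict[str, List[float]] = WORD_VECTORS,
    dim: int = 3,
) -> str:
    """Return the candidate whose sentence vector is most similar to the query's."""
    q = sentence_vector(recognized, vectors, dim)
    return max(
        candidates,
        key=lambda c: cosine_similarity(sentence_vector(c, vectors, dim), q),
    )
```

With the toy vectors above, `best_semantic_match("start song", ["play music", "weather"])` selects "play music", since the summed vectors point in nearly the same direction. Unlike the phonetic stage, claim 3 recites no threshold here: the candidate with the largest similarity is always output.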
Use of phonemic categorisation or speech recognition prior to speaker recognition or verification · CPC title
using distance or distortion measures between unknown speech and reference templates · CPC title
Matching criteria, e.g. proximity measures · CPC title
Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title
Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title