Speech-to-text engine customization

US10832680B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10832680-B2
Application number  US-201816201447-A
Country             US
Kind code           B2
Filing date         Nov 27, 2018
Priority date       Nov 27, 2018
Publication date    Nov 10, 2020
Grant date          Nov 10, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and computer-readable media are described for automatically identifying potential errors in the text output of a domain-agnostic speech-to-text engine and generating text snippets that contain words representative of the potential errors and other words in the neighborhoods of such words for context. In this manner, a substantially reduced amount of text (i.e., the text snippets) can be reviewed for errors in the speech-to-text conversion rather than the entire text output, thereby significantly reducing the burden associated with error identification in the text output.
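
The snippet idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `make_snippets`, the `window` parameter, and the sample sentence are assumptions introduced here, and the suspect words are supplied by hand rather than detected automatically.

```python
def make_snippets(tokens, flagged_words, window=3):
    """Return short context windows around every occurrence of a flagged word.

    `flagged_words` and `window` are hypothetical parameters for this sketch.
    """
    flagged = {w.lower() for w in flagged_words}
    spans = []
    for i, tok in enumerate(tokens):
        if tok.lower() in flagged:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            # Merge with the previous span when the two windows overlap or touch.
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return [" ".join(tokens[s:e]) for s, e in spans]


# Made-up transcript: "hyper tension" stands in for a mis-transcribed domain term.
text = ("the patient reported mild hyper tension last week and "
        "the doctor adjusted the dose because hypertension was persistent")
snippets = make_snippets(text.split(), ["hyper", "hypertension"], window=2)
for s in snippets:
    print(s)
```

In this toy run a reviewer would read 10 words in two snippets instead of all 18 words of the output, which is the reduction in review effort the abstract describes.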

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for automated identification of one or more potential errors in a text output of a speech-to-text engine, the method comprising: receiving, using a processor, the text output of the speech-to-text engine; determining, using the processor, a first vector representation of a first word in the text output; determining, using the processor, a second vector representation of a second word in the text output; determining, using the processor, that the first vector representation and the second vector representation satisfy a similarity threshold; determining that the first word and the second word form a synonym cluster based at least in part on determining that the first vector representation and the second vector representation satisfy the similarity threshold, wherein the synonym cluster is indicative of a potential error in the text output; and generating a text snippet from the text output, wherein the text snippet comprises at least the first word and the second word.

2. The computer-implemented method of claim 1, wherein the first word and the second word forming a synonym cluster indicates that respective probabilities that one or more other words are in a first neighborhood of the first word are approximately equal to respective probabilities that the one or more other words are in a second neighborhood of the second word.

3. The computer-implemented method of claim 1, further comprising learning the first vector representation of the first word, wherein learning the first vector representation comprises: performing matrix multiplication of the first vector representation with a hidden matrix to obtain a matrix product; determining, for each other word in the text output that is within a co-occurrence window of the first word, a respective backpropagation error between the matrix product and a respective vector encoding representation of the each other word; and adjusting one or more parameters of the first vector representation until each respective backpropagation error satisfies a threshold value.

4. The computer-implemented method of claim 1, further comprising determining that the synonym cluster is indicative of an error in the text output by determining that the synonym cluster fails to satisfy the similarity threshold with respect to a public dataset.

5. The computer-implemented method of claim 1, further comprising learning the first vector representation of the first word, wherein learning the first vector representation comprises: generating an information matrix V having dimensions W×W, wherein W is a size of a vocabulary associated with the speech-to-text engine, and wherein each entry of the information matrix V is a conditional probability that a respective word in the vocabulary appears in a respective neighborhood of another respective word in the vocabulary; and applying a singular value decomposition technique to approximate V as UAX^T, wherein UA is a W×N matrix approximation of V and X^T is a N×W hidden matrix, wherein a particular row of UA represents the first vector representation of the first word.

6. The computer-implemented method of claim 5, wherein learning the first vector representation of the first word further comprises: determining an upper bound on a reconstruction error between the information matrix V and the approximation UAX^T; determining that the upper bound does not satisfy a threshold value; increasing the dimensionality N; and re-applying the singular value decomposition technique to approximate V as UAX^T.

7. The computer-implemented method of claim 1, wherein determining that the first vector representation and the second vector representation satisfy a similarity threshold comprises: determining a similarity metric between the first vector representation and the second vector representation; and determining that the similarity metric satisfies a threshold value.

8. The computer-implemented method of claim 1, wherein it is determined that the first word is an erroneous output of the speech-to-text engine, and wherein the speech-to-text engine is customized to correctly recognize the first word.

9. A system for automated identification of one or more potential errors in a text output of a speech-to-text engine, the system comprising: at least one processor; and at least one memory storing computer-executable instructions, wherein the at least one processor is configured to access the at least one memory and execute the computer-executable instructions to: receive the text output of the speech-to-text engine; determine a first vector representation of a first word in the text output; determine a second vector representation of a second word in the text output; determine that the first vector representation and the second vector representation satisfy a similarity threshold; determine that the first word and the second word form a synonym cluster based at least in part on determining that the first vector representation and the second vector representation satisfy the similarity threshold, wherein the synonym cluster is indicative of a potential error in the text output; and generate a text snippet from the text output, wherein the text snippet comprises at least the first word and the second word.

10. The system of claim 9, wherein the first word and the second word forming a synonym cluster indicates that respective probabilities that one or more other words are in a first neighborhood of the first word are approximately equal to respective probabilities that the one or more other words are in a second neighborhood of the second word.

11. The system of claim 9, wherein the at least one processor is further configured to execute the computer-executable instructions to learn the first vector representation of the first word, and wherein the at least one processor is configured to learn the first vector representation by executing the computer-executable instructions to: perform matrix multiplication of the first vector representation with a hidden matrix to obtain a matrix product; determine, for each other word in the text output that is within a co-occurrence window of the first word, a respective backpropagation error between the matrix product and a respective vector encoding representation of the each other word; and adjust one or more parameters of the first vector representation until each respective backpropagation error satisfies a threshold value.

12. The system of claim 11, wherein the at least one processor is further configured to execute the computer-executable instructions to learn the hidden matrix, wherein the at least one processor is configured to learn the hidden matrix by executing the computer-executable instructions to adjust one or more parameters of the hidden matrix until each respective backpropagation error satisfies a threshold value.

13. The system of claim 9, wherein the at least one processor is further configured to learn the first vector representation of the first word, wherein the at least one processor is configured to learn the first vector representation by executing the computer-executable instructions to: generate an information matrix V having dimensions W×W, wherein W is a size of a vocabulary associated with the speech-to-text engine, and wherein each entry of the information matrix V is a conditional probability that a respective word in the vocabulary appears in a respective neighborhood of another respective word in the vocabulary; and apply a singular value decomposition technique to approximate V as UAX^T, wherein UA is a W×N matrix approximation of V and X^T
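
One possible reading of the similarity test in claims 1, 2, and 7 is sketched below in Python. It is an illustration under stated assumptions, not the patented method: each word is represented directly by its row of neighborhood probabilities (the information matrix described in claim 5, without the singular value decomposition step), cosine similarity stands in for the unspecified similarity metric, and the function names, window size, threshold, and sample text are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def neighborhood_probabilities(tokens, window=2):
    """Map each word to {other: P(other appears in the word's neighborhood)}."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1
    return {
        word: {other: n / sum(ctr.values()) for other, n in ctr.items()}
        for word, ctr in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(p * v.get(k, 0.0) for k, p in u.items())
    norm_u = math.sqrt(sum(p * p for p in u.values()))
    norm_v = math.sqrt(sum(p * p for p in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def synonym_clusters(vectors, threshold=0.8):
    """Return word pairs whose vectors meet the (assumed) similarity threshold."""
    words = sorted(vectors)
    return [
        (a, b)
        for i, a in enumerate(words)
        for b in words[i + 1:]
        if cosine(vectors[a], vectors[b]) >= threshold
    ]


# Made-up transcript in which "ledger" is sometimes misheard as "leisure".
tokens = ("please open the ledger file now "
          "please open the leisure file now").split()
vectors = neighborhood_probabilities(tokens, window=2)
clusters = synonym_clusters(vectors, threshold=0.99)
print(("ledger", "leisure") in clusters)  # True
```

Here "ledger" and "leisure" occur in identical neighborhoods, so their probability rows match and the pair is flagged. On a corpus this small other pairs may also pass the threshold; claim 4 describes comparing a cluster against a public dataset to decide whether it indicates an actual error, which this sketch does not attempt.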

Assignees

Inventors

Classifications

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • G10L15/26 · Primary

    Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title

  • Recognition of textual entities · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10832680B2 cover?
Systems, methods, and computer-readable media are described for automatically identifying potential errors in the text output of a domain-agnostic speech-to-text engine and generating text snippets that contain words representative of the potential errors and other words in the neighborhoods of such words for context. In this manner, a substantially reduced amount of text (i.e., the text snippe…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 10, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).