Contextualized speech to text conversion

US11711469B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11711469-B2
Application number: US-202117316615-A
Country: US
Kind code: B2
Filing date: May 10, 2021
Priority date: May 10, 2021
Publication date: Jul 25, 2023
Grant date: Jul 25, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: determining, in performance of an interactive voice response (IVR) session, prompting data for presenting to a user, and storing text based data defining the prompting data into a data repository; presenting the prompting data to the user; receiving return voice string data from the user in response to the prompting data; generating a plurality of candidate text strings associated to the return voice string of the user; examining the text based data defining the prompting data; augmenting the plurality of candidate text strings in dependence on a result of the examining to provide a plurality of augmented candidate text strings associated to the return voice string data; and evaluating respective ones of the plurality of augmented candidate text strings associated to the return voice string data; and selecting one of the augmented candidate text strings as a returned transcription associated to the return voice string data.
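The flow in the abstract (generate candidates, augment them using the prompt text, evaluate, select) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `PROMPT_MAP` table, the hand-picked bigram scores standing in for a language model, and the example prompt and candidates.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from a phrase found in the prompt to transformed text.
PROMPT_MAP: Dict[str, str] = {"what state": "the state of"}

# Toy stand-in for a language model: hand-picked bigram scores (assumption).
BIGRAM_SCORES = {
    ("the", "state"): 1.0,
    ("state", "of"): 1.0,
    ("of", "alaska"): 3.0,
    ("i'll", "ask"): 1.0,
    ("ask", "her"): 1.0,
}
UNKNOWN_BIGRAM_PENALTY = -1.0


def toy_score(text: str) -> float:
    """Sum bigram scores over the text; unseen bigrams are penalized."""
    words = text.lower().split()
    return sum(
        BIGRAM_SCORES.get(pair, UNKNOWN_BIGRAM_PENALTY)
        for pair in zip(words, words[1:])
    )


def transform_prompt(prompt_text: str, mapping: Dict[str, str]) -> str:
    """Return the transformed text for the first mapped phrase in the prompt."""
    lowered = prompt_text.lower()
    for phrase, transformed in mapping.items():
        if phrase in lowered:
            return transformed
    return ""  # no mapped phrase: candidates are left unchanged


def augment(candidates: List[str], prefix: str) -> List[str]:
    """Prepend the transformed prompt text to every candidate string."""
    return [f"{prefix} {c}".strip() for c in candidates]


def select_transcription(
    prompt_text: str,
    candidates: List[str],
    mapping: Dict[str, str],
    score: Callable[[str], float],
) -> str:
    """Augment the candidates, score each one, and return the best."""
    prefix = transform_prompt(prompt_text, mapping)
    return max(augment(candidates, prefix), key=score)


prompt = "What state are you traveling to?"
candidates = ["I'll ask her", "Alaska"]

# Without prompt context the toy scorer prefers the longer, fluent phrase.
print(select_transcription(prompt, candidates, {}, toy_score))
# With the prompt-derived prefix, the contextually sensible candidate wins.
print(select_transcription(prompt, candidates, PROMPT_MAP, toy_score))
```

With these toy scores, the first call prints `I'll ask her` and the second prints `the state of Alaska`. The sketch returns the augmented string, following the abstract's wording that one of the augmented candidates is selected as the returned transcription.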

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method comprising: determining, in performance of an interactive voice response (IVR) session, prompting data for presenting to a user, and storing text based data defining the prompting data into a data repository; presenting the prompting data to the user; receiving return voice string data from the user in response to the prompting data; generating a plurality of candidate text strings associated to the return voice string data of the user; examining the text based data defining the prompting data; augmenting the plurality of candidate text strings in dependence on a result of the examining to provide a plurality of augmented candidate text strings associated to the return voice string data, wherein the augmenting includes transforming the text based data defining the prompting data to provide transformed prompting data; evaluating respective ones of the plurality of augmented candidate text strings associated to the return voice string data; and selecting one of the augmented candidate text strings as a returned transcription associated to the return voice string data.

2. The computer implemented method of claim 1, wherein the evaluating includes querying a predictive model provided by a general language model with the respective ones of the plurality of augmented candidate text strings associated to the return voice string data, and examining returned confidence level parameter values resulting from the querying.

3. The computer implemented method of claim 1, wherein the method includes, prior to the examining and the augmenting, ascertaining, using the plurality of candidate text strings, performance of a predictive language model, the predictive language model for use in performing the evaluating, determining based on the ascertaining that the predictive language model will not perform satisfactorily for the return voice string data from the user, and performing the examining and the augmenting selectively in response to the determining.

4. The computer implemented method of claim 1, wherein the augmenting includes identifying a certain text string within the text based data defining the prompting data that is referenced within a mapping data structure stored in a data repository, wherein the mapping data structure maps text strings to transformed text strings, wherein the augmenting includes using a certain transformed text string associated to the certain text string within the mapping data structure.

5. The computer implemented method of claim 1, wherein the examining includes subjecting the text based data defining the prompting data to natural language processing for assignment of part of speech tags to the text based data, wherein the augmenting includes identifying a certain text string within the text based data defining the prompting data that matches a template text string within a mapping data structure stored in a data repository, wherein the template text string stored in data repository includes one or more term expressed in wildcard format as a part of speech.

6. The computer implemented method of claim 1, wherein the presenting includes using text to speech conversion to present the prompting data to the user in synthesized voice.

7. The computer implemented method of claim 1, wherein the examining includes subjecting the text based data defining the prompting data to natural language processing for assignment of part of speech tags to respective terms of the text based data, and using the part of speech tags to transform text based data defining the prompting data.

8. The computer implemented method of claim 1, wherein the examining includes subjecting the text based data defining the prompting data to part of speech tagging to provide part of speech tags associated to the text based data defining the prompting data, and wherein the augmenting includes transforming the text based data defining the prompting data using a tag of the part of speech tags to provide a transformed text string, and prepending the transformed text string to a text string of plurality candidate text strings.

9. The computer implemented method of claim 1, wherein the presenting includes using text to speech conversion to present the prompting data to the user in synthesized voice, wherein the evaluating includes querying a predictive model provided by a general language model with the respective ones of the plurality of augmented candidate text strings associated to the return voice string data.

10. The computer implemented method of claim 1, wherein the presenting includes using text to speech conversion to present the prompting data to the user in synthesized voice, wherein the generating the plurality of candidate text strings associated to the return voice string of the user includes querying a predictive acoustic model, wherein the evaluating includes querying a predictive model provided by a general language model with the respective ones of the plurality of augmented candidate text strings associated to the return voice string data, and examining returned confidence level parameter values resulting from the querying.

11. The computer implemented method of claim 1, wherein the augmenting includes identifying a certain text string within the text based data defining the prompting data.

12. The computer implemented method of claim 1, wherein the augmenting includes identifying a certain text string within the text based data defining the prompting data, and transforming the certain text string into a transformed text string using data repository stored data.

13. The computer implemented method of claim 1, wherein the augmenting includes prepending the transformed prompting data to respective candidate text strings of the plurality of candidate text strings.

14. The computer implemented method of claim 1, wherein the augmenting includes adapting respective candidate text strings of the plurality of candidate text strings using the transformed prompting data.

15. The computer implemented method of claim 1, wherein the examining includes subjecting the text based data defining the prompting data to natural language processing for assignment of part of speech tags to the text based data.

16. The computer implemented method of claim 1, wherein the examining includes subjecting the text based data defining the prompting data to part of speech tagging to provide part of speech tags associated to the text based data defining the prompting data.

17. A system comprising: a memory; at least one processor in communication with the memory; program instructions executable by one or more processor via the memory to perform a method comprising: determining, in performance of an interactive voice response (IVR) session, prompting data for presenting to a user, and storing text based data defining the prompting data into a data repository; presenting the prompting data to the user; receiving return voice string data from the user in response to the prompting data; generating a plurality of candidate text strings associated to the return voice string data of the user; examining the text based data defining the prompting data; augmenting the plurality of candidate text strings in dependence on a result of the examining to provide a plurality of augmented candidate text strings associated to the return voice string data, wherein the augmenting includes transforming the text based data defining the prompting data to provide transformed prompting data; evaluating respective ones of the plurality of augmented candidate text strings a
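Claims 5 and 8 describe matching the tagged prompt against a template whose terms may be part-of-speech wildcards, then prepending the transformed string to the candidates. The Python sketch below shows one way that could look. The tag lookup table, the `<NOUN>` wildcard syntax, and the single template are hypothetical stand-ins chosen for illustration; a real system would presumably use an NLP tagger and a repository-backed mapping.

```python
import re
from typing import Dict, List, Optional, Tuple

# Toy part-of-speech lookup standing in for an NLP tagger (assumption).
TOY_POS: Dict[str, str] = {
    "what": "PRON", "is": "AUX", "your": "PRON",
    "destination": "NOUN", "name": "NOUN",
}

# Hypothetical mapping data structure: template -> transformed text.
# A term written as <TAG> is a wildcard matching any word with that tag.
TEMPLATES: List[Tuple[str, str]] = [
    ("what is your <NOUN>", "my <NOUN> is"),
]


def tag(text: str) -> List[Tuple[str, str]]:
    """Tokenize on letters/apostrophes and attach a tag to each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [(t, TOY_POS.get(t, "X")) for t in tokens]


def match_template(
    tagged: List[Tuple[str, str]], template: str
) -> Optional[Dict[str, str]]:
    """Return wildcard bindings if the tagged prompt fits the template."""
    parts = template.split()
    if len(parts) != len(tagged):
        return None
    bindings: Dict[str, str] = {}
    for part, (token, pos) in zip(parts, tagged):
        if part.startswith("<") and part.endswith(">"):
            if pos != part[1:-1]:
                return None  # wildcard tag does not match this token
            bindings[part] = token
        elif part != token:
            return None  # literal term does not match
    return bindings


def transform(prompt_text: str) -> Optional[str]:
    """Transform the prompt using the first matching template, if any."""
    tagged = tag(prompt_text)
    for template, output in TEMPLATES:
        bindings = match_template(tagged, template)
        if bindings is not None:
            for wildcard, token in bindings.items():
                output = output.replace(wildcard, token)
            return output
    return None


def prepend(transformed: str, candidates: List[str]) -> List[str]:
    """Prepend the transformed prompt text to each candidate string."""
    return [f"{transformed} {c}" for c in candidates]


prefix = transform("What is your destination?")
if prefix is not None:
    print(prepend(prefix, ["boston", "bossed on"]))
```

Expressing the template term as a part-of-speech wildcard lets one entry cover many prompts ("What is your name?", "What is your destination?") instead of requiring one literal mapping per prompt.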

Assignees

IBM

Inventors

Classifications

  • H04M3/4936 · Primary

    Speech interaction details (speech recognition per se G10L15/00) · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • in combination with interactive voice response systems or voice portals, e.g. as front-ends · CPC title

  • Execution procedure of a spoken command · CPC title

  • using speech recognition · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11711469B2 cover?
Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: determining, in performance of an interactive voice response (IVR) session, prompting data for presenting to a user, and storing text based data defining the prompting data into a data repository; presenting the prompting data to the user; receiving retur…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H04M3/4936. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Jul 25, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).