US12488799B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12488799-B2 |
| Application number | US-202318396138-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 26, 2023 |
| Priority date | Feb 28, 2014 |
| Publication date | Dec 2, 2025 |
| Grant date | Dec 2, 2025 |
Official abstract text for this publication.
A method to transcribe communications includes the steps of obtaining a plurality of hypothesis transcriptions of a voice signal generated by a speech recognition system, determining consistent words that are included in at least first and second of the plurality of hypothesis transcriptions, in response to determining the consistent words, providing the consistent words to a device for presentation of the consistent words to an assisted user, and presenting the consistent words via a display screen on the device, wherein a rate of the presentation of the words on the display screen is variable.
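The abstract's "consistent words" step can be illustrated with a small sketch. This is not the patent's implementation: the function name `consistent_words`, the whitespace tokenization, the lowercasing, and the use of Python's `difflib.SequenceMatcher` for alignment are all assumptions chosen for illustration. It treats a word as consistent when it falls in an aligned run shared by two hypothesis transcriptions.

```python
from difflib import SequenceMatcher


def consistent_words(hyp_a: str, hyp_b: str) -> list[str]:
    """Return words that appear, in order, in both hypothesis transcriptions.

    Hypothetical helper: alignment via difflib is an assumption, not the
    method described in the patent.
    """
    a = hyp_a.lower().split()
    b = hyp_b.lower().split()
    # autojunk=False so frequent words are never discarded as "junk".
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    words: list[str] = []
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical words at a[block.a:] and b[block.b:].
        words.extend(a[block.a:block.a + block.size])
    return words


if __name__ == "__main__":
    print(consistent_words("please call me back tomorrow",
                           "please call be back tomorrow"))
    # ['please', 'call', 'back', 'tomorrow']
```

In this sketch the disputed word ("me" versus "be") is withheld, while the words both hypotheses agree on could be sent to the assisted user's display right away.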
Opening claim text (preview).
What is claimed is:

1. A method comprising: obtaining a first voice signal; obtaining a first text string that is a transcription of the first voice signal, the first text string generated by speech recognition technology using the first voice signal; obtaining a second text string that is a transcription of a second voice signal, the second voice signal including a revoicing of the first voice signal and the second text string generated by speech recognition technology using the second voice signal; obtaining a third text string that is a transcription of the first voice signal, the third text string generated using speech recognition technology; and generating an output text string from the first text string, the second text string, and the third text string, wherein the output text string includes one or more first words from one of the first text string, the second text string, and the third text string and one or more second words from another one of the first text string, the second text string, and the third text string such that the output text string, does not include an entirety of any of the first text string, the second text string, and the third text string.
2. The method of claim 1, further comprising providing the output text string as a transcription of the first voice signal.
3. The method of claim 1, wherein the speech recognition technology used to generate the first text string includes a first model trained to caption a plurality of different voice signals.
4. The method of claim 3 wherein the speech recognition technology used to generate the second text string includes a second model trained to caption the voice of a captioning assistant performing the revoicing.
5. The method of claim 1, wherein the output text string includes one or more first words from the first text string and one or more second words from the second text string.
6. The method of claim 1, further comprising correcting at least one word in one or more of: the output text string, the first text string, and the second text string based on input obtained from a device associated with the revoicing.
7. The method of claim 1, wherein the first text string, the second text string, and the third text string are hypothesis generated by the speech recognition technology for the same portion of the first voice signal.
8. At least one non-transitory computer-readable media configured to store one or more instructions that in response to being executed by at least one computing system cause performance of the method of claim 1.
9. The method of claim 1 for use with an assisted user (AU) device and a hearing user (HU) device that participate in a communication session and wherein the step of obtaining a first voice signal includes the HU device capturing an HU voice signal.
10. The method of claim 9 wherein the step of obtaining a first voice signal includes the AU device receiving the HU voice signal from the HU device and providing the HU voice signal to a remote relay for captioning.
11. The method of claim 10 wherein the AU device includes a display screen and wherein the method further includes transmitting the output text string to the AU device and presenting the output text string via the display screen.
12. The method of claim 1 wherein the step of generating an output text string includes generating the first, second and third text strings and replacing portions of the first text string with at least portions of the second text string and portions of the third text string.
13. The method of claim 1 wherein the speech recognition technology used to generate the second text string includes a second model adapted to a captioning assistant performing the revoicing.
14. The method of claim 1, wherein generating the output text string includes: selecting the one or more first words based on the first text string and the second text string both including the one or more first words and selecting the one or more second words from the second text string based on the first text string not including the one or more second words.
15. A method comprising: obtaining a first voice signal; obtaining a first text string that is a transcription of the first voice signal, the first text string generated by a first automatic speech recognition engine that is trained to transcribe a plurality of voice signals; obtaining a second text string that is a transcription of a second voice signal, the second voice signal including a revoicing of the first voice signal and the second text string generated by a second automatic speech recognition engine that is trained to transcribe a specific call assistant's voice signals; obtaining a third text string that is a transcription of the first voice signal, the third text string generated using the first automatic speech recognition engine; and generating an output text string from the first text string, the second text string, and the third text string, wherein the output text string includes at least one or more first words from one of the first text string, the second text string, and the third text string and at least one or more second words from another one of the first text string, the second text string, and the third text string such that the output text string does not include an entirety of any of the first text string, the second text string, and the third text string.
16. The method of claim 15, further comprising providing the output text string as a transcription of the first voice signal.
17. The method of claim 15, further comprising correcting at least one word in one or more of: the output text string, the first text string, and the second text string based on input obtained from a device associated with the revoicing.
18. The method of claim 15 for use with an assisted user (AU) device and a hearing user (HU) device that participate in a communication session and wherein the step of obtaining a first voice signal includes the HU device capturing an HU voice signal.
19. The method of claim 18 wherein the step of obtaining a first voice signal includes the AU device receiving the HU voice signal from the HU device and providing the HU voice signal to a remote relay for captioning.
20. A system comprising: one or more processors; and one or more computer-readable media configured to store instructions that in response to being executed by the one or more processors cause the system to perform operations, the operations comprising: obtaining a first voice signal; obtaining a first text string that is a transcription of the first voice signal, the first text string generated by speech recognition technology using the first voice signal; obtaining a second text string that is a transcription of a second voice signal, the second voice signal including a revoicing of the first voice signal and the second text string generated by speech recognition technology using the second voice signal; obtaining a third text string that is a transcription of the first voice signal, the third text string generated using the speech recognition technology; and generating an output text string from the first text string, the second text string, and the third text string, wherein the output text string includes one or more first words from one of the first text string, the second text string, and the third text string and one or more second words from another one of the first text string, the second text string, and the third text stri
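
The generating step recited in the independent claims (an output text string assembled from words of three text strings) can be pictured with a short sketch. The claims do not say how words are chosen, so everything here is an assumption made for illustration: the function name `fuse_by_vote`, the per-position majority vote, the tie-break in favor of the revoiced (second) string, and the simplifying requirement that all three strings have the same number of words. The sketch also does not enforce the claim limitation that the output differ from every input string.

```python
from collections import Counter


def fuse_by_vote(first: str, second: str, third: str) -> str:
    """Combine three transcriptions word by word using a majority vote.

    Assumptions (not from the patent): the strings are already aligned
    position by position, and ties fall back to the second (revoiced) string.
    """
    w1, w2, w3 = first.split(), second.split(), third.split()
    if not (len(w1) == len(w2) == len(w3)):
        raise ValueError("this sketch needs equal-length, pre-aligned strings")

    output = []
    for a, b, c in zip(w1, w2, w3):
        word, votes = Counter([a, b, c]).most_common(1)[0]
        # Keep the word at least two strings agree on; otherwise take the
        # word from the revoiced string.
        output.append(word if votes >= 2 else b)
    return " ".join(output)


if __name__ == "__main__":
    print(fuse_by_vote(
        "the meeting is at for thirty",     # ASR on the original voice
        "the meting is at four thirty",     # ASR on the revoicing
        "the meeting is at four thirsty",   # second ASR hypothesis
    ))
    # the meeting is at four thirty
```

In this example the result borrows "meeting" from the first and third strings and "four" from the second and third, so it matches none of the three inputs in full, which is the kind of mixed output the claims describe. A real system would need a proper word alignment step before any vote, since hypotheses rarely have identical lengths.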
Assessment or evaluation of speech recognition systems · CPC title
for measuring the quality of voice signals · CPC title
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
Language aspects · CPC title
Medium conversion · CPC title