US10742805B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10742805-B2 |
| Application number | US-201715477958-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 3, 2017 |
| Priority date | Feb 28, 2014 |
| Publication date | Aug 11, 2020 |
| Grant date | Aug 11, 2020 |
Official abstract text for this publication.
A captioning system comprising a processor and a memory having stored thereon software such that, when the software is executed by the one or more processors, the system generates text captions from speech data, including at least the following: receiving, from a hearing user's (HU's) device, an HU's speech data; generating, at the one or more hardware processors, first text captions from the speech data using a speech recognition algorithm; automatically determining, at the one or more processors, whether the generated first text captions meet a first accuracy threshold; when the first text captions meet the first accuracy threshold, sending the first text captions to an assisted user's (AU's) device for display; and, when the first text captions do not meet the first accuracy threshold, generating, at the one or more processors, second text captions from the speech data based on user input to the speech recognition algorithm from a call assistant and sending the second text captions to the AU's device for display.
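The routing decision in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `route_captions`, `Caption`, `fake_asr`, and `fake_call_assistant` are hypothetical, the recognizer's confidence score stands in for the "first accuracy threshold" test, and the 0.9 default is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Caption:
    text: str
    source: str  # "asr" or "call_assistant"


def route_captions(
    voice_signal: bytes,
    asr: Callable[[bytes], Tuple[str, float]],
    call_assistant: Callable[[bytes], str],
    accuracy_threshold: float = 0.9,  # assumed value, not from the patent
) -> Caption:
    """Return the caption that would be sent to the AU's device."""
    # First text captions come from automated speech recognition.
    first_text, confidence = asr(voice_signal)

    # Assumption: the confidence score is used as the accuracy estimate.
    if confidence >= accuracy_threshold:
        return Caption(first_text, "asr")

    # Only on failure is the voice signal presented to a human call assistant,
    # whose input produces the second text captions.
    second_text = call_assistant(voice_signal)
    return Caption(second_text, "call_assistant")


# Hypothetical stand-ins for a real ASR engine and a human call assistant.
def fake_asr(signal: bytes) -> Tuple[str, float]:
    return ("hello world", 0.95 if signal == b"clear" else 0.40)


def fake_call_assistant(signal: bytes) -> str:
    return "hello, world"


clear_result = route_captions(b"clear", fake_asr, fake_call_assistant)
noisy_result = route_captions(b"noisy", fake_asr, fake_call_assistant)
```

In this sketch the call assistant is consulted only when the automated captions fall below the threshold, which mirrors the "only when" condition in claim 1.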
Opening claim text (preview).
What is claimed is:
1. A captioning system comprising: one or more processors; and a memory having stored thereon software such that, when the software is executed by the one or more processors, the system generates text captions from speech data, including at least the following: receiving, from a hearing user's (HU's) device, an HU's voice signal; providing, at the one or more hardware processors, the HU's voice signal to an automated speech recognition (ASR) engine; generating, at the one or more hardware processors, first text captions from the HU's voice signal using the ASR engine; automatically determining, at the one or more processors, whether the generated first text captions meet a first accuracy threshold; and when the first text captions meet the first accuracy threshold, sending the first text captions to an assisted user's (AU's) communications device for display at a display device; and only when the first text captions fail to meet the first accuracy threshold, performing at least the following: presenting the HU's voice signal to a human call assistant; generating, at the one or more hardware processors, second text captions associated with the HU's voice signal based on user input from the call assistant; and sending the second text captions to the AU's communications device for display at the display device.
2. The captioning system as recited in claim 1, wherein the HU's voice signal comprises a high-fidelity audio recording of the HU's voice, and wherein the high-fidelity audio recording is received over a high-fidelity network.
3. The captioning system as recited in claim 1, wherein the HU's voice signal comprises speech data generated at the HU's communications device based on a high-fidelity audio recording of the HU's voice.
4. The captioning system as recited in claim 1, wherein when the first text captions do not meet the first accuracy threshold, the captioning system also performs at least the following: sending the first text captions to the AU's communications device for display at the display device prior to sending the second text captions to the AU's communications device; and wherein sending the second text captions to the AU's communications device comprises sending one or more corrections to the first text captions.
5. The captioning system as recited in claim 1, wherein sending the first text captions to the second communications device for display at the display device comprises sending at least one confidence factor associated with at least a portion of the first text caption along with the first text caption and displaying the at least one confidence factor on the display device in addition to presenting the first text caption.
6. The captioning system as recited in claim 1, wherein the one or more processors further detects non-textual characteristics of the HU's voice signal and automatically controls the visual appearance of at least some of the text presented on the display to visually distinguish text associated with at least one non-textual characteristic from other displayed text.
7. The captioning system as recited in claim 1 wherein the step of automatically determining whether the generated first text captions meet a first accuracy threshold includes generating at least one confidence factor associated with the first text captions.
8. The captioning system as recited in claim 1, wherein generating the second text captions from the voice signal based on the call assistant input to the speech recognition algorithm comprises receiving, from the call assistant, at least a first letter for each of a plurality of words contained in the HU's voice signal.
9. The captioning system as recited in claim 1, wherein generating the second text captions based on user input comprises: presenting, to the call assistant, a list of alternative words corresponding to a portion of the HU's voice signal; receiving, from the call assistant, selection of one of the plurality of the alternative words on the list; and generating a text caption based on the selected one of the alternative words.
10. The captioning system as recited in claim 1, wherein, only when the second text captions do not meet a second predetermined quality threshold, the at least one processor automatically initiates a call assistant service where the call assistant transcribes the HU's voice signal to text.
11. The captioning system as recited in claim 10 wherein the call assistant service also includes the call assistant correcting the text generated by the call assistant.
12. A relay system for providing text captioning to an assisted user (AU) using an AU device while the AU is communicating with a hearing user (HU) that uses an HU device, the relay system comprising: at least one processor programmed to perform the steps of: receiving a first text transcription from one of the AU device and the HU device associated with words spoken by the HU during the conversation with the AU; receiving the HU voice signal from one of the AU device and the HU device; providing the first text transcription and the HU voice signal to a call assistant (CA); receiving corrections from the call assistant to the first text transcription to generate a corrected second text transcription; and transmitting the second text transcription to the AU device for display.
13. The relay system of claim 12 wherein the first text transcription is received from the HU device.
14. The relay system of claim 13 wherein the at least one processor is further programmed to perform the step of transmitting the first text transcription to the AU device to be presented on a display screen at the AU device.
15. The relay system of claim 14 wherein the second text transcription includes text segments that include the corrections by the call assistant.
16. The relay system of claim 15 wherein the at least one processor is further programmed to transmit an instruction to the AU device to replace the corrected text with the second text transcriptions.
17. The relay of claim 13 wherein the at least one processor is further programmed to perform steps of receiving a second text transcript from the AU device associated with words spoken by the AU during the conversation with the HU and providing the second text transcript to the CA.
18. The relay system of claim 12 wherein the processor further performs the steps of, for at least a subset of the corrections made by the CA, transmitting an indication of the error to the one of the AU and HU device that transmitted the first text transcription to the relay for training of an automated voice to text software application.
19. The relay system of claim 12 wherein the HU voice signal includes a hi-fidelity voice signal.
20. The relay system of claim 12 wherein the first text transcription is received from the AU device.
21. The relay system of claim 12 wherein the at least one processor is located remotely from the AU device and HU device.
22. A captioning system, comprising: a processor; and one or more storage devices that store a program that, when executed by the processor, generates text captions from a hearing user's (HU's) speech data, including at least the following: receiving, from an HU's communications device, the speech data based on an HU's voice; generating, at the processor, preliminary automated text captions from the HU's speech data; automatically identifying at least one potential text caption error in the prelimin
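The correction flow of the relay system in claims 12 through 16 can be illustrated with a short sketch. It is one possible reading, not the patented implementation: the names `apply_corrections` and `ReplaceInstruction` are hypothetical, and the sketch assumes the first transcription is split into indexed text segments and that the call assistant's corrections arrive as a mapping from segment index to corrected text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReplaceInstruction:
    """Hypothetical message telling the AU device which displayed text to replace."""
    segment_index: int
    old_text: str
    new_text: str


def apply_corrections(
    first_transcription: List[str],
    corrections: Dict[int, str],
) -> Tuple[List[str], List[ReplaceInstruction]]:
    """Build the corrected second transcription and the replace instructions."""
    second_transcription = list(first_transcription)  # leave the input untouched
    instructions: List[ReplaceInstruction] = []

    for index in sorted(corrections):
        if not 0 <= index < len(second_transcription):
            raise IndexError(f"no segment {index} in the first transcription")
        new_text = corrections[index]
        old_text = second_transcription[index]
        if new_text != old_text:  # skip corrections that change nothing
            instructions.append(ReplaceInstruction(index, old_text, new_text))
            second_transcription[index] = new_text

    return second_transcription, instructions


# Example: the automated transcription mishears one segment.
first = ["please call", "me at fore", "o'clock"]
second, to_send = apply_corrections(first, {1: "me at four"})
```

Sending only the changed segments, rather than the whole transcription, is a design choice of this sketch; it keeps the update to text already shown on the AU device small.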
for hearing-impaired users · CPC title
Cordless telephones (user interfaces specially adapted therefor H04M1/724) · CPC title
Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence) · CPC title
Communication-related supplementary services, e.g. call-transfer or call-hold · CPC title
Message disposing or creating aspects · CPC title