US12469490B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12469490-B2 |
| Application number | US-202217732971-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 29, 2022 |
| Priority date | Apr 29, 2022 |
| Publication date | Nov 11, 2025 |
| Grant date | Nov 11, 2025 |
Official abstract text for this publication.
Systems generate segments of spoken language utterances based on different sets of segmentation boundaries. The systems are also configured to generate one or more formatted segments by assigning punctuation tags at segmentation boundaries and to generate one or more final sentences from the one or more segments.
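The two-pass idea in the abstract can be illustrated with a small sketch. This is not the patented implementation: the function name `format_segments`, the representation of a boundary as a word index mapped to a punctuation tag, and the sample words are all assumptions made for illustration only.

```python
from typing import Dict, List


def format_segments(words: List[str], boundaries: Dict[int, str]) -> List[str]:
    """Split `words` after each boundary index and append that boundary's tag.

    `boundaries` maps a word index to a punctuation tag (hypothetical
    representation; the patent does not prescribe a data structure).
    """
    segments = []
    start = 0
    for index in sorted(boundaries):
        # A boundary at `index` closes the segment that ends with words[index].
        segments.append(" ".join(words[start:index + 1]) + boundaries[index])
        start = index + 1
    if start < len(words):
        # Trailing words with no boundary yet remain unpunctuated.
        segments.append(" ".join(words[start:]))
    return segments


words = "we met on monday the report is ready".split()

# First set of boundaries: an early guess that ends the sentence too soon.
formatted = format_segments(words, {1: "."})

# Second, different set of boundaries: applied later to the same words.
final = format_segments(words, {3: ".", 7: "."})
```

Here `formatted` holds a provisional result with one premature boundary, while `final` re-segments the same words with the second boundary set, so the punctuation differs between the two passes.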
Opening claim text (preview).
What is claimed is:

1. A method for generating transcription data for speech data comprising one or more spoken language utterances, the method comprising:
generating one or more partial segments of the one or more spoken language utterances, each partial segment comprising one or more words recognized in the speech data;
causing the one or more partial segments to be transmitted to and displayed at a remote client device in a first visual format corresponding to a first font style that indicates to a user that the transcription data being displayed on a user display are partial segments;
generating one or more decoder segments based on the one or more partial segments and a first set of segmentation boundaries, each decoder segment comprising one or more consecutive words recognized in the speech data;
generating one or more formatted segments with punctuation based on the one or more decoder segments by assigning a punctuation tag selected from a plurality of punctuation tags at each segmentation boundary included in the first set of segmentation boundaries;
causing the one or more formatted segments to be transmitted to and displayed at the remote client device in a second visual format corresponding to a second font style, different than the first font style, that indicates to the user that the transcription data being displayed on the user display are formatted segments;
subsequent to the one or more formatted segments being transmitted to the remote client device, generating a second set of segmentation boundaries such that at least one segmentation boundary included in the second set of segmentation boundaries is determined to be a final segmentation boundary corresponding to an end of a sentence included in the one or more spoken language utterances;
applying the second set of segmentation boundaries to the one or more decoder segments, the second set of segmentation boundaries being different than the first set of segmentation boundaries;
in response to applying the second set of segmentation boundaries to the one or more decoder segments, generating one or more final sentences from the one or more decoder segments, wherein the one or more final sentences have different punctuation than the punctuation of the one or more formatted segments; and
causing the one or more final sentences to be transmitted to and displayed at the remote client device in a third visual format corresponding to a third font style, different from the first font style and the second font style, that indicates to the user that the transcription data being displayed on the user display are final sentences.

2. The method of claim 1, wherein applying the second set of segmentation boundaries and generating one or more final sentences comprises at least one of:
deleting segmentation boundaries included in the first set of segmentation boundaries that do not correspond to segmentation boundaries included in the second set of segmentation boundaries,
retaining segmentation boundaries in the first set of segmentation boundaries that correspond to segmentation boundaries in the second set of segmentation boundaries, or
adding segmentation boundaries included in the second set of segmentation boundaries that do not correspond to segmentation boundaries in the first set of segmentation boundaries.

3. The method of claim 1, wherein the one or more decoder segments collectively comprise one or more sentences, and wherein each final sentence included in the one or more final sentences corresponds to a complete sentence included in the one or more decoder segments.

4. The method of claim 1, wherein the second set of segmentation boundaries includes a set of punctuation tags such that each punctuation tag included in the set of punctuation tags corresponds to a segmentation boundary included in the second set of segmentation boundaries.

5. The method of claim 1, wherein the speech data is a previously recorded audio dataset.

6. The method of claim 1, wherein the speech data is a streaming audio dataset.

7. The method of claim 6, further comprising: prior to generating at least one decoder segment, generating one or more partial segments, each partial segment comprising one or more words which are recognized in the speech data, and which are appended to a previously generated partial segment, wherein the at least one decoder segment is generated by applying the first set of segmentation boundaries to the one or more partial segments.

8. The method of claim 1, wherein the first set of segmentation boundaries is generated and applied to the one or more decoder segments prior to assigning punctuation tags, and wherein the second set of segmentation boundaries is generated and applied to the one or more decoder segments simultaneously with applying punctuation tags to the one or more decoder segments.

9. A computer system configured to generate and transmit transcription data for input speech recognized by an automatic speech recognition system, said computer system comprising:
one or more processors; and
one or more computer-readable hardware storage devices that store instructions that are executable by the one or more processors to cause the computer system to at least:
transmit to a remote client device (i) an initial partial segment comprising one or more initial words recognized from the input speech and (ii) one or more additional partial segments, each additional partial segment comprising the initial partial segment and one or more additional words which are recognized subsequent to the one or more initial words, and which are appended to a previously generated partial segment;
generate and transmit to the remote client device for display in a first visual format corresponding to a first font style for rendering the initial partial segment and one or more additional partial segments;
transmit to the remote client device a formatted segment with punctuation that has been generated by at least identifying and applying a first set of segmentation boundaries to the one or more additional partial segments and applying a punctuation mark to each segmentation boundary included in the first set of segmentation boundaries;
generate and transmit to the remote client device for display in a second visual format corresponding to a second font style for rendering the formatted segments, the second font style being different than the first font style;
transmit to the remote client device a final segment that has been generated by at least identifying and applying a second set of segmentation boundaries to the one or more additional partial segments, wherein the final segment comprises at least one complete sentence that includes one or more additional partial segments, and which overlaps at least a portion of the formatted segment, the final segment having second different punctuation than the punctuation of the formatted segment;
generate and transmit to the remote client device for display in a third visual format corresponding to a third font style for rendering the final segment, the third font style being different than the first font style and the second font style.

10. The computer system of claim 9, wherein the instructions are further executable by the one or more processors to further configure the computer system to:
transmit a final transcript comprising one or more final segments associated with the input speech, each final segment comprising a sentence associated with a particular spoken language utterance recognized in the input speech; and
cause the remote client device to subsequently display the final transcript in a fourth visual format corresponding to a fourth font style for rendering
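
The boundary handling recited in claim 2 (deleting, retaining, or adding segmentation boundaries) maps naturally onto set operations. The sketch below is only an illustration under assumptions: boundaries are modeled as integer word indices, and the function name `reconcile_boundaries` is hypothetical rather than taken from the patent.

```python
from typing import Dict, Set


def reconcile_boundaries(first: Set[int], second: Set[int]) -> Dict[str, Set[int]]:
    """Compare two boundary sets (hypothetical word-index representation)."""
    return {
        # In the first set but absent from the second: deleted.
        "deleted": first - second,
        # Present in both sets: retained.
        "retained": first & second,
        # In the second set but absent from the first: added.
        "added": second - first,
    }


changes = reconcile_boundaries(first={1, 7}, second={3, 7})

# The boundaries that survive are the retained ones plus the added ones,
# which by construction equals the second set.
final_boundaries = changes["retained"] | changes["added"]
```

In this example the boundary after word 1 is dropped, the boundary after word 7 is kept, and a new boundary after word 3 is introduced.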
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Segmentation; Word boundary detection · CPC title
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191) · CPC title
Orthographic correction, e.g. spell checking or vowelisation · CPC title