Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G10L15/04. Mapped technology areas include Physics.

When was this patent published?

Publication date Nov 11, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Streaming punctuation for long-form dictation

US12469490B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12469490-B2
Application number	US-202217732971-A
Country	US
Kind code	B2
Filing date	Apr 29, 2022
Priority date	Apr 29, 2022
Publication date	Nov 11, 2025
Grant date	Nov 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems generate segments of spoken language utterances based on different sets of segmentation boundaries. The systems are also configured to generate one or more formatted segments by assigning punctuation tags at segmentation boundaries and to generate one or more final sentences from the one or more segments.
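As a rough illustration of the two-pass idea in the abstract, the Python sketch below punctuates the same recognized words twice: once with a provisional set of boundaries and once with a revised set. The tag names, the index-based boundary representation, and both function names are assumptions made for this example; the abstract does not specify any of them.

```python
from typing import Dict, List

# Hypothetical tag names; the abstract does not enumerate the actual tag set.
PUNCTUATION = {"COMMA": ",", "PERIOD": ".", "QUESTION": "?"}
SENTENCE_ENDERS = {"PERIOD", "QUESTION"}


def apply_boundaries(words: List[str], boundaries: Dict[int, str]) -> str:
    """Attach a punctuation mark to the word that precedes each boundary.

    `boundaries` maps a word index i to a tag, meaning "a boundary falls
    after words[i]" (an assumed encoding for this sketch).
    """
    out = []
    for i, word in enumerate(words):
        tag = boundaries.get(i, "")
        out.append(word + PUNCTUATION.get(tag, ""))
    return " ".join(out)


def final_sentences(words: List[str], boundaries: Dict[int, str]) -> List[str]:
    """Emit only complete sentences, cut at sentence-ending boundaries."""
    sentences, start = [], 0
    for i in sorted(boundaries):
        if boundaries[i] in SENTENCE_ENDERS:
            # Re-index the boundaries that fall inside this sentence.
            local = {k - start: v for k, v in boundaries.items() if start <= k <= i}
            sentences.append(apply_boundaries(words[start:i + 1], local))
            start = i + 1
    return sentences


words = "i went to the store then i came home".split()
first_pass = {4: "PERIOD"}                # provisional boundary after "store"
second_pass = {4: "COMMA", 8: "PERIOD"}   # revised once more audio is decoded

formatted = apply_boundaries(words, first_pass)
finals = final_sentences(words, second_pass)
```

Here the formatted segment ends a sentence after "store", while the final sentence downgrades that boundary to a comma, which is one way the final output can carry different punctuation than the earlier formatted text.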

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating transcription data for speech data comprising one or more spoken language utterances, the method comprising:
generating one or more partial segments of the one or more spoken language utterances, each partial segment comprising one or more words recognized in the speech data;
causing the one or more partial segments to be transmitted to and displayed at a remote client device in a first visual format corresponding to a first font style that indicates to a user that the transcription data being displayed on a user display are partial segments;
generating one or more decoder segments based on the one or more partial segments and a first set of segmentation boundaries, each decoder segment comprising one or more consecutive words recognized in the speech data;
generating one or more formatted segments with punctuation based on the one or more decoder segments by assigning a punctuation tag selected from a plurality of punctuation tags at each segmentation boundary included in the first set of segmentation boundaries;
causing the one or more formatted segments to be transmitted to and displayed at the remote client device in a second visual format corresponding to a second font style, different than the first font style, that indicates to the user that the transcription data being displayed on the user display are formatted segments;
subsequent to the one or more formatted segments being transmitted to the remote client device, generating a second set of segmentation boundaries such that at least one segmentation boundary included in the second set of segmentation boundaries is determined to be a final segmentation boundary corresponding to an end of a sentence included in the one or more spoken language utterances;
applying the second set of segmentation boundaries to the one or more decoder segments, the second set of segmentation boundaries being different than the first set of segmentation boundaries;
in response to applying the second set of segmentation boundaries to the one or more decoder segments, generating one or more final sentences from the one or more decoder segments, wherein the one or more final sentences have different punctuation than the punctuation of the one or more formatted segments; and
causing the one or more final sentences to be transmitted to and displayed at the remote client device in a third visual format corresponding to a third font style, different from the first font style and the second font style, that indicates to the user that the transcription data being displayed on the user display are final sentences.

2. The method of claim 1, wherein applying the second set of segmentation boundaries and generating one or more final sentences comprises at least one of: deleting segmentation boundaries included in the first set of segmentation boundaries that do not correspond to segmentation boundaries included in the second set of segmentation boundaries, retaining segmentation boundaries in the first set of segmentation boundaries that correspond to segmentation boundaries in the second set of segmentation boundaries, or adding segmentation boundaries included in the second set of segmentation boundaries that do not correspond to segmentation boundaries in the first set of segmentation boundaries.

3. The method of claim 1, wherein the one or more decoder segments collectively comprise one or more sentences, and wherein each final sentence included in the one or more final sentences corresponds to a complete sentence included in the one or more decoder segments.

4. The method of claim 1, wherein the second set of segmentation boundaries includes a set of punctuation tags such that each punctuation tag included in the set of punctuation tags corresponds to a segmentation boundary included in the second set of segmentation boundaries.

5. The method of claim 1, wherein the speech data is a previously recorded audio dataset.

6. The method of claim 1, wherein the speech data is a streaming audio dataset.

7. The method of claim 6, further comprising: prior to generating at least one decoder segment, generating one or more partial segments, each partial segment comprising one or more words which are recognized in the speech data, and which are appended to a previously generated partial segment, wherein the at least one decoder segment is generated by applying the first set of segmentation boundaries to the one or more partial segments.

8. The method of claim 1, wherein the first set of segmentation boundaries is generated and applied to the one or more decoder segments prior to assigning punctuation tags, and wherein the second set of segmentation boundaries is generated and applied to the one or more decoder segments simultaneously with applying punctuation tags to the one or more decoder segments.

9. A computer system configured to generate and transmit transcription data for input speech recognized by an automatic speech recognitions system, said computer system comprising:
one or more processors; and
one or more computer-readable hardware storage devices that store instructions that are executable by the one or more processors to cause the computer system to at least:
transmit to a remote client device (i) an initial partial segment comprising one or more initial words recognized from the input speech and (ii) one or more additional partial segments, each additional partial segment comprising the initial partial segment and one or more additional words which are recognized subsequent to the one or more initial words, and which are appended to a previously generated partial segment;
generate and transmit to the remote client device for display in a first visual format corresponding to a first font style for rendering the initial partial segment and one or more additional partial segments;
transmit to the remote client device a formatted segment with punctuation that has been generated by at least identifying and applying a first set of segmentation boundaries to the one or more additional partial segments and applying a punctuation mark to each segmentation boundary included in the first set of segmentation boundaries;
generate and transmit to the remote client device for display in a second visual format corresponding to a second font style for rendering the formatted segments, the second font style being different than the first font style;
transmit to the remote client device a final segment that has been generated by at least identifying and applying a second set of segmentation boundaries to the one or more additional partial segments, wherein the final segment comprises at least one complete sentence that includes one or more additional partial segments, and which overlaps at least a portion of the formatted segment, the final segment having second different punctuation than the punctuation of the formatted segment;
generate and transmit to the remote client device for display in a third visual format corresponding to a third font style for rendering the final segment, the third font style being different than the first font style and the second font style.

10. The computer system of claim 9, wherein the instructions are further executable by the one or more processors to further configure the computer system to: transmit a final transcript comprising one or more final segments associated with the input speech, each final segment comprising a sentence associated with a particular spoken language utterance recognized in the input speech; and cause the remote client device to subsequently display the final transcript in a fourth visual format corresponding to a fourth font style for rendering
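Claim 2 describes moving from the first boundary set to the second as some combination of deleting, retaining, and adding boundaries. A minimal sketch, assuming boundaries are encoded as word indices (a hypothetical representation that the claims do not prescribe), expresses those three cases as set operations:

```python
from typing import Dict, Set


def reconcile(first: Set[int], second: Set[int]) -> Dict[str, Set[int]]:
    """Classify boundaries when the second set replaces the first.

    Each boundary is a word index meaning "a boundary falls after this word".
    """
    return {
        "deleted": first - second,    # in the first set only: dropped
        "retained": first & second,   # in both sets: kept
        "added": second - first,      # in the second set only: newly inserted
    }


# Illustrative values only; they are not taken from the patent.
first_set = {2, 4}
second_set = {4, 8}
changes = reconcile(first_set, second_set)
final_set = changes["retained"] | changes["added"]
```

In this toy case the boundary after word 2 is deleted, the one after word 4 is retained, and a new one after word 8 is added, so the resulting set is identical to the second set.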

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
G10L15/04 (primary): Segmentation; Word boundary detection
G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
G06F40/103: Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191)
G06F40/232: Orthographic correction, e.g. spell checking or vowelisation

Patent family

Related publications grouped by family.

View patent family 85641030

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12469490B2 cover?: Systems generate segments of spoken language utterances based on different sets of segmentation boundaries. The systems are also configured to generate one or more formatted segments by assigning punctuation tags at segmentation boundaries and to generate one or more final sentences from the one or more segments.

Related patents

Quick lookup for speech translation

Interactive content output

Messaging feedback mechanism

Method and apparatus for selective visual formatting of an electronic document

Enhanced speech-to-speech translation system and methods for adding a new word