What technology area does this patent fall under?

Primary CPC classification G10L15/1822. Mapped technology areas include Physics.

When was this patent published?

Publication date May 27, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Generating structured text content using speech recognition models

US12315624B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12315624-B2
Application number	US-202318234350-A
Country	US
Kind code	B2
Filing date	Aug 15, 2023
Priority date	Nov 28, 2016
Publication date	May 27, 2025
Grant date	May 27, 2025
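
The publication number appears on this page in two spellings, `US12315624B2` and `US-12315624-B2`. The following is a minimal sketch of how the two forms could be normalized into country, serial number, and kind code. The `PublicationNumber` type, the `parse_publication_number` helper, and the regular expression are illustrative assumptions that cover only the two forms shown here, not an official numbering specification.

```python
import re
from typing import NamedTuple


class PublicationNumber(NamedTuple):
    """Hypothetical container for the three parts shown in the metadata table."""
    country: str
    number: str
    kind: str


# Assumed pattern: two-letter country, digits, then a kind code such as "B2".
# Hyphens between the parts are optional, matching both spellings on this page.
_PATTERN = re.compile(r"^([A-Z]{2})-?(\d+)-?([A-Z]\d?)$")


def parse_publication_number(text: str) -> PublicationNumber:
    """Split a publication number into country, serial number, and kind code."""
    match = _PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized publication number: {text!r}")
    return PublicationNumber(*match.groups())


hyphenated = parse_publication_number("US-12315624-B2")
compact = parse_publication_number("US12315624B2")
print(hyphenated == compact)  # both spellings yield the same parts
```

Under this assumed pattern, both spellings resolve to country `US`, number `12315624`, and kind code `B2`, which agrees with the Country and Kind code rows above.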

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
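
The data flow in the abstract (acoustic sequence, then transcription, then structured text content) can be sketched roughly as follows. This is only an illustration of the two-stage shape, not the patented implementation: the type aliases, the two stub models, and the output field names are all hypothetical, and a real system would use trained models in place of the stubs.

```python
from typing import Callable, Dict, List

# Hypothetical types; the abstract does not specify data formats.
AcousticSequence = List[float]                  # e.g. audio samples or features
SpeechRecognizer = Callable[[AcousticSequence], str]
PredictiveModel = Callable[[str], Dict[str, str]]


def toy_speech_recognizer(audio: AcousticSequence) -> str:
    """Stand-in for a speech recognition model with a domain-specific language model."""
    # A real model would decode the audio; this stub returns a fixed transcription.
    return "patient reports mild headache for two days"


def toy_domain_model(transcription: str) -> Dict[str, str]:
    """Stand-in for a domain-specific predictive model that emits structured text."""
    words = transcription.split()
    return {
        "subject": words[0] if words else "",
        "summary": transcription.capitalize() + ".",
        "word_count": str(len(words)),
    }


def generate_structured_content(
    audio: AcousticSequence,
    recognize: SpeechRecognizer,
    predict: PredictiveModel,
) -> Dict[str, str]:
    """Run the two stages in order: transcribe, then derive structured content."""
    transcription = recognize(audio)   # stage 1: acoustic sequence -> text
    return predict(transcription)      # stage 2: text -> structured fields


result = generate_structured_content(
    [0.0, 0.1, -0.2], toy_speech_recognizer, toy_domain_model
)
print(result)
```

Passing the models in as callables keeps the two stages separable, which mirrors how the abstract treats the speech recognition model and the domain-specific predictive model as distinct components.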

First claim

Opening claim text (preview).

What is claimed is:
1. A computer implemented method comprising: obtaining an input acoustic sequence that includes a digital representation of a conversation between a medical professional and a patient; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content, wherein the domain-specific predictive model comprises an automated billing predictive model that is configured to generate billing information based on the transcription of the input acoustic sequence, wherein the billing information comprises data indicating a cost associated with an interaction between the medical professional and the patient.
2. The method of claim 1, wherein the automated billing predictive model is configured to generate bill information based on the transcription of the input acoustic sequence and one or more of (i) the input acoustic sequence, (ii) data associated with the input acoustic sequence, (iii) an acoustic sequence representing a physician dictation, or (iv) data representing a patient's medical record.
3. The method of claim 1, wherein the billing information includes a billing code.
4. The method of claim 1, wherein automated billing predictive model is configured to generate a bill from the billing information.
5. The method of claim 4, wherein the automated billing predictive model is configured to populate a section of the bill with a summary of an interaction between a medical professional and a patient.
6. The method of claim 4, wherein the generated bill comprises a formatted document that is organized into one or more sections or fields.
7. The method of claim 1, wherein the speech recognition model comprises a domain-specific language model.
8. The method of claim 7, wherein the domain-specific language model comprises a medical language model that has been trained using medical-specific training data.
9. The method of claim 1, further comprising providing the input acoustic sequence as input to a speech prosody detection predictive model configured to process the input acoustic sequence to generate an indication of speech prosody that is derived from the input acoustic sequence.
10. The method of claim 9, further comprising screening for diseases based on the generated indication of speech prosody.
11. The method of claim 9, wherein the speech prosody detection predictive model is configured to provide, as output, a document listing results from the screening.
12. The method of claim 1, wherein the domain-specific predictive model comprises a translation model that is configured to translate the transcription of the input acoustic sequence into a target language.
13. The method of claim 12, wherein the translation model is further configured to translate the transcription of the input acoustic sequence into a target language using the input acoustic sequence.
14. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: obtaining an input acoustic sequence that includes a digital representation of a conversation between a medical professional and a patient; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content, wherein the domain-specific predictive model comprises an automated billing predictive model that is configured to generate billing information based on the transcription of the input acoustic sequence, wherein the billing information comprises data indicating a cost associated with an interaction between the medical professional and the patient.
15. The system of claim 14, wherein the automated billing predictive model is configured to generate bill information based on the transcription of the input acoustic sequence and one or more of (i) the input acoustic sequence, (ii) data associated with the input acoustic sequence, (iii) an acoustic sequence representing a physician dictation, or (iv) data representing a patient's medical record.
16. The system of claim 14, wherein the billing information includes a billing code.
17. The system of claim 14, wherein the automated billing predictive model is configured to generate a bill from the billing information.
18. One or more non-transitory computer-readable storage media comprising instructions stored thereon that are executable by one or more processing devices and upon such execution cause the one or more processing devices to perform operations comprising: obtaining an input acoustic sequence that includes a digital representation of a conversation between a medical professional and a patient; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content, wherein the domain-specific predictive model comprises an automated billing predictive model that is configured to generate billing information based on the transcription of the input acoustic sequence, wherein the billing information comprises data indicating a cost associated with an interaction between the medical professional and the patient.

Assignees

Google LLC

Inventors

Classifications

G10L15/183
using context dependencies, e.g. language models · CPC title
G10L15/16
using artificial neural networks · CPC title
G10L15/142
Hidden Markov Models [HMMs] · CPC title
G10L15/063
Training · CPC title
G06F40/58
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title

Patent family

Related publications grouped by family.

View patent family 60629842

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12315624B2 cover?: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech reco…
Who is the assignee on this patent?: Google LLC