Generating structured text content using speech recognition models

US12315624B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12315624-B2
Application number: US-202318234350-A
Country: US
Kind code: B2
Filing date: Aug 15, 2023
Priority date: Nov 28, 2016
Publication date: May 27, 2025
Grant date: May 27, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
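The two-stage flow in the abstract (acoustic sequence to transcription, then transcription to structured text content) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the type aliases, `generate_structured_text`, and both stub models are hypothetical and do not come from the patent, which does not specify concrete data formats or model implementations.

```python
from typing import Callable, Dict, Sequence

# Hypothetical stand-in types; the patent does not define concrete data formats.
AcousticSequence = Sequence[float]
SpeechRecognizer = Callable[[AcousticSequence], str]
PredictiveModel = Callable[[str], Dict[str, str]]


def generate_structured_text(
    acoustic: AcousticSequence,
    recognize: SpeechRecognizer,
    predict: PredictiveModel,
) -> Dict[str, str]:
    """Run the two stages in order: transcription, then structured content."""
    transcription = recognize(acoustic)
    return predict(transcription)


def stub_recognizer(acoustic: AcousticSequence) -> str:
    # Placeholder: a real model would decode the audio; this returns fixed text.
    return "patient reports mild headache for two days"


def stub_predictor(transcription: str) -> Dict[str, str]:
    # Placeholder: copies the transcription into a single named field.
    return {"chief_complaint": transcription}


result = generate_structured_text([0.0, 0.1, -0.1], stub_recognizer, stub_predictor)
print(result)
```

Passing the two models in as callables keeps the sketch neutral about how either stage is built; a domain-specific language model would sit inside whatever is supplied as `recognize`.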

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method comprising: obtaining an input acoustic sequence that includes a digital representation of a conversation between a medical professional and a patient; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content, wherein the domain-specific predictive model comprises an automated billing predictive model that is configured to generate billing information based on the transcription of the input acoustic sequence, wherein the billing information comprises data indicating a cost associated with an interaction between the medical professional and the patient.

2. The method of claim 1, wherein the automated billing predictive model is configured to generate bill information based on the transcription of the input acoustic sequence and one or more of (i) the input acoustic sequence, (ii) data associated with the input acoustic sequence, (iii) an acoustic sequence representing a physician dictation, or (iv) data representing a patient's medical record.

3. The method of claim 1, wherein the billing information includes a billing code.

4. The method of claim 1, wherein automated billing predictive model is configured to generate a bill from the billing information.

5. The method of claim 4, wherein the automated billing predictive model is configured to populate a section of the bill with a summary of an interaction between a medical professional and a patient.

6. The method of claim 4, wherein the generated bill comprises a formatted document that is organized into one or more sections or fields.

7. The method of claim 1, wherein the speech recognition model comprises a domain-specific language model.

8. The method of claim 7, wherein the domain-specific language model comprises a medical language model that has been trained using medical-specific training data.

9. The method of claim 1, further comprising providing the input acoustic sequence as input to a speech prosody detection predictive model configured to process the input acoustic sequence to generate an indication of speech prosody that is derived from the input acoustic sequence.

10. The method of claim 9, further comprising screening for diseases based on the generated indication of speech prosody.

11. The method of claim 9, wherein the speech prosody detection predictive model is configured to provide, as output, a document listing results from the screening.

12. The method of claim 1, wherein the domain-specific predictive model comprises a translation model that is configured to translate the transcription of the input acoustic sequence into a target language.

13. The method of claim 12, wherein the translation model is further configured to translate the transcription of the input acoustic sequence into a target language using the input acoustic sequence.

14. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: obtaining an input acoustic sequence that includes a digital representation of a conversation between a medical professional and a patient; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content, wherein the domain-specific predictive model comprises an automated billing predictive model that is configured to generate billing information based on the transcription of the input acoustic sequence, wherein the billing information comprises data indicating a cost associated with an interaction between the medical professional and the patient.

15. The system of claim 14, wherein the automated billing predictive model is configured to generate bill information based on the transcription of the input acoustic sequence and one or more of (i) the input acoustic sequence, (ii) data associated with the input acoustic sequence, (iii) an acoustic sequence representing a physician dictation, or (iv) data representing a patient's medical record.

16. The system of claim 14, wherein the billing information includes a billing code.

17. The system of claim 14, wherein the automated billing predictive model is configured to generate a bill from the billing information.

18. One or more non-transitory computer-readable storage media comprising instructions stored thereon that are executable by one or more processing devices and upon such execution cause the one or more processing devices to perform operations comprising: obtaining an input acoustic sequence that includes a digital representation of a conversation between a medical professional and a patient; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content, wherein the domain-specific predictive model comprises an automated billing predictive model that is configured to generate billing information based on the transcription of the input acoustic sequence, wherein the billing information comprises data indicating a cost associated with an interaction between the medical professional and the patient.
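As a rough illustration of the billing-related claims (billing information with a cost and a billing code, and a bill organized into sections that includes an interaction summary), the sketch below uses a keyword lookup in place of a trained predictive model. This is a deliberate simplification and not the claimed method: the fee schedule, the `DEMO-` codes, the costs, and all class and function names are invented placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical lookup table: phrases, codes, and costs are invented placeholders,
# not real billing codes or prices.
PLACEHOLDER_FEE_SCHEDULE: Dict[str, Tuple[str, float]] = {
    "consultation": ("DEMO-001", 100.0),
    "blood test": ("DEMO-002", 40.0),
}


@dataclass
class BillingInfo:
    billing_codes: List[str]  # codes matched in the transcription
    cost: float               # total cost associated with the interaction


@dataclass
class Bill:
    sections: Dict[str, str]  # a formatted document organized into named sections


def predict_billing_info(transcription: str) -> BillingInfo:
    """Keyword matching stands in for a trained billing predictive model."""
    text = transcription.lower()
    codes: List[str] = []
    total = 0.0
    for phrase, (code, cost) in PLACEHOLDER_FEE_SCHEDULE.items():
        if phrase in text:
            codes.append(code)
            total += cost
    return BillingInfo(billing_codes=codes, cost=total)


def generate_bill(info: BillingInfo, summary: str) -> Bill:
    """Populate bill sections from the billing information and a summary."""
    return Bill(
        sections={
            "Summary of interaction": summary,
            "Billing codes": ", ".join(info.billing_codes),
            "Total cost": f"{info.cost:.2f}",
        }
    )


info = predict_billing_info("Routine consultation; ordered a blood test.")
bill = generate_bill(info, "Routine visit with one lab order.")
for name, value in bill.sections.items():
    print(f"{name}: {value}")
```

Keeping `BillingInfo` separate from `Bill` mirrors the distinction the claims draw between generating billing information and generating a bill from that information.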

Assignees

Inventors

Classifications

  • using context dependencies, e.g. language models · CPC title

  • using artificial neural networks · CPC title

  • Hidden Markov Models [HMMs] · CPC title

  • Training · CPC title

  • Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12315624B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech reco…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/1822. Mapped technology areas include Physics.
When was this patent published?
Publication date May 27, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).