Who is the assignee on this patent?

Microsoft Technology Licensing Llc

What technology area does this patent fall under?

Primary CPC classification G10L15/26. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 28, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Automated meeting minutes generator

US11615799B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11615799-B2
Application number	US-202016887806-A
Country	US
Kind code	B2
Filing date	May 29, 2020
Priority date	May 29, 2020
Publication date	Mar 28, 2023
Grant date	Mar 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A transcription of audio speech included in electronic content associated with a meeting is created by an ASR model trained on speech-to-text data. The transcription is post-processed by modifying text included in the transcription, for example, by modifying punctuation, grammar, or formatting introduced by the ASR model and by changing or omitting one or more words that were included in both the audio speech and the transcription. After the transcription is post-processed, output based on the post-processed transcription is generated in the form of a meeting summary and/or template.
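The abstract describes a three-stage flow: ASR transcription, post-processing of the transcript, and generation of a summary or template. The Python sketch below illustrates that flow under stated assumptions. `transcribe`, `post_process`, `detect_meeting_type`, and `generate_template` are hypothetical names, and simple rules stand in for the trained ASR, post-processing, and meeting-type analysis that the patent describes. It is an illustration, not the patented implementation.

```python
import re
from typing import Dict, List

# Hypothetical stand-in for the trained ASR model: returns a fixed raw transcript.
def transcribe(audio: bytes) -> str:
    return "um so the the budget review is uh due friday"

# Assumed filler words; a real system would learn such edits from training data.
FILLERS = {"um", "uh", "erm"}

def post_process(raw: str) -> str:
    """Rule-based stand-in for the ML post-processor: omits fillers and
    repeated words, then fixes capitalization and final punctuation."""
    words: List[str] = []
    for w in raw.split():
        if w.lower() in FILLERS:
            continue  # omit a word that was both spoken and transcribed
        if words and words[-1].lower() == w.lower():
            continue  # collapse immediate repetitions such as "the the"
        words.append(w)
    text = " ".join(words)
    if not text:
        return text
    text = text[0].upper() + text[1:]
    return text if text.endswith((".", "?", "!")) else text + "."

# Hypothetical templates and keyword lists keyed by meeting type.
TEMPLATES: Dict[str, List[str]] = {
    "status": ["Updates", "Blockers", "Action items"],
    "planning": ["Goals", "Timeline", "Action items"],
}
KEYWORDS: Dict[str, set] = {
    "status": {"update", "blocker", "review"},
    "planning": {"plan", "roadmap", "goal", "timeline"},
}

def detect_meeting_type(text: str) -> str:
    # Keyword overlap stands in for the patent's analysis of the transcript.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {kind: len(tokens & words) for kind, words in KEYWORDS.items()}
    return max(scores, key=scores.get)

def generate_template(text: str) -> dict:
    """Selects a template by meeting type and populates its first field."""
    meeting_type = detect_meeting_type(text)
    fields = {name: "" for name in TEMPLATES[meeting_type]}
    fields[TEMPLATES[meeting_type][0]] = text
    return {"type": meeting_type, "fields": fields}

if __name__ == "__main__":
    clean = post_process(transcribe(b""))
    print(clean)
    print(generate_template(clean))
```

Keeping each stage as a separate function mirrors the abstract's ordering: output is generated only after the transcript has been post-processed.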

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system for automatically processing electronic content and for generating corresponding output, the computing system comprises: one or more processors; and one or more computer readable hardware storage devices having stored computer-executable instructions that are executable by the one or more processors to cause the computing system to at least: identify electronic content associated with a meeting, the electronic content including audio speech; create a transcription of the audio speech with an automatic speech recognition (ASR) model trained on speech-to-text training data, the transcription being a text-based transcription; perform post-processing on the transcription, generating a post-processed transcription, by using a machine learning model trained on post-processing training data for modifying text included in the transcription, wherein the post-processing includes both (1) modifying at least one of a punctuation, grammar or formatting of the transcription that was introduced by the ASR model and (2) changing or omitting one or more words in the transcription which were included in both the audio speech and the transcription; and generate output based from the post-processed transcription, the output comprising a template that is generated at least in part from the post-processed transcription, the template comprising a meeting template that is automatically selected from a plurality of different templates based on a meeting type that is determined from analyzing the post-processed transcript and which is automatically populated with content from the post-processed transcript.

2. The computing system of claim 1, wherein the transcription includes a plurality of links corresponding to tags associated with the electronic content and wherein the computer-executable instructions are further executable by the one or more processors to cause the computing system to generate the tags from the electronic content.

3. The computing system of claim 2, wherein the plurality of links point to data related to the electronic content, but wherein the data related to the electronic content is external to the electronic content.

4. The computing system of claim 2, wherein the computing system uses a machine learning speech tagging model to generate the tags, the machine learning speech tagging model generating at least one tag in response to identifying a spoken starting keyword and a spoken ending keyword in the audio speech, and wherein the generating of the at least one tag includes classifying content of the audio speech as a particular note type, selected from a plurality of note types, based on the content which occurs between the starting keyword and the ending.

5. The computing system of claim 4, wherein the at least one tag comprises an action item note type that identifies one or more tasks and one or more entities associated with the one or more tasks.

6. The computing system of claim 5, wherein the at least one tag further includes links to one or more of an assigning party, a responsible party, a deadline, a content, or a priority level associated with the task.

7. The computing system of claim 1, wherein the readability of the transcription is modified when generating the post-processed transcription by converting a spoken language style of the audio speech to a written language style.

8. The computing system of claim 7, wherein the readability of the transcription is modified by determining a level of readability of individual words and phrases of the transcription and at least (1) removing words corresponding to a low level of readability, or (2) substituting words corresponding to a low level of readability with words corresponding to an increased level of readability, wherein the determining the level of readability is based on the individual words and phrases contributing to a semantic meaning and/or desired style inferred from the transcription.

9. The computing system of claim 1, wherein the post-processing training data is created by: identifying ungrammatical sentences comprising text; generating text-to-speech (TTS) data from the text; transcribing the TTS data using an automatic speech recognition model; and pairing the transcribed TTS data with the corresponding ungrammatical sentences.

10. The computing system of claim 1, wherein the output comprises the meeting summary which is automatically generated based on abstractive summarization of the post-processed transcription.

11. The computing system of claim 10, wherein the abstractive summarization is performed by a summarization model configured as a multi-level encoding-decoding neural network with attention.

12. The computing system of claim 11, wherein the summarization model is further configured to summarize the post-processed transcription based on both hierarchical attention at a turn-level and at a word-level.

13. The computing system of claim 12, wherein each turn is analyzed in context with a determined relationship between one or more of the turns of the plurality of turns.

14. The computing system of claim 1, wherein selection of one or more input fields of the meeting template that are automatically populated is based on user input.

15. The computing system of claim 1, wherein one or more fields of the meeting template are automatically populated with content identified in one or more tags that were generated by a speech tag machine learning model that processed at least one of the audio speech or the transcript or the post-processed transcript.

16. The computing system of claim 1, wherein the output generated from the post-processed transcript is further post-processed remove errors and modify text to improve the readability and accuracy of the output.

17. A computer-implemented method for automatically processing electronic content and for generating corresponding output, the method comprising: identifying electronic content associated with a meeting, the electronic content including audio speech; creating a transcription of the audio speech with an automatic speech recognition (ASR) model trained on speech-to-text training data, the transcription being a text-based transcription; performing post-processing on the transcription, generating a post-processed transcription, by using a machine learning model trained on post-processing training data for modifying text included in the transcription, wherein the post-processing includes both (1) modifying at least one of a punctuation, grammar or formatting of the transcription that was introduced by the ASR model and (2) changing or omitting one or more words in the transcription which were included in both the audio speech and the transcription; and generating output based from the post-processed transcription, the output comprising at least one of: (i) a meeting summary generated by a machine learning summarization model that summarizes content of the post-processed transcription by at least breaking the post-processed transcription into a plurality of turns corresponding to a plurality of speakers, each turn being based on a role vector of a speaker corresponding to the turn, the role vector being (i) configured as a fixed-length vector trained to represent a role of the speaker and (ii) appended to an embedding of the turn, and wherein the summarization model selectively applies rules during analysis of each turn, with each of the rules being selectively applied based on one or more corresponding roles from which the role vector is determined, or (ii) a template that is generated at least in pa
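Claims 4 to 6 describe tags triggered by a spoken starting keyword and a spoken ending keyword, with the enclosed content classified into a note type such as an action item that names a responsible party and a deadline. A minimal sketch over an already-transcribed string follows. The keywords "start note" and "end note", the regex-based classifier, and the `Tag` fields are hypothetical choices for illustration; the claims call for a machine learning speech tagging model, not rules.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical spoken keywords; the claims do not name specific words.
START_KEYWORD = "start note"
END_KEYWORD = "end note"

@dataclass
class Tag:
    note_type: str
    content: str
    responsible: Optional[str] = None  # assumed field for the responsible party
    deadline: Optional[str] = None     # assumed field for the deadline

def classify(content: str) -> str:
    # Keyword heuristic standing in for the ML note-type classifier.
    return "action_item" if re.search(r"\b(will|needs to|must)\b", content) else "general"

def extract_tags(transcript: str) -> List[Tag]:
    """Finds text between start/end keywords and classifies each span."""
    pattern = re.compile(
        re.escape(START_KEYWORD) + r"\s+(.*?)\s+" + re.escape(END_KEYWORD),
        re.IGNORECASE,
    )
    tags: List[Tag] = []
    for match in pattern.finditer(transcript):
        content = match.group(1)
        tag = Tag(classify(content), content)
        if tag.note_type == "action_item":
            # Naive entity extraction: first word before the modal verb, word after "by".
            who = re.match(r"(\w+)\s+(?:will|needs to|must)\b", content)
            when = re.search(r"\bby\s+(\w+)", content)
            tag.responsible = who.group(1) if who else None
            tag.deadline = when.group(1) if when else None
        tags.append(tag)
    return tags

if __name__ == "__main__":
    text = ("ok start note Dana will send the slides by Friday end note "
            "and start note the venue is booked end note")
    for tag in extract_tags(text):
        print(tag)
```

The non-greedy `(.*?)` keeps each tag bounded by the nearest ending keyword, so several tags in one transcript do not merge.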

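Claim 9 builds post-processing training data by synthesizing speech from ungrammatical sentences, transcribing that synthetic audio with an ASR model, and pairing each transcription with its source sentence. The sketch below assumes the ungrammatical sentences have already been identified and takes the TTS engine and ASR model as hypothetical callables (`tts`, `asr`), so any engine could be plugged in; `build_post_processing_pairs` is an illustrative name, not an API from the patent.

```python
from typing import Callable, List, Tuple

def build_post_processing_pairs(
    sentences: List[str],
    tts: Callable[[str], bytes],
    asr: Callable[[bytes], str],
) -> List[Tuple[str, str]]:
    """Returns (ASR transcription, source sentence) pairs.

    `tts` and `asr` are hypothetical callables standing in for a real
    text-to-speech engine and a real ASR model."""
    pairs: List[Tuple[str, str]] = []
    for sentence in sentences:
        audio = tts(sentence)      # text -> synthetic speech (TTS data)
        transcript = asr(audio)    # synthetic speech -> ASR text
        pairs.append((transcript, sentence))
    return pairs

if __name__ == "__main__":
    fake_tts = lambda s: s.encode("utf-8")  # pretend audio is just encoded text
    fake_asr = lambda a: a.decode("utf-8").lower().replace(",", "")  # mimic lost casing and punctuation
    print(build_post_processing_pairs(["Me and him goes, tomorrow"], fake_tts, fake_asr))
```

Injecting `tts` and `asr` as parameters keeps the pairing logic testable with fakes like the ones in the `__main__` block.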
Assignees

Microsoft Technology Licensing Llc

Inventors

Classifications

CPC code	CPC title
G06N3/0455	Auto-encoder networks; Encoder-decoder networks
G06N3/09	Supervised learning
G10L15/22	Procedures used during a speech recognition process, e.g. man-machine dialogue
G06Q10/109	Time management, e.g. calendars, reminders, meetings or time accounting
G06F40/186	Templates

Patent family

Related publications grouped by family.

Patent family ID: 75588298

