Server side hotwording
US-2024412734-A1 · Dec 12, 2024 · US
| Field | Value |
|---|---|
| Publication number | US-11990132-B2 |
| Application number | US-202318176180-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 28, 2023 |
| Priority date | May 29, 2020 |
| Publication date | May 21, 2024 |
| Grant date | May 21, 2024 |
Official abstract text for this publication.
A transcription of audio speech included in electronic content associated with a meeting is created by an ASR model trained on speech-to-text data. The transcription is post-processed by modifying text included in the transcription, for example, by modifying punctuation, grammar, or formatting introduced by the ASR model and by changing or omitting one or more words that were included in both the audio speech and the transcription. After the transcription is post-processed, output based on the post-processed transcription is generated in the form of a meeting summary and/or template.
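The post-processing step described in the abstract can be illustrated with a minimal sketch. The abstract describes a trained machine learning model; the code below swaps in a few regular-expression rules purely for illustration. The `FILLERS` list and the `post_process` function are hypothetical names, not taken from the patent.

```python
import re

# Hypothetical filler list (assumption). The patent describes a trained
# model for this step, not a fixed word list.
FILLERS = ("um", "uh", "you know")


def post_process(raw: str) -> str:
    """Rule-based stand-in for the learned post-processing step."""
    text = raw
    for filler in FILLERS:
        # Omit the filler plus any comma and whitespace that trail it.
        text = re.sub(rf"\b{re.escape(filler)}\b,?\s*", "", text,
                      flags=re.IGNORECASE)
    # Collapse immediate word repetitions such as "we we".
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text).strip()
    # Capitalize sentence starts and ensure terminal punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    text = " ".join(s[:1].upper() + s[1:] for s in sentences if s)
    if text and text[-1] not in ".!?":
        text += "."
    return text


print(post_process("um so we we should uh ship the release on friday"))
```

This covers only two of the modifications the abstract names: omitting words that were actually spoken, and adjusting punctuation and formatting. A learned model would also handle grammar and word substitutions that fixed rules cannot.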
Opening claim text (preview).
What is claimed is:
1. A computing system for automatically processing electronic content and for generating corresponding output, the computing system comprises: one or more processors; and one or more computer readable hardware storage devices having stored computer-executable instructions that are executable by the one or more processors to cause the computing system to at least: identify electronic content associated with a meeting, the electronic content including audio speech; create a transcription of the audio speech with an automatic speech recognition (ASR) model trained on speech-to-text training data, the transcription being a text-based transcription; use a machine learning model trained on post-processing training data for modifying text included in the transcription to generate a post-processed transcription which includes text modified from the transcription; and generate output based on the post-processed transcription, the output comprising a template that is generated at least in part from the post-processed transcription, the template comprising a meeting template that is automatically selected from a plurality of different templates based on a meeting type that is determined from analyzing the post-processed transcription and which is automatically populated with content from the post-processed transcription.
2. The computing system of claim 1, wherein the post-processing includes modifying at least one of a punctuation, grammar or formatting of the transcription that was introduced by the ASR model.
3. The computing system of claim 1, wherein the post-processing includes omitting one or more words in the transcription.
4. The computing system of claim 1, wherein the post-processing includes modifying text to improve a readability of the transcription.
5. The computing system of claim 4, wherein the readability of the transcription is improved by converting a spoken language style of the audio speech to a written language style.
6. The computing system of claim 4, wherein the readability of the transcription is improved by determining a level of readability of individual words and phrases of the transcription and at least (1) removing words corresponding to a low level of readability, or (2) substituting words corresponding to a low level of readability with words corresponding to an increased level of readability, wherein the determining the level of readability is based on the individual words and phrases contributing to a semantic meaning and/or desired style inferred from the transcription.
7. The computing system of claim 1, wherein the transcription includes a plurality of links corresponding to tags associated with the electronic content and wherein the computer-executable instructions are further executable by the one or more processors to cause the computing system to generate the tags from the electronic content.
8. The computing system of claim 7, wherein the plurality of links point to data related to the electronic content, but wherein the data related to the electronic content is external to the electronic content.
9. The computing system of claim 1, wherein one or more fields of the template are automatically populated with content identified in one or more tags that are generated by a speech tag machine learning model that processes at least one of the audio speech, the transcription, or the post-processed transcription.
10. A computer-implemented method for automatically processing electronic content and for generating corresponding output, the method comprising: identify electronic content associated with a meeting, the electronic content including audio speech; create a transcription of the audio speech with an automatic speech recognition (ASR) model trained on speech-to-text training data, the transcription being a text-based transcription; use a machine learning model trained on post-processing training data for modifying text included in the transcription to generate a post-processed transcription which includes text modified from the transcription; and generate output based on the post-processed transcription, the output comprising a template that is generated at least in part from the post-processed transcription, the template comprising a meeting template that is automatically selected from a plurality of different templates based on a meeting type that is determined from analyzing the post-processed transcription and which is automatically populated with content from the post-processed transcription.
11. The method of claim 10, wherein the post-processing includes modifying at least one of a punctuation, grammar or formatting of the transcription that was introduced by the ASR model.
12. The method of claim 10, wherein the post-processing includes changing one or more words in the transcription.
13. The method of claim 10, wherein the post-processing includes omitting one or more words in the transcription.
14. The method of claim 10, wherein the post-processing includes modifying text to improve a readability of the transcription.
15. The method of claim 14, wherein the readability of the transcription is improved by converting a spoken language style of the audio speech to a written language style.
16. The method of claim 14, wherein the readability of the transcription is improved by determining a level of readability of individual words and phrases of the transcription and at least (1) removing words corresponding to a low level of readability, or (2) substituting words corresponding to a low level of readability with words corresponding to an increased level of readability, wherein the determining the level of readability is based on the individual words and phrases contributing to a semantic meaning and/or desired style inferred from the transcription.
17. The method of claim 10, wherein the transcription includes a plurality of links corresponding to tags associated with the electronic content and wherein the computer-executable instructions are further executable by the one or more processors to cause the computing system to generate the tags from the electronic content.
18. The method of claim 17, wherein the plurality of links point to data related to the electronic content, but wherein the data related to the electronic content is external to the electronic content.
19. The method of claim 10, wherein one or more fields of the template are automatically populated with content identified in one or more tags that are generated by a speech tag machine learning model that processes at least one of the audio speech, the transcription, or the post-processed transcription.
20. One or more hardware storage devices comprising computer-executable instructions that are executable by one or more processers of a computing system to cause the computing system to: identify electronic content associated with a meeting, the electronic content including audio speech; create a transcription of the audio speech with an automatic speech recognition (ASR) model trained on speech-to-text training data, the transcription being a text-based transcription; using a machine learning model trained on post-processing training data for modifying text included in the transcription; and generate output based from the post-processed transcription, the output comprising at least one of: (i) a meeting summary generated by a machine learning summarization model that summarizes content of the post-processed transcription by at least breaking the post-processed transcription i
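The template step recited in claims 1 and 10 (determine a meeting type from the post-processed transcription, select one of several templates, then populate it) might be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claims do not say how the meeting type is determined, so keyword counting is used here as a stand-in, and `Template`, `TEMPLATES`, `select_template`, and `populate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Template:
    meeting_type: str
    keywords: Tuple[str, ...]  # cues used to guess the meeting type (assumption)
    fields: Tuple[str, ...]    # field names to populate from the transcript


# Hypothetical template set; the claims only require "a plurality".
TEMPLATES = (
    Template("standup", ("yesterday", "today", "blocker"), ("blocker",)),
    Template("planning", ("deadline", "milestone", "estimate"),
             ("deadline", "milestone")),
)


def select_template(transcript: str) -> Template:
    """Pick the template whose keywords occur most often in the transcript."""
    lowered = transcript.lower()
    # On a tie, max() returns the first template in TEMPLATES.
    return max(TEMPLATES, key=lambda t: sum(lowered.count(k) for k in t.keywords))


def populate(template: Template, transcript: str) -> Dict[str, str]:
    """Fill each field with the first sentence that mentions the field name."""
    sentences = [s.strip() for s in transcript.split(".") if s.strip()]
    filled = {
        field: next((s for s in sentences if field in s.lower()), "")
        for field in template.fields
    }
    return {"meeting_type": template.meeting_type, **filled}


transcript = ("The deadline is the third of March. "
              "The next milestone is the beta. We need an estimate soon.")
print(populate(select_template(transcript), transcript))
```

Splitting on periods is a deliberately naive sentence boundary rule. Claims 9 and 19 describe populating fields from tags produced by a speech tag model, which this sketch does not attempt.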
Auto-encoder networks; Encoder-decoder networks · CPC title
Supervised learning · CPC title
Staff planning in a project environment · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
using metadata automatically derived from the content · CPC title