Data-driven and rule-based speech recognition output enhancement

US11257484B2 · US · B2

Patent metadata
Publication number: US-11257484-B2
Application number: US-201916546715-A
Country: US
Kind code: B2
Filing date: Aug 21, 2019
Priority date: Aug 21, 2019
Publication date: Feb 22, 2022
Grant date: Feb 22, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to some embodiments, a multi-layer speech recognition transcript post processing system may include a data-driven, statistical layer associated with a trained automatic speech recognition model that selects an initial transcript. A rule-based layer may receive the initial transcript from the data-driven, statistical layer and execute at least one pre-determined rule to generate a first modified transcript. A machine learning approach layer may receive the first modified transcript from the rule-based layer and perform a neural model inference to create a second modified transcript. A human editor layer may receive the second modified transcript from the machine learning approach layer along with an adjustment from at least one human editor. The adjustment may create, in some embodiments, a final transcript that may be used to fine-tune the data-driven, statistical layer.
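The layer ordering described in the abstract can be pictured as a simple chain of functions. The sketch below is only an illustration under stated assumptions: the class name `PostProcessingPipeline`, its fields, and the three toy layer functions are hypothetical and do not come from the patent, which does not specify an implementation. Each layer is modeled as a function from transcript text to transcript text, and the final transcript is stored so it could later be used to fine-tune the statistical layer.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Transcript = str
Layer = Callable[[Transcript], Transcript]


@dataclass
class PostProcessingPipeline:
    """Hypothetical chain of the post-processing layers named in the abstract."""

    rule_layer: Layer    # executes pre-determined rules
    ml_layer: Layer      # stands in for neural model inference
    human_layer: Layer   # stands in for a human editor's adjustment
    # Final transcripts kept as candidate fine-tuning data (assumed storage).
    fine_tuning_data: List[Transcript] = field(default_factory=list)

    def run(self, initial: Transcript) -> Transcript:
        first_modified = self.rule_layer(initial)
        second_modified = self.ml_layer(first_modified)
        final = self.human_layer(second_modified)
        self.fine_tuning_data.append(final)
        return final


# Toy stand-ins for each layer; real layers would be far more involved.
def expand_abbreviation(text: Transcript) -> Transcript:
    return text.replace("dr smith", "Dr. Smith")


def capitalize_first(text: Transcript) -> Transcript:
    return text[:1].upper() + text[1:]


def human_adds_period(text: Transcript) -> Transcript:
    return text if text.endswith(".") else text + "."


pipeline = PostProcessingPipeline(
    expand_abbreviation, capitalize_first, human_adds_period
)
final = pipeline.run("i met dr smith today")
print(final)
```

Modeling each layer as a plain function keeps the ordering explicit: the rule-based step always runs before the model step, and the human step always runs last.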

First claim

Opening claim text (preview).

What is claimed is:

1. A multi-layer speech recognition transcript post processing system, comprising: a data-driven, statistical layer associated with a trained automatic speech recognition model that selects an initial transcript; a rule-based layer that receives the initial transcript from the data-driven, statistical layer and executes at least one pre-determined rule to generate a first modified transcript; a machine learning approach layer that receives the first modified transcript from the rule-based layer and performs a neural model inference to create a second modified transcript; and a human editor layer that receives the second modified transcript from the machine learning approach layer and uses an adjustment received from at least one human editor to output a final transcript, wherein the final transcript is used to fine-tune the data-driven, statistical layer.

2. The system of claim 1, wherein the data-driven, statistical layer selects a best initial transcript from a set of N most probable speech recognition transcripts.

3. The system of claim 2, wherein the selection of the best initial transcript is augmented by external attention comprising multiple text documents.

4. The system of claim 1, wherein the pre-determined rule is associated with at least one of: (i) a white list, (ii) a black list, and (iii) a rule approach.

5. The system of claim 4, wherein the pre-determined rule is automatically generated via offline data mining, data augmentation, and model training.

6. The system of claim 5, wherein the offline data mining is associated with at least one of: (i) supervised classification, (ii) unsupervised classification, (iii) clustering techniques, (iv) n-gram classification, (v) replacement pairs based on context, (vi) a graph-based method to link spoken and written sentences based on semantic similarity, and (vii) search engine data.

7. The system of claim 1, wherein the machine learning approach layer is associated with at least one of: (i) online candidate generation, (ii) online neural model inference encoding and decoding, and (iii) online ranking.

8. The system of claim 1, wherein the human editor layer is associated with at least one of: (i) multiple-level human labeling, (ii) pairwise human labeling, and (iii) manual human transcription.

9. The system of claim 8, wherein the adjustment is associated with at least one of: (i) syntactic correctness, (ii) semantic closeness, (iii) fluency, and (iv) style.

10. The system of claim 1, wherein the human editor layer includes a text-to-speech conversion followed by a speech-to-text conversion.

11. The system of claim 1, wherein the final transcript is transmitted to a downstream task associated with at least one of: (i) language understanding, (ii) machine translation, (iii) text summarization, (iv) text classification, (v) information extraction, and (vi) question answering.

12. A computer-implemented method for a multi-layer speech recognition transcript post processing system, comprising: selecting, by a data-driven, statistical layer associated with a trained automatic speech recognition model, an initial transcript; receiving, by a rule-based layer, the initial transcript and executing at least one pre-determined rule to generate a first modified transcript; receiving, by a machine learning approach layer, the first modified transcript from the rule-based layer and performing a neural model inference to create a second modified transcript; and receiving, at a human editor layer, an adjustment to the second modified transcript from at least one human editor that is used to output a final transcript that wherein the final transcript is used to fine-tune the data-driven, statistical layer.

13. The method of claim 12, wherein the human editor layer is associated with at least one of: (i) multiple-level human labeling, (ii) pairwise human labeling, and (iii) manual human transcription.

14. The method of claim 13, wherein the adjustment is associated with at least one of: (i) syntactic correctness, (ii) semantic closeness, (iii) fluency, and (iv) style.

15. The method of claim 12, wherein the final transcript is transmitted to a downstream task associated with at least one of: (i) language understanding, (ii) machine translation, (iii) text summarization, (iv) text classification, (v) information extraction, and (vi) question answering.

16. A non-transient, computer-readable medium storing instructions to be executed by a processor to perform a method for a multi-layer speech recognition transcript post processing system, the method comprising: selecting, by a data-driven, statistical layer associated with a trained automatic speech recognition model, an initial transcript; receiving, by a rule-based layer, the initial transcript and executing at least one pre-determined rule to generate a first modified transcript; receiving, by a machine learning approach layer, the first modified transcript from the rule-based layer and performing a neural model inference to create a second modified transcript and receiving, at a human editor layer, an adjustment to the second modified transcript from at least one human editor that is used to output a final transcript, wherein the final transcript is used to fine-tune the data-driven, statistical layer.

17. The medium of claim 16, wherein the data-driven, statistical layer selects a best initial transcript from a set of N most probable speech recognition transcripts.

18. The medium of claim 16, wherein the pre-determined rule is associated with at least one of: (i) a white list, (ii) a black list, and (iii) a rule approach.

19. The medium of claim 16, wherein the machine learning approach layer is associated with at least one of: (i) online candidate generation, (ii) online neural model inference encoding and decoding, and (iii) online ranking.
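Two of the dependent claims lend themselves to a small illustration: selecting a best transcript from N candidates (claims 2 and 17) and applying pre-determined rules tied to a white list or black list (claims 4 and 18). The sketch below rests on assumptions that the claims do not state: the function names `select_best` and `apply_rules` are hypothetical, each candidate is assumed to carry a log-probability score, black-listed tokens are assumed to be dropped, and white-listed tokens are assumed to be protected from replacement. The patent may intend different semantics for each list.

```python
from typing import Dict, List, Set, Tuple


def select_best(n_best: List[Tuple[str, float]]) -> str:
    """Return the candidate transcript with the highest score.

    Each entry is (transcript, score); scores are assumed to be log-probabilities.
    """
    return max(n_best, key=lambda pair: pair[1])[0]


def apply_rules(
    transcript: str,
    replacements: Dict[str, str],
    white_list: Set[str],
    black_list: Set[str],
) -> str:
    """Apply token-level rules: keep, drop, or replace each token."""
    output = []
    for token in transcript.split():
        key = token.lower()
        if key in white_list:
            output.append(token)  # protected: never rewritten
        elif key in black_list:
            continue              # removed, e.g. filler words
        else:
            output.append(replacements.get(key, token))
    return " ".join(output)


# Hypothetical N-best list with made-up scores.
candidates = [
    ("um send ten dollars to bob", -4.2),
    ("um send tin dollars to bob", -5.0),
    ("send ten dollars two bob", -6.1),
]
initial = select_best(candidates)
first_modified = apply_rules(
    initial,
    replacements={"ten": "10", "dollars": "USD"},
    white_list={"dollars"},
    black_list={"um", "uh"},
)
print(first_modified)
```

Here the white list takes precedence over the replacement table, so "dollars" survives unchanged even though a replacement exists for it. That precedence is a design choice of this sketch, not something the claims specify.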

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Auto-encoder networks; Encoder-decoder networks

  • Supervised learning

  • Learning methods

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning

  • Announcement of recognition results

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11257484B2 cover?
According to some embodiments, a multi-layer speech recognition transcript post processing system may include a data-driven, statistical layer associated with a trained automatic speech recognition model that selects an initial transcript. A rule-based layer may receive the initial transcript from the data-driven, statistical layer and execute at least one pre-determined rule to generate a firs…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 22, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).