US11915701B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11915701-B2 |
| Application number | US-202017070500-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 14, 2020 |
| Priority date | Jun 5, 2019 |
| Publication date | Feb 27, 2024 |
| Grant date | Feb 27, 2024 |
Official abstract text for this publication.
Computer-readable media, systems and methods may improve automatic summarization of transcripts of financial earnings calls. For example, a system may generate segments, such as by disambiguating sentences, from a transcript to be summarized. The system may use an estimator that assesses whether or not each segment should be included in the summary. Different types of estimators may be used. For example, the estimator may be rule-based, trained based on machine-learning techniques, or trained based on machine learning with language modeling, using natural language processing to fine-tune language models specific to financial earnings calls.
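The flow described in the abstract (segment the transcript, score each segment with an estimator, keep the best-scoring segments) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the regex sentence splitter, the two scoring rules, the function names, and the `top_k` cutoff are all assumptions introduced here.

```python
import re
from typing import Callable, List


def split_segments(transcript: str) -> List[str]:
    """Naive sentence splitter (assumption: stands in for real sentence disambiguation)."""
    parts = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [p for p in parts if p]


def rule_based_score(segment: str) -> float:
    """Hypothetical rule-based estimator: rules add or subtract points."""
    score = 0.0
    if re.search(r"\d", segment):  # segments with figures gain a point
        score += 1.0
    if re.search(r"\b(thank you|good morning)\b", segment, re.IGNORECASE):
        score -= 1.0  # pleasantries lose a point
    return score


def summarize(transcript: str,
              estimator: Callable[[str], float],
              top_k: int = 2) -> List[str]:
    """Score every segment, keep the top_k, and return them in transcript order."""
    segments = split_segments(transcript)
    ranked = sorted(range(len(segments)),
                    key=lambda i: estimator(segments[i]),
                    reverse=True)
    keep = sorted(ranked[:top_k])  # restore original ordering
    return [segments[i] for i in keep]


call = ("Good morning, everyone. Revenue grew 12% year over year. "
        "We are pleased with the team. Operating margin was 30%. "
        "Thank you for joining.")
summary = summarize(call, rule_based_score)
```

Because `summarize` accepts any callable that maps a segment to a number, a trained regression or classification estimator could be swapped in for `rule_based_score` without changing the selection step.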
Opening claim text (preview).
What is claimed is:

1. A computer system to automatically summarize a transcript, the computer system comprising: a processor programmed to: access the transcript, wherein the transcript is transcribed from audio having spoken words or phrases relating to a subject matter domain; generate a plurality of segments based on content of the transcript, each segment comprising a respective portion of the content; for each segment of the plurality of segments: provide the segment as input to a machine-learning (ML) estimator specifically trained based on a set of predefined features and labeled data from a gold standard corpus comprising a plurality of gold standard summaries that was generated from other transcripts of corresponding audio having respective spoken words or phrases relating to the subject matter domain, wherein for each gold standard summary of the plurality of gold standard summaries, a given segment from the corresponding transcript is added to the gold standard summary when a predefined number or percentage of annotators agreed that the given segment should be added to the gold standard summary; generate, as an output of the ML estimator, a segment score for the segment; identify a subset of the plurality of segments based on the segment score of each segment; and generate a summary of the transcript based on the subset of the plurality of segments.

2. The computer system of claim 1, wherein the ML-estimator comprises an ML regression estimator, and wherein the labeled data comprises a plurality of annotated segments from an annotation corpus of transcripts, each annotated segment being labeled to indicate a number of relevance votes assigned to the annotated segment from annotators, each relevance vote indicating that an annotator indicated that the segment is relevant to a corresponding transcript in the annotation corpus of transcripts.

3. The computer system of claim 2, wherein the processor is further programmed to: generate, based on learned relationships between each of the set of predefined features and the number of relevance votes assigned to the annotated segment, a regressive decision tree used by the ML regression estimator to generate the segment score for each segment of the transcript.

4. The computer system of claim 3, wherein to generate the segment score, the processor is further programmed to, for each segment of the plurality of segments: apply the regressive decision tree to each segment of the plurality of segments to generate the segment score from among a range of values greater than two.

5. The computer system of claim 1, wherein the ML-estimator comprises an ML binary classification estimator, and wherein the labeled data comprises a plurality of annotated segments from an annotation corpus of transcripts, each annotated segment being labeled with a binary label to indicate whether or not the annotated segment was determined to be relevant to a corresponding transcript in the annotation corpus of transcripts.

6. The computer system of claim 5, wherein the processor is further programmed to, for each feature of the set of predefined features: learn a respective weighted relationship between the feature and the binary label of each annotated segment.

7. The computer system of claim 6, wherein to generate the segment score, the processor is further programmed to, for each segment of the plurality of segments: apply each respective weighted relationship to each segment of the plurality of segments to generate the segment score; and classify each segment as relevant or not relevant based on the segment score, wherein only segments classified as relevant are identified for the subset of the plurality of segments.

8. A computer system to automatically summarize a transcript, the computer system comprising: a processor programmed to: generate a plurality of segments based on content of the transcript, each segment comprising a respective portion of the content, wherein the transcript is transcribed from audio having spoken words or phrases relating to a subject matter domain; for each segment of the plurality of segments: provide the segment to a machine-learning natural language processing (ML-NLP) estimator, the ML-NLP estimator being specifically pre-trained on a general corpus to learn a general language model and then fine-tuned on a gold standard corpus comprising a plurality of gold standard summaries of transcripts that was generated from other transcripts of corresponding audio having respective spoken words or phrases relating to the subject matter domain, the gold standard corpus of transcripts comprising a plurality of annotated segments from an annotation corpus of transcripts, each annotated segment being labeled to indicate a number of relevance votes from annotators, each relevance vote indicating that an annotator indicated that the segment is relevant to a corresponding transcript in the annotation corpus of transcripts, wherein for each gold standard summary of the plurality of gold standard summaries, a given segment from the corresponding transcript is added to the gold standard summary when a predefined number or percentage of annotators agreed that the given segment should be added to the gold standard summary; generate, as an output of the ML-NLP estimator, a segment score; identify a subset of the plurality of segments based on the segment score of each segment; and generate a summary of the transcript based on the subset of the plurality of segments.

9. The computer system of claim 8, wherein the processor is further programmed to: prior to fine-tuning, further pre-train the general language model based on a pre-training transcript corpus to learn a domain-specific language model that is transcript-specific.

10. The computer system of claim 8, wherein the ML-NLP estimator is trained without human-engineered features.

11. The computer system of claim 8, wherein the plurality of segments are each sentences, and wherein to generate the plurality of segments, the processor is further programmed to: disambiguate a plurality of sentences of the transcript.

12. The computer system of claim 8, wherein to generate the cumulative segment score, the processor is further programmed to, for each segment of the plurality of segments: generate a probability that the segment should be included in the summary.

13. The computer system of claim 8, wherein the general language model generates a classification token for each segment, and wherein to generate the cumulative segment score, the processor is further programmed to, for each segment: pass the classification token through a linear layer added to the general language model and a sigmoid function to generate the cumulative segment score as a probability.

14. The computer system of claim 13, wherein the general language model comprises a Bidirectional Encoder Representations from Transformers model.

15. A computer system to automatically summarize a transcript, the computer system comprising: a processor programmed to: access a plurality of heuristic rules that encode scoring that adds or subtracts points to process the transcript, the plurality of heuristic rules comprising at least a first heuristic rule and a second heuristic rule, wherein the transcript is transcribed from audio having spoken words or phrases relating to a subject matter domain; generate a plurality of segments based on content of the transcript, each segment comprising a respective portion of the content; for each segment of the plurality of segments: (i) evaluate the first heuristic rule from among the
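Two mechanisms recited in the claims above lend themselves to a small worked example: building a gold standard summary from annotator agreement (claims 1 and 8), and turning a classification-token vector into a probability with a linear layer followed by a sigmoid (claim 13). The sketch below is hedged and illustrative only. The function names, the 50% agreement threshold, and the toy weight values are assumptions made here; in a real system the weights would be learned during fine-tuning and the vector would come from the language model.

```python
import math
from typing import Dict, List, Sequence


def gold_standard_summary(votes: Dict[str, int],
                          num_annotators: int,
                          min_fraction: float = 0.5) -> List[str]:
    """Keep segments that at least min_fraction of annotators voted relevant.

    min_fraction is a hypothetical stand-in for the claims'
    "predefined number or percentage of annotators".
    """
    return [seg for seg, v in votes.items()
            if v / num_annotators >= min_fraction]


def segment_probability(cls_vector: Sequence[float],
                        weights: Sequence[float],
                        bias: float) -> float:
    """Linear layer (dot product plus bias) followed by a sigmoid."""
    logit = sum(x * w for x, w in zip(cls_vector, weights)) + bias
    # Numerically stable sigmoid: avoid overflow for large negative logits.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# Toy data: relevance votes from 4 annotators for three segments.
votes = {"Revenue grew 12%.": 3, "Good morning.": 1, "Margin was 30%.": 2}
gold = gold_standard_summary(votes, num_annotators=4)

# Toy classification-token vector and weights (assumed values, not learned).
prob = segment_probability([1.0, 2.0], [0.5, -0.25], bias=0.0)
```

With these toy values the logit is zero, so `prob` is 0.5; a segment would then be selected or rejected by comparing its probability against a chosen cutoff.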
Feedforward networks · CPC title
Supervised learning · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title
Probabilistic or stochastic networks · CPC title