What technology area does this patent fall under?

Primary CPC classification G10L15/26. Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 27, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Automatic summarization of financial earnings call transcripts

US11915701B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11915701-B2
Application number	US-202017070500-A
Country	US
Kind code	B2
Filing date	Oct 14, 2020
Priority date	Jun 5, 2019
Publication date	Feb 27, 2024
Grant date	Feb 27, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Computer-readable media, systems and methods may improve automatic summarization of transcripts of financial earnings calls. For example, a system may generate segments, such as by disambiguating sentences, from a transcript to be summarized. The system may use an estimator that assesses whether or not the segment should be included in the summary. Different types of estimators may be used. For example, the estimator may be rule-based, trained based on machine-learning techniques, or trained based on machine-learning with language modeling using natural language processing to fine-tune language models specific to financial earnings calls.
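
The flow described in the abstract (split the transcript into segments, score each segment with an estimator, keep the segments the estimator favors) can be sketched roughly as follows. This is a minimal illustration, not the patented method: the regex sentence splitter, the toy_estimator rule, its keyword list, and the 0.5 threshold are all assumptions made for the example.

```python
import re
from typing import Callable, List


def split_segments(transcript: str) -> List[str]:
    """Naive sentence splitter (assumption: a stand-in for sentence disambiguation)."""
    parts = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [p for p in parts if p]


def toy_estimator(segment: str) -> float:
    """Hypothetical rule-based estimator: rewards figures and finance keywords."""
    score = 0.0
    if re.search(r"\d", segment):
        score += 0.6
    if re.search(r"\b(revenue|earnings|margin|guidance)\b", segment, re.I):
        score += 0.4
    return score


def summarize(transcript: str,
              estimator: Callable[[str], float],
              threshold: float = 0.5) -> List[str]:
    """Keep, in original order, every segment whose score reaches the threshold."""
    return [s for s in split_segments(transcript) if estimator(s) >= threshold]


transcript = (
    "Good morning, everyone. Revenue grew 12% year over year. "
    "Thank you for joining us. We are raising full-year guidance."
)
summary = summarize(transcript, toy_estimator)
print(summary)
```

Any estimator with the same signature (a trained regressor, a binary classifier, or a fine-tuned language model) could be passed in place of toy_estimator without changing summarize.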

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system to automatically summarize a transcript, the computer system comprising: a processor programmed to: access the transcript, wherein the transcript is transcribed from audio having spoken words or phrases relating to a subject matter domain; generate a plurality of segments based on content of the transcript, each segment comprising a respective portion of the content; for each segment of the plurality of segments: provide the segment as input to a machine-learning (ML) estimator specifically trained based on a set of predefined features and labeled data from a gold standard corpus comprising a plurality of gold standard summaries that was generated from other transcripts of corresponding audio having respective spoken words or phrases relating to the subject matter domain, wherein for each gold standard summary of the plurality of gold standard summaries, a given segment from the corresponding transcript is added to the gold standard summary when a predefined number or percentage of annotators agreed that the given segment should be added to the gold standard summary; generate, as an output of the ML estimator, a segment score for the segment; identify a subset of the plurality of segments based on the segment score of each segment; and generate a summary of the transcript based on the subset of the plurality of segments.

2. The computer system of claim 1, wherein the ML-estimator comprises an ML regression estimator, and wherein the labeled data comprises a plurality of annotated segments from an annotation corpus of transcripts, each annotated segment being labeled to indicate a number of relevance votes assigned to the annotated segment from annotators, each relevance vote indicating that an annotator indicated that the segment is relevant to a corresponding transcript in the annotation corpus of transcripts.

3. The computer system of claim 2, wherein the processor is further programmed to: generate, based on learned relationships between each of the set of predefined features and the number of relevance votes assigned to the annotated segment, a regressive decision tree used by the ML regression estimator to generate the segment score for each segment of the transcript.

4. The computer system of claim 3, wherein to generate the segment score, the processor is further programmed to, for each segment of the plurality of segments: apply the regressive decision tree to each segment of the plurality of segments to generate the segment score from among a range of values greater than two.

5. The computer system of claim 1, wherein the ML-estimator comprises an ML binary classification estimator, and wherein the labeled data comprises a plurality of annotated segments from an annotation corpus of transcripts, each annotated segment being labeled with a binary label to indicate whether or not the annotated segment was determined to be relevant to a corresponding transcript in the annotation corpus of transcripts.

6. The computer system of claim 5, wherein the processor is further programmed to, for each feature of the set of predefined features: learn a respective weighted relationship between the feature and the binary label of each annotated segment.

7. The computer system of claim 6, wherein to generate the segment score, the processor is further programmed to, for each segment of the plurality of segments: apply each respective weighted relationship to each segment of the plurality of segments to generate the segment score; and classify each segment as relevant or not relevant based on the segment score, wherein only segments classified as relevant are identified for the subset of the plurality of segments.

8. A computer system to automatically summarize a transcript, the computer system comprising: a processor programmed to: generate a plurality of segments based on content of the transcript, each segment comprising a respective portion of the content, wherein the transcript is transcribed from audio having spoken words or phrases relating to a subject matter domain; for each segment of the plurality of segments: provide the segment to a machine-learning natural language processing (ML-NLP) estimator, the ML-NLP estimator being specifically pre-trained on a general corpus to learn a general language model and then fine-tuned on a gold standard corpus comprising a plurality of gold standard summaries of transcripts that was generated from other transcripts of corresponding audio having respective spoken words or phrases relating to the subject matter domain, the gold standard corpus of transcripts comprising a plurality of annotated segments from an annotation corpus of transcripts, each annotated segment being labeled to indicate a number of relevance votes from annotators, each relevance vote indicating that an annotator indicated that the segment is relevant to a corresponding transcript in the annotation corpus of transcripts, wherein for each gold standard summary of the plurality of gold standard summaries, a given segment from the corresponding transcript is added to the gold standard summary when a predefined number or percentage of annotators agreed that the given segment should be added to the gold standard summary; generate, as an output of the ML-NLP estimator, a segment score; identify a subset of the plurality of segments based on the segment score of each segment; and generate a summary of the transcript based on the subset of the plurality of segments.

9. The computer system of claim 8, wherein the processor is further programmed to: prior to fine-tuning, further pre-train the general language model based on a pre-training transcript corpus to learn a domain-specific language model that is transcript-specific.

10. The computer system of claim 8, wherein the ML-NLP estimator is trained without human-engineered features.

11. The computer system of claim 8, wherein the plurality of segments are each sentences, and wherein to generate the plurality of segments, the processor is further programmed to: disambiguate a plurality of sentences of the transcript.

12. The computer system of claim 8, wherein to generate the cumulative segment score, the processor is further programmed to, for each segment of the plurality of segments: generate a probability that the segment should be included in the summary.

13. The computer system of claim 8, wherein the general language model generates a classification token for each segment, and wherein to generate the cumulative segment score, the processor is further programmed to, for each segment: pass the classification token through a linear layer added to the general language model and a sigmoid function to generate the cumulative segment score as a probability.

14. The computer system of claim 13, wherein the general language model comprises a Bidirectional Encoder Representations from Transformers model.

15. A computer system to automatically summarize a transcript, the computer system comprising: a processor programmed to: access a plurality of heuristic rules that encode scoring that adds or subtracts points to process the transcript, the plurality of heuristic rules comprising at least a first heuristic rule and a second heuristic rule, wherein the transcript is transcribed from audio having spoken words or phrases relating to a subject matter domain; generate a plurality of segments based on content of the transcript, each segment comprising a respective portion of the content; for each segment of the plurality of segments: (i) evaluate the first heuristic rule from among the
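
Claim 1 builds each gold standard summary by annotator agreement, and claims 2 and 5 describe two ways to label annotated segments (a relevance vote count for regression, a binary label for classification). The sketch below shows one way this labeling could look. The function names, the example segments, the vote counts, and the min_votes value of 3 are hypothetical; the claim text only says "a predefined number or percentage of annotators".

```python
from typing import Dict, List, Tuple


def build_gold_summary(votes: Dict[str, int], min_votes: int) -> List[str]:
    """Keep each segment that at least min_votes annotators marked as relevant."""
    return [segment for segment, n in votes.items() if n >= min_votes]


def to_training_labels(votes: Dict[str, int],
                       min_votes: int) -> List[Tuple[str, int, int]]:
    """Return (segment, vote_count, binary_label) rows.

    vote_count is the kind of target a regression estimator could learn from;
    binary_label is the kind a binary classifier could learn from.
    """
    return [(seg, n, 1 if n >= min_votes else 0) for seg, n in votes.items()]


# Hypothetical annotation result: relevance votes from five annotators.
votes = {
    "Revenue grew 12%.": 4,
    "Thanks for joining.": 0,
    "We expect margin pressure.": 3,
    "Next question, please.": 1,
}

gold = build_gold_summary(votes, min_votes=3)
rows = to_training_labels(votes, min_votes=3)
print(gold)
print(rows)
```

A percentage threshold would work the same way after converting it to a vote count for the number of annotators on each transcript.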

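Claim 13 describes passing a classification token through a linear layer and a sigmoid function to produce the segment score as a probability. The arithmetic of that last step can be sketched in plain Python as below. The 3-dimensional vector, the weights, and the bias are made-up numbers for illustration only; in a real fine-tuned model the vector would come from the language model and the weights would be learned.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Map any real-valued logit to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def segment_probability(cls_vector: Sequence[float],
                        weights: Sequence[float],
                        bias: float) -> float:
    """Linear layer (dot product plus bias) followed by a sigmoid."""
    if len(cls_vector) != len(weights):
        raise ValueError("cls_vector and weights must have the same length")
    logit = sum(h * w for h, w in zip(cls_vector, weights)) + bias
    return sigmoid(logit)


# Hypothetical classification-token vector and linear-layer parameters.
cls_vector = [0.5, -1.0, 2.0]
weights = [1.0, 0.5, 0.25]
bias = -0.5

p = segment_probability(cls_vector, weights, bias)
print(p)
```

With these particular numbers the logit is 0.5 - 0.5 + 0.5 - 0.5 = 0, so the sigmoid returns 0.5; larger logits push the probability toward 1 and smaller ones toward 0.
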
Assignees

Refinitiv Us Organization Llc

Inventors

Classifications

G06N3/0499
Feedforward networks · CPC title
G06N3/09
Supervised learning · CPC title
G10L15/26 (Primary)
Speech to text systems (G10L15/08 takes precedence) · CPC title
G06F40/20
Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title
G06N3/047
Probabilistic or stochastic networks · CPC title

Patent family

Related publications grouped by family.

View patent family 74498593

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11915701B2 cover?: Computer-readable media, systems and methods may improve automatic summarization of transcripts of financial earnings calls. For example, a system may generate segments, such as by disambiguating sentences, from a transcript to be summarized. The system may use an estimator that assesses whether or not the segment should be included in the summary. Different types of estimators may be used. For…
Who is the assignee on this patent?: Refinitiv Us Organization Llc