Automated generation of fine-grained call reasons from customer service call transcripts
US-2022383867-A1 · Dec 1, 2022 · US
US12014144B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12014144-B2 |
| Application number | US-202117390573-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 30, 2021 |
| Priority date | Jul 30, 2021 |
| Publication date | Jun 18, 2024 |
| Grant date | Jun 18, 2024 |
Official abstract text for this publication.
A processor may receive a call transcript including text and form a text string including at least a portion of the text. The processor may generate a situation description of the call transcript, which may comprise processing the text string using a transformer-based machine learning model. The processor may generate a trouble description of the call transcript, which may comprise creating a sentence embedding of the situation description, creating sentence embeddings for a plurality of utterances within the portion of the text, determining respective similarities between the sentence embedding of the situation description and each of the sentence embeddings for each respective one of the plurality of utterances, and selecting at least one of the plurality of utterances having at least one highest determined respective similarity as the trouble description. The processor may store a call summary comprising the situation description and the trouble description in a non-transitory memory.
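The trouble-description step described in the abstract (embed the situation description, embed each utterance, compare, keep the closest utterance) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `embed` function is a toy bag-of-words stand-in for a real sentence-embedding model, and the names `select_trouble_description`, `top_k`, and the sample transcript are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for a sentence embedding (ASSUMPTION: a real system
    would use a learned sentence-embedding model, not word counts)."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_trouble_description(situation: str, utterances: list[str],
                               top_k: int = 1) -> list[str]:
    """Return the top_k utterances most similar to the situation description."""
    situation_vec = embed(situation)
    scored = [
        (cosine_similarity(situation_vec, embed(utterance)), index, utterance)
        for index, utterance in enumerate(utterances)
    ]
    # Highest similarity first; ties keep transcript order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [utterance for _, _, utterance in scored[:top_k]]


# Hypothetical inputs: the situation description would come from the
# transformer-based model; here it is hard-coded for illustration.
situation = "customer cannot log in to online account"
utterances = [
    "hi thanks for calling how can I help",
    "I cannot log in to my online account since yesterday",
    "okay let me pull up your profile",
]
trouble = select_trouble_description(situation, utterances)

# The call summary pairs the two descriptions, as in the abstract.
call_summary = {"situation": situation, "trouble": trouble}
```

Sorting on the pair (negative similarity, transcript index) makes the selection deterministic when two utterances score equally; the source does not specify a tie-breaking rule, so that choice is an assumption.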
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, by a processor, a call transcript including text; forming, by the processor, a text string including at least a portion of the text; generating, by the processor, a situation description of the call transcript, the generating of the situation description comprising processing the text string using a fine-tuned transformer-based machine learning model; generating, by the processor, a trouble description of the call transcript, the generating of the trouble description comprising: creating, using a sentence transformer algorithm, a sentence embedding of the situation description, creating, using the sentence transformer algorithm, sentence embeddings for a plurality of utterances within the portion of the text, determining respective similarities between the sentence embedding of the situation description and each of the sentence embeddings for each respective one of the plurality of utterances, and selecting at least one of the plurality of utterances having at least one highest determined respective similarity as the trouble description; and storing, by the processor, a call summary comprising the situation description and the trouble description in a non-transitory memory accessible to at least one call-handling system, wherein the fine-tuned transformer-based machine learning model is fine tuned by a process comprising: receiving the transformer-based machine learning model in a pretrained state wherein the transformer-based machine learning model has been previously trained with generic text, generating labeled call summaries by receiving unlabeled call summaries, selecting at least one of the unlabeled call summaries starting with a keyword or keyphrase, and selecting respective excerpts of each of the selected at least one of the unlabeled call summaries for inclusion within the labeled call summaries, and performing further training on the transformer-based machine learning model in the pretrained state using the labeled call summaries.
2. The method of claim 1, further comprising: generating, by the processor, the call transcript, the generating of the call transcript comprising: selecting a subset of an audio recording of a call as a reduced portion of the audio recording, and automatically transcribing only the reduced portion of the audio recording.
3. The method of claim 2, wherein the identifying comprises selecting a most recent portion of a predetermined length of the audio recording as the reduced portion or selecting an oldest portion of the predetermined length of the audio recording as the reduced portion.
4. The method of claim 1, wherein the forming of the text string comprises adding metadata to the text string, the metadata being related to a call from which the call transcript was taken.
5. The method of claim 1, wherein the determining of the respective similarities is performed using a pairwise cosine similarity function.
6. The method of claim 1, further comprising: receiving a second call; determining, by the processor, that the second call is related to the call transcript; and providing, by the processor, the call summary in a user interface.
7. A system comprising: a processor; and a non-transitory memory in communication with the processor, the non-transitory memory storing instructions that, when executed by the processor, cause the processor to perform processing comprising: receiving a call transcript including text; forming a text string including at least a portion of the text; generating a situation description of the call transcript, the generating of the situation description comprising processing the text string using a fine-tuned transformer-based machine learning model; generating a trouble description of the call transcript, the generating of the trouble description comprising: creating, using a sentence transformer algorithm, a sentence embedding of the situation description, creating, using the sentence transformer algorithm, sentence embeddings for a plurality of utterances within the portion of the text, determining respective similarities between the sentence embedding of the situation description and each of the sentence embeddings for each respective one of the plurality of utterances, and selecting at least one of the plurality of utterances having at least one highest determined respective similarity as the trouble description; and storing a call summary comprising the situation description and the trouble description in the non-transitory memory, wherein the fine-tuned transformer-based machine learning model is fine tuned by a process comprising: receiving the transformer-based machine learning model in a pretrained state wherein the transformer-based machine learning model has been previously trained with generic text, generating labeled call summaries by receiving unlabeled call summaries, selecting at least one of the unlabeled call summaries starting with a keyword or keyphrase, and selecting respective excerpts of each of the selected at least one of the unlabeled call summaries for inclusion within the labeled call summaries, and performing further training on the transformer-based machine learning model in the pretrained state using the labeled call summaries.
8. The system of claim 7, wherein the processing further comprises: generating the call transcript, the generating of the call transcript comprising: selecting a subset of an audio recording of a call as a reduced portion of the audio recording, and automatically transcribing only the reduced portion of the audio recording.
9. The system of claim 8, wherein the identifying comprises selecting a most recent portion of a predetermined length of the audio recording as the reduced portion or selecting an oldest portion of the predetermined length of the audio recording as the reduced portion.
10. The system of claim 7, wherein the forming of the text string comprises adding metadata to the text string, the metadata being related to a call from which the call transcript was taken.
11. The system of claim 7, wherein the determining of the respective similarities is performed using a pairwise cosine similarity function.
12. The system of claim 7, wherein the processing further comprises: receiving a second call; determining that the second call is related to the call transcript; and providing the call summary in a user interface.
13. A method comprising: receiving, at a processor, a transformer-based machine learning model in a pretrained state wherein the transformer-based machine learning model has been previously trained with generic text; generating, by the processor, labeled call summaries, the generating of the labeled call summaries comprising: receiving unlabeled call summaries, selecting at least one of the unlabeled call summaries starting with a keyword or keyphrase, and selecting respective excerpts of each of the selected at least one of the unlabeled call summaries for inclusion within the labeled call summaries; performing, by the processor, further training on the transformer-based machine learning model in the pretrained state using the labeled call summaries; receiving, at the processor, a call transcript including text; forming, by the processor, a text string including at least a portion of the text; generating, by the processor, a situation description of the call transcript, the generating of the situation description comprising processing the text string using the transformer-based machine learning model; and storing, by the processor, a call summary comprising the situation description i
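The label-generation step recited in the independent claims (keep only unlabeled call summaries that start with a keyword or keyphrase, then take an excerpt of each) might look roughly like the sketch below. The claims do not say which keyphrases are used or how an excerpt is chosen, so both are assumptions here: the keyphrase `"customer called"` is a made-up example, and the excerpt is taken to be the remainder of the first sentence after the keyphrase. The function name `build_labeled_summaries` is likewise hypothetical.

```python
import re


def build_labeled_summaries(unlabeled: list[str],
                            keyphrases: list[str]) -> list[str]:
    """Filter summaries by leading keyphrase and extract one excerpt each.

    ASSUMPTION: the excerpt rule (rest of the first sentence after the
    keyphrase) is illustrative only; the claims leave it unspecified.
    """
    labeled = []
    for summary in unlabeled:
        text = summary.strip()
        lowered = text.lower()
        # Find the first keyphrase that the summary starts with, if any.
        match = next(
            (k for k in keyphrases if lowered.startswith(k.lower())), None
        )
        if match is None:
            continue  # Summary does not start with a keyphrase: skip it.
        remainder = text[len(match):]
        # Keep text up to the first sentence-ending punctuation mark.
        excerpt = re.split(r"[.!?]", remainder, maxsplit=1)[0].strip()
        if excerpt:
            labeled.append(excerpt)
    return labeled


# Hypothetical agent-written summaries.
unlabeled = [
    "Customer called because payroll would not run. Advised to update.",
    "Reset password and closed ticket.",
    "customer called about a duplicate invoice charge",
]
labeled = build_labeled_summaries(unlabeled, ["customer called"])
```

In this sketch the resulting excerpts would then serve as targets for the further training of the pretrained model; the training loop itself is omitted because the claims do not name a specific framework.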
Matching criteria, e.g. proximity measures · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title
Centralised call answering arrangements requiring operator intervention {, e.g. call or contact centers for telemarketing} · CPC title
Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules · CPC title