Systems and methods for processing speech dialogues

US11862143B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11862143-B2
Application number: US-202016996961-A
Country: US
Kind code: B2
Filing date: Aug 19, 2020
Priority date: Jul 27, 2020
Publication date: Jan 2, 2024
Grant date: Jan 2, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure is related to systems and methods for processing speech dialogue. The method includes obtaining target speech dialogue data. The method includes obtaining a text vector representation sequence, a phonetic symbol vector representation sequence, and a role vector representation sequence by performing a vector transformation on the target speech dialogue data based on a text embedding model, a phonetic symbol embedding model, and a role embedding model, respectively. The method includes determining a representation vector corresponding to the target speech dialogue data by inputting the text vector representation sequence, the phonetic symbol vector representation sequence, and the role vector representation sequence into a trained speech dialogue coding model. The method includes determining a summary of the target speech dialogue data by inputting the representation vector into a classification model.
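
The data flow in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy lookup-table embeddings, the sum-and-mean-pool stand-in for the trained speech dialogue coding model, the random linear classifier, the sample tokens, and the candidate summary labels are all hypothetical.

```python
import math
import random

DIM = 8  # hypothetical embedding width


def make_embedding(vocab, seed):
    """Toy lookup table: each symbol maps to a fixed random vector."""
    rng = random.Random(seed)
    return {sym: [rng.uniform(-1, 1) for _ in range(DIM)] for sym in vocab}


def embed(table, symbols):
    """Turn a sequence of symbols into a vector representation sequence."""
    return [table[s] for s in symbols]


def encode(text_seq, phon_seq, role_seq):
    """Stand-in for the coding model (an assumption, not the patented model):
    add the three streams position by position, then mean-pool over positions."""
    summed = [
        [t + p + r for t, p, r in zip(tv, pv, rv)]
        for tv, pv, rv in zip(text_seq, phon_seq, role_seq)
    ]
    return [sum(col) / len(summed) for col in zip(*summed)]


def classify(rep, weights):
    """Linear layer plus softmax; returns the best label index and all probabilities."""
    logits = [sum(w * x for w, x in zip(row, rep)) for row in weights]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs


# Hypothetical dialogue: one word, one phonetic symbol, and one role per position.
words = ["where", "are", "you", "at", "the", "gate"]
phones = ["W", "AA", "Y", "AE", "DH", "G"]
roles = ["driver", "driver", "driver", "passenger", "passenger", "passenger"]

text_seq = embed(make_embedding(set(words), seed=1), words)
phon_seq = embed(make_embedding(set(phones), seed=2), phones)
role_seq = embed(make_embedding(set(roles), seed=3), roles)

rep = encode(text_seq, phon_seq, role_seq)

labels = ["pickup location", "fare dispute", "route change"]  # hypothetical
rng = random.Random(4)
weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in labels]

label_index, probs = classify(rep, weights)
summary = labels[label_index]
```

Because the weights are random rather than trained, the chosen label carries no meaning; the sketch only shows the order of operations: three embedding lookups, one encoder call that yields a single representation vector, and one classifier call that maps that vector to a summary.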

First claim

Opening claim text (preview).

We claim:

1. A method for processing speech dialogue implemented on a computing device having at least one processor and at least one storage device, the method comprising: obtaining target speech dialogue data; obtaining a text vector representation sequence, a phonetic symbol vector representation sequence, and a role vector representation sequence by performing a vector transformation on the target speech dialogue data based on a text embedding model, a phonetic symbol embedding model, and a role embedding model, respectively; determining a representation vector corresponding to the target speech dialogue data by inputting the text vector representation sequence, the phonetic symbol vector representation sequence, and the role vector representation sequence into a trained speech dialogue coding model; determining a summary of the target speech dialogue data by inputting the representation vector into a classification model; and generating an output utilizing the determined summary of the target speech dialogue data.

2. The method of claim 1, further comprising: obtaining a sentence text of the summary of the target speech dialogue data; and performing a grammatical correction operation on the sentence text.

3. The method of claim 1, wherein the text embedding model includes at least one of: a word embedding sub-model configured to determine a word vector representation sequence of the target speech dialogue data; a position embedding sub-model configured to determine a position vector representation sequence of the target speech dialogue data; and a paragraph embedding sub-model configured to determine a paragraph vector representation sequence of the target speech dialogue data.

4. The method of claim 1, wherein the determining the representation vector corresponding to the target speech dialogue data includes: obtaining at least one of a dialect vector representation sequence, an emotion vector representation sequence, or a background text vector representation sequence corresponding to the target speech dialogue data, wherein the dialect vector representation sequence is determined by performing a vector transformation on the target speech dialogue data based on a dialect embedding model; the emotion vector representation sequence is determined by performing a vector transformation on the target speech dialogue data based on an emotion embedding model; and the background text vector representation sequence is determined by performing a vector transformation on a background text of the target speech dialogue data based on a background text embedding model; and determining the representation vector corresponding to the target speech dialogue data by inputting the text vector representation sequence, the phonetic symbol vector representation sequence, the role vector representation sequence, and at least one of the dialect vector representation sequence, the emotion vector representation sequence, or the background text vector representation sequence into the trained speech dialogue coding model.

5. The method of claim 1, wherein the speech dialogue coding model is determined according to a training process, the training process including: obtaining sample speech dialogue data; obtaining a text vector representation sequence, a phonetic symbol vector representation sequence, and a role vector representation sequence by performing a vector transformation on the sample speech dialogue data based on a text embedding model, a phonetic symbol embedding model, and a role embedding model, respectively; and obtaining a pre-trained speech dialogue coding model by pre-training the speech dialogue coding model in a self-supervised learning manner based on the text vector representation sequence, the phonetic symbol vector representation sequence, and the role vector representation sequence.

6. The method of claim 5, wherein the training process further includes: jointly pre-training the speech dialogue coding model and at least one of the text embedding model, the phonetic symbol embedding model, or the role embedding model.

7. The method of claim 5, wherein the pre-training the speech dialogue coding model in the self-supervised learning manner includes: designating at least portion of at least one of the text vector representation sequence, the phonetic symbol vector representation sequence, or the role vector representation sequence as an annotation, the annotation including at least portion of elements in the role vector representation sequence.

8. The method of claim 7, wherein the annotation further includes one or more keywords in the text vector representation sequence.

9. The method of claim 7, wherein the annotation further includes an order of sentences embodied in the text vector representation sequence.

10. The method of claim 5, wherein the obtaining the pre-trained speech dialogue coding model by pre-training the speech dialogue coding model includes: obtaining at least one of a dialect vector representation sequence, an emotion vector representation sequence, or a background text vector representation sequence corresponding to the sample speech dialogue data, wherein the dialect vector representation sequence is determined by performing a vector transformation on the sample speech dialogue data based on a dialect embedding model; the emotion vector representation sequence is determined by performing a vector transformation on the sample speech dialogue data based on an emotion embedding model; and the background text vector representation sequence is determined by performing a vector transformation on a background text of the sample speech dialogue data based on a background text embedding model; and obtaining the pre-trained speech dialogue coding model by pre-training the speech dialogue coding model in the self-supervised learning manner based on the text vector representation sequence, the phonetic symbol vector representation sequence, the role vector representation sequence, and at least one of the dialect vector representation sequence, the emotion vector representation sequence, or the background text vector representation sequence.

11. A system for processing speech dialogue, comprising: at least one storage device including a set of instructions; and at least one processor in communication with the at least one storage device, wherein when executing the set of instructions, the at least one processor is directed to cause the system to: obtain target speech dialogue data; obtain a text vector representation sequence, a phonetic symbol vector representation sequence, and a role vector representation sequence by performing a vector transformation on the target speech dialogue data based on a text embedding model, a phonetic symbol embedding model, and a role embedding model, respectively; determine a representation vector corresponding to the target speech dialogue data by inputting the text vector representation sequence, the phonetic symbol vector representation sequence, and the role vector representation sequence into a trained speech dialogue coding model; determine a summary of the target speech dialogue data by inputting the representation vector into a classification model; and generate an output utilizing the determined summary of the target speech dialogue data.

12. The system of claim 11, wherein the at least one processor is further directed to cause the system to: obtain a sentence text of the summary of the target speech dialogue data; and perform a grammatical correction operation on the sentence text.

13. The system of claim 11, wherein the text embedding model includ
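
Claims 5 and 7 describe self-supervised pre-training in which part of the role sequence is designated as the annotation. One common way to realize this idea is masked prediction, sketched below. This is an assumption for illustration only: the claims operate on vector representation sequences, whereas this sketch masks plain role labels for readability, and the `MASK` token, the `mask_roles` helper, and the 30% masking ratio are all hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token


def mask_roles(roles, ratio=0.3, seed=0):
    """Hide a portion of the role labels and keep the hidden values as the annotation.

    Returns the masked sequence (model input) and a dict mapping each masked
    position to its original role (the self-supervised training target).
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(roles) * ratio))
    positions = sorted(rng.sample(range(len(roles)), n_mask))
    masked = list(roles)
    annotation = {}
    for i in positions:
        annotation[i] = masked[i]
        masked[i] = MASK
    return masked, annotation


roles = ["driver", "driver", "passenger", "passenger", "driver", "passenger"]
masked, annotation = mask_roles(roles)

# A model trained on (masked, annotation) pairs would learn to predict the
# hidden roles from context; restoring them here only checks the bookkeeping.
restored = [annotation.get(i, r) for i, r in enumerate(masked)]
```

No manual labels are needed in this setup because the targets come from the data itself. Under the same assumptions, the keyword annotation of claim 8 and the sentence-order annotation of claim 9 could be built the same way, by hiding keywords or shuffling sentences and keeping the originals as targets.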

Assignees

Beijing Didi Infinity Technology & Dev Co Ltd

Inventors

Classifications

  • G10L15/063 · Primary

    Training · CPC title

  • Classification techniques · CPC title

  • Grammatical analysis; Style critique · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11862143B2 cover?
The present disclosure is related to systems and methods for processing speech dialogue. The method includes obtaining target speech dialogue data. The method includes obtaining a text vector representation sequence, a phonetic symbol vector representation sequence, and a role vector representation sequence by performing a vector transformation on the target speech dialogue data based on a text…
Who is the assignee on this patent?
Beijing Didi Infinity Technology & Dev Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 2, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).