Systems and methods to improve trust in conversations with deep learning models

US12236944B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12236944-B2
Application number: US-202217826515-A
Country: US
Kind code: B2
Filing date: May 27, 2022
Priority date: May 27, 2022
Publication date: Feb 25, 2025
Grant date: Feb 25, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to a system, a method, and a product for using deep learning models to quantify and/or improve trust in conversations. The system includes a non-transitory memory storing instructions executable to construct a deep-learning network to quantify trust scores; and a processor in communication with the non-transitory memory. The processor executes the instructions to cause the system to: obtain a trust score for each voice sample in a plurality of audio samples, generate a predicated trust score by the deep-learning network based on each voice sample in the plurality of audio samples, wherein the deep-learning network comprises a plurality of branches and an aggregation network configured to aggregate results from the plurality of branches, and train the deep-learning network based on the predicated trust score and the trust score for each voice sample to obtain a training result.
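The training flow in the abstract (several branches, an aggregation step, and training against a labeled trust score) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the three stand-in branch functions, the feature names in each sample, the single linear layer used as the aggregation network, the squared-error loss, and the hyperparameters.

```python
import random

# Hypothetical stand-ins for the three branches. The patent's branches are
# neural sub-networks; these simply read pre-computed numbers from a sample.
def text_branch(sample):
    return [sample["words_per_sec"], sample["filler_ratio"]]

def context_branch(sample):
    return [sample["turn_index"] / 10.0]

def voice_branch(sample):
    return [sample["pitch_var"], sample["energy"]]

BRANCHES = (text_branch, context_branch, voice_branch)

def branch_features(sample):
    """Run every branch and concatenate the results into one vector."""
    feats = []
    for branch in BRANCHES:
        feats.extend(branch(sample))
    return feats

def predict(weights, bias, feats):
    """Aggregation step (assumed): one linear layer over the concatenation."""
    return sum(w * x for w, x in zip(weights, feats)) + bias

def mse(weights, bias, samples, labels):
    """Mean squared error between predicted and labeled trust scores."""
    errs = [predict(weights, bias, branch_features(s)) - y
            for s, y in zip(samples, labels)]
    return sum(e * e for e in errs) / len(errs)

def train(samples, labels, epochs=500, lr=0.05, seed=0):
    """Plain stochastic gradient descent on 0.5 * error**2 (assumed loss)."""
    rng = random.Random(seed)
    n = len(branch_features(samples[0]))
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    bias = 0.0
    for _ in range(epochs):
        for sample, target in zip(samples, labels):
            feats = branch_features(sample)
            err = predict(weights, bias, feats) - target
            weights = [w - lr * err * x for w, x in zip(weights, feats)]
            bias -= lr * err
    return weights, bias

# Toy data: made-up feature values and made-up trust labels.
samples = [
    {"words_per_sec": 0.4, "filler_ratio": 0.1, "turn_index": 1,
     "pitch_var": 0.2, "energy": 0.7},
    {"words_per_sec": 0.9, "filler_ratio": 0.6, "turn_index": 3,
     "pitch_var": 0.8, "energy": 0.3},
    {"words_per_sec": 0.5, "filler_ratio": 0.2, "turn_index": 2,
     "pitch_var": 0.3, "energy": 0.6},
]
labels = [0.9, 0.2, 0.7]

untrained = train(samples, labels, epochs=0)
trained = train(samples, labels)
print("MSE before:", mse(*untrained, samples, labels))
print("MSE after: ", mse(*trained, samples, labels))
```

The point of the sketch is only the data flow: per-sample labels, per-branch features, one aggregated prediction, and a parameter update driven by the gap between prediction and label.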

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a non-transitory memory storing instructions executable to construct a deep learning network to quantify trust scores; and a processor circuitry in communication with the non-transitory memory, wherein, the processor circuitry executes the instructions to cause the system to: obtain a trust score for each voice sample in a plurality of audio samples, generate a predicated trust score by the deep-learning network based on each voice sample in the plurality of audio samples, wherein the predicted trust score is generated by an aggregation network in the deep-learning network based on a set of text-related features, a set of context related features, and a set of voice related features using a plurality of branches of the deep-learning network; wherein the deep-learning network comprises the plurality of branches and the aggregation network configured to aggregate results from the plurality of branches, and train the deep-learning network based on the predicated trust score and the trust score for each voice sample to obtain a training result.

2. The system according to claim 1, wherein: the plurality of branches in the deep-learning network comprises at least three branches; and the instructions to cause the system to generate the predicated trust score by the deep-learning network based on each voice sample in the plurality of audio samples, comprises instructions to cause the system to: generate the set of text-related features by a first branch in the deep learning network based on each voice sample in the plurality of audio samples; generate the set of context-related features by a second branch in the deep learning network based on each voice sample in the plurality of audio samples; and generate the set of voice-related features by a third branch in the deep learning network based on each voice sample in the plurality of audio samples.

3. The system according to claim 2, wherein the instructions to cause the system to generate the predicated trust score by the aggregation network in the deep-learning network based on the set of text-related features, the set of context-related features, and the set of voice-related features, comprises instructions to cause the system to: concatenate the set of text-related features, the set of context-related features, and the set of voice-related features to obtain a set of concatenated features; and generate the predicated trust score by a fully connected layer in the aggregation network based on the set of concatenated features.

4. The system according to claim 2, wherein the instructions to cause the system to generate the predicated trust score by the aggregation network in the deep-learning network based on the set of text-related features, the set of context-related features, and the set of voice-related features, comprises instructions to cause the system to: concatenate the set of text-related features and the set of context-related features to obtain a set of concatenated features; perform a co-attention process for learning pairwise attentions between the set of concatenated features and the set of voice-related features to obtain a set of attended features by a co-attention layer in the aggregation network; and generate the predicated trust score by a fully connected layer in the aggregation network based on the set of attended features.

5. The system according to claim 2, wherein the instructions to cause the system to generate the predicated trust score by the aggregation network in the deep-learning network based on the set of text-related features, the set of context-related features, and the set of voice-related features, comprises instructions to cause the system to: perform a first co-attention process on the set of context-related features and the set of voice-related features to obtain a set of first attended features by a first co attention layer in the aggregation network; perform a second co-attention process on the set of text-related features and the set of first attended features to obtain a set of second attended features by a second co attention layer in the aggregation network; and generate the predicated trust score by a fully connected layer in the aggregation network based on the set of second attended features.

6. The system according to claim 1, wherein: the plurality of branches in the deep-learning network comprises: a first branch in the deep-learning network comprises a speech encoder, a connectionist temporal classification (CTC) decoder, a bidirectional long-short term memory (Bi-LSTM) layer, and a self-attention layer; a second branch in the deep-learning network comprises the speech encoder, a Bi-LSTM layer, and a self-attention layer; and a third branch in the deep-learning network comprises a Mel-frequency cepstral coefficients (MFCC) layer, a polling layer, a Bi-LSTM layer, and a self-attention layer.

7. The system according to claim 1, wherein: the system comprises a machine-learning network; and the processor circuitry executes the instructions to further cause the system to: generate a set of trust components for a user by the deep-learning network, and generate a list of recommended features for the user by the machine learning network based on the set of trust components and a user profile of the user, wherein the list of recommended features comprises features from a recommendation library.

8. A method comprising: obtaining, by a computing device, a trust score for each voice sample in a plurality of audio samples, the computing device comprising a memory storing instructions executable to construct a deep-learning network to quantify trust scores and a processor circuitry in communication with the memory; generating, by the computing device, a predicated trust score by the deep learning network based on each voice sample in the plurality of audio samples, wherein the predicted trust score is generated by an aggregation network in the deep-learning network based on a set of text-related features, a set of context related features, and a set of voice related features generated by a plurality of branches of the deep-learning network, wherein the deep-learning network comprises the plurality of branches and the aggregation network configured to aggregate results from the plurality of branches; and training, by the computing device, the deep-learning network based on the predicated trust score and the trust score for each voice sample to obtain a training result.

9. The method according to claim 8, wherein: the plurality of branches in the deep-learning network comprises three or more branches; and the generating the predicated trust score by the deep-learning network based on each voice sample in the plurality of audio samples comprises: generating the set of text-related features by a first branch in the deep learning network based on each voice sample in the plurality of audio samples; generating the set of context-related features by a second branch in the deep-learning network based on each voice sample in the plurality of audio samples; and generating the set of voice-related features by a third branch in the deep learning network based on each voice sample in the plurality of audio samples.

10. The method according to claim 9, wherein the generating the predicated trust score by the aggregation network in the deep-learning network based on the set of text related features comprises: concatenating the set of text-related features, the set of context-related features, and the set of voice-related features to obtain a set of concatenated features; and generating the predicated trust score by a fully connected layer in the ag
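Claims 4 and 5 describe aggregation through co-attention rather than plain concatenation. The sketch below shows one way such a data flow could look. The claim preview does not specify the attention formula, so the parameter-free dot-product attention, the mean pooling, the linear output layer, the hypothetical `FC_WEIGHTS` and `FC_BIAS` values, and the choice to concatenate feature sets along the sequence axis are all assumptions made for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(queries, keys):
    """For each query vector, return a softmax-weighted average of the keys."""
    out = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        out.append([sum(w * k[i] for w, k in zip(weights, keys))
                    for i in range(len(keys[0]))])
    return out

def co_attention(a, b):
    """Pairwise attention in both directions (assumed formulation).

    Returns a set of attended vectors: each vector of `a` re-expressed in
    terms of `b`, followed by each vector of `b` re-expressed in terms of `a`.
    """
    return attend(a, b) + attend(b, a)

def mean_pool(vectors):
    """Average a set of vectors into a single vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hypothetical, untrained parameters of the final fully connected layer.
FC_WEIGHTS = [0.6, 0.4]
FC_BIAS = 0.0

def fully_connected(vec):
    return dot(FC_WEIGHTS, vec) + FC_BIAS

def score_single_coattention(text, context, voice):
    """Claim 4 style: concatenate text and context, co-attend with voice."""
    attended = co_attention(text + context, voice)
    return fully_connected(mean_pool(attended))

def score_chained_coattention(text, context, voice):
    """Claim 5 style: co-attend context with voice, then text with the result."""
    first = co_attention(context, voice)
    second = co_attention(text, first)
    return fully_connected(mean_pool(second))

# Toy feature sets: each is a short sequence of 2-dimensional vectors.
text = [[1.0, 0.0], [0.0, 1.0]]
context = [[0.5, 0.5]]
voice = [[1.0, 1.0], [0.0, 0.0], [0.2, 0.8]]

print(score_single_coattention(text, context, voice))
print(score_chained_coattention(text, context, voice))
```

In a trained system the attention would normally include learned projection matrices and the output layer would be fitted to the labeled trust scores; both are left out here to keep the data flow visible.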

Assignees

Accenture Global Solutions Ltd

Inventors

Classifications

  • the extracted parameters being the cepstrum · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Training · CPC title

  • Parsing for meaning understanding · CPC title

  • G10L15/16 · Primary

    using artificial neural networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12236944B2 cover?
The present disclosure relates to a system, a method, and a product for using deep learning models to quantify and/or improve trust in conversations. The system includes a non-transitory memory storing instructions executable to construct a deep-learning network to quantify trust scores; and a processor in communication with the non-transitory memory. The processor executes the instructions to …
Who is the assignee on this patent?
Accenture Global Solutions Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 25, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).