Machine learning system for digital assistants
US-12067006-B2 · Aug 20, 2024 · US
US12412562B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12412562-B2 |
| Application number | US-202217732944-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 29, 2022 |
| Priority date | Apr 29, 2022 |
| Publication date | Sep 9, 2025 |
| Grant date | Sep 9, 2025 |
Official abstract text for this publication.
The present disclosure relates to a system, a method, and a product for using machine learning models to quantify and/or improve trust in conversations. The system includes a non-transitory memory; and a processor in communication with the non-transitory memory. The processor executes the instructions to cause the system to: obtain a set of vocal features and a set of text features for each sample in audio samples; obtain a trust score for each sample; perform a preprocess to obtain a set of input features for each sample; determine a type of machine-learning algorithm for the machine-learning network; tune a set of hyper parameters for the machine-learning network; generate a predicated trust score by the machine-learning network with the sets of input features for each sample; and train the machine-learning network based on the predicated trust score and the trust score for each sample to obtain the training result.
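The abstract's steps of choosing an algorithm type from a training result and comparing predicted against annotated trust scores can be illustrated with a small sketch. It assumes the training result is a mean absolute error (as claim 9 states) and that the comparison is done with k-fold cross validation. The two candidate models, the fold scheme, the function names, and the data are all hypothetical stand-ins chosen to keep the example dependency-free; they are not the gradient boosting or random forest regressors named in the claims.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between annotated and predicted trust scores."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def fit_mean_baseline(xs, ys):
    """Stand-in model 1 (assumption): always predict the mean training score."""
    avg = mean(ys)
    return lambda x: avg


def fit_simple_linear(xs, ys):
    """Stand-in model 2 (assumption): one-feature least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = cov / var if var else 0.0
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def cross_validated_mae(fit, xs, ys, k=3):
    """Hold out every k-th sample in turn and average the per-fold MAE."""
    fold_errors = []
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [model(xs[i]) for i in test_idx]
        fold_errors.append(mean_absolute_error([ys[i] for i in test_idx], preds))
    return mean(fold_errors)


def select_algorithm(candidates, xs, ys, k=3):
    """Return the candidate name with the lowest cross-validated MAE."""
    scores = {name: cross_validated_mae(fit, xs, ys, k)
              for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Made-up data: one input feature per sample and its annotated trust score.
features = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
trust_scores = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]

CANDIDATES = {"mean_baseline": fit_mean_baseline,
              "simple_linear": fit_simple_linear}
best_name, cv_scores = select_algorithm(CANDIDATES, features, trust_scores)
print(best_name, cv_scores)
```

In a fuller implementation the candidate dictionary would presumably hold the regressor types listed in claim 7, with the same selection loop applied to each.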
Opening claim text (preview).
What is claimed is:
1. A system comprising: a non-transitory memory storing instructions executable to construct a machine learning network to quantify a trust score and to automate trust delivery with a digital avatar by generating a trustworthy voice for the digital avatar; and a processor in communication with the non-transitory memory, wherein, the processor executes the instructions to cause the system to: obtain a set of vocal features and a set of text features for each sample in audio samples; obtain a trust score for each sample; perform a preprocess on the set of vocal features and the set of text features to obtain a set of input features for each sample; determine a type of machine-learning algorithm for the machine-learning network based on a training result of the machine-learning network; tune a set of hyper parameters for the machine-learning network based on a cross validation according to the machine-learning network; generate a predicated trust score by the machine-learning network with the sets of input features for each sample; train the machine-learning network based on the predicated trust score and the trust score for each sample to obtain the training result; generate a set of trust components for a user by the machine-learning network; concatenate the set of trust components with a user profile of the user to obtain an expanded user profile; train a second machine-learning network by input the expanded user profile to recommend features for improving trust scores; and generate a list of recommended features for the user by the trained second machine learning network based on the expanded user profile, wherein generating the trustworthy voice for the digital avatar comprises: receiving an input text and a reference trustworthy tone sample; collecting a sequence of phonemes and a Mel spectrogram from the input text using a text to speech module; encoding the Mel spectrogram with an input encoder to generate an input embedding; encoding the reference trustworthy tone sample with a trust encoder and concatenating with the input embedding to generate an output; processing the output of the concatenation through a location-sensitive attention layer using cumulative attention weights to generate an encoded input sequence; predicting a Mel spectrogram with a decoder from the encoded input sequence; and generating the trustworthy voice for the digital avatar from the Mel spectrogram using a vocoder, wherein the digital avatar is configured to replace the user in a conversation.
2. The system according to claim 1, wherein the instructions to cause the system to obtain the set of vocal features and the set of text features for each sample in the audio samples, comprises instructions to cause the system to: input each sample in the audio samples into a transcribe analytics module to obtain a transcribed result for each sample; input the transcribed result for each sample into a vocal feature module to obtain the set of vocal features for each sample; and input the transcribed result for each sample into a text feature module to obtain the set of text features for each sample.
3. The system according to claim 1, wherein the instructions to cause the system to obtain the trust score for each sample, comprises the instructions to cause the system to: obtain a set of scores based on human annotation for each sample, each score in the set of scores corresponding to a variable in a pre-defined trust calculation function; and calculate the trust score for each sample based on the pre-defined trust calculation function with the set of scores.
4. The system according to claim 3, wherein: the pre-defined trust calculation function comprises a plurality of variables comprising a credibility variable, a reliability variable, an intimacy variable, and a self-orientation variable.
5. The system according to claim 4, wherein: the pre-defined trust calculation function comprises one of the following: a summation of the credibility variable, the reliability variable, and the intimacy variable, the summation being divided by the self-orientation variable; the summation of the credibility variable, the reliability variable, and the intimacy variable, the summation being subtracted by the self-orientation variable; or the summation of the credibility variable, the reliability variable, and the intimacy variable, the summation being subtracted by three times the self-orientation variable.
6. The system according to claim 1, wherein the instructions to cause the system to perform the preprocess on the set of vocal features and the set of text features to obtain the set of input features for each sample, comprises the instructions to cause the system to: remove correlated features from the set of vocal features and the set of text features to obtain a reduced set of vocal features and a reduced set of text features; and encode categorical variables based on the reduced set of vocal features and the reduced set of text features to obtain the set of input features.
7. The system according to claim 1, wherein the type of machine-learning algorithm comprises one of a gradient boosting, a random forest, a ridge with principal component analysis (PCA), or a linear regression.
8. The system according to claim 1, wherein the set of hyper parameters comprises at least one of a learning rate, a minimum sample split, a minimum sample leaf, or a number of estimators.
9. The system according to claim 1, wherein: the training result comprises a mean absolute error (MAE) between the predicated trust score and the trust score for each sample; and the instructions to cause the system to train the machine-learning network based on the predicated trust score and the trust score for each sample, comprises the instructions to cause the system to: train the machine-learning network based on the predicated trust score and the trust score for each sample to minimize the MAE based on a gradient boosting regressor.
10. The system according to claim 1, wherein: the list of recommended features comprises features from a recommendation library; the second machine-learning network comprises a softmax module for training and a nearest neighbor index module to generate a recommendation probability for each feature in the recommendation library; and the list of recommended features comprises top N features with highest recommendation probability, N being a positive integer.
11. A method comprising: obtaining, by a computing device comprising a memory storing instructions executable to construct a machine-learning network to quantify a trust score and to automate trust delivery with a digital avatar by generating a trustworthy voice for the digital avatar and a processor in communication with the memory, a set of vocal features and a set of text features for each sample in audio samples; obtaining, by the computing device, a trust score for each sample; performing, by the computing device, a preprocess on the set of vocal features and the set of text features to obtain a set of input features for each sample; determining, by the computing device, a type of machine-learning algorithm for the machine-learning network based on a training result of the machine-learning network; tuning, by the computing device, a set of hyper parameters for the machine learning network based on a cross validation according to the machine-learning network; generating, by the computing device, a predicated trust score by the machine learning network with the sets of input features for each sample; training, by the computing device, the machine-learning network based on the predicated trus
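The three alternative trust calculation functions recited in claim 5 reduce to simple arithmetic over the four annotated variables of claim 4. The sketch below shows one way they might be written. The class name, the variant labels (`"ratio"`, `"difference"`, `"weighted_difference"`), the zero check, and the example ratings are assumptions made for illustration; the claim text does not specify a rating scale or how a zero self-orientation value is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustAnnotation:
    """Human-annotated scores for one audio sample (variables from claim 4)."""
    credibility: float
    reliability: float
    intimacy: float
    self_orientation: float


def trust_score(a: TrustAnnotation, variant: str = "ratio") -> float:
    """Compute a trust score using one of the three forms listed in claim 5.

    The variant labels are hypothetical names for this sketch only.
    """
    total = a.credibility + a.reliability + a.intimacy
    if variant == "ratio":
        # (C + R + I) / S; guarding against S == 0 is an added assumption.
        if a.self_orientation == 0:
            raise ValueError("self_orientation must be non-zero for 'ratio'")
        return total / a.self_orientation
    if variant == "difference":
        # (C + R + I) - S
        return total - a.self_orientation
    if variant == "weighted_difference":
        # (C + R + I) - 3 * S
        return total - 3 * a.self_orientation
    raise ValueError(f"unknown variant: {variant!r}")


# Made-up ratings for a single sample.
sample = TrustAnnotation(credibility=4.0, reliability=3.0,
                         intimacy=5.0, self_orientation=3.0)
for name in ("ratio", "difference", "weighted_difference"):
    print(name, trust_score(sample, name))
```

All three forms raise the score with credibility, reliability, and intimacy and lower it with self-orientation; they differ only in how strongly self-orientation is penalized.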
using artificial neural networks · CPC title
driven by audio data · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title
the extracted parameters being spectral information of each sub-band · CPC title
Machine learning · CPC title