System to detect and reduce understanding bias in intelligent virtual assistants

US11854532B2 · US · B2

Patent metadata
Publication number: US-11854532-B2
Application number: US-202217567493-A
Country: US
Kind code: B2
Filing date: Jan 3, 2022
Priority date: Oct 30, 2018
Publication date: Dec 26, 2023
Grant date: Dec 26, 2023

Abstract

Disclosed is a system and method for detecting and addressing bias in training data prior to building language models based on the training data. Accordingly, the system and method detect bias in training data for Intelligent Virtual Assistant (IVA) understanding and highlight any bias found. Suggestions for reducing or eliminating the bias may be provided. This detection may be done for each model within the Natural Language Understanding (NLU) component. For example, the language model, as well as any sentiment or other metadata models used by the NLU, can introduce understanding bias. For each model deployed, training data is automatically analyzed for bias and corrections are suggested.
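The per-model scan described in the abstract can be pictured with a minimal Python sketch. It assumes each training example is a dict mapping class labels (for instance a hypothetical `intent` or `sentiment` label) to values, and that every label has a (min_share, max_share) pair of representation thresholds. The names `scan_for_bias`, `scan_models` and `BiasFinding`, the data format, and the two-sided threshold rule are illustrative assumptions, not the patent's disclosed implementation. The sketch only reports findings with suggested corrections and leaves the data unchanged.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class BiasFinding:
    label: str       # class label, e.g. "intent" (hypothetical)
    value: str       # label value whose share crossed a threshold
    share: float     # fraction of labelled examples carrying this value
    kind: str        # "over" or "under"
    suggestion: str  # human-readable correction hint


def scan_for_bias(examples, thresholds):
    """Compare each label value's share of the data with that label's
    (min_share, max_share) thresholds (an assumed rule) and return the
    values that fall outside them."""
    findings = []
    for label, (min_share, max_share) in thresholds.items():
        counts = Counter(ex[label] for ex in examples if label in ex)
        total = sum(counts.values())
        if total == 0:
            continue  # no example carries this label
        for value, n in sorted(counts.items()):
            share = n / total
            if share > max_share:
                findings.append(BiasFinding(label, value, share, "over",
                    f"remove some '{value}' examples or add other {label} values"))
            elif share < min_share:
                findings.append(BiasFinding(label, value, share, "under",
                    f"add more '{value}' examples"))
    return findings


def scan_models(training_sets, thresholds):
    """Scan each NLU model's training data separately.

    training_sets: model name -> list of examples
    thresholds:    model name -> {label: (min_share, max_share)}
    Returns model name -> list of BiasFinding."""
    return {model: scan_for_bias(data, thresholds.get(model, {}))
            for model, data in training_sets.items()}
```

For example, with 7 `billing`, 2 `cancel` and 1 `upgrade` intent examples and thresholds of (0.15, 0.6), this sketch flags `billing` as overrepresented and `upgrade` as underrepresented. A value that never occurs in the data has no count and is not flagged here; checking against a known list of expected values would be needed to catch that case.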

Claims

What is claimed is:

1. A non-transitory computer readable medium comprising instructions that, when executed by a processor of a processing system, cause the processing system to: digitally process training data for a language model from among multi-class training data to identify if the training data comprises a class population bias by comparing a distribution of each given value for a plurality of given class labels to a representation threshold value associated with the class label; adjust the training data to compensate for the bias identified upon determination that the training data comprises class population bias, based upon the given values by adding examples of an underrepresented class or removing examples of an overrepresented class; and randomly select a percentage of samples of training data of the overrepresented class for removal.

2. The non-transitory computer readable medium of claim 1, wherein the instructions further cause the processing system to refer detected bias to a human reviewer for further determination of bias.

3. The non-transitory computer readable medium of claim 1, wherein the determination includes one of deeming the class population bias being artificial requiring repair and deeming the class population bias being accurate allowing disregarding.

4. The non-transitory computer readable medium of claim 1, wherein instructions further cause the processing system to digitally process the training data by scanning the training data with a bias scoring system.

5. The non-transitory computer readable medium of claim 1, wherein instructions further cause the processing system to adjust the training data to compensate for the bias identified by deleting examples of class label combinations for entry values above a predetermined threshold until normalized entries of all class values are below the predetermined threshold.

6. The non-transitory computer readable medium of claim 1, wherein the removal of the percentage of samples is automatic without human intervention.

7. The non-transitory computer readable medium of claim 1, wherein instructions further cause the processing system to report identified bias to a user before compensating for the identified bias.

8. A method of automatically detecting bias in training data for training a language model, comprising: digitally processing training data for a language model from among multi-class training data to identify if the training data comprises a class population bias by comparing a distribution of each given value for a plurality of given class labels to a representation threshold value associated with the class label; adjusting the training data to compensate for the bias identified upon determining that the training data comprises class population bias, based upon the given values by adding examples of an underrepresented class or removing examples of an overrepresented class; and randomly selecting a percentage of samples of training data of the overrepresented class for removal.

9. The method of claim 8, further comprising referring detected bias to a human reviewer for further determination of bias.

10. The method of claim 8, wherein the determination includes one of deeming the class population bias being artificial requiring repair and deeming the class population bias being accurate allowing disregarding.

11. The method of claim 8, wherein digitally processing the training data comprises scanning the training data with a bias scoring system.

12. The method of claim 8, further comprising adjusting the training data to compensate for the bias identified by deleting examples of class label combinations for entry values above the predetermined threshold until the normalized entries of all class values are below the predetermined threshold.

13. The method of claim 8, wherein the removing of the percentage of samples is automatic without human intervention.

14. The method of claim 8, the method further comprising reporting identified bias to a user before compensating for the identified bias.

15. A system for automatically detecting bias in training data for training a language model, comprising: a memory comprising computer readable instructions; and a processor configured to execute the computer readable instructions, that cause the system to: digitally process training data for a language model from among multi-class training data to identify if the training data comprises a class population bias by comparing a distribution of each given value for a plurality of given class labels to a representation threshold value associated with the class label; adjust the training data to compensate for the bias identified upon determination that the training data comprises class population bias, based upon the given values by adding examples of an underrepresented class or removing examples of an overrepresented class; and randomly select a percentage of samples of training data of the overrepresented class for removal.

16. The system of claim 15, wherein the instructions further cause the system to refer detected bias to a human reviewer for further determination of bias.

17. The system of claim 15, wherein the determination includes one of deeming the class population bias being artificial requiring repair and deeming the class population bias being accurate allowing disregarding.

18. The system of claim 15, wherein the instructions further cause the system to digitally process the training data by scanning the training data with a bias scoring system.

19. The system of claim 15, wherein the instructions further cause the system to adjust the training data to compensate for the bias identified by deleting examples of class label combinations for entry values above the predetermined threshold until the normalized entries of all class values are below the predetermined threshold.

20. The system of claim 15, wherein the removal of the percentage of samples is automatic without human intervention.

21. The system of claim 15, wherein the instructions further cause the system to report identified bias to a user before compensating for the identified bias.
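The compensation step in claim 1 (randomly selecting a percentage of an overrepresented class's samples for removal, or adding examples of an underrepresented class) can likewise be sketched in Python. Repeating the random removal in rounds until the value's share drops to a threshold loosely mirrors the delete-until-below-threshold language of claim 5. The function names `undersample` and `oversample`, the `pct`, `max_share` and `min_share` parameters, and the dict-per-example format are hypothetical. Duplicating existing examples is a swapped-in stand-in for collecting genuinely new ones, which the claims do not specify.

```python
import random


def undersample(examples, label, value, max_share, pct=0.1, seed=0):
    """Randomly drop `pct` of the examples whose `label` equals `value`,
    round by round, until that value's share of the data is at or below
    `max_share`. Returns a new list; the input is not modified.
    (Hypothetical helper, not the patent's implementation.)"""
    if not 0 < pct <= 1:
        raise ValueError("pct must be in (0, 1]")
    rng = random.Random(seed)
    data = list(examples)
    while True:
        # Indices of the examples that carry the overrepresented value.
        idx = [i for i, ex in enumerate(data) if ex.get(label) == value]
        if not idx or len(idx) / len(data) <= max_share:
            return data
        # Remove at least one example per round so the loop always progresses.
        k = max(1, int(len(idx) * pct))
        drop = set(rng.sample(idx, k))
        data = [ex for i, ex in enumerate(data) if i not in drop]


def oversample(examples, label, value, min_share, seed=0):
    """Append copies of randomly chosen examples whose `label` equals `value`
    until its share reaches `min_share`. Duplication is an assumed stand-in
    for adding new examples."""
    if not 0 <= min_share < 1:
        raise ValueError("min_share must be in [0, 1)")
    rng = random.Random(seed)
    data = list(examples)
    pool = [ex for ex in data if ex.get(label) == value]
    if not pool:
        return data  # nothing to copy; new examples must be sourced elsewhere
    count = len(pool)
    while count / len(data) < min_share:
        data.append(dict(rng.choice(pool)))
        count += 1
    return data
```

As a usage example, `oversample(undersample(data, "intent", "billing", max_share=0.6), "intent", "upgrade", min_share=0.15)` rebalances the toy intent data from the detection sketch. Because neither function mutates its input, the findings and the proposed dataset can be shown to a user or human reviewer before the original data is replaced, in the spirit of claims 2 and 7.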

Assignees

Verint Americas Inc

Classifications

  • Supervised learning

  • G10L15/063 (primary): Training

  • Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)

  • Natural language analysis (semantic analysis of natural language G06F40/30)

  • Using natural language modelling

Frequently asked questions

What does patent US11854532B2 cover?
It covers detecting class population bias in the training data for the language model, and for any sentiment or other metadata models, used by the Natural Language Understanding (NLU) component of an Intelligent Virtual Assistant (IVA), and adjusting that data by adding examples of an underrepresented class or randomly removing a percentage of the examples of an overrepresented class.
Who is the assignee on this patent?
Verint Americas Inc
What technology area does this patent fall under?
The primary CPC classification is G10L15/063 (Training). Mapped technology areas include Physics.
When was this patent published?
It was published on December 26, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).