Systems and methods for refining pre-trained language models with improved gender fairness

US12073178B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12073178-B2
Application number  US-202217586504-A
Country             US
Kind code           B2
Filing date         Jan 27, 2022
Priority date       Oct 5, 2021
Publication date    Aug 27, 2024
Grant date          Aug 27, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are directed to a training framework for reducing gender bias in a pre-trained language model. To reduce gender bias a gender neutral dataset is generated. Next, parameters of the pre-trained language model are frozen and do not change during a subsequent training phase. As all the pre-trained parameters are frozen, forgetting of information from the original training data is minimized. New parameters are added to the language model. The new parameters may be associated with gender related terms, such as profession names. In a subsequent training phase the new parameters of the language model are trained using a gender neutral dataset.
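The freeze-then-extend idea in the abstract can be illustrated with a toy example. The following is a minimal pure-Python sketch, not the patent's implementation: the function names `build_embedding_matrix` and `sgd_step`, the two-dimensional embeddings, the initialization range, and the constant placeholder gradients are all assumptions made for illustration. It shows an embedding matrix that combines frozen pre-trained rows with randomly initialized new rows, and an update step that changes only the new rows.

```python
import random


def build_embedding_matrix(frozen_rows, num_new, dim, seed=0):
    """Stack frozen pre-trained rows with randomly initialized new rows.

    Returns the matrix and a parallel list of flags marking which rows
    are trainable (only the newly added rows are).
    """
    rng = random.Random(seed)
    new_rows = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_new)]
    matrix = [list(row) for row in frozen_rows] + new_rows
    trainable = [False] * len(frozen_rows) + [True] * num_new
    return matrix, trainable


def sgd_step(matrix, trainable, grads, lr=0.1):
    """Apply one gradient step to trainable rows; frozen rows are skipped."""
    for row, is_trainable, grad in zip(matrix, trainable, grads):
        if not is_trainable:
            continue
        for j, g in enumerate(grad):
            row[j] -= lr * g


# Hypothetical pre-trained embeddings for two existing tokens.
frozen = [[0.5, -0.2], [0.1, 0.3]]
matrix, trainable = build_embedding_matrix(frozen, num_new=2, dim=2)
before_new = [list(row) for row in matrix[2:]]

# Placeholder gradients; a real model would compute these by
# backpropagation on the gender-neutral dataset.
grads = [[1.0, 1.0] for _ in matrix]
sgd_step(matrix, trainable, grads)
```

After the step, the first two rows still equal the original pre-trained values, while the two added rows have moved. Because no pre-trained value is ever written to, information learned during the original training cannot be overwritten by the second training phase.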

First claim

Opening claim text (preview).

What is claimed is:
1. A method for training a neural network language model to a generate gender-neutral output, the method comprising: obtaining a gender-neutral dataset for training the neural network language model that has been previously trained to generate output for a text input; freezing parameters of the neural network language model, wherein values of the parameters were determined during the previous training of the neural network language model and wherein the values of the parameters do not change after the parameters are frozen; adding new parameters to the neural network language model, the new parameters associated with gender related terms; and training the neural network language model using the gender-neutral dataset, wherein the training modifies values of the new parameters and not the values of the frozen parameters, and wherein the trained neural network language model generates the gender-neutral output for the text input.
2. The method of claim 1, wherein obtaining the gender-neutral dataset further comprises: identifying a set of sentences containing at least one gender-related term from a training dataset; and swapping, in the identified set of sentences, the at least one gender-related term with an opposite gender term.
3. The method of claim 1, wherein obtaining the gender-neutral dataset further comprises: identifying a set of sentences containing at least one gender-related name term from a training dataset; and replacing, in the identified set of sentences, the at least one gender-related name term with an anonymized term.
4. The method of claim 1, wherein the neural network language model includes at least one self-attention layer and at least one feed-forward neural network, and wherein the frozen parameters are in the at least one self-attention layer and the at least one feed-forward neural network.
5. The method of claim 1, wherein the new parameters are added to an embedding layer of the neural network language model.
6. The method of claim 1, wherein the values of the new parameters are randomly initialized.
7. The method of claim 1, further comprising: generating an embedding matrix, the embedding matrix including a portion of the frozen parameters and the new parameters; and updating the new parameters and not the frozen parameters in the embedding matrix during training the neural network language model using the gender-neutral dataset.
8. The method of claim 1, further comprising: receiving the text input at the trained neural network language model; and generating, using the new parameters and frozen parameters of the neural network language model, the gender-neutral output.
9. A system for training a neural network language model to generate gender neutral output, the system comprising: a memory configured to store the neural network language model; and a processor coupled to the memory and configured to execute instructions for training the neural network language model, the instructions comprising: obtaining a gender-neutral dataset for training the neural network language model that has been previously trained to generate output for a text input; freezing parameters of the neural network language model, wherein values of the parameters were determined during previous training of the neural network language model and wherein the values of the parameters do not change after the parameters are frozen; adding new parameters to the neural network language model; and training the neural network language model using the gender-neutral dataset, wherein the training modifies values of the new parameters and not the values of the frozen parameters, and wherein the trained neural network language model generates the gender-neutral output for the text input.
10. The system of claim 9, wherein to obtain the gender-neutral dataset, the instructions further comprise: identifying a set of sentences containing at least one gender-related term from a training dataset; and swapping, in the identified set of sentences, the gender-related term with an opposite gender term.
11. The system of claim 9, wherein to obtain the gender-neutral dataset, the instructions further comprise: identifying a set of sentences containing at least one gender-related name term from a training dataset; and replacing, in the identified set of sentences, the at least one gender-related name term with an anonymized term.
12. The system of claim 9, wherein the neural network language model includes at least one self-attention layer and at least one feed-forward neural network and wherein the frozen parameters are in the at least one self-attention layer and the at least one feed-forward neural network.
13. The system of claim 9, wherein the new parameters are added to an embedding layer of the neural network language model.
14. The system of claim 9, wherein the values of the new parameters are randomly initialized.
15. The system of claim 9, wherein the instructions further comprise: generating an embedding matrix, the embedding matrix including a portion of the frozen parameters and the new parameters; and updating the new parameters and not the frozen parameters in the embedding matrix during training the neural network language model using the gender-neutral dataset.
16. The system of claim 9, wherein the instructions further comprise: receiving the text input at the trained neural network language model; and generating, using the new parameters and frozen parameters of the trained neural network language model, the gender-neutral output.
17. A non-transitory computer readable medium having instructions stored thereon, that when executed by a processor cause the processor to perform operations, the operations comprising: obtaining a gender-neutral dataset for training a neural network language model that has been previously trained to generate output for a text input; freezing parameters of the neural network language model, wherein values of the parameters were determined during previous training of the neural network language model and wherein the values of the parameters do not change after the parameters are frozen; adding new parameters to the neural network language model; and training the neural network language model using the gender-neutral dataset, wherein the training modifies values of the new parameters and not the values of the frozen parameters, and wherein the trained neural network language model generates a gender-neutral output for the text input.
18. The non-transitory computer readable medium of claim 17, wherein the neural network language model includes at least one self-attention layer and at least one feed-forward neural network, and wherein the frozen parameters are in the at least one self-attention layer or the at least one feed-forward neural network.
19. The non-transitory computer readable medium of claim 17, wherein the new parameters are added to an embedding layer of the neural network language model and the values of the new parameters are randomly initialized.
20. The non-transitory computer readable medium of claim 19, wherein the operations further comprise: generating an embedding matrix, the embedding matrix including a portion of the frozen parameters and the new parameters; and updating the new parameters and not the portion of the frozen parameters in the embedding matrix when training the neural network language model with the gender-neutral dataset.
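The dataset-construction steps recited in claims 2 and 3 (swapping gender-related terms and anonymizing gender-related names) might look roughly like the sketch below. Everything concrete here is an assumption for illustration: the tiny word lists `GENDER_PAIRS` and `NAMES`, the `[NAME]` placeholder, the whole-word regex matching, and the function names are hypothetical and are not taken from the patent text shown on this page.

```python
import re

# Hypothetical, deliberately tiny vocabularies for illustration only.
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}
NAMES = {"john", "mary"}
ANON_TOKEN = "[NAME]"  # assumed placeholder for an anonymized term

_WORD = re.compile(r"[A-Za-z]+")


def contains_term(sentence, vocab):
    """Return True if any whole word in the sentence is in vocab."""
    return any(word.lower() in vocab for word in _WORD.findall(sentence))


def swap_gender_terms(sentence):
    """Replace each listed gender term with its opposite, keeping capitalization."""
    def repl(match):
        word = match.group(0)
        other = GENDER_PAIRS.get(word.lower())
        if other is None:
            return word
        return other.capitalize() if word[0].isupper() else other
    return _WORD.sub(repl, sentence)


def anonymize_names(sentence):
    """Replace each listed name with the anonymized placeholder."""
    def repl(match):
        word = match.group(0)
        return ANON_TOKEN if word.lower() in NAMES else word
    return _WORD.sub(repl, sentence)


def augment(sentences):
    """Identify sentences with gender terms or names and transform them."""
    swapped = [swap_gender_terms(s) for s in sentences if contains_term(s, GENDER_PAIRS)]
    anonymized = [anonymize_names(s) for s in sentences if contains_term(s, NAMES)]
    return swapped, anonymized


corpus = [
    "He is a doctor and she is a nurse.",
    "Mary thanked John.",
    "The doctor arrived.",
]
swapped, anonymized = augment(corpus)
```

Matching on whole words avoids rewriting substrings (for example, the "he" inside "The"). A realistic word list would also need to handle ambiguous forms such as "her", which this sketch deliberately leaves out.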

Assignees

Salesforce Com Inc, Salesforce Inc

Inventors

Classifications

  • Combinations of networks

  • Natural language generation

  • Discourse or dialogue representation

  • Learning methods

  • Supervised learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12073178B2 cover?
Embodiments are directed to a training framework for reducing gender bias in a pre-trained language model. To reduce gender bias a gender neutral dataset is generated. Next, parameters of the pre-trained language model are frozen and do not change during a subsequent training phase. As all the pre-trained parameters are frozen, forgetting of information from the original training data is minimi…
Who is the assignee on this patent?
Salesforce Com Inc, Salesforce Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/279. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 27, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).