Using machine learning to flag gender biased words within free-form text, such as job descriptions

US10242260B1 · US · B1

Patent metadata
Publication number: US-10242260-B1
Application number: US-201715801982-A
Country: US
Kind code: B1
Filing date: Nov 2, 2017
Priority date: Nov 2, 2017
Publication date: Mar 26, 2019
Grant date: Mar 26, 2019

Abstract

Official abstract text for this publication.

Under one aspect, first user input including free-form text is received in a first graphical user interface (GUI). A classification engine of the computer system incorporating a machine learning model classifies words of the free-form text into a male-biased class, a female-biased class, or a neutral class. At least one of the words is classified into the male-biased class or the female-biased class. At least one of the words classified into the male-biased class or the female-biased class is flagged in the first GUI. Second user input is received in the first GUI including at least one revision to at least one of the words of the free-form text classified into the male-biased class or the female-biased class responsive to the flagging. The revised free-form text is posted to a web site for display in a second GUI.
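
The classify-and-flag flow in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical fixed lexicon in place of the trained machine learning model, and bracket markers in place of GUI highlighting; the names `LEXICON`, `classify_word`, and `flag_text` and the example words are illustrative assumptions, not taken from the patent.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon standing in for a trained model; the entries are
# illustrative assumptions, not labels taken from the patent.
LEXICON: Dict[str, str] = {
    "dominant": "male-biased",
    "competitive": "male-biased",
    "nurturing": "female-biased",
    "supportive": "female-biased",
}


def classify_word(word: str) -> str:
    """Return 'male-biased', 'female-biased', or 'neutral' for one word."""
    return LEXICON.get(word.lower(), "neutral")


def flag_text(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Wrap biased words in bracket markers and list them with their class."""
    flagged: List[Tuple[str, str]] = []

    def mark(match):
        # Called once per word, left to right, by re.sub.
        word = match.group(0)
        label = classify_word(word)
        if label == "neutral":
            return word
        flagged.append((word, label))
        tag = "M" if label == "male-biased" else "F"
        return f"[{tag}:{word}]"

    return re.sub(r"[A-Za-z']+", mark, text), flagged


marked, flags = flag_text("We want a competitive and supportive engineer.")
print(marked)  # We want a [M:competitive] and [F:supportive] engineer.
print(flags)
```

In a real interface the markers would presumably be replaced by highlighting, color, emphasis, or font changes, and the author would revise the flagged words before the text is posted.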

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, in a first graphical user interface (GUI) forming part of an end-user layer of a computer system, first user input comprising free-form text; respectively classifying, by the computer system incorporating a machine learning model, words of the free-form text into a male-biased class, a female-biased class, or a neutral class, at least one of the words being classified into the male-biased class or the female-biased class; flagging, in the first GUI, at least one of the words classified into the male-biased class or the female-biased class; receiving, in the first GUI, second user input comprising at least one revision to at least one of the words of the free-form text classified into the male-biased class or the female-biased class responsive to the flagging; and posting, by the computer system, the revised free-form text to a web site for display in a second GUI forming part of the end-user layer of the computer system; wherein: the computer system is trained using a corpus of words; the words of the corpus are respectively labeled as male-biased, female-biased, or neutral using operations comprising: generating, from a plurality of job descriptions, a first subset of the job descriptions that comprises data regarding the respective genders of applicants to that job, the jobs being within job families; generating, for each job description of the first subset, a gender ratio based on the data regarding the respective genders of applicants to that job; selecting, from each job family, a job description that has the highest gender ratio and a job description that has the lowest gender ratio; generating a second subset of the job descriptions comprising the selected job descriptions of the job families that have the highest gender ratios; generating a third subset of the job descriptions comprising the selected job descriptions of the job families that have the lowest gender ratios; comparing frequencies of words of the job descriptions of the second and third subsets of the job descriptions; and based on the comparing: labeling at least one word of the job descriptions of the second subset of the job descriptions as being one of male-biased and female-biased, and labeling at least one word of the job descriptions of the third set of the job descriptions as being the other of male-biased and female-biased.

2. The method of claim 1, further comprising: generating, by the computer system, a gender bias score for the free-form text based on the classification of the words of the free-form text; and displaying, in the first GUI, the gender bias score.

3. The method of claim 2, further comprising: respectively classifying each word of the revised free-form text into the male-biased class, the female-biased class, or the neutral class, at least one of the words being classified into the male-biased class or the female-biased class; generating a revised gender bias score for the revised free-form text based on the classification of the words of the revised free-form text; and displaying, in the first GUI, the revised gender bias score.

4. The method of claim 1, wherein the flagging comprises highlighting the word, changing a color of the word, changing an emphasis of the word, or changing a font of the word.

5. The method of claim 1, wherein the flagging comprises displaying the word in an area of the first GUI that is separate from the free-form text.

6. A computer system comprising: at least one data processor; and memory storing instructions which, when executed by the at least one data processor, result in operations comprising: receiving, in a first graphical user interface (GUI) forming part of an end-user layer of the computer system, first user input comprising free-form text; respectively classifying, by the computer system incorporating a machine learning model, words of the free-form text into a male-biased class, a female-biased class, or a neutral class, at least one of the words being classified into the male-biased class or the female-biased class; flagging, in the first GUI, at least one of the words classified into the male-biased class or the female-biased class; receiving, in the first GUI, second user input comprising at least one revision to at least one of the words of the free-form text classified into the male-biased class or the female-biased class responsive to the flagging; and posting, by the computer system, the revised free-form text to a web site for display in a second GUI forming part of the end-user layer of the computer system; wherein: the computer system is trained using a corpus of words; the words of the corpus are respectively labeled as male-biased, female-biased, or neutral using operations comprising: generating, from a plurality of job descriptions, a first subset of the job descriptions that comprises data regarding the respective genders of applicants to that job, the jobs being within job families; generating, for each job description of the first subset, a gender ratio based on the data regarding the respective genders of applicants to that job; selecting, from each job family, a job description that has the highest gender ratio and a job description that has the lowest gender ratio; generating a second subset of the job descriptions comprising the selected job descriptions of the job families that have the highest gender ratios; generating a third subset of the job descriptions comprising the selected job descriptions of the job families that have the lowest gender ratios; comparing frequencies of words of the job descriptions of the second and third subsets of the job descriptions; and based on the comparing: labeling at least one word of the job descriptions of the second subset of the job descriptions as being one of male-biased and female-biased, and labeling at least one word of the job descriptions of the third set of the job descriptions as being the other of male-biased and female-biased.

7. The computer system of claim 6, the memory further storing instructions which, when executed by the at least one data processor, result in operations comprising: generating, by the computer system, a gender bias score for the free-form text based on the classification of the words of the free-form text; and displaying, in the first GUI, the gender bias score.

8. The computer system of claim 7, the memory further storing instructions which, when executed by the at least one data processor, result in operations comprising: respectively classifying each word of the revised free-form text into the male-biased class, the female-biased class, or the neutral class, at least one of the words being classified into the male-biased class or the female-biased class; generating a revised gender bias score for the revised free-form text based on the classification of the words of the revised free-form text; and displaying, in the first GUI, the revised gender bias score.

9. The computer system of claim 6, wherein the flagging comprises highlighting the word, changing a color of the word, changing an emphasis of the word, or changing a font of the word.

10. The computer system of claim 6, wherein the flagging comprises displaying the word in an area of the first GUI that is separate from the free-form text.

11. A non-transitory computer-readable medium storing instructions which, when executed by at least one data processor of a computer system, result in operations comprising: receiving, in a graphical user interface (GUI) forming part of an end-user layer of the computer system, first user input comprising free-form text; re
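
The corpus-labeling operations recited in claims 1 and 6 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the gender ratio is assumed to be the share of male applicants, raw word counts stand in for frequencies, a high ratio is assumed to map to the male-biased label, and `JobDescription`, `label_corpus`, and `min_gap` are hypothetical names. The sample job descriptions are made up for the example.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class JobDescription:
    family: str
    text: str
    male_applicants: Optional[int] = None    # None means no gender data
    female_applicants: Optional[int] = None


def gender_ratio(jd: JobDescription) -> float:
    # Assumed definition: share of male applicants among all applicants.
    return jd.male_applicants / (jd.male_applicants + jd.female_applicants)


def word_counts(docs: List[JobDescription]) -> Counter:
    counts: Counter = Counter()
    for jd in docs:
        counts.update(re.findall(r"[a-z']+", jd.text.lower()))
    return counts


def label_corpus(jobs: List[JobDescription], min_gap: int = 1) -> Dict[str, str]:
    # First subset: descriptions that carry applicant gender data.
    first = [j for j in jobs
             if j.male_applicants is not None
             and j.female_applicants is not None
             and j.male_applicants + j.female_applicants > 0]
    by_family: Dict[str, List[JobDescription]] = {}
    for jd in first:
        by_family.setdefault(jd.family, []).append(jd)
    # Second and third subsets: the highest- and lowest-ratio description
    # selected from each job family.
    second = [max(group, key=gender_ratio) for group in by_family.values()]
    third = [min(group, key=gender_ratio) for group in by_family.values()]
    high, low = word_counts(second), word_counts(third)
    # Compare counts between the two subsets and label each word.
    labels: Dict[str, str] = {}
    for word in set(high) | set(low):
        gap = high[word] - low[word]
        if gap >= min_gap:
            labels[word] = "male-biased"
        elif gap <= -min_gap:
            labels[word] = "female-biased"
        else:
            labels[word] = "neutral"
    return labels


# Made-up sample data for illustration only.
jobs = [
    JobDescription("engineering", "competitive driven engineer", 80, 20),
    JobDescription("engineering", "supportive collaborative engineer", 40, 60),
    JobDescription("sales", "competitive sales lead", 70, 30),
    JobDescription("sales", "supportive sales lead", 30, 70),
    JobDescription("sales", "sales intern"),  # no gender data, so excluded
]
labels = label_corpus(jobs)
print(sorted(labels.items()))
```

The labeled words would then serve as training data for the machine learning model; the sketch stops at the labeling step and does not train a classifier.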

Assignees

SAP SE

Classifications

  • characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling

  • Multiple classes

  • Parsing

  • Semantic analysis

  • Form filling; Merging

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10242260B1 cover?
Under one aspect, first user input including free-form text is received in a first graphical user interface (GUI). A classification engine of the computer system incorporating a machine learning model classifies words of the free-form text into a male-biased class, a female-biased class, or a neutral class. At least one of the words is classified into the male-biased class or the female-biased …
Who is the assignee on this patent?
SAP SE
What technology area does this patent fall under?
Primary CPC classification G06F16/353. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 26, 2019 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).