Techniques for improving standardized data accuracy

US12229669B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12229669-B2
Application number: US-202117340607-A
Country: US
Kind code: B2
Filing date: Jun 7, 2021
Priority date: Jun 7, 2021
Publication date: Feb 18, 2025
Grant date: Feb 18, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection; read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein is a technique for mapping the raw text of a job title of an online job posting to an entity embedding, associated with an entity or entry of a title taxonomy. The raw text of the job title is first encoded to generate a multilingual word embedding in a multilingual word embedding space. Then, the vector representation of the job title, as represented in the multilingual word embedding space is translated, using a neural network, to a vector representation of the job title in the entity embedding space. Finally, a nearest neighbor search is performed to identify an entity embedding associated with an entity or entry in the title taxonomy that has a vector representation that is closest in distance to the vector output by the neural network.
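The three steps in the abstract (encode, translate, nearest neighbor search) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the patented implementation: the 2-d vectors, the weights `W1`/`B1`/`W2`/`B2`, and the names `encode_title`, `translate`, `nearest_entity`, and `standardize` are all hypothetical, and averaging word vectors is just one simple encoding choice (it is the one described in claim 3).

```python
import math

# Hypothetical 2-d "multilingual" word embeddings; real ones would be pre-trained.
WORD_EMBEDDINGS = {
    "software": [1.0, 0.0],
    "engineer": [0.8, 0.2],
    "ingeniero": [0.8, 0.2],  # Spanish word placed at the same point as "engineer"
    "nurse": [0.0, 1.0],
}

# Hypothetical entity embeddings for entries of a title taxonomy.
ENTITY_EMBEDDINGS = {
    "Software Engineer": [2.0, 0.0],
    "Registered Nurse": [0.0, 2.0],
}

# Toy, hand-picked MLP weights: one ReLU hidden layer, then a linear output layer.
W1 = [[1.0, 0.0], [0.0, 1.0]]
B1 = [0.0, 0.0]
W2 = [[2.0, 0.0], [0.0, 2.0]]
B2 = [0.0, 0.0]


def encode_title(raw_title):
    """Step 1: average the embeddings of the known words in the raw title."""
    words = raw_title.lower().split()
    vectors = [WORD_EMBEDDINGS[w] for w in words if w in WORD_EMBEDDINGS]
    if not vectors:
        raise ValueError("no known words in title")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def linear(weights, bias, x):
    """Apply one dense layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def translate(x):
    """Step 2: map a word-space vector into the entity embedding space."""
    hidden = [max(0.0, h) for h in linear(W1, B1, x)]
    return linear(W2, B2, hidden)


def nearest_entity(vector):
    """Step 3: exact nearest neighbor search by Euclidean distance."""
    return min(ENTITY_EMBEDDINGS, key=lambda name: math.dist(vector, ENTITY_EMBEDDINGS[name]))


def standardize(raw_title):
    return nearest_entity(translate(encode_title(raw_title)))


print(standardize("Senior Software Engineer"))
print(standardize("ingeniero software"))
```

The search here is a linear scan over two entries; a production system with a large taxonomy would more plausibly use an approximate nearest neighbor index, as claim 4 contemplates.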

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: training a multilayer perceptron neural network using training data, wherein an instance of training data in the set of training data includes a first vector representation of a job title in a multilingual word embedding space and a second vector representation of an entity in an entity embedding space, wherein the job title corresponding with the first vector representation is a job title that corresponds with a job title associated with the entity represented by the second vector representation in the entity embedding space; providing as input to an input layer of the multilayer perceptron neural network a vector representation of a job title of an online job posting, the vector representation of the job title of the online job posting derived by mapping one or more words identified in raw text of the job title to one or more pre-trained multilingual word embeddings in the multilingual word embedding space, wherein a pre-trained multilingual word embedding comprises a vector representation of the job title expressed in multiple languages; with the multilayer perceptron neural network, processing the input to translate the vector representation of the job title of the online job posting in the multilingual word embedding space to a vector representation of the job title of the online job posting in the entity embedding space associated with a multilingual title taxonomy; performing a nearest neighbor search to identify one or more vector representations corresponding with one or more entity embeddings in the entity embedding space, each of the one or more entity embeddings associated with a job title from the multilingual title taxonomy; and storing with the online job posting at least one of the one or more vector representations corresponding with the entity embedding in the entity embedding space.

2. The computer-implemented method of claim 1, further comprising: prior to providing the vector representation of the job title of the online job posting to the input layer of the multilayer perceptron neural network, processing the raw text of the job title of the online job posting with an entity tagger model to identify the one or more words in the raw text of the job title as words that are representative of the job title.

3. The computer-implemented method of claim 2, further comprising: responsive to the entity tagger model identifying two or more words in the raw text of the job title as words that are representative of the job title, calculating the vector representation of the job title of the online job posting by deriving an average of the vector representations for each of the two or more words in the raw text of the job title that have been identified as representative of the job title.

4. The computer-implemented method of claim 1, wherein performing the nearest neighbor search to identify one or more vector representations corresponding with one or more entity embeddings in the entity embedding space comprises performing an approximate nearest neighbor search.

5. The computer-implemented method of claim 1, further comprising: subsequent to performing the nearest neighbor search to identify the plurality of vector representations: for each entity embedding corresponding with a vector representation in the plurality of vector representations, ranking the job title from the title taxonomy associated with the entity embedding; and storing with the online job posting each vector representation of the plurality of vector representations corresponding with the entity embeddings in the entity embedding space with a ranking score for the job title associated with vector representation.

6. The computer-implemented method of claim 5, wherein ranking each job title from the title taxonomy comprises: deriving the ranking score for each job title from the multilingual title taxonomy by providing as input to a machine learned model input features associated with the job title of the job posting and information associated with an entity in the multilingual title taxonomy corresponding with the entity embedding identified via the nearest neighbor search.

7. The computer-implemented method of claim 1, further comprising: obtaining a set of training data for training the multilayer perceptron neural network.

8. A system comprising: a processor; and a memory storage device storing instructions thereon, which, when executed by the processor, cause the system to: train a multilayer perceptron neural network using training data, wherein an instance of training data in the set of training data includes a first vector representation of a job title in a multilingual word embedding space and a second vector representation of an entity in an entity embedding space, wherein the job title corresponding with the first vector representation is a job title that corresponds with a job title associated with the entity represented by the second vector representation in the entity embedding space; provide as input to an input layer of the multilayer perceptron neural network a vector representation of a job title of an online job posting, the vector representation of the job title of the online job posting derived by mapping one or more words identified in raw text of the job title to one or more pre-trained multilingual word embeddings in the multilingual word embedding space, wherein a pre-trained multilingual word embedding comprises a vector representation of the job title expressed in multiple languages; with the multilayer perceptron neural network, process the input to translate the vector representation of the job title of the online job posting in the multilingual word embedding space to a vector representation of the job title of the online job posting in the entity embedding space associated with a multilingual title taxonomy; perform a nearest neighbor search to identify one or more vector representations corresponding with one or more entity embeddings in the entity embedding space, each of the one or more entity embeddings associated with a job title from the multilingual title taxonomy; and store with the online job posting at least one of the one or more vector representations corresponding with the entity embedding in the entity embedding space.

9. The system of claim 8, wherein the instructions, when executed by the processor, further cause the system to: process the raw text of the job title of the online job posting with an entity tagger model to identify the one or more words in the raw text of the job title as words that are representative of the job title, prior to providing the vector representation of the job title of the online job posting to the input layer of the multilayer perceptron neural network.

10. The system of claim 9, wherein the instructions, when executed by the processor, further cause the system to: calculate the vector representation of the job title of the online job posting by deriving an average of the vector representations for each of the two or more words in the raw text of the job title that have been identified as representative of the job title in response to the entity tagger model identifying two or more words in the raw text of the job title as words that are representative of the job title.

11. The system of claim 8, wherein the instructions, when executed by the processor, further cause the system to: perform the nearest neighbor search to identify one or more vector representations corresponding with one or more entity embeddings in the entity embedding space by performing an approximate nearest neighbor search.

12. The system of claim 8, wherein t…
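The training step in claims 1 and 8 pairs a title vector in the word embedding space with the vector of its matching taxonomy entity. A minimal sketch of fitting a one-hidden-layer perceptron to such pairs follows. Everything specific here is an assumption made for illustration: the claims do not state a loss function, optimizer, or layer sizes, so the mean squared error loss, plain stochastic gradient descent, the hyperparameters, and the names `train_mlp` and `TRAINING_PAIRS` are hypothetical.

```python
import random

# Hypothetical training pairs: (title vector in word space, entity vector in entity space).
TRAINING_PAIRS = [
    ([0.9, 0.1], [2.0, 0.0]),
    ([0.1, 0.9], [0.0, 2.0]),
]


def train_mlp(pairs, hidden_size=4, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer ReLU network with per-sample gradient descent.

    Returns (predict, initial_loss, final_loss). Squared error is an assumed
    loss; the patent text does not specify one.
    """
    rng = random.Random(seed)
    in_dim, out_dim = len(pairs[0][0]), len(pairs[0][1])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_size)]
    b1 = [0.0] * hidden_size
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(out_dim)]
    b2 = [0.0] * out_dim

    def forward(x):
        pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w1, b1)]
        hidden = [max(0.0, p) for p in pre]
        out = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
        return pre, hidden, out

    def mean_loss():
        total = 0.0
        for x, y in pairs:
            out = forward(x)[2]
            total += sum((o - t) ** 2 for o, t in zip(out, y))
        return total / len(pairs)

    initial_loss = mean_loss()
    for _ in range(epochs):
        for x, y in pairs:
            pre, hidden, out = forward(x)
            grad_out = [2.0 * (o - t) for o, t in zip(out, y)]
            # Backpropagate through the output layer before its weights change.
            grad_hidden = [
                sum(grad_out[k] * w2[k][j] for k in range(out_dim)) if pre[j] > 0 else 0.0
                for j in range(hidden_size)
            ]
            for k in range(out_dim):
                for j in range(hidden_size):
                    w2[k][j] -= lr * grad_out[k] * hidden[j]
                b2[k] -= lr * grad_out[k]
            for j in range(hidden_size):
                for i in range(in_dim):
                    w1[j][i] -= lr * grad_hidden[j] * x[i]
                b1[j] -= lr * grad_hidden[j]

    def predict(x):
        return forward(x)[2]

    return predict, initial_loss, mean_loss()


predict, initial_loss, final_loss = train_mlp(TRAINING_PAIRS)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Once trained, `predict` plays the role of the translation step: its output vector is what a nearest neighbor search would compare against the entity embeddings of the taxonomy.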

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Supervised learning · CPC title

  • Transfer learning · CPC title

  • Feedforward networks · CPC title

  • Distances to closest patterns, e.g. nearest neighbour classification · CPC title

  • by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12229669B2 cover?
Described herein is a technique for mapping the raw text of a job title of an online job posting to an entity embedding, associated with an entity or entry of a title taxonomy. The raw text of the job title is first encoded to generate a multilingual word embedding in a multilingual word embedding space. Then, the vector representation of the job title, as represented in the multilingual word e…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06Q10/1053. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 18, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).