What technology area does this patent fall under?

Primary CPC classification G06F40/30. Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 22, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Word embedding quality assessment through asymmetry

US12282485B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12282485-B2
Application number	US-202017005471-A
Country	US
Kind code	B2
Filing date	Aug 28, 2020
Priority date	Aug 28, 2020
Publication date	Apr 22, 2025
Grant date	Apr 22, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach to determine the quality of encodings assigned to a word by a word embedding model. The approach may include determining the asymmetry of two embeddings associated with two words from a word embedding model. The asymmetry of the two words from a preexisting evocation dataset may be determined. The asymmetry of the two embeddings may be compared to the asymmetry from the evocation dataset to generate an encoding quality score.
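The abstract does not spell out how "asymmetry" is computed. One plausible reading, stated here as an assumption rather than as the patent's own definition, treats the log asymmetric ratio of an ordered word pair as a difference of log conditional probabilities, estimated once from the embedding model and once from the evocation dataset. The symbols P_emb, P_evo, tau, and Q below are illustrative names, not terms taken from the publication.

```latex
\[
\mathrm{LAR}(a \to b) \;=\; \log P(b \mid a) \;-\; \log P(a \mid b)
\]
\[
Q \;=\; \tau\!\Big(
  \big\{\mathrm{LAR}_{\mathrm{emb}}(a_i \to b_i)\big\}_{i=1}^{n},\;
  \big\{\mathrm{LAR}_{\mathrm{evo}}(a_i \to b_i)\big\}_{i=1}^{n}
\Big)
\]
```

Under this reading, LAR is zero when the two directions are equally likely and changes sign when the pair is reversed, and Q is a rank correlation (for example, Kendall tau) between the embedding-side and evocation-side values over n word pairs.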

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method to determine the quality of word embeddings assigned by a word embedding model, the computer-implemented method comprising:
processing, by one or more processors, a corpus, based on a natural language preprocessing system, wherein processing comprises one or more of the following techniques: tokenization, part-of-speech tagging, semantic relationship identification, or syntactic relationship identification;
generating, by one or more processors, a word embedding for each word of a plurality of words in the processed corpus, based at least in part on a word embedding model with transformer neural network architecture, wherein generating the word embedding for each word of a plurality of words in the processed corpus further comprises assigning, by the one or more processors, a vector to each dimension within an n-dimensional space;
generating, by the one or more processors, a first asymmetry score for the embedding of a first word from the plurality of words in the processed corpus and the embedding of a second word from the plurality of words in the processed corpus by calculating a first log asymmetric ratio of the first word and the second word;
generating, by the one or more processors, a second asymmetry score that is a second log asymmetric ratio based, at least in part, on evocation data corresponding to the first word embedding and evocation data corresponding to the second word embedding wherein evocation data from evocation database;
comparing, by the one or more processors, the first asymmetry score to the second asymmetry score, based on one of the following, Kendall tau rank correlation coefficient, distance correlation, or polychoric correlation;
generating, by one or more processors, an embedding quality score that measures the quality of word representations of word embedding models through a degree of asymmetry between the first word and the second word based at least in part on the comparing of the first asymmetry score and the second asymmetry score.

2. The computer-implemented method of claim 1, further comprising:
receiving, by the one or more processors, evocation data for a first word and a second word from an evocation dataset, wherein evocation data comprises probability relationships between a plurality of words.

3. A computer system for determining the quality of word embeddings assigned by a word embedding model, the system comprising:
one or more computer processors;
one or more computer readable storage media;
computer program instructions to:
process a corpus, based on a natural language preprocessing system, wherein processing comprises one or more of the following techniques: tokenization, part-of-speech tagging, semantic relationship identification, or syntactic relationship identification;
generate a word embedding for each word of a plurality of words in the processed corpus, based at least in part on a word embedding model with transformer neural network architecture, wherein generating the word embedding for each word of a plurality of words in the processed corpus further comprises assigning, by the one or more processors, a vector to each dimension within an n-dimensional space;
generate a first asymmetry score for the embedding of a first word from the plurality of words in the processed corpus and the embedding of a second word from the plurality of words in the processed corpus by calculating a first log asymmetric ratio of the first word and the second word;
generate a second asymmetry score that is a second log asymmetric ratio based on evocation data corresponding to the first word embedding and evocation data corresponding to the second word embedding wherein evocation data from evocation database;
compare the first asymmetry score to the second asymmetry score, based on one of the following, Kendall tau rank correlation coefficient, distance correlation, or polychoric correlation;
generate an embedding quality score that measures the quality of word representations of word embedding models through a degree of asymmetry between the first word and the second word based at least in part on the comparing of the first asymmetry score and the second asymmetry score.

4. The computer system of claim 3, further comprising instructions to:
receive evocation data for a first word and a second word from an evocation dataset, wherein evocation data comprises probability relationships between a plurality of words.

5. A computer program product for determining the quality of word embeddings assigned by a word embedding model, the computer program product comprising a computer readable storage media and program instructions stored on the computer readable storage media comprising instructions to:
process a corpus, based on a natural language preprocessing system, wherein processing comprises one or more of the following techniques: tokenization, part-of-speech tagging, semantic relationship identification, or syntactic relationship identification;
generate a word embedding for each word of a plurality of words in the processed corpus, based at least in part on a word embedding model with transformer neural network architecture, wherein generating the word embedding for each word of a plurality of words in the processed corpus further comprises assigning, by the one or more processors, a vector to each dimension within an n-dimensional space;
generate a first asymmetry score for the embedding of a first word from the plurality of words in the processed corpus and the embedding of a second word from the plurality of words in the processed corpus by calculating a first log asymmetric ratio of the first word and the second word;
generate a second asymmetry score that is a second log asymmetric ratio based, at least in part, on evocation data corresponding to the first word embedding and evocation data corresponding to the second word embedding wherein evocation data from evocation database;
compare the first asymmetry score to the second asymmetry score, based on one of the following, Kendall tau rank correlation coefficient, distance correlation, or polychoric correlation;
generate an embedding quality score that measures the quality of word representations of word embedding models through a degree of asymmetry between the first word and the second word based at least in part on the comparing of the first asymmetry score and the second asymmetry score.

6. The computer program product of claim 5, further comprising instructions to:
receive evocation data for a first word and a second word from an evocation dataset, wherein evocation data comprises probability relationships between a plurality of words.
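The claimed steps (two log asymmetric ratios per word pair, then a rank-correlation comparison) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the softmax over word and context vectors used to get P(b | a) from embeddings, the toy vectors, the toy evocation probabilities, and all function names. Kendall tau is implemented directly (the tau-a variant, with no tie correction) so the block has no third-party dependencies.

```python
import math
from itertools import combinations


def log_asymmetry_ratio(p_b_given_a, p_a_given_b, eps=1e-12):
    """log P(b|a) - log P(a|b); eps guards against log(0)."""
    return math.log(p_b_given_a + eps) - math.log(p_a_given_b + eps)


def embedding_conditional(word_vecs, ctx_vecs, a, b):
    """Assumed model: P(b|a) is a softmax over the dot products of
    a's word vector with every context vector in the vocabulary."""
    scores = {
        w: sum(x * y for x, y in zip(word_vecs[a], c))
        for w, c in ctx_vecs.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    norm = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[b] - top) / norm


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of index pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def embedding_quality_score(pairs, word_vecs, ctx_vecs, evocation):
    """Rank-correlate embedding-side and evocation-side asymmetry."""
    emb = [
        log_asymmetry_ratio(
            embedding_conditional(word_vecs, ctx_vecs, a, b),
            embedding_conditional(word_vecs, ctx_vecs, b, a),
        )
        for a, b in pairs
    ]
    evo = [
        log_asymmetry_ratio(evocation[(a, b)], evocation[(b, a)])
        for a, b in pairs
    ]
    return kendall_tau(emb, evo)


# Hypothetical toy data, invented for this example only.
WORD_VECS = {"dog": [1.0, 0.2], "animal": [0.4, 0.9],
             "rain": [-0.5, 1.0], "umbrella": [-0.9, 0.3]}
CTX_VECS = {"dog": [0.3, 0.1], "animal": [1.2, 0.5],
            "rain": [-0.2, 1.1], "umbrella": [-0.6, 0.2]}
EVOCATION = {("dog", "animal"): 0.6, ("animal", "dog"): 0.3,
             ("umbrella", "rain"): 0.7, ("rain", "umbrella"): 0.2,
             ("rain", "dog"): 0.05, ("dog", "rain"): 0.04}
PAIRS = [("dog", "animal"), ("umbrella", "rain"), ("rain", "dog")]

score = embedding_quality_score(PAIRS, WORD_VECS, CTX_VECS, EVOCATION)
print(f"embedding quality score (Kendall tau-a): {score:.3f}")
```

A score near 1 would mean the embedding orders the word pairs by asymmetry the same way the evocation data does, and a score near -1 would mean the orderings are reversed. The claims also allow distance correlation or polychoric correlation in place of Kendall tau; this sketch does not cover those.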

Assignees

Inventors

Classifications

G06N3/09
Supervised learning · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/0895
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
G06N3/02
Neural networks · CPC title
G06F40/237
Lexical tools · CPC title

Patent family

Related publications grouped by family.

View patent family 80358559

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12282485B2 cover?: An approach to determine the quality of encodings assigned to a word by a word embedding model. The approach may include determining the asymmetry of two embeddings associated with two words from a word embedding model. The asymmetry of the two words from a preexisting evocation dataset may be determined. The asymmetry of the two embeddings may be compared to the asymmetry from the evocation da…
Who is the assignee on this patent?: IBM

Related patents

Context-sensitive feature score generation

Methods and systems for establishing semantic equivalence in access sequences using sentence embeddings

Filtering spurious knowledge graph relationships between labeled entities

Methods and system for fast, adaptive correction of misspells

Methods and system for fast, adaptive correction of misspells

Unsupervised Topic Modeling For Short Texts
