Determining and discerning items with multiple meanings

US10585987B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10585987-B2
Application number	US-201715442834-A
Country	US
Kind code	B2
Filing date	Feb 27, 2017
Priority date	Oct 19, 2015
Publication date	Mar 10, 2020
Grant date	Mar 10, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, system, and non-transitory computer readable medium determining and discerning items with multiple meanings in a sequence of items including producing a distributed representation for each item of the sequence of items including a word vector and a context vector, partitioning the sequence of items into classes, for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector, and producing a new sequence of items by modifying the distributed representation in the producing by replacing each occurrence of an item depending on the cosine distance calculated by the calculating.
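The distance step in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. It assumes word vectors are already available as plain Python lists, that a class representative is the mean of its members' vectors (the source only says "representative word vector"), and that cosine distance means 1 minus cosine similarity. Excluding the item's own class and keeping a fixed number D of classes follow the claim text. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def class_representatives(word_vectors, class_of):
    """Assumption: the representative of a class is the mean of its members."""
    members = defaultdict(list)
    for word, vec in word_vectors.items():
        members[class_of[word]].append(vec)
    return {
        cls: [sum(col) / len(vecs) for col in zip(*vecs)]
        for cls, vecs in members.items()
    }


def nearest_other_classes(word, word_vectors, class_of, reps, d):
    """Return the d classes closest to `word`, skipping the word's own class."""
    distances = [
        (cosine_distance(word_vectors[word], rep), cls)
        for cls, rep in reps.items()
        if cls != class_of[word]
    ]
    return [cls for _, cls in sorted(distances)[:d]]


# Hypothetical toy data: class 0 is finance-like, 1 is river-like, 2 is fruit-like.
word_vectors = {
    "money": [1.0, 0.0, 0.0],
    "loan": [0.9, 0.1, 0.0],
    "bank": [0.7, 0.6, 0.0],
    "river": [0.0, 1.0, 0.0],
    "shore": [0.1, 0.9, 0.0],
    "apple": [0.0, 0.0, 1.0],
    "pear": [0.0, 0.1, 0.9],
}
class_of = {"money": 0, "loan": 0, "bank": 0,
            "river": 1, "shore": 1, "apple": 2, "pear": 2}

reps = class_representatives(word_vectors, class_of)
candidates = nearest_other_classes("bank", word_vectors, class_of, reps, d=2)
print(candidates)
```

With this toy data, "bank" belongs to class 0, so only classes 1 and 2 are ranked, and the river-like class 1 comes first. In practice the class assignment would come from a clustering step such as the K-means partitioning mentioned in the claims.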

First claim

Opening claim text (preview).

What is claimed is:

1. A two-phase method of determining and discerning items with multiple meanings in a sequence of items, the method comprising: in a first phase: producing a distributed representation for each item of the sequence of items including a word vector and a context vector; partitioning the sequence of items into classes; for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector not including an own class of the item; and producing a fixed number of classes D with smallest said cosine distance; and in a second phase: producing a modified sequence of items by replacing each item I occurrence in the sequence by an I_j occurrence where j is one of the D classes for item I; producing a new distributed representation for each item I_m of the modified sequence of items including a word vector and a context vector; and using said new distributed representation for determining and discerning items with multiple meanings, wherein each word of the modified sequence conveys a ranked list of senses of the word in the original sequence based on a closeness of the item to the class representative vector without being a member of the class.

2. The method of claim 1, wherein the producing a modified sequence of items includes replacing each item with a variation of said item based on the sense of the item in a current context usage.

3. The method of claim 2, wherein the replacing scans the sequence of items and sets a current center item as a vocabulary word and uses an average of the word vectors of items in a window of a predetermined size surrounding the vocabulary word, and then calculates the cosine distance of the vector average with a class representative context vector.

4. The method of claim 1, wherein the partitioning items into classes uses a K-means algorithm.

5. The method of claim 3, wherein an average comprises a weighted average with higher weights assigned to word vectors whose word is closer to a window center word.

6. The method of claim 1, wherein the using further partitions the words of the modified sequence into new classes.

7. The method of claim 6, wherein said partitioning uses a K-means algorithm.

8. The method of claim 6, wherein each word of the modified sequence is presented alongside dominant members of a new class to which said word belongs.

9. The method of claim 1, further comprising scanning the sequence of items and setting a current center item as a vocabulary word and uses an average of the word vectors of items in a window of a predetermined size surrounding the vocabulary word, and then calculates the cosine distance of the vector average with a class representative context vector.

10. A non-transitory computer-readable recording medium recording a program for determining and discerning items with multiple meanings in a sequence of items, in two-phases, the program causing a computer to perform: in a first phase: producing a distributed representation for each item of the sequence of items including a word vector and a context vector; partitioning the sequence of items into classes; for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector not including an own class of the item; and producing a fixed number of classes D with smallest said cosine distance; and in a second phase: producing a modified sequence of items by replacing each item I occurrence in the sequence by an I_j occurrence where j is one of the D classes for item I; producing a new distributed representation for each item I_m of the modified sequence of items including a word vector and a context vector; and using said new distributed representation for determining and discerning items with multiple meanings, wherein each word of the modified sequence conveys a ranked list of senses of the word in the original sequence based on a closeness of the item to the class representative vector without being a member of the class.

11. The non-transitory computer readable recording medium of claim 10, wherein the producing a modified sequence of items replaces each item with a variation of said item based on the sense of the item in a current context usage.

12. The non-transitory computer readable recording medium of claim 11, wherein the replacing scans the sequence of items and sets a current center item as a vocabulary word and uses an average of the word vectors of items in a window of a predetermined size surrounding the vocabulary word, and then calculates the cosine distance of the vector average with a class representative context vector.

13. The non-transitory computer readable recording medium of claim 10, wherein the partitioning items into classes uses a K-means algorithm.

14. The non-transitory computer readable recording medium of claim 12, wherein an average comprises a weighted average with higher weights assigned to word vectors whose word is closer to a window center word.

15. The non-transitory computer readable recording medium of claim 10, wherein the using further partitions the words of the modified sequence into new classes.

16. The non-transitory computer readable recording medium of claim 15, wherein said partitioning uses a K-Means algorithm.

17. The non-transitory computer readable recording medium of claim 15, wherein each word of the modified sequence is presented alongside dominant members of a new class to which said word belongs.

18. A two-phase system for determining and discerning items with multiple meanings in a sequence of items, the system comprising: a processor, and a memory, the memory storing instructions to cause the processor to perform: in a first phase: producing a distributed representation for each item of the sequence of items including a word vector and a context vector; partitioning the sequence of items into classes; for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector not including an own class of the item; and producing a fixed number of classes D with smallest said cosine distance; and in a second phase: producing a modified sequence of items by replacing each item I occurrence in the sequence by an I_j occurrence where j is one of the D classes for item I; producing a new distributed representation for each item I_m of the modified sequence of items including a word vector and a context vector, and using said new distributed representation for determining and discerning items with multiple meanings, wherein each word of the modified sequence conveys a ranked list of senses of the word in the original sequence based on a closeness of the item to the class representative vector without being a member of the class.

19. The system of claim 18, wherein the producing a modified sequence of items including replacing each item with a variation of said item based on the sense of the item in a current context usage.
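The context-window replacement described in claims 3 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the weight 1/offset is one possible way to favor words nearer the window center, the center word itself is left out of the average, and tokens with no candidate classes are passed through unchanged. All function names, the candidate map, and the toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def window_average(tokens, center, word_vectors, window):
    """Weighted mean of neighbor vectors; the 1/offset weight is an assumption."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    lo = max(0, center - window)
    hi = min(len(tokens), center + window + 1)
    for pos in range(lo, hi):
        if pos == center or tokens[pos] not in word_vectors:
            continue  # skip the center word and out-of-vocabulary tokens
        weight = 1.0 / abs(pos - center)
        vec = word_vectors[tokens[pos]]
        total = [t + weight * x for t, x in zip(total, vec)]
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [t / weight_sum for t in total]


def relabel_sequence(tokens, word_vectors, context_reps, candidates, window=2):
    """Replace each item I that has candidate classes by I_j for the closest j."""
    out = []
    for i, token in enumerate(tokens):
        classes = candidates.get(token)
        if not classes:
            out.append(token)  # assumption: leave items without candidates as-is
            continue
        avg = window_average(tokens, i, word_vectors, window)
        best = min(classes, key=lambda c: cosine_distance(avg, context_reps[c]))
        out.append(f"{token}_{best}")
    return out


# Hypothetical toy data: class 0 is finance-like, class 1 is river-like.
word_vectors = {
    "money": [1.0, 0.0], "loan": [0.9, 0.1], "bank": [0.7, 0.6],
    "river": [0.0, 1.0], "shore": [0.1, 0.9],
}
context_reps = {0: [1.0, 0.0], 1: [0.0, 1.0]}
candidates = {"bank": [0, 1]}
tokens = ["money", "loan", "bank", "the", "river", "bank", "shore"]

relabeled = relabel_sequence(tokens, word_vectors, context_reps, candidates)
print(relabeled)
```

Here the first "bank" is surrounded mostly by finance-like words and becomes `bank_0`, while the second sits between "river" and "shore" and becomes `bank_1`. According to the claims, the relabeled sequence would then be used to train a new distributed representation, which is outside the scope of this sketch.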

Assignees

IBM

Inventors

Shmueli Oded

Classifications

G06F40/30 · Primary
Semantic analysis · CPC title
G06F40/166
Editing, e.g. inserting or deleting · CPC title
G10L15/1822
Parsing for meaning understanding · CPC title
G06F16/355
Creation or modification of classes or clusters · CPC title
G10L15/1815
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

Patent family

Related publications grouped by family.

View patent family 58530296

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10585987B2 cover?: A method, system, and non-transitory computer readable medium determining and discerning items with multiple meanings in a sequence of items including producing a distributed representation for each item of the sequence of items including a word vector and a context vector, partitioning the sequence of items into classes, for an item using a representative word vector of each class, calculating …
Who is the assignee on this patent?: IBM
What technology area does this patent fall under?: Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 10, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

System and method for unsupervised text normalization using distributed representation of words

Context-based metadata generation and automatic annotation of electronic media in a computer network

Hierarchical models for language modeling

Device, method, and program for word sense estimation