Determining and discerning items with multiple meanings

US10585987B2 · US · B2

Patent metadata
Publication number: US-10585987-B2
Application number: US-201715442834-A
Country: US
Kind code: B2
Filing date: Feb 27, 2017
Priority date: Oct 19, 2015
Publication date: Mar 10, 2020
Grant date: Mar 10, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, system, and non-transitory computer readable medium determining and discerning items with multiple meanings in a sequence of items including producing a distributed representation for each item of the sequence of items including a word vector and a context vector, partitioning the sequence of items into classes, for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector, and producing a new sequence of items by modifying the distributed representation in the producing by replacing each occurrence of an item depending on the cosine distance calculated by the calculating.
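The cosine-distance step in the abstract can be illustrated with a small sketch. It is not the patented implementation: the toy vectors, the `class_of` assignment, the use of a class centroid as the "representative word vector", and the helper names `class_centroids` and `nearest_other_classes` are all assumptions made for illustration. Cosine distance is taken here as 1 minus cosine similarity.

```python
import math

def cosine_distance(u, v):
    """Cosine distance, taken here as 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm

def class_centroids(word_vectors, class_of):
    """Assumed class representative: the mean of the member word vectors."""
    sums, counts = {}, {}
    for item, vec in word_vectors.items():
        c = class_of[item]
        prev = sums.get(c, [0.0] * len(vec))
        sums[c] = [s + x for s, x in zip(prev, vec)]
        counts[c] = counts.get(c, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_other_classes(item, word_vectors, class_of, centroids, d):
    """Return the d classes closest to `item`, skipping the item's own class."""
    own = class_of[item]
    scored = sorted(
        (cosine_distance(word_vectors[item], rep), c)
        for c, rep in centroids.items()
        if c != own
    )
    return [c for _, c in scored[:d]]

# Hypothetical 2-D word vectors and a hypothetical partition into 3 classes.
word_vectors = {
    "bank": [0.9, 0.5], "money": [1.0, 0.1], "loan": [0.9, 0.2],
    "river": [0.1, 1.0], "shore": [0.2, 0.9],
    "piano": [-1.0, 0.1], "violin": [-0.9, 0.2],
}
class_of = {"bank": 0, "money": 0, "loan": 0,
            "river": 1, "shore": 1, "piano": 2, "violin": 2}

centroids = class_centroids(word_vectors, class_of)
candidates = nearest_other_classes("bank", word_vectors, class_of, centroids, d=1)
print(candidates)
```

In this toy data "bank" sits in the finance class but is also close to the river class, so that class is returned as a candidate second sense. The partition itself is supplied by hand here; the claims mention K-means as one way to obtain it.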

First claim

Opening claim text (preview).

What is claimed is:

1. A two-phase method of determining and discerning items with multiple meanings in a sequence of items, the method comprising: in a first phase: producing a distributed representation for each item of the sequence of items including a word vector and a context vector; partitioning the sequence of items into classes; for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector not including an own class of the item; and producing a fixed number of classes D with smallest said cosine distance; and in a second phase: producing a modified sequence of items by replacing each item I occurrence in the sequence by an I_j occurrence where j is one of the D classes for item I; producing a new distributed representation for each item I_m of the modified sequence of items including a word vector and a context vector; and using said new distributed representation for determining and discerning items with multiple meanings, wherein each word of the modified sequence conveys a ranked list of senses of the word in the original sequence based on a closeness of the item to the class representative vector without being a member of the class.

2. The method of claim 1, wherein the producing a modified sequence of items includes replacing each item with a variation of said item based on the sense of the item in a current context usage.

3. The method of claim 2, wherein the replacing scans the sequence of items and sets a current center item as a vocabulary word and uses an average of the word vectors of items in a window of a predetermined size surrounding the vocabulary word, and then calculates the cosine distance of the vector average with a class representative context vector.

4. The method of claim 1, wherein the partitioning items into classes uses a K-means algorithm.

5. The method of claim 3, wherein an average comprises a weighted average with higher weights assigned to word vectors whose word is closer to a window center word.

6. The method of claim 1, wherein the using further partitions the words of the modified sequence into new classes.

7. The method of claim 6, wherein said partitioning uses a K-means algorithm.

8. The method of claim 6, wherein each word of the modified sequence is presented alongside dominant members of a new class to which said word belongs.

9. The method of claim 1, further comprising scanning the sequence of items and setting a current center item as a vocabulary word and uses an average of the word vectors of items in a window of a predetermined size surrounding the vocabulary word, and then calculates the cosine distance of the vector average with a class representative context vector.

10. A non-transitory computer-readable recording medium recording a program for determining and discerning items with multiple meanings in a sequence of items, in two-phases, the program causing a computer to perform: in a first phase: producing a distributed representation for each item of the sequence of items including a word vector and a context vector; partitioning the sequence of items into classes; for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector not including an own class of the item; and producing a fixed number of classes D with smallest said cosine distance; and in a second phase: producing a modified sequence of items by replacing each item I occurrence in the sequence by an I_j occurrence where j is one of the D classes for item I; producing a new distributed representation for each item I_m of the modified sequence of items including a word vector and a context vector; and using said new distributed representation for determining and discerning items with multiple meanings, wherein each word of the modified sequence conveys a ranked list of senses of the word in the original sequence based on a closeness of the item to the class representative vector without being a member of the class.

11. The non-transitory computer readable recording medium of claim 10, wherein the producing a modified sequence of items replaces each item with a variation of said item based on the sense of the item in a current context usage.

12. The non-transitory computer readable recording medium of claim 11, wherein the replacing scans the sequence of items and sets a current center item as a vocabulary word and uses an average of the word vectors of items in a window of a predetermined size surrounding the vocabulary word, and then calculates the cosine distance of the vector average with a class representative context vector.

13. The non-transitory computer readable recording medium of claim 10, wherein the partitioning items into classes uses a K-means algorithm.

14. The non-transitory computer readable recording medium of claim 12, wherein an average comprises a weighted average with higher weights assigned to word vectors whose word is closer to a window center word.

15. The non-transitory computer readable recording medium of claim 10, wherein the using further partitions the words of the modified sequence into new classes.

16. The non-transitory computer readable recording medium of claim 15, wherein said partitioning uses a K-Means algorithm.

17. The non-transitory computer readable recording medium of claim 15, wherein each word of the modified sequence is presented alongside dominant members of a new class to which said word belongs.

18. A two-phase system for determining and discerning items with multiple meanings in a sequence of items, the system comprising: a processor, and a memory, the memory storing instructions to cause the processor to perform: in a first phase: producing a distributed representation for each item of the sequence of items including a word vector and a context vector; partitioning the sequence of items into classes; for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector not including an own class of the item; and producing a fixed number of classes D with smallest said cosine distance; and in a second phase: producing a modified sequence of items by replacing each item I occurrence in the sequence by an I_j occurrence where j is one of the D classes for item I; producing a new distributed representation for each item I_m of the modified sequence of items including a word vector and a context vector, and using said new distributed representation for determining and discerning items with multiple meanings, wherein each word of the modified sequence conveys a ranked list of senses of the word in the original sequence based on a closeness of the item to the class representative vector without being a member of the class.

19. The system of claim 18, wherein the producing a modified sequence of items including replacing each item with a variation of said item based on the sense of the item in a current context usage.
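The context-window replacement described in claims 3, 5, and 9 can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: the toy vectors, the `class_context_vectors` table, the `candidates` map, the `1/offset` weighting, the choice to pick the class with the smallest cosine distance, and the `item_j` label format are all assumptions introduced here.

```python
import math

def cosine_distance(u, v):
    """Cosine distance, taken here as 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm

def window_average(sequence, center, word_vectors, window):
    """Weighted average of the word vectors surrounding position `center`.

    Assumed weighting: 1/offset, so words nearer the center count more.
    Returns None when no neighbor in the window has a vector.
    """
    dim = len(next(iter(word_vectors.values())))
    total, weight_sum = [0.0] * dim, 0.0
    lo, hi = max(0, center - window), min(len(sequence), center + window + 1)
    for pos in range(lo, hi):
        if pos == center or sequence[pos] not in word_vectors:
            continue
        w = 1.0 / abs(pos - center)
        total = [t + w * x for t, x in zip(total, word_vectors[sequence[pos]])]
        weight_sum += w
    return None if weight_sum == 0.0 else [t / weight_sum for t in total]

def relabel(sequence, word_vectors, class_context_vectors, candidates, window=2):
    """Replace each occurrence of an item I that has candidate classes by I_j."""
    out = []
    for pos, item in enumerate(sequence):
        classes = candidates.get(item)
        avg = window_average(sequence, pos, word_vectors, window) if classes else None
        if avg is None:
            out.append(item)  # no candidates or no usable context: keep as-is
            continue
        # Assumed selection rule: the candidate class whose representative
        # context vector is closest to the window average.
        best = min(classes, key=lambda c: cosine_distance(avg, class_context_vectors[c]))
        out.append(f"{item}_{best}")
    return out

# Hypothetical vectors; "the" and "and" have no vector and are skipped.
word_vectors = {"bank": [0.9, 0.5], "money": [1.0, 0.1], "loan": [0.9, 0.2],
                "river": [0.1, 1.0], "shore": [0.2, 0.9]}
class_context_vectors = {0: [1.0, 0.1], 1: [0.1, 1.0]}
candidates = {"bank": [0, 1]}

sequence = ["the", "loan", "bank", "money", "and", "river", "bank", "shore"]
modified = relabel(sequence, word_vectors, class_context_vectors, candidates)
print(modified)
```

The two occurrences of "bank" receive different labels because their neighbors differ. In the claimed second phase, a new distributed representation would then be produced over the modified sequence, which this sketch does not attempt.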

Assignees

IBM

Inventors

Classifications

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • Editing, e.g. inserting or deleting · CPC title

  • Parsing for meaning understanding · CPC title

  • Creation or modification of classes or clusters · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10585987B2 cover?
A method, system, and non-transitory computer readable medium determining and discerning items with multiple meanings in a sequence of items including producing a distributed representation for each item of the sequence of items including a word vector and a context vector, partitioning the sequence of items into classes, for an item using a representative word vector of each class, calculating …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 10, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).