Determining and discerning items with multiple meanings

US11328126B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11328126-B2
Application number: US-201916690350-A
Country: US
Kind code: B2
Filing date: Nov 21, 2019
Priority date: Oct 19, 2015
Publication date: May 10, 2022
Grant date: May 10, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, system, and non-transitory computer readable medium determining and discerning items with multiple meanings in a sequence of items including producing a distributed representation for each item of the sequence of items including a word vector and a context vector, partitioning the sequence of items into classes, for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector, and producing a new sequence of items by modifying the distributed representation in the producing by replacing each occurrence of an item depending on the cosine distance calculated by the calculating.
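
The cosine-distance step in the abstract can be illustrated with a minimal sketch. It assumes cosine distance means one minus cosine similarity, and that class representative vectors are already available. The function names, the toy 2-D vectors, and the class labels in the comments are all hypothetical and do not come from the patent. The parameter d mirrors the "fixed number of classes D" mentioned in the claims.

```python
import math

def cosine_distance(u, v):
    """Cosine distance, taken here as 1 - cosine similarity (smaller = closer)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention assumed in this sketch for zero vectors
    return 1.0 - dot / (norm_u * norm_v)

def nearest_classes(word_vector, class_representatives, d):
    """Return the d class ids with the smallest cosine distance, closest first."""
    ranked = sorted(
        class_representatives,
        key=lambda cid: cosine_distance(word_vector, class_representatives[cid]),
    )
    return ranked[:d]

# Toy vectors invented for illustration only.
class_representatives = {
    0: [1.0, 0.0],   # hypothetical "finance" class
    1: [0.0, 1.0],   # hypothetical "river" class
    2: [-1.0, 0.0],  # hypothetical unrelated class
}
bank_vector = [0.8, 0.6]
top_classes = nearest_classes(bank_vector, class_representatives, d=2)
print(top_classes)
```

In this toy setup the item's vector lies closest to class 0 and next closest to class 1, so those two classes form its ranked candidate senses.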

First claim

Opening claim text (preview).

What is claimed is:

1. A method of determining and discerning items with multiple meanings in a sequence of items, the method comprising: producing a distributed representation for each item of the sequence of items including a word vector and a context vector; partitioning the sequence of items into classes and associating a class representative vector with the classes based on a user selection; for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector; producing a fixed number of classes D with smallest of the cosine distance; and producing a modified sequence of items by replacing each item I occurrence in the sequence by an I_j occurrence where j is one of the D classes for item I, wherein each word of the modified sequence conveys a ranked list of senses of the word in the original sequence based on a closeness of the item to the class representative vector as determined using the calculated cosine distance, wherein the modified sequence and the sequence include words replaced by the senses to explain the words using the senses, and wherein the calculating scans the sequence of items and sets a current center item as a vocabulary word and uses an average of the word vectors of items in a window of a predetermined size surrounding the vocabulary word, and then calculates the cosine distance of the vector average with a class representative context vector, further comprising displaying a dominant member of each class based on the member having a largest cosine distance between the word vector of a potential dominant member item and the class representative vector, wherein a distribution representation device learns the distributed representation using a tool and the distribution representation device produces a vector from the learned distributed representation, the vector including the word vector and the context vector.

2. The method of claim 1, further comprising producing a new sequence of items based on a result of the calculating.

3. The method of claim 1, wherein the sequence of items into are partitioned into the classes by applying a clustering algorithm to the word vector.

4. The method of claim 1, wherein the producing the modified sequence of items includes replacing each item with a variation of said item based on the sense of the item in a current context usage.

5. The method of claim 1, wherein the partitioning items into classes uses a K-means algorithm.

6. The method of claim 1, wherein the average comprises a weighted average with higher weights assigned to word vectors whose word is closer to a window center word.

7. The method of claim 1, wherein the words of the modified sequence are partitioned into new classes.

8. A non-transitory computer-readable recording medium recording a program for determining and discerning items with multiple meanings in a sequence of items, the program causing a computer to perform: producing a distributed representation for each item of the sequence of items including a word vector and a context vector; partitioning the sequence of items into classes and associating a class representative vector with the classes based on a user selection; for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector; producing a fixed number of classes D with smallest of the cosine distance; and producing a modified sequence of items by replacing each item I occurrence in the sequence by an I_j occurrence where j is one of the D classes for item I, wherein each word of the modified sequence conveys a ranked list of senses of the word in the original sequence based on a closeness of the item to the class representative vector as determined using the calculated cosine distance, wherein the modified sequence and the sequence include words replaced by the senses to explain the words using the senses, and wherein the calculating scans the sequence of items and sets a current center item as a vocabulary word and uses an average of the word vectors of items in a window of a predetermined size surrounding the vocabulary word, and then calculates the cosine distance of the vector average with a class representative context vector, further comprising displaying a dominant member of each class based on the member having a largest cosine distance between the word vector of a potential dominant member item and the class representative vector, wherein a distribution representation device learns the distributed representation using a tool and the distribution representation device produces a vector from the learned distributed representation, the vector including the word vector and the context vector.

9. The non-transitory computer-readable recording medium of claim 8, further comprising producing a new sequence of items based on a result of the calculating.

10. The non-transitory computer-readable recording medium of claim 8, wherein the sequence of items into are partitioned into the classes by applying a clustering algorithm to the word vector.

11. A two-phase system for determining and discerning items with multiple meanings in a sequence of items, the system comprising: a processor; and a memory, the memory storing instructions to cause the processor to perform: producing a distributed representation for each item of the sequence of items including a word vector and a context vector; partitioning the sequence of items into classes and associating a class representative vector with the classes based on a user selection; for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector; producing a fixed number of classes D with smallest of the cosine distance; and producing a modified sequence of items by replacing each item I occurrence in the sequence by an I_j occurrence where j is one of the D classes for item I, wherein each word of the modified sequence conveys a ranked list of senses of the word in the original sequence based on a closeness of the item to the class representative vector as determined using the calculated cosine distance, wherein the modified sequence and the sequence include words replaced by the senses to explain the words using the senses, and wherein the calculating scans the sequence of items and sets a current center item as a vocabulary word and uses an average of the word vectors of items in a window of a predetermined size surrounding the vocabulary word, and then calculates the cosine distance of the vector average with a class representative context vector, further comprising displaying a dominant member of each class based on the member having a largest cosine distance between the word vector of a potential dominant member item and the class representative vector, wherein a distribution representation device learns the distributed representation using a tool and the distribution representation device produces a vector from the learned distributed representation, the vector including the word vector and the context vector.

12. The system of claim 11, further comprising producing a new sequence of items based on a result of the calculating.

13. The system of claim 11, wherein the sequence of items into are partitioned into the classes by applying a clustering algorithm to the word vector.
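
The window-averaging and replacement steps recited in the independent claims, along with the weighted average of claim 6, might look roughly like the sketch below. It rests on several assumptions that the claims do not fix: the center item is excluded from its own window, items without a vector are skipped, the optional weights decay linearly with distance from the center, and each ambiguous item is relabeled with only its single closest class. All names (`window_average`, `relabel`, `candidates`, `context_reps`) and the toy data are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Cosine distance, taken here as 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 if nu == 0.0 or nv == 0.0 else 1.0 - dot / (nu * nv)

def window_average(sequence, center, word_vectors, window, weighted=False):
    """Average the word vectors of items within `window` positions of `center`."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    lo = max(0, center - window)
    hi = min(len(sequence), center + window + 1)
    for pos in range(lo, hi):
        if pos == center or sequence[pos] not in word_vectors:
            continue  # assumption: skip the center item and unknown items
        # Assumed weighting scheme: linear decay, nearest neighbors weigh most.
        weight = float(window - abs(pos - center) + 1) if weighted else 1.0
        vec = word_vectors[sequence[pos]]
        total = [t + weight * x for t, x in zip(total, vec)]
        weight_sum += weight
    if weight_sum == 0.0:
        return None  # no usable context in the window
    return [t / weight_sum for t in total]

def relabel(sequence, word_vectors, context_reps, candidates, window=2):
    """Replace each ambiguous item I with I_j, j being its closest candidate class."""
    out = []
    for i, item in enumerate(sequence):
        classes = candidates.get(item)
        avg = window_average(sequence, i, word_vectors, window) if classes else None
        if not classes or avg is None:
            out.append(item)
            continue
        best = min(classes, key=lambda j: cosine_distance(avg, context_reps[j]))
        out.append(f"{item}_{best}")
    return out

# Toy data invented for illustration only.
word_vectors = {
    "deposit": [0.9, 0.1], "money": [1.0, 0.0],
    "river": [0.0, 1.0], "water": [0.1, 0.9],
}
context_reps = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # class representative context vectors
candidates = {"bank": [0, 1]}                   # the D candidate classes per item
sequence = ["deposit", "money", "bank", "the", "the", "river", "bank", "water"]

modified = relabel(sequence, word_vectors, context_reps, candidates)
print(modified)
```

With this toy input the first "bank" is surrounded by finance-like neighbors and the second by river-like neighbors, so the two occurrences receive different class suffixes. Passing `weighted=True` to `window_average` gives nearer neighbors more influence, in the spirit of claim 6.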

Assignees

IBM

Classifications

  • Editing, e.g. inserting or deleting · CPC title

  • Parsing for meaning understanding · CPC title

  • G06F40/30 (Primary)

    Semantic analysis · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • Creation or modification of classes or clusters · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11328126B2 cover?
A method, system, and non-transitory computer readable medium determining and discerning items with multiple meanings in a sequence of items including producing a distributed representation for each item of the sequence of items including a word vector and a context vector, partitioning the sequence of items into classes, for an item using a representative word vector of each class, calculating …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 10, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).