What technology area does this patent fall under?

Primary CPC classification G06F40/30. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 21, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Method for word sense disambiguation for homonym words based on part of speech (POS) tag of a non-homonym word

US9824084B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9824084-B2
Application number	US-201515516102-A
Country	US
Kind code	B2
Filing date	Jun 23, 2015
Priority date	Mar 19, 2015
Publication date	Nov 21, 2017
Grant date	Nov 21, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method of (600) and a system (222, 208) for processing a text stream. The method comprises accessing (602) the text stream; parsing (604) the text stream; analyzing (606) a first collection of words to identify a homonym candidate; generating (608) a homonym word pattern, the homonym word pattern comprising at least one word of the first collection of words; determining (610), for at least one word of the homonym word pattern, a first context element; generating (612) a homonym context pattern; analyzing (614) a second collection of words to identify a non-homonym candidate having a non-homonym context pattern at least partially matching the homonym context pattern, the non-homonym candidate being associated with a lexical tag; and assigning (616) the lexical tag associated with the non-homonym candidate to the homonym candidate.
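The steps in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the toy `DICTIONARY`, the function names, the use of a part-of-speech tag as the context element, the one-word window on each side, and the exact (rather than partial) pattern match are all assumptions made for illustration.

```python
# Hypothetical toy dictionary: word -> {lexical tag: meaning}.
# A word with more than one entry is treated as a homonym candidate.
DICTIONARY = {
    "duck": {"NOUN": "a water bird", "VERB": "to lower the head quickly"},
    "dog": {"NOUN": "a domestic animal"},
    "the": {"DET": "definite article"},
    "swam": {"VERB": "moved through water"},
    "ran": {"VERB": "moved fast on foot"},
    "away": {"ADV": "to a distance"},
    "home": {"ADV": "to one's house"},
}


def parse(text):
    """Break a text stream down into a collection of words."""
    return text.lower().split()


def is_homonym(word):
    """A word is a homonym candidate if the dictionary lists several meanings."""
    return len(DICTIONARY.get(word, {})) > 1


def context_pattern(words, index, offsets=(-1, 1)):
    """Collect one context element (here, a tag) per neighbouring word."""
    pattern = []
    for off in offsets:
        j = index + off
        tags = DICTIONARY.get(words[j], {}) if 0 <= j < len(words) else {}
        # Only an unambiguous neighbour contributes a tag; otherwise None.
        pattern.append(next(iter(tags)) if len(tags) == 1 else None)
    return tuple(pattern)


def disambiguate(first_text, second_text):
    """Assign to each homonym the tag of a non-homonym seen in the same context."""
    first, second = parse(first_text), parse(second_text)
    results = {}
    for i, word in enumerate(first):
        if not is_homonym(word):
            continue
        target = context_pattern(first, i)
        for j, other in enumerate(second):
            entries = DICTIONARY.get(other, {})
            if len(entries) != 1:
                continue  # only single-meaning words are non-homonym candidates
            # Simplification: exact match instead of "at least partially".
            if context_pattern(second, j) == target:
                tag = next(iter(entries))
                if tag in DICTIONARY[word]:
                    results[word] = (tag, DICTIONARY[word][tag])
                    break
    return results


print(disambiguate("the duck swam away", "the dog ran home"))
# {'duck': ('NOUN', 'a water bird')}
```

In this toy run, "duck" and "dog" both sit between a determiner and a verb, so the tag of the unambiguous "dog" is carried over to "duck" and selects the matching meaning.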

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of processing a first text stream for execution by a processor, the method comprising: accessing, from a non-transitory computer-readable medium, the first text stream; parsing the first text stream by breaking the first text stream down into a first collection of words; analysing the first collection of words to identify a homonym candidate, the homonym candidate being associated with a first meaning and a second meaning; generating a homonym word pattern, the homonym word pattern comprising at least one word of the first collection of words, the at least one word being selected based on a distance between the at least one word and the homonym candidate in the first text stream, the distance being a number of words separating the at least one word of the first collection of words from the homonym candidate in the first text stream; determining, for at least one word of the homonym word pattern, a first context element; generating a homonym context pattern, the homonym context pattern being at least partially based on the first context element; parsing a second text stream by breaking the second text stream down into a second collection of words, the second collection of words being distinct from the first collection of words, the second text stream being in a same language as the first text stream; analysing the second collection of words to identify a non-homonym candidate, the non-homonym candidate being associated with a lexical tag; if a non-homonym context pattern at least partially matches the homonym context pattern, the non-homonym context pattern being at least partially based on a second context element determined for at least one word of a non-homonym word pattern, assigning the lexical tag associated with the non-homonym candidate to the homonym candidate; storing, to a memory coupled to the processor, the lexical tag; and rendering the lexical tag on a display of an electronic device.

2. The method of claim 1, wherein analysing the first collection of words to identify the homonym candidate is based on a comparison of each one of the words of the first collection of words with entries of a dictionary database.

3. The method of claim 2, wherein analysing the first collection of words to identify the homonym candidate comprises accessing, from the non-transitory computer-readable medium, the dictionary database.

4. The method of claim 2, wherein the homonym candidate is identified upon determining that one of the entries of the dictionary database corresponding to the one of the words of the first collection of words is associated with a plurality of meanings including the first meaning and the second meaning.

5. The method of claim 1, wherein the distance is pre-defined as being at least one of one word before the homonym candidate, two words before the homonym candidate, three words before the homonym candidate, one word after the homonym candidate, two words after the homonym candidate and three words after the homonym candidate.

6. The method of claim 1, wherein the first context element and the second context element are at least one of an indication of a word form, an indication of a semantic characteristic and an indication of a grammatical characteristic.

7. The method of claim 1, wherein generating the homonym context pattern is based on multiple context elements, each one of the multiple context elements being determined for the corresponding word of the homonym word pattern, the multiple context elements including the first context element.

8. The method of claim 1, wherein identification of the non-homonym candidate comprises determining that the second context element of the non-homonym context pattern is similar to the first context element of the homonym context pattern.

9. The method of claim 1, wherein the lexical tag defines at least one of an indication of a word form, an indication of a semantic characteristic and an indication of a grammatical characteristic.

10. The method of claim 1, wherein the non-homonym candidate is associated with a unique meaning.

11. The method of claim 1, wherein the method further comprises determining which one of the first meaning and the second meaning of the homonym candidate is to be retained based on the lexical tag assigned to the homonym candidate.

12. The method of claim 1, wherein determining which one of the first meaning and the second meaning of the homonym candidate is to be retained is completed without having to access to a training corpus of text manually tagged.

13. The method of claim 1, wherein the first text stream is a corpus of text.

14. A computer-implemented method of processing a first text stream for execution by a processor, the method comprising: accessing, from a non-transitory computer-readable medium, the first text stream; parsing the first text stream by breaking the first text stream down into a first collection of words; analysing the first collection of words to identify a homonym candidate, the homonym candidate being associated with a first meaning and a second meaning; generating a homonym word pattern, the homonym word pattern comprising at least one word of the collection of words, the at least one word being selected based on a distance between the at least one word and the homonym candidate in the first text stream, the distance being a number of words separating the at least one word of the first collection of words from the homonym candidate in the first text stream; determining, for at least one word of the homonym word pattern, a first context element; generating a homonym context pattern, the homonym context pattern being at least partially based on the first context element; parsing a second text stream by breaking the second text stream down into a second collection of words, the second collection of words being distinct from the first collection of words, the second text stream being in a same language as the first text stream; analysing the second collection of words to identify a non-homonym candidate, the non-homonym candidate being associated with a lexical tag; if a non-homonym context pattern at least partially matches the homonym context pattern, the non-homonym context pattern being at least partially based on a second context element determined for at least one word of a non-homonym word pattern, determining which one of the first meaning and the second meaning of the homonym candidate is to be retained based on the lexical tag associated with the non-homonym candidate; and rendering a retained meaning on a display of an electronic device.

15. A computer-implemented system for processing a first text stream, the system comprising: a non-transitory computer-readable medium; a processor configured to perform: accessing, from the non-transitory computer-readable medium, the first text stream; parsing the first text stream by breaking the first text stream down into a first collection of words; analysing the first collection of words to identify a homonym candidate, the homonym candidate being associated with a first meaning and a second meaning; generating a homonym word pattern, the homonym word pattern comprising at least one word of the first collection of words, the at least one word being selected based on a distance between the at least one word and the homonym candidate in the first text stream, the distance being a number of words separating the at least one word of the first collection of words from the homonym candidate in the first text stream; determining, for at least one word of the homonym wo…
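Two details of the claims lend themselves to a short illustration: the distance-based selection of pattern words (claims 1 and 5) and the requirement that context patterns match "at least partially". The Python sketch below is one possible reading only. The names `ALLOWED_OFFSETS`, `word_pattern`, and `partial_match`, the signed-offset encoding of "words before" and "words after", and the `min_shared` threshold are hypothetical and do not come from the patent.

```python
# Signed offsets for the distances listed in claim 5:
# negative = words before the candidate, positive = words after it.
ALLOWED_OFFSETS = (-3, -2, -1, 1, 2, 3)


def word_pattern(words, index, offsets=(-1, 1)):
    """Map each requested offset to the word at that distance from words[index].

    Offsets that fall outside the text are skipped.
    """
    pattern = {}
    for off in offsets:
        if off not in ALLOWED_OFFSETS:
            raise ValueError(f"unsupported offset: {off}")
        j = index + off
        if 0 <= j < len(words):
            pattern[off] = words[j]
    return pattern


def partial_match(pattern_a, pattern_b, min_shared=1):
    """Return True when at least min_shared offsets hold equal elements.

    The threshold is an assumption; the claims only say "at least partially".
    """
    shared = sum(1 for off, elem in pattern_a.items() if pattern_b.get(off) == elem)
    return shared >= min_shared


words = "he sat by the bank of the river".split()
print(word_pattern(words, 4, (-2, -1, 1, 2)))
# {-2: 'by', -1: 'the', 1: 'of', 2: 'the'}

homonym_ctx = {-1: "DET", 1: "VERB"}
other_ctx = {-1: "DET", 1: "NOUN"}
print(partial_match(homonym_ctx, other_ctx))                # True
print(partial_match(homonym_ctx, other_ctx, min_shared=2))  # False
```

Keying the pattern by offset keeps positions aligned when comparing two contexts, so a determiner one word before the candidate is never matched against a determiner one word after it.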

Assignees

Yandex Europe Ag

Inventors

Classifications

G06F40/253
Grammatical analysis; Style critique · CPC title
G06F40/279
Recognition of textual entities · CPC title
G06F40/30 · Primary
Semantic analysis · CPC title
G06F40/211 · Primary
Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title
G06F40/242
Dictionaries · CPC title

Patent family

Related publications grouped by family.

View patent family 56918457

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9824084B2 cover?: A computer-implemented method of (600) and a system (222, 208) for processing a text stream. The method comprises accessing (602) the text stream; parsing (604) the text stream; analyzing (606) a first collection of words to identify a homonym candidate; generating (608) a homonym word pattern, the homonym word pattern comprising at least one word of the first collection of words; d…
Who is the assignee on this patent?: Yandex Europe Ag