Text processing method, system and computer program

US10353932B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10353932-B2
Application number  US-201615243299-A
Country             US
Kind code           B2
Filing date         Aug 22, 2016
Priority date       Aug 10, 2012
Publication date    Jul 16, 2019
Grant date          Jul 16, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes hierarchically identifying occurrences of some of the words in the set of sentences; creating a first index for each of some of the words based on the upper hierarchy of occurrences identified for each word; receiving input of a queried word; hierarchically identifying occurrences of the queried word in the set of sentences; creating a second index based on the upper hierarchy of occurrences identified for the queried word; comparing the first index and the second index to calculate an estimated value for the number of occurrences of a word in the neighborhood of the queried word; and calculating the actual value of the number of occurrences of a word in the neighborhood of the queried word based on an upper hierarchy and lower hierarchy of the occurrences on condition that the estimated value is equal to or greater than a predetermined number.
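The index comparison described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the upper hierarchy is a sentence ID, uses a Python integer as the compressed bit set, and keeps a per-block occurrence count so that the estimate never falls below the actual value. The names `build_index`, `estimate`, and the block size `n` are hypothetical.

```python
from collections import defaultdict


def build_index(sentences, n):
    """Build a compressed sentence-ID index for every word.

    Returns word -> (bits, block_counts). Bit b of `bits` is set when the
    word occurs in at least one sentence of block b (sentence IDs b*n to
    b*n + n - 1). block_counts[b] is the word's occurrence count in block b.
    """
    bits = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for sentence_id, sentence in enumerate(sentences):
        block = sentence_id // n  # compress sentence IDs by 1/n
        for word in sentence.split():
            bits[word] |= 1 << block
            counts[word][block] += 1
    return {word: (bits[word], dict(counts[word])) for word in bits}


def estimate(index, queried, candidate):
    """Upper-bound estimate of candidate occurrences near the queried word.

    The two compressed bit sets are compared with a bitwise AND. Every
    candidate occurrence in a shared block is counted, so the result can
    overestimate but never underestimate the true co-occurrence count.
    """
    q_bits, _ = index.get(queried, (0, {}))
    c_bits, c_counts = index.get(candidate, (0, {}))
    shared = q_bits & c_bits
    total = 0
    block = 0
    while shared:
        if shared & 1:
            total += c_counts[block]
        shared >>= 1
        block += 1
    return total
```

In this sketch an estimate of zero means the two words never share a block, so the more expensive position-level count can be skipped for that candidate.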

First claim

Opening claim text (preview).

What is claimed is:

1. A method of processing by computer a set of a plurality of sentences including a plurality of words, the method comprising the steps of: creating a first index for each of at least some of the words based on an upper hierarchy of occurrences identified for each word, wherein occurrences of at least some of the words in the set of sentences are hierarchically identified; creating a second index based on an upper hierarchy of occurrences identified for a queried word, wherein occurrences of the queried word in the set of sentences are hierarchically identified; comparing the first index and the second index to calculate an estimated value for the number of occurrences of a word in a neighborhood of the queried word; and calculating the actual value of the number of occurrences of a word in the neighborhood of the queried word based on an upper hierarchy and lower hierarchy of the occurrences on condition that the estimated value is equal to or greater than a predetermined number, wherein the first index and the second index have an upper hierarchy bit set including uncompressed bits compressed by 1/N where N is a natural number, such that N sized subsets of each upper hierarchy bit set are mapped to respective ones of N compressed bits of the first index and the second index respectively, and a compressed bit is true on condition that one or more uncompressed bits of the corresponding N sized subset are true.

2. The method according to claim 1, wherein the comparison of the first index and the second index is performed by bit calculation.

3. The method according to claim 1, wherein the step of calculating an estimated value stores an element of the corresponding upper hierarchy, and calculates the estimated value of the number of occurrences of a word in the neighborhood of the queried word based on the element on condition that two or more uncompressed bits are true.

4. The method according to claim 1, wherein the step of calculating an estimated value stores the number of elements of the corresponding upper hierarchy, and calculates the estimated value of the number of occurrences of a word in the neighborhood of the queried word based on the number of elements on condition that two or more uncompressed bits are true.

5. The method according to claim 4, wherein the method further comprises the steps of: creating a third index having an upper hierarchy bit set of the identified occurrences corresponding to each word including uncompressed bits compressed by 1/N, where N is a natural number, such that N sized subsets of the upper hierarchy bit set corresponding to the third index are mapped to respective ones of N compressed bits of the third index, a compressed bit being true on condition that two or more uncompressed bits of the corresponding N sized subset are true; and creating a fourth index having an upper hierarchy bit set of the identified occurrences corresponding to the queried word including uncompressed bits by 1/N, such that N sized subsets of the upper hierarchy bit set corresponding to the fourth index are mapped to respective ones of N compressed bits of the fourth index, a compressed bit being true on condition that two or more uncompressed bits of the corresponding N sized subset are true; the step of calculating the estimated value comparing the third index and the fourth index by bit calculation.

6. The method according to claim 1, wherein the method further comprises a step of storing K, where K is a natural number, provisionally top frequently occurring words among the words occurring in the neighborhood of the queried word; and the step of calculating the actual value of the number of occurrences of a word calculates the actual value of the number of occurrences of the word in the neighborhood of the queried word based on the upper hierarchy and lower hierarchy of the occurrences on condition that the estimated value is equal to or greater than the Kth provisionally top frequently occurring word.

7. The method according to claim 6, further comprising a step of updating the Kth provisionally top frequently occurring word on condition that the actual value of the number of occurrences of the word is equal to or greater than the number of occurrences of the K provisionally top frequently occurring words.

8. The method according to claim 6, further comprising a step of outputting K provisionally top frequently occurring words as the final K top frequently occurring words on condition that all of at least some of the words have been completely examined.

9. The method according to claim 1, wherein the upper hierarchy of occurrences is a sentence ID specifying one sentence among the plurality of sentences, and the lower hierarchy of occurrences is a position ID specifying the position of the one sentence.

10. The method according to claim 1, wherein an estimated value for the number of occurrences of the next word among at least some of the words is calculated on condition that the estimated value does not satisfy a predetermined number.

11. The method according to claim 1, wherein the step of calculating the actual value of the number of occurrences of a word is skipped on condition that the estimated value does not satisfy a predetermined number.

12. The method according to claim 1, wherein the estimated value of the number of occurrences of a word is a value equal to or greater than the actual value of the number of occurrences of the word.

13. The method according to claim 1, wherein an estimated value of the number of occurrences of a word is calculated in order of frequency of occurrence for at least some of the words in a set of a plurality of sentences.

14. The method according to claim 1, wherein at least some of the words include L, where L is a natural number, top frequently occurring words in the set of a plurality of sentences.

15. The method according to claim 1, wherein at least some of the words include words of a particular part of speech in the set of a plurality of sentences.

16. The method according to claim 1, wherein the neighborhood of a queried word is established in advance as a range X words before and Y words after an occurrence of the queried word.
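Claims 6 to 13 and 16 describe a top-K search in which the exact neighborhood count is computed only when the cheap estimate could still beat the provisional Kth word. The sketch below is one possible reading under stated assumptions, not the patented implementation: sentence IDs serve as the upper hierarchy and word positions as the lower hierarchy, a candidate occurrence is counted once if it lies within X words before or Y words after any occurrence of the queried word in the same sentence, and a min-heap holds the provisional top K. All function and parameter names are hypothetical, and on ties this sketch keeps the earlier word.

```python
import heapq
from collections import defaultdict


def build_postings(sentences):
    """word -> {sentence_id: [position, ...]} (upper -> lower hierarchy)."""
    postings = defaultdict(lambda: defaultdict(list))
    for sid, sentence in enumerate(sentences):
        for pos, word in enumerate(sentence.split()):
            postings[word][sid].append(pos)
    return {word: dict(by_sid) for word, by_sid in postings.items()}


def upper_bound(postings, queried, candidate, n):
    """Estimate from sentence IDs only, compressed into blocks of n."""
    q_blocks = {sid // n for sid in postings.get(queried, {})}
    return sum(len(positions)
               for sid, positions in postings.get(candidate, {}).items()
               if sid // n in q_blocks)


def actual_count(postings, queried, candidate, x, y):
    """Exact count using both sentence IDs and positions."""
    q_post = postings.get(queried, {})
    total = 0
    for sid, c_positions in postings.get(candidate, {}).items():
        q_positions = q_post.get(sid, [])
        for c in c_positions:
            if any(q - x <= c <= q + y for q in q_positions):
                total += 1
    return total


def top_k_neighbors(postings, queried, k, x, y, n):
    """Return up to k (word, count) pairs, most frequent neighbor first."""
    # Examine candidates in descending order of overall frequency.
    candidates = sorted(
        (w for w in postings if w != queried),
        key=lambda w: -sum(len(p) for p in postings[w].values()))
    heap = []  # min-heap of (count, word); heap[0] is the provisional Kth
    for word in candidates:
        est = upper_bound(postings, queried, word, n)
        if len(heap) == k and est < heap[0][0]:
            continue  # estimate cannot reach the Kth count: skip exact step
        count = actual_count(postings, queried, word, x, y)
        if count == 0:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (count, word))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, word))
    return [(w, c) for c, w in sorted(heap, key=lambda t: (-t[0], t[1]))]
```

Because the estimate is never smaller than the exact count, skipping a candidate whose estimate is below the provisional Kth count cannot change the final result.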

Assignees

Inventors

Classifications

  • G06F16/313 · Primary

    Selection or weighting of terms for indexing · CPC title

  • using document space presentation or visualization, e.g. category, hierarchy or range presentation and selection · CPC title

  • G06F16/316 · Primary

    Indexing structures · CPC title

  • Text processing (natural language analysis G06F40/20; semantic analysis G06F40/30; processing or translation of natural language G06F40/40) · CPC title

  • Phrasal analysis, e.g. finite state techniques or chunking · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10353932B2 cover?
A method includes hierarchically identifying occurrences of some of the words in the set of sentences; creating a first index for each of some of the words based on the upper hierarchy of occurrences identified for each word; receiving input of a queried word; hierarchically identifying occurrences of the queried word in the set of sentences; creating a second index based on the upper hierarchy…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/313. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 16, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).