Index creating device, index creating method, search device, search method, and computer-readable recording medium

US2017103123A1 · US · A1

Patent metadata
Publication number: US-2017103123-A1
Application number: US-201615287257-A
Country: US
Kind code: A1
Filing date: Oct 6, 2016
Priority date: Oct 9, 2015
Publication date: Apr 13, 2017
Grant date: not listed

Abstract

A non-transitory computer-readable recording medium stores an index generating program that causes a computer to execute a process including: generating presence information of a plurality of pieces of text data, the presence information including whether each of a plurality of elements, included in at least one of the plurality of pieces of text data, is present for each of the plurality of pieces of text data, the presence information including a first axis for the plurality of elements and a second axis for the plurality of pieces of text data; detecting collision data for hashed index information when generating the hashed index information, the collision data corresponding to data elements that are independent in the presence information; and setting additional values to each of a plurality of specific collision data, respectively, for one of the plurality of hashed axes.
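The indexing process in the abstract can be illustrated with a minimal Python sketch. It assumes that presence information is stored as one bit string per element (bit j set when the element occurs in document j), that each hash function folds the document axis by mapping document j to bit j modulo a hypothetical modulus, and that a collision is a hashed bit receiving set bits from two or more distinct documents. The function names, the moduli, and this collision rule are illustrative assumptions, not taken from the patent. The step of setting additional values for collisions that satisfy the claimed "specific condition" is not modeled, since the abstract does not define that condition.

```python
from typing import Dict, List, Sequence, Set


def ngrams(text: str, n: int = 2) -> Set[str]:
    """Character N-grams of a text, used here as the element unit (N >= 2)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_presence(docs: Sequence[str], n: int = 2) -> Dict[str, int]:
    """Presence information: element (first axis) -> bit string over documents
    (second axis). Bit j is 1 when the element occurs in document j."""
    presence: Dict[str, int] = {}
    for j, doc in enumerate(docs):
        for gram in ngrams(doc, n):
            presence[gram] = presence.get(gram, 0) | (1 << j)
    return presence


def fold(bits: int, num_docs: int, modulus: int) -> int:
    """Assumed hash function: fold the document axis so that document j
    lands on hashed bit j % modulus (several documents may share a bit)."""
    hashed = 0
    for j in range(num_docs):
        if (bits >> j) & 1:
            hashed |= 1 << (j % modulus)
    return hashed


def build_hashed_index(presence: Dict[str, int], num_docs: int,
                       moduli: Sequence[int]) -> Dict[str, List[int]]:
    """One hashed axis per (hypothetical) modulus for every element."""
    return {e: [fold(b, num_docs, m) for m in moduli] for e, b in presence.items()}


def detect_collisions(presence: Dict[str, int], num_docs: int,
                      modulus: int) -> Dict[str, List[int]]:
    """Assumed collision rule: a hashed bit collides when two or more distinct
    documents containing the element fold onto it."""
    collisions: Dict[str, List[int]] = {}
    for element, bits in presence.items():
        counts: Dict[int, int] = {}
        for j in range(num_docs):
            if (bits >> j) & 1:
                pos = j % modulus
                counts[pos] = counts.get(pos, 0) + 1
        hits = sorted(pos for pos, c in counts.items() if c >= 2)
        if hits:
            collisions[element] = hits
    return collisions


if __name__ == "__main__":
    docs = ["abcd", "bcde", "abxy", "zzab"]
    presence = build_presence(docs)
    index = build_hashed_index(presence, len(docs), moduli=(2, 3))  # hypothetical moduli
    print(index["ab"])                                # [3, 5]
    print(detect_collisions(presence, len(docs), 2))  # {'ab': [0]}
```

The tiny moduli make collisions appear immediately: "ab" occurs in documents 0 and 2, which both fold onto hashed bit 0 under modulus 2. A practical index would presumably use larger moduli, with the hashed-axis width matched to a register size as claim 4 describes.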

Claims

What is claimed is:

1. A non-transitory computer-readable recording medium storing therein an index generating program that causes a computer to execute a process comprising: generating presence information of a plurality of pieces of text data, the presence information including whether each of a plurality of elements, included in at least one of the plurality of pieces of text data, is present for each of the plurality of pieces of text data, the presence information including a first axis for the plurality of elements and a second axis for the plurality of pieces of text data; detecting collision data for hashed index information when generating the hashed index information, the hashed index information being generated from the presence information and including a plurality of hashed axes, the plurality of hashed axes being generated by applying a plurality of hash functions to the second axis of the presence information, the collision data corresponding to data elements that are independent in the presence information with the first axis and the second axis and duplicating in the hashed index information with the first axis and the plurality of hashed axes; and setting additional values to each of a plurality of specific collision data, respectively, for one of the plurality of hashed axes, the plurality of specific collision data being the detected collision data and satisfying a specific condition.

2. The non-transitory computer-readable recording medium according to claim 1, wherein the linking includes aggregating, when a collision continuously occurs in one of the plurality of hashed axes, the presence/absence ratio by using the presence information related to the element that is associated with the hashed axis in which the collisions have occurred, dividing, when the presence ratio of the aggregated presence/absence ratio is greater than a threshold, the presence information related to the element, and setting and linking a division destination to one of the plurality of hashed axes.

3. The non-transitory computer-readable recording medium according to claim 2, wherein the division destination used when the presence information related to the element is divided is an area of a low-frequency word of the element.

4. The non-transitory computer-readable recording medium according to claim 1, wherein the size of the hashed axis is a number of bits matched with the size of a register.

5. The non-transitory computer-readable recording medium according to claim 1, wherein a unit of the plurality of elements is a unit of words.

6. The non-transitory computer-readable recording medium according to claim 1, wherein a unit of the plurality of elements is a unit of character N-grams (N is 2 or more).

7. An index generating method comprising: generating presence information of a plurality of pieces of text data, the presence information including whether each of a plurality of elements, included in at least one of the plurality of pieces of text data, is present for each of the plurality of pieces of text data, the presence information including a first axis for the plurality of elements and a second axis for the plurality of pieces of text data, by a processor; detecting collision data for hashed index information when generating the hashed index information, the hashed index information being generated from the presence information and including a plurality of hashed axes, the plurality of hashed axes being generated by applying a plurality of hash functions to the second axis of the presence information, the collision data corresponding to data elements that are independent in the presence information with the first axis and the second axis and duplicating in the hashed index information with the first axis and the plurality of hashed axes, by the processor; and setting additional values to each of a plurality of specific collision data, respectively, for one of the plurality of hashed axes, the plurality of specific collision data being the detected collision data and satisfying a specific condition, by the processor.

8. An index generating device comprising: a processor that executes a process including: generating presence information of a plurality of pieces of text data, the presence information including whether each of a plurality of elements, included in at least one of the plurality of pieces of text data, is present for each of the plurality of pieces of text data, the presence information including a first axis for the plurality of elements and a second axis for the plurality of pieces of text data; detecting collision data for hashed index information when generating the hashed index information, the hashed index information being generated from the presence information and including a plurality of hashed axes, the plurality of hashed axes being generated by applying a plurality of hash functions to the second axis of the presence information, the collision data corresponding to data elements that are independent in the presence information with the first axis and the second axis and duplicating in the hashed index information with the first axis and the plurality of hashed axes; and setting additional values to each of a plurality of specific collision data, respectively, for one of the plurality of hashed axes, the plurality of specific collision data being the detected collision data and satisfying a specific condition.

9. A non-transitory computer-readable recording medium storing a search program that causes a computer to execute a process comprising: restoring, when receiving an element formed by two or more characters and identification information on text data, each of a plurality of hashed axes related to the received element; and searching for, based on presence information that is related to the element in each of a plurality of pieces of text data and that is indicated by each of bits in restored bit strings, the presence information on the element associated with the received identification information on the text data.

10. A search method comprising: restoring, when receiving an element formed by two or more characters and identification information on text data, each of a plurality of hashed axes related to the received element, by a processor; and searching for, based on presence information that is related to the element in each of a plurality of pieces of text data and that is indicated by each of bits in restored bit strings, the presence information on the element associated with the received identification information on the text data, by the processor.

11. A search device comprising: a processor that executes a process including: restoring, when receiving an element formed by two or more characters and identification information on text data, each of a plurality of hashed axes related to the received element; and searching, based on presence information that is related to the element in each of a plurality of pieces of text data and that is indicated by each of bits in bit strings restored at the restoring, the presence information on the element associated with the received identification information on the text data.
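Claims 9 to 11 describe searching by restoring an element's hashed axes and reading the bit for a given document. The following minimal Python sketch assumes the hash functions fold the document axis modulo hypothetical moduli; the patent does not specify the hash functions. Under that assumption, a document's bit is treated as set only when its folded position is set on every hashed axis. The names `restore_bit`, `restore_bitstring`, and `search`, and the moduli, are illustrative and not from the patent.

```python
from typing import Dict, List, Sequence


def restore_bit(hashed_axes: Sequence[int], moduli: Sequence[int], doc_id: int) -> bool:
    """Report the element as present in doc_id only if every hashed axis has
    bit doc_id % modulus set. 'Absent' is exact; 'present' may be a false
    positive caused by collisions on the hashed axes."""
    return all((h >> (doc_id % m)) & 1 for h, m in zip(hashed_axes, moduli))


def restore_bitstring(hashed_axes: Sequence[int], moduli: Sequence[int],
                      num_docs: int) -> int:
    """Rebuild a candidate document-axis bit string from the hashed axes."""
    bits = 0
    for j in range(num_docs):
        if restore_bit(hashed_axes, moduli, j):
            bits |= 1 << j
    return bits


def search(index: Dict[str, List[int]], moduli: Sequence[int],
           element: str, doc_id: int) -> bool:
    """Look up an element of two or more characters and test one document."""
    if len(element) < 2 or element not in index:
        return False
    return restore_bit(index[element], moduli, doc_id)


if __name__ == "__main__":
    moduli = (2, 3)  # hypothetical moduli
    # "ab" actually occurs in documents 0 and 5 of 6; folding gives
    # 0b11 under modulus 2 and 0b101 under modulus 3.
    index = {"ab": [0b11, 0b101]}
    print(bin(restore_bitstring(index["ab"], moduli, 6)))  # 0b101101
    print(search(index, moduli, "ab", 1))                  # False
```

Here the restored bit string marks documents 0, 2, 3, and 5, although the element occurs only in documents 0 and 5. The two extra hits are the kind of collision that claims 1 to 3 are concerned with detecting and handling.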

Frequently asked questions

What does patent US2017103123A1 cover?
A non-transitory computer-readable recording medium stores an index generating program that causes a computer to execute a process including: generating presence information of a plurality of pieces of text data, the presence information including whether each of a plurality of elements, included in at least one of the plurality of pieces of text data, is present for each of the plurality of piece…
Who is the assignee on this patent?
Fujitsu Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/325. Mapped technology areas include Physics.
When was this patent published?
Published on April 13, 2017 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).