Distributed data set indexing

US9977807B1 · US · B1

Patent metadata
Field: Value
Publication number: US-9977807-B1
Application number: US-201715838211-A
Country: US
Kind code: B1
Filing date: Dec 11, 2017
Priority date: Feb 13, 2017
Publication date: May 22, 2018
Grant date: May 22, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus including a processor to: receive search criteria including a data value; in response to receiving the search criteria, generate a hash value from the data value of the search criteria, and for each data cell of a super cell, compare the hash value to hash values within a hash values vector in the corresponding cell index to determine whether the data cell includes at least one data record meeting the search criteria, and in response to determining that the data cell includes at least one of such data record, search the data records to identify one or more data records meeting the search criteria; and in response to identifying at least one data record within at least one data cell of the super cell meeting the search criteria, provide results data indicative of the super cell including at least one of such data record.
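The filtering step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the SHA-256-based `hash_value` helper, the use of a Python set as the "hash values vector", and the dictionary record layout are all assumptions made for illustration. The patent text does not name a hash function or a storage layout.

```python
import hashlib


def hash_value(value):
    """Hypothetical hash: first 8 bytes of SHA-256 over the value's string form."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_cell_index(data_cell, field):
    """Hash each unique value of one field in a data cell.

    A set stands in for the 'hash values vector' (an assumption).
    """
    return {hash_value(record[field]) for record in data_cell}


def search_super_cell(super_cell, cell_indexes, field, wanted):
    """Return the records in the super cell whose field equals `wanted`.

    A data cell is scanned only if the hash of `wanted` appears in its index;
    otherwise no record in that cell can match, so the cell is skipped.
    """
    wanted_hash = hash_value(wanted)
    matches = []
    for data_cell, index in zip(super_cell, cell_indexes):
        if wanted_hash not in index:
            continue  # hash absent: skip the full scan of this data cell
        # Hash present: compare actual values, since distinct values can collide.
        matches.extend(r for r in data_cell if r[field] == wanted)
    return matches


# Usage: one super cell holding two data cells.
super_cell = [
    [{"city": "Cary"}, {"city": "Raleigh"}],
    [{"city": "Durham"}],
]
cell_indexes = [build_cell_index(cell, "city") for cell in super_cell]
found = search_super_cell(super_cell, cell_indexes, "city", "Durham")
```

The index check can only rule a data cell out. A hit still requires scanning the records, which matches the two-stage order described in the abstract.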

First claim

Opening claim text (preview).

The invention claimed is:

1. An apparatus comprising a processor of a first node device of multiple node devices, and a storage of the first node device to store instructions that, when executed by the processor, cause the processor to perform operations comprising: store, at the first node device, a first super cell of multiple super cells into which a data set is divided from a data file maintained by at least one data device, wherein: the multiple super cells are distributed among the multiple node devices; each super cell comprises multiple data cells; each data cell of the multiple data cells comprises multiple data records; and each data record of the multiple data records comprises a set of data fields at which data values of the data set are stored; store, for each data cell within the first super cell, a cell index that corresponds to the data cell, wherein the cell index comprises a first hash values vector that corresponds to a first data field of the set of data fields, and that comprises hash values generated from each unique value among the data values stored within the first data field; receive, at the first node device, from a control device, and at least partially in parallel with other node devices of the multiple node devices, query instructions specifying search criteria of a search to be performed of the data set for data records that meet the search criteria, wherein the search criteria comprises at least one data value to be searched for within the first data field; in response to the receipt of the query instructions, generate a first hash value from a first data value of the at least one data value of the search criteria, and for each data cell within the first super cell, the processor is caused to perform operations of the search, the operations comprising: compare the first hash value to the hash values within the first hash values vector in the corresponding cell index to determine whether the data cell includes at least one data record that meets the search criteria for at least the first data value; and in response to a determination that the data cell includes at least one data record that meets the search criteria, search the data records of the data cell to identify one or more data records that meet the search criteria; and in response to identifying at least one data record within at least one data cell of the first super cell that meets the search criteria for at least the first data value, the processor is caused to perform operations comprising: generate results data indicative of the first super cell including at least one data record that meets the search criteria for at least the first data value; and provide the results data to the control device.

2. The apparatus of claim 1, wherein: each of the cell indexes corresponding to the data cells within the first super cell comprises a second hash values vector that corresponds to a second data field of the set of data fields, wherein the second hash values vector comprises hash values generated from each unique value among the data values stored within the second data field; in response to identifying at least one data record within at least one data cell of the first super cell that meets the search criteria for at least the first data value, and for each data cell within the at least one data cell, the processor is caused to perform operations of the search, the operations comprising: generate a second hash value from a second data value of the at least one data value of the search criteria; compare the second hash value to the hash values within the second hash values vector in the corresponding cell index to determine whether the data cell includes at least one data record that meets the search criteria for at least the first data value and the second data value; and in response to a determination that the data cell includes at least one data record that meets the search criteria for at least the first data value and the second data value, search the data records of the data cell to identify one or more data records that meet the search criteria for at least the first data value and the second data value; and condition the generation and transmission of results on identification of at least one data record within at least one data cell of the first super cell that meets the search criteria for the first data value and the second data value.

3. The apparatus of claim 2, wherein the processor is caused to perform operations comprising: perform the search corresponding to the first data field on a first thread of execution; and perform the search corresponding to the second data field on a second thread of execution at least partially in parallel with the performance of the search on the first thread.

4. The apparatus of claim 3, wherein the processor is caused to allocate a separate processor core of the processor to each of the first and second threads of execution.

5. The apparatus of claim 1, wherein: each of the cell indexes corresponding to the data cells within the first super cell comprises a unique values vector that corresponds to the first data field, wherein the unique values vector comprises a single instance of each data values present within the first data field among the data records of the corresponding data cell, wherein the single instances of each data value are sorted by value; and in response to identifying at least one data record within at least one data cell of the first super cell that meets the search criteria for at least the first data value, and for each data cell within the at least one data cell, the processor is caused to perform operations of the search, the operations comprising: compare the first data value of the at least one data value of the search criteria to the single instances of data values within the unique values vector to determine whether the data cell includes at least one data record that meets the search criteria for at least the first data value; and condition the search of the data records of the data cell on a determination, via the comparison with the first hash values vector and the comparison with the unique values vector that the data cell that includes at least one data record that meets the search criteria.

6. The apparatus of claim 1, wherein the processor is caused to perform operations comprising: parse the query instructions to determine whether the query instructions include task instructions for the performance of a task with data retrieved from one or more data records identified as meeting the search criteria; and in response to a determination that the query instructions do include task instructions for the performance of a task, perform operations comprising: execute the instructions to perform the task at least partially in parallel with at least one other node device of the multiple node devices; and generate the results data to include results of the performance of the task as the indication that the super cell includes at least one data record that meets the search criteria.

7. The apparatus of claim 1, wherein the processor is caused to perform operations comprising: store, at the first node device, a first super cell index corresponding to the first super cell, wherein the first super cell index comprises an indication of a range of values stored within the first data field within the multiple data cells of the first super cell; in response to the receipt of the query instructions, compare the at least one data value of the search criteria to the range of values indicated in the first super cell index to determine whether the first super cell includes at least one data record within at least one data cell of the first super cell that meets the search criteria; and condi
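Two of the additional index structures named in the claims, the sorted unique values vector of claim 5 and the value range in the super cell index of claim 7, can be sketched as follows. This is an illustrative reading only. The function names and the record layout are hypothetical, and the binary search is an assumption: the claim says the values are sorted but does not say how they are compared. Because the claim 7 preview is cut off, the range check below only returns a boolean and does not model what the claim conditions on it.

```python
from bisect import bisect_left


def build_unique_values_vector(data_cell, field):
    """One instance of each value present in the field, sorted by value (claim 5)."""
    return sorted({record[field] for record in data_cell})


def in_unique_values(unique_values, wanted):
    """Exact membership test on the sorted vector (binary search is an assumption)."""
    pos = bisect_left(unique_values, wanted)
    return pos < len(unique_values) and unique_values[pos] == wanted


def build_super_cell_range(super_cell, field):
    """Minimum and maximum of the field across every data cell in the super cell."""
    values = [record[field] for cell in super_cell for record in cell]
    return min(values), max(values)


def in_super_cell_range(value_range, wanted):
    """True if `wanted` falls inside the range recorded in the super cell index."""
    low, high = value_range
    return low <= wanted <= high


# Usage: a super cell with two data cells keyed on a numeric field.
super_cell = [
    [{"id": 7}, {"id": 3}, {"id": 7}],
    [{"id": 12}, {"id": 9}],
]
unique_first = build_unique_values_vector(super_cell[0], "id")
id_range = build_super_cell_range(super_cell, "id")
```

Unlike a hash comparison, the unique values vector holds the actual values, so a hit there is exact rather than probable. The range check is the cheapest of the three tests and applies to the whole super cell at once.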

Assignees

Inventors

Classifications

  • Hash tables · CPC title

  • Trees · CPC title

  • Comparing separate sets of record carriers arranged in the same sequence to determine whether at least some of the data in one set is identical with that in the other set or sets · CPC title

  • Query processing · CPC title

  • Indexing structures · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9977807B1 cover?
An apparatus including a processor to: receive search criteria including a data value; in response to receiving the search criteria, generate a hash value from the data value of the search criteria, and for each data cell of a super cell, compare the hash value to hash values within a hash values vector in the corresponding cell index to determine whether the data cell includes at least one dat…
Who is the assignee on this patent?
Sas Inst Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2228. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 22, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).