System and method for investigating large amounts of data

US9852144B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9852144-B2
Application number: US-201715446917-A
Country: US
Kind code: B2
Filing date: Mar 1, 2017
Priority date: Jun 23, 2011
Publication date: Dec 26, 2017
Grant date: Dec 26, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data analysis system is proposed for providing fine-grained low latency access to high volume input data from possibly multiple heterogeneous input data sources. The input data is parsed, optionally transformed, indexed, and stored in a horizontally-scalable key-value data repository where it may be accessed using low latency searches. The input data may be compressed into blocks before being stored to minimize storage requirements. The results of searches present input data in its original form. The input data may include access logs, call data records (CDRs), e-mail messages, etc. The system allows a data analyst to efficiently identify information of interest in a very large dynamic data set up to multiple petabytes in size. Once information of interest has been identified, that subset of the large data set can be imported into a dedicated or specialized data analysis system for an additional in-depth investigation and contextual analysis.
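The ingest path in the abstract (parse, index, compress into blocks, store) can be sketched as follows. This is a minimal illustration, not the patented implementation: the whitespace tokenizer, the `BLOCK_SIZE` value, the use of `zlib`, the sample log lines, and the two in-memory dictionaries standing in for a horizontally-scalable key-value repository are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 3  # records per block; hypothetical value, kept small for the demo


def ingest(lines, block_size=BLOCK_SIZE):
    """Index and compress input lines, returning (index, blocks).

    index  : token -> set of block identifiers (stand-in for an index family)
    blocks : block identifier -> compressed bytes (stand-in for a block family)
    """
    index = defaultdict(set)
    blocks = {}
    for start in range(0, len(lines), block_size):
        block_id = start // block_size
        chunk = lines[start:start + block_size]
        # "Parse" each record with a naive whitespace tokenizer (assumption).
        for line in chunk:
            for token in line.lower().split():
                index[token].add(block_id)
        # Compress the whole block; the original text is recoverable unchanged.
        blocks[block_id] = zlib.compress("\n".join(chunk).encode("utf-8"))
    return index, blocks


# Made-up access-log records, for illustration only.
log_lines = [
    "10.0.0.1 GET /login alice",
    "10.0.0.2 GET /home bob",
    "10.0.0.1 POST /upload alice",
    "10.0.0.3 GET /home carol",
    "10.0.0.2 GET /logout bob",
]
index, blocks = ingest(log_lines)
```

Decompressing a stored block returns the records exactly as they were ingested, which is what lets search results present input data in its original form.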

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving a search parameter; deriving a search criterion from the search parameter; using the search criterion to obtain one or more first values from a first-key value family of a key-value data repository, the first key-value family mapping keys to data block identifiers; using the one or more first values to obtain one or more compressed values from a second key-value family of the key-value data repository, the second key-value family mapping data block identifiers to data blocks; wherein the first key-value family comprises a first set of unique keys, each key in the first set of unique keys mapping to one or more values; wherein the second key-value family comprises a second set of unique keys, each key in the second set of unique keys mapping to at least one compressed value; uncompressing the one or more compressed values to produce one or more uncompressed values; using the search criterion to identify one or more portions of the one or more uncompressed values; and returning the one or more portions of the one or more uncompressed values as search results.

2. The method of claim 1, wherein at least one of the one or more first values comprises an identifier of a compressed value of the one or more compressed values.

3. The method of claim 1, wherein at least one of the one or more first values comprises a key of the second key-value family.

4. The method of claim 1, wherein using the search criterion to identify one or more portions of the one or more uncompressed values comprises using the search criterion to obtain one or more second values from the first-key value family of the key-value data repository, wherein the one or more second values identify a byte sequential portion of one of the one or more uncompressed values.

5. The method of claim 1, wherein at least one of one or more portions is a byte sequential portion of an uncompressed value of the one or more uncompressed values.

6. The method of claim 1, wherein each key of the first key-value family is unique at least amongst all keys of the first key-value family.

7. The method of claim 1, wherein the first key-value family comprises at least one million unique keys.

8. The method of claim 1, wherein using the search criterion to obtain the one or more first values includes selecting a key from the first key-value family that equals the search criterion.

9. The method of claim 1: wherein the key-value data repository comprises a cluster of a plurality of computing nodes; wherein at least one key of the first key-value family is mastered by at least one node of the plurality nodes and at least one other key of the first key-value family is mastered by at least one other node of the plurality of nodes; wherein each and every node of the cluster of nodes is configured to obtain values for any key of the first key-value family.

10. The computer system of claim 1, wherein the key-value family is unique at least amongst all keys of the first key-value family.

11. A computer system comprising: a key-value data repository comprising a first key-value family mapping keys to data block identifiers and a second key-value family mapping data block identifiers to data blocks; one or more processors configured to: receive a search parameter; derive a search criterion from the search parameter; use the search criterion to obtain one or more first values from the first-key value family; use the one or more first values to obtain one or more compressed values from the second key-value family; wherein the first key-value family comprises a first set of unique keys, each key in the first set of unique keys mapping to one or more values; wherein the second key-value family comprises a second set of unique keys, each key in the second set of unique keys mapping to at least one compressed value; uncompress the one or more compressed values to produce one or more uncompressed values; use the search criterion to identify one or more portions of the one or more uncompressed values; return the one or more portions of the one or more uncompressed values as search results.

12. The computer system of claim 11, wherein at least one of the one or more first values comprises an identifier of a compressed value of the one or more compressed values.

13. The computer system of claim 11, wherein at least one of the one or more first values comprises a key of the second key-value family.

14. The computer system of claim 11, wherein the one or more processors are configured to use the search criterion to identify one or more portions of the one or more uncompressed values by using the search criterion to obtain one or more second values from the first-key value family of the key-value data repository, wherein the one or more second values identify a byte sequential portion of one of the one or more uncompressed values.

15. The computer system of claim 11, wherein at least one of the one or more first values comprises information identifying a byte sequential portion of one of the one or more uncompressed values.

16. The computer system of claim 11, wherein each key of the first key-value family is unique at least amongst all keys of the first key-value family.

17. The computer system of claim 11, wherein the one or more processors are configured to use the search criterion to obtain the one or more first values by selecting a key from the first key-value family that equals the search criterion.

18. The computer system of claim 11: wherein the key-value data repository comprises a cluster of a plurality of computing nodes; wherein at least one key of the first key-value family is mastered by at least one node of the plurality nodes and at least one other key of the first key-value family is mastered by at least one other node of the plurality of nodes; wherein each and every node of the cluster of nodes is configured to obtain values for any key of the first key-value family.
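
The lookup sequence recited in claim 1 can be illustrated with a short sketch. It is a hedged reading of the claim language, not the patentee's code: the dictionaries `first_family` and `second_family`, the `blk-*` identifiers, the sample records, the lowercase normalization used to derive the criterion, and the choice of `zlib` are all hypothetical.

```python
import zlib

# Made-up records grouped into two data blocks (illustration only).
records_a = ["10.0.0.1 GET /login alice", "10.0.0.2 GET /home bob"]
records_b = ["10.0.0.3 GET /home carol", "10.0.0.2 GET /logout bob"]

# Second key-value family: data block identifier -> compressed data block.
second_family = {
    "blk-0": zlib.compress("\n".join(records_a).encode("utf-8")),
    "blk-1": zlib.compress("\n".join(records_b).encode("utf-8")),
}

# First key-value family: key -> one or more data block identifiers.
first_family = {
    "alice": ["blk-0"],
    "bob": ["blk-0", "blk-1"],
    "carol": ["blk-1"],
}


def search(search_parameter, first_family, second_family):
    """Return the stored records that match the search parameter."""
    # Derive a search criterion from the parameter (assumed normalization).
    criterion = search_parameter.strip().lower()
    # Use the criterion to obtain the first values (block identifiers).
    block_ids = first_family.get(criterion, [])
    results = []
    for block_id in block_ids:
        # Use each first value to obtain a compressed value, then uncompress it.
        uncompressed = zlib.decompress(second_family[block_id]).decode("utf-8")
        # Use the criterion again to pick out the matching portions.
        for record in uncompressed.split("\n"):
            if criterion in record.lower().split():
                results.append(record)
    return results


hits = search(" Bob ", first_family, second_family)
```

Only the blocks named by the first family are decompressed, so the cost of a search in this sketch scales with the number of matching blocks rather than with the size of the whole data set.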

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • using context · CPC title

  • Digital computing or data processing equipment or methods, specially adapted for specific functions (information retrieval, database structures or file system structures therefor G06F16/00) · CPC title

  • File search processing · CPC title

  • Data format conversion from or to a database · CPC title

  • G06F16/902 · Primary

    using more than one table in sequence, i.e. systems with three or more layers · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9852144B2 cover?
A data analysis system is proposed for providing fine-grained low latency access to high volume input data from possibly multiple heterogeneous input data sources. The input data is parsed, optionally transformed, indexed, and stored in a horizontally-scalable key-value data repository where it may be accessed using low latency searches. The input data may be compressed into blocks before being…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/902. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 26, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).