Querying Data Records Stored On A Distributed File System

US2018060341A1 · US · A1

Patent metadata
Publication number: US-2018060341-A1
Application number: US-201615254467-A
Country: US
Kind code: A1
Filing date: Sep 1, 2016
Priority date: Sep 1, 2016
Publication date: Mar 1, 2018
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for query large database records are disclosed. An example method includes: obtaining a first search query including a first keyword; accessing a relational database that stores a mapping between one or more keywords and a data record location associated with a distributed file system (DFS). The data record location identifies a location on the DFS at which a data record matching the one or more keywords is stored. The method also includes, determining, using a relational database, a first data record location based on the first keyword; identifying a first data record based on the first data record location; and providing the first data record as a matching record responsive to the first search query.
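
The lookup flow in the abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes SQLite as the relational database, uses a plain dictionary (`FAKE_DFS`) as a stand-in for the DFS, and invents the table name `keyword_index`, the paths, and the sample records purely for demonstration.

```python
import sqlite3

# Hypothetical stand-in for the DFS: maps a record location to its contents.
FAKE_DFS = {
    "/dfs/node1/records/0001": "txn 0001: alice pays bob 20 USD",
    "/dfs/node2/records/0002": "txn 0002: carol pays alice 5 USD",
}


def build_index(conn):
    """Create the keyword -> location mapping (table name is an assumption)."""
    conn.execute("CREATE TABLE keyword_index (keyword TEXT, location TEXT)")
    rows = [
        ("alice", "/dfs/node1/records/0001"),
        ("bob", "/dfs/node1/records/0001"),
        ("alice", "/dfs/node2/records/0002"),
        ("carol", "/dfs/node2/records/0002"),
    ]
    conn.executemany("INSERT INTO keyword_index VALUES (?, ?)", rows)


def lookup_locations(conn, keyword):
    """Determine record locations for a keyword using the relational database."""
    cur = conn.execute(
        "SELECT location FROM keyword_index WHERE keyword = ? ORDER BY location",
        (keyword,),
    )
    return [row[0] for row in cur.fetchall()]


def query(conn, dfs, keyword):
    """Identify each matching record by its location and return its contents."""
    return [dfs[location] for location in lookup_locations(conn, keyword)]


conn = sqlite3.connect(":memory:")
build_index(conn)
print(query(conn, FAKE_DFS, "bob"))
```

The point of the split is that the keyword search runs entirely against the small relational index, and the DFS is touched only to fetch records whose locations are already known.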

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: obtaining a first search query including a first keyword; accessing a relational database that stores a mapping between one or more keywords and a data record location associated with a distributed file system (DFS), wherein the data record location identifies a location on the DFS at which a data record matching the one or more keywords is stored; determining, using the relational database, a first data record location based on the first keyword; identifying a first data record based on the first data record location; and providing the first data record as a matching record responsive to the first search query.

2. The method of claim 1, wherein the mapping is an inverted index mapping from the one or more keywords to the data record location.

3. The method of claim 1, further comprising: retrieving, as part of a batch data processing, the first data record from the DFS.

4. The method of claim 1, wherein the search query includes a second keyword different from the first keyword; and further comprising: determining, using the relational database, the first data record location based on the second keyword.

5. The method of claim 1, further comprising: obtaining a second search query including a second keyword; determining, using the relational database, a second data record location based on the second keyword; identifying a second data record based on the second data record location; executing a batch data retrieval job to retrieve the first data record and the second data record; and providing the second data record as a matching record responsive to the second search query.

6. The method of claim 1, further comprising: acknowledging that the first search query has a first matching record store on the DFS.

7. The method of claim 6, wherein the acknowledging occurs as part of a stream data processing job.

8. The method of claim 1, wherein the DFS system includes a Hadoop database and the relational database is a SQL database.

9. The method of claim 1, wherein the one or more keywords include a plurality of keywords.

10. A system, comprising: a non-transitory memory; and one or more hardware processors coupled to the non-transitory memory and configured to execute instructions to perform operations comprising: receiving a first search query including a first keyword; receiving a second search query including a second keyword; accessing a relational database that stores a mapping between one or more keywords and a data record location associated with a distributed file system (DFS), wherein the data record location identifies a location on the DFS at which a data record matching the one or more keywords is stored; determining, using the relational database, a first data record location based on the first keyword and a second data record location based on the second keyword; identifying a first data record based on the first data record location and a second data record based on the second data record location; and performing a batch data processing job to retrieve the first data record and the second data record from the DFS.

11. The system of claim 10, wherein the operations further comprise: retrieving the first data record from a first data node associated with the DFS; and retrieving the second data record from a second data node associated with the DFS.

12. The system of claim 10, wherein the operations further comprising: responsive to determining the first data record location and the second data record location, acknowledging that matching records exist for the first search query and the second search query.

13. The system of claim 10, wherein receiving the first search query and receiving the second search query are part of a stream data processing job.

14. The system of claim 10, wherein the first data record and the second data records are greater than a predefined file size.

15. A non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations comprising: obtaining a first search query including a first keyword; obtaining a second search query including a second keyword; accessing a relational database that stores a mapping between one or more keywords and a data record location associated with a distributed file system (DFS), wherein the data record location identifies a location on the DFS at which a data record matching the one or more keywords is stored; determining, using the relational database, a first data record location based on the first keyword and a second data record location based on the second keyword; identifying a first data record based on the first data record location and a second data record based on the second data record location; and performing a batch data processing job to retrieve the first data record and the second data record from the DFS.

16. The non-transitory machine-readable medium of claim 15, wherein performing the batch data processing job comprises: requesting a name node to retrieve the first data record based on the first data record location and to retrieve the second data record based on the second data record location.

17. The non-transitory machine-readable medium of claim 16, wherein the operations further comprise: retrieving the first data record and the second data record from a same data node associated with the DFS.

18. The non-transitory machine-readable medium of claim 16, wherein the first query includes a request to modify the first data record based on the first keyword.

19. The non-transitory machine-readable medium of claim 16, wherein the one or more keywords include a plurality of keywords.

20. The non-transitory machine-readable medium of claim 16, wherein the DFS system includes a Hadoop database and the relational database is a SQL database.
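
The batch-oriented claims (two queries resolved through the index, an acknowledgement that matches exist, then one retrieval job across data nodes) might look roughly like the sketch below. It is an illustration under stated assumptions only: SQLite stands in for the relational database, the `DATA_NODES` dictionary stands in for DFS data nodes, and every name, path, and record is hypothetical rather than taken from the patent.

```python
import sqlite3
from collections import defaultdict

# Hypothetical data nodes: node name -> {path on that node: record contents}.
DATA_NODES = {
    "node1": {"/records/0001": "record about invoices"},
    "node2": {"/records/0002": "record about refunds"},
}


def make_index():
    """Build an in-memory keyword -> (node, path) index (schema is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keyword_index (keyword TEXT, node TEXT, path TEXT)")
    conn.executemany(
        "INSERT INTO keyword_index VALUES (?, ?, ?)",
        [
            ("invoice", "node1", "/records/0001"),
            ("refund", "node2", "/records/0002"),
        ],
    )
    return conn


def resolve(conn, keyword):
    """Return the (node, path) locations the index holds for a keyword."""
    return conn.execute(
        "SELECT node, path FROM keyword_index WHERE keyword = ? ORDER BY node, path",
        (keyword,),
    ).fetchall()


def batch_retrieve(conn, keywords):
    """Acknowledge matches per query, then fetch all records in one batch pass."""
    pending = defaultdict(list)   # node -> [(keyword, path), ...]
    acknowledged = {}             # keyword -> whether any location was found
    for keyword in keywords:
        locations = resolve(conn, keyword)
        acknowledged[keyword] = bool(locations)
        for node, path in locations:
            pending[node].append((keyword, path))

    results = defaultdict(list)   # keyword -> retrieved records
    for node, items in pending.items():
        # One visit per data node covers every record requested from it.
        for keyword, path in items:
            results[keyword].append(DATA_NODES[node][path])
    return acknowledged, dict(results)


ack, records = batch_retrieve(make_index(), ["invoice", "refund", "missing"])
print(ack)
print(records)
```

Separating the acknowledgement step from the retrieval step means a caller can learn quickly whether a match exists, while the slower reads from the file system are deferred and grouped by node.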

Assignees

Paypal Inc

Classifications

Primary CPC classification: G06F16/148. Mapped technology areas include Physics.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018060341A1 cover?
Systems and methods for query large database records are disclosed. An example method includes: obtaining a first search query including a first keyword; accessing a relational database that stores a mapping between one or more keywords and a data record location associated with a distributed file system (DFS). The data record location identifies a location on the DFS at which a data record mat…
Who is the assignee on this patent?
Paypal Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/148. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 1, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).