Query integration across databases and file systems

US10997124B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10997124-B2
Application number: US-201314781896-A
Country: US
Kind code: B2
Filing date: Apr 2, 2013
Priority date: Apr 2, 2013
Publication date: May 4, 2021
Grant date: May 4, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Query integration across databases and file systems is disclosed. An example method may include streaming data managed by a first database file system for a query. The method may also include streaming data managed by a second database file system for the query. The method may also include joining the streaming data managed by the first database file system with the streaming data managed by the second database file system.
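As a rough illustration of joining two streamed inputs without waiting for either to finish, the sketch below uses a symmetric hash join over two Python iterators. The abstract does not name a join algorithm, so this technique, the function name `symmetric_hash_join`, and the key-extractor parameters are assumptions chosen for illustration, not the patented method.

```python
from itertools import zip_longest

_DONE = object()  # sentinel marking an exhausted input stream


def symmetric_hash_join(left, right, left_key, right_key):
    """Join two record streams, emitting matches as soon as they are known.

    Hypothetical helper: records are pulled alternately from both streams,
    and each new record probes the records already seen on the other side.
    """
    left_seen = {}   # key -> left records seen so far
    right_seen = {}  # key -> right records seen so far
    for l, r in zip_longest(left, right, fillvalue=_DONE):
        if l is not _DONE:
            k = left_key(l)
            left_seen.setdefault(k, []).append(l)
            for match in right_seen.get(k, ()):
                yield (l, match)
        if r is not _DONE:
            k = right_key(r)
            right_seen.setdefault(k, []).append(r)
            for match in left_seen.get(k, ()):
                yield (match, r)
```

Each matching pair is emitted exactly once, when the later of its two records arrives. The trade-off of this design is memory: both sides are retained in hash tables so that late arrivals can still find their matches.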

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of query integration across databases and file systems, comprising: streaming data managed by a first database file system for a query, comprising replacing, utilizing a first connector for executing queries issued by a second database file system, a table source for a query engine executing the query with a function-scan that generates the streaming data from the first database file system, making a data source of the query a record generation function, wherein the streaming of the data managed by the first database file system further comprises iteratively retrieving data from the first database file system tuple-by-tuple, and the function-scan allows non-blocking output of the iterative tuple-by-tuple retrieval to generate the streaming data managed by the first database file system; streaming data managed by the second database file system for the query, including utilizing a second connector to set a job input format to a first input format class for receiving database record objects from the second database file system and to utilize a second input class to accept the database record objects as input to the first database file system; and joining the streaming data managed by the first database file system with the streaming data managed by the second database file system.

2. The method of claim 1, further comprising persisting joined data in a format supported by one of the first database file system and the second database file system.

3. The method of claim 1, further comprising accessing data in a relational database (RDB) from a Hadoop file system (HDFS), including giving a database input format (DbInputFormat) class a Structured Query Language (SQL) query to extract data from the RDB, wherein the first input format class is the DbInputFormat class.

4. The method of claim 1, further comprising writing data to a relational database (RDB) from a Hadoop file system (HDFS) by: setting an output value class of a Hadoop job to a database record; setting details of a table for storing data in a database output format class; and creating a reduce class that adds data to a database record object and calls a write function to store the data.

5. The method of claim 1, further comprising joining and storing query results to a relational database (RDB) by: retrieving data from the RDB by table-scan; retrieving data from a Hadoop file system (HDFS) database by the function-scan; joining the retrieved data from the RDB and the HDFS database; and writing joined data to the RDB.

6. The method of claim 1, further comprising joining and storing query results to a Hadoop file system (HDFS) database by: retrieving data from a relational database (RDB) by table-scan; retrieving data from the HDFS database by the function-scan; joining the retrieved data from the RDB and the HDFS database; and persisting joined data in a format supported by the HDFS.

7. A system of query integration across databases and file systems, comprising a query engine stored on a non-transient computer-readable medium and executable by a processor to: stream data managed by a first database file system, including replacing, by the query engine and further utilizing a first connector for executing queries issued by a second database file system, a table source for the query engine with a function-scan that generates the streamed data managed by the first database file system, making a data source of a query a record generation function, wherein to stream the data managed by the first database file system further comprises iteratively retrieving data from the first database file system tuple-by-tuple, and the function-scan allows non-blocking output of the iterative tuple-by-tuple retrieval to generate the streamed data managed by the first database file system; stream data managed by the second database file system utilizing a second connector to set a job input format to a first input format class to receive database record objects from the second database file system and to create a second input class to accept the database record objects as input to the first database file system; and join the streamed data managed by the first database file system with the streamed data managed by the second database file system.

8. The system of claim 7, further comprising a Record Iterator (RIR) configured to retrieve data from a Hadoop file system (HDFS) database and output data processing results record-by-record.

9. The system of claim 8, wherein the function-scan comprises a Stream Source Function (SSF) to read data records from the RIR.

10. The system of claim 9, wherein the RIR serves as a data source of the SSF and the SSF serves as the data source of the query.

11. The system of claim 7, wherein the query engine treats a relation database (RDB) as a Hadoop data source and sink.

12. The system of claim 7, wherein the query engine treats a Hadoop file system (HDFS) database as a relational database (RDB) data source.

13. The system of claim 12, wherein the query engine joins data from the HDFS database with data from the RDB.

14. The system of claim 12, wherein the query engine uses the function-scan for streaming in data from the HDFS database without materializing data statically or dynamically.

15. A non-transitory computer-readable medium containing instructions executable by a processor to cause the processor to: stream data managed by a first database file system, including replacing, utilizing a first connector for executing queries issued by a second database file system, a table source with a function-scan that generates the streamed data managed by the first database file system, making a data source of a query a record generation function, wherein to stream the data managed by the first database file system further comprises iteratively retrieving data from the first database file system tuple-by-tuple, and the function-scan allows non-blocking output of the iterative tuple-by-tuple retrieval to generate the streamed data managed by the first database file system; stream data managed by the second database file system utilizing a second connector to set a job input format to a first input format class to receive database record objects from the second database file system and to create a second input class to accept the database record objects as input to the first database file system; and join the streamed data managed by the first database file system with the streamed data managed by the second database file system.

16. The non-transitory computer-readable medium of claim 15, wherein the instructions are further executable by the processor to cause the processor to retrieve data, via a Record Iterator (RIR), from a Hadoop file system (HDFS) database and output data processing results record-by-record.

17. The non-transitory computer-readable medium of claim 15, wherein the function-scan comprises a Stream Source Function (SSF).

18. The non-transitory computer-readable medium of claim 17, wherein a Record Iterator (RIR) serves as a data source of the SSF.
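The flow recited in claims 5 and 8 to 10 (a Record Iterator feeding a Stream Source Function, whose output is joined with a table-scan and written back to the RDB) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the RDB, a list of comma-separated strings stands in for HDFS data, and the names `record_iterator`, `stream_source_function`, `join_and_store`, `users`, and `visits` are hypothetical, not taken from the patent.

```python
import sqlite3


def record_iterator(lines):
    """Hypothetical RIR stand-in: parse and yield one record at a time."""
    for line in lines:
        user_id, page = line.rstrip("\n").split(",")
        yield int(user_id), page


def stream_source_function(rir):
    """Hypothetical SSF stand-in: the query reads from this generator
    instead of a table, so each tuple is available as soon as it is read."""
    for record in rir:
        yield record


def join_and_store(conn, hdfs_lines):
    # Table-scan side: read the (small) relational table.
    users = dict(conn.execute("SELECT id, name FROM users"))
    conn.execute("CREATE TABLE IF NOT EXISTS visits (name TEXT, page TEXT)")
    # Function-scan side: stream tuples and join them one by one.
    for user_id, page in stream_source_function(record_iterator(hdfs_lines)):
        name = users.get(user_id)
        if name is not None:
            conn.execute("INSERT INTO visits VALUES (?, ?)", (name, page))
    conn.commit()


# Demo with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "lin")])
join_and_store(conn, ["1,/home", "3,/about", "2,/docs"])
rows = conn.execute("SELECT name, page FROM visits ORDER BY rowid").fetchall()
```

Because both helpers are generators, the streamed side is never collected into a list before the join. That loosely mirrors the "without materializing data" language of claim 14, though a real query engine would implement the function-scan inside its executor rather than in application code.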

Assignees

Hewlett Packard Development Co, Micro Focus Llc

Inventors

Classifications

  • G06F16/258 · Primary

    Data format conversion from or to a database · CPC title

  • Distributed queries · CPC title

  • implemented using Network-attached Storage [NAS] architecture (distributed or networked storage systems G06F3/067; protocols for distributed storage of data in a network H04L67/1097) · CPC title

  • using file content signatures, e.g. hash values · CPC title

  • Data stream processing; Continuous queries · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10997124B2 cover?
Query integration across databases and file systems is disclosed. An example method may include streaming data managed by a first database file system for a query. The method may also include streaming data managed by a second database file system for the query. The method may also include joining the streaming data managed by the first database file system with the streaming data managed by th…
Who is the assignee on this patent?
Hewlett Packard Development Co, Micro Focus Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/258. Mapped technology areas include Physics.
When was this patent published?
Publication date May 4, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).