Integrating Kafka data-in-motion with data-at-rest tables
US-2020125572-A1 · Apr 23, 2020 · US
US10936616B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10936616-B2 |
| Application number | US-201514733691-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 8, 2015 |
| Priority date | Jun 9, 2014 |
| Publication date | Mar 2, 2021 |
| Grant date | Mar 2, 2021 |
A storage system communicatively coupled to a database management system (DBMS) performs storage-side scanning of data sources that are not stored in the native database storage format of the DBMS. Data sources for external tables are accessible in a storage system referred to as a distributed data access system (DDAS), e.g., a Hadoop Distributed File System. To execute a query that references an external table, the DBMS first generates an execution plan. The DDAS supplies the DBMS with information that specifies each portion of the data source and which data node to use to access that portion. The DBMS sends a request for each portion to the respective data node, asking the data node to generate rows from the data in that portion. The request may specify scanning criteria (one or more columns to project and/or filter on) and code modules for the data node to execute to generate records.
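The request flow described in the abstract can be sketched in Python. This is a minimal, in-memory sketch under stated assumptions: the names `InputSplit`, `ScanCriteria`, `DataNode`, `DDAS`, and `execute_external_scan` are hypothetical and do not come from the patent, from Hadoop, or from any DBMS API; data portions are modelled as lists of dicts; and retrieval requests are sent one after another, whereas a real DBMS would likely dispatch its work granules in parallel.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class InputSplit:
    """Maps one portion of the external data source to the node that stores it."""
    portion_id: str
    node_id: str


@dataclass
class ScanCriteria:
    """Scanning criteria carried by a retrieval request (hypothetical shape)."""
    columns: List[str]                                  # columns to project
    predicate: Optional[Callable[[Row], bool]] = None   # optional row filter


class DataNode:
    def __init__(self, node_id: str, portions: Dict[str, List[Row]]):
        self.node_id = node_id
        self.portions = portions  # portion_id -> records stored on this node

    def scan(self, portion_id: str, criteria: ScanCriteria) -> List[Row]:
        # Storage-side scan: filter and project next to the data, so only
        # qualifying rows and requested columns are returned to the DBMS.
        out: List[Row] = []
        for record in self.portions[portion_id]:
            if criteria.predicate is None or criteria.predicate(record):
                out.append({c: record[c] for c in criteria.columns})
        return out


class DDAS:
    def __init__(self, nodes: List[DataNode]):
        self.nodes = {n.node_id: n for n in nodes}

    def profile(self) -> List[InputSplit]:
        # Answer the DBMS profile request: one input split per stored portion.
        return [InputSplit(pid, n.node_id)
                for n in self.nodes.values() for pid in n.portions]


def execute_external_scan(ddas: DDAS, criteria: ScanCriteria) -> List[Row]:
    # DBMS side: one work granule per input split; each retrieval request goes
    # to the data node named in that split (sequentially, for simplicity).
    rows: List[Row] = []
    for split in ddas.profile():
        rows.extend(ddas.nodes[split.node_id].scan(split.portion_id, criteria))
    return rows


# Example: two data nodes, each holding one portion of a hypothetical table.
ddas = DDAS([
    DataNode("dn1", {"part-0": [{"id": 1, "region": "EU", "amount": 10},
                                {"id": 2, "region": "US", "amount": 25}]}),
    DataNode("dn2", {"part-1": [{"id": 3, "region": "EU", "amount": 40}]}),
])
criteria = ScanCriteria(columns=["id", "amount"],
                        predicate=lambda r: r["region"] == "EU")
result = execute_external_scan(ddas, criteria)
print(result)  # [{'id': 1, 'amount': 10}, {'id': 3, 'amount': 40}]
```

Because the filter and projection run on the data node, only qualifying rows and the requested columns cross the network to the DBMS, which is the motivation for storage-side scanning.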
What is claimed is:
1. A method for execution by a distributed data access system, said distributed data access system comprising a plurality of data nodes and an external data source that comprises a first plurality of columns having a first plurality of data types, the method comprising: receiving a profile request from a database management system (DBMS) for profile information about the external data source, wherein the distributed data access system does not contain the DBMS; returning said profile information, said profile information specifying a plurality of input splits for the external data source, wherein each input split of said plurality of input splits maps a separate portion of a plurality of data portions of the external data source to a respective data node that stores said separate portion of the plurality of data portions, wherein each data node of the plurality of data nodes comprises at least one hardware processor; for each data node of said plurality of data nodes: a) respectively receiving a retrieval request from the DBMS for particular rows that satisfy one or more scanning criteria, said data node corresponding to a particular input split belonging to said plurality of input splits, said retrieval request specifying said one or more scanning criteria and a data type converter; b) according to said one or more scanning criteria, said data node generating said particular rows from a data source split corresponding to said input split, wherein: said particular rows contain converted values that are generated by the data type converter using a column mapping from the first plurality of columns having the first plurality of data types of the external data source to a second plurality of columns having a second plurality of data types that are native to the DBMS, the second plurality of columns are part of an external table that is based on the external data source, the external table, including that the second plurality of columns respectively have the second plurality of data types, is defined in a data dictionary of the DBMS, said converted values have said data types that are native to the DBMS, said data source split stores data in a storage format that is not a native database storage format; and c) returning said particular rows to said DBMS, said particular rows being returned in a format supported by said DBMS.
2. The method of claim 1, wherein for a particular data node of said plurality of data nodes, said generating said particular rows includes: generating certain rows from data from said data source split, said certain rows having said format supported by said DBMS; and applying said one or more scanning criteria to generate said particular rows.
3. The method of claim 2, wherein said generating certain rows includes: generating records and columns from said data from said data source split; and converting said records and columns to said certain rows.
4. The method of claim 3, wherein said retrieval request received by said particular data node specifies one or more code modules, wherein execution of said one or more code modules causes generating said records and columns.
5. A method comprising: storing, within a data dictionary of a database management system (DBMS), metadata that defines data types of columns of an external table that is based on an external data source; the DBMS generating an execution plan for a query that requires access to an external table, wherein data for said external table is stored in a data source that: a) is accessible on a distributed data access system comprising a plurality of data nodes, and b) comprises a plurality of data portions, wherein the distributed data access system does not contain the DBMS, wherein generating said execution plan includes: sending to said distributed data access system a profile request for a profile information of said data source; receiving said profile information, said profile information specifying a plurality of input splits for the data source, wherein each input split of said plurality of input splits maps a respective data node to a respective portion of said plurality of data portions; and generating a plurality of external work granules for generating rows from said data source that satisfy one or more scanning criteria, each external work granule of said plurality of external work granules being for generating respective rows from a respective input split that satisfy said one or more scanning criteria; said DBMS executing said execution plan, wherein executing said execution plan comprises, for a particular work granule of said plurality of external work granules, for each data node of said plurality of data nodes corresponding to the respective input split of said particular work granule: a) sending a retrieval request from the DBMS to data node, said retrieval request to the data node requesting the respective rows for said respective input split, said retrieval request to the data node specifying said one or more scanning criteria; b) said data node generating said respective rows that contain values that have said data types of said columns as defined by said metadata; and c) receiving the respective rows from said data node.
6. The method of claim 5, wherein for each data node of said plurality of data nodes, said receiving the respective rows from said each data node includes receiving the respective rows in a row format supported by said DBMS.
7. The method of claim 6, wherein said data source stores data in a storage format that is not a native database storage format.
8. The method of claim 5, wherein for each data node of said plurality of data nodes, the retrieval request identifies code modules to execute to generate particular rows from the respective portion for said respective input split.
9. The method of claim 8, wherein: for each data node of said plurality of data nodes, said code modules are configured to generate said particular rows from data that is in a storage format that is not a native database storage format.
10. The method of claim 8, wherein for each data node of said plurality of data nodes, said code modules comprise: one or more record reader modules for generating records from said data source; and one or more column reader modules for generating column values from records generated by said one or more record reader modules.
11. One or more non-transitory computer-readable storage media storing instructions which, when executed by a distributed data access system comprising a plurality of data nodes and an external data source that comprises a first plurality of columns having a first plurality of data types, cause: receiving a profile request from a database management system (DBMS) for profile information about the external data source, wherein the distributed data access system does not contain the DBMS; returning said profile information, said profile information specifying a plurality of input splits for the external data source, wherein each input split of said plurality of input splits maps a separate portion of a plurality of data portions of the external data source to a respective data node that stores said separate portion of the plurality of data portions, wherein each data node of the plurality of data nodes comprises at least one hardware processor; for each data node of said plurality of data nodes: a) respectively receiving a retrieval request from the DBMS for particular rows that satisfy one or more scanning criteria, said data node corresponding to a particular input split belonging to said plurality of input splits, said retrieval request specifying said one or more scanning criteria and a data type co
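The data-node side of claims 1 to 4 and 10 (a record reader, a column reader, a data type converter driven by a column mapping, then the scanning criteria) might look like the sketch below. It is an illustration of the claimed steps under stated assumptions, not the patentee's implementation: it assumes a comma-delimited text split, the helper functions and the column names `c0`, `c1`, `c2`, `ORDER_ID`, `ORDER_DATE`, and `AMOUNT` are hypothetical, and Python `int`, `Decimal`, and `date` stand in for DBMS-native types.

```python
import csv
from datetime import date
from decimal import Decimal
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical column mapping: external column -> (DBMS column, type converter).
ColumnMapping = Dict[str, Tuple[str, Callable[[str], object]]]


def record_reader(split_bytes: bytes) -> Iterator[str]:
    # Record reader: turn a raw, non-native data source split into records.
    for line in split_bytes.decode("utf-8").splitlines():
        if line:
            yield line


def column_reader(record: str, names: List[str]) -> Dict[str, str]:
    # Column reader: split one record into named, still-untyped column values.
    return dict(zip(names, next(csv.reader([record]))))


def convert(columns: Dict[str, str], mapping: ColumnMapping) -> Dict[str, object]:
    # Data type converter: apply the column mapping to get DBMS-native types.
    return {db_col: to_native(columns[ext_col])
            for ext_col, (db_col, to_native) in mapping.items()}


def generate_rows(split_bytes: bytes, names: List[str], mapping: ColumnMapping,
                  project: List[str],
                  predicate: Optional[Callable[[Dict[str, object]], bool]] = None
                  ) -> List[tuple]:
    # Claim 2 order: build rows in the DBMS format first, then apply the
    # scanning criteria (filter, then project) to keep only particular rows.
    rows: List[tuple] = []
    for record in record_reader(split_bytes):
        row = convert(column_reader(record, names), mapping)
        if predicate is None or predicate(row):
            rows.append(tuple(row[c] for c in project))
    return rows


raw_split = b"1,2020-01-15,19.99\n2,2019-07-04,5.00\n"
mapping: ColumnMapping = {
    "c0": ("ORDER_ID", int),
    "c1": ("ORDER_DATE", date.fromisoformat),
    "c2": ("AMOUNT", Decimal),
}
rows = generate_rows(raw_split, ["c0", "c1", "c2"], mapping,
                     project=["ORDER_ID", "AMOUNT"],
                     predicate=lambda r: r["ORDER_DATE"].year == 2020)
print(rows)  # [(1, Decimal('19.99'))]
```

Keeping the record reader, column reader, and converter as separate functions mirrors claim 10's split between record reader modules and column reader modules, so a different storage format would only need a different reader while the type conversion and scanning criteria stay the same.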
Plan optimisation · CPC title
Data format conversion from or to a database · CPC title