Adaptive data clustering for databases
US-2021117447-A1 · Apr 22, 2021 · US
US11727013B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11727013-B2 |
| Application number | US-202217930150-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 7, 2022 |
| Priority date | Apr 9, 2021 |
| Publication date | Aug 15, 2023 |
| Grant date | Aug 15, 2023 |
Official abstract text for this publication.
Hybrid tables can be used in different use-case scenarios. Hybrid tables provide a flexible mechanism to support files and data in different formats while providing access to the different types of data as part of one table. This flexibility can allow the use of hybrid tables in data lake or other similar environments.
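The idea of reading files in different formats as one table can be illustrated with a minimal sketch. The abstract does not name any concrete formats or APIs, so everything here is an assumption made for illustration: the `Partition` and `HybridTable` names, the choice of CSV and JSON Lines as the two formats, and the in-memory payloads standing in for stored files.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Partition:
    """One file backing the table; `fmt` names its format (hypothetical type)."""
    fmt: str      # "csv" or "jsonl" in this sketch
    payload: str  # file contents, held in memory for illustration


def read_rows(part: Partition) -> Iterator[Dict[str, str]]:
    """Decode a partition into plain dict rows, whatever its format."""
    if part.fmt == "csv":
        yield from csv.DictReader(io.StringIO(part.payload))
    elif part.fmt == "jsonl":
        for line in part.payload.splitlines():
            if line.strip():
                # Stringify values so both formats yield the same row shape.
                yield {k: str(v) for k, v in json.loads(line).items()}
    else:
        raise ValueError(f"unsupported format: {part.fmt}")


class HybridTable:
    """Presents partitions in mixed formats as a single table (hypothetical)."""

    def __init__(self, partitions: List[Partition]) -> None:
        self.partitions = partitions

    def scan(self) -> Iterator[Dict[str, str]]:
        """Yield every row of every partition in one uniform shape."""
        for part in self.partitions:
            yield from read_rows(part)


table = HybridTable([
    Partition("csv", "id,city\n1,Oslo\n2,Lima\n"),
    Partition("jsonl", '{"id": 3, "city": "Pune"}\n'),
])
rows = list(table.scan())
```

A caller iterating over `table.scan()` sees uniform rows and does not need to know which partition used which format, which is the kind of flexibility the abstract attributes to hybrid tables.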
Opening claim text (preview).
What is claimed is:
1. A method comprising: storing a first set of data in a first format in a first cloud storage location; storing a second set of data in a second format in a second cloud storage location; classifying a first subset of the first set of data in the first format as high-value data and classifying a second subset of the first set of data as low-value data; and ingesting a copy of the high-value data from the first cloud storage location into the second cloud storage location in the second format, wherein the first subset of the first set of data in the first format is maintained and not deleted in the first cloud storage location in response to ingesting the copy of the high-value data; providing an interface for accessing the first and second sets of data; receiving, via the interface, a first query referencing the first and second sets of data; determining that the first query references the first subset of the first data; executing the first query using the first subset of data in the second cloud storage location in the second format and the second set of data; receiving, via the interface, a second query referencing the first and second sets of data, determining that the second query references a second subset of the first set of data not ingested into the second cloud storage location; converting the second subset of the first set of data from the first format into a common format; converting the second set of data from the second format into the common format; joining the second subset of the first set of data in the common format and the second set of data in the common format to generate joined data; and executing the second query based on the joined data.
2. The method of claim 1, wherein the first cloud storage location is in an external cloud storage location and wherein the second cloud storage location is a network-based data warehouse system, wherein the first format is a raw format and the second format is a formatted format used by the network-based data warehouse system.
3. The method of claim 1, wherein the classifying is performed based on query patterns.
4. The method of claim 1, wherein the classifying is performed based on scan statistics.
5. The method of claim 1, wherein the classifying is performed based on metadata received from a client.
6. The method of claim 1, further comprising: re-classifying the first subset from high-value data to low-value data; and in response to re-classifying, deleting the ingested copy of the first subset in the second format.
7. A machine-storage medium embodying instructions that, when executed by a machine, cause the machine to perform operations comprising: storing a first set of data in a first format in a first cloud storage location; storing a second set of data in a second format in a second cloud storage location; classifying a first subset of the first set of data in the first format as high-value data and classifying a second subset of the first set of data as low-value data; and ingesting a copy of the high-value data from the first cloud storage location into the second cloud storage location in the second format, wherein the first subset of the first set of data in the first format is maintained and not deleted in the first cloud storage location in response to ingesting the copy of the high-value data; providing an interface for accessing the first and second sets of data; receiving, via the interface, a first query referencing the first and second sets of data; determining that the first query references the first subset of the first data; executing the first query using the first subset of data in the second cloud storage location in the second format and the second set of data; receiving, via the interface, a second query referencing the first and second sets of data; determining that the second query references a second subset of the first set of data not ingested into the second cloud storage location; converting the second subset of the first set of data from the first format into a common format; converting the second set of data from the second format into the common format; joining the second subset of the first set of data in the common format and the second set of data in the common format to generate joined data; and executing the second query based on the joined data.
8. The machine-storage medium of claim 7, wherein the first cloud storage location is in an external cloud storage location and wherein the second cloud storage location is a network-based data warehouse system, wherein the first format is a raw format and the second format is a formatted format used by the network-based data warehouse system.
9. The machine-storage medium of claim 7, wherein the classifying is performed based on query patterns.
10. The machine-storage medium of claim 7, wherein the classifying is performed based on scan statistics.
11. The machine-storage medium of claim 7, wherein the classifying is performed based on metadata received from a client.
12. The machine-storage medium of claim 7, further comprising: re-classifying the first subset from high-value data to low-value data; and in response to re-classifying, deleting the ingested copy of the first subset in the second format.
13. A system comprising: at least one hardware processor; and at least one memory storing instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform operations comprising: storing a first set of data in a first format in a first cloud storage location; storing a second set of data in a second format in a second cloud storage location; classifying a first subset of the first set of data in the first format as high-value data and classifying a second subset of the first set of data as low-value data; and ingesting a copy of the high-value data from the first cloud storage location into the second cloud storage location in the second format, wherein the first subset of the first set of data in the first format is maintained and not deleted in the first cloud storage location in response to ingesting the copy of the high-value data; providing an interface for accessing the first and second sets of data; receiving, via the interface, a first query referencing the first and second sets of data; determining that the first query references the first subset of the first data; executing the first query using the first subset of data in the second cloud storage location in the second format and the second set of data; receiving, via the interface, a second query referencing the first and second sets of data; determining that the second query references a second subset of the first set of data not ingested into the second cloud storage location; converting the second subset of the first set of data from the first format into a common format; converting the second set of data from the second format into the common format; joining the second subset of the first set of data in the common format and the second set of data in the common format to generate joined data; and executing the second query based on the joined data.
14. The system of claim 13, wherein the first cloud storage location is in an external cloud storage location and wherein the second cloud storage location is a network-based data warehouse system, wherein the first format is a raw format and the second format is a formatted format used by the network-based data warehouse system.
15. The system of claim 13, wherein the classifying is performed based on query
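
The flow recited in the independent claims (classify, ingest a copy, route queries, re-classify) can be sketched in a few lines. This is an illustrative toy, not the patented implementation: the dictionaries standing in for the two storage locations, the subset names, the scan-count threshold, and every function name are assumptions. The second data set is already held as dict rows here, so its conversion to the common format is a no-op in this sketch.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical stand-ins for the two storage locations.
external_raw: Dict[str, str] = {        # subset name -> raw comma-separated text
    "orders_2023": "1,open\n2,closed",
    "orders_2019": "7,closed",
}
warehouse: Dict[str, List[Row]] = {}    # subset name -> ingested copy
native_notes: List[Row] = [             # second data set, already in warehouse form
    {"id": "1", "note": "rush"},
    {"id": "7", "note": "gift"},
]
scan_counts: Counter = Counter()        # assumed form of "scan statistics"
HIGH_VALUE_MIN_SCANS = 3                # assumed threshold, not from the claims


def parse_raw(text: str) -> List[Row]:
    """Convert raw text into the common row format (a list of dicts)."""
    return [dict(zip(("id", "status"), line.split(","))) for line in text.splitlines()]


def sync_warehouse() -> None:
    """Ingest copies of high-value subsets; delete copies that went low-value."""
    high = {n for n in external_raw if scan_counts[n] >= HIGH_VALUE_MIN_SCANS}
    for name in high - set(warehouse):
        warehouse[name] = parse_raw(external_raw[name])  # raw original is kept
    for name in set(warehouse) - high:
        del warehouse[name]                              # only the copy is deleted


def read_subset(name: str) -> List[Row]:
    """Use the ingested copy when present, otherwise convert the raw data."""
    scan_counts[name] += 1
    if name in warehouse:
        return warehouse[name]
    return parse_raw(external_raw[name])


def query(name: str) -> List[Row]:
    """Join one subset of the first data set with the second data set on `id`."""
    notes_by_id = {r["id"]: r for r in native_notes}
    return [{**row, **notes_by_id[row["id"]]}
            for row in read_subset(name) if row["id"] in notes_by_id]


for _ in range(3):
    query("orders_2023")                # repeated scans make this subset high-value
sync_warehouse()
ingested = sorted(warehouse)            # subsets now held in the warehouse
cold_result = query("orders_2019")      # served by converting the raw data

scan_counts.clear()                     # simulate the subset going cold
sync_warehouse()
after_reclassify = sorted(warehouse)
```

In this sketch the raw data in `external_raw` is never modified, mirroring the claim language that the first subset is maintained in the first location after ingestion; re-classification removes only the ingested copy.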
Data stream processing; Continuous queries · CPC title
Join order optimisation · CPC title
Integrating or interfacing systems involving database management systems · CPC title