NOSQL Database Capacity Configuration Optimization System for Cloud Computing
US-2021342197-A1 · Nov 4, 2021 · US
US12423596B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12423596-B2 |
| Application number | US-202016918540-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 1, 2020 |
| Priority date | Jul 1, 2020 |
| Publication date | Sep 23, 2025 |
| Grant date | Sep 23, 2025 |
Official abstract text for this publication.
A data performance measurement of a computer system is measured. A future value of the data performance measurement is forecasted by executing a forecasting model. A set of throughput model input parameters is configured. A throughput requirement for the computer system is computed by executing a throughput model using the set of throughput model input parameters and the future value of the data performance measurement. A capacity requirement corresponding to the throughput requirement is determined. A resource within the computer system is deployed according to the capacity requirement.
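The steps in the abstract (measure, forecast, compute a throughput requirement, derive a capacity requirement) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the patent: the least-squares trend used as the forecasting model, the `SCAN_FRACTION` values, the throughput formula itself, and the function names `forecast_linear`, `throughput_requirement`, and `nodes_required`. The four query categories mirror those recited in claim 1, and the sketch corresponds to the variant that does not use a bloom filter.

```python
import math

# Hypothetical fraction of the stored dataset that one query in each
# category has to scan. Illustrative assumptions, not values from the patent.
SCAN_FRACTION = {
    "partitioned": 0.10,
    "bucketed": 0.25,
    "partitioned_and_bucketed": 0.05,
    "neither": 1.00,
}


def forecast_linear(series, steps_ahead=1):
    """Extrapolate a time series with an ordinary least-squares line.

    Stand-in for the forecasting model; the patent does not prescribe this one.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Query counts cannot be negative, so clamp the extrapolation at zero.
    return max(0.0, intercept + slope * (n - 1 + steps_ahead))


def throughput_requirement(queries_per_sec, dataset_gb, compression_pct):
    """Add together per-category throughputs (GB/s).

    Assumed formula: queries/s * stored GB * fraction scanned per query.
    """
    stored_gb = dataset_gb * (1 - compression_pct / 100)
    return sum(
        qps * stored_gb * SCAN_FRACTION[category]
        for category, qps in queries_per_sec.items()
    )


def nodes_required(throughput_gb_s, node_gb_s):
    """Capacity requirement: smallest node count covering the throughput."""
    return math.ceil(throughput_gb_s / node_gb_s)


if __name__ == "__main__":
    # Measured queries per second for each category over four periods.
    history = {
        "partitioned": [10, 12, 14, 16],
        "bucketed": [4, 4, 5, 5],
        "partitioned_and_bucketed": [8, 9, 9, 10],
        "neither": [1, 1, 2, 2],
    }
    forecast = {cat: forecast_linear(vals) for cat, vals in history.items()}
    required = throughput_requirement(forecast, dataset_gb=100, compression_pct=50)
    print(f"throughput requirement: {required:.1f} GB/s")
    print(f"nodes to deploy: {nodes_required(required, node_gb_s=25)}")
```

The final deployment step is left as a printed node count, since how a resource is actually provisioned depends on the cluster manager in use.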
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: measuring a data performance measurement of a shared-nothing architecture of a Hadoop and Hive implementation of a computer system, the data performance measurement including a number of queries performed on data stored in the computer system over a time series, the data performance measurement dividing a total number of queries according to data characteristics including queries to only partitioned data, only bucketed data, both partitioned and bucketed data, and neither partitioned nor bucketed data; forecasting, by executing a forecasting model, a future value of the data performance measurement on the time series of the computer system; configuring a set of throughput model input parameters; computing, by executing a throughput model using the set of throughput model input parameters and the future value of the data performance measurement of the computer system, a throughput requirement for the computer system being sized by computing without utilizing a bloom filter based on adding together throughputs for the queries with the data characteristics, wherein the throughputs are determined by a number of queries for the data characteristics, a size of a Hive dataset of storage devices in a cluster, and a compression percentage; determining a capacity requirement corresponding to the throughput requirement; and deploying, according to the capacity requirement, a resource within the computer system.
2. The computer-implemented method of claim 1, wherein the data performance measurement measures a characteristic of data being stored by the computer system.
3. The computer-implemented method of claim 1, wherein the data performance measurement measures a characteristic of a Hive implementation implemented on the computer system and the configured set of throughput model input parameters change a false positive rate of a bloom filter of the Hive implementation.
4. The computer-implemented method of claim 1, wherein the data performance measurement measures a characteristic of a set of queries performed on data being stored by the computer system.
5. The computer-implemented method of claim 1, wherein a throughput model input parameter in the set of throughput model input parameters comprises a performance requirement of the computer system.
6. The computer-implemented method of claim 1, wherein a throughput model input parameter in the set of throughput model input parameters comprises a characteristic of a set of queries performed on data being stored by the computer system.
7. The computer-implemented method of claim 1, wherein a throughput model input parameter in the set of throughput model input parameters comprises a characteristic of a Hive implementation implemented on the computer system.
8. The computer-implemented method of claim 1, wherein the data performance measurement determines an architecture of a Hive implementation implemented on the computer system and the capacity requirement is determined according to the architecture.
9. A computer program product for throughput-based node sizing and deployment, the computer program product comprising: one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising: program instructions to measure a data performance measurement of a shared-nothing architecture of a Hadoop and Hive implementation of a computer system over a time series, the data performance measurement including a number of queries performed on data stored in the computer system, the data performance measurement dividing a total number of queries according to data characteristics including queries to only partitioned data, only bucketed data, both partitioned and bucketed data, and neither partitioned nor bucketed data; program instructions to forecast, by executing a forecasting model, a future value of the data performance measurement of the time series of the computer system; program instructions to configure a set of throughput model input parameters; program instructions to compute, by executing a throughput model using the set of throughput model input parameters and the future value of the data performance measurement of the computer system, a throughput requirement for the computer system, the throughput requirement for a Hive implementation of storage devices in a cluster being sized by computing utilizing a bloom filter based on adding together throughputs for the queries with the data characteristics, wherein the throughputs are determined by a number of queries for the data characteristics, a size of a Hive dataset, a number of strides in the Hive dataset, and a bloom filter false positive rate of the Hive implementation; program instructions to determine a capacity requirement corresponding to the throughput requirement; and program instructions to deploy, according to the capacity requirement, a resource within the computer system.
10. The computer program product of claim 9, wherein the data performance measurement measures a characteristic of data being stored by the computer system.
11. The computer program product of claim 9, wherein the configured set of throughput model input parameters change the false positive rate of the bloom filter of the Hive implementation.
12. The computer program product of claim 9, wherein the data performance measurement measures a characteristic of a set of queries performed on data being stored by the computer system.
13. The computer program product of claim 9, wherein a throughput model input parameter in the set of throughput model input parameters comprises a performance requirement of the computer system.
14. The computer program product of claim 9, wherein a throughput model input parameter in the set of throughput model input parameters comprises a characteristic of a set of queries performed on data being stored by the computer system.
15. The computer program product of claim 9, wherein a throughput model input parameter in the set of throughput model input parameters comprises a characteristic of the Hive implementation implemented on the computer system.
16. The computer program product of claim 9, wherein the data performance measurement determines an architecture of the Hive implementation implemented on the computer system and the capacity requirement is determined according to the architecture.
17. The computer program product of claim 9, wherein the stored program instructions are stored in the at least one of the one or more storage media of a local data processing system, and wherein the stored program instructions are transferred over a network from a remote data processing system.
18. The computer program product of claim 9, wherein the stored program instructions are stored in the at least one of the one or more storage media of a server data processing system, and wherein the stored program instructions are downloaded over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system.
19. The computer program product of claim 9, wherein the computer program product is provided as a service in a cloud environment.
20. A computer system comprising one or more processors, one or more computer-readable memories, and one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices for execution by
where the computing system component is a software system · CPC title
for systems · CPC title
Machine learning · CPC title
Probabilistic graphical models, e.g. probabilistic networks · CPC title
by assessing time · CPC title