Throughput based sizing for hive deployment

US12423596B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12423596-B2
Application number  US-202016918540-A
Country             US
Kind code           B2
Filing date         Jul 1, 2020
Priority date       Jul 1, 2020
Publication date    Sep 23, 2025
Grant date          Sep 23, 2025

Abstract

Official abstract text for this publication.

A data performance measurement of a computer system is measured. A future value of the data performance measurement is forecasted by executing a forecasting model. A set of throughput model input parameters is configured. A throughput requirement for the computer system is computed by executing a throughput model using the set of throughput model input parameters and the future value of the data performance measurement. A capacity requirement corresponding to the throughput requirement is determined. A resource within the computer system is deployed according to the capacity requirement.
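The abstract's steps (measure, forecast, compute a throughput requirement, derive a capacity requirement) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the patent text on this page does not specify the forecasting model or the throughput formula, so a least-squares trend line stands in for the former, and the function names, units, and parameters (`mb_scanned_per_query`, `window_seconds`, `node_mb_per_s`) are hypothetical.

```python
import math
from statistics import mean


def forecast_linear(history, steps_ahead=1):
    """Extrapolate a least-squares trend line over the measured time series.

    Assumption: a straight-line fit stands in for the unspecified
    forecasting model.
    """
    n = len(history)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(history)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n - 1 + steps_ahead)


def throughput_requirement(forecast_queries, mb_scanned_per_query, window_seconds):
    """Hypothetical throughput model: MB that must be scanned per second."""
    return forecast_queries * mb_scanned_per_query / window_seconds


def capacity_requirement(required_mb_per_s, node_mb_per_s):
    """Hypothetical capacity rule: whole nodes needed to sustain the throughput."""
    return math.ceil(required_mb_per_s / node_mb_per_s)


# Measured query counts per period (illustrative numbers, not from the patent).
measured_queries = [100, 120, 140, 160]

future_queries = forecast_linear(measured_queries)
required = throughput_requirement(
    future_queries, mb_scanned_per_query=2000, window_seconds=3600
)
nodes = capacity_requirement(required, node_mb_per_s=40)
print(f"forecast={future_queries:.0f} queries, need {required:.0f} MB/s, {nodes} nodes")
```

The final deployment step is left out because it depends on the cluster manager in use; the node count is the value such a step would consume.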

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: measuring a data performance measurement of a shared-nothing architecture of a Hadoop and Hive implementation of a computer system, the data performance measurement including a number of queries performed on data stored in the computer system over a time series, the data performance measurement dividing a total number of queries according to data characteristics including queries to only partitioned data, only bucketed data, both partitioned and bucketed data, and neither partitioned nor bucketed data; forecasting, by executing a forecasting model, a future value of the data performance measurement on the time series of the computer system; configuring a set of throughput model input parameters; computing, by executing a throughput model using the set of throughput model input parameters and the future value of the data performance measurement of the computer system, a throughput requirement for the computer system being sized by computing without utilizing a bloom filter based on adding together throughputs for the queries with the data characteristics, wherein the throughputs are determined by a number of queries for the data characteristics, a size of a Hive dataset of storage devices in a cluster, and a compression percentage; determining a capacity requirement corresponding to the throughput requirement; and deploying, according to the capacity requirement, a resource within the computer system.

2. The computer-implemented method of claim 1, wherein the data performance measurement measures a characteristic of data being stored by the computer system.

3. The computer-implemented method of claim 1, wherein the data performance measurement measures a characteristic of a Hive implementation implemented on the computer system and the configured set of throughput model input parameters change a false positive rate of a bloom filter of the Hive implementation.

4. The computer-implemented method of claim 1, wherein the data performance measurement measures a characteristic of a set of queries performed on data being stored by the computer system.

5. The computer-implemented method of claim 1, wherein a throughput model input parameter in the set of throughput model input parameters comprises a performance requirement of the computer system.

6. The computer-implemented method of claim 1, wherein a throughput model input parameter in the set of throughput model input parameters comprises a characteristic of a set of queries performed on data being stored by the computer system.

7. The computer-implemented method of claim 1, wherein a throughput model input parameter in the set of throughput model input parameters comprises a characteristic of a Hive implementation implemented on the computer system.

8. The computer-implemented method of claim 1, wherein the data performance measurement determines an architecture of a Hive implementation implemented on the computer system and the capacity requirement is determined according to the architecture.

9. A computer program product for throughput-based node sizing and deployment, the computer program product comprising: one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising: program instructions to measure a data performance measurement of a shared-nothing architecture of a Hadoop and Hive implementation of a computer system over a time series, the data performance measurement including a number of queries performed on data stored in the computer system, the data performance measurement dividing a total number of queries according to data characteristics including queries to only partitioned data, only bucketed data, both partitioned and bucketed data, and neither partitioned nor bucketed data; program instructions to forecast, by executing a forecasting model, a future value of the data performance measurement of the time series of the computer system; program instructions to configure a set of throughput model input parameters; program instructions to compute, by executing a throughput model using the set of throughput model input parameters and the future value of the data performance measurement of the computer system, a throughput requirement for the computer system, the throughput requirement for a Hive implementation of storage devices in a cluster being sized by computing utilizing a bloom filter based on adding together throughputs for the queries with the data characteristics, wherein the throughputs are determined by a number of queries for the data characteristics, a size of a Hive dataset, a number of strides in the Hive dataset, and a bloom filter false positive rate of the Hive implementation; program instructions to determine a capacity requirement corresponding to the throughput requirement; and program instructions to deploy, according to the capacity requirement, a resource within the computer system.

10. The computer program product of claim 9, wherein the data performance measurement measures a characteristic of data being stored by the computer system.

11. The computer program product of claim 9, wherein the configured set of throughput model input parameters change the false positive rate of the bloom filter of the Hive implementation.

12. The computer program product of claim 9, wherein the data performance measurement measures a characteristic of a set of queries performed on data being stored by the computer system.

13. The computer program product of claim 9, wherein a throughput model input parameter in the set of throughput model input parameters comprises a performance requirement of the computer system.

14. The computer program product of claim 9, wherein a throughput model input parameter in the set of throughput model input parameters comprises a characteristic of a set of queries performed on data being stored by the computer system.

15. The computer program product of claim 9, wherein a throughput model input parameter in the set of throughput model input parameters comprises a characteristic of the Hive implementation implemented on the computer system.

16. The computer program product of claim 9, wherein the data performance measurement determines an architecture of the Hive implementation implemented on the computer system and the capacity requirement is determined according to the architecture.

17. The computer program product of claim 9, wherein the stored program instructions are stored in the at least one of the one or more storage media of a local data processing system, and wherein the stored program instructions are transferred over a network from a remote data processing system.

18. The computer program product of claim 9, wherein the stored program instructions are stored in the at least one of the one or more storage media of a server data processing system, and wherein the stored program instructions are downloaded over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system.

19. The computer program product of claim 9, wherein the computer program product is provided as a service in a cloud environment.

20. A computer system comprising one or more processors, one or more computer-readable memories, and one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices for execution by
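Claim 1 describes a throughput requirement obtained by adding together per-category throughputs, where the categories are queries to only partitioned data, only bucketed data, both, and neither, and each throughput depends on the query count, the Hive dataset size, and a compression percentage. The Python sketch below shows one way such a sum could be structured. The per-category formula is not given in the claim text shown here, so the `SCAN_FRACTION` weights, the `window_seconds` parameter, and all function names are hypothetical placeholders, not the patented model.

```python
# The four data-characteristic categories named in claim 1.
CATEGORIES = (
    "partitioned_only",
    "bucketed_only",
    "partitioned_and_bucketed",
    "neither",
)

# ASSUMPTION: share of the stored dataset one query reads in each category.
# These weights are invented for illustration and do not come from the patent.
SCAN_FRACTION = {
    "partitioned_only": 0.10,
    "bucketed_only": 0.25,
    "partitioned_and_bucketed": 0.05,
    "neither": 1.00,
}


def category_throughput(num_queries, dataset_gb, compression_pct,
                        scan_fraction, window_seconds):
    """Hypothetical per-category throughput in GB/s."""
    stored_gb = dataset_gb * (1 - compression_pct / 100)
    return num_queries * stored_gb * scan_fraction / window_seconds


def total_throughput(query_counts, dataset_gb, compression_pct, window_seconds):
    """Add the per-category throughputs together, as claim 1 describes."""
    return sum(
        category_throughput(
            query_counts.get(category, 0),
            dataset_gb,
            compression_pct,
            SCAN_FRACTION[category],
            window_seconds,
        )
        for category in CATEGORIES
    )


# Forecast query counts per category (illustrative numbers only).
forecast_counts = {
    "partitioned_only": 40,
    "bucketed_only": 20,
    "partitioned_and_bucketed": 30,
    "neither": 10,
}

required_gb_per_s = total_throughput(
    forecast_counts, dataset_gb=1000, compression_pct=50, window_seconds=3600
)
print(f"required throughput: {required_gb_per_s:.2f} GB/s")
```

Claim 9 describes a bloom-filter variant whose inputs are the number of strides and the bloom filter false positive rate instead of the compression percentage; under the same assumptions it would replace `category_throughput` while keeping the summation unchanged.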

Assignees

Kyndryl Inc

Inventors

Classifications

  • where the computing system component is a software system · CPC title

  • for systems · CPC title

  • Machine learning · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • by assessing time · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12423596B2 cover?
A data performance measurement of a computer system is measured. A future value of the data performance measurement is forecasted by executing a forecasting model. A set of throughput model input parameters is configured. A throughput requirement for the computer system is computed by executing a throughput model using the set of throughput model input parameters and the future value of the dat…
Who is the assignee on this patent?
Kyndryl Inc
What technology area does this patent fall under?
Primary CPC classification G06N5/04. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 23, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).