What technology area does this patent fall under?

Primary CPC classification G06N5/04. Mapped technology areas include Physics.

When was this patent published?

Publication date: Sep 23, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Throughput based sizing for hive deployment

US12423596B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12423596-B2
Application number	US-202016918540-A
Country	US
Kind code	B2
Filing date	Jul 1, 2020
Priority date	Jul 1, 2020
Publication date	Sep 23, 2025
Grant date	Sep 23, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data performance measurement of a computer system is measured. A future value of the data performance measurement is forecasted by executing a forecasting model. A set of throughput model input parameters is configured. A throughput requirement for the computer system is computed by executing a throughput model using the set of throughput model input parameters and the future value of the data performance measurement. A capacity requirement corresponding to the throughput requirement is determined. A resource within the computer system is deployed according to the capacity requirement.
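The abstract's pipeline (measure, forecast, compute a throughput requirement, derive a capacity requirement) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the linear-trend forecaster stands in for whatever forecasting model is actually used, and the function names, the per-query scan size, and the per-node throughput are hypothetical. The deployment step is left out.

```python
import math
from statistics import mean


def forecast_linear(history, steps_ahead=1):
    """Extrapolate a least-squares linear trend.

    Assumption: a stand-in for the forecasting model; the abstract does
    not say which model is used.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(history)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n - 1 + steps_ahead)


def throughput_mb_per_s(queries_per_hour, mb_scanned_per_query):
    """Hypothetical throughput model: data scanned per second."""
    return queries_per_hour * mb_scanned_per_query / 3600.0


def nodes_required(required_mb_per_s, per_node_mb_per_s):
    """Capacity requirement: smallest node count that covers the throughput."""
    if per_node_mb_per_s <= 0:
        raise ValueError("per-node throughput must be positive")
    return math.ceil(required_mb_per_s / per_node_mb_per_s)


if __name__ == "__main__":
    hourly_queries = [100, 120, 140, 160]  # measured time series (made-up values)
    forecast = forecast_linear(hourly_queries)
    required = throughput_mb_per_s(forecast, mb_scanned_per_query=500)
    print(nodes_required(required, per_node_mb_per_s=10))
```

Rounding the node count up with `math.ceil` is a design choice of this sketch: a fractional node cannot be deployed, so the capacity requirement is taken as the next whole node.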

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: measuring a data performance measurement of a shared-nothing architecture of a Hadoop and Hive implementation of a computer system, the data performance measurement including a number of queries performed on data stored in the computer system over a time series, the data performance measurement dividing a total number of queries according to data characteristics including queries to only partitioned data, only bucketed data, both partitioned and bucketed data, and neither partitioned nor bucketed data; forecasting, by executing a forecasting model, a future value of the data performance measurement on the time series of the computer system; configuring a set of throughput model input parameters; computing, by executing a throughput model using the set of throughput model input parameters and the future value of the data performance measurement of the computer system, a throughput requirement for the computer system being sized by computing without utilizing a bloom filter based on adding together throughputs for the queries with the data characteristics, wherein the throughputs are determined by a number of queries for the data characteristics, a size of a Hive dataset of storage devices in a cluster, and a compression percentage; determining a capacity requirement corresponding to the throughput requirement; and deploying, according to the capacity requirement, a resource within the computer system.

2. The computer-implemented method of claim 1, wherein the data performance measurement measures a characteristic of data being stored by the computer system.

3. The computer-implemented method of claim 1, wherein the data performance measurement measures a characteristic of a Hive implementation implemented on the computer system and the configured set of throughput model input parameters change a false positive rate of a bloom filter of the Hive implementation.

4. The computer-implemented method of claim 1, wherein the data performance measurement measures a characteristic of a set of queries performed on data being stored by the computer system.

5. The computer-implemented method of claim 1, wherein a throughput model input parameter in the set of throughput model input parameters comprises a performance requirement of the computer system.

6. The computer-implemented method of claim 1, wherein a throughput model input parameter in the set of throughput model input parameters comprises a characteristic of a set of queries performed on data being stored by the computer system.

7. The computer-implemented method of claim 1, wherein a throughput model input parameter in the set of throughput model input parameters comprises a characteristic of a Hive implementation implemented on the computer system.

8. The computer-implemented method of claim 1, wherein the data performance measurement determines an architecture of a Hive implementation implemented on the computer system and the capacity requirement is determined according to the architecture.

9. A computer program product for throughput-based node sizing and deployment, the computer program product comprising: one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising: program instructions to measure a data performance measurement of a shared-nothing architecture of a Hadoop and Hive implementation of a computer system over a time series, the data performance measurement including a number of queries performed on data stored in the computer system, the data performance measurement dividing a total number of queries according to data characteristics including queries to only partitioned data, only bucketed data, both partitioned and bucketed data, and neither partitioned nor bucketed data; program instructions to forecast, by executing a forecasting model, a future value of the data performance measurement of the time series of the computer system; program instructions to configure a set of throughput model input parameters; program instructions to compute, by executing a throughput model using the set of throughput model input parameters and the future value of the data performance measurement of the computer system, a throughput requirement for the computer system, the throughput requirement for a Hive implementation of storage devices in a cluster being sized by computing utilizing a bloom filter based on adding together throughputs for the queries with the data characteristics, wherein the throughputs are determined by a number of queries for the data characteristics, a size of a Hive dataset, a number of strides in the Hive dataset, and a bloom filter false positive rate of the Hive implementation; program instructions to determine a capacity requirement corresponding to the throughput requirement; and program instructions to deploy, according to the capacity requirement, a resource within the computer system.

10. The computer program product of claim 9, wherein the data performance measurement measures a characteristic of data being stored by the computer system.

11. The computer program product of claim 9, wherein the configured set of throughput model input parameters change the false positive rate of the bloom filter of the Hive implementation.

12. The computer program product of claim 9, wherein the data performance measurement measures a characteristic of a set of queries performed on data being stored by the computer system.

13. The computer program product of claim 9, wherein a throughput model input parameter in the set of throughput model input parameters comprises a performance requirement of the computer system.

14. The computer program product of claim 9, wherein a throughput model input parameter in the set of throughput model input parameters comprises a characteristic of a set of queries performed on data being stored by the computer system.

15. The computer program product of claim 9, wherein a throughput model input parameter in the set of throughput model input parameters comprises a characteristic of the Hive implementation implemented on the computer system.

16. The computer program product of claim 9, wherein the data performance measurement determines an architecture of the Hive implementation implemented on the computer system and the capacity requirement is determined according to the architecture.

17. The computer program product of claim 9, wherein the stored program instructions are stored in the at least one of the one or more storage media of a local data processing system, and wherein the stored program instructions are transferred over a network from a remote data processing system.

18. The computer program product of claim 9, wherein the stored program instructions are stored in the at least one of the one or more storage media of a server data processing system, and wherein the stored program instructions are downloaded over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system.

19. The computer program product of claim 9, wherein the computer program product is provided as a service in a cloud environment.

20. A computer system comprising one or more processors, one or more computer-readable memories, and one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices for execution by
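Claim 1 describes a throughput requirement obtained by adding together throughputs for four query categories, each determined by the number of queries, the Hive dataset size, and a compression percentage. The claim text shown here gives no formula, so the Python sketch below is only one possible reading. The per-category scan fractions, the way compression scales the dataset size, and all names are hypothetical and not taken from the patent.

```python
CATEGORIES = (
    "partitioned_only",
    "bucketed_only",
    "partitioned_and_bucketed",
    "neither",
)

# Hypothetical share of the stored dataset one query reads, per category.
# These numbers are illustrative assumptions, not values from the patent.
SCAN_FRACTION = {
    "partitioned_only": 0.10,
    "bucketed_only": 0.25,
    "partitioned_and_bucketed": 0.05,
    "neither": 1.00,
}


def throughput_requirement_gb_per_s(query_counts, dataset_gb,
                                    compression_pct, window_s):
    """Sum per-category throughputs over one measurement window.

    Assumed model: each query reads SCAN_FRACTION[category] of the
    compressed dataset, and the window lasts window_s seconds.
    """
    if set(query_counts) != set(CATEGORIES):
        raise ValueError("query_counts must cover exactly the four categories")
    if not 0 <= compression_pct < 100:
        raise ValueError("compression_pct must be in [0, 100)")
    if window_s <= 0:
        raise ValueError("window_s must be positive")

    stored_gb = dataset_gb * (1 - compression_pct / 100.0)
    total = 0.0
    for category in CATEGORIES:
        gb_read = query_counts[category] * stored_gb * SCAN_FRACTION[category]
        total += gb_read / window_s  # this category's throughput in GB/s
    return total


if __name__ == "__main__":
    forecast_counts = {  # forecast queries per hour (made-up values)
        "partitioned_only": 40,
        "bucketed_only": 20,
        "partitioned_and_bucketed": 30,
        "neither": 10,
    }
    print(throughput_requirement_gb_per_s(
        forecast_counts, dataset_gb=1000, compression_pct=50, window_s=3600))
```

The bloom-filter variant in claim 9 would add inputs such as the number of strides and the false positive rate; those are not modeled in this sketch.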

Assignees

Kyndryl Inc

Inventors

Classifications

G06F11/302
where the computing system component is a software system · CPC title
G06F11/3495
for systems · CPC title
G06N20/00
Machine learning · CPC title
G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title
G06F11/3419
by assessing time · CPC title

Patent family

Related publications grouped by family.

View patent family 79167564

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12423596B2 cover?: A data performance measurement of a computer system is measured. A future value of the data performance measurement is forecasted by executing a forecasting model. A set of throughput model input parameters is configured. A throughput requirement for the computer system is computed by executing a throughput model using the set of throughput model input parameters and the future value of the dat…
Who is the assignee on this patent?: Kyndryl Inc

Related patents

NOSQL Database Capacity Configuration Optimization System for Cloud Computing

Clustered database reconfiguration system for time-varying workloads

Systems and methods for generating performance prediction model and estimating execution time for applications

Cost-based optimization of configuration parameters and cluster sizing for hadoop
