Machine learning facets for dataset preparation in storage devices

US12450003B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12450003-B2
Application number	US-202217821513-A
Country	US
Kind code	B2
Filing date	Aug 23, 2022
Priority date	Aug 23, 2022
Publication date	Oct 21, 2025
Grant date	Oct 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Examples described herein relate to preparing datasets in a storage device for machine learning (ML) applications. Examples include maintaining ML facet mappings between ML facets and dataset preparation tags, deriving ML facets of a dataset stored in the storage device, and generating filtered datasets from the datasets using the ML facets and ML facet mappings. The filtered dataset is associated with improved dataset quality compared to unfiltered dataset. The storage device transmits the filtered dataset to ML applications requesting the dataset. Some examples include recommending, by the storage device, ML facets to the ML application based on performance metrics.
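As a reading aid only, the flow the abstract describes (facet mappings, facet derivation, filtering) could be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: "missing values" is one of the facets named in the claims, but the tag name `drop_incomplete_rows`, the row format, and every function name here are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical row format: column name -> value, with None marking a missing value.
Row = dict[str, Optional[float]]


def drop_incomplete_rows(rows: list[Row]) -> list[Row]:
    """Hypothetical preparation operation: keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


# Assumed ML facet mappings: ML facet -> dataset preparation tag.
FACET_TO_TAG: dict[str, str] = {"missing values": "drop_incomplete_rows"}

# Assumed lookup from a preparation tag to the operation it names.
TAG_TO_OPERATION: dict[str, Callable[[list[Row]], list[Row]]] = {
    "drop_incomplete_rows": drop_incomplete_rows,
}


def derive_facets(rows: list[Row]) -> set[str]:
    """Derive ML facets of a dataset; this sketch detects only 'missing values'."""
    facets: set[str] = set()
    if any(v is None for r in rows for v in r.values()):
        facets.add("missing values")
    return facets


def generate_filtered_dataset(rows: list[Row]) -> list[Row]:
    """Apply the preparation operation mapped to each derived facet."""
    filtered = rows
    for facet in sorted(derive_facets(rows)):
        tag = FACET_TO_TAG.get(facet)
        if tag is not None:
            filtered = TAG_TO_OPERATION[tag](filtered)
    return filtered


if __name__ == "__main__":
    dataset = [{"a": 1.0, "b": 2.0}, {"a": None, "b": 3.0}]
    print(generate_filtered_dataset(dataset))
```

Keeping the facet-to-tag mapping separate from the tag-to-operation lookup mirrors the abstract's wording, where mappings are maintained independently of the datasets they are applied to.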

First claim

Opening claim text (preview).

What is claimed is:

1. A storage device comprising: a processing resource; and a non-transitory machine-readable storage medium comprising instructions executable by the processing resource to: store machine learning (ML) facet mappings between ML facets and dataset preparation tags in a repository, wherein the ML facets are properties of datasets or ML models for optimizing quality of the datasets; identify a ML facet of a dataset stored in the storage device; determine, based on at least one of dataset metrics of the dataset, storage performance metrics of the storage device, and application performance metrics, a first quality score for the dataset, wherein the first quality score indicates an amount of relevant information in the dataset; identify a dataset preparation tag mapped to the identified ML facet as indicated in the ML facet mappings; generate a filtered dataset from the dataset based on the dataset preparation tag and determine, based on at least one of dataset metrics of the filtered dataset, the storage performance metrics of the storage device, and the application performance metrics, a second quality score that indicates an amount of relevant information in the filtered dataset; and in response to a request for the dataset from an ML application and determining that the second quality score is greater than the first quality score, transmit the filtered dataset to the ML application across a bandwidth-limited communication link.

2. The storage device of claim 1, wherein to identify the ML facet, the processing resource executes one or more of the instructions to: input the dataset to analytics workflow, wherein the analytics workflow determines the ML facet of the dataset and a dataset portion associated with the ML facet.

3. The storage device of claim 2, wherein to generate the filtered dataset, the processing resource executes one or more of the instructions to: Identify a dataset preparation operation indicated in the dataset preparation tag; and prepare the dataset based on the dataset preparation operation and the dataset portion.

4. The storage device of claim 1, further comprising: an ML facets store to store ML facets of each dataset in the storage device and an identifier of the respective dataset.

5. The storage device of claim 1, wherein the processing resource executes one or more of the instructions to: store an ML facet mapping between ML facets, application type, and dataset type.

6. The storage device of claim 5, wherein the processing resource executes one or more of the instructions to: in response to receiving the request for the dataset, recommend one or more of the ML facets to the ML application for selection based on the mapping between the ML facets, the application type, and the dataset type.

7. The storage device of claim 6, wherein to recommend the ML facets, the processing resource executes one or more of the instructions to: identify one or more of the ML facets based on the dataset type of the dataset and the application type of the ML application; and transmit one or more the ML facets to the ML application as a recommendation.

8. The storage device of claim 7, further comprising a user interface to: present one or more of the ML facets to the ML application for selection.

9. The storage device of claim 1, wherein the processing resource executes one or more of the instructions to: receive, from a test application, the application performance metrics, wherein the application performance metrics include one or more of time-to-insights, accuracy, precision, or recall; and determine the storage performance metrics, the dataset metrics of the dataset, and the dataset metrics of the filtered dataset, wherein: the storage performance metrics include one or more of samples per IO operation or throughput; and the dataset metrics of the dataset and the dataset metrics of the filtered dataset include at least a dataset size.

10. The storage device of claim 9, wherein the processing resource executes one or more of the instructions to: determine a rank for each of the ML facets based on the storage performance metrics, the dataset metrics, and the application performance metrics; and recommend the ML facets to the ML application based on the rank.

11. The storage device of claim 1, wherein the processing resource executes one or more of the instructions to: store the filtered dataset in persistent storage of the storage device; create a volume containing the filtered dataset; and display the volume to the ML application.

12. The storage device of claim 1, wherein the ML facets include one or more of correlated features, non-correlated features, hyperparameters, bias, seasonality, balanced dataset, mean, quadrant, private data, variance, missing values, data completeness, anomalous dataset, quantization, high frequency filtering, and null datasets.

13. A method comprising: storing, by a storage device, machine learning (ML) facet mappings between ML facets and dataset preparation tags in a repository, wherein the ML facets are properties of datasets or ML models for optimizing quality of the datasets; identifying, by the storage device, one or more ML facets of a dataset stored in the storage device; determining, based on at least one of dataset metrics of the dataset, storage performance metrics of the storage device, and application performance metrics, a first quality score for the dataset, wherein the first quality score indicates an amount of relevant information in the dataset; receiving, by the storage device, a request for the dataset from an ML application executing on a computing device; recommending, by the storage device, the one or more ML facets to the ML application for selection; generating, by the storage device, a filtered dataset from the dataset based on dataset preparation tags mapped to the selected ML facets and determining, based on at least one of dataset metrics of the filtered dataset, the storage performance metrics of the storage device, and the application performance metrics, a second quality score that indicates an amount of relevant information in the filtered dataset; and in response to determining that the second quality score is greater than the first quality score, transmitting, by the storage device, the filtered dataset to the ML application across a bandwidth-limited communication link.

14. The method of claim 13, further comprising: in response to generating the filtered dataset, applying, by the storage device, a dataset management policy for the filtered dataset based on the ML facets, wherein the dataset management policy includes rules to perform one or more of data protection, data backup, or data tiering.

15. The method of claim 14, wherein: the application performance metrics include one or more of time-to-insights, accuracy, precision, or recall; the storage performance metrics include one or more of samples per IO operation and throughput; and the dataset metrics of the dataset and the dataset metrics of the filtered dataset include at least a dataset size.

16. The method of claim 15, further comprising: based on the quality scores, storing the filtered dataset in a first storage component and the dataset in a second storage component, wherein the first storage component allows faster data retrieval.

17. The method of claim 14, further comprising: in response to determining that the ML facets include sensitive data, apply the dataset management policy to encrypt the filtered dataset.

18. The method of
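The quality-score comparison recited in claims 1 and 13 can be illustrated with a short sketch. The claims do not give a formula for the quality score, so the one below (fraction of rows with no missing values) is purely an assumption, as are the function names and the row format. Falling back to the original dataset when the filtered one does not score higher is also an assumption; the claim text shown here only covers the case where the second score is greater.

```python
from typing import Optional

# Hypothetical row format: column name -> value, with None marking a missing value.
Row = dict[str, Optional[float]]


def quality_score(rows: list[Row]) -> float:
    """Assumed stand-in for 'amount of relevant information':
    the fraction of rows that have no missing values."""
    if not rows:
        return 0.0
    complete = sum(1 for r in rows if all(v is not None for v in r.values()))
    return complete / len(rows)


def dataset_to_transmit(dataset: list[Row], filtered: list[Row]) -> list[Row]:
    """Return the filtered dataset only when its score exceeds the original's."""
    first_score = quality_score(dataset)    # first quality score (original)
    second_score = quality_score(filtered)  # second quality score (filtered)
    if second_score > first_score:
        return filtered
    # Assumption: otherwise send the original dataset unchanged.
    return dataset


if __name__ == "__main__":
    original = [{"a": 1.0}, {"a": None}]
    filtered = [{"a": 1.0}]
    print(dataset_to_transmit(original, filtered))
```

In the claims the scores may also draw on storage performance metrics and application performance metrics; this sketch uses a dataset metric alone to keep the gating logic visible.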

Assignees

Hewlett Packard Entpr Dev Lp

Inventors

Classifications

G06F3/0604
Improving or facilitating administration, e.g. storage management · CPC title
G06F3/0679
Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP] · CPC title
G06N20/00
Machine learning · CPC title
G06F16/215
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
G06F3/0671
In-line storage system · CPC title

Patent family

Related publications grouped by family.

View patent family 89844225

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12450003B2 cover? Examples described herein relate to preparing datasets in a storage device for machine learning (ML) applications. Examples include maintaining ML facet mappings between ML facets and dataset preparation tags, deriving ML facets of a dataset stored in the storage device, and generating filtered datasets from the datasets using the ML facets and ML facet mappings. The filtered dataset is associa…
Who is the assignee on this patent? Hewlett Packard Entpr Dev Lp
What technology area does this patent fall under? The primary CPC classification is G06F3/0655. Mapped technology areas include Physics.
When was this patent published? It was published on Oct 21, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Data facet generation and recommendation

Method and system for automated quality assurance in radiation therapy

Diagnostic tool to tool matching and full-trace drill-down analysis methods for manufacturing equipment

Methods and systems for data backup based on data classification

Methods and systems for metadata tag inheritance between multiple file systems within a storage system
