What technology area does this patent fall under?

Primary CPC classification H04L67/1097. Mapped technology areas include Electricity.

When was this patent published?

Publication date: October 7, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations found in our corpus, or publications sharing the same primary CPC).

System and method for offloading preprocessing of machine learning data to remote storage

US12438943B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12438943-B2
Application number	US-202217981077-A
Country	US
Kind code	B2
Filing date	Nov 4, 2022
Priority date	Nov 17, 2021
Publication date	Oct 7, 2025
Grant date	Oct 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An illustrative embodiment disclosed herein is an apparatus including a processor having programmed instructions to place a first compute resource in a storage node of an object storage platform and to place a second compute resource in a compute node in a client coupled to the object storage platform via a public network. In some embodiments, unstructured data is stored in the storage node. In some embodiments, the first compute resource of the storage node preprocesses the unstructured data. In some embodiments, the preprocessed unstructured data is sent to the compute node. In some embodiments, the second compute resource trains a machine learning (ML) model using the preprocessed unstructured data.
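The abstract describes a split pipeline. Preprocessing runs next to the data on the storage node, and only the result crosses the public network to the client's compute node for training. The Python sketch below shows that data flow under stated assumptions. The function names `preprocess_at_storage_node` and `train_at_compute_node`, the `x=<number> y=<number>` record format, JSON bytes as a stand-in for the network transfer, and an ordinary least-squares line fit as a stand-in for "an ML model" are all hypothetical choices for illustration. None of them is the patent's implementation. The parse-then-filter order mirrors the two preprocessing steps recited in claims 3, 11 and 18.

```python
import json


def preprocess_at_storage_node(records: list[str]) -> bytes:
    """Hypothetical helper that runs on the storage node's compute resource.

    Step 1 parses each raw "x=<num> y=<num>" record (assumed format).
    Step 2 filters out records that could not be parsed.
    The result is serialized to bytes, standing in for the payload that
    would be sent over the public network.
    """
    parsed = []
    for line in records:
        # Parse: collect key=value tokens into a dict.
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        try:
            parsed.append((float(fields["x"]), float(fields["y"])))
        except (KeyError, ValueError):
            parsed.append(None)  # malformed record
    # Filter: keep only successfully parsed records.
    kept = [p for p in parsed if p is not None]
    return json.dumps(kept).encode("utf-8")


def train_at_compute_node(payload: bytes) -> tuple[float, float]:
    """Hypothetical helper that runs on the client's compute node.

    Fits y = slope * x + intercept by ordinary least squares, a stand-in
    for training an ML model on the preprocessed data.
    """
    points = json.loads(payload.decode("utf-8"))
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


raw = ["x=1 y=3", "x=2 y=5", "corrupted line", "x=3 y=7", "x=4 y=oops"]
payload = preprocess_at_storage_node(raw)          # on the storage node
slope, intercept = train_at_compute_node(payload)  # on the client
print(slope, intercept)  # → 2.0 1.0
```

The two malformed records are dropped on the storage side, so the client only ever sees the three clean points.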

Claims

Full claim text (claims 1 to 23).

What is claimed:

1. An apparatus comprising a processor and a memory, wherein the memory includes programmed instructions that, when executed by the processor, cause the apparatus to: assign, by a resource scheduler, a first virtualized compute resource to a storage node of an object store on a first cloud, the storage node including a virtualized storage resource, wherein unstructured data is stored in the storage node; preprocess, by the first virtualized compute resource of the storage node, at the storage node on the first cloud, the unstructured data stored in the storage node to generate preprocessed data; transfer, via a public network, the preprocessed data generated by the first virtualized compute resource of the storage node to a compute node on a client system separate from the first cloud; and assign, by the resource scheduler, a second virtualized compute resource to the compute node of the client system, wherein the second compute resource trains a machine learning (ML) model using the preprocessed data.

2. The apparatus of claim 1, wherein the storage node comprises a hyper-converged infrastructure (HCI) node.

3. The apparatus of claim 1, wherein the preprocessing the unstructured data comprises at least two preprocessing steps, wherein a first preprocessing step of the at least two preprocessing steps includes parsing the unstructured data, and wherein a second preprocessing step of the at least two preprocessing steps includes filtering the unstructured data.

4. The apparatus of claim 1, wherein the client system is on a second cloud.

5. The apparatus of claim 1, wherein the unstructured data is partitioned into a first chunk on the storage node and a second chunk on a second storage node of the object storage platform, wherein a third virtualized compute resource preprocesses the second chunk.

6. The apparatus of claim 1, wherein the storage node is an accelerator-enabled node.

7. The apparatus of claim 1, wherein the memory includes the programmed instructions that, when executed by the processor, further cause the apparatus to generate a template in which the first virtualized compute resource of the storage node preprocesses the unstructured data.

8. The apparatus of claim 1, wherein the memory includes the programmed instructions that, when executed by the processor, further cause the apparatus to determine, based on a mapping operation of the preprocessing the unstructured data resulting in an increase of data volume, to execute the mapping operation at the compute node of the client system.

9. A non-transitory computer readable storage medium comprising instructions stored thereon that, when executed by a processor, cause the processor to: assign, by a resource scheduler, a first virtualized compute resource to a storage node of an object storage platform on a first cloud, the storage node including a virtualized storage resource, wherein unstructured data is stored in the storage node; preprocess, by the first virtualized compute resource of the storage node, at the storage node on the first cloud, the unstructured data stored in the storage node to generate preprocessed data; transfer, via a public network, the preprocessed data generated by the first virtualized compute resource of the storage node to a compute node on a client system separate from the first cloud; and assign, by the resource scheduler, a second virtualized compute resource to the compute node on the client system, wherein the second compute resource trains a machine learning (ML) model using the preprocessed data.

10. The medium of claim 9, wherein the storage node comprises a hyper-converged infrastructure (HCI) node.

11. The medium of claim 9, wherein preprocessing includes at least two preprocessing steps, wherein a first preprocessing step of the at least two preprocessing steps includes parsing the unstructured data, and wherein a second preprocessing step of the at least two preprocessing steps includes filtering the unstructured data.

12. The medium of claim 9, wherein the second virtualized compute resource further preprocesses the preprocessed unstructured data before using the preprocessed unstructured data to train the ML model.

13. The medium of claim 9, wherein the unstructured data is partitioned into a first chunk on the storage node and a second chunk on a second storage node of the object storage platform, wherein a third virtualized compute resource preprocesses the second chunk.

14. The medium of claim 9, wherein the storage node is an accelerator-enabled node.

15. The medium of claim 9, comprising the instructions stored thereon that, when executed by the processor, further cause the processor to generate a template in which the first virtualized compute resource of the storage node preprocesses the unstructured data.

16. The medium of claim 9, comprising the instructions stored thereon that, when executed by the processor, further cause the processor to determine, based on a mapping operation of the preprocessing the unstructured data resulting in an increase of data volume, to execute the mapping operation at the compute node of the client system.

17. A computer-implemented method, comprising: assigning, by a processor associated with a resource scheduler, a first virtualized compute resource to a storage node of an object store on a first cloud, the storage node including a virtualized storage resource, wherein unstructured data is stored in the storage node; preprocessing, by the first virtualized compute resource of the storage node, at the storage node on the first cloud, the unstructured data stored in the storage node to generate preprocessed data; transferring, via a public network, the preprocessed data generated by the first virtualized compute resource of the storage node to a compute node on a client system separate from the first cloud; and assigning, by the processor associated with the resource scheduler, a second virtualized compute resource to the compute node on the client system, wherein the second virtualized compute resource trains a machine learning (ML) model using the preprocessed data.

18. The method of claim 17, wherein a first preprocessing step of the preprocessing includes parsing the unstructured data, and wherein a second preprocessing step of the preprocessing includes filtering the unstructured data.

19. The method of claim 17, wherein the second virtualized compute resource further preprocesses the preprocessed unstructured data before using the preprocessed unstructured data to train the ML model.

20. The method of claim 17, wherein the unstructured data is partitioned into a first chunk on the storage node and a second chunk on a second storage node of the object storage platform, wherein a third virtualized compute resource preprocesses the second chunk.

21. The method of claim 17, wherein the storage node is an accelerator-enabled node.

22. The method of claim 17, further comprising generating a template in which the first virtualized compute resource of the storage node preprocesses the unstructured data.

23. The computer-implemented method of claim 17, further comprising determining, based on a mapping operation of the preprocessing the unstructured data resulting in an increase of data volume, to execute the mapping operation at the compute node of the client system.
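Claims 8, 16 and 23 recite deciding to run a volume-increasing mapping operation on the client's compute node rather than on the storage node. The Python sketch below shows one way such a placement rule could look. Everything in it is an assumption for illustration: the `Op` type, the `data_volume` byte-count metric, evaluating the ops on a small sample, and the rule that once an op moves to the client, all later ops stay there (because the data has already left the storage node). The claims do not say how the increase in data volume is measured.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Op:
    """A hypothetical named preprocessing step over a batch of records."""
    name: str
    fn: Callable[[list[str]], list[str]]


def data_volume(records: list[str]) -> int:
    """Approximate data volume as total UTF-8 bytes (an assumed metric)."""
    return sum(len(r.encode("utf-8")) for r in records)


def plan_placement(ops: list[Op], sample: list[str]) -> dict[str, str]:
    """Map each op name to "storage" or "compute".

    Ops stay on the storage node until one grows the data volume; that op
    and every later op are placed on the client compute node, since the
    data has already left the storage node at that point (an assumption).
    """
    placement: dict[str, str] = {}
    current = sample
    at_storage = True
    for op in ops:
        out = op.fn(current)
        if at_storage and data_volume(out) > data_volume(current):
            at_storage = False  # volume-increasing mapping: defer to client
        placement[op.name] = "storage" if at_storage else "compute"
        current = out
    return placement


ops = [
    Op("parse", lambda rs: [r.strip().lower() for r in rs]),
    Op("filter", lambda rs: [r for r in rs if r]),
    # Emits two records per input (e.g. augmentation), so volume grows.
    Op("augment", lambda rs: rs + [r[::-1] for r in rs]),
]
sample = ["  Hello ", "", "World  "]
print(plan_placement(ops, sample))
# → {'parse': 'storage', 'filter': 'storage', 'augment': 'compute'}
```

Parsing and filtering shrink or preserve the sample (15 bytes to 10 bytes), so they stay on the storage node; the augmentation step doubles it to 20 bytes and is deferred to the client. Measuring on a sample keeps the decision cheap, at the cost of being only an estimate for the full dataset.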

Assignees

Nutanix Inc

Inventors

Classifications

G06F9/45533
Hypervisors; Virtual machine monitors · CPC title
G06N20/00
Machine learning · CPC title
H04L67/1097 (primary)
for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 86323213

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?: Nutanix Inc