System and method for offloading preprocessing of machine learning data to remote storage

US12438943B2 · US · B2

Patent metadata
Publication number: US-12438943-B2
Application number: US-202217981077-A
Country: US
Kind code: B2
Filing date: Nov 4, 2022
Priority date: Nov 17, 2021
Publication date: Oct 7, 2025
Grant date: Oct 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An illustrative embodiment disclosed herein is an apparatus including a processor having programmed instructions to place a first compute resource in a storage node of an object storage platform and to place a second compute resource in a compute node in a client coupled to the object storage platform via a public network. In some embodiments, unstructured data is stored in the storage node. In some embodiments, the first compute resource of the storage node preprocesses the unstructured data. In some embodiments, the preprocessed unstructured data is sent to the compute node. In some embodiments, the second compute resource trains a machine learning (ML) model using the preprocessed unstructured data.
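The abstract splits the work in two. Preprocessing runs next to the data on the storage node, and only its output travels to the client, which trains the model. The sketch below illustrates that division of labor under stated assumptions. `StorageNode`, `ComputeNode`, the "text,label" record format, and the toy threshold "model" are hypothetical and do not come from the patent, and the public-network transfer is simulated by serializing the preprocessed records to bytes.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class StorageNode:
    """Hypothetical storage node holding raw, unstructured records."""
    objects: list[str]  # assumed format: "free text,label"

    def preprocess(self) -> bytes:
        """First compute resource: parse and filter next to the data."""
        records = []
        for line in self.objects:
            parts = line.split(",")
            if len(parts) != 2:  # filter: drop malformed records
                continue
            text, label = parts[0].strip(), parts[1].strip()
            if label not in ("0", "1"):  # filter: drop unlabeled records
                continue
            # parse: keep only the derived feature the trainer needs
            records.append({"len": len(text), "label": int(label)})
        # Serialized bytes stand in for the public-network transfer.
        return json.dumps(records).encode("utf-8")


@dataclass
class ComputeNode:
    """Hypothetical client compute node that trains on preprocessed data."""
    threshold: float = 0.0

    def train(self, payload: bytes) -> float:
        """Second compute resource: fit a one-parameter threshold classifier."""
        records = json.loads(payload)
        pos = [r["len"] for r in records if r["label"] == 1]
        neg = [r["len"] for r in records if r["label"] == 0]
        if not pos or not neg:
            raise ValueError("need at least one example of each class")
        # The midpoint between the two class means is the decision threshold.
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self.threshold


if __name__ == "__main__":
    raw = [
        "a long positive example text,1",
        "short,0",
        "garbage",
        "tiny,0",
        "another long one here,1",
        "x,?",
    ]
    payload = StorageNode(raw).preprocess()  # step run "at the storage node"
    print(ComputeNode().train(payload))      # step run "at the client"; prints 14.5
```

The toy model only shows where each stage runs. In the patent's setting, the client-side step would be real ML training on a virtualized compute resource.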

First claim

Opening claim text (preview).

What is claimed:

1. An apparatus comprising a processor and a memory, wherein the memory includes programmed instructions that, when executed by the processor, cause the apparatus to: assign, by a resource scheduler, a first virtualized compute resource to a storage node of an object store on a first cloud, the storage node including a virtualized storage resource, wherein unstructured data is stored in the storage node; preprocess, by the first virtualized compute resource of the storage node, at the storage node on the first cloud, the unstructured data stored in the storage node to generate preprocessed data; transfer, via a public network, the preprocessed data generated by the first virtualized compute resource of the storage node to a compute node on a client system separate from the first cloud; and assign, by the resource scheduler, a second virtualized compute resource to the compute node of the client system, wherein the second compute resource trains a machine learning (ML) model using the preprocessed data.

2. The apparatus of claim 1, wherein the storage node comprises a hyper-converged infrastructure (HCI) node.

3. The apparatus of claim 1, wherein the preprocessing the unstructured data comprises at least two preprocessing steps, wherein a first preprocessing step of the at least two preprocessing steps includes parsing the unstructured data, and wherein a second preprocessing step of the at least two preprocessing steps includes filtering the unstructured data.

4. The apparatus of claim 1, wherein the client system is on a second cloud.

5. The apparatus of claim 1, wherein the unstructured data is partitioned into a first chunk on the storage node and a second chunk on a second storage node of the object storage platform, wherein a third virtualized compute resource preprocesses the second chunk.

6. The apparatus of claim 1, wherein the storage node is an accelerator-enabled node.

7. The apparatus of claim 1, wherein the memory includes the programmed instructions that, when executed by the processor, further cause the apparatus to generate a template in which the first virtualized compute resource of the storage node preprocesses the unstructured data.

8. The apparatus of claim 1, wherein the memory includes the programmed instructions that, when executed by the processor, further cause the apparatus to determine, based on a mapping operation of the preprocessing the unstructured data resulting in an increase of data volume, to execute the mapping operation at the compute node of the client system.

9. A non-transitory computer readable storage medium comprising instructions stored thereon that, when executed by a processor, cause the processor to: assign, by a resource scheduler, a first virtualized compute resource to a storage node of an object storage platform on a first cloud, the storage node including a virtualized storage resource, wherein unstructured data is stored in the storage node; preprocess, by the first virtualized compute resource of the storage node, at the storage node on the first cloud, the unstructured data stored in the storage node to generate preprocessed data; transfer, via a public network, the preprocessed data generated by the first virtualized compute resource of the storage node to a compute node on a client system separate from the first cloud; and assign, by the resource scheduler, a second virtualized compute resource to the compute node on the client system, wherein the second compute resource trains a machine learning (ML) model using the preprocessed data.

10. The medium of claim 9, wherein the storage node comprises a hyper-converged infrastructure (HCI) node.

11. The medium of claim 9, wherein preprocessing includes at least two preprocessing steps, wherein a first preprocessing step of the at least two preprocessing steps includes parsing the unstructured data, and wherein a second preprocessing step of the at least two preprocessing steps includes filtering the unstructured data.

12. The medium of claim 9, wherein the second virtualized compute resource further preprocesses the preprocessed unstructured data before using the preprocessed unstructured data to train the ML model.

13. The medium of claim 9, wherein the unstructured data is partitioned into a first chunk on the storage node and a second chunk on a second storage node of the object storage platform, wherein a third virtualized compute resource preprocesses the second chunk.

14. The medium of claim 9, wherein the storage node is an accelerator-enabled node.

15. The medium of claim 9, comprising the instructions stored thereon that, when executed by the processor, further cause the processor to generate a template in which the first virtualized compute resource of the storage node preprocesses the unstructured data.

16. The medium of claim 9, comprising the instructions stored thereon that, when executed by the processor, further cause the processor to determine, based on a mapping operation of the preprocessing the unstructured data resulting in an increase of data volume, to execute the mapping operation at the compute node of the client system.

17. A computer-implemented method, comprising: assigning, by a processor associated with a resource scheduler, a first virtualized compute resource to a storage node of an object store on a first cloud, the storage node including a virtualized storage resource, wherein unstructured data is stored in the storage node; preprocessing, by the first virtualized compute resource of the storage node, at the storage node on the first cloud, the unstructured data stored in the storage node to generate preprocessed data; transferring, via a public network, the preprocessed data generated by the first virtualized compute resource of the storage node to a compute node on a client system separate from the first cloud; and assigning, by the processor associated with the resource scheduler, a second virtualized compute resource to the compute node on the client system, wherein the second virtualized compute resource trains a machine learning (ML) model using the preprocessed data.

18. The method of claim 17, wherein a first preprocessing step of the preprocessing includes parsing the unstructured data, and wherein a second preprocessing step of the preprocessing includes filtering the unstructured data.

19. The method of claim 17, wherein the second virtualized compute resource further preprocesses the preprocessed unstructured data before using the preprocessed unstructured data to train the ML model.

20. The method of claim 17, wherein the unstructured data is partitioned into a first chunk on the storage node and a second chunk on a second storage node of the object storage platform, wherein a third virtualized compute resource preprocesses the second chunk.

21. The method of claim 17, wherein the storage node is an accelerator-enabled node.

22. The method of claim 17, further comprising generating a template in which the first virtualized compute resource of the storage node preprocesses the unstructured data.

23. The computer-implemented method of claim 17, further comprising determining, based on a mapping operation of the preprocessing the unstructured data resulting in an increase of data volume, to execute the mapping operation at the compute node of the client system.
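The dependent claims add three refinements. Claims 3, 11 and 18 split preprocessing into parsing and filtering steps. Claims 5, 13 and 20 partition the data into chunks that are preprocessed on different storage nodes. Claims 8, 16 and 23 run a volume-increasing mapping operation at the client instead of the storage node. The sketch below models the placement decision and the chunking. Several details are assumptions for illustration and are not given in the claims: the `Op` type, the `size_ratio` estimates, the rule that every step after the first volume-increasing one also moves to the client, and round-robin chunking.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One preprocessing step with an estimated output/input size ratio."""
    name: str
    size_ratio: float  # hypothetical estimate: output bytes / input bytes


def plan_placement(ops: list[Op]) -> dict[str, list[str]]:
    """Split an ordered pipeline between the storage node and the client.

    Steps stay on the storage node while they do not grow the data. The first
    step expected to increase data volume (size_ratio > 1), and every step
    after it, is placed on the client compute node, so the enlarged output is
    never sent over the public network.
    """
    storage: list[str] = []
    client: list[str] = []
    for op in ops:
        if client or op.size_ratio > 1.0:
            client.append(op.name)
        else:
            storage.append(op.name)
    return {"storage_node": storage, "client_node": client}


def partition(objects: list[str], n_nodes: int) -> list[list[str]]:
    """Round-robin the objects into one chunk per storage node.

    Each chunk would then be preprocessed by the compute resource assigned
    to the storage node holding it.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    return [objects[i::n_nodes] for i in range(n_nodes)]


if __name__ == "__main__":
    pipeline = [
        Op("parse", 0.6),
        Op("filter", 0.3),
        Op("one_hot_encode", 4.0),  # a mapping step that inflates the data
        Op("normalize", 1.0),
    ]
    print(plan_placement(pipeline))
    # {'storage_node': ['parse', 'filter'], 'client_node': ['one_hot_encode', 'normalize']}
    print(partition(["obj0", "obj1", "obj2", "obj3", "obj4"], 2))
    # [['obj0', 'obj2', 'obj4'], ['obj1', 'obj3']]
```

Steps that land on the client correspond to the further client-side preprocessing described in claims 12 and 19. A real scheduler would need measured or profiled size ratios rather than fixed estimates.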

Assignees

Nutanix Inc

Inventors

Classifications

  • Hypervisors; Virtual machine monitors · CPC title

  • Machine learning · CPC title

  • H04L67/1097 (primary): for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12438943B2 cover?
It covers offloading the preprocessing of machine learning data to remote storage. A resource scheduler places a first compute resource on a storage node of an object store, where it preprocesses (for example, parses and filters) the unstructured data stored there. It also places a second compute resource on a client compute node, which receives the preprocessed data over a public network and trains an ML model on it.
Who is the assignee on this patent?
Nutanix Inc
What technology area does this patent fall under?
Primary CPC classification H04L67/1097. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Oct 7, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).