Ensuring reproducibility in an artificial intelligence infrastructure

US10360214B2 · US · B2

Patent metadata
Field | Value
Publication number | US-10360214-B2
Application number | US-201816045814-A
Country | US
Kind code | B2
Filing date | Jul 26, 2018
Priority date | Oct 19, 2017
Publication date | Jul 23, 2019
Grant date | Jul 23, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Ensuring reproducibility in an artificial intelligence infrastructure that includes one or more storage systems and one or more graphical processing unit (‘GPU’) servers, including: identifying, by a unified management plane, one or more transformations applied to a dataset by the artificial intelligence infrastructure, wherein applying the one or more transformations to the dataset causes the artificial intelligence infrastructure to generate a transformed dataset; storing, within the one or more storage systems, information describing the dataset, the one or more transformations applied to the dataset, and the transformed dataset; identifying, by the unified management plane, one or more machine learning models executed by the artificial intelligence infrastructure using the transformed dataset as input; and storing, within the one or more storage systems, information describing one or more machine learning models executed using the transformed dataset as input.
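The record-keeping flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `ManagementPlane` class, the `LineageRecord` fields, the in-memory `storage` dictionary standing in for the storage systems, and the dataset and model names are all hypothetical, chosen only to show the four steps (identify transformations, store their description, identify models run on the transformed dataset, store their description).

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class LineageRecord:
    """Hypothetical description of one dataset and what was done with it."""
    dataset: str
    transformed_dataset: str
    transformations: list = field(default_factory=list)
    models: list = field(default_factory=list)


class ManagementPlane:
    """Hypothetical stand-in for the 'unified management plane'."""

    def __init__(self):
        # Assumption: a dict keyed by transformed-dataset name stands in
        # for the one or more storage systems.
        self.storage = {}

    def record_transformations(self, dataset, transformations, transformed_dataset):
        # Store information describing the dataset, the transformations
        # applied to it, and the resulting transformed dataset.
        record = LineageRecord(dataset, transformed_dataset, list(transformations))
        self.storage[transformed_dataset] = json.dumps(asdict(record))

    def record_model_run(self, transformed_dataset, model):
        # Store information describing a model executed with the
        # transformed dataset as input.
        record = json.loads(self.storage[transformed_dataset])
        record["models"].append(model)
        self.storage[transformed_dataset] = json.dumps(record)

    def lineage(self, transformed_dataset):
        # Read back everything needed to reproduce a run.
        return json.loads(self.storage[transformed_dataset])


plane = ManagementPlane()
plane.record_transformations("raw_images", ["resize_224", "normalize"], "images_v1")
plane.record_model_run("images_v1", "resnet50_run_01")
result = plane.lineage("images_v1")
```

Here `result` holds the source dataset, the ordered transformations, and the models that consumed the transformed dataset, which is the information a later user would need to repeat the run.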

First claim

Opening claim text (preview).

What is claimed is:

1. A method of ensuring reproducibility in an artificial intelligence infrastructure that includes one or more storage systems and one or more graphical processing unit (‘GPU’) servers, the method comprising: identifying, by a unified management plane, one or more transformations applied to a dataset by the artificial intelligence infrastructure, wherein applying the one or more transformations to the dataset causes the artificial intelligence infrastructure to generate a transformed dataset; storing, within the one or more storage systems, information describing the dataset, the one or more transformations applied to the dataset, and the transformed dataset; identifying, by the unified management plane, one or more machine learning models executed by the artificial intelligence infrastructure using the transformed dataset as input; storing, within the one or more storage systems, information describing one or more machine learning models executed using the transformed dataset as input; determining, by the artificial intelligence infrastructure, whether data related to a previously executed machine learning model should be tiered off of the one or more storage systems; and responsive to determining that the data related to the previously executed machine learning model should be tiered off of the one or more storage systems: storing the data related to the previously executed machine learning model in lower-tier storage; and removing, from the one or more storage systems, the data related to the previously executed machine learning model.

2. The method of claim 1 wherein storing, within the one or more storage systems, information describing the dataset, the one or more transformations applied to the dataset, and the transformed dataset further comprises: generating, by the artificial intelligence infrastructure applying a predetermined hash function to the dataset, the one or more transformations applied to the dataset, and the transformed dataset, a hash value; and storing, within the one or more storage systems, the hash value.

3. The method of claim 1 wherein storing, within the one or more storage systems, information describing one or more machine learning models executed using the transformed dataset as input further comprises: generating, by the artificial intelligence infrastructure applying a predetermined hash function to the one or more machine learning models and the transformed dataset, a hash value; and storing, within the one or more storage systems, the hash value.

4. The method of claim 1 further comprising: identifying, by the unified management plane, differences between a machine learning model and a machine learning model previously executed by the artificial intelligence infrastructure; and storing, within the one or more storage systems, only the portion of the machine learning model that differs from the machine learning models previously executed by the artificial intelligence infrastructure.

5. The method of claim 1 further comprising identifying, from amongst a plurality of machine learning models, a preferred machine learning model.

6. The method of claim 1 further comprising tracking the improvement of a particular machine learning model over time.

7. An artificial intelligence infrastructure that includes one or more storage systems and one or more graphical processing unit (‘GPU’) servers, the artificial intelligence infrastructure configured to carry out the steps of: identifying, by a unified management plane, one or more transformations applied to a dataset by the artificial intelligence infrastructure, wherein applying the one or more transformations to the dataset causes the artificial intelligence infrastructure to generate a transformed dataset; storing, within the one or more storage systems, information describing the dataset, the one or more transformations applied to the dataset, and the transformed dataset; identifying, by the unified management plane, one or more machine learning models executed by the artificial intelligence infrastructure using the transformed dataset as input; storing, within the one or more storage systems, information describing one or more machine learning models executed using the transformed dataset as input; determining, by the artificial intelligence infrastructure, whether data related to a previously executed machine learning model should be tiered off of the one or more storage systems; and responsive to determining that the data related to the previously executed machine learning model should be tiered off of the one or more storage systems: storing the data related to the previously executed machine learning model in lower-tier storage; and removing, from the one or more storage systems, the data related to the previously executed machine learning model.

8. The artificial intelligence infrastructure of claim 7 wherein storing, within the one or more storage systems, information describing the dataset, the one or more transformations applied to the dataset, and the transformed dataset further comprises: generating, by the artificial intelligence infrastructure applying a predetermined hash function to the one or more transformations applied to the dataset and the transformed dataset, a hash value; and storing, within the one or more storage systems, the hash value.

9. The artificial intelligence infrastructure of claim 7 wherein storing, within the one or more storage systems, information describing one or more machine learning models executed using the transformed dataset as input further comprises: generating, by the artificial intelligence infrastructure applying a predetermined hash function to the one or more machine learning models, a hash value; and storing, within the one or more storage systems, the hash value.

10. The artificial intelligence infrastructure of claim 7 wherein the artificial intelligence infrastructure is further configured to carry out the steps of: identifying, by the unified management plane, differences between a machine learning model and a machine learning model previously executed by the artificial intelligence infrastructure; and storing, within the one or more storage systems, only the portion of the machine learning model that differs from the machine learning models previously executed by the artificial intelligence infrastructure.

11. The artificial intelligence infrastructure of claim 7 wherein the artificial intelligence infrastructure is further configured to carry out the step of identifying, from amongst a plurality of machine learning models, a preferred machine learning model.

12. The artificial intelligence infrastructure of claim 7 wherein the artificial intelligence infrastructure is further configured to carry out the step of tracking the improvement of a particular machine learning model over time.

13. An apparatus for ensuring reproducibility in an artificial intelligence infrastructure that includes one or more storage systems and one or more graphical processing unit (‘GPU’) servers, the apparatus comprising a computer processor, a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions that, when executed by the computer processor, cause the apparatus to carry out the steps of: identifying, by a unified management plane, one or more transformations applied to a dataset by the artificial intelligence infrastructure, wherein applying the one or more transformations to the dataset causes the artificial intelligence infrastructure to generate a transformed dataset; storing, within the one or mo
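Two mechanisms named in the claims, the hash value over a dataset, its transformations, and the transformed dataset, and the tiering of data for previously executed models, can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: the claims only say "a predetermined hash function", so SHA-256 is a stand-in; the claims do not say how the tiering decision is made, so the idle-time threshold is invented for the example; and the function names, dictionaries, and keys are hypothetical.

```python
import hashlib


def reproducibility_hash(*parts: bytes) -> str:
    """Hash several byte strings into one digest.

    Assumption: SHA-256 plays the role of the 'predetermined hash
    function'. Each part is length-prefixed so that the boundaries
    between parts affect the result.
    """
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


def tier_off(primary, lower_tier, last_used, now, max_idle_seconds):
    """Move data for idle models from primary storage to a lower tier.

    Assumption: 'should be tiered off' is decided by idle time alone.
    Returns the keys that were moved.
    """
    moved = []
    for key in list(primary):
        if now - last_used[key] > max_idle_seconds:
            lower_tier[key] = primary[key]  # store in lower-tier storage
            del primary[key]                # remove from the storage systems
            moved.append(key)
    return moved


# Hash over the dataset, the transformations, and the transformed dataset.
h1 = reproducibility_hash(b"dataset", b"resize", b"transformed")
h2 = reproducibility_hash(b"dataset", b"resize", b"transformed")

# Tier off the model that has been idle longer than the threshold.
primary = {"model_a": b"weights", "model_b": b"weights2"}
lower = {}
last_used = {"model_a": 0.0, "model_b": 90.0}
moved = tier_off(primary, lower, last_used, now=100.0, max_idle_seconds=50.0)
```

Identical inputs give identical digests, so a stored hash value can later confirm that a rerun used the same dataset and transformations. The length prefix is a design choice for the sketch: without it, the inputs `(b"ab", b"c")` and `(b"a", b"bc")` would hash to the same value.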

Assignees

Pure Storage Inc

Inventors

Classifications

  • Migration mechanisms · CPC title

  • Configuration or reconfiguration of storage systems · CPC title

  • G06F3/061 · Primary

    Improving I/O performance · CPC title

  • Query rewriting; Transformation · CPC title

  • involving image processing hardware · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10360214B2 cover?
Ensuring reproducibility in an artificial intelligence infrastructure that includes one or more storage systems and one or more graphical processing unit (‘GPU’) servers, including: identifying, by a unified management plane, one or more transformations applied to a dataset by the artificial intelligence infrastructure, wherein applying the one or more transformations to the dataset causes the …
Who is the assignee on this patent?
Pure Storage Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/061. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 23, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).