Automating application provisioning for heterogeneous datacenter environments
US-9766935-B2 · Sep 19, 2017 · US
US11537936B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11537936-B2 |
| Application number | US-201916250770-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 17, 2019 |
| Priority date | Jan 17, 2019 |
| Publication date | Dec 27, 2022 |
| Grant date | Dec 27, 2022 |
Official abstract text for this publication.
A system may include memory containing: (i) a master data set representable in columns and rows, and (ii) a query expression. The system may include a software application configured to apply a machine learning (ML) pipeline to an input data set. The system may include a computing device configured to: obtain the master data set and the query expression; apply the query expression to the master data set to generate a test data set, where applying the query expression comprises, based on content of the query expression, generating the test data set to have one or more columns or one or more rows fewer than the master data set; apply the ML pipeline to the test data set, where applying the ML pipeline results in either generation of a test ML model from the test data set or indication of an error in the test data set; and delete the test data set from the memory.
Opening claim text (preview).
What is claimed is:

1. A system comprising: memory containing: (i) a master data set representable in columns and rows, wherein the columns define fields of the master data set and the rows define entries in the master data set, and (ii) a query expression; a software application configured to apply a machine learning (ML) pipeline to a test data set, wherein the ML pipeline includes a build determination phase and an ML model building phase, wherein the build determination phase decides whether to invoke the ML model building phase based on characteristics of the test data set, and wherein the ML model building phase generates an ML model from the test data set; and a computing device configured to: obtain, from the memory, the master data set and the query expression; apply the query expression to the master data set to generate the test data set from the master data set, wherein applying the query expression comprises, based on content of the query expression, generating the test data set to have one or more columns or one or more rows fewer than the master data set, wherein the query expression specifies one or more columns of the master data set, one or more rows of the master data set, or a combination thereof; store, in the memory, the test data set; apply, by way of the software application, the ML pipeline to the test data set, wherein applying the ML pipeline results in either generation of a test ML model from the test data set or indication of an error in the test data set; and in response to applying the ML pipeline to the test data set, delete the test data set from the memory.

2. The system of claim 1, wherein the memory, the software application, and the computing device are disposed within a computational instance of a remote network management platform, and wherein the master data set was derived from activity that took place on a managed network associated with the computational instance.

3. The system of claim 2, wherein the computational instance is a centralized computational instance shared by a plurality of managed networks, and wherein the managed network accesses the central computational instance by way of a particular computational instance that is dedicated to the managed network.

4. The system of claim 1, wherein obtaining the master data set comprises: determining that the query expression specifies combining two or more input files; and performing a merge or a join operation on the two or more input files to generate the master data set.

5. The system of claim 1, wherein applying the query expression to the master data set comprises: generating the test data set to have only the one or more columns that were specified, only the one or more rows that were specified, or a combination thereof.

6. The system of claim 1, wherein the query expression specifies replacing instances of a string in a particular one of the columns with a replacement string, and wherein applying the query expression to the master data set comprises: finding each of the instances of the string in the particular one of the columns; and representing, in the test data set, each of the instances of the string with the replacement string.

7. The system of claim 1, wherein the query expression specifies replacing rows of text in a particular one of the columns with one of a plurality of replacement strings, and wherein applying the query expression to the master data set comprises: representing, in the test data set, rows of text in a particular one of the columns with a string randomly selected from the plurality of replacement strings.

8. The system of claim 1, wherein the query expression specifies translating rows of text in a particular one of the columns from a first language to a second language, and wherein applying the query expression to the master data set comprises: transmitting, to an external application programming interface, the rows of text; receiving, from the external application programming interface, the rows of text as translated into the second language; and representing, in the test data set, the rows of text with the translations thereof.

9. The system of claim 1, wherein the master data set is stored in an input file, wherein the query expression specifies the input file as a source and an output file as a destination, and wherein applying the query expression to the master data set comprises: reading, from the input file, the master data set; and writing, to the output file, the test data set.

10. The system of claim 1, wherein the query expression contains a filter to be applied to a particular one of the columns, wherein the filter is based on a type of content in the particular one of the columns, and wherein applying the query expression to the master data set comprises: representing, in the test data set, only rows with entries for the particular one of the columns that match the filter.

11. The system of claim 10, wherein the filter specifies a range of values or a text string.

12. The system of claim 10, wherein the filter specifies a density for the particular one of the columns, and wherein representing, in the test data set, only rows with entries for the particular one of the columns that match the filter comprises: representing, in the test data set, rows with null and non-null values in accordance with the density.

13. The system of claim 10, wherein the filter specifies a distribution for the particular one of the columns, and wherein representing, in the test data set, only rows with entries for the particular one of the columns that match the filter comprises: representing, in the test data set, rows that exhibit values in accordance with the distribution.

14. The system of claim 10, wherein the filter specifies a user-defined operation for the particular one of the columns, and wherein representing, in the test data set, only rows with entries for the particular one of the columns that match the filter comprises: representing, in the test data set, rows that exhibit values in accordance with the user-defined operation.

15. The system of claim 1, wherein the query expression specifies a limit to rows in the test data set, and wherein generating the test data set to have one or more columns or one or more rows fewer than the master data set comprises: generating the test data set to have no more than a number of rows defined by the limit.

16. A computer-implemented method comprising: obtaining, by a computing device and from a memory, a master data set and a query expression, wherein the master data set is representable in columns and rows, and wherein the columns define fields of the master data set and the rows define entries in the master data set; applying, by the computing device, the query expression to the master data set to generate a test data set from the master data set, wherein applying the query expression comprises, based on content of the query expression, generating the test data set to have one or more columns or one or more rows fewer than the master data set, wherein the query expression specifies one or more columns of the master data set, one or more rows of the master data set, or a combination thereof; storing, by the computing device and in the memory, the test data set; applying, by the computing device, a machine learning (ML) pipeline to the test data set, wherein the ML pipeline includes a build determination phase and an ML model building phase, wherein the build determination phase decides whether to invoke the ML model building phase based on characteristics of an input dat
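Three of the query operations recited in the dependent claims (string replacement in claim 6, a range filter in claim 11, and a density filter in claim 12) can be sketched in a few lines of Python. This is only one possible reading: the function names, the list-of-dicts row format, and the `size` parameter of the density filter are assumptions for illustration, and the patent text shown here does not prescribe how density is computed.

```python
def replace_string(rows, column, old, new):
    """Claim 6 style: replace each instance of `old` in `column` with `new`.

    Assumes every value in `column` is a string; returns new row dicts.
    """
    return [{**row, column: row[column].replace(old, new)} for row in rows]

def filter_range(rows, column, low, high):
    """Claim 11 style: keep rows whose `column` value lies in [low, high]."""
    return [
        row for row in rows
        if row[column] is not None and low <= row[column] <= high
    ]

def filter_density(rows, column, density, size):
    """Claim 12 style: pick up to `size` rows so that roughly `density`
    of them have a non-null value in `column` (an assumed interpretation)."""
    non_null = [row for row in rows if row[column] is not None]
    null = [row for row in rows if row[column] is None]
    n_non_null = min(len(non_null), round(density * size))
    n_null = min(len(null), size - n_non_null)
    return non_null[:n_non_null] + null[:n_null]

master = [
    {"text": "reset password for alice", "age": 34},
    {"text": "alice cannot log in", "age": None},
    {"text": "printer offline", "age": 51},
    {"text": "vpn drops", "age": None},
    {"text": "new laptop request", "age": 27},
]

masked = replace_string(master, "text", "alice", "USER")
in_range = filter_range(master, "age", 30, 60)
half_dense = filter_density(master, "age", density=0.5, size=4)
```

Each helper returns a new list and leaves `master` unchanged, which mirrors the claims' separation between the master data set and the test data set derived from it.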
Tablespace storage structures; Management thereof · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Classification techniques · CPC title
Machine learning · CPC title
Join operations · CPC title