Using specified performance attributes to configure machine learning pipeline stages for an ETL job

US11941016B2 · US · B2

Patent metadata
Publication number: US-11941016-B2
Application number: US-202217687492-A
Country: US
Kind code: B2
Filing date: Mar 4, 2022
Priority date: Nov 23, 2018
Publication date: Mar 26, 2024
Grant date: Mar 26, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Specified performance attributes may be used to configure machine learning transformations for ETL jobs. Performance attributes for a machine learning pipeline that applies a model to as part of a transformation for an ETL job may be used to configure a parameter in a stage of the machine learning pipeline. The configured stage may then be used when training the model. The trained machine learning pipeline may then be applied as part of a transformation operation included in an ETL job performed by the ETL system.
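The abstract describes mapping user-specified performance attributes onto a parameter of a pipeline stage, with the configured stage then used when the model is trained. The sketch below shows one way such a mapping could look. It assumes two hypothetical attributes (a precision/recall trade-off and an accuracy/cost trade-off, each in [0, 1]) and two hypothetical stages named `blocking` and `match`. The attribute names, ranges, and linear formulas are illustrative assumptions rather than anything disclosed in the patent, and the training step itself is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PerformanceAttributes:
    """Hypothetical user-specified performance attributes, each in [0.0, 1.0]."""
    precision_recall: float = 0.5  # higher favors precision over recall (assumption)
    accuracy_cost: float = 0.5     # higher favors accuracy over compute cost (assumption)

    def __post_init__(self) -> None:
        # Reject attribute values outside the assumed [0, 1] range.
        for name in ("precision_recall", "accuracy_cost"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


@dataclass
class Stage:
    """A pipeline stage with tunable parameters (hypothetical structure)."""
    name: str
    params: Dict[str, float] = field(default_factory=dict)


def configure_stages(stages: List[Stage], attrs: PerformanceAttributes) -> List[Stage]:
    """Set stage parameters from the attributes before training (assumed linear mapping)."""
    for stage in stages:
        if stage.name == "blocking":
            # More accuracy budget: compare more candidate pairs per record.
            stage.params["max_candidates"] = 10 + int(90 * attrs.accuracy_cost)
        elif stage.name == "match":
            # Favoring precision: require a higher score before declaring a match.
            stage.params["match_threshold"] = 0.5 + 0.45 * attrs.precision_recall
    return stages
```

For example, `configure_stages([Stage("blocking"), Stage("match")], PerformanceAttributes(precision_recall=0.9))` would set the `match` stage's threshold to roughly 0.905 before training runs. Keeping the mapping in a single function makes it straightforward to replace the assumed linear formulas with a tabulated or learned mapping.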

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a plurality of computing devices, respectively comprising a processor and a memory, that are configured to implement an Extract Transform Load service, wherein the Extract Transform Load service is configured to: receive, via an interface for the Extract Transform Load service offered by a provider network, one or more ETL job creation requests to create an ETL job, wherein the one or more requests include a selection of a machine learning pipeline with a trained machine learning model to perform a transformation operation in addition to one or more other operations to include in the ETL job, wherein the one or more requests configure one or more parameters of the machine learning pipeline; configure the one or more parameters of a stage in the machine learning pipeline that applies the machine learning model according to the one or more requests; and execute the ETL job including the transformation operation performed by the machine learning pipeline and the one or more other operations in the ETL job.

2. The system of claim 1, wherein the transformation operation is a record linking operation according to a similarity determined between two or more items by the machine learning model.

3. The system of claim 1, wherein the transformation operation is a data scrubbing operation.

4. The system of claim 1, wherein the interface of the ETL service displays a graph of the ETL job.

5. The system of claim 1, wherein one of the one or more parameters specifies a threshold for including an item in a cluster of similar items determined using the machine learning model.

6. The system of claim 1, wherein the Extract Transform Load service is further configured to display, via the interface one or more trained machine learning models, including the trained machine learning model responsive to a search request received via the interface.

7. The system of claim 1, wherein the Extract Transform Load service is implemented as part of a provider network, wherein ETL job obtains data stored in another service of the provider network and stores a result of the ETL job in the other service of the provider network or a different service of the provider network.

8. A method, comprising: receiving, via an interface for an Extract Transform Load service offered by a provider network, one or more ETL job creation requests to create an ETL job, wherein the one or more requests include a selection of a machine learning pipeline with a trained machine learning model to perform a transformation operation in addition to one or more other operations to include in the ETL job, wherein the one or more requests configure one or more parameters of the machine learning pipeline; configuring, by the ETL service, the one or more parameters of a stage in the machine learning pipeline that applies the machine learning model according to the one or more requests; and executing, by the ETL service, the ETL job including the transformation operation performed by the machine learning pipeline and the one or more other operations in the ETL job.

9. The method of claim 8, wherein the transformation operation is a record linking operation according to a similarity determined between two or more items by the machine learning model.

10. The method of claim 8, wherein the transformation operation is a data scrubbing operation.

11. The method of claim 8, wherein the interface of the ETL service displays a graph of the ETL job.

12. The method of claim 8, wherein one of the one or more parameters specifies a threshold for including an item in a cluster of similar items determined using the machine learning model.

13. The method of claim 8, further comprising displaying, via the interface, one or more trained machine learning models, including the trained machine learning model responsive to a search request received via the interface.

14. The method of claim 8, wherein the Extract Transform Load service is implemented as part of a provider network, wherein ETL job obtains data stored in another service of the provider network and stores a result of the ETL job in the other service of the provider network or a different service of the provider network.

15. One or more non-transitory computer-readable storage media storing program instructions that, when executed on or across one or more computing devices, cause the one or more computing devices to implement: receiving, via an interface for an Extract Transform Load service offered by a provider network, one or more ETL job creation requests to create an ETL job, wherein the one or more requests include a selection of a machine learning pipeline with a trained machine learning model to perform a transformation operation in addition to one or more other operations to include in the ETL job, wherein the one or more requests configure one or more parameters of the machine learning pipeline; configuring, by the ETL service, the one or more parameters of a stage in the machine learning pipeline that applies the machine learning model according to the one or more requests; and executing, by the ETL service, the ETL job including the transformation operation performed by the machine learning pipeline and the one or more other operations in the ETL job.

16. The one or more non-transitory computer-readable storage media of claim 15, wherein the transformation operation is a record linking operation according to a similarity determined between two or more items by the machine learning model.

17. The one or more non-transitory computer-readable storage media of claim 15, wherein the transformation operation is a data scrubbing operation.

18. The one or more non-transitory computer-readable storage media of claim 15, wherein the interface of the ETL service displays a graph of the ETL job.

19. The one or more non-transitory computer-readable storage media of claim 15, wherein one of the one or more parameters specifies a threshold for including an item in a cluster of similar items determined using the machine learning model.

20. The one or more non-transitory computer-readable storage media of claim 15, storing further program instructions that when executed by the one or more computing devices cause the one or more computing devices to further implement displaying, via the interface, one or more trained machine learning models, including the trained machine learning model responsive to a search request received via the interface.
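Claims 1, 2, and 5 describe an ETL job that combines a machine learning transformation (record linking by a model-determined similarity, with a configurable threshold for including an item in a cluster) with other operations. The following is a minimal sketch under stated assumptions: the job request is a plain dict whose keys (`parameters`, `match_threshold`, `output_table`) are hypothetical, `difflib.SequenceMatcher` stands in for the trained similarity model, and linked records are grouped with union-find, which yields single-linkage clusters. The normalization step and the in-memory source and sink are illustrative only and do not describe any particular provider-network service.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

Record = Dict[str, str]


def similarity(a: Record, b: Record) -> float:
    # Stand-in for the trained model: string similarity of the "name" fields.
    return SequenceMatcher(None, a["name"], b["name"]).ratio()


def link_records(records: List[Record], threshold: float) -> List[Record]:
    """Give records a shared cluster_id when their similarity >= threshold (single linkage)."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two clusters
    return [dict(r, cluster_id=str(find(i))) for i, r in enumerate(records)]


def run_etl_job(request: dict, source: List[Record], sink: Dict[str, List[Record]]) -> None:
    # The (hypothetical) request configures the ML stage's threshold parameter.
    threshold = request["parameters"]["match_threshold"]
    operations: List[Callable[[List[Record]], List[Record]]] = [
        # Other operation: normalize names before the ML transformation.
        lambda rows: [dict(r, name=r["name"].strip().lower()) for r in rows],
        # ML transformation: record linking with the configured threshold.
        lambda rows: link_records(rows, threshold),
    ]
    rows = [dict(r) for r in source]  # extract (copy so the source is not mutated)
    for op in operations:             # transform
        rows = op(rows)
    sink[request["output_table"]] = rows  # load
```

With a threshold of 0.8, records named "Acme Corp" and " ACME Corp." land in the same cluster while "Globex" stays separate; raising the threshold to 0.99 splits all three apart. The all-pairs comparison is quadratic, so a real job would likely add a blocking stage to limit candidate pairs.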

Assignees

  • Amazon Tech Inc

Inventors

Classifications

  • G06F16/254 (primary)

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

  • User-generated data transfer, e.g. clipboards, dynamic data exchange [DDE], object linking and embedding [OLE] · CPC title

  • Machine learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11941016B2 cover?
Specified performance attributes may be used to configure machine learning transformations for ETL jobs. Performance attributes for a machine learning pipeline that applies a model to as part of a transformation for an ETL job may be used to configure a parameter in a stage of the machine learning pipeline. The configured stage may then be used when training the model. The trained machine learn…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/254. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 26, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).