What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date February 22, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Dynamic accuracy-based deployment and monitoring of machine learning models in provider networks

US11257002B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11257002-B2
Application number	US-201815919628-A
Country	US
Kind code	B2
Filing date	Mar 13, 2018
Priority date	Nov 22, 2017
Publication date	Feb 22, 2022
Grant date	Feb 22, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for dynamic accuracy-based experimentation and deployment of machine learning (ML) models are described. Inference traffic flowing to ML models and the accuracy of the models is analyzed and used to ensure that better performing models are executed more often via model selection. A predictive component can evaluate which model is more likely to be accurate for certain input data elements. Ensemble techniques can combine inference results of multiple ML models to aim to achieve a better overall result than any individual model could on its own.
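The abstract does not say how an ensemble combines the inference results of multiple models. As one hedged illustration, the Python sketch below uses simple majority voting over classifier labels. The `Model` type and the `ensemble_predict` function are hypothetical names. Voting is only one possible combination rule; averaging per-class scores or stacking a second model are common alternatives.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Hypothetical model type: any callable that maps an input to a class label.
Model = Callable[[object], Hashable]


def ensemble_predict(models: Sequence[Model], x: object) -> Hashable:
    """Majority-vote ensemble (an assumed combination rule, not the patent's).

    Every model receives the same input, and the most frequent label wins.
    Ties go to the label seen first, because Counter.most_common orders
    equal counts by first appearance.
    """
    if not models:
        raise ValueError("ensemble needs at least one model")
    votes = Counter(model(x) for model in models)
    label, _count = votes.most_common(1)[0]
    return label


if __name__ == "__main__":
    # Three toy "models" that ignore their input and return fixed labels.
    toy_models = [lambda x: "cat", lambda x: "dog", lambda x: "cat"]
    print(ensemble_predict(toy_models, "image-bytes"))  # cat
```

Because every model sees the same input, this pattern also resembles the fan-out in claim 10, where one inference request goes to each model and a single result is produced from their outputs.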

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a request to deploy a plurality of machine learning (ML) models within a provider network in association with a Hypertext Transfer Protocol (HTTP) endpoint, wherein the plurality of ML models were trained to perform a common type of inference task; configuring a model selector, within the provider network, to select between ones of the plurality of ML models according to a first distribution for inference requests received at the HTTP endpoint, the first distribution indicating that each ML model is to be selected according to a same likelihood; obtaining a plurality of inference results generated by the plurality of ML models; determining, based at least in part on the plurality of inference results, a plurality of accuracy scores corresponding to the plurality of ML models; updating the model selector, based on the plurality of accuracy scores, to cause the model selector to select ones of the plurality of ML models to generate inferences for inference requests received at the HTTP endpoint according to an updated distribution that is different than the first distribution; and providing, by the model selector, a plurality of inference requests received at the HTTP endpoint to the plurality of ML models according to the updated distribution.

2. The method of claim 1, wherein: the updated distribution indicates that a first ML model of the plurality of ML models is to be selected to generate inferences at a higher likelihood compared to a corresponding likelihood of the first distribution; and the updated distribution indicates that a second ML model of the plurality of ML models is to be selected to generate inferences at a lower likelihood compared to a corresponding likelihood of the first distribution.

3. The method of claim 1, wherein the plurality of inference results includes a first plurality of inference results generated by the plurality of ML models using a common input data.

4. The method of claim 3, wherein determining the plurality of accuracy scores is based at least in part on comparing the first plurality of inference results.

5. The method of claim 1, wherein determining the plurality of accuracy scores is based at least in part on comparing the plurality of inference results with a corresponding plurality of ground truth confirmations obtained using input data that was used by the plurality of ML models to generate the plurality of inference results.

6. The method of claim 1, wherein determining the plurality of accuracy scores is based at least in part on an analysis of explicit or implied user feedback provided by one or more users that caused inference requests to be issued that resulted in the plurality of inference results being generated by the plurality of ML models.

7. The method of claim 1, further comprising: receiving a request to perform an inference using an input data; selecting, by the model selector based on an analysis of the input data, a first ML model from a second plurality of ML models to be used to perform the inference; and providing the input data to the first ML model.

8. The method of claim 7, wherein: the selecting the first ML model comprises using the input data or other data generated based on the input data as input to a second ML model; and the second ML model generates a result identifying the first ML model.

9. The method of claim 1, wherein the plurality of ML models are executed by a corresponding plurality of containers that are executed by one or more computing devices within the provider network.

10. The method of claim 1, further comprising: providing, by the model selector, an inference request to each of the plurality of ML models; and generating a result based on a plurality of inference results generated by the plurality of ML models.

11. The method of claim 1, further comprising: receiving a message indicating that a second ML model is to be tested alongside a first ML model; providing, by the model selector, an inference request to the first ML model and the second ML model; sending a response to the inference request including a first inference result generated by the first ML model but not a second inference result generated by the second ML model; and determining a first accuracy score for the first ML model based at least in part on the first inference result and a second accuracy score for the second ML model based on a second inference result generated by the second ML model.

12. The method of claim 1, further comprising determining an unbiased estimate of accuracy for each of the plurality of ML models that indicates how the corresponding ML model would have performed if it had processed the plurality of inference requests despite not having actually processed the plurality of inference requests.

13. The method of claim 1, wherein the request was originated on behalf of a user of the provider network and includes an identifier of the HTTP endpoint and identifiers of the plurality of ML models.

14. The method of claim 1, further comprising: obtaining performance metrics associated with the plurality of ML models in generating the plurality of inference results, the performance metrics including at least one of a time to execute or a computing resource utilization amount, wherein the causing of the model selector to be updated is further based at least in part on an analysis of the performance metrics.

15. The method of claim 1, further comprising: determine that a first ML model, of the plurality of ML models, has an accuracy amount for a period of time that satisfies a threshold; and causing the model selector to be updated to no longer pass any inference requests for inference requests received at the HTTP endpoint to the first ML model.

16. A system comprising: a first one or more electronic devices to implement a dynamic router, the dynamic router including first instructions that upon execution cause the dynamic router to implement a model selector to select one or more of a plurality of machine learning (ML) models to perform inferences for inference requests, and cause the inference requests to be provided to the selected ML models; and a second one or more electronic devices to implement a machine learning service, the machine learning service including second instructions that upon execution cause the machine learning service to: receive a request to deploy the plurality of ML models in association with a Hypertext Transfer Protocol (HTTP) endpoint, wherein the plurality of ML models were trained to perform a common type of inference task; configure a model selector, within a provider network, to select between ones of the plurality of ML models according to a first distribution for inference requests received at the HTTP endpoint, the first distribution indicating that each ML model is to be selected according to a same likelihood; obtain a plurality of inference results generated by the plurality of ML models; determine, based at least in part on the plurality of inference results, a plurality of accuracy scores corresponding to the plurality of ML models; and cause the model selector to be updated, based on the plurality of accuracy scores, to use an updated distribution to select ones of the plurality of ML models to generate inferences for inference requests received at the HTTP endpoint, wherein the updated distribution is different than the first distribution.

17. The system of claim 16, wherein the plurality of inference results includes a first plurality of inference results genera
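Claim 1 fixes the starting point (every model selected with the same likelihood) and requires an updated, different distribution derived from accuracy scores, but the claim text shown here does not say how that update is computed. The Python sketch below makes two assumptions: accuracy is the fraction of results that match ground-truth labels (one reading of claim 5), and the updated distribution is proportional to accuracy. `ModelSelector`, `accuracy_scores`, and the `min_accuracy` cutoff are hypothetical. The cutoff is only a loose stand-in for the threshold in claim 15, whose direction the claim does not specify.

```python
import random
from typing import Dict, List, Optional, Sequence


def accuracy_scores(results: Dict[str, List[str]], ground_truth: List[str]) -> Dict[str, float]:
    """Fraction of each model's results that match ground-truth labels.

    Assumption: every model labeled the same inputs, and a confirmed label
    exists for each input (one reading of claim 5).
    """
    scores: Dict[str, float] = {}
    for name, preds in results.items():
        if len(preds) != len(ground_truth):
            raise ValueError(f"{name}: expected {len(ground_truth)} results")
        correct = sum(p == t for p, t in zip(preds, ground_truth))
        scores[name] = correct / len(ground_truth) if ground_truth else 0.0
    return scores


class ModelSelector:
    """Hypothetical model selector: samples one model name per inference request."""

    def __init__(self, model_names: Sequence[str], seed: Optional[int] = None):
        if not model_names:
            raise ValueError("need at least one model")
        # First distribution: every model has the same likelihood (claim 1).
        share = 1.0 / len(model_names)
        self.weights: Dict[str, float] = {name: share for name in model_names}
        self._rng = random.Random(seed)

    def select(self) -> str:
        """Choose one model name according to the current distribution."""
        names = list(self.weights)
        return self._rng.choices(names, weights=[self.weights[n] for n in names], k=1)[0]

    def update(self, scores: Dict[str, float], min_accuracy: float = 0.0) -> None:
        """Switch to a distribution proportional to accuracy (assumed rule).

        Models scoring at or below the hypothetical `min_accuracy` cutoff, or
        missing from `scores`, get weight 0.0 and stop receiving requests.
        """
        kept = {n: s for n, s in scores.items() if n in self.weights and s > min_accuracy}
        total = sum(kept.values())
        if total <= 0:
            raise ValueError("no model scored above min_accuracy")
        self.weights = {n: kept.get(n, 0.0) / total for n in self.weights}


if __name__ == "__main__":
    selector = ModelSelector(["model-a", "model-b"], seed=0)
    print(selector.weights)  # {'model-a': 0.5, 'model-b': 0.5}
    truth = ["cat", "dog", "cat", "dog"]
    scores = accuracy_scores(
        {"model-a": ["cat", "dog", "cat", "cat"], "model-b": ["cat", "cat", "dog", "cat"]},
        truth,
    )
    selector.update(scores)
    print(selector.weights)  # {'model-a': 0.75, 'model-b': 0.25}
    # Over many requests, select() now returns "model-a" roughly 75% of the time.
    print(selector.select())
```

In this sketch the update raises one model's likelihood above its uniform share and lowers the other's, which matches the shape of claim 2. Any other monotone rule (for example, a softmax over scores or a bandit-style policy) would be an equally valid assumption.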

Assignees

Amazon Tech Inc

Classifications

G06N3/045 · Combinations of networks
G06N3/08 (primary) · Learning methods
G06N3/09 · Supervised learning
G06N5/04 · Inference or reasoning models
G06N20/00 (primary) · Machine learning

Patent family

Related publications grouped by family.

Patent family ID: 66533094

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11257002B2 cover?: Techniques for dynamic accuracy-based experimentation and deployment of machine learning (ML) models are described. Inference traffic flowing to ML models and the accuracy of the models is analyzed and used to ensure that better performing models are executed more often via model selection. A predictive component can evaluate which model is more likely to be accurate for certain input data elem…
Who is the assignee on this patent?: Amazon Tech Inc

Related patents

Systems and methods for AIDA based role models

Method and apparatus for real-time personalization

Systems for second-order predictive data analytics, and related methods and apparatus

Automatic accent detection using acoustic models

Introducing user trustworthiness in implicit feedback based search result ranking

Modifying search result ranking based on implicit user feedback and a model of presentation bias