System for routing machine learning model inferences
US-11170309-B1 · Nov 9, 2021 · US
US11669377B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11669377-B2 |
| Application number | US-202217680859-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 25, 2022 |
| Priority date | Aug 21, 2019 |
| Publication date | Jun 6, 2023 |
| Grant date | Jun 6, 2023 |
Official abstract text for this publication.
One or more virtual machines are launched at an application platform. At each of the one or more virtual machines, a machine learning model execution environment is instantiated for an instance of a machine learning model. A respective instance of the machine learning model is loaded to each machine learning model execution environment. Each loaded instance of the machine learning model is associated with an application programming interface (API) endpoint which can receive input data for the loaded instance of the machine learning model from a client device and return output data produced by the loaded instance of the machine learning model based on the input data.
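The arrangement in the abstract can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the patented implementation: `ExecutionEnvironment`, `ApiEndpoint`, `load_model`, the URL path, and the round-robin choice of instance are all hypothetical names and assumptions that the abstract does not specify.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Dict, List

# Hypothetical: a loaded model instance maps input data to output data.
Model = Callable[[Dict], Dict]


@dataclass
class ExecutionEnvironment:
    """Stands in for a model execution environment on one virtual machine."""
    vm_id: str
    model: Model


class ApiEndpoint:
    """Associates loaded model instances with a single endpoint."""

    def __init__(self, url: str, environments: List[ExecutionEnvironment]):
        self.url = url
        # Assumption: requests are spread round-robin across instances.
        self._next_env = cycle(environments)

    def handle(self, input_data: Dict) -> Dict:
        """Pass client input to one loaded instance and return its output."""
        env = next(self._next_env)
        return {"vm": env.vm_id, "output": env.model(input_data)}


def load_model() -> Model:
    """Toy stand-in for loading a model instance: doubles the field 'x'."""
    return lambda data: {"score": 2 * data["x"]}


# One execution environment per (simulated) virtual machine.
envs = [ExecutionEnvironment(f"vm-{i}", load_model()) for i in range(2)]
endpoint = ApiEndpoint("/models/demo/predict", envs)
```

Calling `endpoint.handle({"x": 3})` twice would route the first request to `vm-0` and the second to `vm-1`, each returning the output of its own loaded instance.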
Opening claim text (preview).
What is claimed is:
1. A method comprising: instantiating, at each virtual machine of one or more virtual machines, a machine learning model execution environment for an instance of a machine learning model; loading, by a processing device, a respective instance of the machine learning model to each machine learning model execution environment; associating each loaded instance of the machine learning model with an application programming interface (API) endpoint, the API endpoint configured to receive input data for the loaded instance of the machine learning model from a client device and to return output data produced by the loaded instance of the machine learning model based on the input data; receiving a request by the client device to configure the API endpoint; and identifying configuration information specified by the request, wherein an identifier of the machine learning model and a resource locator of the API endpoint are specified by the configuration information.
2. The method of claim 1, wherein the API endpoint is further configured to: receive a first request of the client device, the first request comprising first input data, provide the first input data as input for the loaded instance of the machine learning model, obtain first output data of the loaded instance of the machine learning model, and cause a first response comprising an indication of the first output data of the machine learning model to be sent to the client device.
3. The method of claim 2, further comprising: identifying an audit record that is associated with the API endpoint; and recording audit information at the audit record, wherein the audit information comprises one or more of the first input data of the first request, the first output data of the first response, or contextual information with respect to the first request or first response.
4. The method of claim 3, further comprising: performing one or more operations using the audit information of the audit record, the one or more operations comprising a validation operation to validate the first output data obtained from the loaded instance of the machine learning model at the respective virtual machine of the one or more virtual machines against second output data obtained from another loaded instance of the machine learning model at another respective virtual machine, the second output data obtained by applying the first input data as input to the other loaded instance of the machine learning model.
5. The method of claim 4, wherein performing the one or more operations using the audit information of the audit record further comprises: performing a data processing operation on the audit information to generate an audit data output; and providing a graphical user interface (GUI) to the client device that presents a graphical representation of the audit data output.
6. The method of claim 1, further comprising: receiving, from the client device, an authentication request comprising authentication credentials corresponding to an account; authenticating the account based on the authentication credentials; and generating an access token based on the authentication, wherein the access token to allow the client device to access the API endpoint.
7. The method of claim 1, wherein the API endpoint is further configured to receive the input data for the loaded instance of the machine learning model from the client device via an HTTP request, wherein the API endpoint is further configured to return output data produced by the loaded instance of the machine learning model based on the input data via an HTTP response.
8. The method of claim 1, wherein the configuration information further specifies quality of service parameters, the method further comprising: monitoring quality metrics indicative of the quality of service parameters specified by the configuration information subsequent to configuring the API endpoint; determining that one or more of the quality metrics satisfy a threshold; and responsive to determining that the one or more of the quality metrics satisfy the threshold, adjusting a number of the one or more virtual machines executing at an application platform and associated with the API endpoint.
9. A method, comprising: accessing a machine learning model execution environment for a machine learning model at a virtual machine; determining whether the machine learning model is associated with a dataset that is to be preloaded for use by the machine learning model execution environment during run-time; in response to determining that the machine learning model is associated with the dataset that is to be preloaded, preloading the dataset that is associated with the machine learning model that is accessible by the virtual machine; and associating the machine learning model with an application programming interface (API) endpoint, wherein the API endpoint is configured to receive input data provided by a client device for the machine learning model, the received input data configured to be aggregated with data of the preloaded dataset and provided as aggregated input data for the machine learning model to obtain output data of the machine learning model.
10. The method of claim 9, further comprising: instantiating the virtual machine at an application platform.
11. The method of claim 9, further comprising: receiving a request by the client device to configure the API endpoint; and identifying configuration information specified by the request and stored at an application platform, wherein the configuration information comprises one or more of an identifier of the machine learning model, an address of the API endpoint, an identifier of the preloaded dataset, or instructions to preload the preloaded dataset into the memory.
12. The method of claim 9, wherein the API endpoint is to: receive, from the client device, a first request comprising first input data that is to be combined with the data of the preloaded dataset to generate the aggregated input data and be applied as input to the machine learning model, obtain from the machine learning model first output data based on the aggregated input data, and cause a first response comprising an indication of the output data of the machine learning model to be sent to the client device.
13. The method of claim 12, wherein the first request comprises a data identifier associated with the data of the preloaded dataset.
14. The method of claim 13, wherein the data identifier comprises a user identifier of a user of the client device, and wherein the data of the preloaded dataset that is associated with the data identifier comprises user information associated with the user of the client device.
15. The method of claim 12, further comprising: generating the preloaded dataset based on a threshold number of recent requests to the API endpoint by the client device.
16. A system comprising: a memory; and a processing device, coupled to the memory to: instantiate, at each virtual machine of one or more virtual machines, a machine learning model execution environment for an instance of a machine learning model; load a respective instance of the machine learning model to each machine learning model execution environment; associate each loaded instance of the machine learning model with an application programming interface (API) endpoint, the API endpoint to receive input data for the loaded instance of the machine learning model from a client device and to return output data produced by the loaded instance of the machine learning model based on the input data; rec
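As an illustration of the preloading flow in claims 9 and 12 to 14, the sketch below keeps a preloaded dataset in memory and combines the matching record with each request's input before calling the model. It is a simplified reading under stated assumptions, not the claimed implementation: `PreloadingEndpoint`, the `data_id` and `input` request fields, and the dictionary merge used as the "aggregation" step are all hypothetical.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical: the model takes one dict of aggregated features.
Model = Callable[[Dict[str, Any]], Any]


class PreloadingEndpoint:
    """Endpoint that aggregates request input with preloaded data."""

    def __init__(self, model: Model,
                 preloaded: Optional[Dict[str, Dict[str, Any]]] = None):
        self.model = model
        # Dataset preloaded before run-time, keyed by a data identifier
        # (for example a user identifier, as in claim 14).
        self.preloaded = preloaded or {}

    def handle(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Look up preloaded data by identifier, merge it with the
        request input, and return the model output."""
        stored = self.preloaded.get(request["data_id"], {})
        # Assumption: aggregation is a dict merge in which fields sent
        # with the request override preloaded fields of the same name.
        aggregated = {**stored, **request["input"]}
        return {"output": self.model(aggregated)}


# Toy model: adds two features, one preloaded and one sent by the client.
toy_model = lambda features: features["age"] + features["amount"]

preload_endpoint = PreloadingEndpoint(
    toy_model, preloaded={"user-1": {"age": 30}}
)
result = preload_endpoint.handle(
    {"data_id": "user-1", "input": {"amount": 12}}
)
```

Here the client sends only `amount`; the `age` field comes from the preloaded record selected by the data identifier, so the client does not need to resend stored user information with every request.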
Machine learning · CPC title
Remote procedure calls [RPC]; Web services · CPC title
Protocols for remote procedure calls [RPC] · CPC title
Starting, stopping, suspending or resuming virtual machine instances · CPC title
Monitoring or debugging support · CPC title