Intelligent management of machine learning inference in edge-cloud systems

US12580824B2 · US · B2

Patent metadata
Publication number: US-12580824-B2
Application number: US-202318541996-A
Country: US
Kind code: B2
Filing date: Dec 15, 2023
Priority date: Dec 15, 2023
Publication date: Mar 17, 2026
Grant date: Mar 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method relate to managing a cloud computing system. Queries are received from one or more edge devices of a set of edge devices. Each query includes sensor data from the respective edge device. Prediction data is generated via one or more cloud machine learning models, using the sensor data, during a current time period. System state data is generated and indicates a current state of an environment during the current time period. The environment is defined by the cloud computing system and the set of edge devices. A machine learning system generates policy data by optimizing an expected return of a reward with respect to taking a particular action given the system state data. The machine learning system is employed by the cloud computing system. The policy data indicates a recommended action from a set of actions. The cloud computing system performs the recommended action.
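
The decision step in the abstract (build system state, score each action, perform the recommended one) can be illustrated with a minimal sketch. This is not the patented implementation: the `SystemState` fields are a simplified subset of the state listed in claim 3, the action names follow claim 4, and `toy_q_values` is a hypothetical hand-written stand-in for a learned value function.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Action(Enum):
    # Action names loosely follow claim 4; the identifiers are illustrative.
    ALLOCATE = "allocate"
    DEALLOCATE = "deallocate"
    MAINTAIN = "maintain"
    MODIFY_THRESHOLD = "modify_threshold"


@dataclass(frozen=True)
class SystemState:
    # Simplified, assumed subset of the state data described in claim 3.
    num_devices: int
    num_queries: int
    resources_in_use: int
    mean_latency_ms: float
    cost: float
    query_threshold: float


QFunction = Callable[[SystemState], Dict[Action, float]]


def recommend_action(state: SystemState, q_values: QFunction) -> Action:
    """Return the action with the highest estimated value for this state."""
    scores = q_values(state)
    return max(scores, key=scores.get)


def toy_q_values(state: SystemState) -> Dict[Action, float]:
    # Hypothetical scoring rule standing in for a trained model.
    # The 100 ms cutoff is an arbitrary assumption for the example.
    overloaded = state.mean_latency_ms > 100.0
    idle = state.num_queries == 0
    return {
        Action.ALLOCATE: 1.0 if overloaded else 0.0,
        Action.DEALLOCATE: 1.0 if idle else 0.0,
        Action.MAINTAIN: 0.5,
        Action.MODIFY_THRESHOLD: 0.2,
    }


if __name__ == "__main__":
    busy = SystemState(10, 200, 2, 250.0, 3.0, 0.5)
    print(recommend_action(busy, toy_q_values))
```

In a real system the scoring function would be learned from experience (claim 2 describes a deep neural network approximating Q-values); the selection step itself stays a simple maximum over the action set.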

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for managing a cloud computing system, the computer-implemented method comprising: receiving queries from one or more edge devices of a set of edge devices that are connected to the cloud computing system, each query including sensor data from the respective edge device; generating prediction data, via one or more machine learning models, using the sensor data, the one or more machine learning models being employed by the cloud computing system during a current time period; generating system state data indicating a current state of an environment during the current time period, the environment being defined by the cloud computing system and the set of edge devices; generating, via a reinforcement learning (RL) agent, policy data by optimizing an expected return of a reward with respect to taking a particular action given the system state data, the policy data indicating a recommended action from a set of actions; and performing the recommended action, wherein the cloud computing system employs the RL agent as a machine learning system to control an offload tendency of transmitting a respective query of a machine learning inference task from one or more of the edge devices to the cloud computing system via the policy data.

2. The computer-implemented method of claim 1, wherein: the machine learning system comprises a deep neural network (DNN) that is configured to optimize the expected return by approximating a Q-value for each action of the set of actions; and the recommended action returns a particular action having the best Q-value from among the set of actions.

3. The computer-implemented method of claim 1, wherein the system state data is generated for the current time period based on a number of the edge devices being serviced by the cloud computing system; a change in the number of edge devices between a previous time period and the current time period, an amount of computer resources that are being used to process the queries; a number of the queries being processed by the cloud computing system; latency data regarding the queries being processed by the cloud computing system; cost data of the cloud computing system incurred by processing the queries from the previous time period to the current time period; and query threshold data being used by each edge device to determine whether or not to generate a respective query for transmission to the cloud computing system.

4. The computer-implemented method of claim 1, wherein the set of actions comprises allocating one or more computer resources of the cloud computing system, deallocating the one or more computer resources, maintaining the one or more computer resources, and modifying a query threshold of the cloud computing system.

5. The computer-implemented method of claim 1, further comprising: generating query threshold data of the cloud computing system for the current time period, the query threshold data being used by each edge device to determine whether or not to generate a respective query for transmission to the cloud computing system; and transmitting the query threshold data to each edge device of the set of edge devices.

6. The computer-implemented method of claim 1, wherein the reward is computed such that a predetermined cost budget is not exceeded by cost data associated with processing the queries on the cloud computing system and additional cost data associated with performing the particular action.

7. The computer-implemented method of claim 1, wherein the reward is computed such that a predetermined latency target is not exceeded by latency data associated with processing the queries.

8. A system comprising: one or more processors; and one or more memory in data communication with the one or more processors, the one or more memory including computer readable data stored thereon that, when executed by the one or more processors, causes the one or more processors to perform a method for managing a cloud computing system, the method including receiving queries from one or more edge devices of a set of edge devices that are connected to the cloud computing system, each query including sensor data from the respective edge device; generating prediction data, via one or more machine learning models, using the sensor data, the one or more machine learning models being employed by the cloud computing system during a current time period; generating system state data indicating a current state of an environment during the current time period, the environment being defined by the cloud computing system and the set of edge devices; generating, via a reinforcement learning (RL) agent, policy data by optimizing an expected return of a reward with respect to taking a particular action given the system state data, the policy data indicating a recommended action from a set of actions; and performing the recommended action, wherein the cloud computing system employs the RL agent as a machine learning system to control an offload tendency of transmitting a respective query of a machine learning inference task from one or more of the edge devices to the cloud computing system via the policy data.

9. The system of claim 8, wherein: the machine learning system comprises a deep neural network (DNN) that is configured to optimize the expected return by approximating a Q-value for each action of the set of actions; and the recommended action returns a particular action having the best Q-value from among the set of actions.

10. The system of claim 8, wherein the system state data is generated for the current time period based on a number of the edge devices being serviced by the cloud computing system; a change in the number of edge devices between a previous time period and the current time period, an amount of computer resources that are being used to process the queries; a number of the queries being processed by the cloud computing system; latency data regarding the queries being processed by the cloud computing system; cost data of the cloud computing system incurred by processing the queries from the previous time period to the current time period; and query threshold data being used by each edge device to determine whether or not to generate a respective query for transmission to the cloud computing system.

11. The system of claim 8, wherein the set of actions comprises allocating one or more computer resources of the cloud computing system, deallocating the one or more computer resources, maintaining the one or more computer resources, and modifying a query threshold of the cloud computing system.

12. The system of claim 8, wherein the method further comprises: generating query threshold data of the cloud computing system for the current time period, the query threshold data being used by each edge device to determine whether or not to generate a respective query for transmission to the cloud computing system; and transmitting the query threshold data to each edge device of the set of edge devices.

13. The system of claim 8, wherein the reward is computed such that a predetermined cost budget is not exceeded by cost data associated with processing the queries on the cloud computing system and additional cost data associated with performing the particular action.

14. The system of claim 8, wherein the reward is computed such that a predetermined latency target is not exceeded by latency data associated with processing the queries.

15. One or more non-transitory computer-readable media
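
The claims above describe a query threshold that each edge device uses to decide whether to send a query, and a reward tied to a cost budget and a latency target. The sketch below shows one way these pieces might look. The claims do not specify the comparison rule or the reward formula, so everything here is an assumption: the edge device is assumed to offload when its local model confidence falls below the threshold, and the reward is assumed to be the number of served queries minus a fixed penalty for each violated constraint. The names `should_offload`, `reward`, and `penalty` are hypothetical.

```python
def should_offload(local_confidence: float, query_threshold: float) -> bool:
    """Edge-side decision: send a query to the cloud only when the local
    prediction is less confident than the threshold (assumed rule)."""
    return local_confidence < query_threshold


def reward(
    queries_served: int,
    processing_cost: float,
    action_cost: float,
    latency_ms: float,
    cost_budget: float,
    latency_target_ms: float,
    penalty: float = 10.0,
) -> float:
    """Assumed reward shape: credit for served queries, with a fixed
    penalty when the cost budget or the latency target is exceeded."""
    value = float(queries_served)
    # Total cost combines query processing and the cost of the action itself.
    if processing_cost + action_cost > cost_budget:
        value -= penalty
    if latency_ms > latency_target_ms:
        value -= penalty
    return value


if __name__ == "__main__":
    # A raised threshold makes the same device offload more often.
    print(should_offload(0.7, 0.6), should_offload(0.7, 0.8))
    print(reward(5, 1.0, 0.5, 50.0, cost_budget=2.0, latency_target_ms=100.0))
```

Under this assumed rule, raising the threshold increases the share of inferences sent to the cloud, which is one way the "modify a query threshold" action could steer the offload tendency named in claims 1 and 8.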

Assignees

Robert Bosch GmbH

Inventors

Classifications

  • for reduction of network costs (H04L41/0833 takes precedence) · CPC title

  • H04L41/16 (Primary)

    using machine learning or artificial intelligence · CPC title

  • Querying (for retrieval from the web G06F16/953) · CPC title

  • Machine learning · CPC title

  • G06N5/041 (Primary)

    Abduction · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12580824B2 cover?
A system and method relate to managing a cloud computing system. Queries are received from one or more edge devices of a set of edge devices. Each query includes sensor data from the respective edge device. Prediction data is generated via one or more cloud machine learning models, using the sensor data, during a current time period. System state data is generated and indicates a current state …
Who is the assignee on this patent?
Robert Bosch GmbH
What technology area does this patent fall under?
Primary CPC classification H04L41/0826. Mapped technology areas include Electricity.
When was this patent published?
Publication date Mar 17, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).