Offload multi-dependent machine learning inferences from a central processing unit

US2025139423A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2025139423-A1
Application number: US-202318500071-A
Country: US
Kind code: A1
Filing date: Nov 1, 2023
Priority date: Nov 1, 2023
Publication date: May 1, 2025
Grant date:

Abstract

Official abstract text for this publication.

An information handling system includes a central processing unit, a neural processing unit, and an offload module. The offload module receives an inference container including multiple inference models and metadata associated with the inference models. Based on the metadata, the offload module determines whether a quality of service for the inference models may be met by the neural processing unit. In response to the quality of service being met in the neural processing unit, the neural processing unit executes the inference models. In response to the quality of service not being met in the neural processing unit, the central processing unit executes the inference models.
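The decision described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `QoS`, `InferenceContainer`, `NpuEstimate`, and `choose_target` are hypothetical, and the comparison rule (estimated latency and power checked against the metadata limits) is an assumption. The publication only states that the quality of service indicates a time interval, a power performance level, and a latency.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Target(Enum):
    NPU = "npu"
    CPU = "cpu"


@dataclass(frozen=True)
class QoS:
    # Fields mirror the three QoS items named in the claims; units are assumed.
    interval_ms: float      # time interval in which the models must run
    max_latency_ms: float   # latency limit
    max_power_w: float      # power performance level, modeled as a power cap


@dataclass(frozen=True)
class InferenceContainer:
    models: Tuple[str, ...]  # identifiers of the bundled inference models
    qos: QoS                 # metadata associated with the models


@dataclass(frozen=True)
class NpuEstimate:
    # Hypothetical estimate of how the NPU would perform on this container.
    latency_ms: float
    power_w: float


def choose_target(container: InferenceContainer, estimate: NpuEstimate) -> Target:
    """Return NPU if the assumed QoS check passes, otherwise fall back to CPU."""
    qos = container.qos
    qos_met = (
        estimate.latency_ms <= qos.max_latency_ms
        and estimate.latency_ms <= qos.interval_ms
        and estimate.power_w <= qos.max_power_w
    )
    return Target.NPU if qos_met else Target.CPU
```

For example, a container with a 20 ms latency limit would go to the NPU when the estimate is 12 ms and to the CPU when the estimate is 35 ms, provided the power estimate stays within its cap.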

First claim

Opening claim text (preview).

What is claimed is:

1. An information handling system comprising: a central processing unit; a neural processing unit; and an offload module to communicate with the central processing unit and with the neural processing unit, the offload module to: receive an inference container including multiple inference models and metadata associated with the inference models; and based on the metadata, determine whether a quality of service for the inference models may be met by the neural processing unit; in response to the quality of service being met by the neural processing unit, the neural processing unit to execute the inference models; and in response to the quality of service not being met by the neural processing unit, the central processing unit to execute the inference models.

2. The information handling system of claim 1, wherein the information handling system further comprises a scheduler in communication with the offload module, wherein prior to the execution of the inference models in the neural processing unit, the scheduler to: receive a schedule neural processing unit request from the offload module; and in response to the schedule neural processing unit request, schedule the inference models for execution in the neural processing unit.

3. The information handling system of claim 1, wherein the information handling system further comprises a scheduler in communication with the offload module, wherein prior to the execution of the inference models in the central processing unit, the scheduler to: receive a schedule central processing unit request from the offload module; and in response to the schedule central processing unit request, schedule the inference models for execution in the central processing unit.

4. The information handling system of claim 1, further comprising a memory to store telemetry data associated with the information handling system.

5. The information handling system of claim 4, wherein the execution of the inference models by the neural processing unit includes the neural processing unit to provide the telemetry data as an input to the inference models.

6. The information handling system of claim 1, wherein the offload module is an extension to an operating system of the information handling system.

7. The information handling system of claim 1, the execution of the inference models in the neural processing unit enables the central processing unit to perform other operations.

8. The information handling system of claim 1, wherein the quality of service indicates a particular time interval for execution of the inference models, a power performance level, and a latency.

9. A method comprising: receiving, by an offload module of an information handling system, an inference container including multiple inference models and metadata associated with the inference models; based on the metadata, determining whether a quality of service for the inference models may be met by a neural processing unit of the information handling system; in response to the quality of service being met in the neural processing unit, the neural processing unit to execute the inference models; and in response to the quality of service not being met in the neural processing unit, a central processing unit or a graphics processing unit of the information handling system to execute the inference models.

10. The method of claim 9, wherein prior to the executing of the inference models in the neural processing unit, the method further comprises: receiving, by a scheduler of the information handling system, a schedule neural processing unit request from the offload module; and in response to the schedule neural processing unit request, scheduling the inference models for execution in the neural processing unit.

11. The method of claim 9, wherein prior to the executing of the inference models in the neural processing unit, the method further comprises: receiving, by a scheduler of the information handling system, a schedule central processing unit request from the offload module; and in response to the schedule central processing unit request, scheduling the inference models for execution in the central processing unit.

12. The method of claim 9, further comprising storing, in a memory of the information handling system, telemetry data associated with the information handling system.

13. The method of claim 12, wherein the executing of the inference models by the neural processing unit, the method further comprises: providing, by the neural processing unit, the telemetry data as an input to the inference models.

14. The method of claim 9, wherein the offload module is an extension to an operating system of the information handling system.

15. The method of claim 9, further comprising: based on the execution of the inference models in the neural processing unit, enabling the central processing unit to perform other operations.

16. The method of claim 9, wherein the quality of service indicates a particular time interval for execution of the inference models, a power performance level, and a latency.

17. A method comprising: receiving, by an offload module of an information handling system, an inference container including multiple inference models and metadata associated with the inference models; based on the metadata, determining whether a quality of service for the inference models may be met by a neural processing unit of the information handling system; if the quality of service is met in the neural processing unit, then: providing a schedule neural processing unit request to a scheduler of the information handling system; scheduling, by the scheduler, the inference models for execution in the neural processing unit to execute the inference models; and executing, by the neural processing unit, the inference models; and if the quality of service is not met in the neural processing unit, then: providing a schedule central processing unit request to a scheduler of the information handling system; scheduling, by the scheduler, the inference models for execution in the central processing unit to execute the inference models; and executing, by the central processing unit, the inference models.

18. The method of claim 17, wherein the offload module is an extension to an operating system of the information handling system.

19. The method of claim 17, wherein the executing of the inference models by the neural processing unit, the method further comprises providing, by the neural processing unit, the telemetry data as an input to the inference models.

20. The method of claim 17, further comprising based on the execution of the inference models in the neural processing unit, enabling the central processing unit to perform other operations.
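The scheduling flow of claim 17 (offload module issues a request, the scheduler places the models, the chosen unit executes them) might look like the sketch below. Everything named here is hypothetical: `Scheduler`, `run_container`, the request strings, and the telemetry keys are illustrative assumptions, and the models are stand-in Python callables rather than real NPU workloads. The sketch also feeds telemetry to the models on both paths, whereas the claims only recite telemetry input for execution by the neural processing unit.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Telemetry = Dict[str, int]
Model = Callable[[Telemetry], int]

# Hypothetical request names standing in for the claimed
# "schedule neural processing unit request" and
# "schedule central processing unit request".
SCHEDULE_NPU = "schedule_npu_request"
SCHEDULE_CPU = "schedule_cpu_request"


class Scheduler:
    """Records which processing unit each batch of models was scheduled on."""

    _UNITS = {SCHEDULE_NPU: "npu", SCHEDULE_CPU: "cpu"}

    def __init__(self) -> None:
        self.history: List[Tuple[str, int]] = []

    def schedule(self, request: str, models: Sequence[Model]) -> str:
        unit = self._UNITS[request]  # raises KeyError on an unknown request
        self.history.append((unit, len(models)))
        return unit


def run_container(
    models: Sequence[Model],
    qos_met_by_npu: bool,
    scheduler: Scheduler,
    telemetry: Telemetry,
) -> Tuple[str, List[int]]:
    """Pick the request from the QoS result, schedule, then execute the models."""
    request = SCHEDULE_NPU if qos_met_by_npu else SCHEDULE_CPU
    unit = scheduler.schedule(request, models)
    # Execution is simulated in-process; telemetry is passed as model input.
    outputs = [model(telemetry) for model in models]
    return unit, outputs
```

Keeping the scheduler separate from the QoS decision mirrors the claim structure, where the offload module only issues the request and the scheduler performs the placement.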

Assignees

Dell Products L.P.

Classifications

  • G06N3/063 (Primary)

    CPC title: using electronic means

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025139423A1 cover?
An information handling system includes a central processing unit, a neural processing unit, and an offload module. The offload module receives an inference container including multiple inference models and metadata associated with the inference models. Based on the metadata, the offload module determines whether a quality of service for the inference models may be met by the neural processing …
Who is the assignee on this patent?
Dell Products L.P.
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 1, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).