CPU tight-coupled accelerator

US12436808B2 · US · B2

Patent metadata
Publication number: US-12436808-B2
Application number: US-202318225041-A
Country: US
Kind code: B2
Filing date: Jul 21, 2023
Priority date: Jun 6, 2023
Publication date: Oct 7, 2025
Grant date: Oct 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An integrated circuit includes: a central processing unit (CPU) core; an accelerator; and an acceleration instruction queue connected to the CPU core and the accelerator. The CPU core is to: fetch and decode one or more instructions from among an instruction sequence in a programmed order; determine an instruction from among the one or more instructions containing an acceleration workload encoded therein; and queue the instruction containing the acceleration workload encoded therein in the acceleration instruction queue.
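The routing step described in the abstract can be pictured with a small software model. This is only an illustrative sketch, not the patented hardware: the `Instruction` class, the `dispatch` function, the type labels, and the instruction strings are all hypothetical names chosen for the example. The category split (tensor operations for the accelerator; scalar, vector, or memory workloads for the CPU) is taken from the first claim.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical type labels. The patent names the categories only,
# not any concrete instruction encoding.
ACCEL_TYPES = {"tensor"}
CPU_TYPES = {"scalar", "vector", "memory"}


@dataclass(frozen=True)
class Instruction:
    kind: str  # instruction type, e.g. "tensor" or "scalar"
    text: str  # placeholder for the decoded operation


def dispatch(program):
    """Walk instructions in programmed order and route each one by type."""
    accel_queue = deque()  # stands in for the acceleration instruction queue
    cpu_path = []          # stands in for the CPU data path
    for insn in program:   # "fetch and decode", in programmed order
        if insn.kind in ACCEL_TYPES:
            accel_queue.append(insn)   # queue for the accelerator
        elif insn.kind in CPU_TYPES:
            cpu_path.append(insn)      # dispatch to the CPU data path
        else:
            raise ValueError(f"unknown instruction type: {insn.kind}")
    return accel_queue, cpu_path


# Made-up instruction stream mixing both workload kinds.
program = [
    Instruction("scalar", "add r1, r2"),
    Instruction("tensor", "matmul t0, t1"),
    Instruction("memory", "load r3"),
    Instruction("tensor", "conv t2, t3"),
]
accel_queue, cpu_path = dispatch(program)
```

In this toy model both kinds of instruction come from one stream, and the only thing that decides where an instruction goes is its type, which mirrors the claim language about a single programmed order.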

First claim

Opening claim text (preview).

What is claimed is:

1. An integrated circuit comprising: a central processing unit (CPU) core; an accelerator; and an acceleration instruction queue connected to the CPU core and the accelerator, wherein the CPU core is configured to: fetch and decode one or more instructions from among an instruction sequence in a programmed order, the one or more instructions comprising an acceleration workload for the accelerator and a CPU workload for the CPU core; determine a first instruction from among the one or more instructions containing the acceleration workload encoded therein based on an instruction type of the first instruction indicating the acceleration workload; queue the first instruction containing the acceleration workload encoded therein in the acceleration instruction queue; determine a second instruction from among the one or more instructions containing the CPU workload therein based on an instruction type of the second instruction indicating the CPU workload; and dispatch the second instruction to a CPU data path for the CPU core, and wherein the instruction type indicating the acceleration workload comprises one or more tensor operations, and the instruction type indicating the CPU workload comprises at least one of a scalar workload, a vector workload, or a memory workload.

2. The integrated circuit of claim 1, wherein the accelerator is configured to: dequeue the first instruction containing the acceleration workload from the acceleration instruction queue; receive operands associated with the acceleration workload from scratch memory of the CPU core; and compute a result based on the operands and the dequeued first instruction.

3. The integrated circuit of claim 2, wherein the accelerator is configured to dequeue instructions from the acceleration instruction queue in a first-in-first-out method.

4. The integrated circuit of claim 2, wherein the accelerator is further configured to store the result in embedded memory of the accelerator.

5. The integrated circuit of claim 4, wherein the CPU core, the accelerator, the scratch memory, and the embedded memory are integrated on the same chip as each other.

6. The integrated circuit of claim 4, wherein the CPU core is configured to retrieve the result from the embedded memory of the accelerator, and store the result in the scratch memory of the CPU core.

7. The integrated circuit of claim 1, wherein the accelerator instruction queue comprises a plurality of instruction queues defining different priorities from each other for the accelerator.

8. A computing system comprising: an accelerator; one or more processors integrated with the accelerator in the same integrated circuit; and memory comprising instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to: identify a programmed order for executing one or more CPU instructions, the one or more CPU instructions comprising an acceleration workload for the accelerator and a CPU workload from the one or more processors; and execute the one or more CPU instructions according to the programmed order, wherein to execute the one or more CPU instructions, the instructions cause the one or more processors to: fetch and decode a first instruction in the programmed order from among the one or more CPU instructions; dispatch the decoded first instruction to an accelerator data path from among a CPU pipeline based on an instruction type of the first instruction indicating the acceleration workload; fetch and decode a second instruction in the programmed order from among the one or more CPU instructions; and dispatch the decoded second instruction to a CPU data path from among the CPU pipeline based on an instruction type of the second instruction indicating the CPU workload, and wherein the instruction type indicating the acceleration workload comprises one or more tensor operations, and the instruction type indicating the CPU workload comprises at least one of a scalar workload, a vector workload, or a memory workload.

9. The computing system of claim 8, wherein the first instruction comprises an accelerator workload encoded therein to be dispatched to the accelerator data path, and wherein the instructions further cause the one or more processors to: enqueue the first instruction in an acceleration instruction queue; and provide corresponding operands to the accelerator for compute based on the first instruction.

10. The computing system of claim 9, wherein the accelerator is configured to: dequeue the first instruction from the acceleration instruction queue; compute a result based on the corresponding operands and the first instruction dequeued from the acceleration instruction queue; and store the result in embedded memory of the accelerator.

11. The computing system of claim 10, wherein the accelerator is configured to dequeue instructions from the acceleration instruction queue in a first-in-first-out method.

12. The computing system of claim 10, wherein the instructions further cause the one or more processors to: retrieve the result from the embedded memory of the accelerator; and store the result in scratch memory.

13. The computing system of claim 12, wherein the accelerator, the one or more processors, the embedded memory, and the scratch memory are integrated in the same integrated circuit.

14. A method for accelerating instructions, comprising: identifying, by one or more processors, a programmed order for executing one or more instructions, the one or more instructions comprising an acceleration workload and a CPU workload; determining, by the one or more processors, the acceleration workload encoded in a first instruction of the one or more instructions in the programmed order based on an instruction type of the first instruction indicating the acceleration workload; dispatching, by the one or more processors, the first instruction to an accelerator data path from among a plurality of data paths of a CPU pipeline based on the determining that the acceleration workload is encoded in the first instruction; determining, by the one or more processors, the CPU workload encoded in a second instruction of the one or more instructions in the programmed order based on an instruction type of the second instruction indicating the CPU workload; and dispatching, by the one or more processors, the second instruction to a CPU data path from among the plurality of data paths of the CPU pipeline based on the determining that the CPU workload is encoded in the second instruction, wherein the instruction type indicating the acceleration workload comprises one or more tensor operations, and the instruction type indicating the CPU workload comprises at least one of a scalar workload, a vector workload, or a memory workload.

15. The method of claim 14, wherein the dispatching of the first instruction comprises: enqueueing, by the one or more processors, the first instruction in an acceleration instruction queue; and providing, by the one or more processors, corresponding operands to the accelerator data path for compute based on the first instruction.

16. The method of claim 15, wherein the accelerator data path comprises an accelerator integrated with the one or more processors in the same integrated circuit, and the method further comprises: dequeuing, by the accelerator, the first instruction from the acceleration instruction queue; computing, by the accelerator, a result based on the corresponding operands and the first instruction dequeued from the acceleration instruction qu
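The accelerator side of the dependent claims (first-in-first-out dequeue, operands read from the CPU core's scratch memory, results held in the accelerator's embedded memory until the CPU retrieves them) can be sketched in software as follows. This is a toy model under stated assumptions: the dictionaries standing in for the two memories, the `(op, dst, srcs)` tuple layout, and the `dot` and `scale` operations are all invented for illustration and do not come from the patent.

```python
from collections import deque


def run_accelerator(queue, scratch, embedded):
    """Drain the queue first-in-first-out, computing one result per entry."""
    while queue:
        op, dst, srcs = queue.popleft()              # FIFO dequeue
        operands = [scratch[name] for name in srcs]  # read from CPU scratch memory
        if op == "dot":        # hypothetical toy operation
            a, b = operands
            result = sum(x * y for x, y in zip(a, b))
        elif op == "scale":    # hypothetical toy operation
            vec, factor = operands
            result = [x * factor for x in vec]
        else:
            raise ValueError(f"unsupported operation: {op}")
        embedded[dst] = result  # result is held in accelerator embedded memory


def retrieve(embedded, scratch, name):
    """CPU side: copy one result from embedded memory into scratch memory."""
    scratch[name] = embedded[name]


# Made-up memory contents and queued work.
scratch = {"a": [1, 2, 3], "b": [4, 5, 6], "k": 2}
embedded = {}
queue = deque([("dot", "r0", ("a", "b")), ("scale", "r1", ("a", "k"))])

run_accelerator(queue, scratch, embedded)
retrieve(embedded, scratch, "r0")
```

Here a result only appears in scratch memory after an explicit `retrieve` call, which models the separate retrieval step in claims 6 and 12. The plurality of prioritized queues in claim 7 is not modeled.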

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

  • using electronic means

  • Access to shared memory

  • System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package

  • Encoding

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12436808B2 cover?
An integrated circuit includes: a central processing unit (CPU) core; an accelerator; and an acceleration instruction queue connected to the CPU core and the accelerator. The CPU core is to: fetch and decode one or more instructions from among an instruction sequence in a programmed order; determine an instruction from among the one or more instructions containing an acceleration workload encod…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/5027. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 7, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).