What technology area does this patent fall under?

Primary CPC classification G06F9/5027. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 7, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

CPU tight-coupled accelerator

US12436808B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12436808-B2
Application number	US-202318225041-A
Country	US
Kind code	B2
Filing date	Jul 21, 2023
Priority date	Jun 6, 2023
Publication date	Oct 7, 2025
Grant date	Oct 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An integrated circuit includes: a central processing unit (CPU) core; an accelerator; and an acceleration instruction queue connected to the CPU core and the accelerator. The CPU core is to: fetch and decode one or more instructions from among an instruction sequence in a programmed order; determine an instruction from among the one or more instructions containing an acceleration workload encoded therein; and queue the instruction containing the acceleration workload encoded therein in the acceleration instruction queue.
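For readers who find code easier to follow than claim language, the dispatch flow in the abstract can be modeled in software. The sketch below is a toy Python model, not the patented hardware design. The `Instruction` class, the opcode names, and the `TENSOR_OPS` set are hypothetical names invented for this example; the patent says only that acceleration workloads are recognized by instruction type. The CPU data path branch follows claim 1 rather than the abstract.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical opcode set for illustration only. The patent states that the
# acceleration instruction type comprises tensor operations; it does not name opcodes.
TENSOR_OPS = {"matmul", "conv2d"}


@dataclass(frozen=True)
class Instruction:
    opcode: str
    operands: tuple = ()


def dispatch(program):
    """Walk the program in order and split it by instruction type.

    Returns (acceleration instruction queue, CPU data path list).
    """
    accel_queue = deque()
    cpu_path = []
    for instr in program:  # fetch and decode in the programmed order
        if instr.opcode in TENSOR_OPS:
            accel_queue.append(instr)  # queue for the accelerator
        else:
            cpu_path.append(instr)  # scalar, vector, or memory work stays on the CPU
    return accel_queue, cpu_path


program = [
    Instruction("add", (1, 2)),
    Instruction("matmul", ("A", "B")),
    Instruction("load", ("addr",)),
    Instruction("conv2d", ("X", "K")),
]
accel_queue, cpu_path = dispatch(program)
print([i.opcode for i in accel_queue])  # ['matmul', 'conv2d']
print([i.opcode for i in cpu_path])     # ['add', 'load']
```

Both output lists keep the relative order of the original program, which mirrors the "programmed order" wording in the abstract.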

First claim

Opening claim text (preview).

What is claimed is:

1. An integrated circuit comprising: a central processing unit (CPU) core; an accelerator; and an acceleration instruction queue connected to the CPU core and the accelerator, wherein the CPU core is configured to: fetch and decode one or more instructions from among an instruction sequence in a programmed order, the one or more instructions comprising an acceleration workload for the accelerator and a CPU workload for the CPU core; determine a first instruction from among the one or more instructions containing the acceleration workload encoded therein based on an instruction type of the first instruction indicating the acceleration workload; queue the first instruction containing the acceleration workload encoded therein in the acceleration instruction queue; determine a second instruction from among the one or more instructions containing the CPU workload therein based on an instruction type of the second instruction indicating the CPU workload; and dispatch the second instruction to a CPU data path for the CPU core, and wherein the instruction type indicating the acceleration workload comprises one or more tensor operations, and the instruction type indicating the CPU workload comprises at least one of a scalar workload, a vector workload, or a memory workload.

2. The integrated circuit of claim 1, wherein the accelerator is configured to: dequeue the first instruction containing the acceleration workload from the acceleration instruction queue; receive operands associated with the acceleration workload from scratch memory of the CPU core; and compute a result based on the operands and the dequeued first instruction.

3. The integrated circuit of claim 2, wherein the accelerator is configured to dequeue instructions from the acceleration instruction queue in a first-in-first-out method.

4. The integrated circuit of claim 2, wherein the accelerator is further configured to store the result in embedded memory of the accelerator.

5. The integrated circuit of claim 4, wherein the CPU core, the accelerator, the scratch memory, and the embedded memory are integrated on the same chip as each other.

6. The integrated circuit of claim 4, wherein the CPU core is configured to retrieve the result from the embedded memory of the accelerator, and store the result in the scratch memory of the CPU core.

7. The integrated circuit of claim 1, wherein the accelerator instruction queue comprises a plurality of instruction queues defining different priorities from each other for the accelerator.

8. A computing system comprising: an accelerator; one or more processors integrated with the accelerator in the same integrated circuit; and memory comprising instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to: identify a programmed order for executing one or more CPU instructions, the one or more CPU instructions comprising an acceleration workload for the accelerator and a CPU workload from the one or more processors; and execute the one or more CPU instructions according to the programmed order, wherein to execute the one or more CPU instructions, the instructions cause the one or more processors to: fetch and decode a first instruction in the programmed order from among the one or more CPU instructions; dispatch the decoded first instruction to an accelerator data path from among a CPU pipeline based on an instruction type of the first instruction indicating the acceleration workload; fetch and decode a second instruction in the programmed order from among the one or more CPU instructions; and dispatch the decoded second instruction to a CPU data path from among the CPU pipeline based on an instruction type of the second instruction indicating the CPU workload, and wherein the instruction type indicating the acceleration workload comprises one or more tensor operations, and the instruction type indicating the CPU workload comprises at least one of a scalar workload, a vector workload, or a memory workload.

9. The computing system of claim 8, wherein the first instruction comprises an accelerator workload encoded therein to be dispatched to the accelerator data path, and wherein the instructions further cause the one or more processors to: enqueue the first instruction in an acceleration instruction queue; and provide corresponding operands to the accelerator for compute based on the first instruction.

10. The computing system of claim 9, wherein the accelerator is configured to: dequeue the first instruction from the acceleration instruction queue; compute a result based on the corresponding operands and the first instruction dequeued from the acceleration instruction queue; and store the result in embedded memory of the accelerator.

11. The computing system of claim 10, wherein the accelerator is configured to dequeue instructions from the acceleration instruction queue in a first-in-first-out method.

12. The computing system of claim 10, wherein the instructions further cause the one or more processors to: retrieve the result from the embedded memory of the accelerator; and store the result in scratch memory.

13. The computing system of claim 12, wherein the accelerator, the one or more processors, the embedded memory, and the scratch memory are integrated in the same integrated circuit.

14. A method for accelerating instructions, comprising: identifying, by one or more processors, a programmed order for executing one or more instructions, the one or more instructions comprising an acceleration workload and a CPU workload; determining, by the one or more processors, the acceleration workload encoded in a first instruction of the one or more instructions in the programmed order based on an instruction type of the first instruction indicating the acceleration workload; dispatching, by the one or more processors, the first instruction to an accelerator data path from among a plurality of data paths of a CPU pipeline based on the determining that the acceleration workload is encoded in the first instruction; determining, by the one or more processors, the CPU workload encoded in a second instruction of the one or more instructions in the programmed order based on an instruction type of the second instruction indicating the CPU workload; and dispatching, by the one or more processors, the second instruction to a CPU data path from among the plurality of data paths of the CPU pipeline based on the determining that the CPU workload is encoded in the second instruction, wherein the instruction type indicating the acceleration workload comprises one or more tensor operations, and the instruction type indicating the CPU workload comprises at least one of a scalar workload, a vector workload, or a memory workload.

15. The method of claim 14, wherein the dispatching of the first instruction comprises: enqueueing, by the one or more processors, the first instruction in an acceleration instruction queue; and providing, by the one or more processors, corresponding operands to the accelerator data path for compute based on the first instruction.

16. The method of claim 15, wherein the accelerator data path comprises an accelerator integrated with the one or more processors in the same integrated circuit, and the method further comprises: dequeuing, by the accelerator, the first instruction from the acceleration instruction queue; computing, by the accelerator, a result based on the corresponding operands and the first instruction dequeued from the acceleration instruction qu
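Claims 2 to 6 describe the accelerator side of the flow: dequeue in first-in-first-out order, read operands from the CPU core's scratch memory, compute, store the result in the accelerator's embedded memory, and let the CPU copy it back. The following is a minimal software sketch of that sequence under stated assumptions, not the claimed circuit. The `Accelerator` class, the tuple instruction layout `(opcode, dst, src_a, src_b)`, and the use of dictionaries for both memories are all hypothetical choices made for illustration.

```python
from collections import deque


def matmul(a, b):
    """Plain-Python matrix multiply standing in for a tensor operation."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class Accelerator:
    """Toy model of the accelerator described in claims 2 to 4."""

    def __init__(self, queue, scratch):
        self.queue = queue      # acceleration instruction queue shared with the CPU
        self.scratch = scratch  # CPU scratch memory that holds the operands
        self.embedded = {}      # accelerator's embedded memory for results

    def step(self):
        """Dequeue the oldest instruction (FIFO), compute, and store the result.

        Returns False when the queue is empty, True otherwise.
        """
        if not self.queue:
            return False
        opcode, dst, src_a, src_b = self.queue.popleft()
        if opcode != "matmul":
            raise ValueError(f"unsupported opcode: {opcode}")
        self.embedded[dst] = matmul(self.scratch[src_a], self.scratch[src_b])
        return True


scratch = {"A": [[1, 2], [3, 4]], "B": [[5, 6], [7, 8]]}
queue = deque([("matmul", "C", "A", "B")])
accel = Accelerator(queue, scratch)

while accel.step():  # drain the queue in FIFO order
    pass

# Claim 6: the CPU retrieves the result from embedded memory into scratch memory.
scratch["C"] = accel.embedded["C"]
print(scratch["C"])  # [[19, 22], [43, 50]]
```

A single `deque` gives the first-in-first-out behavior of claim 3. The multiple prioritized queues of claim 7 are not modeled here.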

Assignees

Samsung Electronics Co Ltd

Classifications

G06F9/4881
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
G06N3/063
using electronic means · CPC title
G06F13/1663
Access to shared memory · CPC title
G06F15/7807
System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package · CPC title
G06F8/44
Encoding · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 93662737

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12436808B2 cover?: An integrated circuit includes: a central processing unit (CPU) core; an accelerator; and an acceleration instruction queue connected to the CPU core and the accelerator. The CPU core is to: fetch and decode one or more instructions from among an instruction sequence in a programmed order; determine an instruction from among the one or more instructions containing an acceleration workload encod…
Who is the assignee on this patent?: Samsung Electronics Co Ltd