Computing system and method employing processing of operation corresponding to offloading instructions from host processor by memory's internal processor

US10613871B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10613871-B2
Application number: US-201615067494-A
Country: US
Kind code: B2
Filing date: Mar 11, 2016
Priority date: Sep 1, 2015
Publication date: Apr 7, 2020
Grant date: Apr 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing system includes a host processor configured to process operations and a memory configured to include an internal processor and store host instructions to be processed by the host processor. The host processor offloads processing of a predetermined operation to the internal processor. The internal processor possibly provides specialized hardware designed to process the operation efficiently, improving the efficiency and performance of the computing system.
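As a rough illustration of the offloading idea in the abstract, the following Python sketch models a host that processes ordinary operations itself and hands a predetermined set of operations to a processor located inside the memory. The class names (`InternalProcessor`, `Host`), the instruction tuples, the `neg` operation, and the choice of `sqrt` and `exp` as the predetermined operations are assumptions made for this example, not details taken from the patent.

```python
import math

# Hypothetical set of "predetermined" operations the in-memory processor handles.
OFFLOADABLE = {"sqrt": math.sqrt, "exp": math.exp}


class InternalProcessor:
    """Stand-in for the processor located inside the memory."""

    def __init__(self):
        self.handled = []  # record of operations processed in memory

    def process(self, op, operand):
        self.handled.append(op)
        return OFFLOADABLE[op](operand)


class Host:
    """Stand-in for the host processor."""

    def __init__(self, internal):
        self.internal = internal

    def run(self, instructions):
        results = []
        for op, operand in instructions:
            if op in OFFLOADABLE:
                # Predetermined operation: hand it to the internal processor.
                results.append(self.internal.process(op, operand))
            elif op == "neg":
                # Ordinary operation: processed by the host itself.
                results.append(-operand)
            else:
                raise ValueError(f"unknown operation: {op}")
        return results


pim = InternalProcessor()
host = Host(pim)
results = host.run([("neg", 3.0), ("sqrt", 16.0), ("exp", 0.0)])
print(results)      # host and offloaded results, in instruction order
print(pim.handled)  # only the offloaded operations
```

In this toy model the host never evaluates `sqrt` or `exp` itself; it only forwards them, which is the division of labor the abstract describes.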

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system comprising: a memory controller; a host processor configured to process operations; and a memory comprising an internal processor and a memory array configured to store host instructions to be processed by the host processor, wherein in response to an offloading instruction being included in the host instructions to be processed, the host processor offloads processing of an operation corresponding to the offloading instruction to the internal processor, wherein the host processor offloads the processing of the operation corresponding to the offloading instruction to the internal processor based on a result of comparing a cost required when the operation corresponding to the offloading instruction is processed using a software library to a cost required when the operation corresponding to the offloading instruction is processed by the offloading to the internal processor, wherein the memory array includes a portion dedicated for data processed by the internal processor and the internal processor accesses the portion of the memory array via a dedicated pin, wherein the operation corresponding to the offloading instruction is a special type operation comprising at least one of a square root operation, a reciprocal operation, a log operation, an exponential operation, a power series operation, and a trigonometric operation, and the internal processor comprises hardware logic comprising a special function unit (SFU) configured to process the special type operation, wherein the internal processor is a dedicated processor for processing the special type operation, wherein the host processor comprises a cache and a processing element (PE) configured to process the host instructions to generate a memory request, wherein the generated memory request comprises a load request, a store request, and an offloading request corresponding to the offloading instruction, wherein the offloading request bypasses the cache and is transferred to the memory controller, and wherein the load request and the store request do not bypass the cache.

2. The computing system of claim 1, wherein: the load request or the store request is transferred to the memory controller in response to a cache miss occurring in the cache with respect to the load request or the store request.

3. The computing system of claim 2, wherein the offloading request bypasses the cache and is transferred to the memory controller regardless of the occurrence of a cache hit or the cache miss.

4. The computing system of claim 1, wherein the internal processor stores a result of processing the operation corresponding to the offloading instruction in a buffer of the internal processor or the memory array implemented separately from the internal processor in the memory.

5. The computing system of claim 1, wherein the host processor is a central processing unit (CPU) or a graphics processing unit (GPU), and the internal processor is a processor-in-memory (PIM).

6. A method of processing an operation in a computing system, the method comprising: loading host instructions to be processed by a host processor from a memory; determining, by the host processor, whether an offloading instruction is included in the host instructions by analyzing source code; generating a memory request based on the host instructions processed by a processing element (PE) included in the host processor, in response to the offloading instruction being included in the host instructions, comparing a cost required when an operation corresponding to the offloading instruction is processed using a software library with a cost required when the operation corresponding to the offloading instruction is processed by the offloading to an internal processor included in the memory; offloading processing of the operation corresponding to the offloading instruction from the host processor to the internal processor; generating a code for using the internal processor when the comparison indicates that the cost required when the operation corresponding to the offloading instruction is processed by the offloading to the internal processor is less than the cost required when the operation corresponding to the offloading instruction is processed using a software library; and driving hardware logic of a special function unit (SFU) implemented in the internal processor in order to process the operation corresponding to the offloading instruction, wherein the offloading comprises offloading the processing of the operation corresponding to the offloading instruction based on a result of the comparison, wherein the operation corresponding to the offloading instruction is a special type operation comprising at least one of a square root operation, a reciprocal operation, a log operation, an exponential operation, a power series operation, and a trigonometric operation, wherein the internal processor is a dedicated processor for processing the special type operation, wherein the generated memory request comprises a load request, a store request, and an offloading request corresponding to the offloading instruction, wherein the offloading request bypasses a cache of the host processor and is transferred to a memory controller, and wherein the load request and the store request do not bypass the cache.

7. The method of claim 6, wherein, the load request or the store request is transferred to the memory controller of the computing system when a cache miss occurs in the cache of the host processor with respect to the load request or the store request.

8. The method of claim 6, further comprising storing a result of processing the operation corresponding to the offloading instruction, which is performed by the internal processor, in a buffer of the internal processor or a memory array implemented separately from the internal processor in the memory.

9. The method of claim 6, wherein the host processor is a central processing unit (CPU) or graphics processing unit (GPU), and the internal processor is a processor-in-memory (PIM).

10. A host processor comprising: a loader/storer configured to load host instructions stored in a memory; a cache; a processing element (PE) configured to process the host instructions to generate a memory request; a determiner configured to determine whether an offloading instruction is included in the host instructions; and a controller configured to offload processing of an operation corresponding to the offloading instruction from the host processor to an internal processor included in the memory in response to an offloading instruction being included in the host instructions, wherein the controller offloads the processing of the operation corresponding to the offloading instruction to the internal processor based on a result of comparing a cost required when the operation corresponding to the offloading instruction is processed using a software library to a cost required when the operation corresponding to the offloading instruction is processed by the offloading to the internal processor, and wherein a code is generated for using the internal processor when the comparison indicates that the cost required when the operation corresponding to the offloading instruction is processed by the offloading to the internal processor is less than the cost required when the operation corresponding to the offloading instruction is processed using a software library, wherein the operation corresponding to the offloading instruction is a special type operation comprising at least one of a square root operation, a reciprocal operation, a log operation, an exponential operation, a power series operation, and a trigonometric operation, and the
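Two mechanisms recited in the claims lend themselves to a small sketch: the cost comparison that decides whether to offload, and the routing rule under which offloading requests bypass the cache while load and store requests do not. The Python below is a minimal model under stated assumptions: the function names (`should_offload`, `route`), the request tuples, the dictionary-backed cache, and the numeric cost values are all hypothetical and are not drawn from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryController:
    """Hypothetical memory controller that records the requests it receives."""
    received: list = field(default_factory=list)

    def submit(self, request):
        self.received.append(request)


@dataclass
class Cache:
    """Hypothetical host cache: maps an address to a cached value."""
    lines: dict = field(default_factory=dict)


def should_offload(library_cost, offload_cost):
    """Offload only when the internal processor is cheaper than the software library."""
    return offload_cost < library_cost


def route(request, cache, controller):
    """Return where the request ends up: 'cache' or 'memory_controller'."""
    kind, address = request
    if kind == "offload":
        # Offloading requests bypass the cache regardless of hit or miss.
        controller.submit(request)
        return "memory_controller"
    if kind in ("load", "store"):
        if address in cache.lines:
            return "cache"  # cache hit: served without the memory controller
        controller.submit(request)  # cache miss: forwarded to the controller
        return "memory_controller"
    raise ValueError(f"unknown request kind: {kind}")


cache = Cache(lines={0x10: 1.5})
controller = MemoryController()

hit = route(("load", 0x10), cache, controller)
miss = route(("store", 0x20), cache, controller)
bypass = route(("offload", 0x10), cache, controller)
print(hit, miss, bypass)

# Illustrative cost figures only (arbitrary units).
print(should_offload(library_cost=120, offload_cost=40))
```

Note that the offloading request targets an address that is present in the cache (`0x10`) and still goes to the memory controller, mirroring the "regardless of the occurrence of a cache hit or the cache miss" wording in claim 3.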

Assignees

  • Samsung Electronics Co Ltd

Inventors

Classifications

  • Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory · CPC title

  • G06F9/3881 · Primary

    Arrangements for communication of instructions and data · CPC title

  • Cross-Sectional Technologies · mapped topic

  • comprising an array of processing units with common control, e.g. single instruction multiple data processors (G06F15/82 takes precedence {; for correlation function computation G06F17/15}) · CPC title

  • Cross-Sectional Technologies · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10613871B2 cover?
A computing system includes a host processor configured to process operations and a memory configured to include an internal processor and store host instructions to be processed by the host processor. The host processor offloads processing of a predetermined operation to the internal processor. The internal processor possibly provides specialized hardware designed to process the operation effi…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/3881. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 7, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).