Parallel processor with integrated correlation and convolution engine

US9760966B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9760966-B2
Application number  US-201313736863-A
Country             US
Kind code           B2
Filing date         Jan 8, 2013
Priority date       Jan 8, 2013
Publication date    Sep 12, 2017
Grant date          Sep 12, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for performing computer algorithms. The system includes a graphics pipeline operable to perform graphics processing and an engine operable to perform at least one of a correlation determination and a convolution determination for the graphics pipeline. The graphics pipeline is further operable to execute general computing tasks. The engine comprises a plurality of functional units operable to be configured to perform at least one of the correlation determination and the convolution determination. In one embodiment, the engine is coupled to the graphics pipeline. The system further includes a configuration module operable to configure the engine to perform at least one of the correlation determination and the convolution determination.
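The relationship between the two operations named in the abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware: the names `Mode`, `Engine`, `configure`, and `run` are hypothetical, and the "valid" (no padding) output size is an assumption chosen for brevity. It shows one set of multiply-accumulate logic being configured for either correlation or convolution, where convolution is simply correlation with a reversed kernel.

```python
from enum import Enum
from typing import Callable, List, Optional, Sequence


class Mode(Enum):
    """Hypothetical configuration values; not taken from the patent."""
    CORRELATION = "correlation"
    CONVOLUTION = "convolution"


def correlate_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide the kernel over the signal and sum the products at each offset."""
    n, k = len(signal), len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


def convolve_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Convolution equals correlation with the kernel reversed."""
    return correlate_valid(signal, list(reversed(kernel)))


class Engine:
    """Software stand-in for a configurable engine (illustrative only)."""

    def __init__(self) -> None:
        self._op: Optional[Callable[[Sequence[float], Sequence[float]], List[float]]] = None

    def configure(self, mode: Mode) -> None:
        # Plays the role of a configuration step: select which operation to run.
        self._op = {
            Mode.CORRELATION: correlate_valid,
            Mode.CONVOLUTION: convolve_valid,
        }[mode]

    def run(self, signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
        if self._op is None:
            raise RuntimeError("engine must be configured before use")
        return self._op(signal, kernel)


engine = Engine()
engine.configure(Mode.CORRELATION)
corr = engine.run([1, 2, 3, 4], [1, 0, -1])
engine.configure(Mode.CONVOLUTION)
conv = engine.run([1, 2, 3, 4], [1, 0, -1])
```

With an antisymmetric kernel such as `[1, 0, -1]`, the two modes produce results of opposite sign, which makes the effect of the kernel reversal easy to see.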

First claim

Opening claim text (preview).

What is claimed is:

1. A system for performing computer processes, said system comprising: a streaming processor; a graphics pipeline associated with said streaming processor and operable to perform graphics processing, wherein said graphics pipeline is further operable to execute general computing tasks; an engine coupled to said streaming processor and comprising a plurality of functional units, wherein said plurality of functional units are configurable to perform at least one of a correlation determination and a convolution determination for said graphics pipeline, and wherein said engine is coupled to said graphics pipeline; and a configuration module disposed within said engine and operable to configure said engine to perform at least one of said correlation determination and said convolution determination, wherein a computation request is sent to said engine based an application programming interface (API) call, wherein said engine is coupled to another streaming processor, and wherein said engine is operable to perform said at least one of a correlation determination and a convolution determination for said streaming processor and said another streaming processor.

2. The system as described in claim 1 wherein each of said functional units is operable to perform one of a plurality of functions related to said at least one of said correlation determination and said convolution determination, wherein said one of said plurality of functions is selectable by said configuration module.

3. The system as described in claim 1 wherein said configuration module is operable to determine whether to configure said plurality of functional units to enable said engine to perform said at least one of said correlation determination and said convolution determination based on an instruction received from said graphics pipeline.

4. The system as described in claim 1 wherein said engine is operable to perform said at least one of said correlation determination and said convolution determination in at least one of less time and less energy than said graphics pipeline.

5. The system as described in claim 1 wherein said graphics pipeline is operable to pre-compute a portion of said at least one of said correlation determination and said convolution determination.

6. The system as described in claim 1 wherein said wherein said graphics pipeline is operable to post-compute a portion of said at least one of said correlation determination and said convolution determination.

7. The system of claim 1, wherein said streaming processor is configured to switch to processing a different thread when waiting for a thread computation being performed by said engine.

8. The system of claim 1, wherein said streaming processor is configured to operate using a general purpose graphics processing unit (GPGPU) programming framework.

9. A method of accelerating computer computations, said method comprising: an engine receiving a request via an application programming interface (API) to perform at least one of a correlation computation and a convolution computation, wherein said engine is coupled to a streaming processor and another streaming processor of a graphics processing unit (GPU); a configuration module determining a configuration of a plurality of execution units of said engine, wherein said configuration corresponds to said at least one of said correlation computation and said convolution computation, wherein said configuration module is disposed within said engine; said engine performing said at least one of said correlation computation and said convolution computation based on said configuration to generate a result; sending said result of said at least one of said correlation computation and said convolution computation from said engine to said streaming processor of said graphics processing unit (GPU); said engine performing said at least one of said correlation computation and said convolution computation to generate another result; and sending said another result from said engine to said another streaming processor of said graphics processing unit (GPU).

10. The method as described in claim 9 further comprising: precomputing a portion of said at least one of said correlation computation and said convolution computation.

11. The method as described in claim 9 further comprising: postcomputing a portion of said at least one of said correlation computation and said convolution computation.

12. The method as described in claim 9 wherein each of said execution units is operable to perform one of a plurality of functions, wherein said one of said plurality of functions is selectable by a configuration module.

13. The method as described in claim 9 wherein said request is received from a graphics pipeline of said GPU, and wherein said streaming processor is part of said graphics pipeline.

14. The method as described in claim 13 wherein said at least one of said correlation computation and said convolution computation comprises a computation of a sum of squared differences (SSD), a sum of absolute of differences (SAD), normalized cross correlation (NCC), convolution, formal correlation, or a combination thereof.

15. The method of claim 9, wherein said streaming processor is configured to switch to processing a different thread when waiting for a thread computation being performed by said engine.

16. A programmable processor comprising: a graphics pipeline comprising a plurality of programmable processing elements operable to perform computations, wherein said plurality of programming processing elements comprise streaming processors; and a plurality of engines each coupled to a respective programmable processing element of said plurality of programmable processing elements, wherein said engine is dedicated to perform at least one of a correlation computation and a convolution computation responsive to a request sent via an Application Programming Interface (API), and wherein each engine of said plurality of engines comprises a plurality of execution units and a configuration module operable to configure said plurality of execution units to enable said engine to perform said at least one of said correlation computation and said convolution computation for at least two streaming processors in said graphics pipeline.

17. The programmable processor as described in claim 16 wherein said plurality of programmable processing elements is operable to perform general computing tasks.

18. The programmable processor as described in claim 16 wherein each of said execution units of said engine is operable to perform one of a plurality of functions related to said at least one of said correlation computation and said convolution computation, wherein said one of said plurality of functions is operable to be configured by said configuration module.

19. The programmable processor as described in claim 16 wherein each of said programmable processing elements is operable to precompute a portion of said at least one of said correlation computation and said convolution computation.

20. The programmable processor as described in claim 16 wherein each of said programmable processing elements is operable to postcompute a portion of said at least one of said correlation computation and said convolution computation.

21. The programmable processor of claim 16, wherein said respective programmable processing element is configured to switch to processing a different thread when waiting for a thread computation being performed by s
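Claim 14 names several block-matching measures (SSD, SAD, NCC) without giving formulas. The sketch below shows the textbook definitions in plain Python, purely as an illustration. The function names are hypothetical, and the zero-mean form of NCC is an assumption, since the claims do not say which variant is meant. The comments in `ncc` mark one possible split into pre-compute, core, and post-compute portions in the spirit of claims 10 and 11. That split is also an assumption.

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def ssd(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences: lower means a better match."""
    _check_same_length(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sad(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences: lower means a better match."""
    _check_same_length(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross correlation, in the range [-1, 1]."""
    _check_same_length(a, b)
    # Pre-compute portion (assumed split): remove the mean from each block.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    # Core portion: the multiply-accumulate work a dedicated engine could take on.
    numerator = sum(x * y for x, y in zip(da, db))
    energy_a = sum(x * x for x in da)
    energy_b = sum(y * y for y in db)
    # Post-compute portion (assumed split): normalize by the block energies.
    denominator = math.sqrt(energy_a * energy_b)
    return numerator / denominator if denominator else 0.0


block = [1, 2, 3]
scaled = [2, 4, 6]
ssd_value = ssd(block, scaled)
sad_value = sad(block, scaled)
ncc_value = ncc(block, scaled)
```

SSD and SAD are sensitive to brightness scaling, so the scaled block scores a nonzero difference, while NCC treats it as a perfect match because the normalization cancels the scale factor.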

Assignees

Nvidia Corp

Inventors

Classifications

  • G06T1/20 · Primary

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Multidimensional correlation or convolution · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9760966B2 cover?
A system and method for performing computer algorithms. The system includes a graphics pipeline operable to perform graphics processing and an engine operable to perform at least one of a correlation determination and a convolution determination for the graphics pipeline. The graphics pipeline is further operable to execute general computing tasks. The engine comprises a plurality of functional…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 12, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).