Programmable vision accelerator

US11630800B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11630800-B2
Application number	US-201615141703-A
Country	US
Kind code	B2
Filing date	Apr 28, 2016
Priority date	May 1, 2015
Publication date	Apr 18, 2023
Grant date	Apr 18, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment of the present invention, a programmable vision accelerator enables applications to collapse multi-dimensional loops into one dimensional loops. In general, configurable components included in the programmable vision accelerator work together to facilitate such loop collapsing. The configurable elements include multi-dimensional address generators, vector units, and load/store units. Each multi-dimensional address generator generates a different address pattern. Each address pattern represents an overall addressing sequence associated with an object accessed within the collapsed loop. The vector units and the load store units provide execution functionality typically associated with multi-dimensional loops based on the address pattern. Advantageously, collapsing multi-dimensional loops in a flexible manner dramatically reduces the overhead associated with implementing a wide range of computer vision algorithms. Consequently, the overall performance of many computer vision applications may be optimized.
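As a rough illustration of what "collapsing" a loop nest means, the Python sketch below walks the same three-dimensional access pattern twice: once with conventional nested loops and once with a single flat loop. The function names, the three-dimensional shape, and the stride values are hypothetical and chosen only for illustration; the abstract describes hardware address generators doing this work, not software.

```python
def nested_addresses(base, counts, strides):
    """Reference version: three nested loops, dimension 0 innermost."""
    n0, n1, n2 = counts
    s0, s1, s2 = strides
    out = []
    for i2 in range(n2):
        for i1 in range(n1):
            for i0 in range(n0):
                out.append(base + i0 * s0 + i1 * s1 + i2 * s2)
    return out


def collapsed_addresses(base, counts, strides):
    """Collapsed version: one flat loop over all iterations.

    The flat counter k is decomposed into per-dimension indices
    (dimension 0 varies fastest), and each index is scaled by its stride.
    """
    total = 1
    for n in counts:
        total *= n

    out = []
    for k in range(total):
        addr = base
        rem = k
        for n, s in zip(counts, strides):
            addr += (rem % n) * s  # index of this dimension times its stride
            rem //= n              # move on to the next outer dimension
        out.append(addr)
    return out


# Hypothetical example: a 4 x 3 x 2 block inside a larger buffer.
example = collapsed_addresses(1000, (4, 3, 2), (1, 64, 4096))
```

Both functions return the same address sequence. The flat version pays for a modulo and a divide per dimension on every iteration, which is the kind of per-iteration bookkeeping that a dedicated address generator can take off the instruction stream.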

First claim

Opening claim text (preview).

The invention claimed is:
1. A system for executing a collapsed multi-dimensional loop, the system comprising: a memory that stores a loop configuration instruction for a multi-dimensional loop and stores a plurality of loop instructions included in a one-dimensional loop; a multi-dimensional address generator that generates a plurality of addresses according to an address pattern by: precomputing an address modifier for each dimension included in the multi-dimensional loop based on a respective number of iterations for each dimension included in the multi-dimensional loop and a respective weight associated with each dimension included in the multi-dimensional loop; and after precomputing the address modifiers for the dimensions of the multi-dimensional loop, generating the plurality of addresses by iteratively applying the precomputed address modifiers to a base address when a corresponding loop index is incremented; a load/store unit that accesses an object based on a first address from the plurality of addresses; and a vector unit that performs one or more operations on the object based on a first loop instruction included in the plurality of loop instructions.
2. The system of claim 1, wherein the loop configuration instruction further specifies a plurality of iteration numbers and a plurality of iteration weights.
3. The system of claim 1, wherein the multi-dimensional address generator further performs an increment or decrement operation based on the address pattern and a second loop instruction included in the plurality of loop instructions.
4. The system of claim 1, wherein at least one of the plurality of loop instructions is associated with a flag, and the system further comprises a branch/predicate unit that generates the flag.
5. The system of claim 4, wherein the branch/predicate unit comprises a modulo counter.
6. The system of claim 1, wherein the loop configuration instruction comprises a very long instruction word (VLIW) instruction.
7. The system of claim 1, wherein the load/store unit includes saturation logic, at least one of the plurality of loop instructions specifies a saturation option, and the saturation logic performs a saturation operation on the object based on the saturation option.
8. The system of claim 1, wherein the load/store unit includes rounding logic, at least one of the plurality of loop instructions specifies a rounding option, and the rounding logic performs a rounding operation on the object based on the rounding option.
9. The system of claim 1, wherein at least one of the plurality of loop instructions specifies at least one of a data type and a data distribution option.
10. The system of claim 1, wherein the first address is further based on a first modifier included in the address pattern that is associated with a current iteration of the one-dimensional loop.
11. A computer-implemented method for executing a collapsed multi-dimensional loop, the method comprising: receiving a configuration instruction for a multi-dimensional loop; generating a plurality of addresses according to an address pattern by: precomputing an address modifier for each dimension included in the multi-dimensional loop based on a respective number of iterations for each dimension included in the multi-dimensional loop and a respective weight associated with each dimension included in the multi-dimensional loop; and after precomputing the address modifiers for the dimensions of the multi-dimensional loop, generating the plurality of addresses by iteratively applying the precomputed address modifiers to a base address when a corresponding loop index is incremented; and executing the collapsed multi-dimensional loop as a single loop based on the plurality of addresses by accessing an object based on a first address from the plurality of addresses.
12. The method of claim 11, wherein executing the collapsed multi-dimensional loop comprises performing one or more operations on the object accessed based on the plurality of addresses.
13. The method of claim 11, wherein executing the collapsed multi-dimensional loop comprises performing an increment or decrement operation on the object accessed based on the plurality of addresses.
14. The method of claim 11, wherein the configuration instruction further specifies a plurality of iteration numbers and a plurality of iteration weights.
15. The method of claim 11, wherein executing the collapsed multi-dimensional loop comprises accessing the object based on the plurality of addresses.
16. The method of claim 15, wherein accessing the object is further based on at least one of a data type and a distribution type.
17. The method of claim 15, wherein executing the collapsed multi-dimensional loop further comprises executing one or more operations on the object based on at least one of a saturation option and a rounding option.
18. The method of claim 11, wherein executing the collapsed multi-dimensional loop comprises: computing a flag; and conditionally controlling, based on the flag, operations on the object accessed based on the plurality of addresses.
19. The method of claim 18, wherein computing the flag comprises performing a modulo operation on a counter variable that is associated with the flag.
20. The method of claim 11, wherein executing the collapsed multi-dimensional loop comprises: computing a flag; if the flag matches a first condition, then executing a first operation on the object accessed based on the plurality of addresses; or if the flag does not match an activation condition, then executing a second operation on the object accessed based on the plurality of addresses.
21. A system for executing a computer vision application, the system comprising: a programmable vector processor that: executes a collapsed multi-dimensional loop included in the computer vision application as a single loop based on a plurality of addresses generated according to an address pattern by: precomputing an address modifier for each dimension included in the multi-dimensional loop based on a respective number of iterations for each dimension included in the multi-dimensional loop and a respective weight associated with each dimension included in the multi-dimensional loop; and after precomputing the address modifiers for the dimensions of the multi-dimensional loop, generating the plurality of addresses by iteratively applying the precomputed address modifiers to a base address when a corresponding loop index is incremented in order to access an object based on a first address from the plurality of addresses; a fixed-function accelerator that accelerates a fixed processing operation included in the computer vision application; and a reduced instruction set computer (RISC) core that coordinates the programmable vector processor and the fixed-function accelerator.
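The independent claims recite precomputing one address modifier per dimension from the iteration counts and weights, then applying a modifier each time a loop index is incremented. The Python sketch below is one plausible reading of that step, not the patented implementation. It assumes that the modifier for a dimension equals that dimension's weight minus the offset accumulated by all inner dimensions at their final iteration, so that a single addition both advances the outer index and rewinds the inner ones. The function names and the odometer-style index update are hypothetical.

```python
def precompute_modifiers(counts, weights):
    """One modifier per dimension, dimension 0 innermost.

    Assumption: modifier[d] = weights[d] - sum((counts[j] - 1) * weights[j]
    for j < d), i.e. step dimension d forward and undo the inner dimensions.
    """
    modifiers = []
    rewind = 0  # offset reached by all inner dimensions at their last iteration
    for n, w in zip(counts, weights):
        modifiers.append(w - rewind)
        rewind += (n - 1) * w
    return modifiers


def generate_addresses(base, counts, weights):
    """Emit one address per iteration of the collapsed (single) loop."""
    modifiers = precompute_modifiers(counts, weights)
    indices = [0] * len(counts)
    total = 1
    for n in counts:
        total *= n

    addr = base
    out = []
    for _ in range(total):
        out.append(addr)
        # Odometer update: find the first dimension that does not wrap,
        # increment it, and apply only that dimension's modifier.
        for d, n in enumerate(counts):
            indices[d] += 1
            if indices[d] < n:
                addr += modifiers[d]
                break
            indices[d] = 0  # this dimension wrapped; carry to the next one
    return out


# Hypothetical 3 x 2 pattern with weights 1 and 10, starting at address 100.
print(precompute_modifiers((3, 2), (1, 10)))     # [1, 8]
print(generate_addresses(100, (3, 2), (1, 10)))  # [100, 101, 102, 110, 111, 112]
```

Under this reading, the inner loop body never multiplies an index by a weight: every address is the previous address plus one precomputed constant, which is cheap to build in hardware.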

Assignees

Nvidia Corp

Inventors

Classifications

G06F9/30036 · Primary
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
G06F9/30072
to perform conditional operations, e.g. using predicates or guards · CPC title
G06F12/0207
with multidimensional access, e.g. row/column, matrix · CPC title
G06F9/3001
Arithmetic instructions · CPC title
G06F15/82 · Primary
data or demand driven · CPC title

Patent family

Related publications grouped by family.

View patent family 57205735

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11630800B2 cover? In one embodiment of the present invention, a programmable vision accelerator enables applications to collapse multi-dimensional loops into one dimensional loops. In general, configurable components included in the programmable vision accelerator work together to facilitate such loop collapsing. The configurable elements include multi-dimensional address generators, vector units, and load/store…
Who is the assignee on this patent? Nvidia Corp
What technology area does this patent fall under? The primary CPC classification is G06F9/30036. Mapped technology areas include Physics.
When was this patent published? It was published on Apr 18, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).