Hardware-efficient deep convolutional neural networks

US9904874B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9904874-B2
Application number  US-201514934016-A
Country             US
Kind code           B2
Filing date         Nov 5, 2015
Priority date       Nov 5, 2015
Publication date    Feb 27, 2018
Grant date          Feb 27, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and computer media for implementing convolutional neural networks efficiently in hardware are disclosed herein. A memory is configured to store a sparse, frequency domain representation of a convolutional weighting kernel. A time-domain-to-frequency-domain converter is configured to generate a frequency domain representation of an input image. A feature extractor is configured to access the memory and, by a processor, extract features based on the sparse, frequency domain representation of the convolutional weighting kernel and the frequency domain representation of the input image. The feature extractor includes convolutional layers and fully connected layers. A classifier is configured to determine, based on extracted features, whether the input image contains an object of interest. Various types of memory can be used to store different information, allowing information-dense data to be stored in faster (e.g., faster access time) memory and sparse data to be stored in slower memory.
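The abstract's central idea, doing convolution as a multiplication in the frequency domain, can be illustrated with a small sketch. It assumes the converter is a plain discrete Fourier transform and that the convolution is circular over a tiny 4x4 image; neither detail is stated in the abstract. The names `dft2`, `idft2`, and `circular_conv2`, and the sample values, are hypothetical and chosen only for illustration.

```python
import cmath

def dft2(x):
    """Naive 2-D discrete Fourier transform of a square list of lists."""
    n = len(x)
    return [[sum(x[r][c] * cmath.exp(-2j * cmath.pi * (u * r + v * c) / n)
                 for r in range(n) for c in range(n))
             for v in range(n)]
            for u in range(n)]

def idft2(X):
    """Naive inverse of dft2."""
    n = len(X)
    return [[sum(X[u][v] * cmath.exp(2j * cmath.pi * (u * r + v * c) / n)
                 for u in range(n) for v in range(n)) / (n * n)
             for c in range(n)]
            for r in range(n)]

def circular_conv2(image, kernel):
    """Direct circular convolution in the spatial domain, for comparison."""
    n = len(image)
    return [[sum(image[(r - i) % n][(c - j) % n] * kernel[i][j]
                 for i in range(n) for j in range(n))
             for c in range(n)]
            for r in range(n)]

# Illustrative data only: a 4x4 "image" and a kernel zero-padded to 4x4.
image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 0, 1, 0],
         [1, 1, 0, 2]]
kernel = [[1, 0, 0, -1],
          [0, 0, 0, 0],
          [0, 0, 0, 0],
          [2, 0, 0, 0]]

n = len(image)
F_image = dft2(image)    # frequency domain representation of the input image
F_kernel = dft2(kernel)  # frequency domain representation of the kernel

# Convolution becomes an element-wise product in the frequency domain.
F_out = [[F_image[u][v] * F_kernel[u][v] for v in range(n)] for u in range(n)]
via_frequency = [[value.real for value in row] for row in idft2(F_out)]

direct = circular_conv2(image, kernel)
max_error = max(abs(a - b)
                for row_a, row_b in zip(via_frequency, direct)
                for a, b in zip(row_a, row_b))
print("largest difference between the two routes:", max_error)
```

The two routes agree up to floating-point rounding. A practical system would use a fast transform rather than this quadratic-time loop; the sketch only shows why the kernel can be stored and applied in frequency form.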

First claim

Opening claim text (preview).

We claim:

1. A convolutional neural network system, comprising: one or more processors; a memory configured to store a sparse, frequency domain representation of a convolutional weighting kernel, the sparse, frequency domain representation comprising a dense matrix and one or more sparse matrices; a time-domain-to-frequency-domain converter configured to, by the one or more processors, generate a frequency domain representation of an input image; a feature extractor comprising a plurality of convolutional layers and a plurality of fully connected layers, the feature extractor configured to, by the one or more processors: access the memory, and extract a plurality of features based at least in part on the sparse, frequency domain representation of the convolutional weighting kernel and the frequency domain representation of the input image, wherein a first convolutional layer of the plurality of convolutional layers is configured to: multiply the frequency domain representation of the input image by the one or more sparse matrices and apply a nonlinear function to a result of the multiplication, and wherein prior to generation, by the feature extractor, of a feature vector of the plurality of extracted features, an output of a last convolutional layer is multiplied by the dense matrix; and a classifier configured to, by the one or more processors, determine, based on the plurality of extracted features, whether the input image contains an object of interest.

2. The system of claim 1, wherein the memory is a first memory of a first memory type, and further comprising a second memory configured to store coefficients for the plurality of fully connected layers, wherein the second memory is of a second memory type, and wherein the first memory type has a slower access time or lower energy consumption than an access time or energy consumption of the second memory type.

3. The system of claim 2, wherein the first memory type is DRAM, and wherein the second memory type is SRAM.

4. The system of claim 2, further comprising a third memory configured to store input image coefficients, wherein the third memory is of a third memory type and has an access time or energy consumption between the access time or energy consumption of the first memory type and the access time or energy consumption of the second memory type.

5. The system of claim 1, wherein the nonlinear function is a frequency domain function.

6. The system of claim 1, wherein a second convolutional layer of the plurality of convolutional layers is configured to: multiply a frequency domain output of the first convolutional layer by the one or more sparse matrices and apply a second nonlinear function to a result of the multiplication.

7. The system of claim 1, further comprising a camera configured to capture video, and wherein the input image is a video frame captured by the camera.

8. The system of claim 7, wherein the system is part of a virtual reality or augmented reality system.

9. A method, comprising: receiving an input image; generating a frequency domain representation of the input image; in a convolutional neural network comprising a plurality of convolutional layers and at least one fully connected layer, extracting a plurality of features based at least in part on the frequency domain representation of the input image and a sparse, frequency domain representation of a convolutional weighting kernel, wherein the sparse, frequency domain representation of the convolutional weighting kernel comprises a dense matrix and one or more sparse matrices, wherein the extracting comprises, in a first convolutional layer of the plurality of convolutional layers, multiplying the frequency domain representation of the input image by the one or more sparse matrices and applying a nonlinear function to a result of the multiplying, and wherein prior to generation of a feature vector of the plurality of extracted features, an output of a last convolutional layer is multiplied by the dense matrix; classifying the input image based on the plurality of extracted features; and based on the classifying, identifying the input image as containing an object of interest.

10. The method of claim 9, wherein extracting the plurality of features comprises: performing convolutional processing in a convolutional portion of the convolutional neural network; and based on an output of the convolutional processing, performing fully connected processing in a fully connected portion of the convolutional neural network, wherein an output of the fully connected processing comprises the extracted features.

11. The method of claim 9, wherein values for the convolutional weighting kernel are determined through training, wherein the one or more sparse matrices are stored in a first memory of a first memory type, wherein the dense matrix is stored in a second memory of a second memory type, and wherein the first memory type has a slower access time than the second memory type.

12. The method of claim 11, wherein the first memory type has lower energy consumption than the second memory type.

13. One or more computer-readable storage media storing computer-executable instructions for recognizing images, the recognizing comprising: receiving an input image; generating a frequency domain representation of the input image; determining a sparse, frequency domain representation of a convolutional weighting kernel, the sparse, frequency domain representation comprising one or more sparse matrices and a dense matrix; in a plurality of convolutional layers of a deep convolutional neural network, processing the input image based on the frequency domain representation of the input image, the one or more sparse matrices, and a frequency domain nonlinear function; in a plurality of fully connected layers of the deep convolutional neural network, processing the input image based on an output of the plurality of convolutional layers; determining a plurality of extracted features based on an output of the plurality of fully connected layers, wherein prior to determination of a feature vector of the plurality of extracted features, an output of a last convolutional layer is multiplied by the dense matrix; classifying the input image based on the plurality of extracted features; and based on the classification, identifying the input image as containing an object of interest.

14. The one or more computer-readable storage media of claim 13, wherein the one or more sparse matrices are stored in a first memory of a first memory type, wherein the dense matrix is stored in a second memory of a second memory type, and wherein the first memory type has a slower access time than the second memory type.
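The data flow recited in claims 1 and 6 (multiply by a sparse matrix, apply a nonlinear function, repeat, then multiply the last convolutional output by the dense matrix) can be sketched roughly as follows. This is a toy reading of the claim language, not the patented implementation. The dictionary-based sparse format, the matrix shapes, the sample values, the use of a separate sparse matrix per layer, and `placeholder_nonlinearity` are all assumptions made for illustration; the fully connected layers and the classifier are omitted.

```python
def sparse_matvec(sparse, vector, n_rows):
    """Multiply a sparse matrix stored as {(row, col): value} by a vector."""
    out = [0j] * n_rows
    for (row, col), value in sparse.items():
        out[row] += value * vector[col]
    return out

def dense_matvec(dense, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in dense]

def placeholder_nonlinearity(z):
    # Hypothetical stand-in: the claims only require "a nonlinear function"
    # (claim 5: a frequency domain function) and do not specify which one.
    return complex(max(z.real, 0.0), max(z.imag, 0.0))

def extract(freq_input, sparse_layers, dense):
    """Sparse multiply plus nonlinearity per layer, then one dense multiply."""
    activations = freq_input
    for sparse in sparse_layers:
        product = sparse_matvec(sparse, activations, len(activations))
        activations = [placeholder_nonlinearity(z) for z in product]
    # Output of the last convolutional layer is multiplied by the dense matrix.
    return dense_matvec(dense, activations)

# Illustrative values only: a flattened frequency domain input of length 4.
freq_input = [1 + 1j, -2 + 0j, 0.5 - 1j, 0j]
sparse_layers = [
    {(0, 0): 2.0, (1, 2): -1.0, (3, 1): 0.5},  # assumed first-layer matrix
    {(0, 3): 1.0, (2, 0): 1.0},                # assumed second-layer matrix
]
dense = [[1.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0, 1.0]]

features = extract(freq_input, sparse_layers, dense)
print(features)
```

Storing only the nonzero entries is what makes the sparse matrices cheap to keep in the slower memory described in claims 2 and 11, while the single dense matrix is applied once at the end.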

Assignees

Microsoft Technology Licensing Llc

Inventors

Classifications

  • G06N3/063 (Primary)

    using electronic means · CPC title

  • relating to the classification model, e.g. parametric or non-parametric approaches · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9904874B2 cover?
Systems, methods, and computer media for implementing convolutional neural networks efficiently in hardware are disclosed herein. A memory is configured to store a sparse, frequency domain representation of a convolutional weighting kernel. A time-domain-to-frequency-domain converter is configured to generate a frequency domain representation of an input image. A feature extractor is configured…
Who is the assignee on this patent?
Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Published Feb 27, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).