Method and apparatus for hardware-accelerated machine learning

US11416778B2 · US · B2

Patent metadata
Publication number: US-11416778-B2
Application number: US-202017101495-A
Country: US
Kind code: B2
Filing date: Nov 23, 2020
Priority date: Dec 22, 2016
Publication date: Aug 16, 2022
Grant date: Aug 16, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A feature extractor for a convolutional neural network (CNN) is disclosed, wherein the feature extractor is deployed on a member of the group consisting of (1) a reconfigurable logic device, (2) a graphics processing unit (GPU), and (3) a chip multi-processor (CMP). A processing pipeline can be implemented on the member, where the processing pipeline implements a plurality of convolution layers for the CNN, wherein each of a plurality of the convolutional layers comprises (1) a convolution stage that convolves first data with second data if activated and (2) a sub-sampling stage that performs a member of the group consisting of (i) a max pooling operation, (ii) an averaging operation, and (iii) a sampling operation on data received thereby if activated. The processing pipeline can be controllable with respect to which of the convolution stages are activated/deactivated and which of the sub-sampling stages are activated/deactivated when processing streaming data through the processing pipeline. The deactivated convolution and sub-sampling stages can remain instantiated within the processing pipeline but act as pass-throughs when deactivated. The processing pipeline performs feature vector extraction on the streaming data using the activated convolution stages and the activated sub-sampling stages.
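The central idea in the abstract is that every stage is built once and stays in place, and deactivating a stage turns it into a pass-through instead of removing it. The software model below is a minimal sketch of that control scheme only. It is not the patented hardware. `Stage`, `Pipeline`, `conv1d`, `max_pool` and `set_active` are hypothetical names. It also assumes one-dimensional lists of samples in place of a true image stream.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Signal = List[float]
Op = Callable[[Signal], Signal]


@dataclass
class Stage:
    """A pipeline stage that stays instantiated whether or not it is active."""
    name: str
    op: Op
    active: bool = True

    def __call__(self, data: Signal) -> Signal:
        # A deactivated stage acts as a pass-through: data leaves unchanged.
        return self.op(data) if self.active else data


def conv1d(weights: Sequence[float]) -> Op:
    """'Valid' sliding-window correlation of the input with fixed weights."""
    k = len(weights)

    def op(data: Signal) -> Signal:
        return [sum(w * x for w, x in zip(weights, data[i:i + k]))
                for i in range(len(data) - k + 1)]
    return op


def max_pool(width: int) -> Op:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    def op(data: Signal) -> Signal:
        return [max(data[i:i + width])
                for i in range(0, len(data) - width + 1, width)]
    return op


class Pipeline:
    def __init__(self, stages: List[Stage]):
        self.stages = stages  # every stage is built once, up front

    def set_active(self, name: str, active: bool) -> None:
        """Toggle one stage by name without removing it from the pipeline."""
        for stage in self.stages:
            if stage.name == name:
                stage.active = active
                return
        raise KeyError(name)

    def extract(self, data: Signal) -> Signal:
        """Run data through every stage in order to produce a feature vector."""
        for stage in self.stages:
            data = stage(data)
        return data


pipeline = Pipeline([
    Stage("conv1", conv1d([-1.0, 1.0])),
    Stage("pool1", max_pool(2)),
    Stage("conv2", conv1d([0.5, 0.5])),
    Stage("pool2", max_pool(2)),
])
samples = [0, 1, 3, 6, 10, 15, 21, 28]
print(pipeline.extract(samples))   # [5.0]
pipeline.set_active("pool1", False)
print(pipeline.extract(samples))   # [2.5, 4.5, 6.5]
```

After `pool1` is switched off, the pipeline still contains all four stages. Only the path the data takes changes. This mirrors the claimed behavior of deactivated stages that "remain instantiated ... but act as pass-throughs."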

First claim

Opening claim text (preview).

What is claimed is:

1. A machine-learning apparatus comprising: a feature extractor for a convolutional neural network (CNN), wherein the feature extractor is deployed on a member of the group consisting of (1) a reconfigurable logic device, (2) a graphics processing unit (GPU), and (3) a chip multi-processor (CMP), wherein the member comprises a processing pipeline that implements a plurality of convolution layers for the CNN, wherein each of a plurality of the convolutional layers comprises: a convolution stage that convolves first data with second data if activated; and a sub-sampling stage that performs a member of the group consisting of (i) a max pooling operation, (ii) an averaging operation, and (iii) a sampling operation on data received thereby if activated; wherein the processing pipeline is controllable with respect to (1) which of the convolution stages are activated, (2) which of the convolution stages are deactivated, (3) which of the sub-sampling stages are activated, and (4) which of the sub-sampling stages are deactivated when processing streaming data through the processing pipeline, wherein the deactivated convolution stages and the deactivated sub-sampling stages remain instantiated within the processing pipeline but act as pass-throughs when deactivated; and wherein the processing pipeline receives streaming data and performs feature vector extraction on the streaming data using the activated convolution stages and the activated sub-sampling stages.

2. The apparatus of claim 1 wherein each of a plurality of the sub-sampling stages comprises: max pooling logic; averaging logic; and sampling logic; wherein the sub-sampling stage is controllable with respect to whether the max pooling logic, the averaging logic, or the sampling logic is activated within the sub-sampling stage.

3. The apparatus of claim 1 wherein each of a plurality of the convolution stages comprises: correlation logic that convolves a sliding window of the first data with the second data.

4. The apparatus of claim 3 wherein a first convolution stage in the processing pipeline processes pixel data from an image as the first data and a plurality of weights as the second data.

5. The apparatus of claim 3 wherein each of a plurality of the convolution stages includes (1) a data shift register through which the first data is streamed and (2) a register that holds the second data, and wherein the correlation logic of each convolution stage comprises a plurality of multipliers and summation logic, wherein the multipliers multiply values in a plurality of cells of the data shift register and the register, and wherein the summation logic is connected to a plurality of outputs of the multipliers and sums the outputs from the multipliers.

6. The apparatus of claim 1 wherein the processing pipeline is controllable to deactivate at least one of the convolution stages or sub-sampling stages while a plurality of the convolution stages and a plurality of the sub-sampling stages are activated.

7. The apparatus of claim 1 wherein the processing pipeline is controllable to activate and deactivate different mixes of the convolution stages and sub-sampling stages based on whether the processing pipeline is to operate in a training mode or a classification mode.

8. The apparatus of claim 1 wherein the processing pipeline is controllable to disconnect power from a deactivated convolution stage or sub-sampling stage while retaining power to the activated convolution stages and sub-sampling stages.

9. The apparatus of claim 1 wherein the processing pipeline further comprises at least one of (1) an encryption engine, (2) a decryption engine, (3) a compression engine, (4) a decompression engine, or (5) a search engine.

10. The apparatus of claim 1 wherein the member comprises the reconfigurable logic device, wherein at least a portion of the processing pipeline resides on the reconfigurable logic device.

11. The apparatus of claim 10 wherein the member further comprises at least one of a GPU or a CMP, and wherein another portion of the processing pipeline resides on the at least one GPU or CMP.

12. The apparatus of claim 10 wherein the reconfigurable logic device comprises a field programmable gate array (FPGA), wherein at least a portion of the processing pipeline resides on the FPGA.

13. The apparatus of claim 1 wherein the member comprises the GPU, wherein at least a portion of the processing pipeline resides on the GPU.

14. The apparatus of claim 13 wherein the member further comprises at least one of a reconfigurable logic device or a CMP, and wherein another portion of the processing pipeline resides on the at least one reconfigurable logic device or CMP.

15. The apparatus of claim 1 wherein the member comprises the CMP, wherein at least a portion of the processing pipeline resides on the CMP.

16. A machine-learning apparatus comprising: a feature extractor for a convolutional neural network (CNN), wherein the feature extractor is deployed on a member of the group consisting of (1) a reconfigurable logic device, (2) a graphics processing unit (GPU), and (3) a chip multi-processor (CMP), wherein the member comprises a processing pipeline; wherein the processing pipeline implements a controllable number of convolution layers for the CNN, the processing pipeline comprising (1) a plurality of convolution stages that convolve first data with second data if activated and (2) a plurality of sub-sampling stages that perform a member of the group consisting of (i) a max pooling operation, (ii) an averaging operation, and (iii) a sampling operation on data received thereby if activated; wherein the processing pipeline is controllable with respect to (1) which of the convolution stages are activated, (2) which of the convolution stages are deactivated, (3) which of the sub-sampling stages are activated, and (4) which of the sub-sampling stages are deactivated when processing streaming data through the processing pipeline, wherein the deactivated convolution stages and the deactivated sub-sampling stages remain instantiated within the processing pipeline but act as pass-throughs when deactivated; and wherein the processing pipeline receives streaming data and performs feature vector extraction on the streaming data using the activated convolution stages and the activated sub-sampling stages.

17. The apparatus of claim 16 wherein each of a plurality of the sub-sampling stages comprises: max pooling logic; averaging logic; and sampling logic; wherein the sub-sampling stage is controllable with respect to whether the max pooling logic, the averaging logic, or the sampling logic is activated within the sub-sampling stage.

18. The apparatus of claim 16 wherein each of a plurality of the convolution stages comprises: correlation logic that convolves a sliding window of the first data with the second data.

19. The apparatus of claim 18 wherein a first convolution stage in the processing pipeline processes pixel data from an image as the first data and a plurality of weights as the second data.

20. A machine-learning method comprising: controlling which of a plurality of convolution stages and which of a plurality of sub-sampling stages are activated and deactivated within a processing pipeline that serves as a feature extractor for a convolutional neural network (CNN), wherein the processing pipeline is deployed on a member of the group consisting of (1) a reconfigurable logic device, (2) a graphics processing unit (GPU), and (3) a chip…
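Claims 2 and 5 describe the inside of individual stages. In a convolution stage, samples stream through a data shift register, multipliers pair each register cell with a held weight, and summation logic adds the products. A sub-sampling stage contains max pooling, averaging and sampling logic, and one of the three is selected. The Python model below is a minimal sketch of both structures. The class names and the `select` method are hypothetical. One point is an assumption: the claims do not define the "sampling" operation, so here it is read as keeping the first value of each window.

```python
from collections import deque
from typing import Iterable, Iterator, List, Optional, Sequence


class ShiftRegisterConv:
    """Streaming correlation modeled on the claim 5 structure."""

    def __init__(self, weights: Sequence[float]):
        self.weights = list(weights)                  # register holding the "second data"
        self.cells = deque(maxlen=len(self.weights))  # data shift register

    def push(self, sample: float) -> Optional[float]:
        """Shift one sample in; return a result once the window is full."""
        self.cells.append(sample)  # the oldest cell drops out when full
        if len(self.cells) < len(self.weights):
            return None
        products = [w * x for w, x in zip(self.weights, self.cells)]  # multipliers
        return sum(products)  # summation logic

    def run(self, stream: Iterable[float]) -> Iterator[float]:
        for sample in stream:
            result = self.push(sample)
            if result is not None:
                yield result


class SubSampler:
    """Sub-sampling stage with selectable max, average, or sample logic."""

    MODES = ("max", "average", "sample")

    def __init__(self, width: int, mode: str = "max"):
        self.width = width
        self.select(mode)

    def select(self, mode: str) -> None:
        """Choose which of the three logic blocks is active."""
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode

    def run(self, stream: Iterable[float]) -> Iterator[float]:
        window: List[float] = []
        for x in stream:
            window.append(x)
            if len(window) == self.width:
                if self.mode == "max":
                    yield max(window)
                elif self.mode == "average":
                    yield sum(window) / len(window)
                else:  # "sample": keep the first value of each window
                    yield window[0]
                window = []  # a trailing partial window is never emitted


conv = ShiftRegisterConv([1, 2, 3])
pool = SubSampler(2, "max")
print(list(pool.run(conv.run([1, 1, 1, 2, 0, 4]))))  # [9, 14]
```

Both stages are generators, so they chain the same way the claimed stages sit one after another in a streaming pipeline. Switching `pool.select("average")` changes which logic is used without rebuilding the stage.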

Assignees

IP Reservoir LLC

Inventors

Classifications

  • Combinations of networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • H04L9/14 (primary)

    using a plurality of keys or algorithms · CPC title

  • Plurality of storage devices · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11416778B2 cover?
A feature extractor for a convolutional neural network (CNN) is disclosed, wherein the feature extractor is deployed on a member of the group consisting of (1) a reconfigurable logic device, (2) a graphics processing unit (GPU), and (3) a chip multi-processor (CMP). A processing pipeline can be implemented on the member, where the processing pipeline implements a plurality of convolution layers fo…
Who is the assignee on this patent?
IP Reservoir LLC
What technology area does this patent fall under?
Primary CPC classification H04L9/14. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Aug 16, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).