Specializing neural networks for heterogeneous systems

US11620516B2 · US · B2

Patent metadata
Publication number: US-11620516-B2
Application number: US-201916724849-A
Country: US
Kind code: B2
Filing date: Dec 23, 2019
Priority date: Dec 23, 2019
Publication date: Apr 4, 2023
Grant date: Apr 4, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure advantageously provides a heterogenous system, and a method for generating an artificial neural network (ANN) for a heterogenous system. The heterogenous system includes a plurality of processing units coupled to a memory configured to store an input volume. The plurality of processing units includes first and second processing units. The first processing unit includes a first processor and is configured to execute a first ANN, and the second processing unit includes a second processor and is configured to execute a second ANN. The first and second ANNs respectively include an input layer, at least one processor-optimized hidden layer and an output layer. The second ANN hidden layers are different than the first ANN hidden layers.
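The abstract says each processor runs hidden layers "optimized" for it without saying how. Claims 1 to 3 below make this concrete as a mix of small (3×3 or smaller) and large (5×5 or larger) convolution kernels, where the second and third CNNs use fewer small and more large kernels. The Python sketch below checks that pattern. It is a minimal illustration under stated assumptions, not the patentee's method: kernels are counted as filters rather than layers, the `ConvLayer` format and layer configurations are hypothetical, and the check uses the stricter reading of claim 1 (the third CNN has fewer small and more large kernels than both other CNNs).

```python
from dataclasses import dataclass

# Thresholds from claim 3: a "small" kernel is 3x3 or smaller and a
# "large" kernel is 5x5 or larger. Sizes in between (e.g. 4x4) or mixed
# shapes (e.g. 3x5) fall in neither bucket.
SMALL_MAX = 3
LARGE_MIN = 5


@dataclass(frozen=True)
class ConvLayer:
    """One convolutional layer (hypothetical description format)."""
    kernel_h: int
    kernel_w: int
    num_kernels: int  # number of filters in the layer


def kernel_counts(layers):
    """Return (small, large) filter counts summed over a CNN's conv layers."""
    small = large = 0
    for layer in layers:
        if max(layer.kernel_h, layer.kernel_w) <= SMALL_MAX:
            small += layer.num_kernels
        elif min(layer.kernel_h, layer.kernel_w) >= LARGE_MIN:
            large += layer.num_kernels
    return small, large


def follows_claim1_ordering(first, second, third):
    """True if each later CNN has strictly fewer small and more large kernels.

    Stricter reading of claim 1: the third CNN is compared against the
    second, and since the second must beat the first, also against the first.
    """
    s1, l1 = kernel_counts(first)
    s2, l2 = kernel_counts(second)
    s3, l3 = kernel_counts(third)
    return s1 > s2 > s3 and l1 < l2 < l3


# Hypothetical configurations following claim 2's CPU/GPU/NPU mapping.
cpu_cnn = [ConvLayer(3, 3, 64), ConvLayer(3, 3, 64), ConvLayer(5, 5, 16)]
gpu_cnn = [ConvLayer(3, 3, 32), ConvLayer(5, 5, 48)]
npu_cnn = [ConvLayer(3, 3, 16), ConvLayer(7, 7, 64)]

print(kernel_counts(cpu_cnn))  # (128, 16)
print(follows_claim1_ordering(cpu_cnn, gpu_cnn, npu_cnn))  # True
```

Counting filters rather than layers is a judgment call; the claims only say "fewer small kernels and more large kernels", so a layer-count reading would need a different `kernel_counts`.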

First claim

Opening claim text (preview).

What is claimed is:

1. A heterogenous system, comprising: a memory configured to store an input volume having an input width, an input height, an input depth and a plurality of input values, the input depth being determined by a number of input channels; and a plurality of processing units, coupled to the memory, including: a first processing unit, including at least one first processor, configured to execute a first artificial neural network (ANN) including an input layer configured to receive at least a first portion of the input volume, one or more first ANN hidden layers optimized for the first processor, and an output layer; and a second processing unit, including at least one second processor that is different than the first processor, configured to execute a second ANN including an input layer configured to receive at least a second portion of the input volume, one or more second ANN hidden layers optimized for the second processor, and an output layer, the second ANN hidden layers being different than the first ANN hidden layers, where the first ANN output layer generates a first set of normalized probability values or a first set of values, the second ANN output layer generates a second set of normalized probability values or a second set of values, and where the first processing unit is configured to ensemble average the first and second sets of normalized probability values, using respective first and second weights, into a final set of normalized probability values or the first processing unit is configured to concatenate the first and second sets of values into a set of probability values and convert the set of probability values into a set of normalized probability values; and a third processing unit, having at least one third processor that is different than the first processor and the second processor, configured to execute a third ANN including an input layer to receive at least a third portion of the input volume, one or more third ANN hidden layers optimized for the third processor, and an output layer, the third ANN hidden layers being different than the first ANN hidden layers and the second ANN hidden layers, where the first ANN is a first convolutional neural network (CNN) that includes convolutional layers having small and large kernels, activation layers, pooling layers, and fully-connected layers; the second ANN is a second CNN that includes convolutional layers having small and large kernels, activation layers, pooling layers, and fully connected layers, the second CNN convolutional layers having fewer small kernels and more large kernels than the first CNN; and the third ANN is a third CNN that includes convolutional layers having small and large kernels, activation layers, pooling layers, and fully connected layers, the third CNN convolutional layers having fewer small kernels and more large kernels than the first CNN or the second CNN.

2. The heterogenous system of claim 1, where: the first processing unit is a central processing unit (CPU), the second processing unit is a graphics processing unit (GPU), and the third processing unit is a neural processing unit (NPU).

3. The heterogenous system of claim 1, where: the small kernel is convolution filter having a size of 3×3 or smaller; and the large kernel is convolution filter having a size of 5×5 or larger.

4. The heterogenous system of claim 1, where the first processing unit is configured to execute a facial recognition application, the input volume is an image of a face, and the first, second and third ANNs extract facial features from the image.

5. A heterogenous system, comprising: a memory configured to store an input volume having an input width, an input height, an input depth and a plurality of input values, the input depth being determined by a number of input channels; and a plurality of processing units, coupled to the memory, including: a first processing unit, including at least one first processor, configured to execute a first artificial neural network (ANN) including an input layer configured to receive at least a first portion of the input volume, one or more first ANN hidden layers optimized for the first processor, and an output layer; and a second processing unit, including at least one second processor that is different than the first processor, configured to execute a second ANN including an input layer configured to receive at least a second portion of the input volume, one or more second ANN hidden layers optimized for the second processor, and an output layer, the second ANN hidden layers being different than the first ANN hidden layers, the first ANN output layer generates a first set of normalized probability values or a first set of values, the second ANN output layer generates a second set of normalized probability values or a second set of values, and where the first processing unit is configured to ensemble average the first and second sets of normalized probability values, using respective first and second weights, into a final set of normalized probability values or the first processing unit is configured to concatenate the first and second sets of values into a set of probability values and convert the set of probability values into a set of normalized probability values; a third processing unit, having at least one third processor that is different than the first processor and the second processor, configured to execute a third ANN including an input layer to receive at least a third portion of the input volume, one or more third ANN hidden layers optimized for the third processor, and an output layer, the third ANN hidden layers being different than the first ANN hidden layers and the second ANN hidden layers, where: the first ANN output layer generates the first set of normalized probability values, the second ANN output layer generates the second set of normalized probability values, and the third ANN output layer generates a third set of normalized probability values; and the first processing unit is configured to ensemble average the first, second and third sets of normalized probability values, using respective first, second and third weights, into a final set of normalized probability values.

6. The heterogenous system of claim 5, where the first, second and third weights are 1.

7. The heterogenous system of claim 5, where the first weight is based on a number of floating point operations per second (FLOPS) for the first processor, the second weight is based on a number of FLOPS for the second processor, and the third weight is based on a number of FLOPS for the third processor.

8. A heterogenous system, comprising: a memory configured to store an input volume having an input width, an input height, an input depth and a plurality of input values, the input depth being determined by a number of input channels; and a plurality of processing units, coupled to the memory, including: a first processing unit, including at least one first processor, configured to execute a first artificial neural network (ANN) including an input layer configured to receive at least a first portion of the input volume, one or more first ANN hidden layers optimized for the first processor, and an output layer; and a second processing unit, including at least one second processor that is different than the first processor, configured to execute a second ANN including an input layer configured to receive at least a second portion of the input volume, one or more second ANN hidden layers optimized for the second processor, and an output layer, the second ANN hidden layers being different than the first ANN hidden layers, the first ANN output layer generates a first set of normalized probability values or a first set of values, the second ANN …
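Claims 1 and 5 to 7 combine the per-processor outputs in one of two ways: a weighted ensemble average of normalized probabilities (weights of 1 in claim 6, FLOPS-based weights in claim 7), or concatenation of raw output values followed by normalization. The Python sketch below illustrates both under stated assumptions, not as the patentee's implementation: weights are taken as proportional to each processor's FLOPS, softmax is assumed as the normalization step (the claims do not name one), the concatenation path follows one literal reading of claim 1, and the probability vectors and FLOPS figures are made up.

```python
import math


def weighted_ensemble(prob_sets, weights):
    """Weighted average of per-ANN normalized probability vectors.

    Dividing by the weight sum keeps the result normalized when each
    input vector sums to 1.
    """
    if not prob_sets or len(prob_sets) != len(weights):
        raise ValueError("need one weight per probability set")
    n = len(prob_sets[0])
    if any(len(p) != n for p in prob_sets):
        raise ValueError("probability sets must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [sum(w * p[i] for w, p in zip(weights, prob_sets)) / total
            for i in range(n)]


def flops_weights(flops):
    """Claim 7 style weights, assumed proportional to each processor's FLOPS."""
    total = sum(flops)
    return [f / total for f in flops]


def softmax(values):
    """Numerically stable softmax (assumed normalization function)."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]


def concat_and_normalize(*value_sets):
    """Claim 1 alternative: concatenate raw output values, then normalize."""
    combined = [v for values in value_sets for v in values]
    return softmax(combined)


# Hypothetical three-class outputs from the CPU, GPU and NPU ANNs.
p_cpu = [0.7, 0.2, 0.1]
p_gpu = [0.6, 0.3, 0.1]
p_npu = [0.5, 0.4, 0.1]

equal = weighted_ensemble([p_cpu, p_gpu, p_npu], [1, 1, 1])  # claim 6
by_flops = weighted_ensemble([p_cpu, p_gpu, p_npu],
                             flops_weights([1e12, 4e12, 5e12]))  # claim 7
print([round(x, 2) for x in equal])     # [0.6, 0.3, 0.1]
print([round(x, 2) for x in by_flops])  # [0.56, 0.34, 0.1]
```

Because `weighted_ensemble` divides by the weight sum, raw FLOPS figures could be passed as weights directly; `flops_weights` only makes the proportions explicit.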

Assignees

Advanced Risc Mach Ltd

Classifications (CPC titles)

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn

  • modifying the architecture, e.g. adding, deleting or silencing nodes or connections

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

  • Reinforcement learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11620516B2 cover?
The present disclosure advantageously provides a heterogenous system, and a method for generating an artificial neural network (ANN) for a heterogenous system. The heterogenous system includes a plurality of processing units coupled to a memory configured to store an input volume. The plurality of processing units includes first and second processing units. The first processing unit includes a …
Who is the assignee on this patent?
Advanced Risc Mach Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Published April 4, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).