Memory bandwidth management for deep learning applications

US2016379111A1 · US · A1

Patent metadata
Publication number: US-2016379111-A1
Application number: US-201514750277-A
Country: US
Kind code: A1
Filing date: Jun 25, 2015
Priority date: Jun 25, 2015
Publication date: Dec 29, 2016
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In a data center, neural network evaluations can be included for services involving image or speech recognition by using a field programmable gate array (FPGA) or other parallel processor. The memory bandwidth limitations of providing weighted data sets from an external memory to the FPGA (or other parallel processor) can be managed by queuing up input data from the plurality of cores executing the services at the FPGA (or other parallel processor) in batches of at least two feature vectors. The at least two feature vectors can be at least two observation vectors from a same data stream or from different data streams. The FPGA (or other parallel processor) can then act on the batch of data for each loading of the weighted datasets.
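The bandwidth argument in the abstract can be illustrated with simple arithmetic: if each layer's weights are fetched from external memory once per batch rather than once per feature vector, the weight traffic per vector falls in proportion to the batch size. The following is a minimal sketch under that assumption. The function name, the layer shapes, and the 4-byte weight size are hypothetical and do not come from the patent.

```python
def weight_bytes_per_vector(layer_shapes, batch_size, bytes_per_weight=4):
    """Weight traffic from external memory, amortized per feature vector.

    Assumes each layer's weights are loaded exactly once per batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total_weights = sum(rows * cols for rows, cols in layer_shapes)
    return total_weights * bytes_per_weight / batch_size


# Hypothetical multi-layer perceptron: 512 -> 1024 -> 1024 -> 256
layers = [(512, 1024), (1024, 1024), (1024, 256)]

for batch in (1, 2, 8):
    per_vector = weight_bytes_per_vector(layers, batch)
    print(f"batch={batch}: {per_vector:,.0f} weight bytes per vector")
```

Under these assumptions, a batch of two feature vectors (the minimum the abstract describes) halves the weight traffic attributable to each vector, and larger batches reduce it further.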

First claim

Opening claim text (preview).

What is claimed is:

1. A method of performing neural network processes, the method comprising: receiving, at a field programmable gate array (FPGA), a batch of input data for accelerated processing of a neural network evaluation, wherein the batch of input data comprises at least two feature vectors; loading the FPGA with a first layer set of weights for the neural network evaluation from an external memory; and applying, within the FPGA, the first layer set of weights to the batch of the input data to generate intermediates.

2. The method of claim 1, wherein the at least two feature vectors comprise one observation vector from each of at least two data streams.

3. The method of claim 2, wherein the neural network evaluation is a recurrent neural network evaluation.

4. The method of claim 1, wherein the at least two feature vectors comprise at least two observation vectors from each of at least two data streams.

5. The method of claim 1, wherein the at least two feature vectors comprise at least two observation vectors from a single data stream.

6. The method of claim 1, further comprising: after applying the first layer set of weights to the batch, loading the FPGA with a second layer set of weights for the neural network evaluation from the external memory; and applying, within the FPGA, the second layer set of weights to the intermediates.

7. The method of claim 1, wherein the neural network evaluation is a deep neural network multi-layer perceptron evaluation.

8. One or more computer readable storage media having instructions stored thereon that when executed by a processing system, direct the processing system to manage memory bandwidth for deep learning applications by: directing a batch of at least two observation vectors from at least one core to queue up at a field programmable gate array (FPGA); loading at least one weighted dataset on the FPGA, each of the at least one weighted dataset being loaded once per batch of the at least two observation vectors directed to queue up at the FPGA; and directing an evaluation output from the FPGA to the at least one core for further processing.

9. The media of claim 8, wherein the instructions that direct the batch of the at least two observation vectors from at least one core to queue up at the FPGA direct one observation vector from each of at least two cores to queue up at the FPGA.

10. The media of claim 8, wherein the instructions that direct the batch of the at least two observation vectors from at least one core to queue up at the FPGA direct at least two observation vectors from each of at least two cores to queue up at the FPGA.

11. A system comprising: one or more storage media; a plurality of processing cores; a service, the service being stored on at least one of the one or more storage media and executed on at least the plurality of processing cores; a parallel processor in communication with the plurality of cores to perform a neural network evaluation on a batch of data for a process of the service; and weight datasets for the neural network evaluation stored on at least one of the one or more storage media.

12. The system of claim 11, wherein the parallel processor is a field programmable gate array (FPGA).

13. The system of claim 11, wherein the parallel processor receives one observation vector from each core of the plurality of cores as the batch of data.

14. The system of claim 13, wherein the neural network evaluation comprises a recurrent neural network evaluation.

15. The system of claim 11, wherein the neural network evaluation comprises a deep neural network multi-layer perceptron evaluation.

16. The system of claim 11, wherein the parallel processor receives at least two observation vectors from each core of the plurality of cores as the batch of data.

17. The system of claim 11, wherein the service comprises a speech recognition service.

18. The system of claim 11, further comprising: a manager agent stored, at least in part, on at least one of the one or more storage media, that when executed, directs the system to: direct the batch of data from at least one of the plurality of processing cores to queue up at the parallel processor; load at least one weighted dataset of the weight datasets onto the parallel processor, each of the at least one weighted dataset being loaded once per batch; and direct an evaluation output from the parallel processor to the plurality of processing cores.

19. The system of claim 18, wherein the manager agent directs the system to direct the batch of data to queue up at the parallel processor by directing at least one observation vector from each of at least two cores of the plurality of cores to the parallel processor.

20. The system of claim 18, wherein the manager agent directs the system to direct the batch of data to queue up at the parallel processor by directing at least two observation vectors from each of at least two cores of the plurality of cores to the parallel processor.
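The flow recited in claims 1, 6, 8, and 18 (queue observation vectors from cores, load each layer's weights once per batch, apply them to every vector in the batch, and return the outputs to the cores) can be sketched in software as follows. This is an illustrative model only, not the patented implementation: the class `BatchedEvaluator`, its methods, the core labels, and the tiny weight matrices are all hypothetical, and activation functions are omitted for brevity.

```python
from collections import deque


class BatchedEvaluator:
    """Toy stand-in for the FPGA or other parallel processor (hypothetical)."""

    def __init__(self, layer_weights, batch_size=2):
        if batch_size < 2:
            raise ValueError("the claims call for at least two feature vectors")
        self.layer_weights = layer_weights  # stands in for external memory
        self.batch_size = batch_size
        self.queue = deque()                # (core_id, feature_vector) pairs
        self.weight_loads = 0               # counts simulated external-memory loads

    def submit(self, core_id, vector):
        """Queue one observation vector from a core."""
        self.queue.append((core_id, vector))

    def _load_layer(self, index):
        """Simulate loading one layer's weights from external memory."""
        self.weight_loads += 1
        return self.layer_weights[index]

    def run_batch(self):
        """Evaluate one full batch; return (core_id, output) pairs, or [] if not full."""
        if len(self.queue) < self.batch_size:
            return []
        batch = [self.queue.popleft() for _ in range(self.batch_size)]
        core_ids = [core_id for core_id, _ in batch]
        activations = [vector for _, vector in batch]
        for index in range(len(self.layer_weights)):
            weights = self._load_layer(index)  # one load per layer per batch
            # Apply the loaded layer to every vector before loading the next layer.
            activations = [
                [sum(w * x for w, x in zip(row, vec)) for row in weights]
                for vec in activations
            ]
        return list(zip(core_ids, activations))


# Each layer is a list of rows, one row of input weights per output unit.
layers = [
    [[1, 0], [0, 1], [1, 1]],  # first layer: 2 inputs -> 3 intermediates
    [[1, 1, 1]],               # second layer: 3 intermediates -> 1 output
]
evaluator = BatchedEvaluator(layers, batch_size=2)
evaluator.submit("core-0", [1, 2])
evaluator.submit("core-1", [3, 4])
results = evaluator.run_batch()
print(results, "weight loads:", evaluator.weight_loads)
```

In this sketch the two layers are each loaded once for the whole batch, so two vectors cost two simulated loads instead of four. Processing the vectors one at a time would repeat every load per vector, which is the bandwidth cost the batching is meant to avoid.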

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • G06N3/044 · Primary

    Recurrent networks, e.g. Hopfield networks · CPC title

  • G06N3/063 · Primary

    using electronic means · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • Feedforward networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/044. Mapped technology areas include Physics.
When was this patent published?
Published Dec 29, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).