Deep learning model for structured outputs with high-order interaction
US-2016098633-A1 · Apr 7, 2016 · US
US10509996B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10509996-B2 |
| Application number | US-201615258691-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 7, 2016 |
| Priority date | May 17, 2016 |
| Publication date | Dec 17, 2019 |
| Grant date | Dec 17, 2019 |
Official abstract text for this publication.
The present disclosure is drawn to the reduction of parameters in fully connected layers of neural networks. For a layer whose output is defined by $y = Wx$, where $y$ is the output vector, $x$ is the input vector, and $W$ is a matrix of connection parameters, vectors $u_{ij}$ and $v_{ij}$ are defined and submatrices $W_{i,j}$ are computed as the outer product of $u_{ij}$ and $v_{ij}$, so that $W_{i,j} = v_{ij} \otimes u_{ij}$, and $W$ is obtained by appending submatrices $W_{i,j}$.
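The construction in the abstract can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the patent's implementation: the helper names (`outer`, `assemble_w`, `matvec`, `param_counts`), the nested-list storage `u[i][j]` and `v[i][j]`, and the toy sizes are all hypothetical. Each block is taken to be the s-by-t outer product of `v[i][j]` (length s) with `u[i][j]` (length t).

```python
def outer(v, u):
    """Return the len(v) x len(u) outer product v u^T as a list of rows."""
    return [[vk * ul for ul in u] for vk in v]


def assemble_w(u, v):
    """Append the blocks W_ij = outer(v[i][j], u[i][j]) into one matrix W."""
    w = []
    for i in range(len(v)):
        blocks = [outer(v[i][j], u[i][j]) for j in range(len(v[i]))]
        s = len(v[i][0])
        for k in range(s):
            # Row k of block-row i is row k of every block, side by side.
            w.append([entry for block in blocks for entry in block[k]])
    return w


def matvec(w, x):
    """Plain dense matrix-vector product y = W x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def param_counts(m, n, s, t):
    """Return (factored, dense) learnable-parameter counts for one layer."""
    factored = (m // s) * (n // t) * (s + t)  # one u_ij and one v_ij per block
    dense = m * n                             # every entry of W learned freely
    return factored, dense


# Toy sizes assumed for illustration: n = 6, m = 4, s = 2, t = 3.
u = [[[1, 2, 0], [0, 1, 1]],
     [[1, 0, 1], [2, 2, 0]]]
v = [[[1, 1], [2, 0]],
     [[0, 1], [1, 3]]]
x = [1, 2, 3, 4, 5, 6]

W = assemble_w(u, v)   # 4 x 6 matrix built from 2 x 2 blocks of size 2 x 3
y = matvec(W, x)       # [27, 5, 18, 58]
```

With these assumptions, the layer stores (m/s)(n/t)(s+t) numbers instead of m·n. For a hypothetical 1024-by-1024 layer with s = t = 32, `param_counts(1024, 1024, 32, 32)` gives 65,536 factored parameters against 1,048,576 dense ones.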
Opening claim text (preview).
The invention claimed is:

1. A method for reducing a number of learnable parameters in a fully connected layer of a neural network, the fully connected layer comprising $n$ inputs and $m$ outputs, the method comprising: defining an $n$-dimensional input vector $x$ representative of $n$ inputs of the layer of the neural network and defining an $m$-dimensional output vector $y$ representative of the $m$ outputs of the layer; selecting a divisor $s$ of $m$ and a divisor $t$ of $n$; partitioning the output vector $y$ into equally sized subvectors $y_i$ of length $s$ and partitioning the input vector $x$ into equally sized subvectors $x_j$ of length $t$; learning a vector $u_{ij}$ comprising $t$ learnable parameters and a vector $v_{ij}$ comprising $s$ learnable parameters for $i = (1, \ldots, m/s)$ and $j = (1, \ldots, n/t)$ during a training phase of the neural network; computing submatrices $W_{ij}$ as an outer product of the vector $u_{ij}$ and the vector $v_{ij}$ so that $W_{ij} = u_{ij}^{T} \otimes v_{ij}$; and computing the output vector $y$ representative of the $m$ outputs of the layer from the input vector $x$ and the submatrices $W_{ij}$.

2. The method of claim 1, wherein computing the output vector $y$ representative of the $m$ outputs of the layer from the input vector $x$ and the submatrices $W_{ij}$ comprises: computing $y_i = \sum_{j=1}^{n/t} (W_{ij} x_j)$ for $i = (1, \ldots, m/s)$; and appending all subvectors $y_i$ to obtain the output vector $y$ as $y = [y_1, y_2, y_3, \ldots, y_{m/s}]$.

3. The method of claim 1, wherein computing the output vector $y$ representative of the $m$ outputs of the layer from the input vector $x$ and the submatrices $W_{ij}$ comprises: appending submatrices $W_{ij}$ for $i = (1, \ldots, m/s)$ and $j = (1, \ldots, n/t)$ to obtain matrix $W$; and computing $y = W \cdot x$.

4. The method of claim 1, further comprising storing the vectors $v_{ij}$ and $u_{ij}$.

5. The method of claim 1, wherein computing the output vector $y$ representative of the $m$ outputs of the layer from the input vector $x$ and the submatrices $W_{ij}$ comprises: computing $y_i = \sum_{j=1}^{n/t} v_{ij} (u_{ij}^{T} x_j)$ for $i = (1, \ldots, m/s)$; and appending all subvectors $y_i$ to obtain the output vector $y$ as $y = [y_1, y_2, y_3, \ldots, y_{m/s}]$.

6. A system comprising: a processing unit; and a non-transitory memory communicatively coupled to the processing unit and comprising computer-readable program instructions executable by the processing unit for reducing a number of learnable parameters in a fully connected layer of a neural network comprising $n$ inputs and $m$ outputs by: defining an $n$-dimensional input vector $x$ representative of the $n$ inputs of the layer and defining an $m$-dimensional output vector $y$ representative of the $m$ outputs of the layer; selecting a divisor $s$ of $m$ and a divisor $t$ of $n$; partitioning the output vector $y$ into equally sized subvectors $y_i$ of length $s$ and partitioning the input vector $x$ into equally sized subvectors $x_j$ of length $t$; learning a vector $u_{ij}$ comprising $t$ learnable parameters and a vector $v_{ij}$ comprising $s$ learnable parameters for $i = (1, \ldots, m/s)$ and $j = (1, \ldots, n/t)$ during a training phase of the neural network; computing submatrices $W_{ij}$ as an outer product of the learned vector $u_{ij}$ and the learned vector $v_{ij}$ so that $W_{ij} = u_{ij}^{T} \otimes v_{ij}$; and computing the output vector $y$ representative of the $m$ outputs of the layer from the input vector $x$ and the submatrices $W_{ij}$.

7. The system of claim 6, wherein computing the output vector $y$ representative of the $m$ outputs of the layer from the input vector $x$ and the submatrices $W_{ij}$ comprises: computing $y_i = \sum_{j=1}^{n/t} (W_{ij} x_j)$ for $i = (1, \ldots, m/s)$; and appending all subvectors $y_i$ to obtain the output vector $y$ as $y = [y_1, y_2, y_3, \ldots, y_{m/s}]$.

8. The system of claim 6, wherein computing the output vector $y$ representative of the $m$ outputs of the layer from the input vector $x$ and the submatrices $W_{ij}$ comprises: appending submatrices $W_{ij}$ for $i = (1, \ldots, m/s)$ and $j = (1, \ldots, n/t)$ to obtain matrix $W$; and computing $y = W \cdot x$.

9. The system of claim 6, the non-transitory memory further comprising computer-readable program instructions executable by the processing unit for storing the vectors $v_{ij}$ and $u_{ij}$.

10. The system of claim 6, wherein computing the output vector $y$ representative of the $m$ outputs of the layer from the input vector $x$ and the submatrices $W_{ij}$ comprises: computing $y_i = \sum_{j=1}^{n/t} v_{ij} (u_{ij}^{T} x_j)$ for $i = (1, \ldots, m/s)$; and appending all subvectors $y_i$ to obtain the output vector $y$ as $y = [y_1, y_2, y_3, \ldots, y_{m/s}]$.

11. A method for implementing a neural network layer comprising $n$ inputs and $m$ outputs, the method comprising: receiving an $n$-dimensional input vector $x$ representative of the $n$ inputs of the layer of a neural network; computing an $m$-dimensional output vector $y$ representative of the $m$ outputs of the layer of the neural network by: retrieving from memory a vector $v_{ij}$ comprising $s$ learned parameters and a vector $u_{ij}$ comprising $t$ learned parameters, wherein the vectors $u_{ij}$ and $v_{ij}$ are learned during a training phase of the neural network; partitioning the output vector $y$ into equally sized subvectors $y_i$ of length $s$ and partitioning the input vector $x$ into equally sized subvectors $x_j$ of length $t$; computing submatrices $W_{ij}$ as an outer product of the vector $u_{ij}$ and the vector $v_{ij}$ so that $W_{ij} = u_{ij}^{T} \otimes v_{ij}$; and computing the output vector $y$ from the input vector $x$ and the submatrices $W_{ij}$; and outputting the output vector $y$ as the $m$ outputs of the neural network layer.

12. The method of claim 11, wherein computing the output vector $y$ from the input vector $x$ and the submatrices $W_{ij}$ comprises: computing $y_i = \sum_{j=1}^{n/t} (W_{ij} x_j)$ for $i = (1, \ldots, m/s)$; and appending all subvectors $y_i$ to obtain the output vector $y$ as $y = [y_1, y_2, y_3, \ldots, y_{m/s}]$.

13. The method of claim 11, wherein computing the output vector $y$ from the input vector $x$ and the submatrices $W_{ij}$ comprises: appending submatrices $W_{ij}$ to obtain matrix $W$; and computing $y = W \cdot x$.

14. The method of claim 11, wherein computing the output vector $y$ from the input vector $x$ and the submatrices $W_{ij}$ comprises: computing $y_i = \sum_{j=1}^{n/t} v_{ij} (u_{ij}^{T} x_j)$ for $i = (1, \ldots, m/s)$; and appending all subvectors $y_i$ to obtain the output vector $y$ as $y = [y_1, y_2, y_3, \ldots, y_{m/s}]$.

15. A system for implementing a neural network layer comprising $n$ inputs and $m$ outputs, the system comprising: a processing unit; and a non-transitory memory communicatively coupled to the processing unit and comprising computer-readable program instructions executable by the processing unit for: receiving an $n$-dimensional input vector $x$ representative of the $n$ inputs of the layer of a neural network; computing an $m$-dimensional output vector $y$ representative of the $m$ outputs of the layer of the neural network by: retrieving from memory a vector $v_{ij}$ comprising $s$ learned parameters and a vector $u_{ij}$ comprising $t$ learned parameters, wherein the learned parameters included in the vectors $v_{ij}$ and $u_{ij}$ are learned during a training phase of the neural network; partitioning the output vector $y$ into equally sized subvectors $y_i$ of length $s$ and partitioning the input vector $x$ into equally sized subvectors $x_j$ using divisor $t$; computing submatrices $W_{ij}$ as an outer product of the vector $u_{ij}$ and the vector $v_{ij}$ so that $W_{ij} = u_{ij}^{T} \otimes v_{ij}$; and computing the output vector $y$ from the input vector $x$ and the submatrices $W_{ij}$; and outputting the output vector $y$ at the $m$ outputs of the layer of the neural network.

16. The system of claim 15, wherein computing the output vector $y$ fr
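The dependent claims recite more than one way to obtain y from the learned vectors. The sketch below compares two of them in plain Python: the block-sum form y_i = Σ_j W_ij x_j (claims 2, 7, 12) and the factored form y_i = Σ_j v_ij (u_ij^T x_j) (claims 5, 10, 14). It is an illustration only; the function names, the nested-list layout `u[i][j]` / `v[i][j]`, and the toy values are assumptions, and each W_ij is taken to be the s-by-t outer product of v_ij with u_ij.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def forward_blocks(u, v, x, s, t):
    """y_i = sum_j W_ij x_j, forming each s x t block W_ij explicitly."""
    y = []
    for i in range(len(v)):
        y_i = [0] * s
        for j in range(len(v[i])):
            x_j = x[j * t:(j + 1) * t]  # j-th input subvector, length t
            w_ij = [[vk * ul for ul in u[i][j]] for vk in v[i][j]]
            y_i = [acc + dot(row, x_j) for acc, row in zip(y_i, w_ij)]
        y.extend(y_i)  # append subvector y_i to the output
    return y


def forward_factored(u, v, x, s, t):
    """y_i = sum_j v_ij * (u_ij^T x_j), never forming any W_ij."""
    y = []
    for i in range(len(v)):
        y_i = [0] * s
        for j in range(len(v[i])):
            x_j = x[j * t:(j + 1) * t]
            scale = dot(u[i][j], x_j)  # the scalar u_ij^T x_j
            y_i = [acc + vk * scale for acc, vk in zip(y_i, v[i][j])]
        y.extend(y_i)
    return y


# Toy sizes assumed for illustration: n = 6, m = 4, s = 2, t = 3.
u = [[[1, 2, 0], [0, 1, 1]],
     [[1, 0, 1], [2, 2, 0]]]
v = [[[1, 1], [2, 0]],
     [[0, 1], [1, 3]]]
x = [1, 2, 3, 4, 5, 6]

y_blocks = forward_blocks(u, v, x, s=2, t=3)
y_factored = forward_factored(u, v, x, s=2, t=3)
```

Both routes return the same vector for this input. In this sketch the factored route uses s + t multiplications per block, against s·t for applying an explicit block to x_j, which is why an implementation might prefer it when s and t are large.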
Architecture, e.g. interconnection topology · CPC title
Learning methods · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Feedforward networks · CPC title