What technology area does this patent fall under?

Primary CPC classification G06F40/123. Mapped technology areas include Physics.

When was this patent published?

Publication date: June 8, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page: citations found in our corpus, or other publications that share the same primary CPC classification.

Slim embedding layers for recurrent neural language models

US11030997B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11030997-B2
Application number	US-201816197945-A
Country	US
Kind code	B2
Filing date	Nov 21, 2018
Priority date	Nov 22, 2017
Publication date	Jun 8, 2021
Grant date	Jun 8, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are systems and methods for compressing or otherwise reducing the memory requirements for storing and computing the model parameters in recurrent neural language models. Embodiments include space compression methodologies that share the structured parameters at the input embedding layer, the output embedding layers, or both of a recurrent neural language model to significantly reduce the size of model parameters, but still compactly represent the original input and output embedding layers. Embodiments of the methodology are easy to implement and tune. Experiments on several data sets show that embodiments achieved similar perplexity and BLEU score results while only using a fraction of the parameters.
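The abstract's core idea is that many rows of an embedding matrix can share a small pool of trainable sub-vectors. The sketch below illustrates it. It follows the shuffled-list mapping described in claims 9 and 11. It also makes two assumptions of its own. First, the indicator list is filled round-robin, so every shared sub-vector is used equally often; the claims only require a list of V*K indicators. Second, the sizes V, E, K and M are illustrative and not taken from the patent. The helper names `build_mapping` and `embed` are hypothetical.

```python
import random


def build_mapping(vocab_size, num_parts, num_subvectors, seed=0):
    """Assign each of the V*K embedding parts to one of M shared sub-vectors.

    A list of V*K sub-vector indicators is filled round-robin (an assumption,
    so every sub-vector is used equally often), randomly shuffled, and read
    back row by row as a V x K mapping table.
    """
    rng = random.Random(seed)
    indicators = [i % num_subvectors for i in range(vocab_size * num_parts)]
    rng.shuffle(indicators)
    return [indicators[w * num_parts:(w + 1) * num_parts] for w in range(vocab_size)]


def embed(word_id, mapping, subvectors):
    """Rebuild a word's full embedding by concatenating its K mapped sub-vectors."""
    vec = []
    for m in mapping[word_id]:
        vec.extend(subvectors[m])
    return vec


# Illustrative sizes (assumptions): vocabulary V, embedding size E, K parts, M sub-vectors.
V, E, K, M = 10_000, 512, 8, 2_000
sub_dim = E // K
rng = random.Random(1)
# The shared sub-vectors are the only trainable embedding parameters here.
subvectors = [[rng.gauss(0.0, 0.1) for _ in range(sub_dim)] for _ in range(M)]
mapping = build_mapping(V, K, M)  # kept fixed during training

full_params = V * E
slim_params = M * sub_dim
print(full_params, slim_params, full_params / slim_params)  # 5120000 128000 40.0
```

With these illustrative sizes, the trainable embedding parameters shrink 40-fold. The cost is a fixed mapping table of V*K = 80,000 small integers, which claim 3 describes as not updated during training. In a deep-learning framework, the sub-vector table would likely be the trainable parameter and the mapping a constant index tensor.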

First claim

Claim text (claims 1 to 20).

What is claimed is:

1. A computer-implemented method for compressing a matrix of a neural network model, the method comprising: for each vector from a set of vectors from the matrix, dividing the vector from the matrix into a plurality of parts; for each part of the vector from the matrix, mapping the part to a substitute sub-vector, which comprises one or more parameters, wherein the substitute sub-vector is selected from a set of substitute sub-vectors, which set has fewer substitute sub-vectors than there are parts mapped from the matrix; and training the neural network model using the mapped substitute sub-vectors until a stop condition is reached.

2. The computer-implemented method of claim 1 wherein the vector is an input word embedding vector, an output word embedding vector, or both.

3. The computer-implemented method of claim 1 wherein the step of mapping the part to a substitute sub-vector further comprises forming a mapping table and the mapping table is fixed during training but the one or more parameters of each substitute sub-vector are subject to updating during training.

4. The computer-implemented method of claim 1 wherein the step of mapping the part to a substitute sub-vector comprises: initializing a list of substitute sub-vector indicators, the list comprising a same number of entries for a sub-vector indicator as there are parts in the vector; shuffling the list; and generating a mapping table from the shuffled list.

5. The computer-implemented method of claim 4 wherein the list is randomly shuffled.

6. The computer-implemented method of claim 1 wherein the step of mapping the part to a substitute sub-vector comprises: using a pre-trained matrix to estimate sub-vectors to facilitate mapping of parts of the matrix with similar estimated sub-vectors to the same substitute sub-vector.

7. The computer-implemented method of claim 6 wherein the step of using a pre-trained matrix to estimate sub-vectors to facilitate mapping of parts of the matrix with similar estimated sub-vectors to the same substitute sub-vector comprises: clustering parts of the pre-trained source embedding matrix into a plurality of clusters; and mapping the parts of the matrix that correspond to parts of the pre-trained matrix that were in the same cluster to the same substitute sub-vector.

8. The computer-implemented method of claim 7 wherein the number of clusters in the plurality of clusters corresponds to the number of substitute sub-vectors.

9. A computer-implemented method for compressing embedding of a neural network model, the method comprising: dividing each word embedding vector of an embedding matrix having V word embedding vectors into K parts, K being a number larger than 1, each part comprising at least two elements; for each part of the V*K parts of the embedding matrix, mapping the part to one of M substitute sub-vectors comprising one or more parameters, wherein M is a number less than V*K; and training the neural network model using the mapped substitute sub-vectors.

10. The computer-implemented method of claim 9 wherein each word embedding vector is divided into K parts evenly.

11. The computer-implemented method of claim 9 wherein the step of mapping the part to one of M substitute sub-vectors comprises: initializing a list of V*K sub-vector indicator entries, each indicator entry representing one of the M substitute sub-vectors; randomly shuffling the list; and generating a mapping table from the randomly shuffled list.

12. The computer-implemented method of claim 9 wherein the step of mapping the part to one of M substitute sub-vectors comprises: using a pre-trained embedding matrix to estimate sub-vectors to facilitate mapping of parts of the embedding matrix with similar estimated sub-vectors to the same substitute sub-vector.

13. The computer-implemented method of claim 11 wherein the step of using a pre-trained embedding matrix to estimate sub-vectors to facilitate mapping of parts of the embedding matrix with similar estimated sub-vectors to the same substitute sub-vector comprises: clustering parts of the pre-trained embedding matrix into a plurality of clusters; and mapping the parts of the embedding matrix that correspond to parts of the pre-trained embedding matrix that were in the same cluster to the same substitute sub-vector.

14. The computer-implemented method of claim 9 wherein the step of mapping the parts to one of M substitute sub-vectors further comprises forming a mapping table and the mapping table is fixed during training but the one or more parameters of each substitute sub-vectors are subject to updating during training.

15. A computer-implemented method for compressing an output word embedding layer of a neural network model, the method comprising: mapping an output embedding vector into K sub-vectors, K being a number larger than 1, each part comprising at least two elements; dividing a hidden vector of the neural network model into K parts; for each pair in a set of pairs, obtaining and storing a partial dot product for the pair, in which a pair comprises a hidden vector part and a corresponding output embedding sub-vector; for a word, using at least some of the stored partial dot products, which are selected according to the mapping, to obtain a sum value; and normalizing the sum value by a softmax non-linearity function in a softmax layer in the neural network model to obtain an output probability for the word.

16. The computer-implemented method of claim 15 wherein the K sub-vectors are respectively selected from K non-overlap sub-vector sets.

17. The computer-implemented method of claim 15 wherein the K sub-vectors are uniformly mapped.

18. The computer-implemented method of claim 15 wherein the neural network model is a recurrent neural model.

19. The computer-implemented method of claim 15 wherein the K sub-vectors are estimated by pre-training an output embedding matrix and are assigned using a clustering method.

20. The computer-implemented method of claim 19 wherein the K sub-vectors are shared with an input word embedding layer of the neural network model.

Assignees

Baidu USA LLC

Inventors

Classifications

G06N3/045
Combinations of networks · CPC title
G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 66533225


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11030997B2 cover?: Described herein are systems and methods for compressing or otherwise reducing the memory requirements for storing and computing the model parameters in recurrent neural language models. Embodiments include space compression methodologies that share the structured parameters at the input embedding layer, the output embedding layers, or both of a recurrent neural language model to significantly …
Who is the assignee on this patent?: Baidu USA LLC
Related patents

Similarity Search Using Progressive Inner Products and Bounds

Cooperatively training and/or using separate input and response neural network models for determining response(s) for electronic communications

Method and system for distributed machine learning