Pre-trained projection networks for transferable natural language representations

US11526680B2 · US · B2

Patent metadata
Field | Value
Publication number | US-11526680-B2
Application number | US-202016790917-A
Country | US
Kind code | B2
Filing date | Feb 14, 2020
Priority date | Feb 14, 2019
Publication date | Dec 13, 2022
Grant date | Dec 13, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided to pre-train projection networks for use as transferable natural language representation generators. In particular, example pre-training schemes described herein enable learning of transferable deep neural projection representations over randomized locality sensitive hashing (LSH) projections, thereby surmounting the need to store any embedding matrices because the projections can be dynamically computed at inference time.
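To make "dynamically computed" concrete, here is a minimal sketch of a single projection layer. It assumes a hypothetical feature extractor (hashed character trigram counts, one of the n-gram feature types the claims list) and random Gaussian hyperplanes that are regenerated from a seed on every call rather than stored. Each output bit follows the sign rule in claim 13: 0 for a negative dot product, 1 otherwise. The names `hashed_ngram_features`, `lsh_project`, `num_functions`, and `bits_per_function` are illustrative only, and the trainable intermediate layers that sit on top of this output are omitted.

```python
import hashlib
import random
from typing import List


def hashed_ngram_features(text: str, dim: int = 64, n: int = 3) -> List[float]:
    """Hypothetical feature extractor: character n-gram counts hashed into `dim` buckets."""
    vec = [0.0] * dim
    padded = f"#{text.lower()}#"  # boundary markers so short words still yield n-grams
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n].encode("utf-8")
        # md5 gives a bucket index that is stable across runs (unlike the salted str hash()).
        bucket = int.from_bytes(hashlib.md5(gram).digest()[:4], "little") % dim
        vec[bucket] += 1.0
    return vec


def _projection_vector(seed: int, function: int, bit: int, dim: int) -> List[float]:
    """Regenerate one random projection vector from the seed instead of storing a matrix."""
    rng = random.Random(f"{seed}:{function}:{bit}")  # str seeds are hashed deterministically
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def lsh_project(features: List[float], num_functions: int, bits_per_function: int,
                seed: int = 0) -> List[int]:
    """Apply `num_functions` random-hyperplane LSH functions of `bits_per_function` bits each.

    A bit is 0 when the dot product with its projection vector is negative and 1
    otherwise (this sketch treats an exactly-zero dot product as positive).
    """
    dim = len(features)
    out: List[int] = []
    for t in range(num_functions):
        for k in range(bits_per_function):
            p = _projection_vector(seed, t, k, dim)
            dot = sum(x * w for x, w in zip(features, p))
            out.append(0 if dot < 0 else 1)
    return out


if __name__ == "__main__":
    bits = lsh_project(hashed_ngram_features("projection networks"),
                       num_functions=4, bits_per_function=8, seed=42)
    print(len(bits), bits[:8])
```

Because the hyperplanes are a pure function of the seed, the same seed always reproduces the same projection, so nothing but the seed and the layer sizes needs to be shipped with the model.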

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system, comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store: a pre-trained projection neural network configured to receive a language input comprising one or more units of text and to dynamically generate an intermediate representation from the language input, the projection neural network comprising: a sequence of one or more projection layers, wherein each projection layer is configured to receive a layer input and apply a plurality of projection layer functions to the layer input to generate a projection layer output; and a sequence of one or more intermediate layers configured to receive the projection layer output generated by a last projection layer in the sequence of one or more projection layers and to generate one or more intermediate layer outputs, wherein the intermediate representation comprises the intermediate layer output generated by a last intermediate layer in the sequence of one or more intermediate layers; instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising: obtaining the language input; inputting the language input into the pre-trained projection neural network; and receiving the intermediate representation as an output of the pre-trained projection neural network.

2. The computing system of claim 1, wherein: the one or more non-transitory computer-readable media further collectively store a machine-learned prediction model configured to receive the intermediate representation and to generate a prediction from the intermediate representation; and the operations further comprise: inputting the intermediate representation into the machine-learned prediction model; and receiving the prediction as an output of the machine-learned prediction model.

3. The computing system of claim 1, wherein the pre-trained projection neural network was previously trained as part of an autoencoder model, the autoencoder model comprising: the pre-trained projection neural network configured to receive the language input and to generate the intermediate representation; and a decoder model configured to receive the intermediate representation and to generate a reconstructed language input based on the intermediate representation.

4. The computing system of claim 3, wherein the autoencoder model is trained to maximize a probability of the reconstructed language input matching the language input on a token-by-token basis.

5. The computing system of claim 1, wherein the pre-trained projection neural network was previously trained as a projection skip-gram model configured to receive an input word and to predict a plurality of context words surrounding the input word.

6. The computing system of claim 5, wherein the projection skip-gram model was trained using an objective function that includes a regularization term that provides a penalty that has a magnitude that is positively correlated with a sum of a cosine similarity between the respective intermediate representations produced by the projection neural network for each pair of words in a training batch.

7. The computing system of claim 2, wherein one or both: (1) the projection neural network was previously trained using an unsupervised learning technique and at least the machine-learned prediction model was trained using a supervised learning technique; or (2) the projection neural network was previously trained using a first set of training data comprising a first plurality of training examples and at least the machine-learned prediction model was trained using a second, different set of training data comprising a second plurality of training examples.

8. The computing system of claim 1, wherein the projection neural network further comprises a feature extraction layer configured to receive the language input and generate a feature vector that comprises features extracted from the language input, wherein the layer input for a first projection layer of the one or more projection layers comprises the feature vector, and wherein the features extracted from the language input comprise one or more of the following: skip-grams; n-grams; part of speech tags; dependency relationships; knowledge graph information; or contextual information.

9. The computing system of claim 1, wherein, for each projection layer, the plurality of projection layer functions are precomputed and held static.

10. The computing system of claim 1, wherein, for each projection layer, the plurality of projection layer functions are modeled using locality sensitive hashing.

11. The computing system of claim 1, the operations further comprise: dynamically computing the plurality of projection layer functions at inference time using one or more seeds.

12. The computing system of claim 1, wherein the projection neural network performs natural language processing without initializing, loading, or storing any feature or vocabulary weight matrices.

13. The computing system of claim 1, wherein, for each projection layer, each projection function is associated with a respective set of projection vectors, and wherein applying each projection function to the layer input comprises: for each projection vector: determining a dot product between the layer input and the projection vector; when the dot product is negative, assigning a first value to a corresponding position in the projection function output; and when the dot product is positive, assigning a second value to the corresponding position in the projection function output.

14. The computing system of claim 1, wherein, for each projection layer, the projection functions are each encoded as sparse matrices and are used to generate a binary representation from the layer input.

15. The computing system of claim 1, wherein the intermediate representation comprises a numerical feature vector.

16. A computer-implemented method to pre-train a projection neural network comprising one or more projection layers and one or more intermediate layers, each projection layer configured to apply one or more projection functions to project a layer input into a different dimensional space, the projection neural network configured to receive an input and to generate an intermediate representation for the input, the method comprising: accessing, by one or more computing devices, a set of training data comprising a plurality of example inputs; inputting, by the one or more computing devices, each of the plurality of example inputs into the projection neural network; receiving, by the one or more computing devices, a respective intermediate representation for each of the plurality of example inputs as an output of the projection neural network; inputting, by the one or more computing devices, each respective intermediate representation into a decoder model configured to reconstruct inputs based on intermediate representations; receiving, by the one or more computing devices, a respective reconstructed input for each of the plurality of example inputs as an output of the decoder model; and learning, by the one or more computing devices, one or more parameter values for the one or more intermediate layers of the projection neural network based at least in part on a comparison of each respective reconstructed input to the corresponding example input.

17. The computer-implemented method of claim 16, wherein learning, by the one or more computing devices, the one or more parameter values for the one or more inter
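Claim 6 describes a regularization term whose magnitude grows with the sum of cosine similarities between the intermediate representations of every pair of words in a training batch. The sketch below is one way to realize that, assuming a simple linear weighting (`weight`, a hypothetical hyperparameter) added to a precomputed skip-gram loss; the claim only requires a positive correlation, so other monotone forms would also fit. Function names are illustrative, and treating a zero-norm vector as having cosine similarity 0.0 is an assumption of this sketch.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm (an assumption)."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def pairwise_cosine_penalty(batch_reps: List[Sequence[float]], weight: float = 1.0) -> float:
    """Weighted sum of cosine similarities over every unordered pair in the batch.

    The penalty is larger when different words' representations point in similar
    directions, so minimizing it pushes the batch's representations apart.
    """
    return weight * sum(cosine_similarity(u, v) for u, v in combinations(batch_reps, 2))


def regularized_skipgram_loss(skipgram_loss: float, batch_reps: List[Sequence[float]],
                              weight: float = 0.1) -> float:
    """Total objective = skip-gram loss + pairwise cosine penalty (hypothetical weighting)."""
    return skipgram_loss + pairwise_cosine_penalty(batch_reps, weight)
```

In real training this term would be written in an autodiff framework so gradients flow back into the intermediate layers; plain Python floats are used here only to show the arithmetic. Note that the pair count grows quadratically with batch size, which is one reason to keep the weight small.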

Assignees

Google LLC

Inventors

Classifications

  • G06F40/216 (Primary)

    using statistical methods · CPC title

  • using neural networks · CPC title

  • G06F40/56 (Primary)

    Natural language generation · CPC title

  • Probabilistic or stochastic networks · CPC title

  • Combinations of networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11526680B2 cover?
Systems and methods are provided to pre-train projection networks for use as transferable natural language representation generators. In particular, example pre-training schemes described herein enable learning of transferable deep neural projection representations over randomized locality sensitive hashing (LSH) projections, thereby surmounting the need to store any embedding matrices because …
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/216. Mapped technology areas include Physics.
When was this patent published?
Published December 13, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).