Training, validating, and monitoring artificial intelligence and machine learning models
US-2019147371-A1 · May 16, 2019 · US
US11526680B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11526680-B2 |
| Application number | US-202016790917-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 14, 2020 |
| Priority date | Feb 14, 2019 |
| Publication date | Dec 13, 2022 |
| Grant date | Dec 13, 2022 |
Official abstract text for this publication.
Systems and methods are provided to pre-train projection networks for use as transferable natural language representation generators. In particular, example pre-training schemes described herein enable learning of transferable deep neural projection representations over randomized locality sensitive hashing (LSH) projections, thereby surmounting the need to store any embedding matrices because the projections can be dynamically computed at inference time.
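The abstract's central point is that projections can be computed at inference time, so no embedding matrix needs to be stored. The sketch below illustrates that idea in plain Python under assumptions of our own. The feature extractor, the SHA-256 derivation of projection-vector components, and the names `extract_features`, `project`, `seed`, `num_functions` and `bits_per_function` are hypothetical and are not taken from the patent.

Each component of every projection vector is recomputed from a seed and the feature string. The sign of each dot product then gives one bit, following the negative/positive rule in claim 13. The trainable intermediate layers of claim 1, which would consume these bits, are not shown.

```python
import hashlib


def extract_features(text, n=2):
    """Hypothetical feature extractor: lowercased word 1..n-grams with counts."""
    tokens = text.lower().split()
    features = {}
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            gram = " ".join(tokens[i:i + size])
            features[gram] = features.get(gram, 0.0) + 1.0
    return features


def _component(seed, function_index, bit_index, feature):
    """Derive one projection-vector component in [-1, 1) from a hash.

    Nothing is stored: the same (seed, indices, feature) always yields the same value.
    """
    key = f"{seed}:{function_index}:{bit_index}:{feature}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") / 2**63 - 1.0


def project(features, seed=42, num_functions=4, bits_per_function=8):
    """Apply num_functions projection functions, each with bits_per_function vectors."""
    bits = []
    for t in range(num_functions):
        for k in range(bits_per_function):
            # Dot product between the sparse feature vector and projection vector (t, k).
            dot = sum(w * _component(seed, t, k, f) for f, w in features.items())
            # Sign rule: positive -> 1, negative -> 0 (zero also mapped to 0; an assumption).
            bits.append(1 if dot > 0 else 0)
    return bits


if __name__ == "__main__":
    bits = project(extract_features("the movie was great"))
    print(len(bits))  # 32 = 4 functions x 8 bits
```

In this sketch, the seed is the only state needed to reproduce the projections. Hashing each (seed, index, feature) triple is one way to avoid holding a vocabulary-sized matrix. Other seeded LSH schemes would serve equally well.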
Opening claim text (preview).
What is claimed is:
1. A computing system, comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store: a pre-trained projection neural network configured to receive a language input comprising one or more units of text and to dynamically generate an intermediate representation from the language input, the projection neural network comprising: a sequence of one or more projection layers, wherein each projection layer is configured to receive a layer input and apply a plurality of projection layer functions to the layer input to generate a projection layer output; and a sequence of one or more intermediate layers configured to receive the projection layer output generated by a last projection layer in the sequence of one or more projection layers and to generate one or more intermediate layer outputs, wherein the intermediate representation comprises the intermediate layer output generated by a last intermediate layer in the sequence of one or more intermediate layers; instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising: obtaining the language input; inputting the language input into the pre-trained projection neural network; and receiving the intermediate representation as an output of the pre-trained projection neural network.
2. The computing system of claim 1, wherein: the one or more non-transitory computer-readable media further collectively store a machine-learned prediction model configured to receive the intermediate representation and to generate a prediction from the intermediate representation; and the operations further comprise: inputting the intermediate representation into the machine-learned prediction model; and receiving the prediction as an output of the machine-learned prediction model.
3. The computing system of claim 1, wherein the pre-trained projection neural network was previously trained as part of an autoencoder model, the autoencoder model comprising: the pre-trained projection neural network configured to receive the language input and to generate the intermediate representation; and a decoder model configured to receive the intermediate representation and to generate a reconstructed language input based on the intermediate representation.
4. The computing system of claim 3, wherein the autoencoder model is trained to maximize a probability of the reconstructed language input matching the language input on a token-by-token basis.
5. The computing system of claim 1, wherein the pre-trained projection neural network was previously trained as a projection skip-gram model configured to receive an input word and to predict a plurality of context words surrounding the input word.
6. The computing system of claim 5, wherein the projection skip-gram model was trained using an objective function that includes a regularization term that provides a penalty that has a magnitude that is positively correlated with a sum of a cosine similarity between the respective intermediate representations produced by the projection neural network for each pair of words in a training batch.
7. The computing system of claim 2, wherein one or both: (1) the projection neural network was previously trained using an unsupervised learning technique and at least the machine-learned prediction model was trained using a supervised learning technique; or (2) the projection neural network was previously trained using a first set of training data comprising a first plurality of training examples and at least the machine-learned prediction model was trained using a second, different set of training data comprising a second plurality of training examples.
8. The computing system of claim 1, wherein the projection neural network further comprises a feature extraction layer configured to receive the language input and generate a feature vector that comprises features extracted from the language input, wherein the layer input for a first projection layer of the one or more projection layers comprises the feature vector, and wherein the features extracted from the language input comprise one or more of the following: skip-grams; n-grams; part of speech tags; dependency relationships; knowledge graph information; or contextual information.
9. The computing system of claim 1, wherein, for each projection layer, the plurality of projection layer functions are precomputed and held static.
10. The computing system of claim 1, wherein, for each projection layer, the plurality of projection layer functions are modeled using locality sensitive hashing.
11. The computing system of claim 1, the operations further comprise: dynamically computing the plurality of projection layer functions at inference time using one or more seeds.
12. The computing system of claim 1, wherein the projection neural network performs natural language processing without initializing, loading, or storing any feature or vocabulary weight matrices.
13. The computing system of claim 1, wherein, for each projection layer, each projection function is associated with a respective set of projection vectors, and wherein applying each projection function to the layer input comprises: for each projection vector: determining a dot product between the layer input and the projection vector; when the dot product is negative, assigning a first value to a corresponding position in the projection function output; and when the dot product is positive, assigning a second value to the corresponding position in the projection function output.
14. The computing system of claim 1, wherein, for each projection layer, the projection functions are each encoded as sparse matrices and are used to generate a binary representation from the layer input.
15. The computing system of claim 1, wherein the intermediate representation comprises a numerical feature vector.
16. A computer-implemented method to pre-train a projection neural network comprising one or more projection layers and one or more intermediate layers, each projection layer configured to apply one or more projection functions to project a layer input into a different dimensional space, the projection neural network configured to receive an input and to generate an intermediate representation for the input, the method comprising: accessing, by one or more computing devices, a set of training data comprising a plurality of example inputs; inputting, by the one or more computing devices, each of the plurality of example inputs into the projection neural network; receiving, by the one or more computing devices, a respective intermediate representation for each of the plurality of example inputs as an output of the projection neural network; inputting, by the one or more computing devices, each respective intermediate representation into a decoder model configured to reconstruct inputs based on intermediate representations; receiving, by the one or more computing devices, a respective reconstructed input for each of the plurality of example inputs as an output of the decoder model; and learning, by the one or more computing devices, one or more parameter values for the one or more intermediate layers of the projection neural network based at least in part on a comparison of each respective reconstructed input to the corresponding example input.
17. The computer-implemented method of claim 16, wherein learning, by the one or more computing devices, the one or more parameter values for the one or more inter
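Claim 6 describes a regularization term whose magnitude is positively correlated with the sum of cosine similarities between the intermediate representations of each pair of words in a training batch. The sketch below assumes the simplest penalty with that property: a fixed positive weight times the summed pairwise similarity.

The weight value and the function names are hypothetical. The `skipgram_loss` argument is also hypothetical. It is a plain number standing in for the claim 5 skip-gram objective, which is not implemented here.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_cosine_penalty(representations, weight=0.1):
    """Penalty proportional to the sum of cosine similarities over every pair in the batch.

    A positive weight makes the penalty grow with the summed similarity,
    which is one simple way to satisfy "positively correlated" (an assumption).
    """
    total = sum(cosine_similarity(u, v) for u, v in combinations(representations, 2))
    return weight * total


def regularized_skipgram_loss(skipgram_loss, representations, weight=0.1):
    """Hypothetical combined objective: skip-gram loss plus the pairwise penalty."""
    return skipgram_loss + pairwise_cosine_penalty(representations, weight)
```

Cosine similarity can be negative, so this particular penalty can drop below zero for dissimilar representations. The claim only requires that the penalty's magnitude track the summed similarity. It does not specify a functional form.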
using statistical methods · CPC title
using neural networks · CPC title
Natural language generation · CPC title
Probabilistic or stochastic networks · CPC title
Combinations of networks · CPC title