Model generation for real-time rate of penetration prediction
US-2018025269-A1 · Jan 25, 2018 · US
US11615310B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11615310-B2 |
| Application number | US-201716302592-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 19, 2017 |
| Priority date | May 20, 2016 |
| Publication date | Mar 28, 2023 |
| Grant date | Mar 28, 2023 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media for training machine learning models. One method includes obtaining a machine learning model, wherein the machine learning model comprises one or more model parameters, and the machine learning model is trained using gradient descent techniques to optimize an objective function; determining an update rule for the model parameters using a recurrent neural network (RNN); and applying a determined update rule for a final time step in a sequence of multiple time steps to the model parameters.
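The loop described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the quadratic objective, the scalar tanh recurrent cell (standing in for the RNN), the hand-picked values of `phi`, and all function names are assumptions made for illustration. What it does show is the structure the abstract describes: at each time step a recurrent cell maps each parameter's gradient to an update, the cell's parameters are shared across coordinates, each coordinate keeps its own hidden state, and the update is added to the model parameters.

```python
import math


def quadratic_grad(theta, target):
    """Gradient of the assumed objective f(theta) = 0.5 * sum((theta_i - target_i)^2)."""
    return [t - c for t, c in zip(theta, target)]


def rnn_step(grad_i, h_i, phi):
    """One step of a tiny scalar recurrent cell applied to a single coordinate.

    phi = (w_g, w_h, w_out) is shared by every coordinate (hypothetical names);
    h_i is the hidden state kept separately for this coordinate.
    Returns the update g_i and the next hidden state.
    """
    w_g, w_h, w_out = phi
    h_next = math.tanh(w_g * grad_i + w_h * h_i)
    g_i = w_out * h_next
    return g_i, h_next


def run_learned_optimizer(theta, target, phi, num_steps):
    """Apply theta_{t+1} = theta_t + g_t for num_steps time steps."""
    theta = list(theta)
    hidden = [0.0] * len(theta)  # one hidden state per model parameter
    for _ in range(num_steps):
        grads = quadratic_grad(theta, target)
        # Coordinate-wise: the same cell runs independently on each parameter.
        outs = [rnn_step(g, h, phi) for g, h in zip(grads, hidden)]
        theta = [t + g for t, (g, _) in zip(theta, outs)]
        hidden = [h for _, h in outs]
    return theta


if __name__ == "__main__":
    final = run_learned_optimizer(
        theta=[3.0, -2.0, 0.5],
        target=[0.0, 1.0, 0.5],
        phi=(1.0, 0.2, -0.5),  # fixed by hand here, not learned
        num_steps=100,
    )
    print(final)
```

Because the cell is applied independently to each coordinate with shared `phi`, reordering the model parameters simply reorders the result, which is one way to read the order-invariance property mentioned in the claims.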
Opening claim text (preview).
What is claimed is:

1. A method implemented by one or more computers, comprising:
   obtaining a machine learning model, wherein (i) the machine learning model has a plurality of model parameters, and (ii) the machine learning model is trained using gradient descent techniques to optimize an objective function; and
   for each time step in a plurality of time steps:
   determining an update rule for the plurality of model parameters for the time step using a recurrent neural network (RNN) having a plurality of RNN parameters, wherein the RNN is different from the machine learning model and the RNN parameters are different from the plurality of model parameters, wherein the RNN is configured to operate coordinate-wise with respect to the plurality of model parameters, wherein operating coordinate-wise with respect to the plurality of model parameters comprises operating the RNN independently on each of the plurality of model parameters of the machine learning model, and wherein the determining comprises: for each particular model parameter of the plurality of model parameters, processing, using the RNN and in accordance with values of the RNN parameters for the time step, a parameter-specific input that is specific for the particular model parameter of the plurality of model parameters for the time step that comprises a gradient of the objective function with respect to the particular model parameter for the time step to generate a respective RNN output for the particular model parameter for the time step that specifies the update rule for the particular model parameter of the plurality of model parameters for the time step, wherein the RNN shares one or more of the plurality of RNN parameters across the plurality of model parameters and maintains a separate hidden state for each particular model parameter of the plurality of model parameters;
   applying the update rule for the time step generated by the RNN to values of the plurality of model parameters for the time step to update the values of the model parameters; and
   training the RNN on an RNN objective function that depends on respective values of the plurality of model parameters that have been at the time step and at each of one or more preceding time steps in the plurality of time steps, comprising determining an update to the values of the RNN parameters at the time step that minimizes the RNN objective function for the time step using gradient descent techniques.
2. The method of claim 1, wherein applying the update rule for a final time step in the plurality of time steps to the plurality of model parameters generates trained values of the plurality of model parameters.
3. The method of claim 1, wherein the machine learning model comprises a neural network.
4. The method of claim 1, wherein the determined update rule for the plurality of model parameters that minimizes the objective function is given by $\theta_{t+1} = \theta_t + g_t(\nabla f(\theta_t), \phi)$ wherein $\theta_t$ represents values of the plurality of model parameters at time $t$, $\nabla f(\theta_t)$ represents the gradient of objective function $f$, $\phi$ represents RNN parameters and $g_t$ represents the RNN output for a time step $t$.
5. The method of claim 1, wherein the RNN implements separate activations for each model parameter of the plurality of model parameters.
6. The method of claim 1, wherein the RNN is a long short-term memory (LSTM) neural network.
7. The method of claim 6, wherein the LSTM neural network comprises two LSTM layers.
8. The method of claim 6, wherein the LSTM neural network shares one or more of the plurality of RNN parameters across different coordinates of the objective function.
9. The method of claim 6, wherein a subset of cells in each of one or more LSTM layers of the LSTM neural network comprise global average units, wherein a global average unit is a unit whose update includes averaging activations of the global average units globally at each time step across different coordinates of the objective function.
10. The method of claim 1, wherein the RNN is invariant to an order of the plurality of model parameters.
11. The method of claim 1, further comprising providing a previous hidden state of the RNN as input to the RNN at each time step.
12. The method of claim 1, wherein, at each time step, the update rule for the time step depends on the hidden state of the RNN for the time step.
13. The method of claim 1, wherein the RNN objective function is given by $\mathcal{L}(\phi) = \mathbb{E}_f\left[\sum_{t=1}^{T} w_t f(\theta_t)\right]$ where $\theta_{t+1} = \theta_t + g_t$, $\begin{bmatrix} g_t \\ h_{t+1} \end{bmatrix} = m(\nabla_t, h_t,$
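The training step in claim 1 and the summed objective in claim 13 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed method: the objective `f` is a fixed quadratic (so the expectation over `f` reduces to a single function), the recurrent cell is a scalar tanh unit rather than an LSTM, the weights `w_t` default to 1, and the gradient with respect to the RNN parameters is estimated by central finite differences instead of backpropagation through the unrolled steps. All names (`meta_loss`, `finite_diff_grad`, `train_phi`) are hypothetical.

```python
import math


def f(theta, target):
    """Assumed objective: f(theta) = 0.5 * sum((theta_i - target_i)^2)."""
    return 0.5 * sum((t - c) ** 2 for t, c in zip(theta, target))


def grad_f(theta, target):
    return [t - c for t, c in zip(theta, target)]


def meta_loss(phi, theta0, target, num_steps, weights=None):
    """Sum over t of w_t * f(theta_t), unrolling theta_{t+1} = theta_t + g_t."""
    w_g, w_h, w_out = phi
    theta = list(theta0)
    hidden = [0.0] * len(theta)  # separate hidden state per coordinate
    total = 0.0
    for t in range(num_steps):
        grads = grad_f(theta, target)
        hidden = [math.tanh(w_g * g + w_h * h) for g, h in zip(grads, hidden)]
        theta = [th + w_out * h for th, h in zip(theta, hidden)]
        w_t = 1.0 if weights is None else weights[t]
        total += w_t * f(theta, target)
    return total


def finite_diff_grad(phi, loss_fn, eps=1e-5):
    """Central-difference estimate of d loss / d phi (stand-in for backprop)."""
    grad = []
    for i in range(len(phi)):
        up, down = list(phi), list(phi)
        up[i] += eps
        down[i] -= eps
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grad


def train_phi(phi, theta0, target, num_steps=5, meta_steps=200, lr=0.002):
    """Plain gradient descent on the RNN parameters phi."""
    def loss_fn(p):
        return meta_loss(p, theta0, target, num_steps)

    phi = list(phi)
    for _ in range(meta_steps):
        g = finite_diff_grad(phi, loss_fn)
        phi = [p - lr * gi for p, gi in zip(phi, g)]
    return phi


if __name__ == "__main__":
    phi0 = [1.0, 0.0, -0.1]
    theta0, target = [2.0, -1.0], [0.0, 0.0]
    print("before:", meta_loss(phi0, theta0, target, 5))
    phi = train_phi(phi0, theta0, target)
    print("after: ", meta_loss(phi, theta0, target, 5))
```

Summing the objective over every step of the trajectory, rather than scoring only the final parameters, gives the RNN a training signal at each time step; the finite-difference gradient is used here only to keep the sketch dependency-free.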
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Backpropagation, e.g. using gradient descent · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Combinations of networks · CPC title