Crop yield estimation using agronomic neural network
US-2018211156-A1 · Jul 26, 2018 · US
US12100198B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12100198-B2 |
| Application number | US-202017642451-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 8, 2020 |
| Priority date | Sep 11, 2019 |
| Publication date | Sep 24, 2024 |
| Grant date | Sep 24, 2024 |
Official abstract text for this publication.
Some embodiments are directed to a computer-implemented method of interacting with a physical environment according to a policy. The policy determines multiple action probabilities of respective actions based on an observable state of the physical environment. The policy includes a neural network parameterized by a set of parameters. The neural network determines the action probabilities by determining a final layer input from an observable state and applying a final layer of the neural network to the final layer input. The final layer is applied by applying a linear combination of a set of equivariant base weight matrices to the final layer input. The base weight matrices are equivariant in the sense that, for a set of multiple predefined transformations of the final layer input, each transformation causes a corresponding predefined action permutation of the base weight matrix output for the final layer input.
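The abstract's final layer can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not taken from the patent: a toy setting with two input features and two actions, a single symmetry (negating the final layer input swaps the two actions), hand-picked base matrices that satisfy that symmetry, no bias term, and a softmax to turn the layer output into action probabilities. Only the coefficients of the linear combination are treated as trainable.

```python
import math

# Hypothetical toy setup: 2 input features, 2 actions (e.g. "left", "right").
# Assumed symmetry: negating the final layer input swaps the two actions.
# Each base matrix B satisfies B @ (-z) == swap(B @ z) for every input z.
BASE_MATRICES = [
    [[1.0, 0.0], [-1.0, 0.0]],
    [[0.0, 1.0], [0.0, -1.0]],
]


def combine(coeffs, bases):
    """Linear combination of the base weight matrices."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [sum(c * b[i][j] for c, b in zip(coeffs, bases)) for j in range(cols)]
        for i in range(rows)
    ]


def matvec(w, z):
    """Matrix-vector product w @ z."""
    return [sum(wij * zj for wij, zj in zip(row, z)) for row in w]


def softmax(logits):
    """Numerically stable softmax (an assumed choice of output nonlinearity)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probabilities(coeffs, z):
    """Apply the final layer to input z and return action probabilities."""
    w = combine(coeffs, BASE_MATRICES)
    return softmax(matvec(w, z))


coeffs = [0.7, -0.3]  # trainable coefficients (arbitrary illustrative values)
z = [0.5, 2.0]        # final layer input (arbitrary illustrative values)

p = action_probabilities(coeffs, z)
p_reflected = action_probabilities(coeffs, [-v for v in z])
```

Because every base matrix respects the assumed symmetry, so does any linear combination of them: `p_reflected` is `p` with its two entries swapped, whatever values `coeffs` takes.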
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method of interacting with a physical environment according to a policy, the policy determining multiple action probabilities of respective actions based on an observable state of the physical environment, wherein the policy includes a neural network parameterized by a set of parameters, the neural network determining the action probabilities by determining a final layer input from an observable state and applying a final layer of the neural network to the final layer input, the method comprising: accessing the set of parameters of the policy; obtaining base weight matrix data representing a set of base weight matrices for the final layer of the neural network, wherein, for a set of multiple predefined transformations of the final layer input, each transformation causes a corresponding predefined action permutation of a base weight matrix output for the final layer input; controlling the interaction with the physical environment by repeatedly: obtaining, from one or more sensors, sensor data indicative of the observable state of the physical environment; determining the action probabilities based on the observable state, including applying the final layer of the neural network by applying a linear combination of the set of base weight matrices to the final layer input, coefficients of the linear combination being included in the set of parameters; providing, to an actuator, actuator data causing the actuator to effect an action in the physical environment based on the determined action probabilities.
2. The method of claim 1, wherein the sensor data includes an image of the physical environment.
3. The method of claim 2, wherein a feature transformation corresponds to a rotation and/or a feature transformation corresponds to a reflection.
4. The method of claim 2, wherein the sensor data additionally includes one or more additional sensor measurement values.
5. The method of claim 1, wherein applying the final layer further includes applying a further linear combination of the set of base weight matrices to the final layer input, coefficients of the further linear combination being included in the set of parameters.
6. The method of claim 1, wherein applying the final layer further includes applying a further linear combination of a further set of base weight matrices to the final layer input, wherein, for a further set of multiple predefined transformations of the final layer input, each transformation causes a corresponding further predefined action permutation of a further base weight matrix output for the final layer input.
7. The method of claim 1, wherein a layer input of a layer of the neural network includes multiple feature vectors corresponding to respective transformations of the observable state, a feature of said layer input being determined by average pooling over feature vectors corresponding to translations of the observable state.
8. A computer-implemented method of configuring a system which interacts with a physical environment according to a policy using the method of claim 1, including optimizing the set of parameters of the policy to maximize an expected reward of interacting with the environment according to the policy by repeatedly: obtaining interaction data indicative of a sequence of observed environment states and corresponding actions performed by the system; determining a reward of said interaction; determining an action probability, in an observed state of the sequence of observed environment states, of the policy selecting the corresponding action, including applying the final layer of the neural network by applying a linear combination of the set of base weight matrices to the final layer input, coefficients of the linear combination being included in the set of parameters; adjusting the set of parameters to increase the expected reward based on the determined reward and action probability.
9. The method of claim 8, comprising obtaining the set of base weight matrices by determining the set of base weight matrices from the multiple predefined transformations and corresponding predefined action permutations.
10. The method of claim 9, comprising determining a base weight matrix by obtaining an initial weight matrix, applying transformations and inverses of a corresponding action permutations to the initial weight matrix, and adding together said transformed and permuted initial weight matrices.
11. The method of claim 10, further comprising orthogonalizing the set of determined base weight matrices.
12. A non-transitory computer-readable medium comprising transitory or non-transitory data representing instructions which, when executed by a processor system cause the processor system to perform the computer-implemented method according to claim 8.
13. A training system for configuring a computer-controlled system which interacts with a physical environment according to a policy using the method of claim 1, the training system comprising: a data interface for accessing the set of parameters of the policy and base weight matrix data representing a set of base weight matrices for the final layer of the neural network wherein, for a set of multiple predefined transformations of the final layer input, each transformation causes a corresponding predefined action permutation of a base weight matrix output for the final layer input; a processor subsystem configured to optimize the set of parameters of the policy to maximize an expected reward of interacting with the environment according to the policy by repeatedly: obtaining interaction data indicative of a sequence of observed environment states and corresponding actions performed by the computer-controlled system; determining a reward of said interaction; determining an action probability, in an observed state of the sequence of observed environment states, of the policy selecting the corresponding action, including applying the final layer of the neural network by applying a linear combination of the set of base weight matrices to the final layer input, coefficients of the linear combination being included in the set of parameters; adjusting the set of parameters to increase the expected reward based on the determined reward and action probability.
14. A non-transitory computer-readable medium comprising data representing one or more of: instructions which, when executed by a processor system, cause the processor system to perform the computer-implemented method according to claim 1; a set of parameters of a policy for interacting with a physical environment, the policy determining multiple action probabilities of respective actions based on an observable state of the physical environment, the policy including a neural network, the neural network determining the action probabilities by determining a final layer input from an observable state and applying a final layer of the neural network to the final layer input, the final layer of the neural network being applied by applying a linear combination of the set of base weight matrices to the final layer input, coefficients of the linear combination being included in the set of parameters; base weight matrix data representing a set of base weight matrices for a policy for interacting with a physical environment, the policy determining multiple action probabilities of respective actions based on an observable state of the physical environment, the policy including a neural network, the neural network determining the action probabilities by determining a final layer input from an observable state and applying a final layer of the neural ne
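Claims 9 to 11 describe deriving the base weight matrices from the transformations and action permutations: transform and permute an initial weight matrix, add the results together, then orthogonalize. The sketch below is one possible reading under stated assumptions, not the patent's implementation. The two-element group (identity, and input negation paired with an action swap), the use of standard basis matrices as initial weight matrices, the Gram-Schmidt procedure, and all function names are hypothetical choices made for illustration.

```python
import math


def matmul(a, b):
    """Plain matrix product a @ b for lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


def symmetrize(w0, group):
    """Sum P^-1 @ w0 @ L over all (transformation L, permutation P) pairs."""
    total = [[0.0] * len(w0[0]) for _ in w0]
    for l_mat, p_mat in group:
        p_inv = transpose(p_mat)  # a permutation matrix's inverse is its transpose
        term = matmul(matmul(p_inv, w0), l_mat)
        total = [[t + s for t, s in zip(tr, sr)] for tr, sr in zip(total, term)]
    return total


def orthogonalize(mats, tol=1e-9):
    """Gram-Schmidt on flattened matrices; linearly dependent candidates are dropped."""
    basis = []
    for m in mats:
        v = [x for row in m for x in row]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > tol:
            basis.append([x / norm for x in v])
    cols = len(mats[0][0])
    return [[b[i:i + cols] for i in range(0, len(b), cols)] for b in basis]


# Assumed toy symmetry group: identity, and "negate input <-> swap actions".
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
NEGATE = [[-1.0, 0.0], [0.0, -1.0]]
SWAP = [[0.0, 1.0], [1.0, 0.0]]
GROUP = [(IDENTITY, IDENTITY), (NEGATE, SWAP)]

# Assumed initial weight matrices: the four standard basis matrices of size 2x2.
initial = [
    [[1.0 if (i, j) == (r, c) else 0.0 for j in range(2)] for i in range(2)]
    for r in range(2)
    for c in range(2)
]

candidates = [symmetrize(w0, GROUP) for w0 in initial]
base_matrices = orthogonalize(candidates)
```

Each resulting matrix `B` satisfies `B @ L == P @ B` for every pair in `GROUP`, which is the matrix form of the property recited in claim 1: transforming the final layer input permutes the base weight matrix output. In this toy case the four symmetrized candidates span only two dimensions, so orthogonalization leaves two base matrices.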
Convolutional networks [CNN, ConvNet] · CPC title
Reinforcement learning · CPC title
Validation; Performance evaluation; Active pattern learning techniques · CPC title
Learning methods · CPC title
using electronic means · CPC title