Physical environment interaction with an equivariant policy

US12100198B2 · US · B2

Patent metadata
Publication number: US-12100198-B2
Application number: US-202017642451-A
Country: US
Kind code: B2
Filing date: Sep 8, 2020
Priority date: Sep 11, 2019
Publication date: Sep 24, 2024
Grant date: Sep 24, 2024

Abstract

Some embodiments are directed to a computer-implemented method of interacting with a physical environment according to a policy. The policy determines multiple action probabilities of respective actions based on an observable state of the physical environment. The policy includes a neural network parameterized by a set of parameters. The neural network determines the action probabilities by determining a final layer input from an observable state and applying a final layer of the neural network to the final layer input. The final layer is applied by applying a linear combination of a set of equivariant base weight matrices to the final layer input. The base weight matrices are equivariant in the sense that, for a set of multiple predefined transformations of the final layer input, each transformation causes a corresponding predefined action permutation of the base weight matrix output for the final layer input.
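To make the equivariance property in the abstract concrete, here is a minimal NumPy sketch built on assumptions the patent does not state: four actions (for example, moves in four compass directions), a final layer input with one feature per 90° rotation of the observation, a transformation that cyclically shifts that input, and an action permutation that cyclically shifts the actions. Under these assumptions the powers of a cyclic-shift matrix can serve as base weight matrices, because any linear combination of them commutes with the shift. The helper names (`cyclic_shift`, `base_weight_matrices`, `final_layer`) are hypothetical.

```python
import numpy as np

def cyclic_shift(n: int) -> np.ndarray:
    """Permutation matrix S with S @ v == np.roll(v, 1)."""
    return np.roll(np.eye(n), 1, axis=0)

def base_weight_matrices(n: int) -> np.ndarray:
    """Hypothetical base matrices S^0, ..., S^(n-1); every linear
    combination of them commutes with S, i.e. is shift-equivariant."""
    s = cyclic_shift(n)
    return np.stack([np.linalg.matrix_power(s, k) for k in range(n)])

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def final_layer(theta: np.ndarray, bases: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Action probabilities from a linear combination of base weight matrices."""
    w = np.tensordot(theta, bases, axes=1)  # sum_k theta[k] * bases[k]
    return softmax(w @ x)

rng = np.random.default_rng(0)
bases = base_weight_matrices(4)
theta = rng.normal(size=4)  # trainable coefficients (part of the policy parameters)
x = rng.normal(size=4)      # hypothetical final layer input, one feature per rotation
p = final_layer(theta, bases, x)
# Shifting the input (the "transformation") shifts the action probabilities
# (the "action permutation") in the same way.
print(np.allclose(final_layer(theta, bases, np.roll(x, 1)), np.roll(p, 1)))  # True
```

Only the coefficients `theta` are trained in this sketch; the base matrices stay fixed, so the equivariance holds for every parameter value instead of having to be learned from data.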

Claims

The invention claimed is:

1. A computer-implemented method of interacting with a physical environment according to a policy, the policy determining multiple action probabilities of respective actions based on an observable state of the physical environment, wherein the policy includes a neural network parameterized by a set of parameters, the neural network determining the action probabilities by determining a final layer input from an observable state and applying a final layer of the neural network to the final layer input, the method comprising:
   accessing the set of parameters of the policy;
   obtaining base weight matrix data representing a set of base weight matrices for the final layer of the neural network, wherein, for a set of multiple predefined transformations of the final layer input, each transformation causes a corresponding predefined action permutation of a base weight matrix output for the final layer input;
   controlling the interaction with the physical environment by repeatedly:
      obtaining, from one or more sensors, sensor data indicative of the observable state of the physical environment;
      determining the action probabilities based on the observable state, including applying the final layer of the neural network by applying a linear combination of the set of base weight matrices to the final layer input, coefficients of the linear combination being included in the set of parameters;
      providing, to an actuator, actuator data causing the actuator to effect an action in the physical environment based on the determined action probabilities.

2. The method of claim 1, wherein the sensor data includes an image of the physical environment.

3. The method of claim 2, wherein a feature transformation corresponds to a rotation and/or a feature transformation corresponds to a reflection.

4. The method of claim 2, wherein the sensor data additionally includes one or more additional sensor measurement values.

5. The method of claim 1, wherein applying the final layer further includes applying a further linear combination of the set of base weight matrices to the final layer input, coefficients of the further linear combination being included in the set of parameters.

6. The method of claim 1, wherein applying the final layer further includes applying a further linear combination of a further set of base weight matrices to the final layer input, wherein, for a further set of multiple predefined transformations of the final layer input, each transformation causes a corresponding further predefined action permutation of a further base weight matrix output for the final layer input.

7. The method of claim 1, wherein a layer input of a layer of the neural network includes multiple feature vectors corresponding to respective transformations of the observable state, a feature of said layer input being determined by average pooling over feature vectors corresponding to translations of the observable state.

8. A computer-implemented method of configuring a system which interacts with a physical environment according to a policy using the method of claim 1, including optimizing the set of parameters of the policy to maximize an expected reward of interacting with the environment according to the policy by repeatedly:
   obtaining interaction data indicative of a sequence of observed environment states and corresponding actions performed by the system;
   determining a reward of said interaction;
   determining an action probability, in an observed state of the sequence of observed environment states, of the policy selecting the corresponding action, including applying the final layer of the neural network by applying a linear combination of the set of base weight matrices to the final layer input, coefficients of the linear combination being included in the set of parameters;
   adjusting the set of parameters to increase the expected reward based on the determined reward and action probability.

9. The method of claim 8, comprising obtaining the set of base weight matrices by determining the set of base weight matrices from the multiple predefined transformations and corresponding predefined action permutations.

10. The method of claim 9, comprising determining a base weight matrix by obtaining an initial weight matrix, applying transformations and inverses of a corresponding action permutations to the initial weight matrix, and adding together said transformed and permuted initial weight matrices.

11. The method of claim 10, further comprising orthogonalizing the set of determined base weight matrices.

12. A non-transitory computer-readable medium comprising transitory or non-transitory data representing instructions which, when executed by a processor system cause the processor system to perform the computer-implemented method according to claim 8.

13. A training system for configuring a computer-controlled system which interacts with a physical environment according to a policy using the method of claim 1, the training system comprising:
   a data interface for accessing the set of parameters of the policy and base weight matrix data representing a set of base weight matrices for the final layer of the neural network wherein, for a set of multiple predefined transformations of the final layer input, each transformation causes a corresponding predefined action permutation of a base weight matrix output for the final layer input;
   a processor subsystem configured to optimize the set of parameters of the policy to maximize an expected reward of interacting with the environment according to the policy by repeatedly:
      obtaining interaction data indicative of a sequence of observed environment states and corresponding actions performed by the computer-controlled system;
      determining a reward of said interaction;
      determining an action probability, in an observed state of the sequence of observed environment states, of the policy selecting the corresponding action, including applying the final layer of the neural network by applying a linear combination of the set of base weight matrices to the final layer input, coefficients of the linear combination being included in the set of parameters;
      adjusting the set of parameters to increase the expected reward based on the determined reward and action probability.

14. A non-transitory computer-readable medium comprising data representing one or more of:
   instructions which, when executed by a processor system, cause the processor system to perform the computer-implemented method according to claim 1;
   a set of parameters of a policy for interacting with a physical environment, the policy determining multiple action probabilities of respective actions based on an observable state of the physical environment, the policy including a neural network, the neural network determining the action probabilities by determining a final layer input from an observable state and applying a final layer of the neural network to the final layer input, the final layer of the neural network being applied by applying a linear combination of the set of base weight matrices to the final layer input, coefficients of the linear combination being included in the set of parameters;
   base weight matrix data representing a set of base weight matrices for a policy for interacting with a physical environment, the policy determining multiple action probabilities of respective actions based on an observable state of the physical environment, the policy including a neural network, the neural network determining the action probabilities by determining a final layer input from an observable state and applying a final layer of the neural ne…
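Claims 9 to 11 describe deriving the base weight matrices by summing an initial weight matrix over the transformations and the inverse action permutations, then orthogonalizing the result. The NumPy sketch below is one possible way to do this under hypothetical assumptions: a flattened 3x3 image stands in for the final layer input, the transformations are the four 90° rotations, the actions are ordered up, right, down, left, and orthogonalization is done with an SVD rather than, say, Gram-Schmidt. The function names are illustrative and do not come from the patent.

```python
import numpy as np

def rotation_permutation(size: int) -> np.ndarray:
    """Matrix P with P @ img.ravel() == np.rot90(img).ravel() for a size x size image."""
    idx = np.rot90(np.arange(size * size).reshape(size, size)).ravel()
    return np.eye(size * size)[idx]

# Hypothetical action order: up, right, down, left. A 90-degree counter-clockwise
# rotation maps up->left, right->up, down->right, left->down, which rolls the
# vector of action probabilities by -1.
ACTION_PERM = np.roll(np.eye(4), -1, axis=0)

def group_elements(size: int):
    """(P_k, Q_k) pairs for rotations by k * 90 degrees, k = 0..3."""
    p, q = rotation_permutation(size), ACTION_PERM
    return [(np.linalg.matrix_power(p, k), np.linalg.matrix_power(q, k)) for k in range(4)]

def symmetrize(w0: np.ndarray, elements) -> np.ndarray:
    """Sum of Q_k^{-1} @ W0 @ P_k; a permutation matrix's inverse is its transpose."""
    return sum(q.T @ w0 @ p for p, q in elements)

def base_weight_matrices(size: int, tol: float = 1e-9) -> np.ndarray:
    elements = group_elements(size)
    n_act, n_in = 4, size * size
    sym = []
    for i in range(n_act):          # every unit matrix serves as an initial weight matrix
        for j in range(n_in):
            w0 = np.zeros((n_act, n_in))
            w0[i, j] = 1.0
            sym.append(symmetrize(w0, elements).ravel())
    # Orthogonalize: the leading right-singular vectors form an orthonormal
    # basis of the span of the symmetrized matrices.
    _, s, vt = np.linalg.svd(np.array(sym))
    rank = int((s > tol).sum())
    return vt[:rank].reshape(rank, n_act, n_in)

bases = base_weight_matrices(3)
print(bases.shape)  # (9, 4, 9)
```

Under these assumptions the sketch yields nine orthonormal base matrices, one per orbit of the rotations on (action, pixel) pairs. A final layer would then compute logits as `np.tensordot(theta, bases, axes=1) @ x` with nine trainable coefficients `theta`.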

Assignees

Bosch Gmbh Robert, Koninklijke Philips Nv

Classifications (CPC titles)

  • Convolutional networks [CNN, ConvNet]

  • Reinforcement learning

  • Validation; Performance evaluation; Active pattern learning techniques

  • Learning methods

  • using electronic means

Frequently asked questions

What does patent US12100198B2 cover?
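It covers a computer-implemented method of interacting with a physical environment according to a policy whose neural network computes action probabilities with an equivariant final layer: a learned linear combination of fixed base weight matrices, chosen so that each of a set of predefined transformations of the final layer input (for example, a rotation or reflection when the sensor data is an image) causes a corresponding predefined permutation of the actions. The claims also cover training the policy parameters to maximize expected reward, a training system, and computer-readable media.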
Who is the assignee on this patent?
Bosch Gmbh Robert, Koninklijke Philips Nv
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
It was published on September 24, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).