Quantum reinforcement learning agent

US12175342B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12175342-B2
Application number  US-202318163907-A
Country             US
Kind code           B2
Filing date         Feb 3, 2023
Priority date       Aug 14, 2019
Publication date    Dec 24, 2024
Grant date          Dec 24, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, computer-implemented methods, and computer program products that can facilitate applying a reinforcement learning policy to available actions are described. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a state encoder that maps, based on one or more encoding parameters, a state of an environment on to one or more qubits of a quantum device. The system can further comprise a variational component that combines a reinforcement learning policy with a sampling of the one or more qubits, resulting, based on one or more variational parameters, in a probability distribution of a plurality of available actions at the state of the environment.
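The pipeline in the abstract (encode a state on to qubits, apply a parameterized variational step, sample to obtain an action distribution) can be illustrated with a small classical simulation. This is a minimal sketch under stated assumptions, not the circuit disclosed in the patent: the RY angle encoding, the chain of CNOT gates, and the names `policy_distribution`, `enc_params`, and `var_params` are all hypothetical choices made for illustration.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    psi = np.moveaxis(psi, 0, qubit)
    return psi.reshape(-1)

def apply_cnot(state, control, target, n):
    """Apply CNOT: flip the target qubit wherever the control qubit is 1."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    sub = psi[tuple(idx)]
    # Indexing away the control axis shifts later axes down by one.
    t_axis = target - 1 if target > control else target
    psi[tuple(idx)] = np.flip(sub, axis=t_axis).copy()
    return psi.reshape(-1)

def policy_distribution(obs, enc_params, var_params):
    """Return measurement probabilities over 2**n basis states (actions).

    Assumed layout (hypothetical): one qubit per state variable,
    RY(enc * obs) encoding, a CNOT chain, then RY(var) rotations.
    """
    n = len(obs)
    state = np.zeros(2 ** n)
    state[0] = 1.0  # start in |0...0>
    for q in range(n):  # state encoder
        state = apply_1q(state, ry(enc_params[q] * obs[q]), q, n)
    for q in range(n - 1):  # entangling layer
        state = apply_cnot(state, q, q + 1, n)
    for q in range(n):  # variational (policy) layer
        state = apply_1q(state, ry(var_params[q]), q, n)
    return np.abs(state) ** 2

probs = policy_distribution([0.3, -1.2], [1.0, 0.5], [0.1, 0.7])
```

Here `probs` has one entry per computational basis state of the two qubits, so it can be read as a distribution over four actions. On real hardware these probabilities would be estimated from repeated measurements rather than read from a state vector.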

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a memory that stores computer executable components; and a processor that executes at least one of the computer executable components that, iteratively over a period of time: encodes, based on encoding parameters, continuous variables describing a current state of a defined environment to span a plurality of qubits of a quantum device, resulting in a quantum state representation of the current state of the defined environment, wherein the encoding parameters parameterize possible states of the defined environment; maps, based on variational parameters, a plurality of available actions on to the quantum state representation of the current state of the defined environment, wherein the variational parameters represent a reinforcement learning policy; and determines, based on the variational parameters, using a sampling of a subset of qubits of the plurality of qubits, a probability distribution of the plurality of available actions at the current state of the defined environment represented in the quantum state representation.
2. The system of claim 1, wherein the at least one of the computer executable components further, during respective iterations: selects an action of the plurality of available actions based on the probability distribution, resulting in a selected action of the plurality of actions.
3. The system of claim 2, wherein the at least one of the computer executable components further, during respective iterations: maps the probability distribution on to a first number of qubits of the plurality of qubits; and selects the action by sampling a second number of qubits of the first number of qubits.
4. The system of claim 3, wherein the second number is based on a third number of the plurality of available actions.
5. The system of claim 4, wherein the second number is a log2 of the third number.
6. The system of claim 2, wherein the probability distribution of the plurality of available actions is based on respective cumulative rewards determined for ones of the plurality of available actions.
7. The system of claim 6, wherein the at least one of the computer executable components further during respective iterations: updates, based on the respective cumulative reward determined for the selected action of the plurality of actions, at least one of the encoding parameters or the variational parameters.
8. The system of claim 1, wherein the encoding comprises: receiving the continuous parameters corresponding to the current state of the defined environment; and mapping a continuous parameter of the continuous parameters on to a qubit of the plurality of qubits.
9. The system of claim 8, wherein the encoding comprises: continuously receiving the continuous parameters, and continuously mapping the continuous parameter of the continuous parameters on to the qubit of the plurality of qubits.
10. The system of claim 1, wherein the sampling of the subset of qubits comprises sampling the subset of qubits based on a quantum entanglement of a pair of qubits of the plurality of qubits.
11. A computer program product facilitating selecting actions based on a quantum reinforcement learning policy, the computer program product comprising a non-transitory computer readable medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to iteratively over a period of time: encode, based on encoding parameters, continuous variables describing a current state of a defined environment to span a plurality of qubits of a quantum device, resulting in a quantum state representation of the current state of the defined environment, wherein the encoding parameters parameterize possible states of the defined environment; map, based on variational parameters, a plurality of available actions on to the quantum state representation of the current state of the defined environment, wherein the variational parameters represent a reinforcement learning policy; determine, based on the variational parameters, using a sampling of a subset of qubits of the plurality of qubits a probability distribution of the plurality of available actions at the current state of the defined environments represented in the quantum state representation; and select an action of the plurality of available actions based on the probability distribution, resulting in a selected action.
12. The computer program product of claim 11, wherein the program instructions are further executable by the processor to cause the processor to, during respective iterations: map the probability distribution on to a first number of qubits of the plurality of qubits.
13. The computer program product of claim 12, wherein the selecting the action comprises selecting the action by sampling a second number of qubits of the first number of qubits.
14. The computer program product of claim 12, wherein the second number is based on a third number of the plurality of available actions.
15. The computer program product of claim 14, wherein the second number is a log2 of the third number.
16. The computer program product of claim 11, wherein the probability distribution of the plurality of available actions is based on respective cumulative rewards determined for ones of the plurality of available actions.
17. The computer program product of claim 16, wherein the program instructions are further executable by the processor to cause the processor to, during respective iterations: evaluate, based on the reinforcement learning policy, the respective cumulative reward determined for the selected action of the plurality of actions.
18. The computer program product of claim 16, wherein the program instructions are further executable by the processor to cause the processor to, during respective iterations: update, based on the respective cumulative reward determined for the selected action of the plurality of actions, at least one of the encoding parameters or the variational parameters.
19. The computer program product of claim 11, wherein the encoding comprises: receiving the continuous parameters corresponding to the current state of the defined environment; and mapping a continuous parameter of the continuous parameters on to a qubit of the plurality of qubits.
20. The computer program product of claim 11, wherein the sampling of the subset of qubits comprises sampling the subset of the plurality of qubits based on a quantum entanglement of a pair of qubits of the plurality of qubits.
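Claims 3 to 5 (and 12 to 15) tie the number of sampled qubits to the base-2 logarithm of the number of available actions. The sketch below shows one way to read that relationship classically: it marginalizes a joint measurement distribution down to the leading k = log2(number of actions) qubits and samples an action index from it. The function names, the choice of which qubits to keep, and the restriction to action counts that are powers of two are assumptions for illustration; the claims do not specify them.

```python
import numpy as np

def action_qubit_count(num_actions):
    """Number of qubits to sample: log2 of the number of actions.

    Assumption: num_actions is a power of two, so the result is an integer.
    """
    k = num_actions.bit_length() - 1
    if num_actions < 1 or 2 ** k != num_actions:
        raise ValueError("num_actions must be a power of two in this sketch")
    return k

def action_distribution(joint_probs, num_actions):
    """Marginalize an n-qubit distribution on to the first k qubits."""
    joint = np.asarray(joint_probs, dtype=float)
    n = len(joint).bit_length() - 1  # total number of qubits
    k = action_qubit_count(num_actions)
    if k > n:
        raise ValueError("not enough qubits for this many actions")
    # Rows index the k kept (most significant) qubits; sum out the rest.
    return joint.reshape(2 ** k, -1).sum(axis=1)

def select_action(joint_probs, num_actions, rng):
    """Sample one action index from the marginal distribution."""
    p = action_distribution(joint_probs, num_actions)
    return int(rng.choice(num_actions, p=p / p.sum()))

rng = np.random.default_rng(0)
uniform = np.full(8, 1 / 8)          # 3 qubits, uniform outcome
marginal = action_distribution(uniform, 4)

peaked = np.zeros(8)
peaked[5] = 1.0                      # basis state 101
chosen = select_action(peaked, 4, rng)
```

With three qubits and four actions, only two qubits need to be sampled. In the `peaked` example all probability sits on basis state 101, whose leading two bits are 10, so the sampled action index is 2.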

Assignees

Inventors

Classifications

  • Quantum computing, i.e. information processing based on quantum-mechanical phenomena · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title

  • G06N10/60 · Primary

    Quantum algorithms, e.g. based on quantum optimisation, quantum Fourier or Hadamard transforms · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12175342B2 cover?
Systems, computer-implemented methods, and computer program products that can facilitate applying a reinforcement learning policy to available actions are described. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can com…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 24, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).