Online partially rewarded learning

US11508480B2 · US · B2

Patent metadata
Publication number: US-11508480-B2
Application number: US-201916554344-A
Country: US
Kind code: B2
Filing date: Aug 28, 2019
Priority date: Aug 28, 2019
Publication date: Nov 22, 2022
Grant date: Nov 22, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A feature vector characterizing a system to be analyzed via online partially rewarded machine learning is obtained. Based on the feature vector, a decision is made, via the machine learning, using an online policy. The system is observed for environmental feedback. In at least a first instance, wherein the observing indicates that the environmental feedback is available, the environmental feedback is obtained. In at least a second instance, wherein the observing indicates that the environmental feedback is missing, the environmental feedback is imputed via an online imputation method. The online policy is updated based on results of the obtained environmental feedback and the online imputation method. A decision is output based on the updated online policy.
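The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `LinearPolicy` and `RunningMeanImputer` classes, the gradient-style update, and the convention that missing feedback is signalled by `None` are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class LinearPolicy:
    """Hypothetical online policy: one linear scorer per possible decision (arm)."""

    def __init__(self, n_arms: int, dim: int, lr: float = 0.1) -> None:
        self.w: List[List[float]] = [[0.0] * dim for _ in range(n_arms)]
        self.lr = lr

    def score(self, arm: int, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w[arm], x))

    def decide(self, x: Sequence[float]) -> int:
        # Pick the arm with the highest predicted reward (ties go to the lowest index).
        scores = [self.score(a, x) for a in range(len(self.w))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, x: Sequence[float], reward: float) -> None:
        # One gradient step on squared error toward the (observed or imputed) reward.
        err = reward - self.score(arm, x)
        self.w[arm] = [wi + self.lr * err * xi for wi, xi in zip(self.w[arm], x)]


class RunningMeanImputer:
    """Hypothetical online imputation: mean of feedback observed so far per arm."""

    def __init__(self, n_arms: int, default: float = 0.0) -> None:
        self.sums = [0.0] * n_arms
        self.counts = [0] * n_arms
        self.default = default

    def observe(self, arm: int, reward: float) -> None:
        self.sums[arm] += reward
        self.counts[arm] += 1

    def impute(self, arm: int) -> float:
        if self.counts[arm] == 0:
            return self.default
        return self.sums[arm] / self.counts[arm]


def step(
    policy: LinearPolicy,
    imputer: RunningMeanImputer,
    x: Sequence[float],
    get_feedback: Callable[[int], Optional[float]],
) -> Tuple[int, float, bool]:
    """Run one round: decide, observe, impute if needed, update the policy."""
    arm = policy.decide(x)
    feedback = get_feedback(arm)  # None means the environment gave no feedback
    if feedback is not None:
        imputer.observe(arm, feedback)
        reward, imputed = feedback, False
    else:
        reward, imputed = imputer.impute(arm), True
    policy.update(arm, x, reward)
    return arm, reward, imputed
```

Calling `step` once with a feedback function that returns a number and once with one that returns `None` exercises both branches: the second call updates the policy with the imputed value rather than skipping the round.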

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining a feature vector characterizing a system to be analyzed via online partially rewarded machine learning; based on said feature vector, making a decision, via said machine learning, using an online policy; observing said system for environmental feedback; in at least a first instance, wherein said observing indicates that said environmental feedback is available, obtaining said environmental feedback; in at least a second instance, wherein said observing indicates that said environmental feedback is missing, imputing said environmental feedback via an online imputation method; updating said online policy based on results of said obtained environmental feedback and said online imputation method; and outputting a decision based on said updated online policy.

2. The method of claim 1, wherein said system comprises a medical system conducting clinical trials.

3. The method of claim 1, wherein said system comprises a human-machine dialog system.

4. The method of claim 1, wherein said system comprises a medical diagnostic system.

5. The method of claim 1, wherein said imputing and said making of said decision comprise applying a rewarded online graph convolutional network by updating weights of said online policy via graph convolutional network back-propagation.

6. The method of claim 1, wherein said making of said decision comprises applying a linear upper confidence bound bandit and wherein said imputation comprises a bounded imputation.

7. The method of claim 1, wherein: making said decision includes retrieving a graph convolutional network embedding of said feature vector and providing same to a linear upper confidence bound bandit to make said decision; imputing said environmental feedback via said online imputation method comprises applying said graph convolutional network; and said updating comprises updating said linear upper confidence bound bandit with said environmental feedback and updating said graph convolutional network with said environmental feedback and said results of said online imputation method.

8. A non-transitory computer readable medium comprising computer executable instructions which when executed by a computer cause the computer to perform a method of: obtaining a feature vector characterizing a system to be analyzed via online partially rewarded machine learning; based on said feature vector, making a decision, via said machine learning, using an online policy; observing said system for environmental feedback; in at least a first instance, wherein said observing indicates that said environmental feedback is available, obtaining said environmental feedback; in at least a second instance, wherein said observing indicates that said environmental feedback is missing, imputing said environmental feedback via an online imputation method; updating said online policy based on results of said obtained environmental feedback and said online imputation method; and outputting a decision based on said updated online policy.

9. The non-transitory computer readable medium of claim 8, wherein said system comprises a medical system conducting clinical trials.

10. The non-transitory computer readable medium of claim 8, wherein said system comprises a human-machine dialog system.

11. The non-transitory computer readable medium of claim 8, wherein said system comprises a medical diagnostic system.

12. The non-transitory computer readable medium of claim 8, wherein said imputing and said making of said decision comprise applying a rewarded online graph convolutional network by updating weights of said online policy via graph convolutional network back-propagation.

13. The non-transitory computer readable medium of claim 8, wherein said making of said decision comprises applying a linear upper confidence bound bandit and wherein said imputation comprises a bounded imputation.

14. The non-transitory computer readable medium of claim 8, wherein: making said decision includes retrieving a graph convolutional network embedding of said feature vector and providing same to a linear upper confidence bound bandit to make said decision; imputing said environmental feedback via said online imputation method comprises applying said graph convolutional network; and said updating comprises updating said linear upper confidence bound bandit with said environmental feedback and updating said graph convolutional network with said environmental feedback and said results of said online imputation method.

15. An apparatus comprising: a memory; and at least one processor, coupled to said memory, and operative to: obtain a feature vector characterizing a system to be analyzed via online partially rewarded machine learning; based on said feature vector, make a decision, via said machine learning, using an online policy; observe said system for environmental feedback; in at least a first instance, wherein said observing indicates that said environmental feedback is available, obtain said environmental feedback; in at least a second instance, wherein said observing indicates that said environmental feedback is missing, impute said environmental feedback via an online imputation method; update said online policy based on results of said obtained environmental feedback and said online imputation method; and output a decision based on said updated online policy.

16. The apparatus of claim 15, wherein said system to be analyzed is selected from the group consisting of a medical system conducting clinical trials and a medical diagnostic system.

17. The apparatus of claim 15, wherein said system comprises a human-machine dialog system.

18. The apparatus of claim 15, wherein said imputing and said making of said decision comprise applying a rewarded online graph convolutional network by updating weights of said online policy via graph convolutional network back-propagation.

19. The apparatus of claim 15, wherein said making of said decision comprises applying a linear upper confidence bound bandit and wherein said imputation comprises a bounded imputation.

20. The apparatus of claim 15, wherein: making said decision includes retrieving a graph convolutional network embedding of said feature vector and providing same to a linear upper confidence bound bandit to make said decision; imputing said environmental feedback via said online imputation method comprises applying said graph convolutional network; and said updating comprises updating said linear upper confidence bound bandit with said environmental feedback and updating said graph convolutional network with said environmental feedback and said results of said online imputation method.
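Claims 6, 13, and 19 pair a linear upper confidence bound (LinUCB) bandit with a bounded imputation. The sketch below shows one way such a pairing might look, using the standard disjoint LinUCB update with NumPy. The claims do not define "bounded imputation" in detail, so the rule used here (impute the model's own predicted reward, clipped to a fixed interval `[low, high]`) is an assumption, as are the class name and all parameter names.

```python
import numpy as np


class LinUCBBoundedImputation:
    """Disjoint LinUCB with a hypothetical clipped-prediction imputation rule."""

    def __init__(self, n_arms, dim, alpha=1.0, low=0.0, high=1.0):
        # Per-arm ridge-regression statistics: A = I + sum(x x^T), b = sum(r x).
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]
        self.alpha = alpha
        self.low, self.high = low, high

    def decide(self, x):
        x = np.asarray(x, dtype=float)
        ucb = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Estimated reward plus an exploration bonus from the confidence width.
            ucb.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(ucb))

    def impute(self, arm, x):
        # Assumed rule: use the current prediction, bounded to [low, high].
        x = np.asarray(x, dtype=float)
        theta = np.linalg.solve(self.A[arm], self.b[arm])
        return float(np.clip(theta @ x, self.low, self.high))

    def update(self, arm, x, reward=None):
        """Update with observed feedback, or with an imputed value if reward is None."""
        x = np.asarray(x, dtype=float)
        imputed = reward is None
        if imputed:
            reward = self.impute(arm, x)
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
        return reward, imputed
```

A typical round would call `decide(x)`, then `update(arm, x, reward)` when feedback arrives or `update(arm, x)` when it does not. Under this particular assumed rule, an imputed value that is not clipped leaves the arm's reward estimate unchanged and only narrows its confidence width, so the bounds matter mainly when the prediction falls outside `[low, high]`.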

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks

  • Combinations of networks

  • based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

  • Backpropagation, e.g. using gradient descent

  • Correlation function computation {including computation of convolution operations (arithmetic circuits for sum of products per se, e.g. multiply-accumulators G06F7/5443; digital filters, e.g. FIR, IIR, adaptive filters H03H17/00)}

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11508480B2 cover?
A feature vector characterizing a system to be analyzed via online partially rewarded machine learning is obtained. Based on the feature vector, a decision is made, via the machine learning, using an online policy. The system is observed for environmental feedback. In at least a first instance, wherein the observing indicates that the environmental feedback is available, the environmental feedb…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 22, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).