What technology area does this patent fall under?

Primary CPC classification G06N20/00. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 22, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Online partially rewarded learning

US11508480B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11508480-B2
Application number	US-201916554344-A
Country	US
Kind code	B2
Filing date	Aug 28, 2019
Priority date	Aug 28, 2019
Publication date	Nov 22, 2022
Grant date	Nov 22, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A feature vector characterizing a system to be analyzed via online partially rewarded machine learning is obtained. Based on the feature vector, a decision is made, via the machine learning, using an online policy. The system is observed for environmental feedback. In at least a first instance, wherein the observing indicates that the environmental feedback is available, the environmental feedback is obtained. In at least a second instance, wherein the observing indicates that the environmental feedback is missing, the environmental feedback is imputed via an online imputation method. The online policy is updated based on results of the obtained environmental feedback and the online imputation method. A decision is output based on the updated online policy.
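The loop described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the class names `LinearPolicy` and `MeanImputer`, the function `step`, the squared-error update, and the running-mean imputation are all hypothetical choices made for this example. The abstract does not say which policy or imputation method is used.

```python
from typing import Callable, List, Optional, Tuple


class MeanImputer:
    """Hypothetical online imputer: running mean of observed rewards per action."""

    def __init__(self, n_actions: int) -> None:
        self.sums = [0.0] * n_actions
        self.counts = [0] * n_actions

    def observe(self, action: int, reward: float) -> None:
        self.sums[action] += reward
        self.counts[action] += 1

    def impute(self, action: int) -> float:
        # Assumption: fall back to 0.0 until a real reward has been seen.
        if self.counts[action] == 0:
            return 0.0
        return self.sums[action] / self.counts[action]


class LinearPolicy:
    """Hypothetical online policy: one weight vector per action, SGD updates."""

    def __init__(self, n_actions: int, dim: int, lr: float = 0.1) -> None:
        self.w = [[0.0] * dim for _ in range(n_actions)]
        self.lr = lr

    def score(self, action: int, x: List[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w[action], x))

    def decide(self, x: List[float]) -> int:
        scores = [self.score(a, x) for a in range(len(self.w))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, x: List[float], action: int, reward: float) -> None:
        err = reward - self.score(action, x)
        self.w[action] = [wi + self.lr * err * xi
                          for wi, xi in zip(self.w[action], x)]


def step(policy: LinearPolicy, imputer: MeanImputer, x: List[float],
         get_feedback: Callable[[List[float], int], Optional[float]]
         ) -> Tuple[int, bool]:
    """One round: decide, observe, impute if feedback is missing, update, output."""
    action = policy.decide(x)
    feedback = get_feedback(x, action)  # None models missing feedback
    if feedback is not None:
        imputer.observe(action, feedback)
        reward, imputed = feedback, False
    else:
        reward, imputed = imputer.impute(action), True
    policy.update(x, action, reward)
    return policy.decide(x), imputed


policy = LinearPolicy(n_actions=2, dim=2)
imputer = MeanImputer(n_actions=2)
first = step(policy, imputer, [1.0, 0.0], lambda x, a: 1.0)    # feedback available
second = step(policy, imputer, [1.0, 0.0], lambda x, a: None)  # feedback missing
```

In the second call the feedback is missing, so the policy is updated with the imputed value instead of being left unchanged. That is the difference from a loop that simply skips unrewarded rounds.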

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: obtaining a feature vector characterizing a system to be analyzed via online partially rewarded machine learning; based on said feature vector, making a decision, via said machine learning, using an online policy; observing said system for environmental feedback; in at least a first instance, wherein said observing indicates that said environmental feedback is available, obtaining said environmental feedback; in at least a second instance, wherein said observing indicates that said environmental feedback is missing, imputing said environmental feedback via an online imputation method; updating said online policy based on results of said obtained environmental feedback and said online imputation method; and outputting a decision based on said updated online policy.
2. The method of claim 1, wherein said system comprises a medical system conducting clinical trials.
3. The method of claim 1, wherein said system comprises a human-machine dialog system.
4. The method of claim 1, wherein said system comprises a medical diagnostic system.
5. The method of claim 1, wherein said imputing and said making of said decision comprise applying a rewarded online graph convolutional network by updating weights of said online policy via graph convolutional network back-propagation.
6. The method of claim 1, wherein said making of said decision comprises applying a linear upper confidence bound bandit and wherein said imputation comprises a bounded imputation.
7. The method of claim 1, wherein: making said decision includes retrieving a graph convolutional network embedding of said feature vector and providing same to a linear upper confidence bound bandit to make said decision; imputing said environmental feedback via said online imputation method comprises applying said graph convolutional network; and said updating comprises updating said linear upper confidence bound bandit with said environmental feedback and updating said graph convolutional network with said environmental feedback and said results of said online imputation method.
8. A non-transitory computer readable medium comprising computer executable instructions which when executed by a computer cause the computer to perform a method of: obtaining a feature vector characterizing a system to be analyzed via online partially rewarded machine learning; based on said feature vector, making a decision, via said machine learning, using an online policy; observing said system for environmental feedback; in at least a first instance, wherein said observing indicates that said environmental feedback is available, obtaining said environmental feedback; in at least a second instance, wherein said observing indicates that said environmental feedback is missing, imputing said environmental feedback via an online imputation method; updating said online policy based on results of said obtained environmental feedback and said online imputation method; and outputting a decision based on said updated online policy.
9. The non-transitory computer readable medium of claim 8, wherein said system comprises a medical system conducting clinical trials.
10. The non-transitory computer readable medium of claim 8, wherein said system comprises a human-machine dialog system.
11. The non-transitory computer readable medium of claim 8, wherein said system comprises a medical diagnostic system.
12. The non-transitory computer readable medium of claim 8, wherein said imputing and said making of said decision comprise applying a rewarded online graph convolutional network by updating weights of said online policy via graph convolutional network back-propagation.
13. The non-transitory computer readable medium of claim 8, wherein said making of said decision comprises applying a linear upper confidence bound bandit and wherein said imputation comprises a bounded imputation.
14. The non-transitory computer readable medium of claim 8, wherein: making said decision includes retrieving a graph convolutional network embedding of said feature vector and providing same to a linear upper confidence bound bandit to make said decision; imputing said environmental feedback via said online imputation method comprises applying said graph convolutional network; and said updating comprises updating said linear upper confidence bound bandit with said environmental feedback and updating said graph convolutional network with said environmental feedback and said results of said online imputation method.
15. An apparatus comprising: a memory; and at least one processor, coupled to said memory, and operative to: obtain a feature vector characterizing a system to be analyzed via online partially rewarded machine learning; based on said feature vector, make a decision, via said machine learning, using an online policy; observe said system for environmental feedback; in at least a first instance, wherein said observing indicates that said environmental feedback is available, obtain said environmental feedback; in at least a second instance, wherein said observing indicates that said environmental feedback is missing, impute said environmental feedback via an online imputation method; update said online policy based on results of said obtained environmental feedback and said online imputation method; and output a decision based on said updated online policy.
16. The apparatus of claim 15, wherein said system to be analyzed is selected from the group consisting of a medical system conducting clinical trials and a medical diagnostic system.
17. The apparatus of claim 15, wherein said system comprises a human-machine dialog system.
18. The apparatus of claim 15, wherein said imputing and said making of said decision comprise applying a rewarded online graph convolutional network by updating weights of said online policy via graph convolutional network back-propagation.
19. The apparatus of claim 15, wherein said making of said decision comprises applying a linear upper confidence bound bandit and wherein said imputation comprises a bounded imputation.
20. The apparatus of claim 15, wherein: making said decision includes retrieving a graph convolutional network embedding of said feature vector and providing same to a linear upper confidence bound bandit to make said decision; imputing said environmental feedback via said online imputation method comprises applying said graph convolutional network; and said updating comprises updating said linear upper confidence bound bandit with said environmental feedback and updating said graph convolutional network with said environmental feedback and said results of said online imputation method.
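Claims 6, 13, and 19 pair a linear upper confidence bound (LinUCB) bandit with a "bounded imputation". The sketch below shows one way such a pairing could look. The LinUCB scoring and update rules are the standard textbook form, with the inverse matrix maintained by a Sherman-Morrison update. The page does not define "bounded imputation", so the `impute_bounded` function is purely an assumption: it uses the arm's current linear reward estimate clipped to a fixed interval. All names (`LinUCBArm`, `choose`, `impute_bounded`, `learn`) and the default interval [0, 1] are hypothetical.

```python
import math
from typing import List, Optional

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(m: List[Vector], v: Vector) -> Vector:
    return [dot(row, v) for row in m]


class LinUCBArm:
    """Per-arm LinUCB state: A^-1 (starting at identity) and vector b."""

    def __init__(self, dim: int) -> None:
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def theta(self) -> Vector:
        return matvec(self.a_inv, self.b)

    def ucb(self, x: Vector, alpha: float) -> float:
        # Estimated reward plus exploration bonus alpha * sqrt(x^T A^-1 x).
        return dot(self.theta(), x) + alpha * math.sqrt(dot(x, matvec(self.a_inv, x)))

    def update(self, x: Vector, reward: float) -> None:
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        a_inv_x = matvec(self.a_inv, x)
        denom = 1.0 + dot(x, a_inv_x)
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose(arms: List[LinUCBArm], x: Vector, alpha: float = 1.0) -> int:
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


def impute_bounded(arm: LinUCBArm, x: Vector,
                   low: float = 0.0, high: float = 1.0) -> float:
    # Assumption: "bounded" is read here as clipping the estimate to [low, high].
    return min(high, max(low, dot(arm.theta(), x)))


def learn(arms: List[LinUCBArm], x: Vector,
          feedback: Optional[float], alpha: float = 1.0) -> int:
    """Pick an arm, then update it with real feedback or a bounded imputed value."""
    k = choose(arms, x, alpha)
    reward = feedback if feedback is not None else impute_bounded(arms[k], x)
    arms[k].update(x, reward)
    return k


arms = [LinUCBArm(2), LinUCBArm(2)]
k1 = learn(arms, [1.0, 0.0], feedback=1.0)   # reward observed
k2 = learn(arms, [1.0, 0.0], feedback=None)  # reward missing, imputed
```

Clipping keeps a poorly calibrated early estimate from injecting an out-of-range reward into the bandit update. Whether this matches the bounded imputation the claims have in mind cannot be confirmed from this page.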

Assignees

Inventors

Classifications

G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title
G06N3/045
Combinations of networks · CPC title
G06N3/006
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
G06N3/084
Backpropagation, e.g. using gradient descent · CPC title
G06F17/15
Correlation function computation {including computation of convolution operations (arithmetic circuits for sum of products per se, e.g. multiply-accumulators G06F7/5443; digital filters, e.g. FIR, IIR, adaptive filters H03H17/00)} · CPC title

Patent family

Related publications grouped by family.

View patent family 74681892

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11508480B2 cover?: A feature vector characterizing a system to be analyzed via online partially rewarded machine learning is obtained. Based on the feature vector, a decision is made, via the machine learning, using an online policy. The system is observed for environmental feedback. In at least a first instance, wherein the observing indicates that the environmental feedback is available, the environmental feedb…
Who is the assignee on this patent?: IBM

Related patents

High Purity Distillation Process Control

Machine learning system to detect, label, and spread heat in a graph structure

Contextual memory bandit for proactive dialogs

Cooperative neural network deep reinforcement learning with partial input assistance
