US11308401B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11308401-B2 |
| Application number | US-201916263930-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 31, 2019 |
| Priority date | Jan 31, 2018 |
| Publication date | Apr 19, 2022 |
| Grant date | Apr 19, 2022 |
Official abstract text for this publication.
Systems, methods, and computer readable media directed to interactive reinforcement learning with dynamic reuse of prior knowledge are described in various embodiments. The interactive reinforcement learning is adapted for providing computer implemented systems for dynamic action selection based on confidence levels associated with demonstrator data or portions thereof.
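As an illustration of confidence-based action selection, the following is a minimal sketch and not the patented method. It assumes a hypothetical threshold rule: follow the demonstrator-derived action when the confidence stored for the current state exceeds a threshold, and otherwise fall back to the agent's internal policy. The names `select_action`, `demo_confidence`, `demo_policy`, `internal_policy`, and `threshold` are illustrative and do not appear in the patent text.

```python
from typing import Callable, Dict, Hashable, Tuple

State = Hashable
Action = Hashable


def select_action(
    state: State,
    demo_confidence: Dict[State, float],
    demo_policy: Callable[[State], Action],
    internal_policy: Callable[[State], Action],
    threshold: float = 0.5,
) -> Tuple[str, Action]:
    """Choose an action source for `state` using an assumed threshold rule."""
    # Assumption: states without a stored confidence value default to 0.0.
    confidence = demo_confidence.get(state, 0.0)
    if confidence > threshold:
        # Trust the demonstrator-derived action (e.g. a classifier's prediction).
        return "demonstrator", demo_policy(state)
    # Otherwise act according to the agent's own learned policy.
    return "internal", internal_policy(state)


# Hypothetical usage with toy policies and confidence values.
demo_confidence = {"s0": 0.9, "s1": 0.1}
demo_policy = lambda s: "left"
internal_policy = lambda s: "right"
source, action = select_action("s0", demo_confidence, demo_policy, internal_policy)
```

Other selection rules, such as sampling a source with probability proportional to its confidence, would fit the same interface; the abstract does not say which rule is used.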
Opening claim text (preview).
What is claimed is:

1. A system for biasing a machine learning architecture using one or more demonstrator data sets, the machine learning architecture for controlling one or more actions conducted by an agent in an environment which transitions between one or more states, the system comprising:
a physical computer processor operating in conjunction with computer memory and computer storage, the processor configured to provide:
a receiver configured to obtain one or more demonstrator data sets, each demonstrator data set including a data structure representing one or more state-action pairs observed in one or more interactions with the environment;
a data storage configured to maintain, for each demonstrator data set or sub-portions thereof, one or more confidence data values, associated with at least one state of the one or more states;
a supervised classifier for training using the one or more demonstrator data sets or sub-portions thereof;
an action execution processor configured to generate control signals for executing an action associated with an action-source selected from at least one of the one or more demonstrator data sets based on the supervised classifier or an internal policy function maintained by the machine learning architecture, the selecting based at least upon the one or more confidence data values; and
a state observer configured to monitor a new state resulting from the execution of the action and an associated reward outcome; and to update the internal policy function maintained by the machine learning architecture based at least on the observed reward outcome;
wherein the one or more confidence data values are generated using a dynamic temporal difference confidence measurement; and
wherein the dynamic temporal difference confidence measurement is based on the relation:

$$C(s) \leftarrow (1 - F(\alpha)) \times C(s) + F(\alpha) \times \left[ F(r) + \gamma \times C(s') \right]$$

where γ is a discount factor, r is a reward function, and α is an update parameter.

2. The system of claim 1, wherein the state observer is configured to update at least one of the confidence data values of the one or more confidence data values based on the observed reward outcome.

3. The system of claim 1, wherein the temporal difference confidence measurement includes a dynamic rate update function based on the relation:

$$F(\alpha) = \alpha \times \max\left\{ \frac{1}{\sum_i \exp(\theta_i^T \cdot x)} \left[ \exp(\theta_1^T \cdot x),\ \exp(\theta_2^T \cdot x),\ \ldots,\ \exp(\theta_i^T \cdot x) \right] \right\}$$

4. The system of claim 1, wherein the temporal difference confidence measurement includes a dynamic confidence update function based on the relation: F(r) = (r / r_max) × max { 1 / Σ_i exp …
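The confidence relation in claim 1 and the rate function in claim 3 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the patentee's implementation: the θ vectors are taken to be per-class weight vectors of a softmax classifier and x a state feature vector (the preview does not define them), unseen states are assumed to start at confidence 0.0, and F(r) is passed in as a precomputed number because the claim 4 formula is truncated in this preview. The function names are hypothetical.

```python
import math
from typing import Dict, Hashable, Sequence


def max_softmax(thetas: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """Largest softmax probability over the logits theta_i^T . x."""
    logits = [sum(t * v for t, v in zip(theta, x)) for theta in thetas]
    shift = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(logit - shift) for logit in logits]
    return max(exps) / sum(exps)


def dynamic_rate(alpha: float, thetas: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """F(alpha) = alpha * max softmax probability, following claim 3."""
    return alpha * max_softmax(thetas, x)


def update_confidence(
    C: Dict[Hashable, float],
    s: Hashable,
    s_next: Hashable,
    f_alpha: float,
    f_r: float,
    gamma: float,
) -> float:
    """Apply C(s) <- (1 - F(alpha)) * C(s) + F(alpha) * [F(r) + gamma * C(s')]."""
    c_s = C.get(s, 0.0)          # assumption: unseen states start at 0.0
    c_next = C.get(s_next, 0.0)  # assumption: unseen states start at 0.0
    C[s] = (1.0 - f_alpha) * c_s + f_alpha * (f_r + gamma * c_next)
    return C[s]


# Hypothetical usage: two classes with equal logits give a max probability of 0.5.
thetas = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 1.0]
f_alpha = dynamic_rate(0.2, thetas, x)
confidence = {"s": 0.0, "s_next": 1.0}
new_value = update_confidence(confidence, "s", "s_next", f_alpha, f_r=1.0, gamma=0.9)
```

Scaling α by the classifier's top softmax probability means the confidence value moves more quickly when the classifier is sure of its prediction and more slowly when it is not.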