A crawler of web automation scripts
US-2023095006-A1 · Mar 30, 2023 · US
US12067068B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-12067068-B1 |
| Application number | US-202318309512-A |
| Country | US |
| Kind code | B1 |
| Filing date | Apr 28, 2023 |
| Priority date | Apr 28, 2023 |
| Publication date | Aug 20, 2024 |
| Grant date | Aug 20, 2024 |
Official abstract text for this publication.
The present disclosure provides techniques for data retrieval using machine learning. One example method includes receiving a plurality of training episodes associated with different environments, wherein each training episode of the plurality of training episodes includes a sequence of states, computing, based on the plurality of training episodes, total counts of a plurality of values in the states, initializing, for each state of the sequence of states in each training episode of the plurality of training episodes, a reward based on the total counts of the plurality of values, and training a reinforcement learning agent using the rewards.
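The counting and reward-initialization steps in the abstract can be illustrated with a minimal sketch. The publication does not say how counts map to a reward, so the inverse-frequency rule below is an assumption made only for illustration. The names `DICTIONARY`, `total_counts`, and `initial_rewards` are hypothetical, and states are simplified to plain text strings.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical task dictionary; the keywords echo the examples listed in claim 5.
DICTIONARY = ("welcome", "login", "transaction", "status")


def total_counts(episodes: Sequence[Sequence[str]],
                 dictionary: Sequence[str] = DICTIONARY) -> Counter:
    """Count each dictionary value across all states of all episodes."""
    counts = Counter({value: 0 for value in dictionary})
    for episode in episodes:
        for state in episode:
            tokens = state.lower().split()
            for value in dictionary:
                counts[value] += tokens.count(value)
    return counts


def initial_rewards(episode: Sequence[str], counts: Counter,
                    dictionary: Sequence[str] = DICTIONARY) -> List[float]:
    """Assign one initial reward per state.

    Assumption (not from the source): a state scores the sum of inverse
    total counts of the dictionary values it contains, so rarer values
    contribute more than common ones.
    """
    rewards = []
    for state in episode:
        present = set(state.lower().split())
        rewards.append(sum(1.0 / counts[v]
                           for v in dictionary
                           if v in present and counts[v] > 0))
    return rewards


# Two toy episodes from "different environments"; each state is a page's text.
episodes = [
    ["welcome page", "login form", "transaction list"],
    ["welcome banner", "login prompt", "status ok"],
]
counts = total_counts(episodes)
rewards = initial_rewards(episodes[0], counts)
```

With these toy episodes, "welcome" and "login" each occur twice and "transaction" once, so the first episode's states receive initial rewards of 0.5, 0.5, and 1.0. In the claimed method these per-state values would then be passed, together with the episodes, to a reinforcement learning framework.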
Opening claim text (preview).
What is claimed is:
1. A method, comprising: receiving, at a dictionary builder, a plurality of training episodes associated with different environments, wherein each training episode of the plurality of training episodes includes a sequence of states; identifying, by the dictionary builder, a set of dictionary values in the training episodes, the set of dictionary values being specified based on a task and being predetermined for the task; computing, based on the plurality of training episodes, for each dictionary value of the set of dictionary values, a total count of the dictionary value in all states of the plurality of training episodes; initializing a reward for each state in the sequence of states based on the dictionary values present in the state and, for each dictionary value from the set of dictionary values present in the state, a total count of the dictionary value in the sequence of states; and training a reinforcement learning agent to retrieve requested data by providing the training episodes and the initialized reward for each state of the sequence of states as inputs to a reinforcement learning framework, wherein the training comprises: using a reward function to learn intermediate rewards for intermediate states of the sequence of states based on the initialized reward for each state of the sequence of states, wherein the reward function is smoothed using a filter function; and suppressing one or more actions available to perform in the state based on reward values determined using the reward function for the one or more actions such that the one or more actions are ignored for purposes of reward calculation.
2. The method of claim 1, further comprising encoding each state of the sequence of states in each training episode using a neural network.
3. The method of claim 2, wherein the neural network comprises one or more of a convolutional neural network (CNN) or Bidirectional Encoder Representations from Transformers (BERT).
4. The method of claim 1, wherein each dictionary value of the set of dictionary values comprises a keyword and a count of the keyword in one or more training episodes.
5. The method of claim 4, wherein the keyword comprises a transaction code associated with a transaction or a selection from: “welcome,” “login,” “transaction,” or “status”.
6. A system, comprising: a memory including computer executable instructions; and a processor configured to execute the computer executable instructions and cause the system to: receive a plurality of training episodes associated with different environments, wherein each training episode of the plurality of training episodes includes a sequence of states; compute, based on the plurality of training episodes, total counts of a plurality of values in the states corresponding to a plurality of dictionary values of a dictionary; initialize, for each state of the sequence of states in each training episode of the plurality of training episodes, a reward based on the dictionary and on the total counts of the plurality of dictionary values; and train a reinforcement learning agent using the rewards; receive, at a dictionary builder, a plurality of training episodes associated with different environments, wherein each training episode of the plurality of training episodes includes a sequence of states; identify, by the dictionary builder, a set of dictionary values in the training episodes, the set of dictionary values being specified based on a task and being predetermined for the task; compute, based on the plurality of training episodes, for each dictionary value of the set of dictionary values, a total count of the dictionary value in all states of the plurality of training episodes; initialize a reward for each state in the sequence of states based on the dictionary values present in the state and, for each dictionary value from the set of dictionary values present in the state, a total count of the dictionary value in the sequence of states; and train a reinforcement learning agent to retrieve requested data using by providing the training episodes and the initialized reward for each state of the sequence of states as inputs to a reinforcement learning framework, wherein the training comprises: using a reward function to learn intermediate rewards for intermediate states of the sequence of states based on the initialized reward for each state of the sequence of states, wherein the reward function is smoothed using a filter function; and suppressing one or more actions available to perform in the state based on reward values determined using the reward function for the one or more actions such that the one or more actions are ignored for purposes of reward calculation.
7. The system of claim 6, wherein the computer executable instructions further cause the system to encode each state of the sequence of states in each training episode using a neural network.
8. The system of claim 7, wherein the neural network comprises one or more of a convolutional neural network (CNN) or Bidirectional Encoder Representations from Transformers (BERT).
9. A non-transitory computer readable medium comprising instructions to be executed in a computer system, wherein the instructions when executed in the computer system perform a method on a computing device, comprising: receiving, at a dictionary builder, a plurality of training episodes associated with different environments, wherein each training episode of the plurality of training episodes includes a sequence of states; identifying, by the dictionary builder, a set of dictionary values in the training episodes, the set of dictionary values being specified based on a task and being predetermined for the task; computing, based on the plurality of training episodes, for each dictionary value of the set of dictionary values, a total count of the dictionary value in all states of the plurality of training episodes; initializing a reward for each state in the sequence of states based on the dictionary values present in the state and, for each dictionary value from the set of dictionary values present in the state, a total count of the dictionary value in the sequence of states; and training a reinforcement learning agent to retrieve requested data by providing the training episodes and the initialized reward for each state of the sequence of states as inputs to a reinforcement learning framework, wherein the training comprises: using a reward function to learn intermediate rewards for intermediate states of the sequence of states based on the initialized reward for each state of the sequence of states, wherein the reward function is smoothed using a filter function; and suppressing one or more actions available to perform in the state based on reward values determined using the reward function for the one or more actions such that the one or more actions are ignored for purposes of reward calculation.
10. The non-transitory computer readable medium of claim 9, wherein the method further comprises encoding each state of the sequence of states in each training episode using a neural network.
11. The non-transitory computer readable medium of claim 10, wherein the neural network comprises one or more of a convolutional neural network (CNN) or Bidirectional Encoder Representations from Transformers (BERT).
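The independent claims recite two training details: a reward function "smoothed using a filter function" and the suppression of actions based on their reward values. The claims name neither a filter nor a suppression criterion, so the sketch below assumes a three-tap weighted moving average and a fixed threshold. The names `smooth_rewards`, `suppress_actions`, the kernel weights, and the threshold are all hypothetical.

```python
from typing import Dict, List, Sequence


def smooth_rewards(rewards: Sequence[float],
                   kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> List[float]:
    """Smooth a per-state reward sequence with a small weighted moving average.

    Assumption: the "filter function" is a short symmetric kernel; positions
    past either end reuse the nearest valid reward (edge replication).
    """
    half = len(kernel) // 2
    n = len(rewards)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # clamp index to the sequence
            acc += weight * rewards[j]
        smoothed.append(acc)
    return smoothed


def suppress_actions(action_rewards: Dict[str, float],
                     threshold: float) -> Dict[str, float]:
    """Keep only actions whose reward estimate reaches the threshold.

    Assumption: "suppressing" means dropping low-reward actions so they are
    ignored in later reward calculation; the threshold rule is illustrative.
    """
    return {action: reward
            for action, reward in action_rewards.items()
            if reward >= threshold}


# A sparse reward that fires only in the final state gets spread backward,
# giving the intermediate state a nonzero signal.
smoothed = smooth_rewards([0.0, 0.0, 1.0])

# Hypothetical action names and reward estimates for a single state.
kept = suppress_actions(
    {"click_login": 0.8, "open_ad": 0.05, "submit": 0.4},
    threshold=0.1,
)
```

Here the smoothed sequence is 0.0, 0.25, 0.75, and `open_ad` is dropped from the candidate actions. A learned reward function would supply the per-action values in practice; the dictionary literal stands in for it.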
Navigation, e.g. using categorised browsing · CPC title