System and method for optimizing indirect encodings in the learning of mappings
US-10776691-B1 · Sep 15, 2020 · US
US11704682B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11704682-B2 |
| Application number | US-201715642038-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 5, 2017 |
| Priority date | Jul 6, 2016 |
| Publication date | Jul 18, 2023 |
| Grant date | Jul 18, 2023 |
Official abstract text for this publication.
Systems and methods for pre-processing data to facilitate efficient and accurate machine learning are provided. The data may include market data. The pre-processing may include partitioning the data into windows and assigning categories to the windows to generate a series of vectors. The series of vectors is then input into a computer system that executes a machine learning algorithm to efficiently train a neural network used to identify structure or patterns therein.
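The partitioning step described in the abstract can be illustrated with a minimal sketch. It assumes records are `(timestamp, price, quantity)` tuples and that windows have a fixed length in seconds; the function name `partition_into_windows` and the parameter `window_seconds` are hypothetical and do not come from the patent, which leaves the window length as a selectable parameter.

```python
from collections import defaultdict


def partition_into_windows(records, window_seconds):
    """Group (timestamp, price, quantity) records into fixed-length time windows.

    Assumption: windows are aligned to multiples of window_seconds and are
    returned in chronological order; windows with no records are omitted.
    """
    windows = defaultdict(list)
    for ts, price, qty in records:
        # Integer index of the window this time stamp falls into.
        windows[int(ts // window_seconds)].append((ts, price, qty))
    return [windows[key] for key in sorted(windows)]


# Illustrative data only, not taken from the patent.
records = [(0.0, 100.0, 5), (0.4, 100.0, 7), (1.2, 100.5, 3), (2.9, 100.0, 0)]
windows = partition_into_windows(records, window_seconds=1.0)
```

A fixed window length is only one possible choice; the claims describe selecting the length so that patterns or structures in the market data become visible.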
Opening claim text (preview).
The invention claimed is:
1. A computer system comprising: a processor; a tangible computer-readable medium containing computer-executable instructions that when executed by the processor cause the computer system to pre-process a collection of raw market data for use by a machine learning computer by performing the steps comprising:
(a) receiving, from a client computer via an electronic communication network, a collection of raw market data that includes time stamps, price levels and order quantities, the collection of raw market data characterized by a first size;
(b) determining, for each time stamp, a difference in order quantity at each price level when compared to order quantity at the same price level at the previous time stamp;
(c) partitioning the collection of raw market data into a sequence of time period windows, comparing order quantities prior to a time period window to order quantities within the time period window, and determining quantiles for changes in order quantities;
(d) dividing the determined differences into predefined portions, each of which is characterized by one of a plurality of categories, each category being assigned to the time period window in accordance with the division of the determined differences and the determined quantiles;
(e) generating a new pre-processed data set comprising the sequence of time period windows, each of which includes a multi-dimensional one-hot binary vector encoding of the plurality of categories representative of each price level and time stamp therein, the new pre-processed data set characterized by a second size less than the first size;
(f) transmitting the new pre-processed data set as input to a computer system that executes a machine learning algorithm, wherein the execution of the machine learning algorithm includes training a recurrent neural network to identify structure in the pre-processed data and executing a lossy encoded compression to compress the sequence of time period windows to provide a feature mapping from the sequence of time period windows to a feature space, wherein the lossy encoded compression of the sequence removes noise from the sequence of time period windows while retaining the unique features of the feature space; and
(g) outputting the compressed sequence of time period windows to a display for user interaction.
2. The computer system of claim 1, wherein (c) comprises: selecting a length of the time period windows to reveal patterns in market data.
3. The computer system of claim 1, wherein (c) comprises: selecting a length of the time period windows to reveal structures in market data.
4. The computer system of claim 1, wherein (d) further comprises: classifying changes in order quantities that are large and small increases and decreases.
5. The computer system of claim 4, wherein (d) further comprises: analyzing order quantity changes over multiple windows.
6. The computer system of claim 1, wherein (d) further comprises: assigning a category for each time stamp within a time period window in accordance with the divisions determined in (d).
7. The computer system of claim 6, wherein the categories comprise: large increase in ask order quantity, small increase/decrease in ask order quantity, large decrease in ask order quantity, no order quantity, large decrease in bid order quantity, small increase/decrease in bid order quantity and large increase in bid order quantity, and wherein the categories are represented as a 7-dimensional one hot-binary vector.
8. A computer implemented method comprising:
(a) receiving, from a client computer via an electronic communication network by a processor of a computer system, a collection of raw market data that includes time stamps, price levels and order quantities, the collection of raw market data characterized by a first size, wherein the processor is coupled with a tangible computer-readable medium containing computer executable instructions executed by the processor to pre-process the collection of raw market data for use by a machine learning computer;
(b) determining, by the processor, for each time stamp a difference in order quantity at each price level when compared to order quantity at the same price level at the previous time stamp;
(c) partitioning, by the processor, the collection of raw market data into a sequence of time period windows, comparing order quantities prior to a time period window to order quantities within the time period window, and determining quantiles for changes in order quantities; and
(d) dividing, by the processor, the determined differences into predefined portions, each of which is characterized by one of a plurality of categories, each category being assigned to the time period window in accordance with the division of the determined differences and the determined quantiles;
(e) generating, by the processor, a new pre-processed data set comprising the sequence of time period windows, each of which includes a multi-dimensional one-hot binary vector encoding of the plurality of categories representative of each price level and time stamp therein, the new pre-processed data set characterized by a second size less than the first size;
(f) transmitting, by the processor, the new pre-processed data set as input to a computer system that executes a machine learning algorithm, wherein the execution of the machine learning algorithm includes training a recurrent neural network to identify structure in the pre-processed data set and executing a lossy encoded compression to compress the sequence of time period windows to provide a feature mapping from the sequence of time period windows to a feature space, wherein the lossy encoded compression of the sequence removes noise from the sequence of time period windows while retaining the unique features of the feature space; and
(g) outputting, by the processor, the compressed sequence of time period windows to a display for user interaction.
9. The computer implemented method of claim 8, wherein (d) further comprises: classifying, by the processor, changes in order quantities that are large and small increases and decreases.
10. The computer implemented method of claim 8, wherein (d) further comprises: assigning, by the processor, a category for each time stamp within a time period window in accordance with the divisions determined in (d), and wherein the categories comprise: large increase in ask order quantity, small increase/decrease in ask order quantity, large decrease in ask order quantity, no order quantity, large decrease in bid order quantity, small increase/decrease in bid order quantity and large increase in bid order quantity, and wherein the categories are represented as a 7-dimensional one hot-binary vector.
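Steps (b), (d) and (e) of the claims, together with the seven categories listed in claims 7 and 10, can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: order book snapshots are assumed to be dicts mapping price level to quantity, the "large" threshold is assumed to be the upper quartile of nonzero absolute changes, and "no order quantity" is assumed to mean a zero quantity at the level. All function names and the category labels are hypothetical.

```python
import statistics

# Category order is an assumption; the claims list the categories but
# do not fix which vector position each one occupies.
CATEGORIES = [
    "ask_large_increase",
    "ask_small_change",
    "ask_large_decrease",
    "no_quantity",
    "bid_large_decrease",
    "bid_small_change",
    "bid_large_increase",
]


def quantity_differences(snapshots):
    """Step (b): per-price-level change in quantity between consecutive time stamps."""
    diffs = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        prices = set(prev) | set(curr)
        # A price level missing from a snapshot is treated as quantity 0.
        diffs.append({p: curr.get(p, 0) - prev.get(p, 0) for p in prices})
    return diffs


def large_change_threshold(diffs, n=4):
    """Assumed quantile rule: upper n-quantile of nonzero absolute changes.

    statistics.quantiles needs at least two data points.
    """
    magnitudes = [abs(d) for row in diffs for d in row.values() if d != 0]
    return statistics.quantiles(magnitudes, n=n)[-1]


def categorize(side, quantity, diff, threshold):
    """Step (d): map one (side, quantity, change) observation to a category."""
    if quantity == 0:
        return "no_quantity"
    if abs(diff) < threshold:
        return f"{side}_small_change"
    return f"{side}_large_increase" if diff > 0 else f"{side}_large_decrease"


def one_hot(category):
    """Step (e): 7-dimensional one-hot binary vector for a category."""
    return [1 if c == category else 0 for c in CATEGORIES]


# Illustrative data only: three snapshots of one side of the book.
snapshots = [{100: 5, 101: 2}, {100: 9, 101: 2}, {100: 1}]
diffs = quantity_differences(snapshots)
threshold = large_change_threshold(diffs)
vector = one_hot(categorize("ask", quantity=9, diff=4, threshold=threshold))
```

Replacing each raw quantity with one of seven categories is what lets the pre-processed data set be smaller than the raw collection, as step (e) requires. The recurrent network training and lossy compression of step (f) are outside the scope of this sketch.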
Learning methods · CPC title
Supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Market modelling; Market analysis; Collecting market data · CPC title
Clustering or classification · CPC title