Pre-processing financial market data prior to machine learning training

US11704682B2 · US · B2

Patent metadata
Publication number: US-11704682-B2
Application number: US-201715642038-A
Country: US
Kind code: B2
Filing date: Jul 5, 2017
Priority date: Jul 6, 2016
Publication date: Jul 18, 2023
Grant date: Jul 18, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for pre-processing data to facilitate efficient and accurate machine learning are provided. The data may include market data. The pre-processing may include partitioning the data into windows and assigning categories to the windows to generate a series of vectors. The series of vectors is then input into a computer system that executes a machine learning algorithm to efficiently train a neural network used to identify structure or patterns therein.
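
The windowing step mentioned in the abstract can be illustrated with a minimal sketch. This is not taken from the patent: the function name `partition_into_windows`, the fixed `window_length` parameter, and the assumption that time stamps arrive sorted in ascending order are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def partition_into_windows(
    timestamps: Sequence[float], window_length: float
) -> List[List[int]]:
    """Group row indices into consecutive fixed-length time windows.

    Assumes `timestamps` is sorted ascending (an assumption of this sketch).
    Windows that contain no rows are skipped.
    """
    if window_length <= 0:
        raise ValueError("window_length must be positive")
    if not timestamps:
        return []

    start = timestamps[0]
    windows: Dict[int, List[int]] = {}
    for index, stamp in enumerate(timestamps):
        # Window number counted from the first time stamp.
        key = int((stamp - start) // window_length)
        windows.setdefault(key, []).append(index)

    return [windows[key] for key in sorted(windows)]


if __name__ == "__main__":
    print(partition_into_windows([0, 4, 11, 19, 32], 10))
```

Each inner list holds the indices of the rows that fall in one window, so later steps can assign categories per window without copying the underlying data.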

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer system comprising: a processor; a tangible computer-readable medium containing computer-executable instructions that when executed by the processor cause the computer system to pre-process a collection of raw market data for use by a machine learning computer by performing the steps comprising:

    (a) receiving, from a client computer via an electronic communication network, a collection of raw market data that includes time stamps, price levels and order quantities, the collection of raw market data characterized by a first size;

    (b) determining, for each time stamp, a difference in order quantity at each price level when compared to order quantity at the same price level at the previous time stamp;

    (c) partitioning the collection of raw market data into a sequence of time period windows, comparing order quantities prior to a time period window to order quantities within the time period window, and determining quantiles for changes in order quantities;

    (d) dividing the determined differences into predefined portions, each of which is characterized by one of a plurality of categories, each category being assigned to the time period window in accordance with the division of the determined differences and the determined quantiles;

    (e) generating a new pre-processed data set comprising the sequence of time period windows, each of which includes a multi-dimensional one-hot binary vector encoding of the plurality of categories representative of each price level and time stamp therein, the new pre-processed data set characterized by a second size less than the first size;

    (f) transmitting the new pre-processed data set as input to a computer system that executes a machine learning algorithm, wherein the execution of the machine learning algorithm includes training a recurrent neural network to identify structure in the pre-processed data and executing a lossy encoded compression to compress the sequence of time period windows to provide a feature mapping from the sequence of time period windows to a feature space, wherein the lossy encoded compression of the sequence removes noise from the sequence of time period windows while retaining the unique features of the feature space; and

    (g) outputting the compressed sequence of time period windows to a display for user interaction.

2. The computer system of claim 1, wherein (c) comprises: selecting a length of the time period windows to reveal patterns in market data.

3. The computer system of claim 1, wherein (c) comprises: selecting a length of the time period windows to reveal structures in market data.

4. The computer system of claim 1, wherein (d) further comprises: classifying changes in order quantities that are large and small increases and decreases.

5. The computer system of claim 4, wherein (d) further comprises: analyzing order quantity changes over multiple windows.

6. The computer system of claim 1, wherein (d) further comprises: assigning a category for each time stamp within a time period window in accordance with the divisions determined in (d).

7. The computer system of claim 6, wherein the categories comprise: large increase in ask order quantity, small increase/decrease in ask order quantity, large decrease in ask order quantity, no order quantity, large decrease in bid order quantity, small increase/decrease in bid order quantity and large increase in bid order quantity, and wherein the categories are represented as a 7-dimensional one hot-binary vector.

8. A computer implemented method comprising:

    (a) receiving, from a client computer via an electronic communication network by a processor of a computer system, a collection of raw market data that includes time stamps, price levels and order quantities, the collection of raw market data characterized by a first size, wherein the processor is coupled with a tangible computer-readable medium containing computer executable instructions executed by the processor to pre-process the collection of raw market data for use by a machine learning computer;

    (b) determining, by the processor, for each time stamp a difference in order quantity at each price level when compared to order quantity at the same price level at the previous time stamp;

    (c) partitioning, by the processor, the collection of raw market data into a sequence of time period windows, comparing order quantities prior to a time period window to order quantities within the time period window, and determining quantiles for changes in order quantities; and

    (d) dividing, by the processor, the determined differences into predefined portions, each of which is characterized by one of a plurality of categories, each category being assigned to the time period window in accordance with the division of the determined differences and the determined quantiles;

    (e) generating, by the processor, a new pre-processed data set comprising the sequence of time period windows, each of which includes a multi-dimensional one-hot binary vector encoding of the plurality of categories representative of each price level and time stamp therein, the new pre-processed data set characterized by a second size less than the first size;

    (f) transmitting, by the processor, the new pre-processed data set as input to a computer system that executes a machine learning algorithm, wherein the execution of the machine learning algorithm includes training a recurrent neural network to identify structure in the pre-processed data set and executing a lossy encoded compression to compress the sequence of time period windows to provide a feature mapping from the sequence of time period windows to a feature space, wherein the lossy encoded compression of the sequence removes noise from the sequence of time period windows while retaining the unique features of the feature space; and

    (g) outputting, by the processor, the compressed sequence of time period windows to a display for user interaction.

9. The computer implemented method of claim 8, wherein (d) further comprises: classifying, by the processor, changes in order quantities that are large and small increases and decreases.

10. The computer implemented method of claim 8, wherein (d) further comprises: assigning, by the processor, a category for each time stamp within a time period window in accordance with the divisions determined in (d), and wherein the categories comprise: large increase in ask order quantity, small increase/decrease in ask order quantity, large decrease in ask order quantity, no order quantity, large decrease in bid order quantity, small increase/decrease in bid order quantity and large increase in bid order quantity, and wherein the categories are represented as a 7-dimensional one hot-binary vector.
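
Steps (b) through (e) and the seven categories of claims 7 and 10 can be sketched roughly as follows. This is an illustrative reading, not the patented implementation. Several details are assumptions the claims do not specify: the `Snapshot` layout (price level mapped to a side and a resting quantity), a single "large change" threshold taken at the 0.75 quantile of nonzero absolute changes over the whole sample rather than per window, the simple rank-based quantile, and the rule that a price level with zero resting quantity maps to "no order quantity". All function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Category order follows the list in claims 7 and 10.
CATEGORIES = [
    "large increase in ask order quantity",
    "small increase/decrease in ask order quantity",
    "large decrease in ask order quantity",
    "no order quantity",
    "large decrease in bid order quantity",
    "small increase/decrease in bid order quantity",
    "large increase in bid order quantity",
]

# Assumed layout: price level -> (side, resting quantity), side is "ask" or "bid".
Snapshot = Dict[int, Tuple[str, int]]


def quantity_deltas(snapshots: List[Snapshot]) -> List[Dict[int, int]]:
    """Step (b): quantity change per price level versus the previous time stamp."""
    deltas = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        prices = set(prev) | set(curr)
        deltas.append(
            {p: curr.get(p, ("", 0))[1] - prev.get(p, ("", 0))[1] for p in prices}
        )
    return deltas


def large_change_threshold(deltas: List[Dict[int, int]], q: float = 0.75) -> int:
    """Step (c), simplified: rank-based q-quantile of nonzero absolute changes."""
    magnitudes = sorted(abs(d) for row in deltas for d in row.values() if d != 0)
    if not magnitudes:
        return 0
    return magnitudes[min(len(magnitudes) - 1, int(q * len(magnitudes)))]


def categorize(side: str, quantity: int, delta: int, threshold: int) -> int:
    """Step (d): map one (time stamp, price level) cell to a category index."""
    if quantity == 0 or side not in ("ask", "bid"):
        return 3  # no order quantity (assumed rule)
    large = delta != 0 and abs(delta) >= threshold
    if side == "ask":
        if large:
            return 0 if delta > 0 else 2
        return 1
    if large:
        return 6 if delta > 0 else 4
    return 5


def one_hot(index: int, size: int = len(CATEGORIES)) -> List[int]:
    """Binary vector with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]


def encode(snapshots: List[Snapshot], q: float = 0.75) -> List[Dict[int, List[int]]]:
    """Step (e), simplified: one 7-dimensional one-hot vector per cell."""
    deltas = quantity_deltas(snapshots)
    threshold = large_change_threshold(deltas, q)
    encoded = []
    for curr, row in zip(snapshots[1:], deltas):
        encoded.append(
            {
                price: one_hot(
                    categorize(*curr.get(price, ("", 0)), row[price], threshold)
                )
                for price in sorted(row)
            }
        )
    return encoded
```

Replacing each raw quantity with one of seven categories is what makes the encoded data coarser than the input; the recurrent-network training and lossy compression of step (f) are outside the scope of this sketch.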

Assignees

Chicago Mercantile Exchange Inc

Inventors

Classifications

  • Learning methods · CPC title

  • Supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Market modelling; Market analysis; Collecting market data · CPC title

  • Clustering or classification · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11704682B2 cover?
Systems and methods for pre-processing data to facilitate efficient and accurate machine learning are provided. The data may include market data. The pre-processing may include partitioning the data into windows and assigning categories to the windows to generate a series of vectors. The series of vectors is then input into a computer system that executes a machine learning algorithm to efficiently tra…
Who is the assignee on this patent?
Chicago Mercantile Exchange Inc
What technology area does this patent fall under?
Primary CPC classification G06Q30/0201. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 18, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).