Deep Learning for Market by Order Data

ABSTRACT Market by order (MBO) data – a detailed feed of individual trade instructions for a given stock on an exchange – is arguably one of the most granular sources of microstructure information. While limit order books (LOBs) are implicitly derived from it, MBO data is largely neglected by current academic literature, which focuses primarily on LOB modelling. In this paper, we demonstrate the utility of MBO data for forecasting high-frequency price movements, providing an orthogonal source of information to LOB snapshots and expanding the universe of alpha discovery. We provide the first predictive analysis on MBO data by carefully introducing the data structure and presenting a specific normalization scheme to consider level information in order books and to allow model training with multiple instruments. Through forecasting experiments using deep neural networks, we show that while MBO-driven and LOB-driven models individually provide similar performance, ensembles of the two can lead to improvements in forecasting accuracy – indicating that MBO data is additive to LOB-based features.


Introduction
High-frequency microstructure data has received growing attention in both academia and industry with the computerisation of financial exchanges and the increased capacity of data storage. The detailed records of order flow and price dynamics provide a granular description of short-term supply and demand, and allow the dynamics of order books to be taken into account during the modelling process. Propelled by the publication of a benchmark dataset (Ntakaris et al. 2018) of high-frequency limit order book (LOB) data, there has been growing interest in research on LOB data. Recent works by Tsantekidis et al. (2017); Sirignano and Cont (2019); Zhang, Zohren, and Roberts (2019a); Briola, Turiel, and Aste (2020) demonstrate that strong predictive performance can be obtained from modelling high-frequency LOB data, with the resulting predictions finding applications in market-making and trade execution, which have short holding periods.
In this work, we introduce Market by Order (MBO) data for predictive modelling with deep learning algorithms. MBO data provides full resolution of the underlying market microstructure, with both LOB data and trade sequences being derived from it. Despite MBO data being the original raw data source, the current literature on high-frequency predictive modelling focuses predominantly on LOBs and, to the best of our knowledge, MBO data has not been used directly for predictive modelling. We show that using MBO data as an additional source of information alongside LOB data improves predictive performance, and that MBO data could inspire a range of meaningful features related to individual order positions.
A LOB is a record of all outstanding limit orders (passive orders) for an instrument at a given time point, sorted into different levels based on submitted prices. At each price level, a LOB only shows the total available quantity. However, any given price level actually consists of many individual orders with different sizes. MBO data is essentially a message-based data feed that allows us to infer the queue position of each individual order by reconstructing the order book step by step. A detailed description of MBO data and how it relates to LOB data is presented in Section 3.
We propose a deep learning model based on MBO data and, in particular, adopt a classification framework to predict stock price movements. In doing so, we provide a complete analysis of MBO data by carefully introducing the data structure and the components of the message-based data feed. A specific data normalisation scheme is introduced to model level information contained in LOBs and to allow model training with multiple instruments. Our dataset consists of MBO data over a period of one year for five highly liquid instruments from the London Stock Exchange. Our testing set contains millions of samples to verify the robustness and generalisation of the results.
In our proposed models, we apply deep learning architectures including LSTMs (Hochreiter and Schmidhuber 1997) and Attention mechanisms (Bahdanau, Cho, and Bengio 2014) to model the dynamics of MBO data for market predictions. Our experiments show consistent and robust results from MBO data that are comparable to models that utilise derived LOB data. We observe that predictive models based on MBO data are complementary to LOB models, and we propose an ensemble approach which yields superior results. As such, MBO data adds diversification to the LOB model and improves prediction performance.
The remainder of the paper is organised as follows. After a short literature review in Section 2, we proceed in Section 3 by introducing MBO data, including data preprocessing, normalisation and labelling. Section 4 presents deep learning architectures. We next describe our experiments and present the results of predicting market movements from MBO data in Section 5. We summarise our findings and discuss promising future extensions in Section 6.

Literature
Research on high-frequency microstructure data remains largely focused on modelling the limit order book (LOB), for which the classical references are O'Hara (1995) and Harris (2003), with a review presented in Gould et al. (2013). However, there is limited work on MBO data in the current literature. NASDAQ (OUCH 2020) and CME Group (CME 2020) provide a preliminary description of MBO data when introducing their exchange matching engines, and the works of Byrd, Hybinette, and Balch (2019); Belcak, Calliess, and Zohren (2020) use MBO data for market simulation to model trading scenarios or to study latency effects. To the best of our knowledge, this paper is the first to use MBO data to predict market movements, filling this gap in the literature by using deep learning models.
Deep learning (Goodfellow et al. 2016) algorithms have been heavily used for prediction from high-frequency microstructure data (Tsantekidis et al. 2017; Sirignano and Cont 2019; Briola, Turiel, and Aste 2020; Wallbridge 2020). In particular, Zhang, Zohren, and Roberts (2018, 2019a,b) apply convolutional neural networks and LSTMs to model the dynamics of LOBs and demonstrate accuracy improvements over linear models. Unlike traditional time-series models (Mills and Mills 1991; Hamilton 2020) or stochastic models (Islam and Nguyen 2020) that assume a parametric process for the underlying time-series, deep learning methods are able to capture arbitrary nonlinear relationships without placing any specific assumptions on the input data. Our experiments also suggest that deep networks deliver better results than linear methods for modelling MBO data.
We investigate deep learning models, including LSTMs (Hochreiter and Schmidhuber 1997) and Attention (Bahdanau, Cho, and Bengio 2014), to model MBO data. Attention is used to address the problem of diminishing performance with long input sequences by utilising information at each hidden state of a recurrent network (Bahdanau, Cho, and Bengio 2014; Dai et al. 2019), and it can be used for constructing multi-horizon forecasting models (Lim and Zohren 2020). Our experiments suggest that networks with a recurrent nature achieve predictive results comparable to state-of-the-art networks trained with LOB data, indicating the potential benefits of using MBO data as an additional data source.

Descriptions of Market by Order Data
In general, exchanges provide high-frequency microstructure data in three tiers, namely L1, L2 and L3, offering increasingly granular information and capabilities:
• Level 1 (L1): L1 shows the price and quantity of the last executed trade and displays the real-time best bid and ask of an order book, also known as quote data;
• Level 2 (L2): L2 data is more granular than L1, showing bids and asks at deeper levels of an order book, and it is commonly referred to as LOB data;
• Level 3 (L3): L3 is essentially the MBO data introduced in this work, and it provides even more information than L2 as it shows non-aggregated bids and asks placed by individual traders.
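The relationship between the three tiers can be illustrated with a minimal sketch. The snippet below is illustrative only: the resting orders, the convention of storing prices as integer hundredths, and the function names `to_l2` and `to_l1` are assumptions made for this example and are not taken from any exchange specification. It covers only the quote part of L1 (best bid and ask), not the last executed trade, and assumes both sides of the book are non-empty.

```python
from collections import defaultdict

# Hypothetical resting orders (an L3 view): (order_id, side, price, size).
# Prices are integer hundredths (7004 means 70.04) to avoid float keys.
orders = [
    (1, "bid", 7002, 500),
    (2, "bid", 7002, 300),
    (3, "bid", 7001, 1000),
    (4, "ask", 7004, 700),
    (5, "ask", 7004, 200),
    (6, "ask", 7005, 400),
]


def to_l2(orders):
    """Aggregate individual orders (L3) into total size per price level (L2)."""
    book = {"bid": defaultdict(int), "ask": defaultdict(int)}
    for _order_id, side, price, size in orders:
        book[side][price] += size
    # Bids are listed from the highest price down, asks from the lowest up.
    bids = sorted(book["bid"].items(), reverse=True)
    asks = sorted(book["ask"].items())
    return bids, asks


def to_l1(bids, asks):
    """Keep only the best bid and best ask (the quote part of L1)."""
    return bids[0], asks[0]


bids, asks = to_l2(orders)
best_bid, best_ask = to_l1(bids, asks)
print(bids)                # [(7002, 800), (7001, 1000)]
print(asks)                # [(7004, 900), (7005, 400)]
print(best_bid, best_ask)  # (7002, 800) (7004, 900)
```

Each step discards information: L2 no longer shows that the 800 shares at the best bid come from two separate orders, and L1 drops all deeper levels.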
In this work, we focus on MBO data, which is a message-based data feed that allows us to observe the individual actions of market participants. Each message is an order instruction that describes the action of a specific trader at a given time point. In what follows, we focus on the essential components of such messages, ignoring certain auxiliary information. Table 1 shows an example of a sequence of MBO data, where:
• Time stamp records the time point when an instruction is given;
• ID shows the unique ID for order identification, which is anonymous to others;
• Type indicates the order type, here limit order (Type = 1) or market order (Type = 2);
• Side indicates whether an order is buy (1) or sell (2);
• Action represents the specific instruction, where 0 means updating the price or size of an existing order, 1 means adding a new order and 2 means cancelling an existing order. If Action = 2, the entries of Side, Price and Size are N/A, as the matching engine is able to identify and cancel the existing order using the unique ID;
• Price shows the price level of the instruction;
• Size shows the size (i.e. number of shares) of the instruction.
A LOB updates whenever a new message arrives from the MBO data. This process is illustrated in Figure 1, where we show how an MBO message affects a LOB. For example, at the top of Figure 1, a new limit order (ID=46280) is added to the ask side of the order book with a price of 70.04 and a size of 7580. The order book updates its status and the new order is added to the corresponding price level. In general, a LOB only shows the total available quantity at each price level, whereas MBO data provides extra information by showing individual behaviour. Although MBO data does not directly indicate which price level an order is added to, our normalisation scheme introduced in the next section allows us to consider this information, so that we not only obtain a smaller input space but also obtain relevant information comparable with LOB data.
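The update process described above can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than an exchange-accurate matching engine: the `Message` and `OrderBook` names are hypothetical, prices are stored as integer hundredths, market orders and executions are ignored, and an update is assumed to keep any field it leaves as N/A unchanged. The order with ID 1 is invented to give the price level a second resting order.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    order_id: int
    action: int                  # 0 = update, 1 = add, 2 = cancel
    side: Optional[int] = None   # 1 = buy, 2 = sell; None when N/A
    price: Optional[int] = None  # integer hundredths, 7004 means 70.04
    size: Optional[int] = None


class OrderBook:
    """Keeps every resting order so that the LOB can be derived on demand."""

    def __init__(self):
        self.orders = {}  # order_id -> (side, price, size)

    def apply(self, msg):
        if msg.action == 1:    # add a new order
            self.orders[msg.order_id] = (msg.side, msg.price, msg.size)
        elif msg.action == 2:  # cancel: the ID alone identifies the order
            self.orders.pop(msg.order_id, None)
        elif msg.action == 0:  # update: fields left as None stay unchanged
            side, price, size = self.orders[msg.order_id]
            self.orders[msg.order_id] = (
                side if msg.side is None else msg.side,
                price if msg.price is None else msg.price,
                size if msg.size is None else msg.size,
            )

    def depth(self, side):
        """Aggregate individual orders into total size per price level."""
        levels = defaultdict(int)
        for s, price, size in self.orders.values():
            if s == side:
                levels[price] += size
        return dict(levels)


book = OrderBook()
book.apply(Message(1, action=1, side=2, price=7004, size=1000))      # hypothetical
book.apply(Message(46280, action=1, side=2, price=7004, size=7580))  # as in the text
print(book.depth(2))  # {7004: 8580}
book.apply(Message(46280, action=2))
print(book.depth(2))  # {7004: 1000}
```

The aggregated view returned by `depth` corresponds to what a LOB shows, while `orders` retains the individual orders that only the MBO feed reveals.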
In addition, the usage of MBO data increases transparency and improves the understanding of order book dynamics without disclosing customer identification. Although we can access the unique order ID, this number is generally assigned sequentially by the exchange matching engine (CME 2020) and a private link is provided to the customer, which keeps identification confidential. Further, unlike LOB data, where we sometimes only view a limited number of price levels, MBO data allows us to observe the entire order book with full-depth information. Such granularity can improve traders' confidence in posting large orders, as they can better evaluate the potential market impact by knowing individual queue positions.

Data Preprocessing and Normalisation
We focus on MBO data that represents limit orders because market orders only account for a tiny percentage of total order flow. Figure 2 illustrates the process of data preprocessing and normalisation. In particular, we process the MBO data for a unique ID as follows:
• Side and Price: Missing values correspond to updates and cancellations, and we fill those with the corresponding values from the original order of that ID;
• Size: Missing values correspond to full cancellation, and we fill those with 0 to indicate that no shares are outstanding after the action;
• Action: We change Action to take values -1, 0 and 1, where -1 means cancelling an order, 0 means updating the price or size of an existing order and 1 means adding a new order;
• Change price and Change size: We add these two new features, calculated as the difference between entries for the price and size of a specific ID, to reflect the intention of increasing or decreasing the position for the given order.
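The per-ID preprocessing steps listed above can be sketched as follows. This is a hedged illustration, not the exact pipeline used in the experiments: the dictionary layout, the `preprocess` function name and the sample messages are hypothetical, prices are integer hundredths, missing Side and Price values are filled from the most recent entry for that ID, and the first entry of an ID is assumed to have a price change of 0 and a size change equal to its full size.

```python
# Raw Action codes (0 = update, 1 = add, 2 = cancel) mapped to -1, 0, 1.
ACTION_MAP = {0: 0, 1: 1, 2: -1}


def preprocess(messages):
    """Fill N/A fields, remap Action and add change features per order ID.

    `messages` is a time-ordered list of dicts with keys
    id, side, action, price, size, where None stands for N/A.
    """
    last = {}  # order id -> (side, price, size) after its latest message
    out = []
    for m in messages:
        prev = last.get(m["id"])
        # Side and Price: fill missing values from the same order ID.
        side = m["side"] if m["side"] is not None else prev[0]
        price = m["price"] if m["price"] is not None else prev[1]
        # Size: a missing value means full cancellation, so fill with 0.
        size = m["size"] if m["size"] is not None else 0
        # Assumption: the first entry of an ID has no earlier price and size 0.
        change_price = price - prev[1] if prev else 0
        change_size = size - prev[2] if prev else size
        out.append({
            "id": m["id"], "side": side, "action": ACTION_MAP[m["action"]],
            "price": price, "size": size,
            "change_price": change_price, "change_size": change_size,
        })
        last[m["id"]] = (side, price, size)
    return out


raw = [  # hypothetical messages for one order: add, update, cancel
    {"id": 7, "side": 2, "action": 1, "price": 7004, "size": 7580},
    {"id": 7, "side": None, "action": 0, "price": 7005, "size": 5000},
    {"id": 7, "side": None, "action": 2, "price": None, "size": None},
]
rows = preprocess(raw)
for row in rows:
    print(row)
```

In this example the update row receives a price change of 1 and a size change of -2580, and the cancellation row receives an Action of -1, a size of 0 and a size change of -5000.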
Figure 1. A slice of a LOB at time t (Top) and the LOB at time t+1 (Bottom), after an MBO message flows in.

Descriptions of Market by Order Data
MBO data is a message-base data feed that provides individual queue position and we can view price and quantity for each individual order.Essentially, it is an order instruction that describes the action of a specific trader at a given time point.In this work, we focus on MBO data that represents limit orders because market orders only account for a tiny percentage of total order flow.Table 1 shows an example of a sequence of MBO data for a given instrument, where: • Time stamp records the time point when an instruction is given; • ID shows the unique ID for order identification which is anonymous to others; • Side indicates which side of the order book the order is placed, where 1 means bid side and 2 means ask side; • Action represents the specific instruction where 0 means updating the price or size for the existing order, 1 means adding a new order and 2 means cancelling an existing order.If Action = 2, the entries of Side, Price and Size are N/A as the match engine will cancel the existing order using the unique ID; • Price shows the price level of the instruction; • Size shows the position of the instruction.(Mills and Mills 1991;Hamilton 2020) or econometric models (Fama and French 1993) that assume a parametric process for the underlying time-series, deep learning methods are able to capture arbitrary nonlinear relationships without placing specific assumption on the input data.Our experiment also suggest that deep networks deliver better results than linear methods for modelling MBO data.
In this work, we investigate recent developed techniques including Attention (Bahdanau, Cho, andBengio 2014) andTransformer (Vaswani et al. 2017) to demonstrate the e↵ectiveness of modelling MBO data.Attention is used to solve the problem of diminishing performance with long input sequence by utilising information at each hidden state of a recurrent network, and Transformer is designed to allow for parallel computing to speed up the training process.The work of Wallbridge (2020) studies Transformers for LOB data and we adopt the Temporal Fusion Transformer (Lim et al. 2019), a novel attention-based architecture designed specifically for time-series, to predict price movements from MBO data.Our experiment shows good predictive results, suggesting the potential ability of modelling MBO data in place of LOB.

Descriptions of Market by Order Data
MBO data is a message-based data feed that provides individual queue positions, so we can view the price and quantity of each individual order. Essentially, each message is an order instruction that describes the action of a specific trader at a given time point. In this work, we focus on MBO data that represents limit orders because market orders only account for a tiny percentage of total order flow. Table 1 shows an example of a sequence of MBO data for a given instrument, where:
• Time stamp records the time point when an instruction is given;
• ID shows the unique ID for order identification, which is anonymous to others;
• Side indicates which side of the order book the order is placed on, where 1 means bid side and 2 means ask side;
• Action represents the specific instruction, where 0 means updating the price or size of an existing order, 1 means adding a new order and 2 means cancelling an existing order. If Action = 2, the entries of Side, Price and Size are N/A as the matching engine will cancel the existing order using the unique ID;
• Price shows the price level of the instruction;
• Size shows the quantity of the instruction.
[Table 1: an example sequence of MBO data with columns Time stamp, ID, Side, Action, Price and Size.]
Market by Order provides individual orders for all price levels (not restricted to the top 10). Each order is assigned an anonymous OrderID for identification and a PriorityID for sorting. Participant queue position can be determined by using the OrderID. Queue position and proper sorting of the book are done using the PriorityID. The PriorityID may change if an order is modified or refreshed, whereas the OrderID is consistent for the life of the order.
Figure 1. A slice of a LOB at time t (top) and the LOB at time t + 1 (bottom).
A LOB updates whenever a new MBO message flows in, and this process is illustrated in Figure 1. In the example, a new limit order (ID=462805645163273214) is added to the ask side of the order book with a price of 70.04 and a size of 3024. The order book updates its status and the new order is added to the right price level. In general, a LOB only shows the total available quantity at each price level, but MBO data provides us with extra information by showing individual behaviours. Although MBO data does not directly indicate which price level the order is added to, our normalisation scheme introduced in the next section allows us to consider this information, so that we not only obtain a smaller input space but also obtain relevant information comparable with LOB data.
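The relationship between MBO messages and the LOB described in this section can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the exchange's matching logic: the `MBOMessage` and `OrderBook` names are hypothetical, queue priority is ignored, update messages are assumed to carry the full new price and size, and the resting order with ID 1 is invented purely so that the aggregation is visible.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional

BID, ASK = 1, 2                # Side encoding from the text
UPDATE, ADD, CANCEL = 0, 1, 2  # Action encoding from the text


@dataclass
class MBOMessage:
    timestamp: str
    order_id: int
    action: int
    side: Optional[int] = None     # N/A (None) when action is CANCEL
    price: Optional[float] = None  # N/A (None) when action is CANCEL
    size: Optional[int] = None     # N/A (None) when action is CANCEL


class OrderBook:
    """Tracks live orders by ID and aggregates them into price levels."""

    def __init__(self) -> None:
        self.orders: Dict[int, MBOMessage] = {}

    def apply(self, msg: MBOMessage) -> None:
        if msg.action == CANCEL:
            # A cancel carries only the ID; drop the order if it is known.
            self.orders.pop(msg.order_id, None)
        elif msg.action in (ADD, UPDATE):
            # Add a new order, or overwrite the price/size of an existing one.
            self.orders[msg.order_id] = msg
        else:
            raise ValueError(f"unknown action {msg.action}")

    def levels(self, side: int) -> Dict[float, int]:
        """Total resting size per price level on one side (the LOB view)."""
        totals: Dict[float, int] = defaultdict(int)
        for order in self.orders.values():
            if order.side == side:
                totals[order.price] += order.size
        return dict(totals)


book = OrderBook()
# Hypothetical ask order already resting in the book at time t.
book.apply(MBOMessage("t", 1, ADD, ASK, 70.04, 1000))
# The new limit order from the example arrives at time t + 1.
book.apply(MBOMessage("t+1", 462805645163273214, ADD, ASK, 70.04, 3024))
print(book.levels(ASK))  # {70.04: 4024}
```

The `orders` dictionary is the MBO view (one entry per order), while `levels` collapses it into the LOB view (one total per price), which shows why the LOB can be derived from MBO data but not the other way round.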
In addition, the use of MBO data increases transparency and improves the understanding of order book dynamics without disclosing customer identification. Although we can access the unique order ID, this number is generally assigned sequentially by the exchange matching engine (CME 2020) and a private link is provided to the customer, which keeps identification confidential. Further, unlike LOB data, where we sometimes only view a limited number of price levels, MBO data allows us to observe the entire order book with full-depth information. Such granularity can improve traders' confidence in posting large orders, as they can better evaluate the potential market impact by knowing individual queue positions.

Data Preprocessing and Normalisation
Since the raw MBO data contains missing entries, we first preprocess the data and then normalise it. Figure 2 illustrates the complete process of data preprocessing and normalisation. We process the MBO data for a unique ID as follows:
• Side and Price: we fill in the missing entries with the previous values;
• Size: we fill in the missing entry with 0 to indicate that this order is cancelled;
• Action: we change Action to take the values -1, 0 and 1, where -1 means cancelling an order, 0 means updating the price or size of an existing order and 1 means adding a new order;
• Change price and Change size: we add these two new features, calculated as the difference between successive entries for the price and size of a specific ID, to reflect the intention of increasing or decreasing positions for the given order.
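The preprocessing rules listed above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the pipeline used in this work: the `preprocess` function and the dictionary keys are hypothetical, messages are assumed to arrive in time order, and Change price and Change size are assumed to be 0 for the first message of an ID, which the text does not specify.

```python
from typing import Dict, List, Optional

# Raw Action codes (0 update, 1 add, 2 cancel) mapped to 0, 1 and -1.
ACTION_MAP = {0: 0, 1: 1, 2: -1}


def preprocess(messages: List[dict]) -> List[dict]:
    """Apply the per-ID filling rules to raw MBO messages in time order."""
    last: Dict[int, dict] = {}  # last processed row for each order ID
    out: List[dict] = []
    for msg in messages:
        prev: Optional[dict] = last.get(msg["id"])
        side, price = msg["side"], msg["price"]
        if prev is not None:
            # Side and Price: carry the previous value of the same ID forward.
            side = prev["side"] if side is None else side
            price = prev["price"] if price is None else price
        # Size: a missing entry becomes 0, marking the order as cancelled.
        size = 0 if msg["size"] is None else msg["size"]
        if prev is None or prev["price"] is None or price is None:
            # Assumption: no change is recorded without a usable previous entry.
            change_price, change_size = 0.0, 0
        else:
            change_price = price - prev["price"]
            change_size = size - prev["size"]
        row = {
            "id": msg["id"],
            "side": side,
            "action": ACTION_MAP[msg["action"]],
            "price": price,
            "size": size,
            "change_price": change_price,
            "change_size": change_size,
        }
        last[msg["id"]] = row
        out.append(row)
    return out


# Hypothetical life cycle of one order: add, reduce size, cancel.
raw = [
    {"id": 7, "side": 2, "action": 1, "price": 70.04, "size": 3024},
    {"id": 7, "side": 2, "action": 0, "price": 70.04, "size": 2000},
    {"id": 7, "side": None, "action": 2, "price": None, "size": None},
]
rows = preprocess(raw)
for row in rows:
    print(row)
```

In this toy sequence the cancel message inherits the side and price of the order it removes, and its Change size of -2000 records that the remaining quantity left the book.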
Bid Ask
3 assumption on the input data.Our experiment also suggest that deep networks deliver better results than linear methods for modelling MBO data.
In this work, we investigate recent developed techniques including Attention (Bahdanau, Cho, andBengio 2014) andTransformer (Vaswani et al. 2017) to demonstrate the e↵ectiveness of modelling MBO data.Attention is used to solve the problem of diminishing performance with long input sequence by utilising information at each hidden state of a recurrent network, and Transformer is designed to allow for parallel computing to speed up the training process.The work of Wallbridge (2020) studies Transformers for LOB data and we adopt the Temporal Fusion Transformer (Lim et al. 2019), a novel attention-based architecture designed specifically for time-series, to predict price movements from MBO data.Our experiment shows good predictive results, suggesting the potential ability of modelling MBO data in place of LOB.

Descriptions of Market by Order Data
MBO data is a message-base data feed that provides individual queue position and we can view price and quantity for each individual order.Essentially, it is an order instruction that describes the action of a specific trader at a given time point.In this work, we focus on MBO data that represents limit orders because market orders only account for a tiny percentage of total order flow.Table 1 shows an example of a sequence of MBO data for a given instrument, where: • Time stamp records the time point when an instruction is given; • ID shows the unique ID for order identification which is anonymous to others; • Side indicates which side of the order book the order is placed, where 1 means bid side and 2 means ask side; • Action represents the specific instruction where 0 means updating the price or size for the existing order, 1 means adding a new order and 2 means cancelling an existing order.If Action = 2, the entries of Side, Price and Size are N/A as the match engine will cancel the existing order using the unique ID; • Price shows the price level of the instruction; • Size shows the position of the instruction.MBO is a message-base data feed that provides individual queue position and we can view price and quantity for each individual order.Essentially, it is an order instruction that describes the action of a specific trader at a given time point.In this work, we focus on MBO that represents limit orders because market orders only account for a tiny percentage of total order flow.Table 1 shows an example of a sequence of MBO for a single security, where: • Time stamp records the time point when an instruction is given; • ID shows the unique ID for trader identification; • Side indicates which side of the order book the order is placed, where 0 means cancelled order, 1 means bid side and 2 means ask side; • Action represents the specific instruction where 0 means updating the price or size for the existing order, 1 means adding a new order and 2 means 
cancelling an old order; • Price shows the price level of the instruction; • Size shows the position of the instruction.A LOB updates whenever there is a new message of MBO flowing in and this process is illustrated in Figure Market by Order provides individual orders for all price levels (not restricted to top 10).Each order is assigned an anonymous OrderID for identification and a PriorityID for sorting.
Participant queue position can be determined by using the OrderID.Queue position and proper sorting of book is done using PriorityID.PriorityID may change if order modified or refreshed, OrderID is consistent for the life of the order.Size time: t time: t + 1 A LOB updates whenever there is a new message of MBO flowing in and this process is illustrated in Figure 1.In the example, a new limit order (ID=462805645163273214) is added to the ask side of the order book with price at 70.04 and size of 3024.The order book updates its status and the new order is added to the right price level.In general, a LOB only shows the total available quantity at each price level but MBO data provides us with extra information by showing individual behaviours.Although, MBO data does not directly indicate which price level the order is added to, our normalisation scheme introduced in the next section allows us to consider this information and we not only obtain a smaller input space but also obtain relevant information comparable with LOB data.
In addition, the usage of MBO data increases transparency and improves the understanding of order book dynamics without disclosing customer identification.Although, we can access to unique order ID but this number is generally assigned sequentially by the exchange match engine (CME 2020) and a private link is provided to the customer, which keeps identification confidential.Further, unlike LOB data where we sometimes only view limited price levels, MBO data allows us to observe the entire order book with full-depth information.Such a granularity can improve traders' confidence in posting large order size as they can better evaluate the potential market MBO is a message-base data feed that provides individual queue position and we can view price and quantity for each individual order.Essentially, it is an order instruction that describes the action of a specific trader at a given time point.In this work, we focus on MBO that represents limit orders because market orders only account for a tiny percentage of total order flow.Table 1 shows an example of a sequence of MBO for a single security, where: • Time stamp records the time point when an instruction is given; • ID shows the unique ID for trader identification; • Side indicates which side of the order book the order is placed, where 0 means cancelled order, 1 means bid side and 2 means ask side; • Action represents the specific instruction where 0 means updating the price or size for the existing order, 1 means adding a new order and 2 means cancelling an old order; • Price shows the price level of the instruction; • Size shows the position of the instruction.A LOB updates whenever there is a new message of MBO flowing in and this process is illustrated in Figure Market by Order provides individual orders for all price levels (not restricted to top 10).Each order is assigned an anonymous OrderID for identification and a PriorityID for sorting.
Participant queue position can be determined by using the OrderID.Queue position and proper sorting of book is done using PriorityID.PriorityID may change if order modified or refreshed, OrderID is consistent for the life of the order.Size time: t time: t + 1 A LOB updates whenever there is a new message of MBO flowing in and this process is illustrated in Figure 1.In the example, a new limit order (ID=462805645163273214) is added to the ask side of the order book with price at 70.04 and size of 3024.The order book updates its status and the new order is added to the right price level.In general, a LOB only shows the total available quantity at each price level but MBO data provides us with extra information by showing individual behaviours.Although, MBO data does not directly indicate which price level the order is added to, our normalisation scheme introduced in the next section allows us to consider this information and we not only obtain a smaller input space but also obtain relevant information comparable with LOB data.
In addition, the usage of MBO data increases transparency and improves the MBO is a message-base data feed that provides individual queue p view price and quantity for each individual order.Essentially, it is a that describes the action of a specific trader at a given time poin focus on MBO that represents limit orders because market orders tiny percentage of total order flow.Table 1 shows an example of a for a single security, where: • Time stamp records the time point when an instruction is • ID shows the unique ID for trader identification; • Side indicates which side of the order book the order is plac cancelled order, 1 means bid side and 2 means ask side; • Action represents the specific instruction where 0 means up size for the existing order, 1 means adding a new order and an old order; • Price shows the price level of the instruction; • Size shows the position of the instruction.A LOB updates whenever there is a new message of MBO flowing is illustrated in Figure Market by Order provides individual orders for all price levels (n 10).Each order is assigned an anonymous OrderID for identificatio for sorting.
Participant queue position can be determined by using the Order and proper sorting of book is done using PriorityID.PriorityID m modified or refreshed, OrderID is consistent for the life of the orde 3 Figure 1.A slice of a LOB at time t (Top).When a message LOB (Bottom) at time t+1.Size time: t time: t + 1 A LOB updates whenever there is a new message is illustrated in Figure 1.In the example, a new lim is added to the ask side of the order book with price book updates its status and the new order is added LOB only shows the total available quantity at eac us with extra information by showing individual does not directly indicate which price level the o scheme introduced in the next section allows us t not only obtain a smaller input space but also obt with LOB data.
(2018, 2019a,b) apply convolutional neural networks and LSTMs to model the dynamics of LOBs and demonstrate promising predictive results. Unlike traditional time-series models (Mills and Mills 1991; Hamilton 2020) or econometric models (Fama and French 1993) that assume a parametric process for the underlying time-series, deep learning methods are able to capture arbitrary nonlinear relationships without placing specific assumptions on the input data. Our experiments also suggest that deep networks deliver better results than linear methods for modelling MBO data.
In this work, we investigate recently developed techniques, including Attention (Bahdanau, Cho, and Bengio 2014) and the Transformer (Vaswani et al. 2017), to demonstrate the effectiveness of modelling MBO data. Attention addresses the problem of diminishing performance on long input sequences by utilising the information at each hidden state of a recurrent network, and the Transformer is designed to allow for parallel computing to speed up the training process. The work of Wallbridge (2020) studies Transformers for LOB data, and we adopt the Temporal Fusion Transformer (Lim et al. 2019), a novel attention-based architecture designed specifically for time-series, to predict price movements from MBO data. Our experiments show good predictive results, suggesting that MBO data can potentially be modelled in place of LOB data.

Descriptions of Market by Order Data
MBO data is a message-based data feed that provides the individual queue position, price and quantity of each order. Essentially, each message is an order instruction that describes the action of a specific trader at a given point in time. In this work, we focus on MBO data that represents limit orders because market orders only account for a tiny percentage of total order flow. Table 1 shows an example of a sequence of MBO data for a given instrument, where:
• Time stamp records the time at which an instruction is given;
• ID is the unique ID used to identify the order, which is anonymous to others;
• Side indicates the side of the order book on which the order is placed, where 1 means the bid side and 2 means the ask side;
• Action represents the specific instruction, where 0 means updating the price or size of an existing order, 1 means adding a new order and 2 means cancelling an existing order. If Action = 2, the entries of Side, Price and Size are N/A as the matching engine cancels the existing order using the unique ID;
• Price shows the price level of the instruction;
• Size shows the position (quantity) of the instruction.
A LOB updates whenever a new message arrives from the MBO data. This process is illustrated in Figure 1, where we show how a market order and three different actions affect a LOB. For example, at the top of Figure 1, a new limit order (ID=46280) is added to the ask side of the order book with a price of 70.04 and a size of 7580. The order book updates its status and the new order is added to the corresponding price level. In general, a LOB only shows the total available quantity at each price level, whereas MBO data provides extra information by showing individual behaviours. Although MBO data does not directly indicate which price level an order is added to, the normalisation scheme introduced in the next section allows us to take this information into account, so that we obtain not only a smaller input space but also information comparable with LOB data.
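The way a stream of MBO messages drives an aggregated LOB can be sketched as follows. This is an illustrative sketch only: the message layout, the function name `apply_message` and the two containers are assumptions, market orders are ignored, and queue priority within a price level is not tracked.

```python
from collections import defaultdict


def apply_message(orders, book, msg):
    """Apply one MBO message.

    orders: order ID -> (side, price, size) of every resting order.
    book:   (side, price) -> total resting size at that level.
    Action codes follow the text: 0 = update, 1 = add, 2 = cancel.
    """
    oid, action = msg["id"], msg["action"]
    if action in (0, 2) and oid in orders:
        # Update or cancel: first remove the order's old contribution.
        side, price, size = orders.pop(oid)
        book[(side, price)] -= size
        if book[(side, price)] == 0:
            del book[(side, price)]
    if action in (0, 1):
        # Add or update: insert the order with its current price and size.
        orders[oid] = (msg["side"], msg["price"], msg["size"])
        book[(msg["side"], msg["price"])] += msg["size"]


orders, book = {}, defaultdict(int)
stream = [
    {"id": 1, "action": 1, "side": 2, "price": 70.04, "size": 3024},
    {"id": 2, "action": 1, "side": 2, "price": 70.04, "size": 500},
    {"id": 1, "action": 0, "side": 2, "price": 70.04, "size": 2000},
    {"id": 2, "action": 2, "side": None, "price": None, "size": None},
]
for message in stream:
    apply_message(orders, book, message)
```

After the four messages only order 1 rests at 70.04, so the aggregated level shows 2000, while the `orders` dictionary still holds the per-order detail that the LOB view discards.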
In addition, the use of MBO data increases transparency and improves the understanding of order book dynamics without disclosing customer identities. Although we have access to the unique order ID, this number is generally assigned sequentially by the exchange matching engine (CME 2020) and a private link is provided to the customer, which keeps identities confidential. Further, unlike LOB data, where we sometimes only see a limited number of price levels, MBO data allows us to observe the entire order book with full-depth information. Such granularity can improve traders' confidence in posting large orders, as they can better evaluate the potential market impact by knowing individual queue positions.

Data Preprocessing and Normalisation
Since the raw MBO data contains missing entries, we preprocess the data before normalising it. We focus on MBO data that represents limit orders because market orders only account for a tiny percentage of total order flow. Removing market orders also allows us to compare models trained with MBO data and LOB data respectively. Figure 2 illustrates the process of data preprocessing and normalisation, which we apply to the MBO data of each unique ID.
Figure 1. An illustration of how MBO data updates a LOB. Top: an addition of a new limit order; Middle top: a cancellation of an existing order; Middle bottom: an update for a partial cancellation; Bottom: a marketable buy limit order that crosses the spread.
We use mid-prices to create labels and adopt the labelling method of Zhang, Zohren, and Roberts (2019a) to classify movements as up, stationary or down. In particular, we define $l_t$ as in Equation (1), where $p_t$ is the mid-price at time $t$. We denote the prediction horizon as $k$; it represents the number of arrivals of MBO data, meaning that we work with tick time instead of clock time. To decide on the label, we compare $l_t$ with a threshold $\alpha$, labelling it as up if $l_t > \alpha$, down if $l_t < -\alpha$ and stationary otherwise. The choice of $\alpha$ is related to the prediction horizon $k$, and we set $\alpha$ for each instrument to obtain a balanced training set. Our choices of $k$ and $\alpha$ are listed in Section 5, where we show that the datasets are balanced under these choices. Note that Equation (1) introduces a smooth labelling that leads to consistent labels, which are better suited to designing trading signals. Zhang, Zohren, and Roberts (2019a) include a more detailed discussion of the effects of different labelling methods, and interested readers are referred to their work.
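The thresholding rule can be illustrated with a short sketch. The exact smoothing used in Equation (1) is not reproduced here; the version below assumes one common smoothed form, the relative change between the mean of the next $k$ mid-prices and the mean of the last $k$ mid-prices, and the function name `smooth_labels` is hypothetical.

```python
def smooth_labels(mid, k, alpha):
    """Label each tick t as +1 (up), 0 (stationary) or -1 (down).

    Assumed smoothing (not necessarily the paper's exact Equation (1)):
        m_minus = mean of the last k mid-prices, including p_t
        m_plus  = mean of the next k mid-prices
        l_t     = (m_plus - m_minus) / m_minus
    Only ticks with a full past and future window receive a label.
    """
    labels = {}
    for t in range(k - 1, len(mid) - k):
        m_minus = sum(mid[t - k + 1 : t + 1]) / k
        m_plus = sum(mid[t + 1 : t + k + 1]) / k
        l_t = (m_plus - m_minus) / m_minus
        labels[t] = 1 if l_t > alpha else (-1 if l_t < -alpha else 0)
    return labels


up = smooth_labels([100.0, 100.0, 101.0, 101.0], k=2, alpha=0.005)
down = smooth_labels([101.0, 101.0, 100.0, 100.0], k=2, alpha=0.005)
flat = smooth_labels([100.0, 100.0, 100.0, 100.0], k=2, alpha=0.005)
```

Because the index `t` counts message arrivals, the horizon `k` is measured in tick time, in line with the text. A larger `alpha` enlarges the stationary class, which is why it has to be tuned per instrument and per horizon to balance the classes.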

Methodology
In this section, we introduce the different deep learning algorithms studied in our work. For a single input of any time-series, we write $x_{1:T}$, where $x_t$ represents the features at time $t$ and $T$ is the length of the sequence, which later corresponds to the lookback length of the input.

Multilayer perceptrons (MLPs)
MLPs are canonical neural network models where a typical network is organised into a series of layers in a chain structure, with each layer being a function of the layer that precedes it. We can define the hidden layer of a MLP as:
$$h^{(l)} = g^{(l)}\left(W^{(l)} h^{(l-1)} + b^{(l)}\right),$$
where $h^{(l)} \in \mathbb{R}^{N_l}$ represents the $l$-th hidden layer with weights $W^{(l)} \in \mathbb{R}^{N_l \times N_{l-1}}$ and biases $b^{(l)} \in \mathbb{R}^{N_l}$. Here $g^{(l)}(\cdot)$ is the activation function that allows networks to model nonlinearities. The final output is a function of the last hidden layer, and we compute objective functions to minimise errors between target outputs and estimates. However, for MLPs, we first need to flatten $x_{1:T}$ and feed it to subsequent hidden layers. Doing this breaks the time dependences and treats features at different time stamps independently. We generally observe inferior results using MLPs and find that recurrent neural networks (RNNs) often deliver better performance. This is because a RNN acts as a memory buffer by summarising past information and recursively updating the hidden state with new observations at each time step of the input (Zhang, Zohren, and Roberts 2020a).
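The forward pass of such a network, including the flattening step, can be sketched in a few lines of plain Python. The function names and the tiny weights are illustrative assumptions; a practical model would use a tensor library.

```python
def dense(W, b, h, g):
    """One hidden layer g(W h + b), with W given as a list of rows."""
    return [g(sum(w * x for w, x in zip(row, h)) + bias)
            for row, bias in zip(W, b)]


def mlp_forward(x_window, layers):
    """Flatten a (T x features) window and pass it through the layers.

    layers is a list of (W, b, g) triples. Flattening turns the window into
    one long vector, so the network has no built-in notion of time order.
    """
    h = [value for step in x_window for value in step]
    for W, b, g in layers:
        h = dense(W, b, h, g)
    return h


def relu(z):
    return max(0.0, z)


x_window = [[1.0, 2.0], [3.0, 4.0]]           # T = 2 steps, 2 features each
layers = [([[1.0, 0.0, 0.0, -1.0],             # 2 hidden units on 4 inputs
            [0.0, 1.0, 1.0, 0.0]], [0.0, 0.5], relu)]
hidden = mlp_forward(x_window, layers)
```

Each hidden unit connects to every feature at every time step, which is the full connectivity between input and hidden units that the text contrasts with recurrent models.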

Long Short-Term Memory (LSTMs)
Standard RNNs suffer from vanishing or exploding gradient problems (Bengio, Simard, and Frasconi 1994), and Long Short-Term Memory networks (LSTMs) were proposed to solve this problem. They do so through a gating mechanism that efficiently controls the propagation of past information (Hochreiter and Schmidhuber 1997). A LSTM updates its hidden state recursively and has a cell state $c_t$ coupled with a series of gates at each hidden state. In mathematical terms, we can write
$$\text{Input gate:} \quad i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right),$$
$$\text{Forget gate:} \quad f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right),$$
$$\text{Output gate:} \quad o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right),$$
where $h_{t-1}$ is the hidden state of a LSTM at time $t-1$ and $\sigma(\cdot)$ represents the sigmoid activation function. We use $W$ and $b$ to represent weights and biases at different gate operations. Subsequently, the current cell state and hidden state can be written as:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c [h_{t-1}, x_t] + b_c\right),$$
$$h_t = o_t \odot \tanh(c_t),$$
where $\odot$ is the element-wise product and $\tanh(\cdot)$ is the hyperbolic tangent activation function. The hidden state $h_t$ summarises the information from past states and current observations, and the gating mechanism efficiently addresses the vanishing gradient problem.
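A single recursive update of a standard LSTM cell can be sketched as below. This is a minimal pure-Python illustration of the textbook formulation; the parameter layout (one weight matrix per gate acting on the concatenation of the previous hidden state and the current input) and the names are assumptions rather than the configuration used in the experiments.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W v + b with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM update; params maps 'i', 'f', 'o', 'c' to (W, b)."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]

    def gate(name, activation):
        W, b = params[name]
        return [activation(a) for a in affine(W, z, b)]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    candidate = gate("c", math.tanh)
    # The forget gate scales the old cell state, the input gate scales the new candidate.
    c_t = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, candidate)]
    h_t = [oj * math.tanh(cj) for oj, cj in zip(o, c_t)]
    return h_t, c_t


# Hidden size 1, input size 1, all weights and biases set to zero.
zero = ([[0.0, 0.0]], [0.0])
params = {"i": zero, "f": zero, "o": zero, "c": zero}
h_t, c_t = lstm_step([0.3], [0.0], [2.0], params)
```

With zero parameters every gate equals 0.5 and the candidate is 0, so the cell state is simply halved. The additive form of the cell update is what lets gradients flow across many time steps.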

Attention Mechanism
The Attention Mechanism (Bahdanau, Cho, and Bengio 2014) is heavily used in machine translation and was proposed to solve the problem of diminishing performance on long input sequences. On the one hand, a LSTM calculates the final output as a function of only the last hidden state. An attention model, on the other hand, uses an additional component called the context vector to assign trainable weights to all the hidden states of an input. For a many-to-one problem, the attention mechanism computes a score $e(t, T)$ for each hidden state $h_t$, where $h_t$ can be the hidden state of a LSTM at time $t$ for an input $x_{1:T}$, and we define the context vector $c_T$ as:
$$\text{Context vector:} \quad c_T = \sum_{t=1}^{T} \alpha(t, T)\, h_t,$$
$$\text{Attention weights:} \quad \alpha(t, T) = \frac{\exp(e(t, T))}{\sum_{t'=1}^{T} \exp(e(t', T))},$$
where $v \in \mathbb{R}^{N_h}$ and $W_h \in \mathbb{R}^{N_h \times N_h}$ are the trainable weights used to compute the scores $e(t, T)$. We can then obtain the attention vector from $c_T$, so that the final output is a function of $c_T$ and takes the information at every hidden state into account.
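The softmax weighting and the context vector can be sketched as follows. The score function is an assumption: the sketch uses an additive-style score $e_t = v^\top \tanh(W_h h_t)$, which matches the stated shapes of $v$ and $W_h$ but is not confirmed to be the exact form used in the paper. The function name is hypothetical.

```python
import math


def attention_context(hidden_states, v, W_h):
    """Return (attention weights, context vector) over all hidden states.

    Assumed score: e_t = v . tanh(W_h h_t). The weights are a softmax over
    the scores and the context vector is the weighted sum of hidden states.
    """
    scores = []
    for h in hidden_states:
        projected = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W_h]
        scores.append(sum(vj * pj for vj, pj in zip(v, projected)))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    size = len(hidden_states[0])
    context = [sum(a * h[j] for a, h in zip(weights, hidden_states))
               for j in range(size)]
    return weights, context


states = [[1.0, 0.0], [3.0, 2.0]]                 # T = 2 hidden states
identity = [[1.0, 0.0], [0.0, 1.0]]
uniform_w, uniform_c = attention_context(states, [0.0, 0.0], identity)
skewed_w, _ = attention_context(states, [1.0, 1.0], identity)
```

With `v` set to zero all scores are equal, so the context vector reduces to the plain average of the hidden states; non-zero `v` shifts weight towards the states with higher scores.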

Descriptions of Datasets
Our datasets consist of MBO data for five highly liquid stocks, Lloyds (LLOY), Barclays (BARC), Tesco (TSCO), BT and Vodafone (VOD), for the entire year of 2018 from the London Stock Exchange. From the MBO data one can derive LOB data, which we use for our benchmarks and for reference prices. Our LOB dataset contains ask and bid information for an order book up to ten levels. For our modelling, we remove messages outside ten levels from the MBO data to align the timestamps of the two datasets, allowing for fair comparisons in the performance analysis. Afterwards, we train two sets of models by separately using the MBO and LOB data with the same targets.
This allows a direct comparison of the predictive performance obtained with the MBO and LOB data respectively. For each trading day, we take the data between 08:30:00 and 16:00:00, restricting ourselves to liquid continuous trading hours and excluding any auctions. Overall, we have more than 169 million samples in our dataset, and we take the first 6 months as training data, the next 3 months as validation data and the last 3 months as testing data. In the context of high-frequency microstructure data, we have more than 46 million observations in our testing set, providing sufficient scope for verifying the robustness and generalisability of model performance.
We test our models at three prediction horizons ($k$ = 20, 50, 100) and list the choices of the label parameter ($\alpha$) in Table 2. We choose $\alpha$ for each instrument to obtain a balanced training set, and the proportion of the different classes is presented in Figure B1 in Appendix B. Overall, the labels are roughly balanced for the testing set as well (noting that the values of $\alpha$ were fixed on the training set). For the lookback window ($T$) of the input, we take the 50 most recent updates of MBO data to form a single input and feed it to our model. Note that we are working with tick time instead of physical clock time. In other words, a time step refers to the arrival of an MBO update. One advantage of working with tick time is that it deals with uneven trading volumes throughout the day. When a market opens with great volatility, we obtain more ticks and the model naturally makes faster predictions.
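Forming model inputs in tick time amounts to pairing each labelled message with the $T$ most recent messages. The helper below is a hypothetical sketch of that windowing; it assumes `features` is a chronologically ordered list of normalised feature rows and `labels` maps a tick index to its class.

```python
def make_windows(features, labels, T=50):
    """Pair each labelled tick t with the T most recent feature rows up to t.

    Ticks that do not yet have T rows of history are skipped. Indices count
    message arrivals (tick time), not seconds.
    """
    samples = []
    for t in sorted(labels):
        if t >= T - 1:
            samples.append((features[t - T + 1 : t + 1], labels[t]))
    return samples


features = [[float(i)] for i in range(10)]   # one feature per tick
labels = {1: 0, 4: 1, 9: -1}
samples = make_windows(features, labels, T=3)
```

Because windows are indexed by message arrivals, a burst of activity produces many windows in a short span of clock time, which is the behaviour described in the text for volatile market opens.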

Training Procedure
For the MBO data, we study the deep learning models (MBO-MLP, MBO-LSTM and MBO-Attention) introduced in Section 4 along with a simple linear model (MBO-LM).
We list the values of the hyperparameters for the different algorithms in Table 3, and gradient descent with the Adam optimiser (Kingma and Ba 2015) is used for training all models. The complete search space of hyperparameters is included in Appendix A, and we use a grid search to select the best hyperparameters.
For the LOB data, we include the 10 levels of a limit order book and the past 50 observations as a single input. We follow the normalisation scheme in Zhang, Zohren, and Roberts (2019a), and both the MBO and LOB datasets share the same predictive targets, allowing a direct comparison between different models. We choose state-of-the-art network architectures as comparison models, including the LOB-LSTM (Sirignano and Cont 2019), LOB-CNN (Tsantekidis et al. 2017) and LOB-DeepLOB (Zhang, Zohren, and Roberts 2019a). The details of the network architectures and choices of hyperparameters can be found in their papers. We also include a linear model (LOB-LM) and a multilayer perceptron (LOB-MLP) as benchmark models.
We use the categorical cross-entropy loss as our objective function, and learning is stopped when the validation loss does not decrease for more than 10 epochs. In general, it takes about 30 epochs to finish model training. TensorFlow and Keras (Girija 2016) are used to build all models, and four NVIDIA GeForce RTX 2080 GPUs are used in our experiments.
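The stopping rule can be made concrete with a small sketch that works on a recorded sequence of validation losses. The interpretation of the patience (stop once 10 consecutive epochs have passed without a new best validation loss) and the function name are assumptions.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the epoch index at which training would stop, or None.

    Training stops at the first epoch that lies `patience` epochs after the
    best validation loss seen so far without having improved on it.
    """
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return None


stop = early_stopping_epoch([1.0, 0.9, 0.95, 0.96, 0.97], patience=3)
never = early_stopping_epoch([1.0, 0.9, 0.8, 0.7], patience=3)
```

In the first sequence the best loss occurs at epoch 1 and no improvement follows, so training stops at epoch 4; in the second the loss keeps falling and the rule never triggers.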

Experimental Results
Table 4 summarises the results for all models studied (different rows) at each prediction horizon. We use four evaluation metrics (different columns) to make comparisons: Accuracy, Precision, Recall and F1-score. Kolmogorov-Smirnov tests (Massey Jr 1951) are used to check the statistical significance of the results, and all differences in evaluation metrics are significant. We observe that the models trained with LOB data are comparable to, but slightly outperform, the ones using MBO data. While a priori MBO data contains more information (contents of level and trades), it is harder to model the raw messages rather than LOBs, which can be seen as derived features of the data. Ensembles of the models, however, improve predictive performance. In particular, Ensemble-MBO-LOB delivers the best performance, indicating the potential benefits of combining the MBO and LOB data.
Since this work aims to study MBO data, we focus on analysing results from the models trained using the MBO data. We can see that the deep learning models outperform the simple linear model, suggesting the existence of nonlinear features in financial time-series and that networks are capable of extracting such features from the raw messages in MBO data. We observe that MBO-MLP delivers inferior results compared to the other networks. This is most likely due to the structure of the MLP, which has full connectivity between input and hidden units, leading MLPs to often underperform other networks in financial applications with a low signal-to-noise ratio. MBO-LSTM and MBO-Attention both have a recurrent structure with parameter sharing that enables hidden states to summarise past information and update their status with current observations. Such a process filters unnecessary input components and naturally models the propagation of order flow. This observation has also been reported by Lim, Zohren, and Roberts (2019) and Zhang, Zohren, and Roberts (2020b,a), who find that networks with a recurrent nature deliver better results than MLPs when modelling financial time-series.
Figure 4 shows the normalised confusion matrices, which help to understand how the models perform at predicting each label class. We calculate the accuracy score for every instrument and for each testing day to understand the consistency of our results. This is summarised in the whisker plots in Figure 5. Each point in the whisker plot represents the accuracy score for one testing day, and the box represents the median and interquartile range of these scores. We can see that MBO-LM and MBO-MLP have large interquartile ranges, suggesting high variance in results, while MBO-LSTM and MBO-Attention show consistent and robust results across the entire testing period. These whisker plots allow us to understand the model performance on a daily basis and to check the generalisability of our methods. In particular, we see that performance is consistent across the entire testing period and not concentrated in a few days, which could be due to noise.
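The two diagnostics described above can be sketched as follows. This is a minimal illustration in plain Python: the function names, the label encoding 0, 1, 2 and the choice to normalise each confusion-matrix row by its true-class count are assumptions, and the daily accuracy values are made up for the example.

```python
from statistics import quantiles


def normalised_confusion_matrix(y_true, y_pred, labels=(0, 1, 2)):
    """Confusion matrix whose rows (true classes) each sum to 1.

    Row normalisation is an assumption of this sketch; rows with no
    samples are returned as all zeros.
    """
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def daily_accuracy_summary(daily_accuracies):
    """Median and interquartile range of per-day accuracy scores,
    i.e. the quantities drawn as the box of a whisker plot."""
    q1, median, q3 = quantiles(daily_accuracies, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Toy inputs for illustration only; these are not results from the paper.
cm = normalised_confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
summary = daily_accuracy_summary([0.60, 0.62, 0.64, 0.66, 0.68])
```

A narrow interquartile range in `summary` corresponds to the consistent day-to-day behaviour reported for MBO-LSTM and MBO-Attention, while a wide one corresponds to the higher variance seen for MBO-LM and MBO-MLP.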

Conclusion
In this work we introduce deep learning models for Market by Order (MBO) data. To the best of our knowledge, this is the first study of predictive modelling of MBO data using data-driven techniques in the academic literature. Current academic research in this direction is primarily focused on LOB data, and we hope that this work helps to popularise the usage of MBO data, which we see as the next frontier in microstructure modelling in financial data science. We carefully introduce the structure of MBO data and demonstrate a specific normalisation scheme that allows model training with multiple instruments using deep learning. We consider a wide range of deep learning architectures including MLP, LSTM and attention layers. Our dataset consists of millions of samples for highly liquid instruments from the London Stock Exchange, ensuring the consistency and generalisability of our methods.
We compare models trained using MBO and LOB data respectively. We show that we can obtain similar, but slightly inferior, performance by modelling raw MBO messages when compared to modelling LOB data. While MBO data a priori contains more information, it is harder to model the raw messages than LOBs, which can be seen as derived features of the data. Importantly, we show that our models can extract additional information from the MBO data which is not captured by models trained on LOB data. This means that they add value, as we demonstrate with an ensemble approach that combines signals from the MBO and LOB data and delivers the best performance.
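To make the idea of combining signals concrete, the following minimal sketch averages the class probabilities of an MBO-driven and a LOB-driven model and predicts the class with the largest combined probability. The combination rule of Ensemble-MBO-LOB is not restated in this section, so weighted probability averaging, the function name `ensemble_predict` and the parameter `mbo_weight` are all assumptions made for illustration.

```python
def ensemble_predict(mbo_probs, lob_probs, mbo_weight=0.5):
    """Combine per-sample class probabilities from two models.

    Weighted averaging is an assumed combination rule for this sketch,
    not necessarily the one used by Ensemble-MBO-LOB.
    """
    if len(mbo_probs) != len(lob_probs):
        raise ValueError("both models must score the same number of samples")
    if not 0.0 <= mbo_weight <= 1.0:
        raise ValueError("mbo_weight must lie in [0, 1]")

    predictions = []
    for p_mbo, p_lob in zip(mbo_probs, lob_probs):
        combined = [
            mbo_weight * a + (1.0 - mbo_weight) * b
            for a, b in zip(p_mbo, p_lob)
        ]
        # Index of the largest combined probability is the predicted class.
        predictions.append(max(range(len(combined)), key=combined.__getitem__))
    return predictions


# Toy probabilities for illustration only; these are not model outputs.
mbo_probs = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]
lob_probs = [[0.1, 0.7, 0.2], [0.1, 0.2, 0.7]]
ensemble_labels = ensemble_predict(mbo_probs, lob_probs)
```

In the first toy sample the two models disagree (the MBO model favours class 0, the LOB model class 1) and the average resolves the disagreement in favour of the more confident model. An ensemble can only improve on its members when their errors are not perfectly correlated, which is consistent with the finding that MBO data carries information not captured by LOB-based models.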
In future work, MBO data can be applied to various financial applications including market-making and trade execution. Further, the work of Briola et al. (2021) applies Reinforcement Learning (RL) algorithms to high-frequency trading, and it would be interesting to test the effectiveness of using MBO data within an RL framework.

Figure 1. A slice of a LOB at time t (Top). When a message of MBO flows in, we have a new update for LOB (Bottom) at time t+1.

Table 1. An example of a sequence of market by order data.
