Mission Possible

April 17, 2023

Why a Pre-Trade Tape Works and Needs to be Adopted in Trilogue

EU policymakers are gathering this week to begin finalising the MiFIR review, and a crucial decision they will need to make is the type of consolidated tape (CT) that will be mandated for the EU equities market.

This is a golden opportunity to deliver what the industry needs to support growth in EU capital markets. But the debate has been marked by claims and counter-claims about what flavour of CT is technically possible and would furnish retail and professional investors with reliable and useful data, thereby encouraging greater participation in EU equity markets.

Here are the facts.

As the chart below clearly shows, the slower the tape, the more misleading it becomes.

A real-time pre-trade CT, including an EBBO, is technically viable and would provide useful price guidance to investors, despite the latency introduced by geography and consolidation:

For the vast majority of stocks in the EU, an EBBO would accurately reflect the current bid/offer prices more than 98% of the time.
The impact of geographic latency is felt more strongly in the minority of most active/volatile stocks, but at its most severe still results in an EBBO that accurately reflects current prices 97% of the time.
A CT EBBO that is a reliable source of the current price for most of the time would be of huge value to investors of all types, by significantly simplifying access to stock prices and opening up investment in European listed companies to a much more global audience.

Alternative proposals advocated by FESE, or proposed by the European Council, would result in the publication of misleading market data of no use to any investor:

Post-trade EBBO “snapshot” data, where an EBBO quote is published only with each trade, reflecting the prevailing EBBO immediately before the trade (as proposed by the European Council and now endorsed by FESE), would see less than a quarter of all EBBO price changes published, and would be an accurate guide to the current price only 14% of the time, hence painting a misleading picture.
15-minute delayed data, and 1-minute delayed data are both essentially useless as a guide to the current price, as they are accurate only 2% and 20% of the time respectively.

These results suggest that the EU (and subsequently the UK) needs to decide between the only credible option – a real-time pre-trade CT that provides reliable price data to consumers - and one that provides misleading or compromised data, or no pre-trade data whatsoever, and hence is a near-guaranteed failure.

Background

What is “Geographic Latency”?

Market data latency is the delay between an event on a venue (a trade, or a quote update), and when that event is reflected in the market data you receive.
The geographic component of that latency relates to the transmission time between the data centre where the exchange is hosted, and your systems – which is a function of the physical distance the signal must travel.
Venues and customers choose where to locate their computing infrastructure. If your infrastructure is not co-located in the same data centre as a venue, then transmission latency for the data to cover the distance is an unavoidable reality.
Depending on how far away your own technology infrastructure is from each exchange, you may be exposed to geographic latency of 1, 5, 10, or 20 milliseconds (thousandths of a second).
So, for somewhere between 1 and 20 milliseconds after each exchange event, your systems may have a “stale” or “latent” view of the actual market price at which you can trade.
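To give a feel for how distance translates into milliseconds, here is a minimal sketch. It assumes a signal travels through optical fibre at roughly two thirds of the speed of light in a vacuum. The function name and the route lengths are hypothetical and chosen only for illustration; real network routes are longer than straight-line distances and add switching delay.

```python
# Rough one-way propagation delay over an optical-fibre route.
# Assumption: light in fibre travels at about 2/3 of its vacuum speed.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBRE_SPEED_FACTOR = 2.0 / 3.0  # assumed, typical order of magnitude


def one_way_latency_ms(route_km: float) -> float:
    """Return the one-way transmission time in milliseconds."""
    if route_km < 0:
        raise ValueError("route_km must be non-negative")
    fibre_speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S * FIBRE_SPEED_FACTOR
    return route_km / fibre_speed_km_per_s * 1000.0


if __name__ == "__main__":
    # Hypothetical route lengths, not measurements between real data centres.
    for route_km in (200, 1000, 2000, 4000):
        print(f"{route_km:>5} km -> {one_way_latency_ms(route_km):.1f} ms one way")
```

On these assumptions, every 1,000 km of fibre adds roughly 5 milliseconds in each direction, which is consistent with the 1 to 20 millisecond range described above.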

How does a Consolidated Tape affect geographic latency?

Most market participants already experience a degree of geographic latency, because their systems are not co-located with each and every venue. Typically, they centralise their infrastructure in one data centre, and must transmit market data for each venue to that data-centre. The further away the venue, the greater the latency (from the perspective of the participant) in that venue’s market data feed.
Once a Consolidated Tape is introduced, an extra transmission-hop is added. Instead of data travelling directly from each venue to the participant, it would first be transmitted to the data centre hosting the Consolidated Tape infrastructure, after which consolidated data would be transmitted to market participants.
Depending on the sensitivity to latency (for a particular participant, and a particular use case), it might be acceptable to rely on the (more latent) Consolidated Tape, or preferable to rely on (probably faster but more expensive) direct feeds from each venue.

What is the potential impact of latency on the reliability of a Consolidated Tape EBBO?

As discussed above, every venue-level event that drives a change in the consolidated European Best Bid (EBB) or European Best Offer (EBO) means that for a short period of time, the published EBBO is “stale” or “latent”.
Everybody agrees that a CT is not intended to serve latency-sensitive use cases (such as order placement by a Smart Order Router), but depending on the number of these updates to the EBB and EBO, it’s possible that in aggregate they render the CT EBBO unreliable as a source of the current price a material proportion of the time, even for non-latency-sensitive use cases (such as a retail investor seeing a price on their mobile phone).
If this were true, then the CT EBBO would be less useful, and potentially even counterproductive; for example, as an online retail broker, would you want your customers to see a price onscreen, but then be unable to execute at that price a significant proportion of the time?
Alternatively, a CT EBBO that is a reliable source of the current price for most of the time would be of huge value to investors of all types, by significantly simplifying access to stock prices and opening up investment in European listed companies to a much more global audience.

Measuring the Impact of Latency

How can this impact be measured?

To measure the extent to which geographic latency renders the EBBO quote stale/latent, one needs to consider how many venue-events would result in an altered EBBO, and for each such event, calculate (and sum up) the subsequent period during which the EBBO is latent (i.e. would not yet reflect the impact of that event).

What’s the wrong way to measure it?

One exchange – which (along with other EU exchanges) has consistently opposed a pre-trade consolidated tape – recently declared that “physics makes a pre-trade tape impossible” for EU shares and ETFs. To evidence and quantify its argument, it did the following arithmetic:
In a given instrument, and on a selected date, there are “N” events across the competing venues that contribute to a change in the price of either the European Best Bid (EBB) or European Best Offer (EBO).
After each such quote update, the CT quote is “latent” (or unreliable/wrong) for X milliseconds, because it would not yet reflect the impact of the most recent update.
They estimate X depending on the choice of stock and location of the primary exchange relative to an assumed “central” location of the Consolidated Tape infrastructure (roughly speaking they double the geographic latency between the primary venue and the CT).
So they estimate the total time that a CT is “latent” by multiplying together the total number of updates N by the X milliseconds.
And for the selected instrument in their example, they arrive at the conclusion that the CT is potentially “latent” for more than 80% of the trading day.
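The arithmetic described above amounts to a single multiplication. The sketch below is our own rendering of it, not the exchange's code. The function name is hypothetical, the cap at 100% is our addition (the raw product can exceed the length of the day), and the input figures are purely illustrative.

```python
def naive_latent_fraction(n_updates: int, x_ms: float, day_ms: float) -> float:
    """Estimate the latent share of the day as N * X / day length.

    This treats every update as causing a full X ms of staleness,
    which is only true if no two updates fall within X ms of each other.
    """
    if day_ms <= 0:
        raise ValueError("day_ms must be positive")
    return min(n_updates * x_ms / day_ms, 1.0)  # cap is our assumption


if __name__ == "__main__":
    # Illustrative figures only: 10 updates, 15 ms each, in a 200 ms "day".
    share = naive_latent_fraction(n_updates=10, x_ms=15.0, day_ms=200.0)
    print(f"Naive estimate: latent {share:.0%} of the day")
```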

Why is this approach wrong?

This simple estimation technique contains a hidden (and wrong) assumption – which is that the venue events (trades and quote changes) are evenly distributed.

To illustrate using a simple example – with just 10 updates (N=10) in a 200 millisecond “day”, and the CT is subject to Xms of geographic latency (X=15ms), the chart below indicates that after each quote update from a contributing venue, the CT EBBO is stale for the following 15ms.

The x-axis is the time in milliseconds
Events from a contributing venue causing an EBBO change are shown in green
For 15ms after each update the EBBO is latent, and shown in red
After 15ms passes from the last update, the EBBO is valid, and shown in blue, until the next update from a contributing venue
As you can see, in a stock where the updates are (reasonably) evenly distributed, then the EBBO can be latent a high proportion of the day

But – are the quote updates from contributing venues typically evenly distributed as the methodology assumes/implies? No…

The below histogram shows an actual (and typical) distribution of events for a single stock that was used in the study. As you can see, they are “bunched”, mostly coming in a flurry, and mostly less than a quarter of a millisecond apart from the prior trade/update.

And – taking our simplified example - look what happens to the proportion of time the EBBO is latent if we reflect the same number of quote updates, but happening in quick succession to one another as happens in reality

Reflecting a more realistic dispersion of quote updates, the EBBO is now valid (not latent) most of the time.
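The simplified example can be reproduced in a few lines. This is a sketch under stated assumptions: the event times are invented to mimic the two patterns (evenly spaced versus bunched), and the function name is hypothetical. The stale time after each update is counted as X ms, or as the gap to the next update if that is shorter.

```python
def latent_time_ms(event_times_ms, x_ms, day_end_ms):
    """Total time the EBBO is stale over the day.

    Each event makes the EBBO stale for x_ms, or until the next event
    if that arrives sooner (so overlapping periods are not double counted).
    """
    times = sorted(event_times_ms)
    boundaries = times[1:] + [day_end_ms]
    return sum(min(nxt - t, x_ms) for t, nxt in zip(times, boundaries))


DAY_MS = 200.0  # the 200 millisecond "day" from the example
X_MS = 15.0     # geographic latency assumed in the example

# Invented timestamps: ten updates 20 ms apart vs. ten updates 0.25 ms apart.
even_events = [i * 20.0 for i in range(10)]
bunched_events = [50.0 + i * 0.25 for i in range(10)]

even_latent = latent_time_ms(even_events, X_MS, DAY_MS)
bunched_latent = latent_time_ms(bunched_events, X_MS, DAY_MS)

if __name__ == "__main__":
    print(f"Evenly spaced: latent {even_latent / DAY_MS:.1%} of the day")
    print(f"Bunched:       latent {bunched_latent / DAY_MS:.1%} of the day")
```

With the same number of updates and the same latency, the latent share of the day falls sharply once the updates arrive in a flurry, because most of the stale periods overlap.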

So, what’s the right way to measure it?

Rather than crudely estimate the impact of latency (multiplying N events by Xms) – which assumes an even distribution of events, one actually needs to measure the latency based on the actual distribution of events.

To do this, you have to take each and every venue-event that results in a change to the EBBO, and then count the subsequent time (in milliseconds or microseconds) during which the published CT EBBO is latent or valid.

To illustrate, let’s assume that the period of latency following each event is X = 10ms (which is roughly twice the transmission time between Frankfurt and Paris):

For two events that are relatively far apart in time, the CT EBBO will be latent for 10ms following the first event, and valid thereafter.

Example 1: (X = 10ms, and the duration between two events is 1 second)
We determine that the EBBO would be stale for the first 10ms of the 1000ms duration
We determine that the EBBO would be valid for the remaining 990ms of the 1000ms duration

For two events in quick succession, the CT EBBO will be latent for less time following the first event.

Example 2: (X = 10ms, and the duration between two events is 5ms)
We determine that the EBBO would be stale for the entire 5ms of the 5ms duration
There would be no period following this event during which the EBBO would be valid… instead (and to avoid any double counting) we re-start the clock looking at what happens after the second event.
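The two examples follow the same rule, which can be written as a small function. This is a minimal sketch; the function name is hypothetical.

```python
from typing import Tuple


def split_duration(duration_ms: float, x_ms: float) -> Tuple[float, float]:
    """Split the time between two events into (latent, valid) milliseconds.

    The EBBO is stale for the first x_ms after an event, or for the whole
    gap if the next event arrives before x_ms has elapsed.
    """
    latent = min(duration_ms, x_ms)
    valid = max(duration_ms - x_ms, 0.0)
    return latent, valid


if __name__ == "__main__":
    print(split_duration(1000.0, 10.0))  # Example 1: events 1 second apart
    print(split_duration(5.0, 10.0))     # Example 2: events 5 ms apart
```

By construction the latent and valid parts always add up to the original duration, so no time is counted twice.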

Detailed Methodologies

We take composite EBBO data for a single day from the vendor BMLL Technologies. This source data includes every trade and every EBBO change across all major venues, with the original venue timestamp.

To generate representative results, we have selected the constituents of four EU indices:

The DE40 index of the most liquid “large-cap” German stocks
The DE M50 index of the next 50 “mid-cap” stocks
The FR40 index of the most liquid “large-cap” French stocks
The FR M20 index of the next 20 “mid-cap” stocks

We calculate metrics on EBBO reliability for each of these instruments, and then calculate an index and index-subset averages.

Whilst the constituents of these indices represent only a small fraction of the 6,000 instruments that are traded competitively across venues, the results will be illustrative of patterns that hold true for liquid and mid-cap index constituents across all EU markets. Any trends in the results (e.g. a higher proportion of valid time for less active stocks) can be safely extrapolated to other less active instruments outside of these indices.

Calculating the Impact of Geographic Latency

To calculate the impact of geographic latency on a real-time or near real-time EBBO:

For each instrument we gathered all of the market data events/updates that cause a change to the EBB or to the EBO price – these are a mix of new orders setting a new EBB/EBO, and Trades or Cancellations eliminating an existing EBB/EBO.

We then calculate a “duration” as the time between each event and the subsequent event for the instrument – using the most granular timestamps available (millionths of a second or finer).

We assume, due to geographic distances, that the EBBO would be “latent” or “invalid” for a period of X milliseconds after each such event (i.e. for the first Xms of the duration), and valid thereafter (i.e. for the remainder of the duration) – unless another EBB/EBO update happens before the X milliseconds expire, in which case the entire duration following the event is considered “latent”.

So – to calculate a full-day result for a particular value of X, we sum two formulas across the trading day for each instrument:

Latent/Invalid time: the Sum of Minimum (Duration, X ms) across all events – i.e. it’s either the first Xms after each event, or the entire event duration, whichever is smaller
Valid time: the Sum of Maximum (Duration – X ms, 0) across all events – i.e. it’s what’s left of each update’s duration after Xms (but never negative)
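The two sums can be sketched as follows for a single instrument. The function names and the event timestamps are hypothetical, and integer microseconds are used to avoid rounding. As a simplification, the last event's duration runs to the end of the day, and any time before the first event is not counted.

```python
from typing import Dict, Iterable, List, Tuple


def event_durations_us(event_times_us: Iterable[int], day_end_us: int) -> List[int]:
    """Time from each EBBO-changing event to the next one (or to the close)."""
    times = sorted(event_times_us)
    return [nxt - t for t, nxt in zip(times, times[1:] + [day_end_us])]


def latent_and_valid_us(durations_us: Iterable[int], x_us: int) -> Tuple[int, int]:
    """Return (sum of min(duration, X), sum of max(duration - X, 0))."""
    latent = 0
    valid = 0
    for duration in durations_us:
        latent += min(duration, x_us)
        valid += max(duration - x_us, 0)
    return latent, valid


# Invented timestamps (microseconds since the open) for a 10-second "day".
events_us = [0, 200, 450, 2_000_000, 2_004_000, 9_000_000]
day_end_us = 10_000_000
durations = event_durations_us(events_us, day_end_us)

# Evaluate several latency assumptions, expressed in milliseconds.
results: Dict[int, Tuple[int, int]] = {
    x_ms: latent_and_valid_us(durations, x_ms * 1000) for x_ms in (7, 15, 50, 1000)
}

if __name__ == "__main__":
    for x_ms, (latent, valid) in results.items():
        print(f"X = {x_ms:>4} ms: latent {latent:>9} us, valid {valid:>9} us")
```

For every value of X the two sums add up to the total of the durations, which is a useful sanity check on any implementation.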

We compute these results for different values of X

X = 7ms (approximately twice the geographic latency between Frankfurt and Zurich, reflecting the “round trip time” in both directions with time for consolidation)
X = 15ms (approximately twice the average value for geographic latency amongst EU markets/participants, reflecting the “round trip time” for data transmission in both directions, plus time for the actual consolidation)
X = 50ms (a very conservative estimate, relevant perhaps to the example of a Scandinavian stock being traded by a Spanish participant)
X = 1000ms (1 second) – included to illustrate the sensitivity to additional latency
X = 60,000ms (1 minute) and X = 900,000ms (15 minutes) – both included because FESE previously advocated for a CT delayed by 1 or 15 minutes "to avoid publication of a misleading quote"

For each instrument and value of X used, we then divide the above summed results for Latent and Valid time respectively by the total length of the trading day (approximately 30.6 million milliseconds: 8.5 hours x 60 minutes x 60 seconds x 1,000 milliseconds) to express the results as a simple percentage of the day during which the EBBO is latent/valid.

Having calculated the results independently for each security, we then calculate the index-level average as a simple average across the constituents
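These last two steps (percentage of the day, then a simple average across constituents) can be sketched as below. The instrument identifiers and valid-time totals are invented for illustration and are not results from the study; the function names are hypothetical.

```python
from statistics import mean
from typing import Dict

# 8.5 hours x 60 minutes x 60 seconds x 1,000 milliseconds
TRADING_DAY_MS = 8.5 * 60 * 60 * 1000


def valid_pct(valid_ms: float, day_ms: float = TRADING_DAY_MS) -> float:
    """Express valid time as a percentage of the trading day."""
    return 100.0 * valid_ms / day_ms


def index_average(pct_by_instrument: Dict[str, float]) -> float:
    """Simple (equal-weighted) average across index constituents."""
    return mean(pct_by_instrument.values())


# Invented per-instrument valid-time totals in milliseconds.
valid_ms_by_instrument = {"AAA": 30_294_000, "BBB": 30_447_000, "CCC": 30_569_400}
pct_by_instrument = {
    name: valid_pct(valid_ms) for name, valid_ms in valid_ms_by_instrument.items()
}
index_valid_pct = index_average(pct_by_instrument)

if __name__ == "__main__":
    for name, pct in pct_by_instrument.items():
        print(f"{name}: valid {pct:.1f}% of the day")
    print(f"Index average: {index_valid_pct:.2f}%")
```

A simple average gives every constituent equal weight, so a handful of very active stocks cannot dominate the index-level figure.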

Assessing the Post-Trade EBBO Snapshot model

Under this proposal, an EBBO is only disseminated when a trade happens – so there are many fewer published EBBO updates. So we calculate two different measures:

First, we count the total number of EBBO price changes, and then for each one we determine if it is published. An EBBO will be published only if a trade¹ occurs prior to the quote changing. We can then determine what proportion of EBBOs are/are not published.

Second, we measure the valid duration of the EBBO updates that are published. For the sake of simplicity, we ignore latency in this example, which may flatter the results a little.

Specifically, for each trade we look at the immediately-prior prevailing quote that would be included on the “snapshot”, and sum the duration until either:

A change to either the EBB or EBO (which could even be an instantaneous consequence of the trade) – in which case the published EBBO snapshot is no longer valid
The next Trade - in which case we re-start the clock

Having summed this valid duration by security, we divide by the total length of the trading day to express the result as a simple percentage.

Again, having calculated the two results independently for each security, we then calculate the index-level averages as a simple average across the constituents.
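Both snapshot measures can be sketched for a single instrument as follows. This is an illustration under our own assumptions rather than the production analysis: function names and timestamps are hypothetical, latency is ignored as in the text, a trade at the same timestamp as an EBBO change is treated as having caused that change (so its snapshot is stale immediately), and the last snapshot of the day is held valid until the close unless the EBBO changes first.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def published_share(quote_changes: Sequence[float], trades: Sequence[float],
                    day_end: float) -> float:
    """Share of EBBO states that appear on at least one trade snapshot."""
    quotes = sorted(quote_changes)
    trade_times = sorted(trades)
    published = 0
    for start, end in zip(quotes, quotes[1:] + [day_end]):
        # The quote set at `start` is the prevailing one for trades in (start, end].
        if bisect_right(trade_times, end) > bisect_right(trade_times, start):
            published += 1
    return published / len(quotes) if quotes else 0.0


def snapshot_valid_time(quote_changes: Sequence[float], trades: Sequence[float],
                        day_end: float) -> float:
    """Total time during which the latest published snapshot is still correct."""
    quotes = sorted(quote_changes)
    trade_times = sorted(set(trades))
    total = 0.0
    for i, trade in enumerate(trade_times):
        next_trade = trade_times[i + 1] if i + 1 < len(trade_times) else day_end
        j = bisect_left(quotes, trade)  # first EBBO change at or after the trade
        next_change = quotes[j] if j < len(quotes) else day_end
        total += max(min(next_trade, next_change) - trade, 0.0)
    return total


# Invented timestamps in milliseconds for a 1-second "day".
quote_times = [0.0, 100.0, 250.0, 400.0, 900.0]
trade_times_ms = [150.0, 400.0, 600.0]
DAY_END_MS = 1000.0

share = published_share(quote_times, trade_times_ms, DAY_END_MS)
valid_ms = snapshot_valid_time(quote_times, trade_times_ms, DAY_END_MS)

if __name__ == "__main__":
    print(f"EBBO states published: {share:.0%}")
    print(f"Snapshot valid for {valid_ms / DAY_END_MS:.0%} of the day")
```

In this toy data the trade at 400ms coincides with an EBBO change, so its snapshot contributes no valid time at all, which is the effect described above for thinly traded stocks.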

¹ For this analysis we only count Lit orderbook trades; whilst these represent the significant majority, it's likely that also giving consideration to other trades (RPW, periodic auction) might increase the count.

Full Results

Here we present detailed results for each methodology, across each of the four indices and subsets within them.

Real-Time Pre-Trade EBBO results

The above chart can be read as follows:

For the DE40 index as a whole (i.e. the average across its constituents):

7ms of latency results in the EBBO being valid 99.5% of the time
15ms of latency results in the EBBO being valid 99.1% of the time
50ms of latency results in the EBBO being valid 97.5% of the time
1 second of latency results in the EBBO being valid only 74.6% of the time
1 minute of latency results in the EBBO being valid only 2.1% of the time
15 minutes of latency results in the EBBO being valid 0% of the time, i.e. always wrong

From the three additional charts in the Appendix for the DEM50, FR40 and FRN20, the corresponding numbers are

Additionally, the charts include results for the most, median and least active (defined by the total number of EBBO updates per day) group of three instruments within each index

In conclusion, for a real-time pre-trade EBBO the results demonstrate:

In aggregate, a real-time pre-trade EBBO would be highly reliable even after unavoidable geographic latency is taken into consideration.

Crucially - the slower the tape, the more misleading it will be.

There is a clear trend, whereby a near-realtime EBBO (latent by anything up to 50ms) is increasingly reliable for less active instruments. Hence, for the vast majority of instruments not included within the large-cap or mid-cap indices, the EBBO would be reliable in excess of 99% of the time.
Imposing an artificial delay of 1 second, 1 minute or 15 minutes renders the EBBO data unreliable/misleading.

Historical EBBO Snapshot results

The first row of the above table, for the DE40 index of most liquid German stocks, can be read as follows:

On average across the 40 index constituents, the post-trade EBBO snapshot model results in only 23.4% of EBBO quotes actually being published, whilst those EBBO snapshots that are published are a valid guide to the current price only 11.8% of the time.

For the most active constituent instruments, the published snapshots would be valid only 6.9% of the time.
For the least active constituent instruments, the published snapshots would be valid only 11.7% of the time.
At the lowest end of the spectrum is an instrument for which the post-trade EBBO snapshot model results in a published quote that is valid only 4.3% of the time
At the best end of the spectrum is a constituent for which the post-trade EBBO snapshot model results in a published quote that is valid only 28.3% of the time

Unlike for a real-time pre-trade EBBO, where the impact of latency differed according to the liquidity/activity of a stock, this post-trade snapshot model is more uniformly poor.

Across both large-cap and mid-cap indices, the average validity of snapshot EBBOs as a guide to the current price is below 20%, and even for the least active instruments is no better than 33%. This can be explained by two phenomena:

In liquid/active stocks, even if the snapshot was correct at the time of publication, the high frequency of quote updates means that a published snapshot EBBO is quickly stale due to a change in the (unpublished) EBBO
In less active stocks, the chance is much higher that a snapshot is already wrong/stale at the point of publication. Thinner liquidity in the stock means it’s more likely that each trade causes the EBBO to change, hence most published snapshots are wrong before they are even published.

In conclusion, the results demonstrate:

The post-trade EBBO snapshot model produces EBBO quotes that are completely useless as a source of price transparency, and hence will confuse/mislead investors.
There will be little or no demand for this data for use by retail investors, for use in risk measurement, or in any use case that requires an intra-day EBBO – and hence a CTP will not be commercially viable.
Following this approach will most likely lead to an expensive failure, and the only winners would be existing exchanges who would prefer to retain full control over market data licensing/pricing than to support a CMU initiative designed to stimulate growth.

Appendix 1 - Additional Results

Appendix 2 - Responding to Myths about the CT

Myth #1: A CT with real-time pre-trade data would disadvantage or confuse retail investors

The opposite is true. The above analysis demonstrates that the slower the feed, the more unreliable and misleading the data becomes.

Myth #2: A CT with pre-trade will further enable latency arbitrage opportunities by market participants with faster direct feeds, to the disadvantage of investors reliant on the CT

The opposite is true. Latency arbitrage opportunities exist today, but without sufficient transparency that would enable its detection/avoidance by investors. A real-time pre-trade CT will provide end investors with consolidated quote data both before and after execution that will allow them to understand their execution choices, to detect poor execution quality, and to take action where they have been disadvantaged.

Myth #3: A pre-trade CT will increase “price-referencing” and reduce orderbook participation

It is obviously nonsensical to argue that the broader availability of market data is dangerous, and indeed the position is being taken mainly in support of a secondary objective – to restrict trading flexibility for investors and force all their activity into lit orderbooks.

In fact, an EBBO will enhance investor protection, price formation and market transparency by ensuring investors are informed of the full liquidity and best available prices across EU markets.

Myth #4: A real-time pre-trade CT will advantage non-EU participants over EU-based market participants

This appeal to economic nationalism is without merit, and is contradicted by the fact that EU asset managers are amongst the most vocal supporters of a pre-trade CT.

The success of EU firms can be best assured by delivering an integrated, vibrant and growing capital markets ecosystem that better reflects the scale of the real economy. An ambitious pre-trade CT will strengthen EU capital markets, will benefit all investors and issuers, and provide a stronger foundation for EU firms to compete on the global stage.

Myth #5: A CT threatens the economic viability of small exchanges

Access to a consolidated pan-EU data source under a single licensing framework will encourage international investors to assess investment opportunities across all EU markets rather than only a subset, leading to increased demand to trade in smaller EU markets, and hence to improved access to capital for EU-listed issuers, all of which will benefit smaller exchanges.

It will also ensure that small exchanges achieve broader distribution of their market data than is currently the case when it must be purchased, licensed and reported independently.

Europe

Cboe Global Markets