We run a small business in high-frequency forex trading. Our trade entries and exits occur on a millisecond timescale. We are seeking a data scientist/statistician to help us find patterns in bid/ask prices that can support statistically grounded predictions for our trades. We will be using raw EUR/USD tick data for this experiment.
Essentially, the problem is this:
At certain times of the day, the difference between the bid price and the ask price (i.e. the spread) drops to zero or even goes negative. In some cases the bid jumps up to (or beyond) the ask; in others the ask drops to (or below) the bid, judged by where the bid and ask were just prior to the zero spread event. Alternatively, the bid and ask may meet in the middle.
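As a rough illustration of the three cases described above, the following Python sketch flags each tick where the spread goes from positive to zero or negative, and labels which side moved. The Tick layout (millisecond timestamp, bid, ask), the function name, and the eps tolerance are assumptions made for this example, not the format of the actual data feed.

```python
from typing import NamedTuple


class Tick(NamedTuple):
    ts_ms: int    # timestamp in milliseconds (assumed format)
    bid: float
    ask: float


def classify_zero_spread_events(ticks, eps=1e-9):
    """Return (ts_ms, spread, label) for each tick where the spread
    transitions from positive to zero or negative."""
    events = []
    for prev, cur in zip(ticks, ticks[1:]):
        prev_spread = prev.ask - prev.bid
        cur_spread = cur.ask - cur.bid
        # Only the first tick of a zero/negative-spread run counts as an event.
        if prev_spread > eps and cur_spread <= eps:
            bid_up = cur.bid - prev.bid      # how far the bid rose
            ask_down = prev.ask - cur.ask    # how far the ask fell
            if bid_up > eps and ask_down <= eps:
                label = "bid_up"
            elif ask_down > eps and bid_up <= eps:
                label = "ask_down"
            elif bid_up > eps and ask_down > eps:
                label = "both_moved"         # bid and ask met in between
            else:
                label = "other"
            events.append((cur.ts_ms, cur_spread, label))
    return events


sample = [
    Tick(0, 1.1000, 1.1002),
    Tick(5, 1.1002, 1.1002),   # bid jumps up to the ask
    Tick(10, 1.1000, 1.1002),
    Tick(15, 1.1000, 1.0999),  # ask drops below the bid
    Tick(20, 1.1000, 1.1002),
    Tick(25, 1.1001, 1.1001),  # both sides move
]
events = classify_zero_spread_events(sample)
labels = [label for _, _, label in events]
```

Here the comparison is made against the immediately preceding tick only; comparing against quotes at fixed lags before the event is a separate step.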
We believe a zero spread (or negative spread) event may be significant for market movement, up or down, since some price action must be driving a major move. The question is which way and how far the price moves, and whether there is a pattern. Alternatively, a zero or negative spread event could simply reflect market inefficiencies, in which case there may be no pattern at all.
Our current hypothesis is that when the bid jumps up to (or above) the ask, it signals a large buying group moving the market up. We want to capture that movement with a quick scalping trade. But how far does it move, and how long does it take? Similarly, if the ask drops to or below the bid, we believe there is large selling pressure that would continue the downward movement, allowing us to short the market.
We have specific questions the data scientist needs to answer to support or refute our hypothesis using the raw tick data provided.
When a zero spread (or negative spread) event occurs, at what price level does it occur in relation to the bid/ask just prior to that event? "Just prior" means the bid/ask quotes in effect 10ms, 50ms, 100ms, and 500ms before the zero spread event. Please provide graphs/distribution curves.
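One possible way to frame this question in code is sketched below. For each lag, it looks up the quote in effect at that moment (the last tick at or before the target time) and expresses the event price as a position within the prior bid/ask range: 0 means the event happened at the prior bid, 1 at the prior ask, 0.5 in the middle. The function names, the use of the event mid price, and the parallel-list data layout are all assumptions made for illustration.

```python
from bisect import bisect_right

LAGS_MS = (10, 50, 100, 500)  # lags named in the question above


def quote_asof(timestamps, quotes, t):
    """Return the (bid, ask) in effect at time t, i.e. the last quote
    with timestamp <= t, or None if t precedes the data."""
    i = bisect_right(timestamps, t) - 1
    return quotes[i] if i >= 0 else None


def relative_position(timestamps, quotes, event_idx, lag_ms):
    """Position of the event mid price within the prior bid/ask range.
    Returns None when no prior quote exists or the prior spread is not positive."""
    event_ts = timestamps[event_idx]
    event_bid, event_ask = quotes[event_idx]
    prior = quote_asof(timestamps, quotes, event_ts - lag_ms)
    if prior is None:
        return None
    prior_bid, prior_ask = prior
    width = prior_ask - prior_bid
    if width <= 0:
        return None
    event_mid = (event_bid + event_ask) / 2
    return (event_mid - prior_bid) / width


# Toy data with round numbers: timestamps in ms, quotes as (bid, ask).
timestamps = [0, 40, 90, 100]
quotes = [(1.0, 2.0), (1.0, 2.0), (1.5, 2.0), (2.0, 2.0)]
positions = {lag: relative_position(timestamps, quotes, 3, lag) for lag in LAGS_MS}
```

Collecting these positions over all events, one list per lag, gives the raw material for the requested distribution curves.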
What happens 50ms, 100ms, 500ms, 1 sec, and 2 sec after the zero spread event? Where is the bid/ask price at each of those times? Is there a statistical correlation between where the bid/ask price was before the zero spread event, the price level of the event itself, and the bid/ask price afterwards?
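A minimal sketch of the forward-looking measurement, under the assumption that the mid price is an acceptable summary of the bid/ask at each horizon: for every horizon it takes the mid in effect at event time plus the horizon and subtracts the mid at the event. Horizons that run past the end of the data are reported as None rather than guessed. The names and data layout are hypothetical.

```python
from bisect import bisect_right

HORIZONS_MS = (50, 100, 500, 1000, 2000)  # horizons named in the question above


def mid_asof(timestamps, mids, t):
    """Mid price in effect at time t (last tick at or before t), or None."""
    i = bisect_right(timestamps, t) - 1
    return mids[i] if i >= 0 else None


def forward_moves(timestamps, mids, event_ts):
    """Signed mid-price change from the event to each horizon.
    A horizon beyond the last tick yields None."""
    last_ts = timestamps[-1]
    start = mid_asof(timestamps, mids, event_ts)
    moves = {}
    for horizon in HORIZONS_MS:
        target = event_ts + horizon
        if start is None or target > last_ts:
            moves[horizon] = None
        else:
            moves[horizon] = mid_asof(timestamps, mids, target) - start
    return moves


# Toy data: timestamps in ms and mid prices chosen to be exact in floating point.
timestamps = [0, 60, 400, 1500, 3000]
mids = [10.0, 10.5, 11.0, 10.0, 9.0]
moves = forward_moves(timestamps, mids, event_ts=0)
```

Pairing these forward moves with the pre-event positions for the same events would allow the before/after relationship to be tested with a correlation coefficient or a regression.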
For negative spreads, is there a correlation between the size of the negative spread and the subsequent price movement? That is, does a larger negative spread correlate with a larger price movement, and in which direction?
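One simple way to approach this question is to compute two Pearson correlations: the size of the negative spread against the signed subsequent move (does size predict direction?) and against the absolute move (does size predict magnitude?). The sketch below assumes each event has already been reduced to a (spread, move) pair at a single chosen horizon; the function names and the toy numbers are illustrative only and are not real market data. A rank correlation such as Spearman's may be more robust on heavy-tailed tick data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def spread_size_vs_move(events):
    """events: (spread, move) pairs; only strictly negative spreads are used.
    Returns correlations of spread size with signed and absolute moves."""
    negative = [(s, m) for s, m in events if s < 0]
    sizes = [-s for s, _ in negative]            # size as a positive number
    signed_moves = [m for _, m in negative]
    magnitudes = [abs(m) for m in signed_moves]
    return {
        "signed": pearson(sizes, signed_moves),
        "magnitude": pearson(sizes, magnitudes),
    }


# Toy pairs (spread, subsequent move); the zero-spread pair is ignored.
toy_events = [(-1.0, 2.0), (-2.0, -4.0), (-3.0, 6.0), (0.0, 100.0)]
result = spread_size_vs_move(toy_events)
```

In this toy input the magnitude correlation is essentially 1 while the signed correlation is much weaker, which shows why the two questions (how far, and in which direction) are worth measuring separately.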
Are there any other machine learning techniques, neural networks, performance modelling or simulations you would suggest for our problem?