An equity trading strategy using nearest neighbor algorithms to predict future return
Full report available here.
Data Mining For Alpha Signals
In this research we mine past data for similar N-day patterns and use them to predict returns over the next several days. The strategy is based on the K-Nearest Neighbor algorithm. We choose several parameters, such as the size of the look-back window and how many days ahead to predict returns.
How to measure the similarity between patterns is the most important part of this research. The main idea behind the strategy is that if two candlestick patterns are very similar, we assume the trends that follow them will also be similar. To measure similarity, each pattern is described by a fixed set of numbers; for example, a pattern of length 3 needs 11 numbers. Assume one pattern starts at bar n and the other at bar m.
Then the L2 distance between the two patterns can be defined as:
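As a rough illustration of such a distance, here is a minimal sketch in Python. The feature layout is an assumption, not taken from the report: each bar contributes its body, upper shadow and lower shadow, and each pair of consecutive bars contributes the gap from the previous close to the next open. That layout happens to give 11 numbers for a 3-bar pattern, but the report's own 11 numbers may be different. The names `pattern_features` and `l2_distance` and the sample bars are hypothetical.

```python
import math

def pattern_features(bars):
    """Flatten a list of (open, high, low, close) bars into a feature vector.

    ASSUMED layout (not from the source): per bar, the signed body,
    upper shadow and lower shadow; between consecutive bars, the gap
    from the previous close to the current open.
    This yields 3*N + (N - 1) numbers for N bars.
    """
    features = []
    for i, (o, h, l, c) in enumerate(bars):
        if i > 0:
            prev_close = bars[i - 1][3]
            features.append(o - prev_close)   # gap from previous close
        features.append(c - o)                # signed body
        features.append(h - max(o, c))        # upper shadow
        features.append(min(o, c) - l)        # lower shadow
    return features

def l2_distance(x, y):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("patterns must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Two made-up 3-bar patterns, one starting at bar n and one at bar m.
pattern_n = [(10.0, 10.6, 9.9, 10.5), (10.5, 10.9, 10.4, 10.8), (10.8, 11.0, 10.2, 10.3)]
pattern_m = [(20.0, 20.5, 19.8, 20.4), (20.4, 20.9, 20.3, 20.7), (20.7, 21.0, 20.0, 20.2)]

fn = pattern_features(pattern_n)
fm = pattern_features(pattern_m)
print(len(fn))               # number of features for a 3-bar pattern
print(l2_distance(fn, fm))   # smaller means more similar
```

Because the features are differences within and between bars rather than raw prices, two patterns at very different price levels can still be close to each other.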
We use the distance above and go back through the entire history to find the three patterns most similar to the current reference pattern. As shown below, the purple line indicates the current reference pattern (length = 5). The other three are the most similar patterns in the past 100 days (a short window chosen for illustration purposes). The number below each pattern is its distance from the reference pattern.
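The search over history can be sketched as a sliding-window scan. This is a minimal sketch under stated assumptions: the pattern is reduced to a plain list of per-day values (for example daily returns), candidate windows may not overlap the reference window, and the function name `top_k_matches` and its `horizon` parameter are hypothetical.

```python
import math

def l2_distance(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def top_k_matches(series, window, k, horizon=1):
    """Return the k historical windows closest to the most recent one.

    Each result is a (distance, start_index) tuple, sorted by distance.
    A candidate must end before the reference window starts and must be
    followed by at least `horizon` observed values, so that its
    subsequent return is already known.
    """
    ref_start = len(series) - window
    reference = series[ref_start:]
    candidates = []
    for s in range(0, ref_start - window + 1):
        if s + window + horizon > len(series):
            continue  # not enough later data to read an outcome
        dist = l2_distance(series[s:s + window], reference)
        candidates.append((dist, s))
    candidates.sort()
    return candidates[:k]

# Made-up series: the last three values are the reference pattern.
series = [1, 2, 3, 9, 9, 9, 1, 2, 3.1, 5, 5, 5, 1, 2, 3]
matches = top_k_matches(series, window=3, k=2)
for dist, start in matches:
    print(start, round(dist, 4))
```

A full scan like this costs one distance computation per historical window, which is usually acceptable for daily bars on a single ticker.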
This simple trading idea rests on pattern recognition and the expectation that history will repeat to some extent. In this part I tried several distance measures, such as the L1 norm, the L2 norm, and the Hamming distance, and then inspected the matches visually to see whether the recognition made sense. The visualization revealed a problem: because the volatility level changes over time, two patterns with very similar shapes can end up far apart when the bars of one are all much longer than those of the reference pattern. So I chose to normalize each term in the distance by the size of its own bar. With this normalization, the matched patterns look more similar to the reference one.
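The effect of the normalization can be sketched as follows. The sketch assumes that "bar size" means the high-low range of each bar and that each bar is described by its body and two shadows; both are assumptions, and the function names and sample bars are hypothetical.

```python
import math

def bar_features(bar, normalize=True):
    """Body, upper shadow and lower shadow of one (open, high, low, close) bar.

    With normalize=True each value is divided by the bar's own
    high-low range (ASSUMED meaning of "bar size").
    """
    o, h, l, c = bar
    feats = [c - o, h - max(o, c), min(o, c) - l]
    if normalize:
        size = h - l
        if size == 0:
            return [0.0, 0.0, 0.0]  # flat bar: no shape information
        feats = [f / size for f in feats]
    return feats

def l1_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def l2_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pattern_distance(p, q, normalize=True, metric=l2_distance):
    """Distance between two equal-length candlestick patterns."""
    x = [f for bar in p for f in bar_features(bar, normalize)]
    y = [f for bar in q for f in bar_features(bar, normalize)]
    return metric(x, y)

# Same shape, but every bar of the second pattern is four times larger.
calm = [(100.0, 101.0, 99.5, 100.5), (100.5, 102.0, 100.0, 101.5)]
volatile = [(100.0, 104.0, 98.0, 102.0), (102.0, 108.0, 100.0, 106.0)]

raw = pattern_distance(calm, volatile, normalize=False)
scaled = pattern_distance(calm, volatile, normalize=True)
print(raw, scaled)  # the raw distance is large, the normalized one is near zero
```

Passing `metric=l1_distance` swaps in the L1 norm without changing anything else, which makes it easy to compare distance measures on the same patterns.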
The strategy is to select a number of the closest patterns we find and simply average their next-day returns to arrive at an expected return (I also tried a weighted version using distance as the weight). If the expected return is positive we open a long position; if it is negative we open a short position.
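The averaging step and the resulting signal might look like the sketch below. The report does not say how distance enters the weights, so the inverse-distance weighting here is an assumption, as is staying flat when the expected return is exactly zero. The function names and the sample neighbors are hypothetical.

```python
def expected_return(neighbors, weighted=False, eps=1e-12):
    """Average the next-day returns of the nearest patterns.

    neighbors: list of (distance, next_day_return) tuples.
    weighted=True uses inverse-distance weights (ASSUMED scheme), so
    closer patterns count more; eps guards against division by zero.
    """
    if not neighbors:
        raise ValueError("need at least one neighbor")
    if not weighted:
        return sum(r for _, r in neighbors) / len(neighbors)
    weights = [1.0 / (d + eps) for d, _ in neighbors]
    total = sum(w * r for w, (_, r) in zip(weights, neighbors))
    return total / sum(weights)

def position(expected):
    """+1 for long, -1 for short, 0 (flat, an assumption) when exactly zero."""
    if expected > 0:
        return 1
    if expected < 0:
        return -1
    return 0

# Made-up neighbors: (distance to reference, return on the following day).
neighbors = [(0.1, 0.02), (0.2, -0.01), (0.4, 0.005)]

simple = expected_return(neighbors)
by_distance = expected_return(neighbors, weighted=True)
print(simple, by_distance, position(simple))
```

In this example the closest pattern had the largest positive return, so the weighted estimate comes out higher than the simple mean.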
To test this trading idea, after finding a suitable distance measure, we run a grid search on in-sample data for several tickers to see whether the strategy produces a positive signal. Specifically, I use bar plots to present the results: the x-axis is the predicted return and the y-axis is the realized return. If the bar plots for several parameter sets of a ticker look good, we then move the strategy to out-of-sample data. As shown below, a good bar plot is one in which predicted returns are highly correlated with realized returns.
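One way to turn the bar-plot check into numbers is to bucket the predictions and compare mean realized returns across buckets, alongside a plain correlation. This is a minimal sketch, not the report's evaluation code: the equal-count bucketing, the function names, and the sample data are all assumptions. In a grid search, the same score would be computed once per parameter set.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / math.sqrt(vx * vy)

def bucket_means(predicted, realized, n_buckets):
    """Sort by predicted return, split into near-equal-count buckets,
    and return (mean predicted, mean realized) for each bucket."""
    pairs = sorted(zip(predicted, realized))
    bars = []
    for b in range(n_buckets):
        lo = b * len(pairs) // n_buckets
        hi = (b + 1) * len(pairs) // n_buckets
        chunk = pairs[lo:hi]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            mean_real = sum(r for _, r in chunk) / len(chunk)
            bars.append((mean_pred, mean_real))
    return bars

# Made-up in-sample predictions and the returns that followed them.
predicted = [-0.02, -0.01, -0.005, 0.0, 0.005, 0.01, 0.015, 0.02]
realized = [-0.015, -0.012, 0.002, -0.001, 0.004, 0.006, 0.02, 0.011]

bars = bucket_means(predicted, realized, n_buckets=4)
score = pearson(predicted, realized)
for mean_pred, mean_real in bars:
    print(round(mean_pred, 4), round(mean_real, 4))
print(round(score, 3))
```

A parameter set looks promising when the realized means rise from the lowest-prediction bucket to the highest, which is what a good bar plot shows.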