
A Machine Learning Technique For Predicting Stock Returns

In this experiment we implement the Support Vector Machine (SVM) method for stock prediction following the research paper “Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange”.

1. Features

We use ten mathematical formulas for our features, as in the paper.

We eliminated the 9th feature, the “Accumulation/Distribution Oscillator”, because some of its values were infinite.

2. Parameter tuning

To choose a good parameter set, we used the same method as the paper. First we conducted a preliminary experiment by randomly sampling 20% of the entire dataset. We then divided this parameter-selection data into two equal-sized sets: a training set and a holdout set, each covering 10% of the entire data.
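As a minimal sketch, the sampling step above can be written as follows; the dataset size `n = 1000` is a stand-in, since the real dataset size is not stated here:

```python
import random

random.seed(0)
n = 1000                                        # stand-in for the dataset size
idx = list(range(n))
random.shuffle(idx)                             # random sampling of rows

selection = idx[: n // 5]                       # 20% parameter-selection sample
train_idx = selection[: len(selection) // 2]    # first half: training (10% of all data)
holdout_idx = selection[len(selection) // 2:]   # second half: holdout (10% of all data)
```

The two halves are disjoint by construction, so the holdout accuracy is an unbiased check on the parameters chosen on the training half.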

There are four parameters to tune in the SVM: the kernel (“poly”, “rbf”, “linear”), C (regularization parameter, [1, 10, 100, 1000]), γ (gamma in the kernel function, [1, 2, 3, 4, 5]), and d (degree of the kernel function, [1, 2, 3]).

To choose a good parameter set, we conducted 5-fold cross validation on the training set over all parameter combinations and obtained the best parameters. We then evaluated the model with the best parameters on the holdout set.
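This grid search can be sketched with scikit-learn’s `GridSearchCV` using the parameter values listed above. The data here is a random stand-in for the real indicator features, so only the search mechanics (not the resulting parameters) carry over:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))        # stand-in for the parameter-selection features
y = rng.integers(0, 2, size=100)     # stand-in up/down direction labels

# The four-parameter grid described in the text.
param_grid = {
    "kernel": ["poly", "rbf", "linear"],
    "C": [1, 10, 100, 1000],
    "gamma": [1, 2, 3, 4, 5],
    "degree": [1, 2, 3],
}

search = GridSearchCV(SVC(), param_grid, cv=5)   # 5-fold cross validation
search.fit(X, y)
best_params = search.best_params_                # best combination found
```

Note that `gamma` is ignored by the linear kernel and `degree` by the RBF kernel, so some grid points are redundant; `GridSearchCV` still evaluates them without error.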

3. Model training and prediction

After obtaining the best parameters, we re-trained the model on new training sets. Specifically, for each year, we used the first 90% of the data for training and the remaining 10% for testing.
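The per-year chronological split can be sketched as below; `split_year` is a hypothetical helper name, and the rows are assumed to be sorted by date within each year:

```python
def split_year(rows, train_frac=0.9):
    """First 90% of a year's rows for training, last 10% for testing."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

days = list(range(250))          # ~250 trading days in a typical year
train, test = split_year(days)   # 225 training days, 25 testing days
```

Because the split is chronological rather than random, the test set always lies strictly after the training set in time, which avoids look-ahead bias.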

4. Experiment Results

In the paper, the “poly” and “rbf” kernels are employed. Through our experiments, we found that these two kernels suffer from severe over-fitting: the accuracy on the training sets is usually 100%, while the accuracy on the testing sets falls below 50% in many years. This result actually makes sense, since with only ~200 training samples and ~10 features, “poly” and “rbf” kernels overfit very easily. We are therefore skeptical of the results reported in the paper.

In our experience, the “linear” kernel is usually more robust than “poly” and “rbf”. With a “linear” kernel, the accuracy on the training set may be lower, but the accuracy on the testing set is usually similar to the training accuracy. Additionally, we can apply “l1” or “l2” regularization to a linear-kernel SVM; L1 regularization in particular performs feature selection by driving the coefficients of some features to approximately zero.
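A minimal sketch of L1-regularized linear SVM with scikit-learn’s `LinearSVC` is shown below; the data is synthetic (only the first of nine stand-in features is informative), so it illustrates the feature-selection behavior rather than the real experiment:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 9))                                # stand-in for the 9 indicators
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)   # only feature 0 matters

# L1 penalty requires dual=False in LinearSVC.
clf = LinearSVC(penalty="l1", dual=False, C=1.0)
clf.fit(X, y)

# With L1 regularization, coefficients of uninformative features
# are pushed to (near) zero, effectively selecting features.
n_near_zero = int(np.sum(np.abs(clf.coef_[0]) < 1e-6))
```

On data like this, the coefficient of the informative feature stays large while most noise features receive (near-)zero weight.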

Using the “linear” kernel, we first tuned the parameters on the parameter-selection set (20% of the entire data). The best accuracy on the training data is 51.3%, and on the holdout data 51.8%. The best parameters are: 'penalty': 'l2', 'C': 10. We then used the best parameters to re-train models for each year. Table 2 shows the prediction accuracy on the training and testing sets for each year. The average prediction accuracy is 58.09% ± 3.43% on the training set and 51.13% ± 7.80% on the testing set.

Feature Significance Study

In this study, we used the same model and features as in the previous study “SVM for Stock Price Prediction”. Specifically, we used linear SVM with L1-norm regularization. The features include SMA10, WMA10, Momentum, Stochastic K%, Stochastic D%, RSI, MACD, William R%, CCI.

The measure we used for “feature significance” is the coefficient (weight) assigned to each feature. To make the comparison fair, we standardized each feature to zero mean and unit variance (z-score, ~N(0, 1)), so that the magnitude of the coefficient assigned to each feature can represent the significance of that feature.
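The standardize-then-read-coefficients procedure can be sketched as follows; the data is a random stand-in and `significance` is a hypothetical variable name for the per-feature measure:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 9))   # stand-in raw indicator values
y = rng.integers(0, 2, size=200)                    # stand-in direction labels

# Standardize each column to zero mean, unit variance so that
# coefficient magnitudes are comparable across features.
X_std = StandardScaler().fit_transform(X)

clf = LinearSVC(penalty="l1", dual=False, C=1.0).fit(X_std, y)
significance = np.abs(clf.coef_[0])                 # one value per feature
```

Without standardization, a feature measured on a larger scale would mechanically receive a smaller coefficient, which would confound the comparison.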

1. Plot of the coefficients vs. year

Fig. 1 shows the coefficient of each feature versus year, where the x-axis is the year (1981-2011) and the y-axis is the corresponding coefficient. Fig. 2 displays all the features in one figure.

Fig. 1 Plot of feature significance (coefficient of feature) vs. year. Row 1: SMA10, WMA10, Momentum; Row 2: Stochastic K%, Stochastic D%, RSI; Row 3: MACD, William R%, CCI, where x-axis is year (1981-2011), and y-axis is feature significance.

Fig. 2 Plot of feature significance (coefficient of feature) vs. year, where all the features are displayed in one figure. x-axis is year (1981-2011), and y-axis is the feature significance.

2. Plot of the absolute coefficients vs. year

The above figures are somewhat confusing to read, since a coefficient can be positive in some years and negative in others. The absolute value better characterizes the significance of a feature, so we re-plotted the figures using the absolute value of each coefficient to represent feature significance, as illustrated in Figs. 3 and 4.

Fig. 3 Plot of feature significance (absolute value of coefficient) vs. year. Row 1: SMA10, WMA10, Momentum; Row 2: Stochastic K%, Stochastic D%, RSI; Row 3: MACD, William R%, CCI, where x-axis is year (1981-2011), and y-axis is the feature significance.

Fig. 4 Plot of feature significance (absolute value of coefficient) vs. year, where all the features are displayed in one figure. x-axis is year (1981-2011), y-axis is the feature significance.

3. Correlation between the coefficient and the annual return

The result might be related to the annual return, so we plotted the annual return (Fig. 5), where the x-axis is the year and the y-axis is the change rate of the mean close price per year. We can see that the feature significance curves are highly correlated with the annual return curve.
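The claimed correlation can be quantified with the Pearson correlation coefficient between the two yearly series; the numbers below are toy values for illustration, not the actual experiment data:

```python
import numpy as np

# Toy yearly series (stand-ins for the real curves in Figs. 4 and 5).
annual_return = np.array([0.05, -0.10, 0.20, 0.15, -0.02])   # change rate per year
significance = np.array([0.40, 0.10, 0.90, 0.70, 0.20])      # |coefficient| per year

# Pearson correlation between the two series; values near +1
# indicate the curves rise and fall together.
corr = np.corrcoef(significance, annual_return)[0, 1]
```

Reporting this number alongside the plots would make the “highly correlated” claim precise.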

Fig. 5 Annual return plot. x-axis is year and y-axis is the change rate of mean close price per year.

Influence of Time Lag on Model Accuracy

As in the previous two studies, we used linear SVM with L1-norm regularization and the same nine features: SMA10, WMA10, Momentum, Stochastic K%, Stochastic D%, RSI, MACD, William R%, CCI.

1. p_{t+i}-p_t

Fig. 1 shows the prediction results for p_{t+i}-p_t, where p is the daily close price and i is the time lag, e.g. 1, 2, ..., 30 days. The x-axis is the time lag and the y-axis is the mean training and testing accuracy over the 31 years (1981-2011). The error bars show the standard error of the mean, σ/sqrt(N), where σ is the standard deviation of the samples and N = 31 is the number of samples.
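The error-bar computation, the standard error of the mean σ/sqrt(N), can be sketched in a few lines; the yearly accuracies below are toy values:

```python
import math

def standard_error(values):
    """Standard error of the mean: sample std dev divided by sqrt(N)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return math.sqrt(var) / math.sqrt(n)

acc = [0.5, 0.6, 0.55, 0.45, 0.65]   # toy yearly accuracies (N = 31 in the study)
se = standard_error(acc)
```

This uses the sample (N-1) variance; with N = 31 the difference from the population variance is small either way.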

Fig. 1 Mean training and testing accuracy vs. time lag, where the dependent variable is daily close price.

From the training accuracy, we can see that as the time lag becomes larger, the mean training accuracy increases, while the variation of the mean also increases. This makes sense: the day-to-day change of the close price is dominated by noise, so it is very difficult to predict the direction of the next day's close. As the time lag grows, the influence of noise on the price change shrinks, so we obtain higher training accuracy. Meanwhile, the uncertainty of the prediction also increases.

For the testing accuracy, however, the mean stays roughly constant around ~55% (it may increase slightly, but the trend is not obvious). This indicates that over-fitting still exists in the model. Again, as the time lag becomes larger, the variation/uncertainty of the prediction increases.

2. SMA3_{t+3}-SMA3_t

In addition to the original close price, we also studied the influence of the time lag on the prediction accuracy for the smoothed close price (3-day simple moving average), illustrated in Fig. 2.

For the training data, we can see that the model accuracy first decreases, reaches a minimum at a time lag of about 3-5 days, then increases. This makes sense, since it is relatively easy to predict SMA3_{t+1}-SMA3_t and SMA3_{t+2}-SMA3_t. To see this, expand the expressions: SMA3_{t+1}-SMA3_t = (p_{t+1}-p_{t-2})/3, and SMA3_{t+2}-SMA3_t = (p_{t+2}+p_{t+1}-p_{t-1}-p_{t-2})/3, whose predictions are less informative.
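The two algebraic expansions above can be checked numerically on toy prices; `sma3` is a hypothetical helper matching the 3-day simple moving average definition:

```python
def sma3(p, t):
    """3-day simple moving average of prices p ending at day t."""
    return (p[t] + p[t - 1] + p[t - 2]) / 3

p = [10.0, 11.0, 9.0, 12.0, 13.0, 8.0]   # toy close prices
t = 2

# SMA3_{t+1} - SMA3_t = (p_{t+1} - p_{t-2}) / 3
lhs1 = sma3(p, t + 1) - sma3(p, t)
rhs1 = (p[t + 1] - p[t - 2]) / 3

# SMA3_{t+2} - SMA3_t = (p_{t+2} + p_{t+1} - p_{t-1} - p_{t-2}) / 3
lhs2 = sma3(p, t + 2) - sma3(p, t)
rhs2 = (p[t + 2] + p[t + 1] - p[t - 1] - p[t - 2]) / 3
```

The telescoping shows why small lags are easy: the difference depends on only two (or four) individual prices rather than on the full window.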

For the testing accuracy, we can see that as the time lag increases, the accuracy gradually decreases and approaches ~55%, similar to the result in Fig. 1. This makes sense: as the time lag grows, the influence of noise shrinks, so smoothing does not help much to improve the accuracy. Again, the variation of the mean accuracy estimate grows with the time lag, since there is more uncertainty further into the future.

Fig. 2 Mean training and testing accuracy vs. time lag, where the dependent variable is the smoothed close price (specifically, 3-day simple moving average)

Specifically, when time lag = 3, SMA3_{t+3}-SMA3_t has a mean training accuracy of 70.73% ± 0.59% and mean testing accuracy of 63.06% ± 1.88%.

Updated Process

We updated the process by using a moving window and by fixing the model parameters. Previously we selected the parameters using a random sample from the whole dataset; this time we fixed them to C=1, penalty="l1", dual=False. We fixed the parameters instead of selecting them by cross validation because we found that the parameter choice does not affect the final result much.

Implementing a Moving Window

Previously, for each year we used the first 90% of the data for training and the remaining 10% for testing; the results are shown in Fig. 1 (left). The limitation is that only the last 10% of each year's data is ever tested. To improve on this, we used a moving window so that every data point can be tested, as illustrated in Fig. 1 (right). Comparing the two figures, for small time lags (e.g. 1-5 days) the results in the right figure are not as good as in the left figure, but as the time lag increases there is a clear upward trend in the testing accuracy.
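A minimal sketch of the moving-window scheme is an index generator: train on the previous 225 days, test the next day, then slide forward one day. `moving_windows` is a hypothetical helper name:

```python
def moving_windows(n_days, window=225):
    """Yield (train_indices, test_index) pairs: train on the previous
    `window` days, test on the next day, sliding one day at a time."""
    for t in range(window, n_days):
        yield list(range(t - window, t)), t

# For a 250-day year, every day after the initial window gets tested.
pairs = list(moving_windows(250))
```

Unlike the fixed 90/10 split, every day beyond the warm-up window appears as a test point exactly once.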

Fig. 1 Left: only the last 10% of days in each year are tested. Right: a moving window is used so that each day is tested. The number of training samples is 225, which is 90% of the trading days in a year.

Effect of the Number of Training Samples

Fig. 2 Effect of the number of training samples.

We studied the effect of the number of training samples. Fig. 2 shows the training and testing accuracy for different numbers of training samples. As the number of training samples grows, the training accuracy decreases (over-fitting is reduced), but the testing accuracy is not necessarily improved.

Effect of Model Update Frequency

Fig. 3 Effect of model update frequency. The number of training samples used is 250.

We also studied the effect of how frequently the model is updated. Fig. 3 compares updating the model monthly vs. weekly. Updating the model more frequently appears to improve the testing accuracy.
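The update-frequency comparison can be sketched by refitting only every `refit_every` days inside the moving-window loop (weekly ≈ 5 trading days, monthly ≈ 21); the helper names and the trivial majority-vote model below are illustrative stand-ins for the real SVM:

```python
def eval_with_refit(X, y, fit, predict, window=250, refit_every=5):
    """Moving-window evaluation that refits only every `refit_every` days."""
    model = None
    correct = total = 0
    for t in range(window, len(y)):
        if (t - window) % refit_every == 0:          # periodic model update
            model = fit(X[t - window:t], y[t - window:t])
        correct += int(predict(model, X[t]) == y[t])
        total += 1
    return correct / total

def majority_fit(X_win, y_win):
    """Stand-in 'model': the majority label in the window."""
    return int(sum(y_win) * 2 >= len(y_win))

def majority_predict(model, x):
    return model

# Weekly updates: refit every 5 trading days.
acc_weekly = eval_with_refit([None] * 260, [1] * 260,
                             majority_fit, majority_predict, refit_every=5)
```

Swapping `refit_every=5` for `refit_every=21` gives the monthly-update variant compared in Fig. 3.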
