A Machine Learning Technique For Predicting Stock Returns

In this experiment we implement the Support Vector Machine (SVM) method for stock prediction described in the research paper “Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange”.

1. Features

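We use the ten technical-indicator formulas from the paper as our features: SMA10, WMA10, Momentum, Stochastic K%, Stochastic D%, RSI, MACD, William R%, Accumulation/Distribution Oscillator, and CCI.

We dropped the 9th feature, the “Accumulation/Distribution Oscillator”, because some of its values were infinite.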
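As an illustration of how such features are derived from the close price, here is a minimal sketch of the first three indicators. It uses the standard textbook definitions with a 10-day window; the function names are hypothetical, and the paper's exact formulas (for example the window used for Momentum) may differ in detail.

```python
def sma(closes, n=10):
    """Simple moving average of the last n closes (one value per day from day n-1 on)."""
    return [sum(closes[t - n + 1:t + 1]) / n for t in range(n - 1, len(closes))]


def wma(closes, n=10):
    """Weighted moving average: the newest close gets weight n, the oldest weight 1."""
    weights = list(range(1, n + 1))  # ordered oldest -> newest
    total = sum(weights)
    out = []
    for t in range(n - 1, len(closes)):
        window = closes[t - n + 1:t + 1]
        out.append(sum(w * c for w, c in zip(weights, window)) / total)
    return out


def momentum(closes, n=10):
    """Price change over n days, C_t - C_{t-n} (assumed definition)."""
    return [closes[t] - closes[t - n] for t in range(n, len(closes))]
```

Each function returns a shorter list than its input because the first days do not have a full look-back window.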

2. Parameter tuning

In order to choose a good parameter set, we used the same method as the paper. First we conducted a preliminary experiment on a randomly chosen 20% of the entire dataset. We then divided this parameter-setting data into two equal-sized sets, a training set and a holdout set, each 10% of the entire dataset.

There are four parameters we need to tune in the SVM: kernel ("poly", "rbf", "linear"), C (regularization parameter, [1, 10, 100, 1000]), γ (gamma in the kernel function, [1, 2, 3, 4, 5]), and d (degree of the kernel function, [1, 2, 3]).

To select the best combination, we ran 5-fold cross-validation on the training set over all the parameter combinations and kept the best-performing parameters. We then evaluated the model with the best parameters on the holdout set.
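The search space and the fold layout can be sketched as follows. This is only an illustration under stated assumptions: the full Cartesian product of the four lists has 3 × 4 × 5 × 3 = 180 combinations, but the sketch assumes that γ is ignored by the "linear" kernel and d only matters for "poly" (as in common SVM implementations), which leaves 84 distinct models. Whether the original experiment pruned the grid this way is not stated, and the contiguous, unshuffled folds are likewise an assumption. The function names are hypothetical.

```python
from itertools import product

KERNELS = ["poly", "rbf", "linear"]
C_VALUES = [1, 10, 100, 1000]
GAMMAS = [1, 2, 3, 4, 5]
DEGREES = [1, 2, 3]


def parameter_grid():
    """Yield one dict per parameter combination that defines a distinct model."""
    for kernel, c in product(KERNELS, C_VALUES):
        if kernel == "linear":
            # Assumption: gamma and degree have no effect on a linear kernel.
            yield {"kernel": kernel, "C": c}
        elif kernel == "rbf":
            # Assumption: degree has no effect on an rbf kernel.
            for gamma in GAMMAS:
                yield {"kernel": kernel, "C": c, "gamma": gamma}
        else:  # "poly" uses both gamma and degree
            for gamma, degree in product(GAMMAS, DEGREES):
                yield {"kernel": kernel, "C": c, "gamma": gamma, "degree": degree}


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds; yield (train_idx, val_idx)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size
```

Each parameter dict would be scored by averaging the validation accuracy over the five folds, and the dict with the highest average kept.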

3. Model training and prediction

After obtaining the best parameters, we re-trained the model on new training sets. Specifically, for each year, we used the first 90% of the data for training and the remaining 10% for testing.
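A minimal sketch of this per-year split is shown below. It assumes the samples are already sorted by date and that each sample is a tuple whose first element is the year; both the tuple layout and the function name are hypothetical.

```python
from collections import defaultdict


def split_by_year(samples, train_fraction=0.9):
    """Group date-ordered samples by year and split each year chronologically.

    samples: list of (year, features, label) tuples, sorted by date (assumed layout).
    Returns {year: (train_samples, test_samples)} with no shuffling.
    """
    by_year = defaultdict(list)
    for sample in samples:
        by_year[sample[0]].append(sample)

    splits = {}
    for year, rows in by_year.items():
        cut = int(len(rows) * train_fraction)  # first 90% train, last 10% test
        splits[year] = (rows[:cut], rows[cut:])
    return splits
```

Keeping the split chronological means every test sample comes after every training sample of the same year, so the model is never evaluated on days that precede its training data.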

4. Experiment Results

In the paper, the “poly” and “rbf” kernels are employed. In our experiments, these two kernels led to severe overfitting: the accuracy on the training sets is usually 100%, but the accuracy on the testing sets can fall below 50% for many years. This result makes sense, since a “poly” or “rbf” kernel can easily overfit ~200 training samples with ~10 features. We are therefore skeptical of the results reported in the paper.

In our experience, the “linear” kernel is usually more robust than “poly” and “rbf”. With the “linear” kernel, the accuracy on the training set may be lower, but the accuracy on the testing set is usually similar to that on the training set. Additionally, a linear-kernel SVM can use “l1” or “l2” regularization; “l1” regularization in particular can select features by driving the coefficients of some features to approximately 0.

Using the “linear” kernel, we first tuned the parameters on the parameter-selection set (20% of the entire data). The best accuracy on the training data is 51.3%, and on the holdout data 51.8%. The best parameters are: 'penalty': 'l2', 'C': 10. We then used the best parameters to re-train a model for each year. Table 2 shows the prediction accuracy on the training and testing sets for each year. The average prediction accuracy is 58.09% ± 3.43% on the training set and 51.13% ± 7.80% on the testing set.

Feature Significance Study

In this study, we used the same model and features as in the previous study “SVM for Stock Price Prediction”. Specifically, we used a linear SVM with L1-norm regularization. The features are SMA10, WMA10, Momentum, Stochastic K%, Stochastic D%, RSI, MACD, William R%, and CCI.

We measure “feature significance” by the coefficient (weight) that the model assigns to each feature. To make the comparison fair, we standardized each feature to z-scores (zero mean and unit variance), so that the coefficients are on a comparable scale and can represent the significance of each feature.
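The standardization step can be sketched as below. The function names are hypothetical, and the choice to estimate the mean and standard deviation on the training data only and reuse them for the test data is an assumption made here to avoid leaking test information, not something the study states.

```python
from statistics import mean, pstdev


def fit_scaler(train_values):
    """Return (mean, population standard deviation) of one feature's training values."""
    return mean(train_values), pstdev(train_values)


def transform(values, mu, sigma):
    """Convert raw feature values to z-scores using a previously fitted mean and std."""
    if sigma == 0:
        # A constant feature carries no information; map it to zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

After this step a coefficient of, say, 0.5 means the same thing for every feature: the change in the decision value per one standard deviation of that feature.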

1. Plot of the coefficients vs. year

Fig. 1 shows the coefficient of each feature versus year, where the x-axis is the year (1981-2011) and the y-axis is the corresponding coefficient. Fig. 2 displays the results for all the features in one figure.

Fig. 1 Plot of feature significance (coefficient of feature) vs. year. Row 1: SMA10, WMA10, Momentum; Row 2: Stochastic K%, Stochastic D%, RSI; Row 3: MACD, William R%, CCI, where x-axis is year (1981-2011), and y-axis is feature significance.

Fig. 2 Plot of feature significance (coefficient of feature) vs. year, where all the features are displayed in one figure. x-axis is year (1981-2011), and y-axis is the feature significance.

2. Plot of the absolute coefficients vs. year

The figures above are somewhat confusing, since the coefficient is sometimes positive and sometimes negative. The absolute value better characterizes the significance of a feature. We therefore re-plotted the figures using the absolute value of the coefficient to represent feature significance, as shown in Figs. 3 and 4.

Fig. 3 Plot of feature significance (absolute value of coefficient) vs. year. Row 1: SMA10, WMA10, Momentum; Row 2: Stochastic K%, Stochastic D%, RSI; Row 3: MACD, William R%, CCI, where x-axis is year (1981-2011), and y-axis is the feature significance.

Fig. 4 Plot of feature significance (absolute value of coefficient) vs. year, where all the features are displayed in one figure. x-axis is year (1981-2011), and y-axis is the feature significance.

3. Correlation between the coefficient and the annual return

The result might be related to the annual return, so we plotted the annual return (Fig. 5), where the x-axis is the year and the y-axis is the change rate of the mean close price per year. From the plots, the feature significance curves appear highly correlated with the annual return curve.

Fig. 5 Annual return plot. x-axis is year and y-axis is the change rate of mean close price per year.
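The relationship between feature significance and annual return could also be quantified rather than read off the plots. The sketch below is one way to do that, under assumptions: the change rate is taken to be (mean close this year − mean close last year) / mean close last year, and the Pearson coefficient is used as the correlation measure. Neither choice is confirmed by the study, and the function names are hypothetical.

```python
from math import sqrt


def annual_returns(mean_close_by_year):
    """Map {year: mean close price} to {year: change rate vs. the previous listed year}."""
    years = sorted(mean_close_by_year)
    return {
        cur: (mean_close_by_year[cur] - mean_close_by_year[prev]) / mean_close_by_year[prev]
        for prev, cur in zip(years, years[1:])
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Calling `pearson` on a feature's absolute coefficients and the annual returns for the same years gives a single number between -1 and 1 for each feature.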

Influence of Time Lag on Model Accuracy

In this study, we used the same model and features as in the previous study “SVM for Stock Price Prediction”. Specifically, we used a linear SVM with L1-norm regularization. The features are SMA10, WMA10, Momentum, Stochastic K%, Stochastic D%, RSI, MACD, William R%, and CCI.

1. p_{t+i}-p_t

Fig. 1 shows the prediction results for p_{t+i}-p_{t}, where p is the daily close price and i is the time lag (1, 2, ..., 30 days). The x-axis is the time lag and the y-axis is the mean training and testing accuracy over 31 years (1981-2011). The error bars show the standard error of the mean accuracy, calculated as σ/sqrt(N), where σ is the standard deviation of the samples and N is the number of samples, which is 31 in this case.
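The two quantities in this paragraph can be sketched as follows. The labeling convention (1 when p_{t+i}-p_{t} is positive, 0 otherwise) and the use of the sample standard deviation (n − 1 denominator) for σ are assumptions, since the study does not spell them out. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def direction_labels(closes, lag):
    """Label day t as 1 if p_{t+lag} - p_t > 0, else 0.

    The last `lag` days have no future price, so they get no label.
    """
    return [1 if closes[t + lag] - closes[t] > 0 else 0
            for t in range(len(closes) - lag)]


def mean_and_error(accuracies):
    """Return (mean accuracy, sigma / sqrt(N)) over the per-year accuracies."""
    n = len(accuracies)
    return mean(accuracies), stdev(accuracies) / sqrt(n)
```

For each lag i from 1 to 30, the labels would be rebuilt with `direction_labels(closes, i)`, a model trained per year, and `mean_and_error` applied to the 31 yearly accuracies to obtain one point and its error bar.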