
In the era of big data, the volume and complexity of data have grown to the point where traditional statistical learning tools show limited performance in classification and feature-extraction tasks across a range of domains, including computer vision, healthcare, and natural language processing (LeCun et al., 2015). Recently, deep learning methods have achieved remarkable performance on image, speech, and genomic data by allowing the model to learn representations of the data at multiple levels of abstraction. Although the amount and complexity of financial data have also been increasing, deep learning methods for predicting stock returns and generating investment returns have been less widely applied and researched. This introduction explains the basics of deep learning and how it can be applied to stock performance prediction and alpha generation.

1. Supervised learning and unsupervised learning

The two most basic forms of statistical learning, deep or not, are supervised learning and unsupervised learning. Supervised learning is the task of learning a function that maps an input to an output using “labeled” data, while unsupervised learning infers a function that describes hidden structure in “unlabeled” data.

Figure 1 below illustrates the difference between supervised and unsupervised learning. In the supervised setting, both the features of the objects (corn, bananas, apples, etc.) and their labels (vegetable or fruit) are given, and the machine-learning model is trained to classify the objects into the labeled groups. In the unsupervised setting, only the features of the objects are given and their labels are unknown, so the model is trained to separate the objects based on the similarities and differences of their features.

Figure 1.

Supervised learning is the most common form of machine learning, deep or not. Most alpha generation tasks, such as predicting whether a stock's return will move up or down, are supervised learning tasks, because the historical up/down movements are already known and can serve as labels during model training.
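To make the labeling step concrete, here is a minimal Python sketch, assuming daily closing prices are available as a plain list, oldest first. Each day is labeled 1 if the following day's return is positive and 0 otherwise; the function name `up_down_labels` and the choice to label a flat day as 0 are illustrative assumptions, not a convention from the cited work.

```python
def up_down_labels(prices):
    """Label each period 1 if the next period's return is positive, else 0.

    prices: closing prices, oldest first (hypothetical input format).
    Returns a list one element shorter than `prices`; label k describes the
    move from period k to period k + 1, so features used to predict it must
    only use information available at period k.
    """
    labels = []
    for today, tomorrow in zip(prices[:-1], prices[1:]):
        next_return = tomorrow / today - 1.0
        labels.append(1 if next_return > 0 else 0)  # flat days count as 0 (assumption)
    return labels


prices = [100.0, 101.5, 100.8, 100.8, 102.3]
print(up_down_labels(prices))  # [1, 0, 0, 1]
```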

2. Definition of Deep Learning

LeCun et al. (2015) define deep learning methods as representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level into a representation at a higher, slightly more abstract level. In plain English, deep learning models stack many more layers than traditional machine learning models, which lets them build new representations of the data that perform well in prediction tasks.

Deep learning methods have brought about breakthroughs in image recognition (e.g., whether an image contains a cat) (Krizhevsky et al., 2012; Redmon et al., 2016), speech recognition (e.g., transcribing spoken language into text) (Hinton et al., 2012), machine translation (e.g., automatically translating French text into English) (Cho et al., 2014), and human gene expression prediction (e.g., predicting the effects of mutations in non-coding DNA on gene expression and disease) (Leung et al., 2014; Xiong et al., 2015). Figure 2 illustrates an efficient deep learning object detection algorithm that has been applied to building self-driving cars.

Figure 2: An illustration of YOLO object detection. The YOLO algorithms developed by Redmon et al. (2016) use a deep convolutional neural network with a combination of filters, anchor boxes, and pooling layers to detect objects and their boundaries in an image.

Deep learning typically means using an Artificial Neural Network (ANN) architecture with multiple hidden layers, often more than three. ANNs are defined in the following section.

3. Introduction to Artificial Neural Networks (ANNs)

Artificial Neural Networks (ANNs) are the main component of all deep learning algorithms, so understanding how ANNs are constructed and trained is the first step toward understanding deep learning methods. This section gives a concise review of ANNs, training ANNs, normalizing ANN inputs, regularizing ANN parameters, and hyper-parameter tuning.

a. ANNs

Artificial Neural Networks (ANNs) are computing systems inspired by the biological neural networks that constitute brains. Typically, an ANN is a network of nodes arranged in multiple layers: one input layer, one output layer, and several hidden layers in between (Figure 3, Figure 4). Each node stores a value and each edge carries a weight. The value of a node in any layer except the first (the input layer) is a function of a bias plus the weighted sum of the values of all nodes in the previous layer. This function is called the activation function. Activation functions such as the sigmoid, the Rectified Linear Unit (ReLU) (Nair and Hinton, 2010), and the hyperbolic tangent are usually non-linear.

Figure 4: A multilayer ANN. This network has one input layer, two hidden layers, and one output layer, with each layer connected to the previous one. An activation function ƒ is applied to each node in the hidden layers and the output layer.
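The layer computation described above (activation of a bias plus a weighted sum of the previous layer) can be sketched in NumPy as a forward pass. This is a minimal illustration, assuming ReLU on the hidden layers and a sigmoid on a single output node; the node counts (3 inputs, two hidden layers of 4 nodes, 1 output) and the random weights are hypothetical choices, not values taken from Figure 4.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, weights, biases):
    """Propagate one input vector through a fully connected ANN.

    Each layer computes f(a @ W + b): the weighted sum of the previous layer's
    node values plus a bias, passed through an activation function f.
    Hidden layers use ReLU; the output node uses a sigmoid.
    """
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)                        # hidden layers
    return sigmoid(a @ weights[-1] + biases[-1])   # output layer

# Hypothetical sizes: 3 inputs, two hidden layers of 4 nodes, 1 output node.
rng = np.random.default_rng(0)
sizes = [3, 4, 4, 1]
weights = [rng.normal(0.0, 0.5, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

y_hat = forward(np.array([0.2, -1.0, 0.5]), weights, biases)
print(y_hat.shape)  # (1,)
```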

b. Training ANNs

Training an ANN means estimating the optimal values of its biases and edge weights. This requires a training set and a validation set in which the true values of the output nodes are known (e.g., 1 for a positive outcome and 0 for a negative outcome). The goal is to find the biases and edge weights that minimize the difference between the true values and the predicted values of the output nodes. This difference is a function of the biases and edge weights and is called the loss function. Once an ANN is trained, it can be used to predict the output values for a test set, in which the true output values are not yet known.

Training starts with random or predetermined weights and uses forward propagation to make a prediction. The loss measures the difference between the predicted output value and the true output value; since the prediction is a function of the biases and edge weights, so is the loss. Backpropagation then applies the chain rule to compute the slope (i.e., the rate of change) of the loss with respect to each bias and edge weight, which measures how much each parameter contributes to the loss. The next step is to adjust each weight by a small amount in the direction opposite to its slope, and to repeat this process iteratively until the loss stops decreasing. Because the slope of the loss function is called the “gradient,” and the parameters are moved downhill along it toward a minimum where the gradient is flat (Figure 5), this training method is called “gradient descent.” For deep networks the loss is generally non-convex, so gradient descent is only guaranteed to reach a local minimum rather than the global one.

Figure 5: Gradient descent training. Loss(w) is the loss function. In gradient descent, the learning rate determines how far the edge weights are moved at each step on the way to the minimum.
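The training loop above can be sketched for the smallest possible network: a single sigmoid output node, which is equivalent to logistic regression. In this one-layer case, backpropagation reduces to one application of the chain rule, and the cross-entropy loss is convex, so gradient descent does reach the global minimum. The synthetic data, learning rate of 0.5, and 500 steps are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, learning_rate=0.5, steps=500):
    """Gradient descent for a single sigmoid output node.

    Loss: mean binary cross-entropy between predictions p and labels y.
    The chain rule gives dLoss/dw = X.T @ (p - y) / n and dLoss/db = mean(p - y).
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    losses = []
    eps = 1e-12                                   # guards against log(0)
    for _ in range(steps):
        p = sigmoid(X @ w + b)                    # forward propagation
        losses.append(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
        grad_w = X.T @ (p - y) / n                # backward pass (chain rule)
        grad_b = np.mean(p - y)
        w -= learning_rate * grad_w               # step against the slope
        b -= learning_rate * grad_b
    return w, b, losses

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)        # synthetic, linearly separable labels
w, b, losses = train_logistic(X, y)
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print(losses[0], losses[-1], accuracy)
```

The first recorded loss equals ln 2, because all weights start at zero and every prediction is 0.5. A learning rate that is too large would overshoot the minimum in Figure 5, while one that is too small would need many more steps.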

c. Normalization

Normalization means rescaling input features measured on different scales to a common scale. The input features of a deep learning model are often measured on very different scales, so their ranges can vary widely. For example, a debt-to-asset ratio usually lies between 0 and 1, while earnings measures such as EBITDA can run into the thousands or millions of dollars. Feeding features on such different scales directly into the model lets the large-valued features dominate the weighted sums and can make gradient descent slow and unstable, so input features should be normalized before model training. Two commonly used normalization methods are the standard score and feature (min-max) scaling:

$$z = \frac{x - \mu}{\sigma}, \qquad x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the feature, and $\min(x)$ and $\max(x)$ are its minimum and maximum values.
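Both formulas can be sketched in NumPy as follows. One assumption worth stating: the mean, standard deviation, minimum, and maximum are estimated on the training set only and then reused for the test set, which avoids leaking future information into historical features. The two feature columns and their values are hypothetical, and a constant feature would need special handling because its denominator is zero.

```python
import numpy as np

def fit_standard_score(X_train):
    """Per-feature mean and standard deviation, estimated on training data only."""
    return X_train.mean(axis=0), X_train.std(axis=0)

def fit_min_max(X_train):
    """Per-feature minimum and maximum, estimated on training data only."""
    return X_train.min(axis=0), X_train.max(axis=0)

# Hypothetical features: column 0 = debt-to-asset ratio, column 1 = EBITDA (USD).
X_train = np.array([[0.30, 1.2e6],
                    [0.55, 8.0e4],
                    [0.10, 3.5e6],
                    [0.80, 5.0e5]])
X_test = np.array([[0.45, 2.0e6]])

mu, sigma = fit_standard_score(X_train)
Z_train = (X_train - mu) / sigma      # standard score: mean 0, std 1 per column
Z_test = (X_test - mu) / sigma        # reuse the training statistics

lo, hi = fit_min_max(X_train)
S_train = (X_train - lo) / (hi - lo)  # feature scaling: each column in [0, 1]
S_test = (X_test - lo) / (hi - lo)    # unseen values may fall outside [0, 1]
```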

d. Regularization

A common problem in deep learning is the “high variance” problem, or over-fitting: the model fits the training data, including its noise, so closely that it does not generalize well to unseen test data, leaving a large gap between training-set and test-set performance. Feeding too many irrelevant features into the model is one cause. Regularization addresses over-fitting by constraining the model, typically accepting a slightly higher training error in exchange for a smaller gap between training and test performance. There is a range of regularization methods to choose from for deep learning models. The classic method adds the squared L2 norm of the weights (the sum of all squared weights), multiplied by a penalty coefficient, to the loss function so that the weights of irrelevant inputs shrink toward small values. Dropout regularization randomly drops a fraction of the nodes at each gradient descent iteration to prevent over-fitting (Figure 6). In addition, early stopping monitors the error on both the training set and the validation set, and stops gradient descent training when the validation error starts to increase even though the training error keeps decreasing (Figure 7).

Figure 6: An illustration of dropout regularization

Figure 7: An illustration of early stopping regularization
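The three techniques can each be sketched in a few lines of NumPy. These are illustrative helpers under stated assumptions: the penalty coefficient `lam`, the dropout probability `p_drop`, and the early-stopping `patience` (how many non-improving epochs to tolerate) are hypothetical parameter names. The dropout function uses the common “inverted dropout” variant, which rescales surviving nodes during training so nothing changes at test time.

```python
import numpy as np

def l2_penalty(weights, lam):
    """lam times the sum of all squared weights; added to the data loss.
    Its gradient, 2 * lam * W, pulls every weight toward zero."""
    return lam * sum(np.sum(W ** 2) for W in weights)

def dropout(a, p_drop, rng, training=True):
    """Inverted dropout: zero each node with probability p_drop during training
    and rescale the survivors by 1 / (1 - p_drop); do nothing at test time."""
    if not training or p_drop == 0.0:
        return a
    mask = rng.random(a.shape) >= p_drop
    return a * mask / (1.0 - p_drop)

def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch with the lowest validation loss seen before training
    stops, where training stops after `patience` epochs without improvement."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

print(l2_penalty([np.ones((2, 2)), np.ones(3)], lam=0.1))   # approximately 0.7
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.50]
print(early_stopping_epoch(val_losses, patience=2))         # 2
```

In the early-stopping example, training halts at epoch 4 and the weights from epoch 2 are kept, so the later drop to 0.50 is never seen; a larger `patience` trades extra training time for less risk of stopping too early.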

e. Hyper-parameter tuning

A key question in constructing a deep learning model is how to choose its hyper-parameters, such as the learning rate, the number of layers, and the number of nodes in each layer. The procedure of choosing them is called hyper-parameter tuning. Grid search tests model performance on a grid of values: researchers choose a set of candidate values for each hyper-parameter, and a model is trained and evaluated on every combination to find the best one. Because the number of combinations multiplies with each added hyper-parameter, grid search is usually reserved for simpler deep learning models, where training many models is affordable. For large neural networks, random search is usually applied instead: a fixed number of models, each with a randomly selected combination of hyper-parameters, are trained and compared to choose the best hyper-parameters (Figure 8).

Figure 8: An illustration of grid search and random search
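The two search strategies can be contrasted with a short standard-library sketch. To keep it runnable, training a real network is replaced by `validation_score`, a hypothetical stand-in function that peaks at a learning rate of 1e-3 and 4 layers; in practice each call would train a model and return its validation accuracy. The candidate values and the log-uniform sampling range for the learning rate are assumptions.

```python
import itertools
import math
import random

def validation_score(learning_rate, n_layers):
    """Hypothetical stand-in for 'train a model, return validation accuracy'.
    Peaks at learning_rate = 1e-3 and n_layers = 4 purely for illustration."""
    return -(math.log10(learning_rate) + 3) ** 2 - 0.1 * (n_layers - 4) ** 2

def grid_search(learning_rates, layer_counts):
    """Evaluate every combination on the grid and keep the best."""
    grid = list(itertools.product(learning_rates, layer_counts))
    return max(grid, key=lambda cfg: validation_score(*cfg)), len(grid)

def random_search(n_trials, seed=0):
    """Evaluate n_trials randomly sampled combinations and keep the best.
    The learning rate is sampled log-uniformly between 1e-5 and 1e-1."""
    rnd = random.Random(seed)
    trials = [(10 ** rnd.uniform(-5, -1), rnd.randint(2, 8)) for _ in range(n_trials)]
    return max(trials, key=lambda cfg: validation_score(*cfg)), n_trials

best_grid, n_grid = grid_search([1e-4, 1e-3, 1e-2], [2, 4, 6, 8])
best_random, n_random = random_search(12)
print(best_grid, n_grid)      # (0.001, 4) 12
print(best_random, n_random)
```

Both runs train the same number of models (12), but random search tries 12 distinct learning rates instead of only three, which is one reason it tends to scale better when only a few hyper-parameters really matter.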

4. Traditional methods in stock performance prediction

One key approach to deriving alpha generation strategies is to predict stock performance from various financial input features. A robust stock prediction model can efficiently extract features from historical data and thereby generate trading signals that may lead to excess returns.

A variety of data sources can be used to predict stock performance:

Fundamental Data: Financial data and financial ratios from company-reported financial statements, such as free cash flow, EBIT, and the debt-to-equity ratio.

Index Data: Financial data of market indices, such as the return and volatility of the S&P 500.

Financial time series data: A series of historical data for a specific stock, such as 20 years of Apple's daily returns or 10 years of Facebook's weekly trading volume.

News and Events: Company or market news and events that can increase stock return volatility, such as the IPO of a promising technology company or the retirement of a CEO.

Traditional machine learning methods are widely used to predict future up/down stock movements from fundamental and index data sampled at different frequencies. To classify return movements into binary 0/1 (down/up) groups, logistic regression, support vector machines (SVMs), and random forests are commonly used. Linear regression can be applied to predict the value of future returns from fundamental and index features, while principal component analysis (PCA) is used to reduce the dimensionality of the input features and increase model efficiency.
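The PCA step can be sketched with NumPy's singular value decomposition. This is a minimal version, assuming the features have already been normalized (PCA is sensitive to feature scale); the 250 × 20 random matrix is a placeholder for a table of firms by fundamental ratios, not real data.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Features are centered, and the right singular vectors of the centered
    matrix give the principal directions, ordered by variance explained.
    """
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]              # (n_components, n_features)
    explained = S ** 2 / np.sum(S ** 2)         # share of variance per component
    return Xc @ components.T, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(250, 20))    # placeholder: 250 firms x 20 standardized ratios
scores, explained = pca(X, n_components=5)
print(scores.shape)               # (250, 5)
```

The reduced `scores` matrix would then replace the original 20 columns as input to a classifier such as logistic regression.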

Traditional time series models, such as ARIMA and GARCH, are also used to model financial time series data. For example, future daily returns can be predicted by estimating ARIMA model parameters from 20 years of historical daily returns, and the volatility of a specific stock can be forecast by fitting a GARCH(1,1) model.
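The GARCH(1,1) model sets each period's conditional variance to $\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$, where $\varepsilon_{t-1}$ is the previous demeaned return. The sketch below runs that recursion and produces multi-step forecasts, which revert toward the long-run variance $\omega / (1 - \alpha - \beta)$ when $\alpha + \beta < 1$. The parameter values are assumed rather than estimated; in practice they are fitted by maximum likelihood, for example with a package such as `arch`. The returns are random placeholders.

```python
import numpy as np

def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion
    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1],
    where eps are demeaned returns. Initialized at the sample variance;
    the last element is the one-step-ahead variance forecast."""
    eps = returns - returns.mean()
    sigma2 = np.empty(len(eps) + 1)
    sigma2[0] = eps.var()
    for t in range(1, len(eps) + 1):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def garch11_forecast(next_var, omega, alpha, beta, horizon):
    """Variance forecasts for 1..horizon steps ahead:
    long_run + (alpha + beta) ** (h - 1) * (next_var - long_run)."""
    long_run = omega / (1.0 - alpha - beta)
    h = np.arange(horizon)                       # h = 0 is the one-step forecast
    return long_run + (alpha + beta) ** h * (next_var - long_run)

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, 1000)            # placeholder daily returns
omega, alpha, beta = 1e-6, 0.08, 0.90            # assumed, not estimated
sigma2 = garch11_variances(returns, omega, alpha, beta)
forecast = garch11_forecast(sigma2[-1], omega, alpha, beta, horizon=250)
long_run = omega / (1 - alpha - beta)
```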

In addition, researchers and analysts can draw on their industry experience to extract predictive information about stock movements from market news and events. Some investment managers also apply event-driven engines that seek excess returns from in-depth analysis of the influence of news and events.

Although these traditional predictive methods are widely applied in the industry, they have several limitations:

As the volume and complexity of financial data increase, naïve regression models and shallow machine learning models may fail to capture complex relationships between model inputs and outputs, especially non-linear ones.

With a large number of input features, traditional models are prone to over-fitting and may not generalize well to new datasets.

Traditional methods may require extensive hand-engineering to design new features and reconstruct the model input, which demands substantial industry experience and expertise.

The amount of information that analysts can process and analyze is limited. Because personal judgments can be subjective and biased, it is often non-trivial for analysts to determine which signals derived from market news and events actually generate alpha.

5. Deep Learning Applications on Stock Performance Prediction

Given the remarkable achievements of deep learning in other fields, applying deep learning methods to predict stock performance has recently received considerable attention. Because deep learning architectures are complex and flexible, they could potentially overcome some limitations of traditional methods and be good alternatives when the input data is large and complex, for the reasons listed below:

Activation functions such as ReLU, tanh, and sigmoid can be applied at each layer of a deep ANN to capture complex non-linear relationships between model inputs and outputs, and the deep architecture can extract other complex interactions between features.

Over-fitting can be mitigated by powerful regularization techniques such as dropout and early stopping.

Deep learning methods learn feature interactions and construct new input representations automatically during gradient descent training, which typically requires little or no hand-engineering (LeCun et al., 2015).

Deep learning architectures can efficiently capture subtle differences between texts about market news and events, and can aggregate stock market opinions from a large volume of inputs (Sohangir et al., 2018).

To build efficient and robust deep learning models for stock performance prediction, researchers use not only plain deep ANNs but also a range of more complex architectures, such as Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and autoencoders. A general introduction to how these architectures are applied to stock performance prediction is provided below:

a. A deep ANN with multiple hidden layers can be applied to predict the up/down movements of stock returns, taking fundamental and index data as input features and the movement direction as the output. A series of normalization, regularization, and optimization techniques can be used to prevent over-fitting and optimize model performance. For example, researchers could input 60 fundamental features into a deep ANN, normalize them using the standard score, randomly drop out 50% of the nodes in each layer during training, tune the learning rate and the number of layers through random search, and finally output the stock movement classification on the test set. Figure 9 illustrates applying deep ANNs to predict stock performance.

Figure 9: Using deep ANNs to predict stock performance
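The core of this example can be sketched in NumPy: a ReLU network with 60 inputs, two hidden layers, 50% dropout on each hidden layer, and a sigmoid output for the up/down class, trained by full-batch gradient descent with backpropagation. Several details are assumptions: the hidden sizes (32 and 16), the learning rate, the epoch count, and the synthetic data, whose labels depend on only 5 of the 60 features. The normalization and random-search steps are omitted because the synthetic features are already standard normal.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_deep_ann(X, y, hidden=(32, 16), p_drop=0.5, lr=0.1, epochs=1000, seed=0):
    """Full-batch gradient descent on binary cross-entropy for a ReLU network
    with inverted dropout on every hidden layer."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, 1]
    Ws = [rng.normal(0.0, np.sqrt(2.0 / m), (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(n) for n in sizes[1:]]
    for _ in range(epochs):
        # Forward pass; acts[i] is the input to layer i.
        acts, masks = [X], []
        for W, b in zip(Ws[:-1], bs[:-1]):
            mask = (rng.random((X.shape[0], W.shape[1])) >= p_drop) / (1.0 - p_drop)
            acts.append(relu(acts[-1] @ W + b) * mask)
            masks.append(mask)
        p = sigmoid(acts[-1] @ Ws[-1] + bs[-1]).ravel()
        # Backward pass; for sigmoid + cross-entropy, dLoss/dlogit = (p - y) / n.
        delta = ((p - y) / len(y))[:, None]
        for i in range(len(Ws) - 1, -1, -1):
            grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:  # propagate through ReLU and the dropout mask, using the old weights
                delta = (delta @ Ws[i].T) * masks[i - 1] * (acts[i] > 0)
            Ws[i] -= lr * grad_W
            bs[i] -= lr * grad_b
    return Ws, bs

def predict(Ws, bs, X):
    """Inference with dropout switched off (inverted dropout needs no rescaling)."""
    a = X
    for W, b in zip(Ws[:-1], bs[:-1]):
        a = relu(a @ W + b)
    return (sigmoid(a @ Ws[-1] + bs[-1]).ravel() >= 0.5).astype(int)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 60))                  # 60 synthetic, standardized features
y = (X[:, :5].sum(axis=1) > 0).astype(float)     # synthetic up/down labels
X_train, y_train, X_test, y_test = X[:800], y[:800], X[800:], y[800:]
Ws, bs = train_deep_ann(X_train, y_train)
test_accuracy = np.mean(predict(Ws, bs, X_test) == y_test)
print(test_accuracy)
```

For real work, a framework such as PyTorch or Keras would handle the gradients, mini-batching, and optimizers; the hand-written version is only meant to show what those pieces do.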

b. Autoencoders can be applied to reduce the dimensionality of the input features; the new low-dimensional representation of the features can then be fed into a classifier to predict stock movements. Researchers can choose traditional machine learning classifiers, such as logistic regression and SVMs, or add a sigmoid or softmax output unit on top of the learned low-dimensional features. For example, a fundamental input with 500 features could be reduced to 50 dimensions by a denoising autoencoder, and the 50-dimensional features then passed to an SVM with a radial basis function (RBF) kernel to make predictions. Figure 10 illustrates how an autoencoder reduces dimensionality to predict stock performance.

Figure 10: Using autoencoders to predict stock performance
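A denoising autoencoder can be sketched at a smaller scale than the 500-to-50 example: here 20 features are compressed to a 5-dimensional code. The architecture (one tanh encoder layer, a linear decoder), the Gaussian corruption level, the learning rate, and the synthetic low-rank data are all assumptions. The resulting `codes` would then be passed to a classifier, for example an RBF-kernel SVM such as scikit-learn's `SVC(kernel="rbf")`; that step is not shown.

```python
import numpy as np

def train_denoising_autoencoder(X, n_code, noise_std=0.1, lr=0.01, epochs=2000, seed=0):
    """One-hidden-layer denoising autoencoder: tanh encoder, linear decoder.
    Each epoch corrupts the inputs with Gaussian noise and trains the network
    to reconstruct the clean inputs (squared error summed over features,
    averaged over samples)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (d, n_code)), np.zeros(n_code)
    W2, b2 = rng.normal(0.0, 0.1, (n_code, d)), np.zeros(d)
    for _ in range(epochs):
        X_noisy = X + rng.normal(0.0, noise_std, X.shape)   # corruption step
        H = np.tanh(X_noisy @ W1 + b1)                      # encoder
        X_hat = H @ W2 + b2                                  # decoder
        d_out = 2.0 * (X_hat - X) / n                        # dLoss/dX_hat
        d_hid = (d_out @ W2.T) * (1.0 - H ** 2)              # back through tanh
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X_noisy.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

def encode(X, W1, b1):
    """Low-dimensional representation used as classifier input."""
    return np.tanh(X @ W1 + b1)

def reconstruction_loss(X, W1, b1, W2, b2):
    X_hat = encode(X, W1, b1) @ W2 + b2
    return np.sum((X_hat - X) ** 2) / len(X)

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 5))
X = latent @ rng.normal(size=(5, 20)) + 0.1 * rng.normal(size=(300, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standard-score the inputs
loss_before = np.sum(X ** 2) / len(X)         # error of reconstructing everything as 0
params = train_denoising_autoencoder(X, n_code=5)
loss_after = reconstruction_loss(X, *params)
codes = encode(X, *params[:2])
print(loss_before, loss_after, codes.shape)
```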

c. Long Short-Term Memory (LSTM) networks can be applied to model stock return time series at different frequencies and predict future stock returns. Unlike the deep ANNs described in part a, which take features at a single time t as inputs, an LSTM takes a time series of input features from time 0 to time t. For example, we could train an LSTM model on 10 years of financial data, including daily high/low prices, trading volume, the USD/RMB exchange rate, and interest rates, and then predict stock returns over the next year. Figure 11 demonstrates the application of LSTM to modeling financial time series data.

Figure 11: Financial time series modeling using LSTM
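Two pieces of this setup can be sketched in NumPy: cutting a multivariate series into fixed-length windows paired with the next period's target, and running one LSTM layer over a window. The window length, hidden size, gate ordering (input, forget, output, candidate), and random untrained weights are assumptions for illustration; training the weights by backpropagation through time would normally be left to a framework layer such as PyTorch's `torch.nn.LSTM` or Keras's `LSTM`.

```python
import numpy as np

def make_windows(features, targets, window):
    """Turn a (T, n_features) series into overlapping windows of `window` steps.
    Window k covers rows k .. k+window-1 and is paired with targets[k+window],
    so every target lies strictly after its input window."""
    X = np.stack([features[k:k + window] for k in range(len(features) - window)])
    return X, targets[window:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(xs, W, b, hidden_size):
    """Run one LSTM layer over xs of shape (window, n_features).
    W: (n_features + hidden_size, 4 * hidden_size), b: (4 * hidden_size,)."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in xs:
        z = np.concatenate([x_t, h]) @ W + b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g          # cell state: keep part of the old memory, add new
        h = o * np.tanh(c)         # hidden state passed to the next step
    return h                       # final hidden state summarizes the window

rng = np.random.default_rng(0)
T, n_features, hidden, window = 300, 5, 8, 20
series = rng.normal(size=(T, n_features))   # placeholder: high, low, volume, FX, rate
returns = rng.normal(0.0, 0.01, T)          # placeholder daily returns
X, y = make_windows(series, returns, window)
W = rng.normal(0.0, 0.1, (n_features + hidden, 4 * hidden))
b = np.zeros(4 * hidden)
w_out = rng.normal(0.0, 0.1, hidden)        # linear read-out (untrained)
h = lstm_forward(X[0], W, b, hidden)
prediction = h @ w_out                      # predicted next-period return for window 0
```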

d. LSTMs, Doc2vec, and Convolutional Neural Networks (CNNs) can be applied to sentiment analysis of online stock market opinions. For example, CNNs can extract authors' sentiment about stocks from their words. Some participants in financial social networks are believed to predict stock movements correctly; by using a CNN to predict the sentiment of their postings, useful market signals can be captured (Sohangir et al., 2018). Figure 12 illustrates the application of LSTM, Doc2vec, and CNN to sentiment analysis.

Figure 12: Deep learning financial sentiment analysis
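The CNN idea can be sketched as a forward pass of a one-layer text classifier: look up a vector for each word, slide convolution filters over consecutive word windows, keep the strongest response of each filter (max-over-time pooling), and map the pooled features to a sentiment probability. The tiny vocabulary, filter width, dimensions, and random untrained weights are hypothetical; this is one common CNN text-classifier design, not necessarily the architecture used by Sohangir et al. (2018).

```python
import numpy as np

def text_cnn_probability(token_ids, embeddings, filters, filter_bias, w_out, b_out):
    """Forward pass of a one-layer CNN sentence classifier.

    token_ids:  word indices (at least as many as the filter width)
    embeddings: (vocab_size, emb_dim) word vectors
    filters:    (n_filters, width, emb_dim) kernels applied to word windows
    Returns P(positive sentiment) and the pooled feature vector.
    """
    x = embeddings[np.asarray(token_ids)]                     # (n_words, emb_dim)
    _, width, _ = filters.shape
    windows = np.stack([x[t:t + width] for t in range(len(x) - width + 1)])
    conv = np.einsum("twd,fwd->tf", windows, filters) + filter_bias  # (n_windows, n_filters)
    pooled = np.maximum(conv, 0.0).max(axis=0)                # ReLU, then max over positions
    logit = pooled @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit)), pooled

vocab = {"earnings": 0, "beat": 1, "estimates": 2, "stock": 3, "rallies": 4, "misses": 5}
rng = np.random.default_rng(0)
emb_dim, n_filters, width = 8, 4, 2
embeddings = rng.normal(0.0, 0.5, (len(vocab), emb_dim))     # untrained placeholders
filters = rng.normal(0.0, 0.5, (n_filters, width, emb_dim))
w_out = rng.normal(0.0, 0.5, n_filters)
post = [vocab[w] for w in "earnings beat estimates stock rallies".split()]
prob, pooled = text_cnn_probability(post, embeddings, filters, np.zeros(n_filters), w_out, 0.0)
print(prob)
```

With trained embeddings and filters, each filter would learn to respond to short phrases (for example two-word patterns like “beat estimates”), and the output probability would be aggregated across posts into a market signal.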

Research on deep learning's application to stock performance prediction, including Sohangir et al. (2018), Fischer and Krauss (2017), Alberg and Lipton (2017), and Bao et al. (2017), reports that deep learning architectures can significantly outperform traditional statistical learning methods by constructing more robust feature representations and achieving higher prediction accuracy.

6. The future and limitations of deep learning on stock performance prediction

Recently, deep reinforcement learning has received a lot of attention in gaming and robotics (Arel, 2012; Cutler and How, 2015). Park and Van Roy (2015) developed a reinforcement learning algorithm that learns the price impact caused by trading activities and concluded that it can efficiently learn these dynamics to generate profits. How to combine deep learning architectures with reinforcement learning ideas to derive alpha generation signals could be an interesting topic for future research.

Although deep learning is producing fruitful results in predicting stock performance, there are limitations in implementation and testing. For example, because fundamental data is typically reported only monthly or quarterly, there is sometimes not enough training data for a large deep learning architecture, which makes the model prone to over-fitting. In addition, because deep learning architectures behave like black boxes (Snoek et al., 2012), it can be difficult to interpret the meaning of the learned features or how the architecture functions overall. Future research is expected to address these problems and provide better model interpretation and implementation frameworks.

References

Alberg, John, and Zachary C. Lipton. "Improving Factor-Based Quantitative Investing by Forecasting Company Fundamentals." arXiv preprint arXiv:1711.04837 (2017).

Arel, Itamar. "Deep reinforcement learning as foundation for artificial general intelligence." Theoretical Foundations of Artificial General Intelligence (2012): 89-102.

Bao, Wei, Jun Yue, and Yulei Rao. "A deep learning framework for financial time series using stacked autoencoders and long-short term memory." PLoS ONE 12, no. 7 (2017): e0180944.

Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).

Cutler, Mark, and Jonathan P. How. "Efficient reinforcement learning for robots using informative simulated priors." In 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, Washington. IEEE, 2015.

Fischer, Thomas, and Christopher Krauss. "Deep learning with long short-term memory networks for financial market predictions." European Journal of Operational Research (2017).

Hinton, Geoffrey, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29, no. 6 (2012): 82-97.

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." In Advances in Neural Information Processing Systems, pp. 1097-1105. 2012.

LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521, no. 7553 (2015): 436.

Leung, Michael KK, Hui Yuan Xiong, Leo J. Lee, and Brendan J. Frey. "Deep learning of the tissue-regulated splicing code." Bioinformatics 30, no. 12 (2014): i121-i129.

Moghaddam, Amin Hedayati, Moein Hedayati Moghaddam, and Morteza Esfandyari. "Stock market index prediction using artificial neural network." Journal of Economics, Finance and Administrative Science 21, no. 41 (2016): 89-93.

Nair, Vinod, and Geoffrey E. Hinton. "Rectified linear units improve restricted Boltzmann machines." In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807-814. 2010.

Park, Beomsoo, and Benjamin Van Roy. "Adaptive execution: Exploration and learning of price impact." Operations Research 63, no. 5 (2015): 1058-1076.

Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. "You only look once: Unified, real-time object detection." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779-788. 2016.

Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. "Practical Bayesian optimization of machine learning algorithms." In Advances in Neural Information Processing Systems, pp. 2951-2959. 2012.

Sohangir, Sahar, Dingding Wang, Anna Pomeranets, and Taghi M. Khoshgoftaar. "Big Data: Deep Learning for financial sentiment analysis." Journal of Big Data 5, no. 1 (2018): 3.

Xiong, H. Y., B. Alipanahi, L. J. Lee, H. Bretschneider, D. Merico, R. K. Yuen, Y. Hua, S. Gueroussov, H. S. Najafabadi, T. R. Hughes, and Q. Morris. "The human splicing code reveals new insights into the genetic determinants of disease." Science 347, no. 6218 (2015): 1254806.

Important Disclaimer and Disclosure Information

Algo Depth makes no representations of any kind regarding this report. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the absence of errors, whether or not known or discoverable. In no event shall the author(s), Algo Depth or any of its officers, employees, or representatives, be liable to you on any legal theory (including, without limitation, negligence) or otherwise for any claims, losses, costs or damages of any kind, including direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages, arising out of the use of the report, including the information contained herein.

This report is prepared for informational and educational purposes only, and is not an offer to sell or the solicitation of an offer to buy any securities. The recipient is reminded that an investment in any security is subject to many risks, including the complete loss of capital, and other risks that this report does not contain. As always, past performance is no indication of future results. This report does not constitute any form of invitation or inducement by Algo Depth to engage in investment activity.

Algo Depth has not independently verified the information provided by the author(s) and provides no assurance to its accuracy, reliability, suitability, or completeness. Algo Depth may have opinions that materially differ from those discussed, and may have significant financial interest in the positions mentioned in the report.

This report may contain certain projections and statements regarding anticipated future performance of securities. These statements are subject to significant uncertainties that are not in our control and are subject to change.

Algo Depth makes no representations, express or implied, regarding the accuracy or completeness of this information, and the recipient accepts all risks in relying on this report for any purpose whatsoever. This report shall remain the property of Algo Depth and Algo Depth reserves the right to require the return of this report at any time.
