An Introduction to Deep Learning on Stock Performance Prediction

June 27, 2019


In the era of big data, the growing volume and complexity of data have exposed the limits of traditional statistical learning tools in classification and feature extraction tasks across a range of domains, including computer vision, healthcare, and language processing (LeCun et al., 2015). Recently, deep learning methods have demonstrated remarkable performance in processing image, speech, and DNA expression data by allowing the architecture to learn representations of the data with multiple levels of abstraction. Yet although the amount and complexity of financial data have also been increasing, deep learning methods for predicting stock returns and generating investment returns have not been widely applied or thoroughly researched. This introduction explains the basics of deep learning and how it could be applied to stock performance prediction and alpha generation.


1. Supervised learning and unsupervised learning


Statistical learning, deep or not, comes in two basic forms: supervised learning and unsupervised learning. Supervised learning is the task of learning a function that maps an input to an output using "labeled" data, while unsupervised learning infers a function that describes hidden structure in "unlabeled" data.


Figure 1 below illustrates the difference between supervised and unsupervised learning. In the supervised setting, both the features of the objects (corn, banana, apples, etc.) and the object labels (vegetable/fruit) are given, and the machine-learning model is trained to classify the objects into the labeled groups. In the unsupervised setting, only the features of the objects are given and the labels are unknown; the model is trained to separate the objects based on the similarities and differences of their features.


Figure 1.
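
The same distinction can be sketched in a few lines of Python. This is only an illustrative sketch: the two features (sweetness and starchiness), their values, the extra "carrot" item, and the function names are all assumptions made for the example, and a nearest-centroid classifier and a two-cluster k-means are simple stand-ins for whatever models would be used in practice. The supervised function reads the labels; the unsupervised one sees only the features.

```python
from math import dist

# Hypothetical features: (sweetness, starchiness), each on a 0-10 scale.
features = {
    "corn": (3.0, 8.0),
    "carrot": (4.0, 6.0),
    "banana": (8.0, 3.0),
    "apple": (9.0, 1.0),
}
# Labels are only available in the supervised setting.
labels = {
    "corn": "vegetable",
    "carrot": "vegetable",
    "banana": "fruit",
    "apple": "fruit",
}


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit_supervised(features, labels):
    """Supervised: use the given labels to compute one centroid per class."""
    by_label = {}
    for name, x in features.items():
        by_label.setdefault(labels[name], []).append(x)
    return {label: mean_point(pts) for label, pts in by_label.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


def two_means(points, iterations=10):
    """Unsupervised: split points into two clusters without using labels."""
    # Deterministic start (an assumption for reproducibility): first and last point.
    centroids = [points[0], points[-1]]
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(2), key=lambda k: dist(centroids[k], p)) for p in points
        ]
        # Move each centroid to the mean of its members (keep it if empty).
        for k in range(2):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = mean_point(members)
    return assignment


centroids = fit_supervised(features, labels)
print(predict(centroids, (7.5, 2.0)))      # class predicted for a new object
print(two_means(list(features.values())))  # cluster index for each object
```

Note that the unsupervised result is only a grouping: the cluster indices carry no meaning such as "fruit" or "vegetable" until someone inspects the groups and names them.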


Supervised learning is the most common form of deep learning. Most alpha generation tasks, such as predicting whether the return of a stock will move up or down, are supervised learning tasks because the historical return movements are already known and can be used as labels when the model is trained.
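
As a rough sketch of how such labels might be built, the snippet below turns a price series into (features, label) pairs. The prices are made up, the function names and the three-period window are hypothetical choices, and treating a zero return as "down" is an assumption; the report itself does not specify any of these.

```python
def label_return_direction(prices):
    """Compute simple returns and a 1/0 label for up/not-up moves."""
    returns = [(nxt - cur) / cur for cur, nxt in zip(prices, prices[1:])]
    # Assumption: a flat (zero) return is labeled 0 together with down moves.
    labels = [1 if r > 0 else 0 for r in returns]
    return returns, labels


def build_dataset(prices, window=3):
    """Pair each window of past returns with the next period's direction."""
    returns, labels = label_return_direction(prices)
    X, y = [], []
    for t in range(window, len(returns)):
        X.append(returns[t - window:t])  # features: the previous `window` returns
        y.append(labels[t])              # label: direction of the following return
    return X, y


# Made-up prices, for illustration only.
prices = [100.0, 102.0, 101.0, 103.0, 104.0, 103.0]
X, y = build_dataset(prices, window=3)
for features_row, label in zip(X, y):
    print(features_row, "->", label)
```

Each label depends only on a return that occurs after the feature window ends, so the features never contain the outcome they are meant to predict.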


2. Definition of Deep Learning


LeCun et al. (2015) define deep learning methods as representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level into a representation at a higher, slightly more abstract level. In plain English, deep learning models are typically deeper and far more complex than traditional machine learning models, which lets them generate new data representations that perform well in prediction tasks.
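
The idea of composing simple non-linear modules can be illustrated with a minimal sketch. Here each module is assumed to be an affine map followed by a ReLU, and the weights are hand-picked for the example rather than learned; a real deep learning model would have many more units and would fit its weights to data. The helper names are hypothetical.

```python
def relu(v):
    """Element-wise non-linearity: negative values become zero."""
    return [max(0.0, x) for x in v]


def affine(weights, bias, v):
    """Linear map plus bias: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def layer(weights, bias):
    """One simple non-linear module: affine map followed by ReLU."""
    def apply(v):
        return relu(affine(weights, bias, v))
    return apply


def compose(*layers):
    """Stack modules; return every intermediate representation of the input."""
    def network(v):
        representations = [v]
        for f in layers:
            v = f(v)
            representations.append(v)
        return representations
    return network


# Hand-picked weights (an assumption): together the two layers compute |x1 - x2|.
hidden = layer([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
output = layer([[1.0, 1.0]], [0.0])
network = compose(hidden, output)

for level, rep in enumerate(network([3.0, 5.0])):
    print("level", level, rep)
```

No single linear map can compute an absolute difference, but two stacked modules can: the first level splits the difference into its positive and negative parts, and the second level adds them back together.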


Deep learning methods have brought about breakthroughs in image recognition (e.g. whether an image contains a cat or not) (Krizhevsky et al., 2012;