To predict stocks, there are a variety of methods that one can use, from simple regressions to the more complex ML/AI methods. In this article, we will be focusing on coding the famous Monte Carlo simulation and finding the best fit graph to give us our future prediction.
The Code
Most of the equations and theory behind this code can be found on Investopedia's page regarding the Monte Carlo simulation. Since we are just going to focus on the code, I'll be leaving the link below for review.
And now, we will be moving directly into the code, and as always, we start off by importing the packages we will be using.
```python
import numpy as np  # To work with arrays
from datetime import datetime  # To work with our stock data's dates
import pandas_datareader as pdr  # Collects stock data
from scipy.stats import norm  # Normal distribution (norm.ppf) for our random draws
```

Creating variables that we will use further on.
```python
days_to_test = 30  # Days for our best fit test
days_to_predict = 1  # How many days in the future we want to go
simulations = 1000  # How many simulations to run
ticker = 'BAC'  # Our stock ticker name
```
The ticker variable is where we can swap in our preferred stock or even our preferred crypto. Since we will be using Yahoo's data, we need to set the ticker using Yahoo's naming convention; for example, Bitcoin would be 'BTC-USD' instead of just BTC. To get this data, we use the pandas_datareader (PDR) package that we imported, like so.
```python
# Daily closing prices; passing the ticker in a list gives a column named after it
data = pdr.get_data_yahoo([ticker],
                          start=datetime(1990, 1, 1),
                          end=datetime.today().strftime('%Y-%m-%d'))['Close']
```

For the start date, I just picked an arbitrary early date to start capturing data. If the stock started trading after that date, there's no need to worry: the package simply returns data from the first available date. For the end date, datetime.today() gives us the current date, and strftime formats it as a 'YYYY-MM-DD' date string for the PDR package.
Before we move on, if you are not familiar with the Monte Carlo equations, please do give the link above a click, since the following equations are based on that resource.
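For reference, here is my own transcription of what the next snippet and the later simulation loop compute. P_t is the closing price on day t, r̄ and σ² are the mean and variance of the daily log returns r_t, σ is their standard deviation, Φ⁻¹ is the inverse standard normal CDF (norm.ppf), and U is a uniform random number (np.random.rand()).

$$
\begin{aligned}
r_t &= \ln\left(1 + \frac{P_t - P_{t-1}}{P_{t-1}}\right) = \ln\frac{P_t}{P_{t-1}} \\
\text{drift} &= \bar{r} - \frac{\sigma^2}{2} \\
P_t &= P_{t-1} \exp\left(\text{drift} + \sigma\,\Phi^{-1}(U)\right), \qquad U \sim \text{Uniform}(0, 1)
\end{aligned}
$$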
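```python
# Log of each day's price ratio; the first value is NaN (no previous price)
daily_return = np.log(1 + data[ticker].pct_change())
average_daily_return = daily_return.mean()  # pandas skips the NaN
variance = daily_return.var()
drift = average_daily_return - (variance / 2)
standard_deviation = daily_return.std()
```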
Now, a small explanation. The daily_return variable gives us the daily log return, computed from each price and its previous price. pct_change() returns the percentage change (P_t − P_{t−1}) / P_{t−1}, which is negative whenever the price falls, and the log of zero or a negative number is undefined. Adding one turns the percentage change into the price ratio P_t / P_{t−1}, which is always positive, so the log is well defined. From here, the other variables are fairly self-explanatory, since they are calculated using built-in pandas methods; the drift is simply the average daily return minus half the variance. With the equations set, now we need to create an array that will house our predictions.
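Before moving on to that array, here is a quick sanity check of the log-return step on a handful of made-up closing prices (not real BAC data). It shows that 1 + pct_change() equals the price ratio P_t / P_{t-1}, and that the first return is NaN, which pandas skips when computing the mean, variance, and standard deviation.

```python
import numpy as np
import pandas as pd

# Made-up closing prices, used only to illustrate the calculation
prices = pd.Series([100.0, 102.0, 99.0, 101.0, 104.0])

# 1 + pct_change() is just the ratio P_t / P_{t-1}, which is always positive
ratio_from_pct = 1 + prices.pct_change()
ratio_direct = prices / prices.shift(1)
print(np.allclose(ratio_from_pct.dropna(), ratio_direct.dropna()))  # True

daily_return = np.log(ratio_from_pct)  # first entry is NaN (no previous day)

# pandas skips the NaN by default when aggregating
average_daily_return = daily_return.mean()
variance = daily_return.var()
drift = average_daily_return - variance / 2
standard_deviation = daily_return.std()
print(len(daily_return.dropna()))  # 4
```

Five prices give four returns, which is why the first row of daily_return is always empty.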
```python
predictions = np.zeros(days_to_test + days_to_predict)
# Every path starts at the first close of the test window (a position, hence .iloc)
predictions[0] = data[ticker].iloc[-days_to_test]
# One row per simulation; each row holds one simulated price path
pred_collection = np.zeros(shape=(simulations, days_to_test + days_to_predict))
```

For predictions, this sets up an array of zeros whose size is the number of days we are going to test plus how many days into the future we wish to predict. The next line sets the first value of the predictions array to the 30th-to-last closing price in our data set (with days_to_test = 30), which is the first day of our test window. We use .iloc because we want that value by position rather than by date label. Finally, the last line creates a two-dimensional array, one row per simulation, where we will store our predictions; we fill it in the loop that calculates our prediction, as shown below.
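```python
for j in range(0, simulations):
    for i in range(1, days_to_test + days_to_predict):
        # Random shock: standard deviation times a standard normal draw
        random_value = standard_deviation * norm.ppf(np.random.rand())
        # Next price = previous price grown by exp(drift + shock)
        predictions[i] = predictions[i - 1] * np.exp(drift + random_value)
    pred_collection[j] = predictions  # Copies this path into row j
```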
This double for loop is what predicts our values. The i loop uses the equation from Investopedia to build one price path, day by day. Each path holds as many values as the number of days in our test window plus how many days we want to see into the future. In this case, I used the last 30 days for testing and want to predict one day into the future, for a total of 31 values. The reason I went with 30 days is so that we can compare those 30 days of predictions to 30 days of actual data that we already have, and use that comparison to find the best fit, which gives us our future prediction. The j loop repeats this process for as many simulations as we want. For a more visual representation, here's a small snippet.

The bright red line is the actual movement of our stock over the past 30 days. Every other line is a prediction produced by our code. The plan is to compare all of our simulations with the original stock movement and find the one that most closely resembles it, and to do this, we use the following code.
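```python
differences = np.array([])  # Total absolute error of each simulated path
for k in range(0, simulations):
    # Actual closes in the test window minus the simulated values for the same days
    difference_arrays = np.subtract(data[ticker].values[-days_to_test:],
                                    pred_collection[k][:days_to_test])
    difference_values = np.sum(np.abs(difference_arrays))
    differences = np.append(differences, difference_values)

best_fit = np.argmin(differences)  # Index of the closest-fitting simulation
future_price = pred_collection[best_fit][-1]  # Its last value is our prediction
```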
First, we create an empty array, which we will fill with values that measure how closely each simulation fits. We do this by subtracting each prediction array from the last 30 days of our original data set. Once we have these differences, we add up their absolute values and store the total in the array we created at first. Once the array is full, we search for the smallest total with np.argmin, which theoretically should represent our closest fit. Note that np.argmin returns an index into our differences array, so we use that index on our prediction collection to get the array with the closest fit. Lastly, the last value of that array is our prediction, in this case for one day in the future. Visually, this is what it looks like.

From our 1000 simulations, this graph was the one that produced the closest fit to our original graph, using the last 30 days as a reference. But remember that each prediction array holds 31 days, and since this graph produced the best fit, its 31st value should, in principle, be our best prediction for one day in the future out of all our simulations. One thing we do have to remember: since we use NumPy's random values, every run of this code will give different results, graphs, and predictions.
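If you do want repeatable runs, for example to compare different settings on the same random draws, NumPy's random numbers can be seeded. A minimal sketch; the seed value 2024 is arbitrary. Seeding only makes runs repeatable; it does not make any single run's prediction more reliable.

```python
import numpy as np

# Two generators created with the same (arbitrary) seed produce identical draws
rng_a = np.random.default_rng(2024)
rng_b = np.random.default_rng(2024)
print(np.array_equal(rng_a.standard_normal(5), rng_b.standard_normal(5)))  # True

# The article's loop uses the legacy np.random.rand(); np.random.seed() fixes that stream
np.random.seed(2024)
first = np.random.rand(3)
np.random.seed(2024)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True
```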
Notes
For almost two months, I ran this code with 5 million simulations and compared its predictions with the actual prices. To my surprise, the code had a good run! For a good chunk of predictions, it was only around 10 cents off the actual price. Furthermore, most of the time it correctly predicted whether it was going to be a bull or bear run, which was surprising. However, the code performed really badly on Friday-to-Monday predictions, usually off by 60 cents and, at worst, by a whole dollar. Not only that, but the more simulations we run, the longer the code takes, at least on my old laptop. It also did not produce consistently favorable results when I used crypto instead of stocks. Finally, I feel that the stock market was relatively calm during those two months, so these results may not be representative, and during a volatile season we might get unfavorable predictions.
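Since run time grows with the number of simulations, one option is to replace the loops with NumPy array operations. This is a swapped-in technique rather than the article's code: the helper names simulate_paths and best_fit_prediction are hypothetical, the shocks come from NumPy's Generator.standard_normal instead of norm.ppf(np.random.rand()) (both give standard normal draws), and the parameters and toy prices below are made up. The paths follow the same recurrence as the double for loop, and the best-fit step uses the same sum of absolute differences.

```python
import numpy as np

def simulate_paths(start_price, drift, standard_deviation, n_steps, simulations, rng=None):
    """Hypothetical helper: vectorized version of the double for loop.

    Returns an array of shape (simulations, n_steps); column 0 is start_price,
    matching the layout of pred_collection in the loop version.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One standard-normal shock per simulation per day after the first
    shocks = standard_deviation * rng.standard_normal((simulations, n_steps - 1))
    # exp(drift + shock) is each day's growth factor; cumprod chains the days together
    growth = np.exp(drift + shocks)
    paths = np.empty((simulations, n_steps))
    paths[:, 0] = start_price
    paths[:, 1:] = start_price * np.cumprod(growth, axis=1)
    return paths


def best_fit_prediction(actual_window, pred_collection, days_to_test):
    """Hypothetical helper: index of the path closest to actual_window and its last value."""
    # Broadcasting subtracts the same actual window from every simulated path at once
    differences = np.abs(pred_collection[:, :days_to_test] - actual_window).sum(axis=1)
    best_fit = int(np.argmin(differences))
    return best_fit, float(pred_collection[best_fit, -1])


# Shape check with made-up parameters (not fitted to any real stock)
paths = simulate_paths(start_price=100.0, drift=0.0002, standard_deviation=0.015,
                       n_steps=31, simulations=1000, rng=np.random.default_rng(42))
print(paths.shape)  # (1000, 31)

# Toy best-fit example: 3 paths, a 3-day test window, and 1 future day
actual = np.array([10.0, 11.0, 12.0])
toy_paths = np.array([
    [10.0, 13.0, 15.0, 16.0],  # total absolute error 5
    [10.0, 11.5, 11.5, 12.5],  # total absolute error 1
    [10.0, 9.0, 10.0, 9.5],    # total absolute error 4
])
print(best_fit_prediction(actual, toy_paths, days_to_test=3))  # (1, 12.5)
```

With millions of simulations, these arrays get large (5 million paths of 31 float64 values is over a gigabyte each), so the simulations may need to be generated in batches, keeping only the best fit from each batch.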
Here are some suggestions for those who wish to continue with this project. I did see some interesting results when I ran this code twice, once on the Low prices and once on the High prices. This gave a useful range for automation, since we could create a bot that day trades by checking whether the predicted Low or High price has been reached. For example, if the stock gets close to the predicted High price first, we can short. If it reaches the predicted Low price first, we can buy.
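The Low/High idea can be sketched as a simple first-touch rule. This is only an illustration under my own assumptions: the function name first_touch_signal, the list of intraday prices, and the tolerance that defines "close to" a predicted level are hypothetical and not part of the article's code. The predicted Low and High would come from running the simulation on Yahoo's Low and High columns instead of Close.

```python
def first_touch_signal(intraday_prices, predicted_low, predicted_high, tolerance=0.0):
    """Hypothetical rule: act on whichever predicted level the price reaches first.

    Returns "short" if the price gets within tolerance of the predicted High first,
    "buy" if it gets within tolerance of the predicted Low first, and None otherwise.
    """
    for price in intraday_prices:
        if price >= predicted_high - tolerance:
            return "short"
        if price <= predicted_low + tolerance:
            return "buy"
    return None


# Made-up prices: 30.48 is within 0.05 of the predicted High of 30.50
print(first_touch_signal([30.10, 30.25, 30.48], predicted_low=29.80,
                         predicted_high=30.50, tolerance=0.05))  # short
```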
And there you have it! I hope you enjoyed this article and it sparked some interest in the Monte Carlo method and gave you some ideas to test out!