A reader on my blog (Thanks Kris!) suggested that I explore how much noise is needed to send Genotick off the deep end. You’ll recall from my earlier post on the subject that I was looking for hidden biases that Genotick might have, and explored how it responded to pure and noisy sine waves of data.
For those just catching up, Genotick is a free, open-source machine-learning price-prediction application created by Lukasz Wojtow. You can read more about it here. It’s still in beta but it’s a very interesting concept.
First, I generated some sinusoidal data in Google Sheets, and started over with a new sine wave period. I created the sine wave like this:
Column A was simply a counter. It went from 1 to 2000.
Column B contained:
This uses the counter in column A to advance the sine wave through time. The sine wave moves between -1 and 1.
Column C contained:
I increase the amplitude by multiplying column B by 3, then offset it by 30 to make it look more like a real stock price. Cell E1 contained the noise multiplier, which scales the noise source (a random number generator producing values between 0 and 1). With a noise multiplier of 0, no noise is added and the sine wave is pure.
The sine wave has an amplitude of 6 peak to peak, so divide the noise multiplier by 6 to determine the ratio of noise to signal.
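For anyone who would rather script this than build it in a spreadsheet, here is a minimal Python sketch of the same recipe. It is not the actual spreadsheet: the period of 50 rows is a placeholder I made up, the function names `noisy_sine` and `noise_ratio` are hypothetical, and I am assuming the scaled noise is simply added to the offset sine wave.

```python
import math
import random


def noisy_sine(n_rows=2000, period=50.0, noise_multiplier=0.0, seed=None):
    """Build a fake price series: scaled sine wave plus uniform noise.

    ASSUMPTIONS: `period` (in rows) is a placeholder value, and the noise
    is added on top of the wave as noise_multiplier * uniform[0, 1).
    """
    rng = random.Random(seed)
    prices = []
    for counter in range(1, n_rows + 1):        # column A: counter 1..n_rows
        wave = math.sin(2 * math.pi * counter / period)  # column B: -1..1
        noise = noise_multiplier * rng.random()          # scaled noise source
        prices.append(3 * wave + 30 + noise)             # column C: fake price
    return prices


def noise_ratio(noise_multiplier, peak_to_peak=6.0):
    """Ratio of the noise range to the 6-unit peak-to-peak sine amplitude."""
    return noise_multiplier / peak_to_peak


if __name__ == "__main__":
    for multiplier in (0, 0.001, 0.01, 0.1, 1, 10):
        series = noisy_sine(noise_multiplier=multiplier, seed=1)
        print(multiplier, f"{noise_ratio(multiplier):.2%}",
              round(min(series), 2), round(max(series), 2))
```

Passing a `seed` makes a run repeatable, which is handy when comparing several noise multipliers against the same underlying wave.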
My spreadsheet looks like this:
I had Genotick do ten runs every time I tested a new noise multiplier. This was a lot of work! Since I didn’t know where the noise might start interfering, I started with wide-ranging values: 0, .001, .01, .1, 1 and 10.
Here are the results of that first toe-dip into the statistical water:
As you can see, a value of 10 killed Genotick dead! The values between 0 and .1 all show a worst-case scenario of at least 10^7 % profit. A noise multiplier value of “1” shows a worst-performer return of ‘only’ 10^5 %. But the spread of the returns is getting wider at that point as well, which tells me the noise is starting to affect the results.
Then I focused in on the range between 1 and 10. It soon became apparent that by a noise multiplier of 3, the noise was taking over (3 divided by 6 is a 50% noise ratio). So how about we take a closer look at the 1-3 range?
The last chart, noise=2.0, is linear scale because the results were bad enough that log scaling became meaningless. With each noise-multiplier increase of .2, the worst and best cases got lower and lower. By the time we get to a value of 1.8, we see our first loss-making run. While I can’t say there’s a specific threshold of noise that causes Genotick to become confused, I can say that it’s definitely in the 1 to 2 range (a noise ratio of roughly 17-33%).
Earlier I had done some experiments with filtering actual price data using a simple short moving average. This is like putting the price data through a lowpass filter for you audio types, and removes some of the high frequency noise. Some initial tests on price data that had previously proven stubborn showed a big improvement when filtered. The only problem is, you can’t buy and sell a moving average! You have to buy and sell on the actual price.
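To make the filtering idea concrete, here is a minimal sketch of a simple moving average in Python. The function name `simple_moving_average` and the window of 5 are my own illustrative choices, not the settings from those earlier experiments, and the first few outputs average over however many prices are available so far.

```python
def simple_moving_average(prices, window=5):
    """Smooth a price series with a trailing simple moving average.

    ASSUMPTION: `window` is an illustrative value. Early points, where
    fewer than `window` prices exist, average what is available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            # Drop the price that just fell out of the trailing window.
            running_sum -= prices[i - window]
        count = min(i + 1, window)
        smoothed.append(running_sum / count)
    return smoothed


if __name__ == "__main__":
    raw = [30.0, 31.5, 29.8, 32.1, 30.7, 28.9, 31.2]
    for actual, filtered in zip(raw, simple_moving_average(raw, window=3)):
        # Keep the actual price next to the filtered value: trades still
        # happen at the actual price, not at the average.
        print(f"{actual:6.2f} {filtered:6.2f}")
```

Because the average only looks backward, it lags the actual price, and a longer window removes more noise at the cost of more lag.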
Wouldn’t it be nice if Genotick could be set to ignore the price data for its learning inputs, and only look at other columns of data such as filtered price information? It would still use actual price data for the trade, but it would ignore it for the trading algorithm. I’ve put in a request for that feature.
Meanwhile, there may be something of a workaround. That’s my next step: feeding Genotick filtered data to see if it can successfully ignore the noise.