That graph looks like a bunch of spaghetti, I realize. But I’ll explain!
I’m back to testing Genotick, an open-source machine-learning trading program written in Java. Probably to the developer’s dismay, I’m always throwing things at it to make it break. Much like a small child throwing a temper tantrum, but with stocks.
A reader on my blog (Thanks Kris!) suggested that I explore how much noise is needed to send Genotick off the deep end. You’ll recall from my earlier post on the subject that I was looking for hidden biases that Genotick might have, and explored how it responded to pure and noisy sine waves of data.
I have been playing around some more with Genotick, the open-source genetic-learning trading software by Lukasz Wojtow. One thing that has been puzzling me is that the software seems to do well on certain types of data but not others, and I’m having trouble identifying what sort of data it’s “good at”.
At first I thought Genotick might have a ‘long’ bias. It does smashingly well on the S&P 500 for example. The curve looks like a leveraged version of the S&P, but in fact varies at the micro-level.
However it turns out not to have a long bias. To explain:
One method of removing bad ‘programs’ (the many small sets of instructions that make predictions, and are promoted or removed based on performance) is to require that they be symmetrical: a program should give the opposite prediction when the data is inverted, otherwise it’s nonfunctional. With this feature turned on, Genotick requires an inverted copy of the data alongside the real data for comparison.
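To make the symmetry idea concrete, here’s a toy sketch in Python (not Genotick’s actual code; the `momentum_program` rule is my own made-up example). A “program” maps a price series to a vote of +1 (up), -1 (down), or 0 (no vote); a symmetrical program must flip its vote when the data is inverted.

```python
def momentum_program(prices):
    """Hypothetical example program: vote with the last price change."""
    if prices[-1] > prices[-2]:
        return 1
    elif prices[-1] < prices[-2]:
        return -1
    return 0

def is_symmetrical(program, prices):
    """True if the program gives the opposite vote on inverted data."""
    inverted = [-p for p in prices]
    return program(prices) == -program(inverted)

prices = [100, 101, 103, 102, 105]
print(is_symmetrical(momentum_program, prices))  # True for this rule
```

A program that, say, always voted ‘up’ would fail this check and get culled, which is the point of the feature.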
However, if you swap the labels on the two data sets and tell Genotick that your inverted data is the real data, it still works properly. Therefore it doesn’t have a long bias.
Then I wondered if perhaps it had a ‘trend’ bias and didn’t respond well to mean reversion. Lukasz put together some synthetic data: one set with a trending bias and another with a mean-reverting bias (data and discussion here). Genotick did a little better with the trending data, but both yielded profits. Hmm, strange. Below are my runs using Lukasz’s synthetic data sets.
I then asked myself: what’s the ultimate in mean reversion? A cyclical waveform such as a sine wave! What if I created a data set that was a sine wave, dirtied it up a little with some randomness, and tested Genotick on that? Surely Genotick would sort that out, right?
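A series like that is easy to generate. Here’s a minimal sketch of a ‘dirty sine wave’ in Python; the period, amplitude, and noise level are my own illustrative choices, not the parameters behind the actual data set (which is linked at the end of the post).

```python
import math
import random

random.seed(42)  # reproducible noise

def dirty_sine(n_bars=2000, period=250, base=100.0,
               amplitude=20.0, noise=2.0):
    """A sine-wave 'price' series with additive uniform noise."""
    return [base + amplitude * math.sin(2 * math.pi * t / period)
            + random.uniform(-noise, noise)
            for t in range(n_bars)]

series = dirty_sine()
print(len(series))  # 2000 bars of wavy, slightly noisy prices
```

Writing the values out with dummy OHLCV columns gives Genotick something it can ingest like any other ticker.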
And it does. Sort of.
Here you can see seven different runs, superimposed with the ‘dirty sine wave’ data in gray. The left vertical axis is the gain/loss percentage, and the right vertical axis is the synthetic data’s price. One run used the default settings, and the other six used randomized settings.
What I find odd is that, of the five runs that became clearly profitable, all of them took roughly half the data (1,000 bars/days) to get going. And all but the dark-pink one reached a maximum about two-thirds of the way through, followed by a drawdown. Why would that be? The data is essentially the same pattern repeated, with slight random variations. With random parameter selection and random programs being generated, this repeated behavior seems improbable.
I looked at a clean sine wave and got these results from two runs. The returns are so good I had to set the graph to log scale (and Google Sheets is a little buggy in log-scale mode, hence the weird spots at the beginning).
Genotick appears to have trouble with the noise in the dirty sine wave data, even though we humans could look at the graph and predict what would happen next. That gives me some ideas to check further.
If you’d like to take a look at the synthetic source data and the Genotick returns, you can view them here as a Google Sheet.
I’ve recently been experimenting with Genotick, which is open-source Java software that attempts to discover mechanical trading systems through machine learning. You can run it on just about any Mac/Windows/Linux system (although you may face additional hurdles getting Java 8 working at the command line on a Mac). Thousands of tiny programs create random rules to predict the next day’s market move. The ones with a better success rate are kept, and the ones that suck are booted to the curb. Every day the ‘robots’ all vote up or down, and the majority wins. The process repeats each day: the good robots evolve and the bad ones die out. Every time you do a new run, the robots evolve differently and you get different results.
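The vote-and-cull loop described above can be sketched in miniature. This is emphatically not Genotick’s real implementation (it has no genetic crossover, for one), just the core idea: random lookback rules vote each day, a majority decides, and the worst scorers are periodically replaced with fresh random rules.

```python
import random

random.seed(0)

def random_rule():
    """A 'robot': looks back a random number of days and votes with
    (or against) the price change over that window."""
    lookback = random.randint(1, 5)
    sign = random.choice([1, -1])
    def rule(prices):
        if len(prices) <= lookback:
            return 0
        change = prices[-1] - prices[-1 - lookback]
        return sign * (1 if change > 0 else -1 if change < 0 else 0)
    return rule

def run(prices, n_rules=100, cull_every=20):
    rules = [random_rule() for _ in range(n_rules)]
    scores = [0] * n_rules
    votes_right = 0
    for t in range(6, len(prices) - 1):
        history = prices[:t + 1]
        actual = 1 if prices[t + 1] > prices[t] else -1
        votes = [r(history) for r in rules]
        majority = 1 if sum(votes) > 0 else -1
        if majority == actual:
            votes_right += 1
        # score each robot, then periodically replace the worst ten
        for i, v in enumerate(votes):
            scores[i] += 1 if v == actual else -1 if v == -actual else 0
        if t % cull_every == 0:
            worst = sorted(range(n_rules), key=lambda i: scores[i])[:10]
            for i in worst:
                rules[i], scores[i] = random_rule(), 0
    return votes_right / (len(prices) - 7)  # fraction of correct votes

prices = [100 + i % 7 for i in range(300)]  # a crude repeating pattern
print(round(run(prices), 2))
```

Even this toy version shows the flavor of the approach: the population drifts toward rules that fit whatever regularity the data has, and a fresh run evolves differently.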
Because it’s open source software (and is still at the very early stage of development), there’s a fair amount of heavy lifting on the user’s part to get this to work. I’m still wrapping my head around the Linux command-line interface and grepping the data I need out of the reports the software generates. Serves me right for just clicking on icons all these years I suppose. No matter, I’ve been able to get some results out of it. Not results I would trade, mind you, but this is more about the intellectual exercise at this point. The software does hold promise though!
The lead image shows an equity curve from trading IBM stock from 2004-2006 inclusive, vs. buy-and-hold of the same stock. No commissions are deducted (they would be substantial, since this trades daily), and the entire account is invested on each trade. I fed Genotick the open/high/low/close/volume (OHLCV) data for each period and let it go to work. About an hour later, I had results.
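For reference, turning a stream of daily up/down calls into an equity curve under those assumptions (fully invested each day, no commissions) is simple arithmetic. The prices and predictions below are made up for illustration:

```python
def equity_curve(closes, predictions, start=1.0):
    """predictions[i] is +1 (long) or -1 (short) for the move from
    closes[i] to closes[i+1]; the whole account rides each trade."""
    equity = [start]
    for i, p in enumerate(predictions):
        ret = (closes[i + 1] - closes[i]) / closes[i]
        equity.append(equity[-1] * (1 + p * ret))
    return equity

closes = [100, 102, 101, 103, 104]   # hypothetical daily closes
preds = [1, -1, 1, 1]                # hypothetical daily votes
curve = equity_curve(closes, preds)
print([round(x, 4) for x in curve])  # [1.0, 1.02, 1.03, 1.0504, 1.0606]
```

In practice you’d grep the daily predictions out of Genotick’s reports and feed them through something like this alongside the close prices.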
As you can see, where owning IBM through this period would have left your account roughly where it started, the Genotick system would have been up about 18%. The bad news is that it would have been in a drawdown of over 40% along the way. It appears the software was no longer able to exploit an ‘edge’ after about the halfway point.
The funny thing about human nature is that if you don’t know why something like this works, it’s very difficult to keep trading when the results start going badly. Would Genotick have turned things around? No idea. One part of any trading plan using (a future version of) Genotick would be a method for checking whether your trading system is still working.
As a simple measure of ‘system health’, I tried computing a 10-day moving average of the system’s ‘hit rate’. It varies quite a bit, but there’s a definite downward trend in the success rate over the run.
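The health metric itself is just a rolling mean over a 0/1 series of daily hits. A minimal sketch, with a made-up hit series for illustration:

```python
def moving_average(xs, window=10):
    """Simple trailing moving average; first value appears once a
    full window of data exists."""
    return [sum(xs[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(xs))]

# 1 = the daily vote matched the next day's move, 0 = it didn't
hits = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]  # hypothetical
ma = moving_average(hits)
print(ma)  # [0.7, 0.7, 0.6, 0.6, 0.6, 0.5]
```

A sustained slide in this number is the kind of signal that would tell you the system may have stopped working, per the previous paragraph.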
The next run could have a completely different equity curve, and a correspondingly different hit-rate profile. As I move forward, I’ll be comparing multiple runs and incorporating more data besides OHLCV data for the ticker. But it’s a fun first step.