I have been playing around with Genotick some more, the open-source genetic learning trading software by Lukasz Wojtow. One thing that has been puzzling me is that the software seems to do well on certain types of data, but not others. And I’m having trouble identifying what sort of data it’s “good at”.

At first I thought Genotick might have a ‘long’ bias. It does smashingly well on the S&P 500, for example. The equity curve looks like a leveraged version of the S&P, but in fact it varies at the micro level.

However it turns out not to have a long bias. To explain:

One method of removing bad ‘programs’ (the many small sets of instructions that make predictions and are promoted or removed based on performance) is to require that they be symmetrical: a program should give the opposite prediction when the data is inverted; otherwise it’s non-functional. With this feature turned on, Genotick requires an inverted copy of the data alongside the real data for comparison.
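As a rough illustration of what an inverted data set might look like, here is one simple way to mirror a price series around its starting value. (This mirroring rule is my own assumption for illustration; Genotick’s actual reversed-data construction may differ.)

```python
def invert_series(prices):
    """Reflect a price series around its first value, so every up-move
    becomes an equal down-move.  (Assumption: a simple mirror like this;
    Genotick's own reversed-data format may be built differently.)"""
    p0 = prices[0]
    return [2 * p0 - p for p in prices]
```

A symmetric program should flip its prediction when fed the mirrored series; one that predicts the same direction on both is the kind of non-functional program this check is meant to weed out.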

However, if you swap the labels on the two data sets and tell Genotick that the inverted data is the real data, it still works properly. Therefore it doesn’t have a long bias.

Then I wondered if perhaps it had a ‘trend’ bias and didn’t respond well to mean reversion. Lukasz put together two synthetic data sets, one with a trending bias and the other with a mean-reverting bias (data and discussion here). Genotick did a little better with the trending data, but both yielded profits. Hmm, strange. Below are my runs using Lukasz’s synthetic data sets.

I then asked myself: what’s the ultimate in mean reversion? A cyclical waveform such as a sine wave! What if I created a data set that was a sine wave, dirtied it up a little with some randomness, and tested Genotick on that? Surely Genotick would sort that out, right?
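A minimal generator along these lines produces the kind of data I mean. (All parameter values here are illustrative, not the ones actually used for the data in the post.)

```python
import math
import random

def dirty_sine(n_bars=2000, period=250, amplitude=10.0,
               base=100.0, noise_sd=1.0, seed=1):
    """Sine-wave 'price' series with Gaussian noise added on top.
    Parameters (period, amplitude, noise level) are illustrative
    guesses, not the actual settings used for the post's data."""
    rng = random.Random(seed)
    return [base + amplitude * math.sin(2 * math.pi * t / period)
            + rng.gauss(0, noise_sd)
            for t in range(n_bars)]
```

With `noise_sd=0` this reduces to a clean sine wave, which is the comparison data set discussed below.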

And it does. Sort of.

Here you can see seven different runs, superimposed with the ‘dirty sine wave’ data in gray. The left vertical axis is the gain/loss percentage, and the right vertical axis is the synthetic data’s price. One run used the default settings, and the other six used randomized settings.

What I find odd is that, of the five runs that became obviously profitable, all took roughly half the data (1,000 bars/days) to get going. And all but the dark-pink one peaked about two-thirds of the way through, followed by a drawdown. Why would that be? The data is essentially the same thing repeated, with only slight random variations. With randomized parameter selection and randomly generated programs, this repeated behavior seems improbable.

I looked at a clean sine wave and got these results from two runs. The returns are so good I had to set the graph to log scale (and Google Sheets is a little buggy in log-scale mode, hence the weird spots at the beginning).

Genotick appears to have trouble with the noise in the dirty sine wave data, even though we humans could look at the graph and predict what would happen next. That gives me some ideas to check further.

If you’d like to take a look at the synthetic source data and the Genotick returns, you can view them here as a Google Sheet.

Without delving too deeply into the data or the statistical learning method used by Genotick, what you’ve found looks to me like a classic case of fitting a model to the noise rather than the underlying signal.

I’d be curious to know how much noise you added relative to the amplitude of the sine wave. Put another way, what sort of signal-to-noise ratio does your data possess? And more interestingly, is there a threshold for this ratio at which the algorithm starts to perform well?
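For a pure sine wave of amplitude A plus Gaussian noise of standard deviation σ, the power signal-to-noise ratio has a closed form, (A²/2)/σ², so the ratio for any candidate data set is easy to compute:

```python
import math

def sine_snr(amplitude, noise_sd):
    """Power signal-to-noise ratio for a sine wave buried in Gaussian
    noise: mean signal power A**2 / 2 over noise power noise_sd**2."""
    return (amplitude ** 2 / 2) / noise_sd ** 2

def sine_snr_db(amplitude, noise_sd):
    """The same ratio expressed in decibels."""
    return 10 * math.log10(sine_snr(amplitude, noise_sd))
```

Sweeping the noise level while holding the amplitude fixed would give the threshold curve I’m asking about.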

My own research on machine learning prediction in finance leads me to believe that the success of that approach hinges largely on the data used to construct the model. A significant amount of feature engineering is required. I’ve concluded that it’s less about the type of machine learning algorithm deployed (although this does matter too) and more about the processed data that is passed to that algorithm. Of course, in the presence of noise, machine learning algorithms tend to be almost too powerful, with a tendency towards modelling the noise component. However, I think that this can be accounted for with appropriate feature engineering. I’m curious to know if you have explored any feature engineering in your investigations into Genotick?
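To give one concrete example of the kind of preprocessing I mean, a rolling z-score strips the level and scale out of a price series before it reaches the learner. (The window length here is arbitrary, chosen only for illustration.)

```python
import statistics

def rolling_zscore(prices, window=20):
    """Normalise each price against the mean and standard deviation of
    the preceding `window` bars -- a common feature-engineering step
    that removes level and scale so the learner sees relative moves.
    The window length is arbitrary, for illustration only."""
    out = []
    for i in range(window, len(prices)):
        past = prices[i - window:i]
        mu = statistics.fmean(past)
        sd = statistics.pstdev(past)
        out.append((prices[i] - mu) / sd if sd > 0 else 0.0)
    return out
```

Features like this tend to make it harder for an algorithm to latch onto absolute price levels, which are one common source of noise-fitting.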

Thanks for sharing your findings.

I’m not sure how much noise I was adding, because I went through a few iterations coming up with ‘reasonable’ noise. I should have saved the intermediate files. 🙂

That’s a good idea. I should look at different levels of noise and see where it breaks down.
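Concretely, I could regenerate the data at several noise levels while holding the underlying wave fixed, something like the sketch below, and feed each set to Genotick separately. (All parameters are hypothetical, not the originals.)

```python
import math
import random

def noise_sweep(noise_levels=(0.0, 0.5, 1.0, 2.0, 4.0),
                n_bars=2000, period=250, amplitude=10.0, base=100.0):
    """Build one noisy-sine data set per noise level, all sharing the
    same underlying sine wave, so only the noise scale varies between
    runs.  (Parameter values are illustrative, not the post's.)"""
    datasets = {}
    for sd in noise_levels:
        rng = random.Random(0)  # same seed each time: same noise shape
        datasets[sd] = [base + amplitude * math.sin(2 * math.pi * t / period)
                        + rng.gauss(0, sd)
                        for t in range(n_bars)]
    return datasets
```

Plotting Genotick’s final return against the noise level should then show roughly where it breaks down.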

That makes sense. The uncertainty is in how long the algorithm takes to converge to a new market mode. The edge is in the selection of the initial state of the active systems in Genotick; you can see that in the dirty-sine case (a long time to converge). This is also in line with the thesis that the market does not follow a system over a long time. Sooner or later any “fixed” trading system stops working, so a trading system needs to adapt.

Thanks for your comment!

“The edge is in the selection of the initial state of the active systems in Genotick; you can see that in the dirty-sine case (a long time to converge). This is also in line with the thesis that the market does not follow a system over a long time. Sooner or later any “fixed” trading system stops working, so a trading system needs to adapt.”

I think in his paper the author shows that the initial state, or at least the input parameters, cease to be relevant, and that from halfway through a test the results are very similar regardless of the initial state?

I’m working on the assumption that Genotick is NOT a “fixed” trading system but an adaptive one. Is that assumption correct?