Genotick and the Dirty Sine (Machine Learning)

Screen Shot 2016-01-05 at 1.22.02 PM

I have been playing around with Genotick some more, the open-source genetic learning trading software by Lukasz Wojtow. One thing that has been puzzling me is that the software seems to do well on certain types of data, but not others. And I’m having trouble identifying what sort of data it’s “good at”.

At first I thought Genotick might have a ‘long’ bias. It does smashingly well on the S&P 500 for example. The curve looks like a leveraged version of the S&P, but in fact varies at the micro-level.

Screen Shot 2015-12-10 at 5.15.48 PM

However it turns out not to have a long bias. To explain:

One method of removing bad ‘programs’ (the many small sets of instructions that make predictions and are either promoted or removed based on performance) is to require they be symmetrical. The program should give the opposite prediction if the data is inverted, otherwise it’s nonfunctional. With this feature turned on, Genotick requires an inverted set of data alongside the real data, to compare.

However, if you swap the names on the two data sets and tell Genotick that the inverted data is the real data, it still works properly. So it doesn’t have a long bias.
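To make the symmetry idea concrete, here is a minimal Python sketch. It is not Genotick's code, and I'm not claiming this is how Genotick builds its inverted data: the reciprocal-based `invert_prices`, the toy predictors, and the `is_symmetrical` check are all assumptions made up for illustration.

```python
def invert_prices(prices):
    """Mirror a price series (assumed method): take reciprocals and rescale
    so the mirrored series starts at the same price. Every up move in the
    original becomes a down move of the same log size, and vice versa."""
    start = prices[0]
    return [start * start / p for p in prices]


def momentum_signal(prices):
    """Toy predictor: +1 if the last bar rose, -1 if it fell, 0 if flat."""
    change = prices[-1] - prices[-2]
    return (change > 0) - (change < 0)


def always_long(prices):
    """Toy predictor with a built-in long bias: always says 'up'."""
    return 1


def is_symmetrical(predict, prices):
    """A symmetrical predictor gives the opposite call on mirrored data."""
    return predict(invert_prices(prices)) == -predict(prices)


sample = [100.0, 102.0, 101.0, 105.0]  # made-up prices
print(is_symmetrical(momentum_signal, sample))  # True
print(is_symmetrical(always_long, sample))      # False
```

A program that always says "up" fails the check, which is the kind of one-sided behavior a symmetry requirement is meant to weed out.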

Then I wondered if perhaps it had a ‘trend’ bias and didn’t respond well to mean reversion. Lukasz put together some synthetic data, one that had a trending bias and the other that had a mean-reverting bias (data and discussion here). Genotick did a little better with the trending data, but both yielded profits. Hmm, strange. Below are my runs using Lukasz’ synthetic data sets.

Screen Shot 2016-01-05 at 1.37.07 PM
Results using synthetic data with trending tendency.
Screen Shot 2016-01-05 at 1.25.07 PM
Results using synthetic data with mean-reversion tendency.

 

I then asked myself: what’s the ultimate in mean reversion? A cyclical waveform such as a sine wave! What if I created a data set that was a sine wave, dirtied it up a little with some randomness, and tested Genotick on that? Surely Genotick would sort that out, right?

And it does. Sort of.
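If you want to build a similar series yourself, something along these lines would do it. This is a rough sketch, not the exact recipe behind my data: the bar count, period, amplitude, noise level and seed are all placeholder assumptions.

```python
import math
import random


def dirty_sine(n_bars=2000, period=100, base=100.0, amplitude=10.0,
               noise_sd=1.0, seed=42):
    """Return synthetic 'prices': a sine wave around `base` plus Gaussian noise.

    All defaults are illustrative guesses. Set noise_sd=0.0 for a clean sine.
    """
    rng = random.Random(seed)  # seeded so the series is reproducible
    prices = []
    for i in range(n_bars):
        clean = base + amplitude * math.sin(2 * math.pi * i / period)
        prices.append(clean + rng.gauss(0.0, noise_sd))
    return prices


prices = dirty_sine()
print(len(prices), round(min(prices), 2), round(max(prices), 2))
```

Keeping the base well above the amplitude keeps every price positive, which matters if the series is later converted to percentage changes.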

Screen Shot 2016-01-05 at 1.10.39 PM

Here you can see seven different runs, superimposed with the ‘dirty sine wave’ data in gray. The left vertical axis is the gain/loss percentage, and the right vertical axis is the synthetic data’s price. One run used the default settings, and the other six used randomized settings.

What I find odd is that, of the five runs that became obviously profitable, all of them took roughly half the data (1000 bars/days) to get going. And all but the dark-pink one reached their maximum about two-thirds of the way through, followed by a drawdown. Why would that be? The data is essentially the same thing repeated, but with slight random variations. With random parameter selection and random programs being generated, this repeated behavior seems improbable.

I then looked at a clean sine wave and got these results from two runs. The returns are so good I had to set the graph to log scale (and Google Sheets is a little buggy in log-scale mode, hence the weird spots at the beginning).

Screen Shot 2016-01-05 at 2.41.50 PM

Genotick appears to have trouble with the noise in the dirty sine wave data, even though we humans could look at the graph and predict what would happen next. That gives me some ideas to check further.

If you’d like to take a look at the synthetic source data and the Genotick returns, you can view them here as a Google Sheet.

 

Divide By 20: Beating The Market By Chopping It Up

Apparently it’s easy to beat the market. Randomly-selected stocks can beat the market, as recently discussed on the Predictive Alpha website. You can even beat the market just by choosing stocks whose names begin with the letters that make up your own name (as discussed in an amusing blog post by Andreas Clenow here). If it’s that easy to beat the market, why aren’t we all rich? I’m still working on that problem but I’ll let you know when I figure it out.

Meanwhile, I started musing about whether yearly returns were mean-reverting, momentum-based or something else. For example, do the worst-performing stocks of one year rocket back to success the following year? Or do the best performers of one year continue their trend to the next? Perhaps neither? Let’s find out.

I looked at all the historical constituents of the S&P 500 for each year from 2000 through 2014**. At the close of the very last trading day of the year, I calculated the gain/loss of each member of the index for that calendar year, and ranked them from worst to best. I then divided this sorted list into “vigintiles”, or 20 equal-sized sets. (I’ll have to thank Dr. Howard Bandy for introducing me to the term “vigintile”).
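The ranking step is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not the code I used: the function name, the ticker names and the returns are all made up.

```python
def split_into_vigintiles(returns_by_symbol, n_groups=20):
    """Rank symbols from worst to best return, then cut the ranked list
    into n_groups nearly equal slices (slice 0 = worst, last = best)."""
    ranked = sorted(returns_by_symbol, key=returns_by_symbol.get)
    n = len(ranked)
    groups = []
    for g in range(n_groups):
        lo = g * n // n_groups
        hi = (g + 1) * n // n_groups
        groups.append(ranked[lo:hi])
    return groups


# Made-up universe: 500 fake tickers with distinct fake yearly returns (%).
fake_returns = {f"S{i:03d}": float((i * 37) % 500 - 250) for i in range(500)}
groups = split_into_vigintiles(fake_returns)
print(len(groups), len(groups[0]))  # 20 25
```

With exactly 500 names each vigintile holds 25 stocks. If the count isn't a multiple of 20, the integer arithmetic spreads the remainder so group sizes differ by at most one.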

I then looked at how each of those stocks performed in the following year. I averaged all those forward-year returns per vigintile, and here are the results:

 

Screen Shot 2016-01-03 at 4.01.02 PM

The worst vigintile, i.e. the 25 stocks that performed the worst each year, shows a really good forward-year average return. But wait…

I then calculated the median forward-year return for each vigintile, for each year, and then averaged those median returns.

Screen Shot 2016-01-03 at 4.01.12 PM

Hmm, this is telling a different story. The worst vigintile isn’t showing the same bump when we look at the average of the medians. That means there are some outsized returns in the data that are skewing the average, and they may not be repeated in the future (or at least, not repeated when you were hoping they’d be). The middle of the range looks more enticing.
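Here is a tiny made-up example of how a single outsized winner can pull the average up while the median barely notices. The numbers are invented purely for illustration and have nothing to do with the actual S&P data.

```python
from statistics import mean, median

# Invented forward-year returns (%) for one vigintile in two different years.
forward_returns = {
    "year A": [-40.0, -25.0, -10.0, 5.0, 300.0],  # one huge outlier
    "year B": [-30.0, -5.0, 0.0, 10.0, 15.0],
}

# Average of the yearly means vs. average of the yearly medians.
avg_of_means = mean([mean(r) for r in forward_returns.values()])
avg_of_medians = mean([median(r) for r in forward_returns.values()])

print(avg_of_means)    # 22.0
print(avg_of_medians)  # -5.0
```

The one 300% winner turns a mostly losing group into a healthy-looking average, while the average of the medians stays negative.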

Note that the top vigintile (i.e. best 25 stocks in a year) shows mediocre performance in both charts. Much like the Sports Illustrated Cover Jinx, huge success may be mean-reverting.

Below you can see the maximum and minimum forward gains and losses for each vigintile. The worst stocks exhibit both extremely positive and extremely negative returns. In other words, they’re more volatile.

Screen Shot 2016-01-03 at 4.01.21 PM

Now let’s take a look at selected vigintiles and see how they performed over the period in question.

Screen Shot 2016-01-03 at 4.00.50 PM

Above are the yearly forward returns for vigintiles 1, 11, and 20 (worst, middle and best). This confirms our suspicions (I’m assuming you’re suspicious too, and not just following along numbly). The worst vigintile (blue) has some really awesome subsequent years, but also some pretty bad years. The worst and the best (orange) vigintiles also have more loss-making years than does the middle (red). Let’s investigate that further.

What if we were to trade our favorite vigintile over the years? At the end of each year we buy the 25 stocks that were ranked together in a performance vigintile, hold them until the end of the following year, then sell and buy the new set of stocks in our vigintile. Rinse and repeat, compounding gains and losses as we go.
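The compounding itself is just repeated multiplication. Here is a minimal sketch, assuming you already have one average return per year for your chosen vigintile. The function name and the sample returns are made up, and commissions and dividends are ignored.

```python
def compound(start_value, yearly_returns_pct):
    """Grow start_value through a sequence of yearly percentage returns,
    returning the account value at the start and after each year."""
    value = start_value
    history = [value]
    for r in yearly_returns_pct:
        value *= 1 + r / 100.0
        history.append(value)
    return history


# Made-up example: a 10% gain followed by a 10% loss.
history = compound(10_000, [10.0, -10.0])
print([round(v, 2) for v in history])  # [10000, 11000.0, 9900.0]
```

Note that a 10% gain followed by a 10% loss leaves you below where you started. That asymmetry is one reason a volatile vigintile can compound worse than a steadier one with a similar average yearly return.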

Below is a graph showing the returns of each vigintile, if I’d started with $10,000. No commissions (and no ordinary dividends) are taken into account.

Screen Shot 2016-01-03 at 4.00.43 PM

Before you get too excited about that purple line, it’s probably a bit of an aberration. That’s vigintile #11. I don’t know why it’s so much higher than the rest, so assume those results won’t continue. Also, it didn’t start outperforming until 2010. Nonetheless, there’s definitely a spread, and some vigintiles perform better than others.

Screen Shot 2016-01-03 at 4.00.17 PM

And here finally (above) is a graph of the total compound return of each vigintile. Many of them beat the S&P 500. And the best ones, perhaps counter-intuitively, are in the middle. The boring mediocre performers show less volatility, less drawdown and consistent gains compared to either the worst or best performers.

Just for chuckles, here are the 26 stocks that fit into the 11th vigintile for 2015. I’ll check back in a year (maybe) and see how they performed.

Update 12/31/16: I did follow up on this, which you can read here.

symbol 2015 G/L %
KLAC -3.71
BBT -3.32
AFL -3.29
POM -3.27
SYK -3.19
NEM -3.12
VZ -3.04
GD -2.95
CMCSA -2.82
TGT -2.73
ADI -2.45
FITB -2.38
PCG -2.33
TXT -2.3
OMC -2.21
LRCX -2.11
SHW -1.87
JNJ -1.79
WFC -1.77
ABT -1.73
BXP -1.46
SCG -1.35
HRB -1.22
CBG -1
LEG -0.9
WU -0.89

** By looking at historical constituents of the S&P 500 index, I avoid survivorship bias. There is one detail about survivorship I’m unable to eliminate, though: if a stock was a member of the S&P 500 and then went bankrupt the following year, I don’t necessarily have a 100% loss recorded. I only have prices until the end of the data stream. So if a stock was acquired or delisted, the return is calculated to the last day of trading for that stock, rather than the last trading day of the year. In the event of bankruptcy, I don’t have “zero” as the final trade price. I don’t know how often this happens, or how much it would skew the data. In practice, any stock that was performing abysmally would have a very low price before being delisted, so most of the loss would be captured in this research.

 
