Genotick and UPRO

Screen Shot 2016-02-20 at 9.17.02 AM

That graph looks like a bunch of spaghetti, I realize. But I’ll explain!

I’m backtesting Genotick. It’s an open-source machine-learning program written in Java. Probably to the developer’s dismay, I’m always throwing things at it to make it break. Much like a small child throwing a temper tantrum, but with stocks.

The previous version of the software had an added ‘feature’, which allowed for the little tiny robots in Genotick’s brain to have more than one instruction per robot. (Sorry if this technical language is over your head.) The developer wanted more variations in the predictions, so that users wouldn’t all be hopping on the same trades, thus skewing the results.

However, it turns out this multiple-instruction version was actually reducing performance quite severely, and Genotick had less success finding an edge. For the latest version, that’s been removed, but other nice features have been left in.

I thought I’d give it a test on some UPRO data from 1/1/2010 through the current date. Why UPRO, which is a 3x leveraged ETF that tracks the S&P 500? Why not regular ol’ SPY? Because I had the data sitting around already for some other purpose, that’s why!

Above are the results of 16 runs, using 20,000 robots per run (the developer calls them “programs”, but that’s not as fun). This isn’t compounded, it’s just adding the percentage gain/loss of each day. You can also see the price of UPRO as a thin gray line. Yes, it’s a visual mess. And all the runs took a serious nose-dive, and only a few ended up even profitable.
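For anyone wondering what "not compounded" means in practice, here is a minimal sketch of the two ways to build a curve from daily percentage changes. The function names and the sample numbers are made up for illustration; the charts in this post use the first, additive version.

```python
from itertools import accumulate


def additive_curve(daily_pct):
    """Running sum of daily percentage changes (no compounding)."""
    return list(accumulate(daily_pct))


def compounded_curve(daily_pct):
    """Cumulative percentage gain when each day's change builds on the last."""
    curve, growth = [], 1.0
    for pct in daily_pct:
        growth *= 1.0 + pct / 100.0
        curve.append((growth - 1.0) * 100.0)
    return curve


daily_pct = [10.0, -10.0, 5.0]      # made-up daily changes, in percent
print(additive_curve(daily_pct))    # [10.0, 0.0, 5.0]
print(compounded_curve(daily_pct))  # ends a bit lower than the additive curve
```

The additive version treats every day as the same bet size, which makes runs easy to compare but is not what an account balance would actually do.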

But that’s not the point.

Genotick trains as it goes. There’s no in-sample training data set…it continually uses the past to guess the future. Each program that guesses correctly is rewarded, and lives to fight another day. Programs that guess incorrectly are weeded out. As time goes by, Genotick gets better at guessing.
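To make the "train as you go" idea concrete, here is a toy sketch of a predict, score, and weed-out loop. To be clear, this is not Genotick's actual algorithm: the "programs" below are a made-up stand-in (each one just bets that today repeats or reverses the move from a few days back), and `walk_forward`, `cull_fraction`, and the other names are my own inventions.

```python
import random


def walk_forward(returns, n_programs=200, max_lag=5, cull_fraction=0.2, seed=0):
    """Toy walk-forward loop: vote, score against the outcome, replace the worst."""
    rng = random.Random(seed)

    def new_program():
        # Hypothetical program: follow (+1) or fade (-1) the move `lag` days ago.
        return {"lag": rng.randint(1, max_lag), "sign": rng.choice((-1, 1)), "score": 0}

    programs = [new_program() for _ in range(n_programs)]
    n_cull = int(n_programs * cull_fraction)
    daily_pnl = []
    for t in range(max_lag, len(returns)):
        # 1. Every program votes using only data from before day t.
        votes = [p["sign"] * (1 if returns[t - p["lag"]] > 0 else -1) for p in programs]
        net = sum(votes)
        position = (net > 0) - (net < 0)  # +1 long, -1 short, 0 flat on a tie
        daily_pnl.append(position * returns[t])
        # 2. The outcome is revealed: reward correct programs, penalize wrong ones.
        actual = 1 if returns[t] > 0 else -1
        for p, vote in zip(programs, votes):
            p["score"] += 1 if vote == actual else -1
        # 3. Weed out the lowest scorers and replace them with fresh random programs.
        programs.sort(key=lambda p: p["score"], reverse=True)
        if n_cull:
            programs[-n_cull:] = [new_program() for _ in range(n_cull)]
    return daily_pnl


# A perfectly alternating series is trivially learnable, so the later days
# should be profitable once the wrong-way programs have been weeded out.
pnl = walk_forward([1.0, -1.0] * 50)
print(sum(pnl[-10:]))
```

The point of the sketch is only the shape of the loop: nothing is ever fit on future data, and the population at any given day reflects what has worked up to that day.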

As you can see from the chart above, Genotick figures out how to make money about halfway through. From there the equity curves are pretty consistently upward. It doesn’t seem unreasonable to judge the programs by how they perform after a while, even if they are still in drawdown from the initial stages. Or perhaps you think that *is* unreasonable…in which case, tell me why you think that in the comments section.

Below we look at the second half of the data. I’ve zeroed the Genotick runs, and the actual price of UPRO is again in the background in black. These runs are all very consistent with each other.

Screen Shot 2016-02-20 at 9.45.07 AM
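"Zeroing" a run here just means restarting the curve at zero from the midpoint. A minimal sketch, assuming the additive curves described earlier (the helper name and the sample values are hypothetical):

```python
def rebase_from(curve, start):
    """Return the tail of an additive curve, shifted so that it starts at zero."""
    base = curve[start]
    return [value - base for value in curve[start:]]


curve = [0.0, -5.0, -12.0, -8.0, -3.0, 4.0]  # made-up cumulative % values
second_half = rebase_from(curve, len(curve) // 2)
print(second_half)  # [0.0, 5.0, 12.0]
```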

If we compare the daily percentage gains and losses of UPRO vs the best Genotick run, this is what we see:

Screen Shot 2016-02-20 at 9.59.45 AM

Blue is UPRO returns, red is the Genotick best run. The Genotick run is a straighter equity curve. You can see UPRO starting to tip over, whereas Genotick gets there more efficiently. Yes, UPRO was more profitable at one point, but a) they both end up about the same place, and b) there’s more to the story.

Now let’s look at drawdowns for both curves:

Screen Shot 2016-02-20 at 9.17.50 AM

Orange is the new black, or in this case, the best Genotick run. Gray is UPRO. You can get a sense just by looking at the drawdown chart that UPRO spends more time in drawdown. This is backed up by the percentage of time each curve is making new highs:

Screen Shot 2016-02-20 at 9.18.19 AM

The maximum drawdowns for each are about the same:

Screen Shot 2016-02-20 at 9.58.58 AM
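The drawdown numbers above can be computed along these lines. This is a sketch on an additive curve, so the drawdowns come out in percentage points; counting a day as "at a new high" whenever the curve matches its running peak is my assumption about the definition.

```python
def drawdowns(curve):
    """Distance below the running peak at each point (0.0 means at a high)."""
    peak, out = float("-inf"), []
    for value in curve:
        peak = max(peak, value)
        out.append(value - peak)
    return out


def max_drawdown(curve):
    """Deepest drop below a prior peak (a negative number, or 0.0)."""
    return min(drawdowns(curve))


def pct_time_at_new_highs(curve):
    """Percentage of days on which the curve sits at its running peak."""
    dd = drawdowns(curve)
    return 100.0 * sum(1 for d in dd if d == 0.0) / len(dd)


curve = [1.0, 3.0, 2.0, 4.0, 1.0, 5.0]  # made-up cumulative % values
print(drawdowns(curve))      # [0.0, 0.0, -1.0, 0.0, -3.0, 0.0]
print(max_drawdown(curve))   # -3.0
print(pct_time_at_new_highs(curve))
```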

One last detail: Genotick has a setting where it will stay out of the market if the robots are not sufficiently convinced they have a consensus. For this set of runs, I had a 10:1 ratio set. This means the UPs must outweigh the DOWNs by a ratio of 10:1 to offer a prediction, otherwise the system remains flat.
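Here is a rough sketch of what a consensus threshold like that does. It is not Genotick's code: the function name is invented, and I'm assuming the rule is symmetric (DOWNs outweighing UPs by the same ratio gives a short signal) and that exactly hitting the ratio counts.

```python
def consensus_signal(up_weight, down_weight, ratio=10.0):
    """Return +1 (predict up), -1 (predict down), or 0 (no consensus, stay flat)."""
    if up_weight > 0 and up_weight >= ratio * down_weight:
        return 1
    # Assumption: the same threshold applies in the other direction.
    if down_weight > 0 and down_weight >= ratio * up_weight:
        return -1
    return 0


print(consensus_signal(100, 10))  # 1
print(consensus_signal(99, 10))   # 0
print(consensus_signal(5, 60))    # -1
```

The higher the ratio, the more often the system sits out, which is where the time-in-market figure comes from.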

Genotick was only in the market 56% of the time, whereas UPRO buy-and-hold was in 100% of the time. Since both ended up in about the same place, this means that the Genotick run averaged a +0.2% gain per day in the market, whereas buy-and-hold only averaged +0.12%.

Screen Shot 2016-02-20 at 9.59.09 AM
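As a sanity check on those averages: if two curves end at the same total and one is only in the market 56% of the time, its average per in-market day has to be higher by a factor of 1/0.56. A quick sketch, where the 1,000-day count is a made-up placeholder (it cancels out) and "same total" is only approximately true per the charts:

```python
def avg_gain_per_day_in_market(total_gain_pct, total_days, exposure):
    """Average additive gain per day, counting only days with a position on."""
    return total_gain_pct / (total_days * exposure)


total_days = 1000                 # hypothetical number of trading days
total_gain = 0.12 * total_days    # buy-and-hold at +0.12% per day

buy_and_hold = avg_gain_per_day_in_market(total_gain, total_days, 1.0)
genotick = avg_gain_per_day_in_market(total_gain, total_days, 0.56)
print(round(buy_and_hold, 2), round(genotick, 2))  # 0.12 0.21
```

That lands close to the +0.2% reported for the best run.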

If you look at all the times Genotick was right, divided by the total number of times Genotick made a prediction (i.e. entered the market), the hit rate is 57.37%.

Not bad for a robot.
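For reference, the hit rate calculation looks something like this. It's a sketch with made-up data; treating a prediction of 0 as "stayed flat" and a day with no price change as a miss are my assumptions.

```python
def hit_rate(predictions, actual_moves):
    """Percentage of non-flat predictions whose direction matched the actual move."""
    calls = [(p, a) for p, a in zip(predictions, actual_moves) if p != 0]
    if not calls:
        return float("nan")  # never entered the market
    hits = sum(1 for p, a in calls if p * a > 0)
    return 100.0 * hits / len(calls)


predictions = [1, 0, -1, 1, 0, 1]            # +1 up, -1 down, 0 flat
actual = [0.8, -1.2, -0.5, -0.3, 2.0, 1.1]   # made-up daily % moves
print(hit_rate(predictions, actual))  # 75.0
```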

The usual caveats apply: this in no way confirms that Genotick is predicting the future. Nor are pesky details like commissions accounted for, which certainly would add up if you were trading daily, or even every other day (which is what this run would have had you doing). You’d need a really high position/commission ratio for this to be practical.

8 thoughts on “Genotick and UPRO”

  1. I think it is definitely valid to look at the beginning of the equity curve. At that point in time, that’s the only data you have, and if you were taking those signals it would be pretty hard to stick with a “black box”’s predictions as your system lost money. I’m not too familiar with the software, but in backtesting in general it’s easy to convince yourself you would’ve traded through an initial drawdown because you knew good times were ahead. Also, in a situation where you don’t explicitly know how your signals are being generated, the performance of the system is really the only thing telling you it may be broken.
    Just my $0.02, keep up the good work! I read everything you do and there’s always great new info on here

  2. Hi Brian,
    I’m the guy who created this software, so I’m probably biased, but I need to give some explanation: Genotick has no built-in rules for trading. It learns by building trading systems, combining really basic instructions to manipulate data. Therefore, it’s reasonable to assume that at the beginning it doesn’t have a clue. In fact, even when logging profit at the end, the second half of the equity curve is summed up separately (on GitHub now, will be in a new version soon).
    Lukasz

    1. Allowing the model to run and fail for a while is so widespread in various statistical approaches that it has a name, “burn-in”. In Bayesian inference (Markov chain Monte Carlo, for example), it’s a necessity. Also, with Kalman filters and particle filters, a certain percentage of initial outputs are ignored. Neural nets will train for multiple epochs. The list goes on and on.

  3. It would be interesting to see the distribution of robot lifetimes. A long mean lifetime would let you ride the robots longer to catch the profit.

  4. Is the software akin to an expanding-window walk-forward analysis? i.e. every day’s prediction is based on all past data points? Would it be fair to say that the program needs at least as much data as was given before turning profitable in order to “calibrate”/eliminate bad models? My comment wasn’t meant as a critique of the software, just responding to whether or not it was reasonable to discount a drawdown at the beginning of a backtest.

    1. Hey Brian!

      The software works more like a rolling window than an expanding window, because there is a parameter to limit how far back it can look when creating programs for prediction. A normal backtest looks at all the data and optimizes based on the entire in-sample set. This works more like a walk-forward test (I think).
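To illustrate the rolling versus expanding distinction from this thread, here is a small sketch. The function names and the `lookback` parameter are hypothetical stand-ins, not Genotick's actual setting names.

```python
def expanding_window(history, t):
    """Everything before day t: the training set grows as time goes on."""
    return history[:t]


def rolling_window(history, t, lookback):
    """Only the last `lookback` days before day t: older data drops out."""
    return history[max(0, t - lookback):t]


history = list(range(10))              # stand-in for ten days of data
print(expanding_window(history, 7))    # [0, 1, 2, 3, 4, 5, 6]
print(rolling_window(history, 7, 3))   # [4, 5, 6]
```

Either way, the prediction for day t only ever sees data from before day t, which is what makes it a walk-forward test.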
