Data Mining vs Out of Sample Data

In my last post, I data-mined the hell out of the S&P 500 index (well, OK, SPY) and found an “anomaly”. Every time SPY closes more than 1% below the previous day’s close (call that Day 0), you wait. You then buy at the close 13 days later and sell at the close of Day 14. That trade showed a significantly better return than doing the same thing but owning all the Day 16s instead (buying at the close of Day 15 and selling at the close of Day 16). Here’s the graph from the last post.

[Graph from the last post: Day 14 vs Day 16, SPY, 2010-2015]
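For anyone who wants to poke at this themselves, here’s a minimal sketch of the rule as I’ve described it. It assumes a plain list of daily closes in date order. It also counts every trigger on its own, even when triggers overlap. The function names are made up for illustration; this isn’t the spreadsheet I actually used.

```python
from statistics import mean


def day_n_returns(closes, n, threshold=0.01):
    """Close-to-close return of Day n after every trigger day.

    Day 0 is any day whose close is more than `threshold` (as a fraction)
    below the previous close. The Day-n trade buys at the close of Day n-1
    and sells at the close of Day n. Overlapping triggers are each counted.
    """
    returns = []
    # stop early enough that Day n still exists in the series
    for i in range(1, len(closes) - n):
        if closes[i] / closes[i - 1] - 1.0 < -threshold:
            buy, sell = closes[i + n - 1], closes[i + n]
            returns.append(sell / buy - 1.0)
    return returns


def average_by_day(closes, days=(14, 16), threshold=0.01):
    """Mean Day-n return and sample count for each n in `days`."""
    summary = {}
    for n in days:
        r = day_n_returns(closes, n, threshold)
        summary[n] = (mean(r) if r else float("nan"), len(r))
    return summary
```

Assume a list of SPY closes loaded from wherever you keep your data (a hypothetical `spy_closes`, not shown). Then `average_by_day(spy_closes)` gives the Day 14 and Day 16 averages side by side, along with how many triggers fed each one.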

But with only 177 samples between 2010 and 2015, that’s probably just a fluke… right?
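One way to put a number on “probably just a fluke” is a paired sign-flip permutation test. It works on the per-trigger differences between the Day 14 and Day 16 returns. This technique is my addition here, not what the original graph shows. It assumes each Day 14 return is paired with the Day 16 return from the same Day 0. It also assumes the events are independent, which overlapping triggers only roughly satisfy, so treat the p-value as optimistic.

```python
import random


def paired_sign_flip_pvalue(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for 'mean of a - b is zero'.

    Assumes a[i] and b[i] come from the same trigger (e.g. the Day 14 and
    Day 16 returns after one Day 0), so under the null hypothesis each
    difference is equally likely to carry either sign.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty samples")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)  # fixed seed so reruns give the same p-value
    hits = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed arrangement itself
    return (hits + 1) / (n_perm + 1)
```

You would call it as `paired_sign_flip_pvalue(day14, day16)`, where `day14` and `day16` are hypothetical lists of per-trigger returns, one pair per Day 0. A small p-value says the gap is unlikely under pure sign noise. It says nothing about how many other days were tried before Day 14 popped out.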

So, just for chuckles and grins, I checked an out-of-sample (OOS) data set: 2000-2009. Admittedly this is an *earlier* data set, which some people frown upon as being Wrong and Evil. But hey, I’m not trying to create a trading system, y’all! I’m just mining for data. So here’s what Day 14 vs Day 16 look like in a much larger OOS data set. I added the dates on the horizontal axis for this graph.

[Graph: Day 14 vs Day 16, SPY, 2000-2009 out-of-sample, with dates on the horizontal axis]
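The out-of-sample check amounts to running the exact same rule, untouched, on a different date range. Below is a self-contained sketch of that under a few assumptions. The input is a list of `(date, close)` pairs. Each window is sliced out and tested on its own. Triggers whose Day n would land past a window’s end are simply dropped. The window labels and function names are hypothetical.

```python
from datetime import date
from statistics import mean


def day_n_returns(closes, n, threshold=0.01):
    """Day-n close-to-close returns after each >threshold down close (Day 0)."""
    out = []
    for i in range(1, len(closes) - n):
        if closes[i] / closes[i - 1] - 1.0 < -threshold:
            out.append(closes[i + n] / closes[i + n - 1] - 1.0)
    return out


def compare_windows(dated_closes, windows, days=(14, 16), threshold=0.01):
    """Run the same Day-n rule separately inside each date window.

    dated_closes: iterable of (datetime.date, close) pairs.
    windows: {label: (start_date, end_date)}, both ends inclusive.
    Triggers whose Day n would fall past a window's end are dropped.
    """
    ordered = sorted(dated_closes)
    results = {}
    for label, (start, end) in windows.items():
        closes = [c for d, c in ordered if start <= d <= end]
        results[label] = {}
        for n in days:
            r = day_n_returns(closes, n, threshold)
            results[label][n] = (mean(r) if r else float("nan"), len(r))
    return results


# Hypothetical windows matching the two periods in these posts
WINDOWS = {
    "in_sample": (date(2010, 1, 1), date(2015, 12, 31)),
    "out_of_sample": (date(2000, 1, 1), date(2009, 12, 31)),
}
```

Keeping the rule frozen while only the window changes is the whole point. If you tweak the threshold or the day after peeking at 2000-2009, it stops being out-of-sample.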

Hmm. Those statistical anomalies can be weirdly persistent.


2 thoughts on “Data Mining vs Out of Sample Data”

  1. More fun! OK, so you’ve looked at sensitivity in the time domain a bit, and found that, weirdly, Day 14 keeps its anomaly. I think the next thing I’d do is see what it looks like in another dimension: how sensitive is the result to using a >1.0% down day as the trigger, as opposed to >0.5% or >X%? Or break the down days into zones, like 0.5-0.8% down days.

    Truth is, what it reminds me of is those old (actually, maybe still extant) data-mining bonanzas: the late-night infomercials of the ’80s and ’90s selling futures trading secrets to truck drivers and housewives. They loaded up every possible commodity futures price history and produced gobs and gobs of gems like “Orange juice futures have risen 19/21 years for the five-day period starting April 4th; that’s more than 90% profitable!” With lots of trading days, lots of periodicities, and lots of commodities, the hucksters could find lots of correlations to entice innumerate late-night TV denizens to buy into.
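The commenter’s zone idea could look something like this sketch. Each Day 0 is bucketed by the size of its drop, and the Day-n returns are averaged per bucket. The zone edges are fractions, with the lower edge inclusive and the upper edge exclusive. Those edges and the function name are illustrative assumptions, not a tested setup.

```python
from statistics import mean


def zone_returns(closes, n, zones):
    """Day-n returns grouped by the size of the Day-0 drop.

    zones: list of (lo, hi) drop sizes as fractions, e.g. (0.005, 0.008)
    means a close 0.5% to 0.8% below the prior close (hi exclusive).
    Returns {zone: (mean Day-n return, sample count)}.
    """
    grouped = {z: [] for z in zones}
    for i in range(1, len(closes) - n):
        drop = 1.0 - closes[i] / closes[i - 1]  # positive on a down day
        for lo, hi in zones:
            if lo <= drop < hi:
                grouped[(lo, hi)].append(closes[i + n] / closes[i + n - 1] - 1.0)
    return {z: (mean(r) if r else float("nan"), len(r)) for z, r in grouped.items()}


# Illustrative zones: 0.5-0.8%, 0.8-1%, 1-2%, and anything worse than 2%
ZONES = [(0.005, 0.008), (0.008, 0.01), (0.01, 0.02), (0.02, 1.0)]
```

Be warned that every extra zone is one more chance to find something that looks good by accident.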
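The arithmetic behind the commenter’s point goes like this. Assume each year is a fair coin flip. Then a single pre-chosen window rising in at least 19 of 21 years is rare. Scan enough windows, though, and a few such “gems” show up anyway. The scan size below is invented purely for illustration. Treating the tests as independent is a simplification, because overlapping windows are correlated.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more 'up' years in n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# One pre-chosen window: rising in at least 19 of 21 years by coin flip
p_hit = binomial_tail(19, 21)  # 232 / 2**21, about 0.00011

# Hypothetical scan size: 250 start days x 10 window lengths x 40 commodities
n_tests = 250 * 10 * 40
# Rough count of "19/21" gems expected from pure chance, treating the
# tests as independent (overlapping windows are not, so this is only a guide)
expected_false_hits = n_tests * p_hit  # about 11
```

So roughly a one-in-ten-thousand result, searched for 100,000 times, turns up about a dozen times without any real effect behind it.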
