Autocorrelation of SPY, and the Redneck Correlogram

I’ve been reading books by Michael Halls-Moore and my head hurts. Not having any formal training in statistics, I only understand about half of the material. Nonetheless, I found his discussion of ‘correlograms’ interesting. I even installed R on my computer (even though I haven’t fully grasped Python yet!) and was able to make some correlograms with R. However, not knowing anything about R (sensing a theme here?), I thought I’d come up with my own version of a correlogram using AmiBroker and Google Sheets. A ‘redneck correlogram’ if you will.

So what is a correlogram, you ask? Here’s a link to a wiki page on the subject. My interpretation: it’s a tool to see if a time series of data (e.g. stock prices) is autocorrelated (i.e. whether there is some connection between the price movement on a given day and the movements on the days that follow).

For each day being examined, which I’ll call the ‘trigger day’, I first look at the directional movement from the previous day. Was it up or down? Then I look at the next 20 days, and record whether each one moved the same direction or the opposite direction as the trigger day. If it moved the same direction, that column gets a ‘1’. If not, a ‘-1’. Finally, the mean of each lag’s column is taken across all trigger days.
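The counting procedure above can be sketched in a few lines of Python. This is only a minimal sketch, not the AmiBroker and Google Sheets workflow behind the charts in this post: the function name `sign_correlogram`, its parameters, and the choice to count flat days as down days are all assumptions made for illustration.

```python
import numpy as np

def sign_correlogram(closes, max_lag=20, trigger=None):
    """Mean of (day 0 direction * day k direction) for k = 0..max_lag.

    closes  : sequence of daily closing prices
    trigger : optional boolean array, one entry per daily move, marking which
              days count as trigger days (hypothetical parameter; default = all)
    """
    closes = np.asarray(closes, dtype=float)
    # +1 when the close is above the previous close, otherwise -1.
    # Flat days are counted as down days here (an assumption).
    updown = np.where(np.diff(closes) > 0, 1, -1)

    # A trigger day needs max_lag days of data after it.
    last = len(updown) - max_lag
    if last <= 0:
        raise ValueError("not enough data for the requested max_lag")

    if trigger is None:
        mask = np.ones(last, dtype=bool)
    else:
        mask = np.asarray(trigger, dtype=bool)[:last]
    days = np.flatnonzero(mask)

    # One value per lag: the average of the +1/-1 products over all trigger days.
    return np.array([np.mean(updown[days] * updown[days + k])
                     for k in range(max_lag + 1)])
```

To look at up days only, something like `trigger=np.diff(closes) > 0` would do; a ‘big up day’ filter would compare the daily return to a cutoff instead.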

I include day 0 in there, which is perfectly correlated with itself, and so will always be at 100%. A perfect anti-correlation would show a value of -100%. No correlation at all would thus provide a value very close to 0, because all those -1s and +1s will be equally distributed and average out to 0. Basically, white noise. Some days will show spikes, but if a spike doesn’t poke over the 5% threshold, then it’s just random. Even some spikes barely above the threshold are going to be meaningless.
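One common way to arrive at a threshold like this is the usual white-noise band of about 1.96 divided by the square root of the number of samples. Whether the 5% line here was chosen that way is an assumption on my part. The sketch below (with the hypothetical name `white_noise_band`) just shows how the band depends on the sample count, assuming independent up and down moves with equal odds.

```python
import math

def white_noise_band(n_samples, z=1.96):
    """Approximate 95% band for the mean of n_samples independent +1/-1 values.

    Each +1/-1 product has variance 1 under the equal-odds assumption, so the
    mean of n_samples of them has standard deviation 1 / sqrt(n_samples).
    """
    return z / math.sqrt(n_samples)

# Six years of daily data is on the order of 1,500 trading days,
# which gives a band of roughly 0.05.
print(round(white_noise_band(1500), 3))

# A filtered chart built from only a couple hundred trigger days
# has a much wider band, roughly 0.14.
print(round(white_noise_band(200), 3))
```

The practical point is that the band is not fixed: the fewer trigger days a chart is built from, the taller a bar has to be before it means anything.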

Now throw on your data pants and let’s look at SPY from 1/1/2010 to 12/31/2015, including the 20 lagging trading days after the last trigger day.

[Chart: SPY correlogram, all trigger days]

So this is how SPY is autocorrelated. Day 0 shows 100% correlation with itself. I’ve truncated the bounds of the graph at 0.5 to show more detail in the other values. Note that in this case, day 0 (the trigger day) could be either an up or down day, and the bars are showing correlation or anti-correlation, regardless of the direction of the trigger day.

The green lines are the +/-5% mark. Day 9 is just touching that mark, but does that mean day 9 tends to go the same direction as day 0? It’s probably just a false positive. No other days are showing anything significant. Bunch of noise!

What if we filter for just up days or just down days?

[Chart: all up days]

[Chart: all down days]

The first graph is for up days only, so the trigger day of course is always positive. Look at all those correlated days down the line! But wait…the down days show significant anti-correlation across the entire lagged series too. What gives? Well, this just shows that the market had a strong tendency to be positive during 2010-2015. So in general, whether you had a down day or an up day, you were likely to have up days following it.
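A quick simulation makes the same point. If every day is an independent coin flip that comes up ‘up’ with probability p, then conditioning on up days gives an average of about 2p - 1 at every lag, and conditioning on down days gives about -(2p - 1), with no real autocorrelation involved. The sketch below assumes p = 0.55 purely for illustration (it is not SPY’s measured up-day rate), and the function name is made up.

```python
import numpy as np

def drift_only_correlogram(p_up=0.55, n_days=200_000, lag=5, seed=0):
    """Simulate independent days with an upward bias and return the
    (up-day, down-day) correlogram values at a single lag."""
    rng = np.random.default_rng(seed)
    updown = np.where(rng.random(n_days) < p_up, 1, -1)

    today, later = updown[:-lag], updown[lag:]
    # On up trigger days the product is just the later day's direction.
    up_value = np.mean(later[today == 1])
    # On down trigger days the product flips the later day's sign.
    down_value = np.mean(-later[today == -1])
    return up_value, down_value

up_value, down_value = drift_only_correlogram()
# Both should land near +(2 * 0.55 - 1) and -(2 * 0.55 - 1) respectively.
print(up_value, down_value)
```

So a wall of positive bars on the up-day chart and a wall of negative bars on the down-day chart is what plain upward drift looks like when the series is split this way.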

I however am not as interested in the day-to-day random noise of the market. I am more interested in what happens when the market goes ‘bang’. The market normally hums along as background noise until something whacks it like a child banging on pots and pans. Then prices start vibrating loudly, and the juicy trading happens.

Let’s take a look at the autocorrelation when the market closes up at least 1% over the previous day’s close:

[Chart: correlogram for days closing up at least 1%]

Compare this to the more generalized ‘all up days’ correlogram from earlier. Here, day 1 is pretty ambivalent, and slightly on the negative side. This seems logical if you’re a market watcher: after a big up day, we often have a slight down day as traders take profits. Day 2 is significant but no more so than the ‘all up days’ chart. Day 4 though, now we’re poppin’ over the 15% mark. What this means is that, more often than not, if you have a >1% day with SPY, you’ll have an up day of some sort 4 days later. At least for the time period we’re looking at.

And what’s with day 20 (a month later)? Is that a fluke?

And finally, let’s look at all the down days that were more than 1% from the previous close.

[Chart: correlogram for days closing down more than 1%]

As expected, most days are negatively correlated, since we saw this in the ‘all down days’ graph. However, we don’t have anything quite as strong as the ‘1% up’ chart in the early days following the trigger. And yet, day 14…that’s the highest value away from zero of the whole series of graphs.

Does this seriously mean that if we have a down day bigger than 1%, that we’re more likely to have an up day 14 trading days later? Can we make a trading system on that? It seems so…unlikely.

And yet…if we plot an equity graph for day 14 gain/loss vs day 16, we see this:

[Chart: equity curves for day 14 vs. day 16]

Ignoring such nuisances as commissions and fees, we get the following:

          avg gain/loss   win rate
day 14    +0.0895 %       54.80 %
day 16    -0.0301 %       48.02 %
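For anyone wanting to reproduce numbers like these outside a spreadsheet, here is a minimal sketch. It assumes ‘day k gain/loss’ means the close-to-close return on the k-th trading day after the trigger, and the names `day_k_stats` and `trigger_idx` are hypothetical.

```python
import numpy as np

def day_k_stats(closes, trigger_idx, k):
    """Average close-to-close return (%) and win rate (%) on the k-th
    trading day after each trigger day (k >= 1).

    closes      : sequence of daily closing prices
    trigger_idx : positions in `closes` of the trigger days
    """
    closes = np.asarray(closes, dtype=float)
    idx = np.asarray(trigger_idx, dtype=int)
    # Drop triggers that do not have k days of data after them.
    idx = idx[idx + k < len(closes)]

    # Return earned by holding from the close of day k-1 to the close of day k.
    rets = closes[idx + k] / closes[idx + k - 1] - 1.0
    avg_pct = rets.mean() * 100
    win_rate_pct = np.mean(rets > 0) * 100
    return avg_pct, win_rate_pct
```

Like the table, this ignores commissions and fees.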

So I have four questions for you:

  1. Is this enough data?
  2. If it’s not enough data, then is data before 2010 still relevant?
  3. Is this a pattern unique to the time period, or will it persist into the future?
  4. Am I doing something wonky with statistics that all you quant-types know is just foolish?


9 thoughts on “Autocorrelation of SPY, and the Redneck Correlogram”

    1. Thanks for your post. I read your blog!

      Just so I’m clear: do you mean that if I start offsetting the whole lag set of 20 to look for patterns, then I’m curve-fitting? Or did you mean that lags 2-20 on this correlogram are not valid?

  1. I’ll bite.

    1. Enough data? No, and yes. I posit there is never enough data, and yet we make do with what we have to get ideas to consider further, so yes.

    2. Relevant? But the more data we have of this particular sort, the less useful it becomes since the market changes over time. Try these daily autocorrelations back in the 60s and you get fabulous correlation (daily momentum rocked!); see some of John Orford’s posts from last year on this topic for ideas. Daily momentum lasted solidly into the 70s, and since then has progressively gotten more random (shifting rapidly back and forth from momentum to mean reversion). This is Michael Harris’s “momersion”.

    3. Absolutely. 🙂

    4. I’m not a quant, but I’m numerically minded. To me this is data mining, pure and simple. Consider your last chart, where day 14 provides the fun. So there are only 177 samples, but you’re separating them into 20 bins… smells to me like day 14 is not out of the ordinary range of possible outliers with such a sampling. Some day-number is going to have more correlated returns than any other, and 14 is it.

    Other smell tests (as you know from your other posts): look for plateau areas – day 14 is more of a spike than a small peak on a plateau. And does it make conceptual sense, from a market-knowledge standpoint, that day 14 will have such a correlation? Not that I can see.

    Finally, of course a “Day 14” strategy backtest is going to give you a great equity curve, just as you know before trying that a “Day 17” backtest will suck even worse than Day 16 did. That’s what the autocorrelation told you.

    I like your posts, always thoughtful. Thanks for sharing.
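The outlier argument in the comment above can be checked with a small simulation. This is a sketch under simplifying assumptions: every +1/-1 product is treated as an independent fair coin flip, which ignores market drift and any dependence between lags. The function name is made up; the 177 samples and 20 lags come from the comment.

```python
import numpy as np

def max_abs_lag_under_noise(n_samples=177, n_lags=20, n_trials=2000, seed=0):
    """For each simulated correlogram of pure noise, return the height of
    its tallest bar (largest absolute lag mean)."""
    rng = np.random.default_rng(seed)
    # n_trials correlograms, each built from n_samples rows of n_lags coin flips.
    flips = rng.integers(0, 2, size=(n_trials, n_samples, n_lags),
                         dtype=np.int8) * 2 - 1
    lag_means = flips.mean(axis=1)           # shape: (n_trials, n_lags)
    return np.abs(lag_means).max(axis=1)     # tallest bar per trial

tallest = max_abs_lag_under_noise()
band = 1.96 / np.sqrt(177)                   # single-lag 95% band, about 0.147
# Share of pure-noise correlograms with at least one bar outside the band.
print(np.mean(tallest > band))
```

Under these assumptions the chance that at least one of 20 bars pokes past the single-lag band is roughly 1 - 0.95^20, or a bit under two in three, so one tall bar on a 177-sample chart is about what noise alone would produce.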

    1. I appreciate your thoughtful comment, Paul!

      What we really need is more data for a given time period. By the time we have enough data to build a system, the regime changes and we have to start over.

      You’re right, 177 samples is not enough data! And also right that the graphs are showing the same thing, not revealing one thing via another. Day 14s were good because day 14s were good, not because there is some inherent magic about day 14. Or, if there is, this doesn’t provide sufficient evidence for it.

      But it’s fun to imagine finding a statistical needle in a haystack. 🙂

    1. Sure! The AmiBroker code is only part of the story, as you need to average each column in a spreadsheet after the fact. Let me know if you have questions:

      Filter = C/Ref(C,-2)>1.015; // this is whatever you want it to be for a test event.

      updown = IIf(C>Ref(C,-1),1,-1); // +1 if today closed above yesterday, otherwise -1

      // Positive Ref() offsets look forward, so column k is
      // (trigger day direction) * (direction k days later).
      AddColumn(updown * updown,"0",1.0);
      AddColumn(updown * Ref(updown,1),"1",1.0);
      AddColumn(updown * Ref(updown,2),"2",1.0);
      AddColumn(updown * Ref(updown,3),"3",1.0);
      AddColumn(updown * Ref(updown,4),"4",1.0);
      AddColumn(updown * Ref(updown,5),"5",1.0);
      AddColumn(updown * Ref(updown,6),"6",1.0);
      AddColumn(updown * Ref(updown,7),"7",1.0);
      AddColumn(updown * Ref(updown,8),"8",1.0);
      AddColumn(updown * Ref(updown,9),"9",1.0);
      AddColumn(updown * Ref(updown,10),"10",1.0);
      AddColumn(updown * Ref(updown,11),"11",1.0);
      AddColumn(updown * Ref(updown,12),"12",1.0);
      AddColumn(updown * Ref(updown,13),"13",1.0);
      AddColumn(updown * Ref(updown,14),"14",1.0);
      AddColumn(updown * Ref(updown,15),"15",1.0);
      AddColumn(updown * Ref(updown,16),"16",1.0);
      AddColumn(updown * Ref(updown,17),"17",1.0);
      AddColumn(updown * Ref(updown,18),"18",1.0);
      AddColumn(updown * Ref(updown,19),"19",1.0);
      AddColumn(updown * Ref(updown,20),"20",1.0);

  2. Hi Matt
    You wrote that you tried something in R. I’m a newbie in R and I’ve puzzled a lot over how to do ACFs like yours. I know about the acf() function, but I can’t do an ACF with only up or only down days like you did.
    Can you (or anybody else) show me how to do it in R or give me the code?

    1. Hi Dan. I did the correlogram in AmiBroker specifically because my R skills are at a very basic level. I don’t know how to filter for only certain events (and their trailing days) using R at this point. Sorry! Perhaps another reader can help. If I see a response to your question, I’ll try and email you to let you know.
