S&P 500: The impact of survivorship bias

Over the last couple of weeks I had an important project running. With the help of some trading friends I created a survivorship-bias-free S&P500 database starting in 1990. I already did a similar exercise for Nasdaq 100 stocks (see here and here). In this post I want to share some of my S&P500-specific findings that might be relevant for you.

My trading systems focus on the stocks of a particular index. I like to do that because trading S&P500 or NDX100 stocks ensures that there is enough liquidity (at least for my account size). Furthermore, trade execution should be smooth, with narrow bid/ask spreads. At least that’s my belief, and as Van Tharp says: we don’t trade the market, we trade our beliefs about the market.

For backtesting my strategies I rely purely on Norgate’s Premium Data service. They provide me with current data as well as delisted data, and I’m very satisfied with their service. Hence I built my survivorship-bias-free database on Norgate’s Premium Data. I’ve heard Norgate is going to release a service providing historical index constituents next year. I’m not associated with them, but I can strongly recommend their service: hassle-free data of good quality!

Numbers for the numbers lover

Let me give you some raw facts about the index (from 1990 until November 2010):

  • About 1006 stocks were part of the index
  • 402 stocks that used to be part of the index are now delisted.
  • Only 189 stocks stayed in the index from 1990 until today.
  • About 5.7% of stocks enter/leave the index every year. That’s less than half the NDX100 turnover rate.
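
As a rough illustration of how a turnover figure like the one above could be derived, the Python sketch below counts index additions and removals per year from a list of membership intervals. The tickers, dates, index size, and the definition used (average of additions and removals, divided by index size) are all assumptions for illustration; the post does not state the exact formula behind the 5.7%.

```python
from collections import Counter
from datetime import date

# Hypothetical membership intervals: (ticker, joined, left).
# left=None means the stock is still an index member.
MEMBERSHIP = [
    ("AAA", date(1990, 1, 1), None),
    ("BBB", date(1990, 1, 1), date(1991, 6, 30)),
    ("CCC", date(1991, 7, 1), None),
    ("DDD", date(1990, 1, 1), date(1992, 3, 15)),
    ("EEE", date(1992, 3, 16), None),
]

def yearly_turnover(membership, index_size, start):
    """Return {year: turnover}, with turnover = (adds + drops) / 2 / index_size.

    Stocks that are already members on `start` are not counted as additions.
    """
    adds, drops = Counter(), Counter()
    for _ticker, joined, left in membership:
        if joined > start:
            adds[joined.year] += 1
        if left is not None:
            drops[left.year] += 1
    years = sorted(set(adds) | set(drops))
    return {y: (adds[y] + drops[y]) / 2 / index_size for y in years}

turnover = yearly_turnover(MEMBERSHIP, index_size=3, start=date(1990, 1, 1))
for year, rate in turnover.items():
    print(year, f"{rate:.1%}")
```

With real data, `index_size` would be 500 and the per-year values could be averaged over the whole period.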

The impact of survivorship bias

In order to understand the impact of survivorship bias I created a simple swing trading system (code posted at the end). Here are the rules in short:

  • Long only
  • Entry when RSI2 < 30
  • Exit when RSI2 > 70
  • 10 positions, each 10% of equity
  • Portfolio ranking: lowest RSI2 first
  • No commission / slippage
  • Explanation of the key performance indicators being used (link)
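
For readers who do not use AmiBroker, the rules above can be sketched in Python. This is a minimal sketch, not the code used for the test: it assumes a Wilder-style smoothed RSI seeded with a simple average, which may differ slightly from AmiBroker's built-in RSI() on the first bars, and the function names `rsi` and `pick_entries` are hypothetical.

```python
def rsi(closes, period=2):
    """Wilder-style RSI aligned with `closes`.

    The first `period` entries are None because no value exists yet.
    """
    out = [None] * len(closes)
    if len(closes) <= period:
        return out
    changes = [b - a for a, b in zip(closes, closes[1:])]
    # Seed the averages with a simple mean of the first `period` changes.
    avg_gain = sum(max(c, 0.0) for c in changes[:period]) / period
    avg_loss = sum(max(-c, 0.0) for c in changes[:period]) / period
    for i in range(period, len(closes)):
        if i > period:
            change = changes[i - 1]  # closes[i] - closes[i - 1]
            avg_gain = (avg_gain * (period - 1) + max(change, 0.0)) / period
            avg_loss = (avg_loss * (period - 1) + max(-change, 0.0)) / period
        if avg_loss == 0:
            out[i] = 100.0  # no losses in the window (convention chosen here)
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out

def pick_entries(rsi_today, free_slots, entry_level=30.0):
    """Symbols with RSI below `entry_level`, lowest RSI first, capped at `free_slots`."""
    candidates = [(value, symbol) for symbol, value in rsi_today.items()
                  if value is not None and value < entry_level]
    return [symbol for _value, symbol in sorted(candidates)[:free_slots]]
```

An exit would be triggered for any open position whose latest RSI2 value is above 70; with 10 positions at 10% of equity each, `free_slots` is 10 minus the number of open positions.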

I ran the test with two different data sets: first with today’s S&P500 stocks only (NO), and second with the survivorship-bias-free data (YES). The results are about twice as good using today’s index members only!

Using today’s S&P500 stocks to backtest strategies will guide you towards wrong decisions in your system design & development process.

AmiBroker AFL-Code

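// SP500_member.afl defines SP500member(), the historical membership lookup
#include <SP500_member.afl>
SetOption("CommissionAmount",0.00);             // no commission / slippage
SetBacktestMode( backtestRegular);
SetPositionSize( 10, spsPercentOfEquity );      // 10% of equity per position
SetOption("MaxOpenPositions",10);
SetOption("MaxOpenLong", 10);
SetTradeDelays( 0, 0, 0, 0 );                   // trade on the signal bar
Member       = SP500member(Name(),DateNum());   // True while the stock is in the index
RoundLotSize = 0;
Short = Cover = False;                          // long only
Buy          = Member AND RSI(2)<30 AND Year()>=1990;
Sell         = RSI(2)>70;
PositionScore= 100 - RSI(2);                    // lowest RSI2 ranks first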

Comments

  1. A trading friend sent me an email asking what would happen when trading stocks after they have left the index. The assumption behind it: once a stock has left the index, most of the negative news should be priced in. Money managers anticipate index changes and position accordingly before the change.

    I quickly tested this strategy:
    – Stock has left the index within the last 30 days
    – RSI2 < 30 for entry, RSI2 > 70 for exit

    Result: -6.43 CAR, negative Sharpe, 86% drawdown.

    However, if you add another filter to trade only the ones above their MA50, you get positive results.
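
The two filters mentioned in this comment can be sketched in Python as follows. This is only an illustration under stated assumptions: the 30-day window is counted in calendar days, the moving average is a simple average of the last 50 closes, and the function names `recently_left` and `above_sma` are hypothetical.

```python
from datetime import date

def recently_left(exit_date, today, window_days=30):
    """True if the stock left the index within the last `window_days` calendar days."""
    if exit_date is None:  # still an index member
        return False
    return 0 <= (today - exit_date).days <= window_days

def above_sma(closes, length=50):
    """True if the latest close is above the simple average of the last `length` closes."""
    if len(closes) < length:  # not enough history yet
        return False
    return closes[-1] > sum(closes[-length:]) / length
```

A candidate would then need `recently_left(...)`, `above_sma(...)`, and the RSI2 entry condition to all hold on the same day.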

  2. It’s pretty interesting that for a short term swing based system the results are significantly different. I was wondering how you get a list of the historical constituents. I’ve been trying to make a survivorship free database of the OEX+Qs but my main problem is knowing what was part of the indexes in the past. Thanks.

  3. I am interested in building this database for myself. Where did you get the list of SP500 changes? How did you match these changes to Norgate’s data?

  4. Well, after Frank’s articles I have fully realized the importance of survivorship bias.
    I have just tried Norgate’s trial data, and after checking my systems against it, my backtest results are very much worse!

    Alarmed, I have been testing and studying the database. I have the trial version of Norgate’s database, without delisted stocks. The problem is that my former database (Quotes Plus) has about 850 stocks with more than 2 million dollars of daily trading on 2 January 2001, while Norgate’s (without delisted stocks) has only 185 such stocks on that day.
    That is a very big difference between the two databases.
    Obviously, I need Norgate’s delisted stocks to compare both databases.
    But I do see the importance of the database.
    I think I’ll try Norgate instead of Quotes Plus…

  5. I highly recommend Norgate Premium Data. Excellent service. And even better, you only pay one time for delisted data.

    Nice work Frank!

  6. Very interesting!
    I am also thinking of building such a database from the Norgate one. As far as I understand, the Norgate database is a MetaStock database.
    Is there a way to convert it to Excel? I can do some VBA programming, and maybe I can manage to build the survivorship database.

    Thanks
    Ken

    • Hello Ken,

      I can’t answer that as I never tried to get data out of the “Norgate” database.

      Frank

      • Hello Frank,

        thanks, I realized that the problem is actually not the Norgate db; the problem is where to get the history of the SP500 index members.

        Can you please help with that?

        Ken

  7. @ Ken… Yeah I use norgate too, and they have a converter that comes with the data purchase. It allows you to convert all the data to csv in one swoop, and alter the formatting that it gets converted into. From csv you can obviously read it into excel. I purchased their data with the delisted stocks and it seems pretty clean and accurate. No complaints.

  8. Hello Frank,
    I am very satisfied with Norgate’s service too. However, I face a different problem: I need the earnings release date information for the delisted US stocks. Can you recommend a source for this kind of information?
    Thanks,

    Laszlo

  9. Isn’t it simply a case of the strategy having a “bullish bias”, combined with the fact that (naturally) stocks that have been delisted or dropped from the index have a negative bias, dragging the system’s performance to the downside?

    I wonder how the strategy would perform on both sets of data if you used detrending to adjust for the position bias in the strategy. My bet would be that both results would be closer to each other.

    Still, the article makes an excellent point on understanding the data used for back-testing and making sure it is realistic…

  10. What if you tested to buy the stocks that just got included in S&P500? I would think that big index funds could push up the price when they start buying shares.

  11. I’ve actually messed around with this with one of my models and the results actually improved using the given index constituents for that year. Basically, using the current members, a good amount of them did not exist back in 2000, 2001, etc, therefore less trading opportunities meant less P&L. I think this effect could be very model dependent.

    Also, any chance of posting your historical index constituents?

    • Hello Marko,

      yes, I’ve seen the same effect in some tests.

      I’ve spent too much time and effort to just post the historical index constituents.

      Frank

      • I have the entire Nasdaq back to 2000 and the S&P back to 1995. I was sort of looking to compare what I have to what you have. Shoot me an email and maybe we can arrange a swap.

  12. Hello Emil,

    did a quick test: same rules, but only stocks that will become part of the index WITHIN the coming 30 days. The assumption is that you can foresee upcoming index changes, as the rules are public.

    Here are the results: 45% risk adjusted returns, Avg% 0.96, Sharpe 0.83

    Not bad …

    Frank

  13. Hi
    I don’t know if someone could answer a very simple AFL question.
    I am trying the Norgate delisted stocks, but sometimes my system buys on the last bar of a stock and the position remains open until the current day, which distorts the results.
    I have tried to solve this with the buy condition:
    BUY= … AND NOT EndValue( C );
    or
    BUY=… and NOT EndValue(barindex() );
    or
    BUY =… and NOT isnull( ref(C,1) );
    But nothing works.. 😦
    Does anybody know the solution??
    Thanks!

  14. Norgate looks good for a lot of things; checking their website, however, it appears they do not adjust for dividends. But overnight systems need to take dividends into account before you can really draw any conclusions. In my honest opinion, for the model to reflect how trading works in real life, dividends must be included. Imagine you have bought a high-yield stock on a low RSI as per your rules, and the next day it pays out a 1.5% dividend. If you do not adjust for dividends, your system will eat that loss every time a dividend is paid, when in actuality it was not a loss. Nearly 1/5 of the SPX has a yield over 2.5%. Considering the generally low CAR for SPX spot over the last ten years, dividends are a huge deal!
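
The dividend arithmetic described above can be made concrete with a small Python sketch. The prices are hypothetical, and the example assumes the price falls by exactly the dividend amount on the ex-dividend day and is otherwise unchanged.

```python
def price_return(buy, sell):
    """Return measured on unadjusted prices only."""
    return sell / buy - 1.0

def total_return(buy, sell, dividend):
    """Return including a cash dividend received while holding the position."""
    return (sell + dividend) / buy - 1.0

buy_price = 100.0
dividend = 1.5      # 1.5% of the purchase price
sell_price = 98.5   # price dropped by the dividend amount, nothing else changed

print(f"unadjusted: {price_return(buy_price, sell_price):+.2%}")
print(f"with dividend: {total_return(buy_price, sell_price, dividend):+.2%}")
```

On unadjusted data the backtest books a 1.5% loss on this trade, while the position actually broke even once the dividend is counted.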

    I have tested short only rotational systems on SPX stocks with dividends (Yahoo) and without (Worden) and seen drastic differences in CAR. I’m talking 25% Yahoo, 50% Worden. That is extreme, coming from a system which held roughly 3 days and was in the market 100% of the time, rotationally trading with similar simple RSI rules.

    Of the few relevant pieces of literature I have seen, I have read that new additions to the SPX tend to underperform those which were recently removed over the following 12 months.

    As a complementary note to the last sentence above: most ‘model gone real world’ trading fund managers do not micro-manage their universe of valid stocks by re-balancing daily in sync with actual SPX additions/removals. It is simply too daunting a task if you have to periodically compare your real-world results to your walk-forward modeled results, which is done to be sure you are not introducing strategy drift between real-life frictions and the model’s assumptions. Furthermore, if you develop a strategy that is overly concerned with avoiding the pitfalls of survivorship bias by re-syncing your universe daily, you potentially introduce a new form of bias, i.e. avoiding doing poorly in the several outlier years where turnover was unusually high (2001, 2007, 2008) and turnover drag affected your universe.

    I’m saying the following based on gut intuition. But I bet that for the SPX as a whole, in 80% of the years going back 50 years, annualized holdings turnover is below 2.5%, and that these recent times are more of an exception. A 2.5% turnover is not going to break a system which has been backtested on 5 years of a static universe with the assumption that re-balancing and re-optimizing will be done once a year. In my honest opinion.

    There is also the potential of re-entry. Meaning stocks which are removed from the index, may come back several months later. I have seen something like 20% difference in the S&P 500 over a big 12 year window – something like 1993 compared to 2005. That makes for an even lower ‘real turnover’ than the 2.5% which I am suggesting in terms of actual day to day.

    My gut tells me re-balancing your universe quarterly to match with the real SPX list (as many FM’s would do) is going to impart much better results across the trading system as opposed to re-balancing daily.

    This is an interesting study you have presented and is in line with what I have read regarding the turnover rate in the NYSE compared to NASDAQ. If anything it convinces me that long/short equity is much more prudent when rotationally trading inside of a large index such as the S&P 500.

    Food for thought. I could also just be wrong. But wanted to point out some of the gray areas worth considering.

    • Hello D.,

      first of all, thanks for sharing your experience here!

      – Norgate is going to release cash-dividend-adjusted data soon; I’ve been told mid-December. However, I agree that not having dividend-adjusted data is a missed opportunity when reviewing trading systems.

      – Of course a solid system should continue to do well even if it runs on stocks that recently left the index. I do my test based on index stocks because of slippage / commission.

      – The test presented is based on a daily re-balanced SP500, because I’ve got the data in daily granularity and can continue doing that with little effort, as Norgate updates the list as changes occur.

      – Based on your detailed “food for thought” I’m going to make another post on specific performance results after stocks have left the index.

      Regards,

      Frank

      • Hi Frank

        Did Norgate give you an update on when they will release their dividend-adjusted data? Also, did they tell you the price? I emailed them a couple of days ago requesting information on this feature, but haven’t heard back from them.

        Thanks.

        Love the blog!

        • Hi Ron,

          they told me early next year. I know they have a major re-design of their data updating tool in the making (along with a few other features). It’s currently in beta.

          Thanks for the kind words.

          Frank

  15. Hi Frank, excellent post! I’ve been wanting to do something similar for some time now, have acquired the historical changes in the indexes I’m following, but don’t know how to put together an AFL file to get it done.

    How would I go about putting together an AFL file to create a list for testing?

    • Hello Graeme,

      most important is to match the list of SP500 constituents with your provider’s ticker list.

      That’s the most difficult job. Once you have that, you simply create an AFL function:

      function SP500member(data, datnum)
      {
      index = False;
      Maxdate = 1200101;
      switch(data) {

      case "ABCDE":{ index = ((datnum > 1000701) AND (datnum < Maxdate)); break; }
      ..
      ..
      ..

      }
      return (index);
      }
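
The lookup above compares AmiBroker DateNum values, which encode a date as 10000 * (year - 1900) + 100 * month + day, so 1000701 is 1 July 2000 and 1200101 is 1 January 2020. As a sketch of the same idea outside AFL, the Python version below uses a table of intervals instead of a switch statement. The table contents, the list-of-intervals layout (chosen so a ticker can leave and re-enter the index), and the names `datenum` and `sp500_member` are assumptions for illustration.

```python
from datetime import date

def datenum(d):
    """Encode a date the way AmiBroker's DateNum() does."""
    return 10000 * (d.year - 1900) + 100 * d.month + d.day

MAXDATE = 1200101  # 1 January 2020, same placeholder end date as the AFL snippet

# Hypothetical table: ticker -> list of (start, end) DateNum pairs, both exclusive.
INTERVALS = {
    "ABCDE": [(1000701, MAXDATE)],
}

def sp500_member(ticker, dn, intervals=INTERVALS):
    """True if DateNum `dn` falls strictly inside any membership interval of `ticker`."""
    return any(start < dn < end for start, end in intervals.get(ticker, []))

print(sp500_member("ABCDE", datenum(date(2005, 1, 3))))
```

Tickers missing from the table are treated as never having been index members.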

  16. Hi Frank, sorry for the late reply.

    Thanks for that, should be enough for me to run with to create my own version.

    Most appreciated!

    Graeme

  17. Hi Frank,

    sorry to bother you again, I’m having a rather hard time trying to figure this out (I’ve never worked with switch statements before).

    With the code you provided above, where you have "ABCDE" after the case statement, is that meant to be where I input the code for the company? Using Google as an example, would the statement look like this:

    case "GOOG":{ index = ((datnum > 1060403) AND (datnum < Maxdate)); break; }

    From there, I'm gathering the AFL file created is then used as an include file for any future testing? If that's the case, then how would I go about including it in my BUY conditions to use in my testing?

    Thanks in advance for any help

    Graeme

    • Hello Graeme,

      the SWITCH statement is part of a function. The function will return TRUE/FALSE depending on whether the stock has been part of the index.

      #include ; //your switch statement should be part of a function in this file
      member = SP500member(Name(),DateNum());
      buy = RSI(2)<30 and member;

      Frank

  18. Thanks Frank, all working well now!!

    Really appreciate the help.

    Graeme

