Thursday, July 5, 2012

Fun with normalization, economics edition

So the new thing in economic circles is for rich countries to look at poorer ones as economies to emulate and this has sparked some kind of debate about Iceland, Estonia, Latvia, Ireland and Lithuania: which one fared best during the recession? And debate begets ... graphs! I love graphs.

This blog summarizes the graphs, but takes a demonstrably wrong view of the data. I link to it because it and the side it supports are to be the recipients of my spittle-flecked ire.

The subject of the graphs is the RGDP data for the aforementioned countries. Now RGDPs of all countries at any particular time form a power law distribution, making them difficult to graph on standard axes in a way that conveys information. That's why humans in all their wisdom have invented several ways to "enhance" graphical information to try and make their point. Percent changes, derivatives, normalization, logarithmic scales: pick your poison. But make sure you pick the right poison because certain poisons work on certain subjects.

Let's start with the "raw" RGDP data. Iceland is in blue, the rest red (because the question is: Is Iceland faring better?). I forgot to put units on the graph (bad Bourbaki) but the y-axis is Millions of 2005 Euros.
Iceland has only a few hundred thousand people in it so its RGDP is pretty small. However, RGDP per capita is huge; Iceland and Ireland are wealthy countries relative to the others. So according to one metric, How much money do you have?, Iceland wins with a much larger RGDP per capita.

But we want to look at the recession, so one side of this "debate" made the choice to normalize to the pre-recession peak. This is standard practice in economics. A recession only lasts a few years, so inside that window your RGDP data are approximately linear. Normal growth rates (r) are on the order of a few to several percent per year, so r*t << 1 for a time span (t) of several years. For linear data, normalization is fine, but you need a way to select your normalization point that isn't arbitrary ... hence choosing the peak (or other feature). This is what we get.
You can see Iceland near the top from 2008 to 2012 since its recession wasn't as big relative to the peak. Even the pre-recession data has some value because it shows the run-up to the peak (slope) was shallower in Iceland. The pre-peak levels are not valid for points far from the peak, for reasons we'll describe later. Overall, lots of information. Excellent.
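Peak normalization is simple enough to sketch in a few lines of Python. The numbers below are made up for illustration (they are not the RGDP data behind the graphs), and the function name and the `recession_start` cutoff are my own assumptions rather than anything standard.

```python
def normalize_to_peak(series, recession_start):
    """Divide a {year: value} series by its largest value at or before recession_start.

    Returns the peak year and the normalized series (peak == 1.0).
    """
    pre = {year: value for year, value in series.items() if year <= recession_start}
    peak_year = max(pre, key=pre.get)
    peak = pre[peak_year]
    return peak_year, {year: value / peak for year, value in series.items()}


# Made-up numbers for illustration only; NOT actual RGDP data.
country_a = {2005: 90.0, 2006: 96.0, 2007: 100.0, 2008: 98.0, 2009: 92.0, 2010: 90.0}
country_b = {2005: 16.0, 2006: 18.0, 2007: 20.0, 2008: 19.0, 2009: 16.0, 2010: 16.5}

peak_a, norm_a = normalize_to_peak(country_a, 2008)
peak_b, norm_b = normalize_to_peak(country_b, 2008)

for year in sorted(country_a):
    print(year, round(norm_a[year], 3), round(norm_b[year], 3))
```

Because each series is divided by a feature of its own data, the very different sizes of the two toy economies drop out and only the depth of the fall relative to the peak remains.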

Except that the libertarians of the world love the Baltic countries because one of them mentioned Milton Friedman at one point. So they set out to show this wasn't correct. They chose to normalize to the year 2000. And thus, Iceland sucks.
Why 2000? No idea. The peak of Iceland's boom was about 2002; 2000 was also an unremarkable year in the other countries. You can choose other years. In fact, if you choose other years, you can show Iceland being anywhere from the bottom to the middle of the pack (2003) ...
To the top of the heap (2006) ...
Actually, by choice of normalization year, you can show any country listed to be at the top of the heap during the recent recession (2008 to today). Iceland in 2007, Estonia in 1997, Latvia in 2011, Ireland in 2011, and Lithuania in 1997. In fact, the year 2000 is the year you'd choose if you wanted to show Iceland at the bottom (which makes me think this was deliberately manipulated by one side of the argument).
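Here is a toy version of that trick in Python. The two series are invented (a slow-growing economy with a mild recession and a fast-growing one with a deep recession), and `rebase` and `leader` are hypothetical helper names, so treat this as a sketch of the effect and not a reproduction of the actual graphs.

```python
def rebase(series, base_year):
    """Normalize a {year: value} series so that the base year equals 1.0."""
    base = series[base_year]
    return {year: value / base for year, value in series.items()}


def leader(countries, base_year, at_year):
    """Name of the country with the highest rebased value in at_year."""
    return max(countries, key=lambda name: rebase(countries[name], base_year)[at_year])


# Invented numbers for illustration only; NOT actual RGDP data.
countries = {
    "slow": {2000: 100.0, 2003: 106.0, 2006: 112.0, 2007: 114.0, 2010: 108.0},
    "fast": {2000: 50.0, 2003: 65.0, 2006: 85.0, 2007: 92.0, 2010: 78.0},
}

for base_year in (2000, 2007):
    print(base_year, leader(countries, base_year, 2010))
```

With a base year of 2000 the fast grower comes out on top in 2010, because a decade of trend growth is baked into the ratio. With a base year of 2007 the slow grower comes out on top, because only the recession is left.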

In general, normalizing time series data that is linear in log space creates a time-dependent scale. For short times, log(a+b*x) ~ log(a) + x*(b/a) + O(x^2). You can normalize lines. But over 10 years or so with growth rates on the order of a few to several percent per year, you need those O(x^2) terms.
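To make the time dependence explicit, here is a short worked version. It assumes each country i grows at a constant rate r_i (that is what "linear in log space" means); A_i and the base year t_0 are my notation, not anything from the graphs.

```latex
\begin{align*}
y_i(t) &= A_i e^{r_i t} \\
\frac{y_i(t)}{y_i(t_0)} &= e^{r_i (t - t_0)}
  = 1 + r_i (t - t_0) + \tfrac{1}{2} r_i^2 (t - t_0)^2 + \dots \\
\log\frac{y_1(t)}{y_1(t_0)} - \log\frac{y_2(t)}{y_2(t_0)} &= (r_1 - r_2)(t - t_0)
\end{align*}
```

The gap between two normalized curves is zero at t_0 by construction and grows with the distance from t_0, so moving t_0 moves the curves relative to each other. For r = 0.05 per year and t - t_0 = 10 years, r(t - t_0) = 0.5: the linear approximation gives 1.5 while the exact value is e^0.5, about 1.65, so the second-order term (0.125) is no longer negligible.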

If you look back to the first graph, you can see a nice long linear trend in the data, which suggests the correct way if you want to look at recovery from a deviation from the previous trend: fit the pre-crisis trend and look at the percent difference.

Here are some fits to the pre-crisis trend (Iceland: blue, Estonia: red, Latvia: orange) ...
Note these linear fits have different slopes and intercepts. That's why normalization to a specific year allows you to put any of the countries on top. Also note that the slopes are higher for the Baltics. I think this is what the libertarians are trying to give credit for, but the overall higher trend is not germane to the question of how bad the recession is. Additionally, poor countries like the Baltics tend to have higher growth rates than rich countries like Iceland and Ireland. Rapid growth from a low base is what is behind massive growth numbers from China, for example. You can think of it as picking the low-hanging fruit. (China is in the process of transforming low productivity agricultural workers to higher productivity industrial workers.)

The result after taking the percent difference from these trends (Iceland: blue, everyone else: red) ...
Iceland is on top again.
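The trend-fit procedure can be sketched in plain Python as an ordinary least-squares line through the pre-crisis years, followed by the percent difference from that line. The series is invented for illustration, and the cutoff year and function names are assumptions of mine; the fits in the graphs were not necessarily produced this way.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def percent_gap_from_trend(series, last_trend_year):
    """Percent difference between each value and the extrapolated pre-crisis trend."""
    pre = sorted((year, value) for year, value in series.items() if year <= last_trend_year)
    slope, intercept = fit_line([year for year, _ in pre], [value for _, value in pre])
    gaps = {}
    for year, value in series.items():
        trend = slope * year + intercept
        gaps[year] = 100.0 * (value - trend) / trend
    return gaps


# Invented numbers for illustration only; NOT actual RGDP data.
series = {2003: 100.0, 2004: 104.0, 2005: 108.0, 2006: 112.0,
          2007: 116.0, 2008: 110.0, 2009: 100.0}

gaps = percent_gap_from_trend(series, 2007)
for year in sorted(gaps):
    print(year, round(gaps[year], 1))
```

Since every country is measured against its own fitted slope and intercept, there is no base year left to pick, and a higher trend growth rate does not count for or against the depth of the recession.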

What have we learned?

  • All data can be manipulated. If someone shows you a graph in a certain format (removing the origin, normalizing to some arbitrary year), question their formatting choices.
  • Corollary: Especially question when someone decides to change the format of previously graphed data to opine or to make a political, partisan, or school-of-thought point. It could even be just to get more page views.
  • Normalize and scale to features of your data (peaks, troughs, trends), not arbitrary points.
  • Specifically, Iceland seems to have fared better than the Baltic countries (and Ireland) in the recession when the data is normalized to the peak or fit to the trend. Iceland is also doing better when measured by RGDP per capita. As these (peak, level, trend) are the only features of a linear data set besides, say, the level of noise/seasonal variations, we can say with confidence that Iceland is indeed doing much better when measured with RGDP.

Marginal Revolution has been a serious offender when it comes to the kind of manipulation described in the first bullet. Or at least on spreading the offending graphs around. This graph shows the same shenanigans mentioned here. This graph basically shows the first graph at the top of the page and asks what's the big deal? The big deal is of course that graphing on a linear scale in this case exaggerates the level when the question is about the trend. They are entering Freakonomics territory. My opinion of this last graph is well known.