
Rod Hull and what there is to gain from the ‘grainy’ approach

This Matt Yglesias post (and especially its comments) reminded me of one of my first projects for FreshMinds Research. For obvious reasons, I can’t go into the precise details, but I’ll tell you an adapted version to give you some idea…

Presented with a huge data set by TV Nostalgia College, we were asked to find out which factors were most closely linked to the popularity of the late Rod Hull and Emu. One key factor to explore was age, and there was a clear correlation between this and Rod’s popularity, with the youngest students least likely to be fans.

Given the sheer weight of data and number of variables, we had split the sample into five age bands: 14-18, 19-29, 30-39, 40-49, and 50 and above. Some pernickety part of me was already bothered by the unequal widths of these bands, but then a more important consideration came into play: because the government defines 14-19 as a key age group, we decided it would be necessary to wade through the data sets and re-categorise all the 19-year-olds.

By now you’ll be on the edge of your seats, I’m sure (!), but bear with me, because there is a point here. Once we’d rejigged the age bands, a lot of the data was suddenly looking worryingly different. And it didn’t just stop at Rod and Emu: net promoter scores for Inspector Gadget, Worzel Gummidge and footie pundit Jimmy Hill had all markedly changed.

As a result, I did what I now try to do first with every continuous variable of this kind: create a rough-and-ready graph before choosing bands. The problem with this approach, of course, is that the sample size for each individual age (or any time-related equivalent) is often too small to be reliable. Nonetheless, you can easily pick out a pattern that will inform the way you choose to manipulate the data. In this case, the love for Rod and his bird varied with age as follows:

% students at TV Nostalgia College who adore Rod Hull, by age
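The rough-and-ready version doesn't need Excel. As a minimal sketch, assuming each respondent is a hypothetical `(age, is_fan)` pair, this Python tallies the percentage of fans at every single year of age and flags ages whose sample is too small to trust. The `min_n=30` threshold is an arbitrary assumption, not a rule from the original project:

```python
from collections import defaultdict

def fan_rate_by_age(responses, min_n=30):
    """Return {age: (n, pct_fans, reliable)} for each single year of age.

    `responses` is an iterable of (age, is_fan) pairs. Ages with fewer
    than `min_n` respondents (an arbitrary, assumed cut-off) are flagged
    as unreliable rather than dropped, so the full pattern stays visible.
    """
    counts = defaultdict(lambda: [0, 0])  # age -> [respondents, fans]
    for age, is_fan in responses:
        counts[age][0] += 1
        counts[age][1] += int(is_fan)
    return {
        age: (n, 100.0 * fans / n, n >= min_n)
        for age, (n, fans) in sorted(counts.items())
    }

# Hypothetical toy records, not the TV Nostalgia College data.
sample = [(18, False)] * 40 + [(18, True)] * 10 + [(52, True)] * 3
for age, (n, pct, ok) in fan_rate_by_age(sample).items():
    marker = "" if ok else "  (small sample)"
    print(f"{age:>3}: {pct:5.1f}% of {n}{marker}")
```

Flagging rather than deleting the thin ages keeps the "grainy" overview intact while reminding you which points not to over-read.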

Above the age of 51, the sample size gets so small that the graph becomes misleading, but this rough sketch, which took just moments in Excel, shows something that the original banded chart simply did not: it is 16- to 23-year-olds who lack the love, with 18-year-olds the least impressed of all. Combine this with the knowledge that 18- and 19-year-olds are the most numerous age groups, and it becomes clear why a seemingly minor change of band can have such a large effect on your output.
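The arithmetic behind that effect is simple: a band's figure is a weighted average of its single-year rates, weighted by how many respondents each age contributes, so moving one large, lukewarm age group shifts both of the bands it sits between. The sketch below uses made-up counts (not the study's data) and a hypothetical `band_rate` helper to show what happens to the 19-29 band when the 19-year-olds move out:

```python
def band_rate(counts, lo, hi):
    """Percentage of fans among respondents aged lo to hi inclusive.

    `counts` maps each single year of age to (respondents, fans).
    """
    ages = [a for a in counts if lo <= a <= hi]
    n = sum(counts[a][0] for a in ages)
    fans = sum(counts[a][1] for a in ages)
    return 100.0 * fans / n

# Hypothetical counts: many 18- and 19-year-olds, few of them fans.
counts = {
    14: (20, 10), 15: (20, 10), 16: (30, 9), 17: (40, 10), 18: (100, 15),
    19: (100, 20), 20: (40, 12), 25: (40, 20), 29: (20, 12),
}

for lo, hi in [(14, 18), (19, 29), (14, 19), (20, 29)]:
    print(f"{lo}-{hi}: {band_rate(counts, lo, hi):.1f}%")
# 14-18: 25.7%
# 19-29: 32.0%
# 14-19: 23.9%
# 20-29: 44.0%
```

With these invented numbers the older band's score rises from 32.0% to 44.0% purely because of where the 19-year-olds are counted; nobody's opinion of Rod has changed.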

So what did I learn? Get the overview first, in all its grainy and unwieldy complexity, and only then decide how to split up your sample. You might keep the single-year view, create a scatter graph, plot every single result, or stick with your first instincts and create the most intuitive bands possible. In any event, you will end up with something much more informative than this chart, which came from the same data:

% students at TV Nostalgia College who adore Rod Hull, by age band
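If the pernickety worry about uneven bands applies to your data, one option once you have seen the overview is to let the data set the cut points so that each band holds a similar number of respondents (equal-frequency binning). This is a swapped-in technique offered as a sketch, not the method used in the project above, and `equal_count_bands` is a hypothetical helper:

```python
from collections import Counter

def equal_count_bands(ages, k):
    """Return up to k (first_age, last_age) bands of similar head-count.

    A single year of age is never split across two bands, so band sizes
    are only approximately equal when one age is very numerous.
    """
    per_age = sorted(Counter(ages).items())  # [(age, respondents), ...]
    total = sum(n for _, n in per_age)
    bands, lo, cum = [], None, 0
    next_cut = total / k  # head-count at which the first band should close
    for age, n in per_age:
        if lo is None:
            lo = age
        cum += n
        # Close the current band once it reaches its share, leaving the
        # remainder for the final band.
        if cum >= next_cut and len(bands) < k - 1:
            bands.append((lo, age))
            lo = None
            next_cut = total * (len(bands) + 1) / k
    if lo is not None:
        bands.append((lo, per_age[-1][0]))
    return bands

# Hypothetical ages with a large cluster of 18-year-olds.
ages = [16] * 5 + [17] * 5 + [18] * 30 + [19] * 5 + [20] * 5
print(equal_count_bands(ages, 3))  # [(16, 18), (19, 19), (20, 20)]
```

Here the cluster of 18-year-olds swallows the first band, which is exactly the kind of thing the single-year graph warns you about before you commit to any split.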

About the author

Dave Bevan is an Interim Analyst working mainly in the Education Team at FreshMinds Research. He previously worked for the G77 (a group of developing countries) at the Rome Chapter of the United Nations, and before that was a dessert chef, a tour guide on London’s open-top buses and an inconsistent stand-up comic. Dave’s interests include this, this and this.