Here’s an oddity I ran across that I bet you didn’t know – people on average will do 3 silly walks a day. A silly walk is any walk other than the standard one-foot-in-front-of-the-other kind: skipping, hopping, galloping, shuffling, moonwalking, sliding, crab walking, levitating, etc. Pretty weird that we do this, right? I mean, who would have thought?
Speaking of mean, that brings up a question. What does average even mean? Well, doing some research we find that it’s just the sum of the values divided by the number of values. In our silly walking case, that’s the total number of ambulatory variations performed divided by the number of people asked. Sounds pretty simple to do.
Our study included six participants (I know, kind of a small sample set, but this is just for demonstration). Five of those asked performed zero silly walks (wait, what?). The sixth person, who it turns out is a huge Monty Python fan, does an astounding 18 silly walks per day. If we do the quick math, the total number of silly walks is 18 (just the one person, since the rest did zero) divided by the 6 people, which gives us 3. You should be thinking: um, wait, only one person did any walks and the rest did zero, yet the average person does 3? Something doesn’t seem right.
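If you’d like to check that quick math yourself, here’s a minimal Python sketch of it. The variable names are just my own picks for illustration; the numbers are the six participants from above.

```python
# Silly walks per day for the six participants in the example.
silly_walks = [0, 0, 0, 0, 0, 18]

total = sum(silly_walks)    # add up every silly walk reported
people = len(silly_walks)   # count how many people were asked
average = total / people    # sum of the values divided by the number of values

print(f"{total} walks / {people} people = {average} walks per person")
```

The arithmetic is perfectly fine. It’s what the result seems to say about the "average person" that is off.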
You are totally correct. Averages are simple, yet they hide a dangerous flaw. They like evenly distributed data. Now, I’m trying to keep these articles math- and statistics-lite, so we’ll keep the definition of distribution pretty simple. In evenly distributed numbers, there isn’t a lot of variation from one number to the next when they are sorted (stat people, I know this is super simplified). Our data would look like: 0,0,0,0,0,18. Everything is good until the last two, when we jump from 0 to 18 – it’s a big change. That 18 is what’s called an outlier – it lies well outside the range of the rest of the numbers. Sets of data like this can really mess with the average and render it pretty meaningless. Obviously, most people do no silly walks.
If the data were better, it might have looked like this: 1,2,2,4,4,5. There are still six observations and they still total 18. So, while no one asked performed exactly 3 silly walks per day, 3 is the average. We can see that there is no large change from one number to the next in the ordered set, so they are pretty evenly distributed. In this case, the average is probably much more trustworthy.
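Here’s one rough way to put "no large change from one number to the next" into code. Fair warning: `biggest_jump` is a helper I made up for this illustration, not a standard statistic. It just sorts the numbers and reports the largest gap between neighbors.

```python
def biggest_jump(values):
    """Largest gap between neighboring values once they are sorted.

    Hypothetical helper for illustration only, not a standard statistic.
    """
    ordered = sorted(values)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))

lopsided = [0, 0, 0, 0, 0, 18]   # five people at zero, one Monty Python fan
even = [1, 2, 2, 4, 4, 5]        # the "better" data set

for name, walks in [("lopsided", lopsided), ("even", even)]:
    average = sum(walks) / len(walks)
    print(f"{name}: average = {average}, biggest jump = {biggest_jump(walks)}")
```

Both sets average 3, but the biggest jump is 18 in the first and only 2 in the second. A jump that dwarfs the average itself is a decent hint that an outlier is doing the heavy lifting.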
The important question to ask when you see an average, then, is what the underlying numbers looked like. Were they evenly distributed? Were there any outliers? If there were outliers, were they removed? (Sometimes it makes sense to remove them, other times they are important, but that’s maybe another article.) Averages surround us. If you start looking, you will likely see them everywhere. On average, how many do you see a day? The danger is that you probably know nothing about the underlying data, and you can’t assume the person who produced the number did either. Averages are easy to abuse to make something come out the way you want even if it’s not correct, just like I did with the silly walks data.
Averages are just one of three common methods for checking where the center of a set of data lies. You may often hear the average called the “mean” – that’s the more accurate name. Re-read those first few paragraphs for some foreshadowing. The other two are the median (the number that actually sits in the middle when the data is sorted) and the mode (the number that occurs most often). I give this bit of reference as we’ll probably see them later.
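For the curious, here’s a small sketch that runs all three on both silly walk data sets using Python’s built-in `statistics` module. The data set names are my own labels. I’m using `multimode` (available in Python 3.8 and later) because a data set can have more than one most common value.

```python
from statistics import mean, median, multimode

datasets = {
    "lopsided": [0, 0, 0, 0, 0, 18],   # the original survey
    "even": [1, 2, 2, 4, 4, 5],        # the more evenly spread version
}

summary = {}
for name, walks in datasets.items():
    summary[name] = {
        "mean": mean(walks),         # sum divided by count
        "median": median(walks),     # middle of the sorted values
        "modes": multimode(walks),   # every value tied for most frequent
    }
    print(name, summary[name])
```

For the lopsided set, the median and the mode are both 0 while the mean is 3, which matches the gut feeling that most people do no silly walks. For the even set, the mean and median agree at 3, and the modes are 2 and 4.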
Like most critical thinking exercises, if you can see the underlying data or at least know the sample population and logic used, you might find the number is just fine. If you can’t find the data or no logic is given, that’s a flag that the number might be suspect. This is especially frustrating when you hear averages thrown around on a news program and you have no way to check them. If you find yourself with the chance to question the data, go ahead. On average, you might find the answers enlightening.