
November 08, 2008

WHY THE ELEVEND IS INHERENTLY ACCURATE

As we developed our new product, the EleVend, there was some concern about the overall accuracy of the count of items issued, due mostly to the random behavior of a stack of gloves. One glove might have a thumb sticking up, breaking the sensor beam sooner than the next glove, whose thumb was down; the same number of gloves could thus produce different measurements, translating into an inaccurate count of gloves taken. The fear was that this inaccuracy would build up over time, seriously calling into question the value of the system.

My assertion was that since the process we use is a classic example of regression to the mean, the count would actually fix itself over time, and would be more accurate the more it was used.

Blank stares.

One of my favorite writers, John Derbyshire, 'splains it to you. Click below where it says 'more'.

Note: this effect does not address the separate issue of accurately charging an individual for the gloves taken. We addressed that by employing two sensors, requiring that both break before counting, greatly reducing the chances of random errors.
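The two-sensor rule can be sketched in a few lines. This is a hypothetical illustration, not the EleVend's actual firmware; the function name and the sampling format are invented for the example.

```python
# Hypothetical sketch of the two-sensor counting rule described above:
# an item is counted only when BOTH beams are broken at once, which
# filters out the single-beam flicker a stray thumb can cause.

def count_items(samples):
    """samples: iterable of (beam_a_broken, beam_b_broken) booleans,
    one pair per sensor poll. Returns the number of items counted."""
    count = 0
    both_broken = False
    for a, b in samples:
        if a and b:
            if not both_broken:   # rising edge: a new item entered the gap
                count += 1
                both_broken = True
        else:
            both_broken = False   # gap clear again; ready for the next item
    return count

# A thumb tripping only one beam at a time is ignored; only the two
# full blockages register:
print(count_items([(True, False), (True, True), (False, False),
                   (False, True), (True, True), (False, False)]))  # 2
```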

So buy one, already.

Math Corner. If you’re assumed to have any kind of math understanding, there are certain things you get asked about time and again. I’m going to give over this month’s math corner to explaining one of those things: “Regression to the mean.” What is it? I get asked. How does it work? Where does it apply? And so on. O.K., here goes.

The first thing to be said is that a statistician will rap you across the knuckles with a ruler if you say “regression to the mean.” While rapping, he will bark angrily: “It’s ‘regression towards the mean,’ you innumerate dolt!”

In defiance of that statistician, I shall show you an actual case of regression to the mean. Our statistician is right, and regression to the mean is an artificial situation, a hypothetical extreme. Sometimes, though, a phenomenon is more easily understood from its extremes, however unrealistic.

First, the mise en scène. You are standing on a podium in the open air. Think of it as a general’s reviewing platform. In front of you are your “troops,” lined up in ranks and files. There are a million of them, precisely a million. They await your word of command. Oh, did I mention that each one is holding a fair coin? — a quarter, say. Here comes your word of command.

“Listen up! I want each and every one of you to toss his coin a hundred times and count the heads. When you have finished, orderlies will come among you to collect your results. Each person will report his name to the orderly, along with a single number — the number of heads you got in your hundred tosses. This will, of course, be some whole number between zero and a hundred. Got it? Right — begin!”

When they are through tossing and the lists — name, number of heads — have come back to you, you rank them, from the fewest heads to the most. What does this ranked list look like?

Well, it will be what mathematicians call a “binomial distribution.” The handy little BINOMDIST function in Microsoft Excel does the work for us. BINOMDIST(40,100,0.5,FALSE)*1000000 tells us the number of people who will “score” exactly 40 heads, for example: 10,843.87 on average — that is, if we were to do this entire thing many many times and average out the results. Similarly, BINOMDIST(29,100,0.5,TRUE)*1000000 tells us that 16.08 people, on average, will get 29 heads or fewer.
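If Excel isn't to hand, the same figures fall out of the binomial formula directly. Here is a check in Python's standard library (`math.comb`); the numbers should match the BINOMDIST values quoted above.

```python
# Reproducing the BINOMDIST figures from the binomial formula:
# P(exactly k heads in 100 fair tosses) = C(100, k) / 2^100.
from math import comb

N, POP = 100, 1_000_000

def pmf(k):
    """Probability of exactly k heads in N fair tosses."""
    return comb(N, k) * 0.5**N

# Expected number of people (out of a million) scoring exactly 40 heads:
print(round(pmf(40) * POP, 2))                         # 10843.87

# Expected number scoring 29 heads or fewer:
print(round(sum(pmf(k) for k in range(30)) * POP, 2))  # 16.08
```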

Sticking close to these “expected” average numbers delivered by my pal BINOMDIST, I'm going to say that my particular trial delivered the following numbers of people with really high scores:

4,473 people scored 62 heads
2,698 people scored 63 heads
1,560 people scored 64 heads
864 people scored 65 heads
458 people scored 66 heads
232 people scored 67 heads
113 people scored 68 heads
52 people scored 69 heads
23 people scored 70 heads
9 people scored 71 heads
4 people scored 72 heads
2 people scored 73 heads
1 person scored 74 heads

These 10,489 people — a tad more than one percent of our total population of “competitors” — are the high-scoring stars! Their average score is 63.22! High fliers!
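Both the head count and the average can be checked straight from the table above:

```python
# Tallying the table of high scorers: how many people, and their
# average score.
counts = {62: 4473, 63: 2698, 64: 1560, 65: 864, 66: 458, 67: 232,
          68: 113, 69: 52, 70: 23, 71: 9, 72: 4, 73: 2, 74: 1}

total = sum(counts.values())
avg = sum(score * n for score, n in counts.items()) / total

print(total)          # 10489
print(round(avg, 2))  # 63.22
```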

OK, we shall now concentrate on this sub-population of 10,489 high scorers, dismissing the other 989,511 mediocrities. You can keep the quarters, guys!

So now I’m up on my reviewing stand with my bull horn, addressing my 10,489 high fliers — a group with an average score better than 63 heads in a hundred coin tosses. My instructions to them are precisely the same as my instructions to the original million: Each of you toss that coin a hundred times, note the number of heads you got, then report that number, with your name, to the orderlies when they come round.

This new drill takes place. We collect the results. We look at them. What do we see? What, for example, will be the average number of heads?

Why, it’ll be 50, of course! Why would it be anything else? These “high scorers” have, in fact, no particular ability (assuming the coins are fair and fairly tossed, which I am assuming). They just got lucky the first time around. By the iron laws of chance, they are no more likely to be lucky the second time around than anyone else.

So the average of this high-flying group went from over 63 on the first drill, to 50 on the second. That’s regression to the mean.
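You don't have to take my word for it; the whole thought experiment simulates in a few lines. Here is a scaled-down run (50,000 tossers rather than a million, to keep it fast), picking roughly the top one percent and retesting them:

```python
# Simulating the two drills: the top scorers' retest average falls
# right back toward 50, as argued above.
import random

random.seed(0)

def heads():
    """Number of heads in 100 fair coin tosses."""
    return bin(random.getrandbits(100)).count("1")

scores = sorted((heads() for _ in range(50_000)), reverse=True)
cutoff = scores[len(scores) // 100]            # roughly the top 1% boundary

high_fliers = [s for s in scores if s > cutoff]
first_avg = sum(high_fliers) / len(high_fliers)
retest_avg = sum(heads() for _ in high_fliers) / len(high_fliers)

print(round(first_avg, 1))   # well above 60
print(round(retest_avg, 1))  # back near 50
```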

This generalizes to any situation where (a) some process with a measurable outcome is being iterated (that is, repeated over again), and (b) the process is to some degree random.

In my little thought experiment here, the process generating the measurable (number of heads) was perfectly random, so we got regression all the way back to the mean. In real-world situations, there is usually some non-randomness mixed in, so you don’t get regression all the way back to the mean, only a part of the way back towards the mean. That is what caused the statistician to bark.
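The mixed case is easy to simulate too. In this sketch each “score” is a stable component plus luck (both standard normal, so the test-retest correlation is one half); the model and the numbers are illustrative assumptions, not data. The top group's retest average lands about halfway back toward the mean — partial regression, not total.

```python
# Partial regression: score = talent + luck. On retest, the selected
# group keeps its talent but redraws its luck, so it regresses only
# part of the way back toward the population mean of 0.
import random

random.seed(1)

people = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50_000)]
top = sorted(people, key=lambda p: p[0] + p[1], reverse=True)[:500]  # top 1%

first_avg = sum(t + l for t, l in top) / len(top)
retest_avg = sum(t + random.gauss(0, 1) for t, _ in top) / len(top)

print(round(first_avg, 2))   # roughly 3.8
print(round(retest_avg, 2))  # roughly half that: partway back, not all the way
```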

In sexual reproduction, for example, a new genome (the baby’s) is produced by mixing half the father’s genome with half the mother’s. Which genes get selected for each of those halves is to some degree random (as of course was the father’s choice of a wife, and the mother’s choice of a husband), so there will be regression towards the mean. That’s why — on average, of course — short people have kids who are also short, but not as short as the parents; tall people have kids who are also tall, but not as tall as the parents; smart people have smart kids, but the kids are not, on average, as smart as their parents, … and so on. The shortness, tallness, smartness, etc. are to some degree chance effects, like getting 70 heads in a hundred coin tosses.

Regression towards the mean. Got it? There'll be a quiz period on Monday.

Note also the following point, which even some scientifically-sophisticated people miss. Regression towards the mean is a perfectly general arithmetical-statistical phenomenon. It is by no means just a phenomenon of genetics. It shows up in other areas — in industrial quality control, for example. It’s math, not biology. Otherwise this section would be called “Biology Corner.” See?

Posted by: JBD at 07:26 PM | Comments (1)

1 Interesting, so now we know why it is so accurate, the emptier the bin gets.

Posted by: Pix at Wednesday, November 19 2008 09:16 PM (2yD0p)

