All Things in Moderation…The Problem With Gymnastics Judging
All things in moderation…
Eating, drinking, exercising, talking, working, playing, and…judging gymnastics?
Judging in gymnastics has been a problem for as long as the sport has existed. Despite efforts to make judging more objective through more concrete point values for skills and fewer subjective categories, judging controversies are as alive today as they ever were. Though we tend to blame judging issues on changes in the code of points, the truth is that they have always been there, and there are other factors at play besides the rules themselves. After studying the sport closely for 20 years now as a gymnast, coach, fan, and judge, I’ve reached the conclusion that some psychological phenomena cause the frequent absurdities in judging that drive us all crazy. In this article I’ll discuss one phenomenon that I think plays a fascinating role not only in judging, but in life itself: “regression toward the mean.”
Regression toward the mean is a statistical phenomenon first described by Francis Galton in the 1880s. Here are a couple of definitions:
Regression toward the mean is the tendency for subsequent observations of a random variable to be closer to its mean (its “average”).
Regression toward the mean refers to the fact that those with extreme scores on any measure at one point in time will, for purely statistical reasons, probably have less extreme scores the next time they are tested.
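For readers who like to see the numbers, here is a minimal sketch of the second definition in Python. It assumes a toy model that is mine, not a standard from any judging manual: every score is a fixed "true ability" plus random luck on the day, and the function name, the 8.5 average, and the 0.5 spreads are all made up for illustration.

```python
import random
import statistics


def simulate_two_meets(n_gymnasts=10_000, seed=0):
    """Toy model (an assumption): score = true ability + luck on the day.

    Returns three averages:
      - the first-meet average of the top 10% at the first meet,
      - the second-meet average of that SAME group,
      - the second-meet average of everyone.
    """
    rng = random.Random(seed)

    # Hypothetical numbers: abilities centered on 8.5, luck spread of 0.5.
    ability = [rng.gauss(8.5, 0.5) for _ in range(n_gymnasts)]
    first = [a + rng.gauss(0, 0.5) for a in ability]
    second = [a + rng.gauss(0, 0.5) for a in ability]

    # Pick the gymnasts with the most extreme (highest) first-meet scores.
    cutoff = sorted(first)[int(0.9 * n_gymnasts)]
    top = [i for i in range(n_gymnasts) if first[i] >= cutoff]

    top_first = statistics.mean([first[i] for i in top])
    top_second = statistics.mean([second[i] for i in top])
    everyone_second = statistics.mean(second)
    return top_first, top_second, everyone_second


if __name__ == "__main__":
    top_first, top_second, everyone = simulate_two_meets()
    print(f"Top group, first meet:  {top_first:.2f}")
    print(f"Top group, second meet: {top_second:.2f}")
    print(f"Everyone, second meet:  {everyone:.2f}")
```

In this toy model the top group still beats the field at the second meet, because their ability is real, but their average slides back toward the overall average, because part of their first score was luck that does not repeat.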
Though it is most commonly referred to in mathematical and statistical settings, the truth is that regression toward the mean is everywhere. Regression toward the mean is the reason why a basketball player who scores a stunning 50 points in one game probably isn’t going to repeat that feat in the next game, and it’s why a golfer rarely makes an eagle twice in a row. It’s why a hot streak at a Las Vegas casino rarely lasts: a lucky run does nothing to change the odds, so the longer you keep playing, the more your results drift back toward the average, and the average favors the house. It’s the reason why a kid who gets his first 100 on a difficult test is probably going to score a little lower on the next one. It’s why the day after the “best day of your life” might bring a cloud or two, and why the day after the “worst day of your life” usually brings a welcome ray of sunshine. It’s why after doing the best routine of his/her career, a gymnast’s next event is probably a little less spectacular, and after having the best competition of a gymnast’s life, he/she can probably expect a little dose of reality at the next outing. And it’s also why a gymnast should actually feel very encouraged after having the worst meet ever, because things are likely to be dramatically better the next time around. And, believe it or not, I think it’s why gymnastics execution scores tend to be too close together, despite wide ranges of performances.
Many experts and fans have noticed for years that most gymnastics judges have an innate tendency to AVOID GIVING EXTREME SCORES, regardless of the performance. And lest you think this mindset only exists to avoid being “thrown out” of the final average (in national and international competition, for example), I can assure you that I see it just as much in small junior competitions, where there are often only one or two judges, and all scores count.
I first became interested in the regression toward the mean phenomenon in judging when I began judging junior boys’ gymnastics almost eight years ago. As I judged alongside many other judges around the country, I began to notice a trend that I just couldn’t quite understand. I found that my scores on the bad routines were often a little lower than the other judges’ scores, and my scores on the great routines were often a little higher. In general, I always felt that routines were not being separated NEARLY enough, and that we often trapped ourselves into giving almost identical scores to routines that were clearly very different in quality. Kids with loose form and poor body positions throughout an entire routine were given 8.4’s (under the old rules), while kids with a couple of noticeable mistakes but beautiful form and body line otherwise were given 8.6’s. Even WORSE, kids who looked like 9.8’s – in a completely different league than the previous ones – were nitpicked down to maybe a low 9, receiving petty deductions that weren’t even CONSIDERED during the more mediocre kids’ routines. Typically, most of the scores landed in the 8-range, and routines that were probably 2-3 points better than the others scored mere tenths – or at most one point – higher.
I often felt that many of the judges only cared about trying to rank the kids correctly, with little or no regard for the actual separation between the scores. Scores of 8.9, 8.8, and 8.7 were considered fine as long as they were in the correct order, even if the more appropriate scores would have been 9.5, 8.6, and 7.9! Even in the unlikely event that the rankings on each individual event turned out correct under this approach, it doesn’t take a mathematical genius to realize that all-around results were going to be seriously skewed because of these inappropriate separations of scores.
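To make the all-around point concrete, here is a small sketch in Python. The first-event scores are the ones quoted above (8.9, 8.8, 8.7 versus 9.5, 8.6, 7.9). Everything else is hypothetical and invented purely for illustration: the gymnasts "A", "B", and "C", the second-event scores, and the function names.

```python
def event_ranking(scores):
    """Order the gymnasts on one event, best score first."""
    return sorted(scores, key=scores.get, reverse=True)


def all_around_ranking(events):
    """Sum each gymnast's scores across events and rank by the total."""
    totals = {}
    for scores in events:
        for gymnast, score in scores.items():
            totals[gymnast] = totals.get(gymnast, 0.0) + score
    # Round to tenths so floating-point noise cannot affect the order.
    totals = {g: round(t, 1) for g, t in totals.items()}
    ranking = sorted(totals, key=totals.get, reverse=True)
    return ranking, totals


# First event, scored two ways (numbers from the article).
EVENT1_SEPARATED = {"A": 9.5, "B": 8.6, "C": 7.9}
EVENT1_COMPRESSED = {"A": 8.9, "B": 8.8, "C": 8.7}

# Second event: hypothetical scores, identical in both scenarios.
EVENT2 = {"A": 8.9, "B": 9.2, "C": 8.7}

if __name__ == "__main__":
    for label, event1 in [("separated", EVENT1_SEPARATED),
                          ("compressed", EVENT1_COMPRESSED)]:
        ranking, totals = all_around_ranking([event1, EVENT2])
        print(label, "event 1 order:", event_ranking(event1))
        print(label, "all-around:", ranking, totals)
```

The order on the first event is A, B, C under both sets of scores, so a judge who only checks the ranking would call both acceptable. With these made-up second-event scores, though, the well-separated scores keep A on top of the all-around, while the compressed scores hand the all-around to B.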
So what was going on here? It’s regression toward the mean at work. The phenomenon is so pervasive in our lives and in the world around us that it subconsciously affects many of our actions…such as judging gymnastics routines. As it manifests in gymnastics judging, I also like to call this mystery “The Moving Target Phenomenon.” So what’s the target, and why is it moving?
To be continued…
This is a very interesting post. My son’s first year of team was last year, and he was level 4. Several times his scores were questionable, compared to other boys’, and given their actual routines. I’m sure this happens all the time. I have to say that I think most of the scoring was pretty fair. But sometimes I think you are right about judges’ reluctance to give high or low scores. It was surprising sometimes how little variation there was among the actual scores themselves. A judge might tend to be a high or a low scorer, but the spread was rarely great within the scores. I had never thought of this reason before. Thanks for the insight.