Monday, September 13, 2010

Caveats with ratios by Sstoz Tes

Here is a follow-up by Mr. Tes to an earlier post, addressing some of the questions that come up when you compare different cross country courses.
By taking overall averages, I papered over all sorts of important
sub-populations whose ratios are substantially different from the
overall picture. The basic assumption for producers and consumers alike
is that the sub-populations all have a close-enough ratio to the overall
population to make it all come out in the wash. Depending on one’s
attachment to accuracy and precision, this may or may not be the case.

The primary problem with calculating for sub-populations is that,
despite a population of over 9 000 data points, sub-categorizations can
easily become too small to be reliable. With that in mind, I have delved
into the results to see what makes it through that wash.

Because of interest in the S.S. state-meet qualifier and because it has
by far the largest population (2 529, compared to the 2nd largest at
1 350), I began by reducing those results in search of more refined
ratios. I came up with some surprising results. Note that I have not
taken the time to do null hypothesis testing for the results below. It
has been too long and I’d have to refresh my memory. If anyone is
interested enough to have me do formal testing, I’ll be happy to do it.

The first delving into sub-populations is the most obvious: separating
the ratios of boys and girls to the state-meet course.

          boys      girls
mu:       1.01863   1.00837
sigma:    0.02307   0.02485

The boys’ ratio sits over 2.2 times further above 1:1 than that of the
girls (0.01863 versus 0.00837). A coach looking to predict the
performance of a boys or girls team would be led astray by using the
overall ratio of 1.01377:1: the predicted times would be too fast for
the boys and too slow for the girls. Despite the relatively large
difference in the ratios, the error would amount to all of 4 seconds for
the typical boy and 6 seconds for the typical girl. Such are the margins
between disappointment and elation, particularly in the
hyper-competitive team battles. It is also interesting that the sigma
shoots up, making the confidence intervals even less useful.
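
To see where those few seconds come from, here is a minimal sketch in Python. It assumes the ratio is state-meet time divided by qualifier time (which is consistent with the tables in this post), and it uses hypothetical qualifier times of 16:30 for a boy and 19:30 for a girl. Those two times and the function name are illustrative assumptions, not values from the data set.

```python
# Ratios quoted above (assumed to be state-meet time / qualifier time).
OVERALL_RATIO = 1.01377
BOYS_RATIO = 1.01863
GIRLS_RATIO = 1.00837


def prediction_error(qualifier_seconds, group_ratio, overall_ratio=OVERALL_RATIO):
    """Seconds by which the overall ratio mispredicts a state-meet time.

    Negative means the prediction is too fast, positive means too slow.
    """
    predicted = qualifier_seconds * overall_ratio
    expected = qualifier_seconds * group_ratio
    return predicted - expected


# Hypothetical qualifier times: 16:30 for a boy, 19:30 for a girl.
boy_error = prediction_error(16 * 60 + 30, BOYS_RATIO)
girl_error = prediction_error(19 * 60 + 30, GIRLS_RATIO)
print(f"boy: {boy_error:+.1f} s, girl: {girl_error:+.1f} s")
```

With these assumed times the overall ratio comes out a few seconds too fast for the boy and a few seconds too slow for the girl, the same order of magnitude as the figures quoted above.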

Another obvious sub-population is class. Note that this is a sub-group
of a sub-group. The smallest sub-grouping (9th grade boys) has a
population of 69.

       boys      girls
9:     1.01931   1.01127
10:    1.01819   1.00814
11:    1.01786   1.00795
12:    1.01935   1.00748

Given the nerves associated with the state-meet, I am surprised that,
with the exception of the 9th grade girls, there is no apparent trend
here. Though it is conventional wisdom that experience helps with
state-meet performance, perhaps this is not a major factor, or if it is,
perhaps the trend is hidden by the relative state-meet experience of the
different classes. Though it is only marginally relevant, I have
included the tables below to show the breakdown of first-time state-meet
participants in any given race at the state-meet:

                               boys     girls
1st state-meet 10th graders:   23.2%    27.3%
1st state-meet 11th graders:   30.1%    22.7%
1st state-meet 12th graders:   32.6%    18.5%

                               boys     girls
1st state-meet:                56.7%    54.8%
2nd state-meet:                24.9%    22.6%
3rd state-meet:                10.5%    12.0%
4th state-meet:                 3.1%     6.0%
unknown:                        4.8%     4.7%

Age or relative mental maturity might be other factors associated with
class and performance, but there are few things in the world less
mentally mature than a 9th grade boy, and they seem to perform at par
with the other grades. Perhaps the 9th graders who make it to the
state-meet are a special breed.

Parsing out the ratios of first time state-meet participants might yield
a worthwhile variable, but for now that is a bridge too far (for me) =)

A sub-group that M’r Beal wondered about was “packs,” particularly in
girls’ races. His qualitative observation is that those in the back of
the pack have a relatively difficult time at M’t SAC, and so their ratio
to the state-meet would be lower (indicating a relatively slow time at
the state-meet qualifier). Perhaps other state-meet qualifiers with
courses less difficult than M’t SAC would not show this difference.

z-score (range)     boys      girls
-1.6 to -1.69       1.02994   1.02158
-1.5 to -1.59       1.02582   1.02024
-1.4 to -1.49       1.02648   1.02264
-1.3 to -1.39       1.02628   1.02092
-1.2 to -1.29       1.02480   1.02105
-1.1 to -1.19       1.02125   1.02137
-1.0 to -1.09       1.03167   1.01484
-0.9 to -0.99       1.02628   1.01877
-0.8 to -0.89       1.02734   1.01399
-0.7 to -0.79       1.02697   1.01426
-0.6 to -0.69       1.02302   1.01082
-0.5 to -0.59       1.02587   1.01577
-0.4 to -0.49       1.02393   1.01284
-0.3 to -0.39       1.02068   1.01066
-0.2 to -0.29       1.02115   1.00870
-0.1 to -0.19       1.02028   1.01058
 0.0 to -0.09       1.01685   1.01282
 0.0 to  0.09       1.01853   1.00716
 0.1 to  0.19       1.01705   1.00423
 0.2 to  0.29       1.01579   1.00938
 0.3 to  0.39       1.01438   1.00464
 0.4 to  0.49       1.01720   1.01355
 0.5 to  0.59       1.01199   0.99675
 0.6 to  0.69       1.01215   1.00136
 0.7 to  0.79       1.01280   0.99927
 0.8 to  0.89       1.01136   1.00085
 0.9 to  0.99       1.01655   0.99942
 1.0 to  1.09       1.01399   0.98704
 1.1 to  1.19       1.01192   1.00213
 1.2 to  1.29       1.01453   0.98729
 1.3 to  1.39       1.02321   0.99459
 1.4 to  1.49       1.00753   1.00358
 1.5 to  1.59       1.00434   1.00899
 1.6 to  1.69       1.00813   1.00359
 1.7 to  1.79       0.99861   1.00034
 1.8 to  1.89       1.00730   1.00897
 1.9 to  1.99       0.99676   0.99489
 2.0 to  2.09       0.99190   0.98883

I tested the packs by segregating the population into boys and girls,
then ordering the results by time at the state-meet qualifier. I then
calculated a z-score for each runner’s state-meet qualifier time (a
z-score simply places a single data-point in its proper place on a
distribution, measured in standard deviations from the mean; about 95%
of normally distributed data falls between -1.96 and +1.96). I bundled
runners into z-score intervals of 0.1, only graphing those groups with
at least 10 athletes. Using the z-score as a basis for segregation has
the advantage of putting like-quality runners into the same group in a
more systematic fashion than percentiles. Rather than a grouping being
exclusively based on numerical order (i.e. the 90th percentile being the
fastest 127 runners, even if there is a substantial drop-off in quality
at, say, the 120th place), the z-score is exclusively based on the
quality of a performance (i.e. the grouping between 1.00 - 1.09 standard
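
The binning procedure described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the workflow actually used for the table: the function name, the (qualifier, state-meet) input pairs, and the choice of the population standard deviation are all assumptions. It also labels each bin by its lower edge, which differs slightly from how the table labels its ranges.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def ratios_by_z_bin(runners, width=0.1, min_size=10):
    """Average state-meet/qualifier ratio per z-score bin of qualifier time.

    `runners` is a list of (qualifier_seconds, state_seconds) pairs for
    one sex. Bins with fewer than `min_size` runners are dropped.
    """
    qualifier_times = [qualifier for qualifier, _ in runners]
    mu = mean(qualifier_times)
    sigma = pstdev(qualifier_times)  # population sigma (an assumption)

    bins = defaultdict(list)
    for qualifier, state in runners:
        z = (qualifier - mu) / sigma
        # floor() assigns z = 0.13 to the bin starting at 0.1
        # and z = -0.13 to the bin starting at -0.2.
        bins[math.floor(z / width)].append(state / qualifier)

    # Key each surviving bin by its lower edge, fastest runners first.
    return {
        round(index * width, 1): mean(ratios)
        for index, ratios in sorted(bins.items())
        if len(ratios) >= min_size
    }
```

Calling `ratios_by_z_bin(boys)` and `ratios_by_z_bin(girls)` on separate lists would give one column each. Because faster runners have lower times, negative z-scores correspond to the front of the pack.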

For the girls, there is a steady downward trend in the ratio all the way
from the fastest group (at z = -1.69) to the z = +1.3 group, a span
which in this distribution encompasses over 91% of the total population
of girls. This is to say that the state-meet qualifier is relatively
slower (or the state-meet relatively faster) the slower a runner you
are, as M’r Beal suggested. A more detailed look shows that the fastest
girls’ ratios are on par with the overall boys’ ratio, a bit over
1.02:1. By z = 0.0, the ratio drops below 1.02:1, then steadily erodes
to less than 0.99:1 through z = +1.29. From z = +1.3 to +1.69 (n = 45),
though, the trend reverses, averaging 1.0074:1 (this is still below the
overall girls’ average of 1.00837:1). From z = +1.7 through the end of
the distribution (n = 66, and which extends all the way out to +5.86!),
the downward trend re-asserts itself, averaging 0.98:1. Besides the
small group from z = +1.3 to +1.69, M’r Beal’s observation appears
correct.

For the boys, the overall trend goes from 1.03:1 down to 0.99:1. There
is one anomaly that is difficult to account for, though this is probably
influenced by the relatively small numbers involved. From z = -1.69 to
-1.0 (n = 158), the ratio trends steadily down from 1.030:1 to 1.021:1,
but then jumps to 1.032:1. After this, the ratio again steadily drops,
eventually to 1.01:1 through z = +0.89 (n = 912). It then flattens or
slightly rises through z = +1.3 (n = 88), then drops again, ending at
just above 0.99:1 (n = 108). Again, besides a small group, M’r Beal’s
observation that slower runners run relatively slower at the S.S.
state-meet qualifier (or, possibly, faster at the C.I.F. state-meet)
appears to be true.

A last potential variable that I tested for was divisional ratios. I
assumed going in that the smaller schools would have lower ratios, if
only because they have slower runners and, as seen above, slower runners
run relatively slower at M’t SAC. Though for both boys and girls there
is a downward trend from d. 1 to d. 5, it is not as smooth or prominent
as I expected, particularly for the boys. The most prominent example of
a difference in ratio is that of the d. 5 girls, whose ratio is 1.002:1
compared to 1.011:1 for d. 1 & d. 2 girls. This sudden drop is not
surprising since the d. 5 girls run 13% - 14% slower than d. 1. It is
surprising that the d. 5 boys, a group that runs 9% - 10% slower than
the d. 1 boys, do not show more of a drop-off.

           mu (ratio)   mu CIF     mu SS
d. 1 b.:   1.02050      00:16:06   00:15:47
d. 2 b.:   1.01801      00:16:31   00:16:14
d. 3 b.:   1.02336      00:16:39   00:16:16
d. 4 b.:   1.01518      00:17:14   00:16:59
d. 5 b.:   1.01634      00:17:37   00:17:21

           mu (ratio)   mu CIF     mu SS
d. 1 g.:   1.01073      00:19:06   00:18:55
d. 2 g.:   1.01068      00:19:06   00:18:55
d. 3 g.:   1.00821      00:19:45   00:19:36
d. 4 g.:   1.00926      00:20:26   00:20:15
d. 5 g.:   1.00219      00:21:35   00:21:33
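
The "percent slower" figures quoted for d. 5 against d. 1 can be checked from the mean times in these tables. The sketch below is an illustration in Python only; the helper names are hypothetical, and the mean times are read as hh:mm:ss.

```python
def to_seconds(clock):
    """Convert an 'hh:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Mean times copied from the tables above: (mu CIF, mu SS).
MEAN_TIMES = {
    "d. 1 b.": ("00:16:06", "00:15:47"),
    "d. 5 b.": ("00:17:37", "00:17:21"),
    "d. 1 g.": ("00:19:06", "00:18:55"),
    "d. 5 g.": ("00:21:35", "00:21:33"),
}


def percent_slower(slow, fast):
    """How much slower `slow` is than `fast`, in percent."""
    return (to_seconds(slow) / to_seconds(fast) - 1) * 100


for sex in ("b.", "g."):
    d1_cif, d1_ss = MEAN_TIMES[f"d. 1 {sex}"]
    d5_cif, d5_ss = MEAN_TIMES[f"d. 5 {sex}"]
    print(
        sex,
        f"CIF: {percent_slower(d5_cif, d1_cif):.1f}%",
        f"SS: {percent_slower(d5_ss, d1_ss):.1f}%",
    )
```

Note that dividing one mean time by another will not exactly reproduce the mu ratio column, since an average of per-runner ratios is not the same as a ratio of averages.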

The only variables that clearly seem to affect the ratio between the SS
state-meet qualifier and the state-meet are sex and a runner’s time at
the state-meet qualifier. There may be a weak relationship by division,
and there is no apparent relationship based on class.
It would be interesting to test whether relative experience affects


Anonymous said...

Do ya think you have now totally over-analyzed the stats instead of just waiting to see what the kids run this year? Let's wait and see and enjoy the sport.

Albert Caruana said...

Those stats are based on past results and you would be surprised how accurate they are for the general running population.

But the great part of our sport (or any other sport for that matter) is that there are other factors that come into play.
