Tuesday, September 07, 2010

XC Course comparisons by Sstoz Tes

As compiled by Sstoz Tes. I won't even pretend to understand some of the math in the following, but for those who love stats and comparisons, this data is for you. An amazing amount of time and effort went into it, as you will see. A public thank you to Mr. Tes for his work.

You can also check out his state meet data from 1987 to 2008 at the following link:
================================================
Here, finally, are correspondences between the state-meet qualifying courses and the state-meet course. My method was simple, if painstaking: for each state-meet qualifier, I compared his/her state-meet time to his/her state-meet qualifying time. This gave a population of over 9 000, ranging from 87 for the O.C.S. to 2 529 for the S.S. I separated the data only by section, not by sex, class, or other potentially important variables. Perhaps I will take on that task in the near future.
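The basic per-athlete calculation can be sketched in a few lines. This is a minimal illustration, not the author's actual script: the runner pairs are hypothetical, times are in seconds, and the ratio is taken as state-meet time divided by qualifying time (the direction used in the worked examples in the comments). The population standard deviation is used because the post treats each section's qualifiers as a population.

```python
from statistics import mean, pstdev

# Hypothetical (qualifying_time_s, state_time_s) pairs for one section;
# real values would come from the qualifier and state-meet results.
pairs = [
    (1187, 1190),  # 19:47 at the qualifier, 19:50 at state
    (1100, 1125),  # 18:20 -> 18:45
    (1250, 1238),  # 20:50 -> 20:38
    (1020, 1031),  # 17:00 -> 17:11
]

def ratios(pairs):
    """Per-athlete ratio: state-meet time / qualifying time."""
    return [state / qual for qual, state in pairs]

r = ratios(pairs)
mu = mean(r)       # section average (before any trimming of extremes)
sigma = pstdev(r)  # population standard deviation of the ratios
print(f"mu={mu:.5f} sigma={sigma:.5f} n={len(r)}")
```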

There are several notes to go along with the results:

-These course comparisons are valid only for comparing the state qualifying meet to the state meet. If, for example, an athlete wished to know what his/her time at the M't S.A.C. Invitational corresponded to at the Clovis Invitational, calculations based on my work would be flawed. The time frame is important: weather changes, kids get in better shape or get injured or get sick.

-The amount of variation, as measured by the standard deviation, is huge for all of these courses. In other words, an exact ratio between courses tells only part of the story. This is why I have included confidence intervals (c.i.), which give a range of expected outcomes. I used a two-sided 95% c.i. (alpha/2 in each tail; i.e. in 100 samples, one would expect an individual's time to fall within the calculated range 95 times). Because I used such a rigorous c.i., the ranges aren't terribly useful. For example, a time of 19:47 at the West Valley (N.S.) course would typically result in a time of 19:50 at the state meet. In full disclosure, though, one would have to tell that runner that 95 out of 100 times s/he would run between 18:54 and 20:46, a range so wide as to be useless. A less rigorous c.i. wouldn't help much, either, since it would be less reliable. The main point is that anyone using these conversions should know that there is a lot of variability.
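The West Valley example can be reproduced by multiplying the qualifying time by the mu and c.i. bounds from the N.S. row of the table below, rounding to the nearest second. A small sketch; the helper names `parse` and `fmt` are just for illustration.

```python
def parse(t):
    """'mm:ss' -> seconds."""
    m, s = t.split(":")
    return int(m) * 60 + int(s)

def fmt(seconds):
    """Seconds -> 'mm:ss', rounded to the nearest second."""
    s = round(seconds)
    return f"{s // 60}:{s % 60:02d}"

# N.S. West Valley High School figures from the table in this post.
MU, CI_LOW, CI_HIGH = 1.00255, 0.95549, 1.04960

qual = parse("19:47")
print("typical:", fmt(qual * MU))
print("95% range:", fmt(qual * CI_LOW), "to", fmt(qual * CI_HIGH))
```

The printed values match the 19:50 and 18:54 to 20:46 quoted above.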

-As I mentioned above, I did not test for variables that could plausibly affect the ratios, such as class, sex, division, or the time of day of the race. Nor did I check weather data for race days. Certain courses, such as Hayward, are particularly affected by moisture, even dew. The climate in the Walnut area tends to be good for running in November, but has gone over 305 K on at least one occasion, and temperatures over 295 K begin to affect running performance.

I went back only to 2005 because a confounding variable, a different qualifying procedure, lurked in earlier years. I could have gone back to 2004, but that year had another confounding variable: rain.

Each course and section has its own characteristics:

C.C.S. Crystal Springs: This course has an extensively recorded history. The ratio I came up with may differ from other published ratios, though, because mine is based exclusively on the state-meet qualifier and the state meet. One anomaly for which I have no explanation is the discrepancy between 2005 & 2007, both of which had correspondences of 3,1%, and 2009, which was 2,6%. Perhaps Crystal Springs was wet in 2009?

C.C.S. Toro Park: Because the C.C.S. is the only section to actually follow through on its commitment to host the state-meet qualifier on a rotating schedule, I have only two years of data on this course. Most likely as a result of extreme heat in 2008 (306 K by one report), there is little correspondence between the 2006 ratio, in which the course ran 1,3% faster than the state meet, and the 2008 ratio, in which it ran 2,2% slower. The 2006 ratio is probably what one would typically expect, but until more data becomes available there is no way to know whether it, too, was somehow an anomaly.

C.S. Woodward Park: This is one of the most interesting correspondences. Since the C.S. state-meet qualifier is held at Woodward Park, one would expect a 1:1 ratio or, if not that, slightly faster times at the state meet, since the qualifier is held 16 days earlier and athletes have that much more time to get in shape. Instead, athletes typically run 1,0% slower at the state meet. This suggests a "state-meet effect," in which some variable associated with the state meet or the state-meet qualifier influences the outcome. This could be nerves, an excessive interval between races, or the higher level of competition. It would be interesting to measure whether this effect is unique to the C.S. or is observable in other sections, too. An unrelated note: at least once in the past 10 years the C.S. held its state-meet qualifier at a different course. All results that I used listed Woodward Park as the course, but it is possible that one year contains data from another course.

L.A.C.S. Pierce College: This course showed a disturbing amount of variability. In 2005 & 2007 the course ran over 5% faster than the state meet, in 2008 & 2009 it ran 3,9% faster, and in 2006 3,1% faster. I do not know how well established the course's route is. I also have no data on whether the other variables listed above, particularly wind, temperature and moisture, influenced any of the state-meet qualifiers at Pierce College.

N.C.S. Hayward High School: Though this course typically runs a bit over 3% faster than the state meet, in 1999 it ran close to a 1:1 ratio with the state meet. It did not rain that year, but there was mist in the morning that left a heavy dew on the grass. Because roughly 2km of the 4,8km course is on thick grass, the amount of dew becomes a factor in the ratio. Most years, early-morning races have at least some dew on the grass, while later races rarely do. There can be soft spots on the course, too, but these are probably not a major factor. The variability from year to year (2,2% to 3,5%) is surprising. A more detailed study accounting for the weather on race day would be a starting point for explaining this variability.

N.S. West Valley High School: This course has run admirably tight (as such things go), ranging from 0,8% slower to 1,2% faster than the state meet, with an overall median of 0,3% faster.

O.C.S. Joaquin Miller Park: This course runs a full 12% slower than the state-meet course. It is the only course for which I had to use a sample (because I could not locate full results for 2009), and the effect of this is compounded by the small number of runners (87). Despite this, the ratios for 2007-2009 are very consistent.

S.D.S. Balboa Park: The S.D.S. is the only section in which girls and boys run different distances. The boys run a tough 5,0km course that is 3,3% slower than the state meet; the girls run 4,4km, which runs 11,2% faster than Woodward Park. Though this seems inequitable, in another sense it has a sound logic. When boys and girls run the same course, the girls typically end up on the course 16% longer. If the girls' course were about 14% shorter (1/1,16 ≈ 0,86), the girls would end up running for the same amount of time as the boys. The S.D.S. girls' course is only about 12% shorter than the boys', and so the typical girl ends up on the course roughly 2-3% longer than the boys.
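A quick check of the course-length arithmetic, taking the 16% figure and the 4,43km girls' distance from the post. The boys' distance is listed only as "5,0xkm", so 5,0km is an approximation here. Note that the cut that equalizes time on the course is 1 − 1/1,16, not 16%, because the shortening applies to the girls' longer time.

```python
GIRLS_TIME_FACTOR = 1.16       # girls ~16% longer on the same course (from the post)
BOYS_KM, GIRLS_KM = 5.0, 4.43  # S.D.S. Balboa Park; boys' distance approximate

# Course shortening that would make the girls' time equal the boys': 1 - 1/1.16
equalizing_cut = 1 - 1 / GIRLS_TIME_FACTOR
# How much shorter the S.D.S. girls' course actually is
actual_cut = 1 - GIRLS_KM / BOYS_KM
# Typical girl's time on the course relative to the boys'
relative_time = GIRLS_TIME_FACTOR * GIRLS_KM / BOYS_KM

print(f"equalizing cut: {equalizing_cut:.1%}")
print(f"actual cut: {actual_cut:.1%}")
print(f"girls' time vs boys': {relative_time - 1:+.1%}")
```

With these inputs the equalizing cut is about 13.8%, the actual cut about 11.4%, and the girls spend roughly 2.8% longer on the course.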

S.F.C.S. Golden Gate Park: Cross-country courses in Golden Gate Park are notorious for small, seemingly insignificant changes from year to year. The course may go behind a hedgerow one year and in front of it the next; there may be a sandy hill with a log-jump one year and a side trail with a gentle hill the next. I don't know how carefully this course is set up from year to year, but there is enough variability (1,3% to 4,3%) in the results to make one suspicious.

S.J.S. Willow Hills: Because of several indications of course changes over the years, I have little faith in the consistency of this course. Despite this, the results are consistent for 2006, 2008 & 2009.

S.S. Mount San Antonio College: The ratio for this course can be matched in consistency only by that of the C.S., which has the advantage of being run on the same course as the state meet. This is no doubt due in part to the size of the population (the S.S. has had 2 529 qualifiers since 2005): the more runners that go into each year's average, the less a few unusual races can move it. The consistency of this ratio is surprising given the extreme heat that can afflict the area in November.
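The point about large populations can be illustrated with the standard error of a mean, sigma/sqrt(n). This sketch plugs in the sigma and n values from the table for the S.S. and the O.C.S. It assumes each ratio is an independent draw and uses the totals since 2005 rather than per-year counts, so the numbers only show the relative difference between sections, not the actual year-to-year spread.

```python
from math import sqrt

# sigma and n for two sections, taken from the table in this post
SECTIONS = {
    "S.S. Mount San Antonio College": (0.01792, 2529),
    "O.C.S. Joaquin Miller Park": (0.02160, 87),
}

def standard_error(sigma, n):
    """Standard error of a mean of n independent values: sigma / sqrt(n)."""
    return sigma / sqrt(n)

for name, (sigma, n) in SECTIONS.items():
    print(f"{name}: n={n}, standard error={standard_error(sigma, n):.5f}")
```

With these inputs the S.S. standard error is several times smaller than the O.C.S. one, mostly because of the difference in n.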

Other notes:

-The state-meet course seems to have run relatively fast across sections in 2007. With the exception of the O.C.S. and the S.F.C.S., 2007 was either the fastest or the 2nd-fastest year for every section. 2009 holds the same distinction for 7 of the 10 sections.

-The state-meet course can, of course, be a variable, too. The winds kicked up late in the morning in 2006, the trees at the curve at 500m were trimmed at some point, and the natural curb at the sharp corner at 1,2km has steadily eroded over the years. Despite this, there is no consistent pattern of changes in ratios across sections; such a pattern would indicate a change in the state-meet course.

Notes

How to interpret the figures below:

mu = average ratio of state-meet time to state-meet-qualifier time. A ratio >1 means an athlete can expect to run a slower time at the state meet; <1 means an athlete can expect to run a faster time at the state meet. Because running statistics are so skewed toward the right end of the curve, I decided to trim the extremes in two phases, more or less to move the mu closer to the median. I can provide more detail if anyone wishes to know.
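The post does not spell out the two-phase trimming method, so the following is only a hypothetical sketch of one common approach: drop values more than k standard deviations from the mean, recompute, and repeat once. The function name `trim_two_passes`, the cutoff k = 3, and the sample ratios are all assumptions for illustration, not the author's procedure or data.

```python
from statistics import mean, median, pstdev

def trim_two_passes(values, k=3.0):
    """Hypothetical two-pass trim: each pass drops values more than
    k population standard deviations from the current mean."""
    kept = list(values)
    for _ in range(2):
        m, s = mean(kept), pstdev(kept)
        kept = [v for v in kept if abs(v - m) <= k * s]
    return kept

# Hypothetical right-skewed ratios: a tight cluster plus one very slow state race.
ratios = [0.98, 0.99, 1.00, 1.01, 1.02] * 4 + [1.30]
kept = trim_two_passes(ratios)
print(f"before: mean={mean(ratios):.4f} median={median(ratios):.4f}")
print(f"after:  mean={mean(kept):.4f} median={median(kept):.4f} (n={len(kept)})")
```

Removing the single slow outlier pulls the mean back toward the median, which is the effect the post describes.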

sigma = standard deviation; I also took out extremes here, using the same method as with the mu.

c.i. = confidence interval; I set this at 95%, alpha/2 (two-sided). This corresponds to a z-score of +/-1,96, so each range below is mu +/- 1,96 sigma.

n = number of runners; these are all populations except for the O.C.S. (I could not find complete results for the 2009 state-meet qualifier). All of the populations are plenty large for statistical purposes, except those of the O.C.S. and S.F.C.S. Because these are populations, the number isn't so important, but it is still relevant because the ratios will be applied to samples.

C.C.S. Crystal Springs (4,73km)
mu: 1.02913
sigma: 0.02027
c.i.: 0.98940, 1.06885
n: 703

C.C.S. Toro Park (4,83km)
mu: 0.99499
sigma: 0.02675
c.i.: 0.94256, 1.04744
n: 505

C.S. Woodward Park (5,00km, though I keep measuring 4,97km!)
mu: 1.00981
sigma: 0.02446
c.i.: 0.96187, 1.05776
n: 925

L.A.C.S. Pierce College (4,83km)
mu: 1.04284
sigma: 0.02306
c.i.: 0.99764, 1.08804
n: 273

N.C.S. Hayward High School (4,81km)
mu: 1.03010
sigma: 0.01789
c.i.: 0.99504, 1.06517
n: 1 350

N.S. West Valley High School (4,83km)
mu: 1.00255
sigma: 0.02401
c.i.: 0.95549, 1.04960
n: 530

O.C.S. Joaquin Miller Park (5,0xkm)
mu: 0.87991
sigma: 0.02160
c.i.: 0.83757, 0.92224
n: 87

S.D.S. boys Balboa Park (5,0xkm)
mu: 0.96786
sigma: 0.01909
c.i.: 0.93045, 1.00528
n: 544

S.D.S. girls Balboa Park (4,43km)
mu: 1.11205
sigma: 0.02504
c.i.: 1.06297, 1.16114
n: 534

S.F.C.S. Golden Gate Park (5,0xkm)
mu: 0.96790
sigma: 0.02259
c.i.: 0.92362, 1.01218
n: 106

S.J.S. Willow Hills (5,0xkm)
mu: 0.95774
sigma: 0.02174
c.i.: 0.91512, 1.00036
n: 998

S.S. Mount San Antonio College (4,67km)
mu: 1.01377
sigma: 0.01792
c.i.: 0.97864, 1.04890
n: 2 529
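The c.i. rows above can be checked against the rule mu +/- z*sigma, with z taken from the standard normal distribution (about 1,96 for a two-sided 95% interval). A short sketch with the table copied in by hand; Woodward Park's bounds are entered low-to-high. It reports the largest gap between a published bound and the recomputed one.

```python
from statistics import NormalDist

# (mu, sigma, published c.i. low, published c.i. high) from the table above
TABLE = {
    "C.C.S. Crystal Springs": (1.02913, 0.02027, 0.98940, 1.06885),
    "C.C.S. Toro Park": (0.99499, 0.02675, 0.94256, 1.04744),
    "C.S. Woodward Park": (1.00981, 0.02446, 0.96187, 1.05776),
    "L.A.C.S. Pierce College": (1.04284, 0.02306, 0.99764, 1.08804),
    "N.C.S. Hayward High School": (1.03010, 0.01789, 0.99504, 1.06517),
    "N.S. West Valley High School": (1.00255, 0.02401, 0.95549, 1.04960),
    "O.C.S. Joaquin Miller Park": (0.87991, 0.02160, 0.83757, 0.92224),
    "S.D.S. boys Balboa Park": (0.96786, 0.01909, 0.93045, 1.00528),
    "S.D.S. girls Balboa Park": (1.11205, 0.02504, 1.06297, 1.16114),
    "S.F.C.S. Golden Gate Park": (0.96790, 0.02259, 0.92362, 1.01218),
    "S.J.S. Willow Hills": (0.95774, 0.02174, 0.91512, 1.00036),
    "S.S. Mount San Antonio College": (1.01377, 0.01792, 0.97864, 1.04890),
}

# Two-sided 95%: alpha/2 = 0.025 in each tail, so z is the 97.5th percentile.
Z = NormalDist().inv_cdf(0.975)

def worst_mismatch(table, z=Z):
    """Largest gap between a published c.i. bound and mu +/- z*sigma."""
    worst = 0.0
    for mu, sigma, low, high in table.values():
        worst = max(worst, abs(mu - z * sigma - low), abs(mu + z * sigma - high))
    return worst

print(f"z = {Z:.4f}, worst mismatch = {worst_mismatch(TABLE):.6f}")
```

On these figures every published bound agrees with mu +/- 1,96 sigma to within a few hundred-thousandths, i.e. rounding.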

Comments? Thoughts? Feel free to chime in in the comment section below.

17 comments:

URFasterThanMe said...

Great stuff! Thanks very much.

One clarification though please:

"mu = average; A ratio >1 means an athlete can expect to run a faster time at the state meet, <1 means an athlete can expect to run a slower time at the state meet"

Unless I am reading this wrong (quite possible!) there is a typo and the above doesn't match what I see in the course stats. For instance, SJS and OCS have courses harder than Woodward yet have values <1.

Sstoz Tes said...

URFasterThanMe:

Because ratios are by definition two sides of the same coin, they're always confusing, and explaining them usually adds to the embroilment! Two examples might help:

1. Flo runs 19:29 at the S.J.S. state-meet qualifier. This translates to 18:40 at the state-meet (19:29 * 0.95774 = 18:40)

2. Jo runs 19:29 at the state-meet. This translates to 20:21 at the S.J.S. state-meet qualifier (19:29 / 0.95774 = 20:21).
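The two directions can be written as a pair of tiny helpers: multiply by the ratio to go from the qualifier to the state meet, divide to go back. A sketch using the S.J.S. mu from the table; the function names are illustrative, not from the post.

```python
def to_seconds(t):
    """'mm:ss' -> seconds."""
    m, s = t.split(":")
    return int(m) * 60 + int(s)

def to_clock(seconds):
    """Seconds -> 'mm:ss', rounded to the nearest second."""
    s = round(seconds)
    return f"{s // 60}:{s % 60:02d}"

SJS_RATIO = 0.95774  # mu for S.J.S. Willow Hills (state-meet time / qualifier time)

def qualifier_to_state(t, ratio):
    """Multiply: qualifier time -> expected state-meet time."""
    return to_clock(to_seconds(t) * ratio)

def state_to_qualifier(t, ratio):
    """Divide: state-meet time -> equivalent qualifier time."""
    return to_clock(to_seconds(t) / ratio)

print("Flo:", qualifier_to_state("19:29", SJS_RATIO))
print("Jo:", state_to_qualifier("19:29", SJS_RATIO))
```

This reproduces the 18:40 and 20:21 in the two examples.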

URFasterThanMe said...

Ah, then the "<1" and ">1" need to be reversed in the explanation text since SJS ratio is <1 and yet an athlete should expect to run faster at state meet.

Unknown said...

Wow, that Oakland course must be completely uphill! The ratios indicate 18:00 at state ~ 20:28 at CIF-OS right? Maybe a couple more years of data will change it (or maybe not).

One thing I notice when I look at stuff like this is that the second race is always just a little slower than the first (as you find for CA State and CIF-CS, despite both being at Woodward). I think there might be two good reasons for that, if not more.

Firstly, there is the issue of the field they are racing in. Better fields might push some runners to faster times, but at the same time kids that were towards the front in their sectional races are probably towards the middle of the pack in the State race. Those kids are kind of getting "pushed back", or might find it harder to move up in the pack when they feel ready to -- it's easier to make a move when you're in a group of 5-10 runners than it is to make a move in a pack of 40-50. And then, of course, there is the psychological effect that might happen when kids are used to being towards the front of every race, but now they find themselves in the middle or even towards the back of the field.

The second good reason would be that all the kids that qualified from their section race ran good races, the exceptions being those who only ran hard enough to qualify (though those would be in the minority in any competitive section). At state, those same kids that ran hard at sections might not run as well at state for various reasons (maybe it just wasn't their day, maybe they aren't as good on that course, maybe they got boxed in at some point, maybe there was some psychological reason, etc.) ....

Those reasons might help explain the unexpected differences, such as you see from the Central Section runners. Just a couple of things to consider.

Anyways, thanks for doing this, it is definitely great info!

Anonymous said...

Awesome work is all I can say.

Anonymous said...

Nice work. Please note that the Hayward High School course was NOT the same in 1999. There was a slight change due to some mounds of dirt on the grass loop above the track. Normally this loop is run once, but in 1999, this loop was run twice, I presume because it was smaller.

pmccrystle said...

This is awesome... it's like the cast of Big Bang Theory just got into cross country!

Greg Beal said...

Having looked at loads of SS Finals to State data, I've seen conversions that seem counterintuitive. Generally speaking, faster girls have a higher conversion than slower girls. Since you have so much more data than I've ever looked at, test it out. Check girls at 18:00 or faster, 18:01-19:00, 19:01-20:00, and slower than 20:00 at SS Finals versus their state times. Pretty sure the difference will decrease as Mt. Sac times increase.

In terms of raw time differences, girls under 18:00 at Mt. Sac tend to average about 20-30 seconds slower at State, while slower girls tend to average 10-20 seconds slower.

Be interesting to see what your data shows.

Anonymous said...

Huh?

Greg Beal said...

Sorry if I wasn't clear.

In 2009, girls averaging about 17:48 at SS Finals at Mt. Sac tended to run about 24 seconds slower at State.

Girls averaging 20:00 at SS Finals tended to run about 7 seconds slower at State.

I think if Sstoz looks at segments of his data he will find similar results.
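Greg Beal's 2009 figures can be turned into ratios in the post's terms (state-meet time / qualifier time) with a couple of lines. The two times are his approximate group averages, so this is only a rough illustration.

```python
def implied_ratio(qualifier_s, seconds_slower):
    """State-meet time / qualifier time, given the average slowdown in seconds."""
    return (qualifier_s + seconds_slower) / qualifier_s

fast = implied_ratio(17 * 60 + 48, 24)  # ~17:48 at SS Finals, ~24 s slower at State
slow = implied_ratio(20 * 60, 7)        # ~20:00 at SS Finals, ~7 s slower at State
print(f"faster group: {fast:.4f}, slower group: {slow:.4f}")
```

The faster group's ratio (about 1.022) sits above the all-runner S.S. mu of 1.01377 and the slower group's (about 1.006) below it, which is the pattern he describes.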

Sstoz Tes said...

Rob: Excellent idea about the reason for C.S. athletes running slower at the state meet, particularly your 2nd. It's a matter of probabilities, which is a language I understand =)

Anonymous: Thanks for the clarification on the 1999 course alterations at Hayward. I was there, but I guess I wasn't really there!

M'r Beal: A quick calculation for the fastest 5% of S.S. female runners results in an average ratio of 1,0139:1 (state-meet:state-meet qualifier). The average ratio for the other 95% is 1,0081:1. I will do some old fashioned reduction as I awaken from the miasma of all this other calculating I've been doing ;)

Greg said...

While I posted this over at Track Talk, I should have repeated it here. This is amazing work! Thanks for all the data collection and analysis.

On the Central front, let me add to Rob's #2. Some teams focus on performing well at league; others on the section meet; and the best teams on state. Those teams that exceed expectations and perform well enough to advance are apt to perform poorly at the next level. Achieving a goal of running well at sections and just barely reaching state often means that the team will not run as well at state. This occurrence could also account for some of the discrepancies you're seeing between Woodward section and state races. (Obviously, this happens at all section races; it just doesn't show up so clearly.)

Greg said...

Looking forward to your additional comments on Mt. Sac. These new 5% and 95% numbers are in line with what I suggested, but don't make sense to me when compared with your original mu.

And just in case people aren't following me, let me try to demonstrate by example:

Two courses tend to have the following differences:

Runner   Course 1   Course 2
A        15:00      15:15
B        16:00      16:16
C        17:00      17:17
D        18:00      18:18
E        19:00      19:19
F        20:00      20:20

But comparing Mt. Sac races with races on another course seems to be more like this:

Runner   Mt. Sac    Course 2
A        15:00      15:25
B        16:00      16:22
C        17:00      17:19
D        18:00      18:16
E        19:00      19:13
F        20:00      20:10

When comparing most courses, you expect the slower runners to slow down more than the faster runners. When comparing Mt. Sac, it doesn't work the same way.

It wouldn't surprise me if other traditionally difficult courses show similar comparisons. The more difficult the course, the harder it is for slower runners to traverse. When the slower runners return to an "easier" course, they run better.
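Expressed as ratios (course 2 time / first-course time), Greg's two illustrative tables differ in kind: the first is a constant ratio of exactly 61/60, so slower runners lose more seconds, while in the second the ratio falls as times get slower. A short check using his numbers:

```python
def to_seconds(t):
    """'mm:ss' -> seconds."""
    m, s = t.split(":")
    return int(m) * 60 + int(s)

# Greg's illustrative tables: (first-course time, course 2 time)
typical = [("15:00", "15:15"), ("16:00", "16:16"), ("17:00", "17:17"),
           ("18:00", "18:18"), ("19:00", "19:19"), ("20:00", "20:20")]
mt_sac = [("15:00", "15:25"), ("16:00", "16:22"), ("17:00", "17:19"),
          ("18:00", "18:16"), ("19:00", "19:13"), ("20:00", "20:10")]

def ratios(rows):
    """Course 2 time / first-course time for each runner."""
    return [to_seconds(b) / to_seconds(a) for a, b in rows]

for label, rows in (("typical", typical), ("Mt. Sac", mt_sac)):
    print(label, [f"{r:.4f}" for r in ratios(rows)])
```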

Mike Sherwood said...

Amazing work.

More fun for stats lovers: We've got graphs of the distribution of performances at 51 California courses. Runners can see where their time stacks up against the accumulated population of runners. We also do course time converters but it's just based on the average times of the thousands of races we have at those courses since 2006.

Mike Sherwood
www.xcstats.com

Sstoz Tes said...

M'r Beal: I've cooked up a lot of numbers for S.S. girls and, since I was at it anyway, S.S. boys. I want to make sure I have my ducks in order before they start quacking, so I will wait to put the information up until 2010-09-09 23:00 UTC-8.

In the meantime, blow your mind at xcstats.com! Very slick.

Coach Ozzie said...

I ran in the 1999 NCS race, and although it didn't rain that day, it had rained much of the week leading up to it. Although the course was not too bad to run across the field (a bit softer than normal, but not much), there were certain points around the outside of the field that were much more slippery than normal. The boys from Eureka had a day!

Greg said...

Looking forward to your getting those ducks in a row. I realized a little late that your course comparisons include all boys and girls. My discussion for Mt Sac to Woodward is only about the girls, so my mini-table may be misleading with those 15:00 times. I don't know what happens with the boys; I suspect their conversions may be more consistent across different ability levels.

It finally dawned on me as to why your mu for the SS girls seemed odd. It was because your original mu was for all runners.

I've visited xcstats and it looks great, though perhaps I haven't taken full advantage of what's available to an outsider. Don't know if Mike Sherwood has a sign-up plan for someone just wanting to look at and analyze results. Having all statewide cross country results available in a usable format would be great. Guess I should email him.
