Räcer performance modelling: an update

Started by Duplode, September 25, 2021, 06:39:25 AM

Duplode

Here is a long-delayed progress update on my race strength estimation project, more specifically on its performance modelling side. Besides improving my ratings framework, the findings I will talk about here might even bring extra insight about racing in practice. How so? Let's find out!

What do you mean by "performance modelling"?

I will start with a recap of what I was busy with at the beginning of the year. The primary goal back then was setting up a metric for the strength of races. To a first approximation, how strong a race is depends on how strong its field of pipsqueaks is -- how many people take part, and how well they are racing at the time. The latter part can be dealt with using an Elo rating. However, while Elo ratings can be a decent indicator of the expected performance of pipsqueaks, they don't easily translate to a meaningful rating for race strengths. That's because the Elo system is based on head-to-head matches, while in a race everyone involved is competing at once. For that reason, my chosen strategy was using the Elo ratings to tune probabilistic models of how well each pipsqueak was likely to do in a certain race, and then using said models to compute how well someone would be likely to do if entering that race.

To define the model, I used the one probability distribution with sensible characteristics that I happened to recall. It looks like this:



The horizontal coordinate is for lap times, in arbitrary units since concrete information about the tracks is not being dealt with. Time zero, at the leftmost edge of the plot, stands for the ideal lap, and so moving rightwards means increasing deviations from the perfect lap. The vertical coordinate is for the probability (strictly speaking, the probability density) of reaching each lap time. In the example plot above, there are distributions for two pipsqueaks. The red curve pipsqueak is expected (though not guaranteed) to perform better than the blue curve one, as their probability density is squeezed closer to zero overall. Given these two distributions, winning probabilities for red and blue can be obtained; with more pipsqueaks, it is also possible to calculate things like the probability of one pipsqueak reaching a top five result against the others (that is, losing to at most four of them).

It is also worth reviewing what I meant by this model having "sensible characteristics". Firstly, the probability density at zero time is zero, which means the perfect lap is taken as an unattainable ideal. Secondly, the drop leftwards from the peak reflects how improvement becomes increasingly hard as the perfect lap is approached. Thirdly, the gradual decay on the tail to the right of the peak expresses the range of possible mistakes that might affect a lap, even quite big ones.

OK, so what's new?

I managed to improve my approach a lot after learning that the model I had chosen is a gamma distribution. The chart below, taken from Wikipedia, shows a few example distributions from that family of probability distributions:


Gamma distribution pdf
Cburnett, CC BY-SA 3.0 <http://creativecommons.org/licenses/by-sa/3.0/>, via Wikimedia Commons


In particular, the blue curve here with k=3 is (save for a scaling factor) the same as the distributions I had been using (note the visual similarity with the curves from my example plot above). k, the shape parameter, controls the shape of the curve: distributions with larger k are more symmetrical around the peak, and drop off more sharply towards zero on its left. (The other parameter, theta, is the scale parameter, which controls how compressed or stretched out the curve is -- it is the one I let vary according to the Elo ratings, while keeping the shape parameter fixed.)
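With the shape fixed at k=3 and one scale per pipsqueak, the winning and top-five probabilities mentioned earlier can be estimated by plain simulation. The following is only a rough sketch, not the code behind my ratings: the function names and the example field of scale values are made up for illustration, and the gaps to the ideal lap are assumed to be independent draws.

```python
import random

SHAPE = 3  # fixed shape parameter k, as in the post


def simulate_gap(scale, rng):
    """Draw one gap to the ideal lap from a gamma distribution with shape 3."""
    return rng.gammavariate(SHAPE, scale)


def top_n_probability(scales, driver, n, trials=20000, seed=1):
    """Estimate the probability that `driver` loses to at most n - 1 rivals."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        gaps = [simulate_gap(s, rng) for s in scales]
        own = gaps[driver]
        beaten_by = sum(1 for i, g in enumerate(gaps) if i != driver and g < own)
        if beaten_by < n:
            hits += 1
    return hits / trials


# Hypothetical field: a smaller scale means a stronger pipsqueak.
field = [0.8, 1.0, 1.0, 1.2, 1.5, 2.0, 2.5]
print("P(win)      ~", top_n_probability(field, driver=0, n=1))
print("P(top five) ~", top_n_probability(field, driver=0, n=5))
```

Setting n=1 gives the winning probability, and n=5 gives the top five probability (losing to at most four of the others).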

The Wikipedia examples also illustrate how setting k=1 collapses the distribution to a plain exponential. Exponential distributions could conceivably be used instead of the more plausible-looking ones with higher k that I chose. The interesting thing about that possibility is how the winning probabilities for a pair of pipsqueaks with exponential distributions of lap times look exactly like those in the Elo formulas, with the scale parameter being inversely proportional to the exponentiated Elo rating (the 10^(R/400) factor in the usual Elo formula, R being the rating). Going the other way around, it is possible to replace the conventional Elo winning probability formula with the one for my favoured k=3 model and, with a few tweaks to the system to keep things in scale, have everything working as smoothly as before -- except without the need for an awkward conversion from the Elo ratings to a seemingly unrelated model, and with a much better understanding of what is going on.
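To make the Elo connection concrete, here is a small sketch of the head-to-head winning probability for two gamma-distributed pipsqueaks with the same integer shape k. It relies on a standard fact: a gamma variable with integer shape k and scale theta is the waiting time for the k-th event of a Poisson process with rate 1/theta, so "A is faster than B" amounts to A collecting k events before B does. The function names, and the mapping from rating to scale as 10^(-R/400), are assumptions made for the sake of the example.

```python
from math import comb


def win_probability(scale_a, scale_b, k=3):
    """P(A's gap < B's gap) for independent gamma gaps with integer shape k."""
    # Chance that the next event in the merged Poisson process belongs to A.
    p = scale_b / (scale_a + scale_b)
    q = 1.0 - p
    # A wins if A collects k events before B collects k (negative binomial sum).
    return sum(comb(k - 1 + j, j) * p**k * q**j for j in range(k))


def elo_expected_score(rating_a, rating_b):
    """Conventional Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def scale_from_rating(rating):
    """Assumed mapping: scale inversely proportional to 10**(rating / 400)."""
    return 10 ** (-rating / 400)


a, b = scale_from_rating(1600), scale_from_rating(1500)
print("Elo formula:", elo_expected_score(1600, 1500))
print("k=1 model:  ", win_probability(a, b, k=1))
print("k=3 model:  ", win_probability(a, b, k=3))
```

With k=1 the sum has a single term, p = scale_b / (scale_a + scale_b), which is the Elo expression in disguise. With k=3 the same rating gap yields a more lopsided probability, which is one reason the system needs a few tweaks to keep things in scale.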

And why should any of that be taken seriously?

So far all I have reported here was me toying with equations, with nothing to show that the talk about performance models corresponds to reality. At this point, a little serendipity comes into play.

One of the Stunts experiments I wanted to do back in January was investigating the cornering speed differences between left and right turns. As alluded to elsewhere, I prepared two minimal four-large-corners tracks, one clockwise and one anticlockwise, and went about driving them repeatedly with the Corvette. After seeing no difference whatsoever over quite a few attempts, it dawned on me that I had been doing the tests with the 1990 Mindscape version for no particular reason (I happened to be playing on that version for the DOSReloaded.de competition at the time). Once I switched to 1991 Broderbund the difference immediately became obvious. In any case, at that point I already had recorded a few dozen lap times around those tracks on 1990 Mindscape, so I figured I might as well keep driving a few laps per day until I had enough data to see if the empirical distribution of lap times actually looks like a gamma distribution...

The plots below show the histogram of the lap times (224 completed laps -- attempts in which I left the track or crashed were discarded) and the fitting of a gamma distribution to it. In order to fit a gamma distribution to the data, it is necessary to guess what the ideal lap time is, so that the zero time on the model can be set. I have assumed 23.25 (which is 0.05 below the best I managed) as the ideal lap time, and so the "data"/"quantile" values on the plots should be read as time gaps to 23.25.



That looks fantastic! Incredibly, even the fitted value for the shape parameter was 3.35, which is pretty close to my almost arbitrary initial choice of k=3 for the model. (I should note that a fitted k further away from 3 wouldn't actually be a problem, for reasons I will soon get to; what truly matters is that a gamma distribution is a reasonable fit for the observed lap times.)
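For anyone who wants to try a similar fit on their own lap times, below is a minimal sketch using the method of moments (matching the sample mean and variance). That is a simpler technique than a proper maximum likelihood fit and not necessarily what produced the plots above. The lap times in the example are synthetic stand-ins generated on the spot, not my recorded laps, and the scale of 0.1 seconds used to generate them is an arbitrary placeholder.

```python
import random
import statistics

IDEAL_LAP = 23.25  # assumed ideal lap time from the post, in seconds


def fit_gamma_moments(lap_times, ideal=IDEAL_LAP):
    """Estimate (shape, scale) of the gaps to the ideal lap by matching moments."""
    gaps = [t - ideal for t in lap_times]
    mean = statistics.mean(gaps)
    var = statistics.variance(gaps)
    # For a gamma distribution: mean = k * theta and variance = k * theta**2.
    return mean * mean / var, var / mean


# Synthetic stand-in data, NOT the real lap times from the experiment.
rng = random.Random(42)
laps = [IDEAL_LAP + rng.gammavariate(3.35, 0.1) for _ in range(224)]
shape, scale = fit_gamma_moments(laps)
print(f"fitted shape ~ {shape:.2f}, fitted scale ~ {scale:.3f}")
```

Note that the result depends on the guess for the ideal lap time: moving the zero point changes the gaps, and therefore both fitted parameters.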

What else is there to investigate?

Given what I have learned so far about the model (through not only the investigation described here but also various attempts at tuning my rating system), I believe gamma distributions with shape parameter 3 form a good basis for an Elo-like ranking of pipsqueaks and a race strength metric derived from it. It remains a pretty bare-bones model: the limitations of using a single parameter as an evolving rating that stands for pipsqueak performances, which I alluded to in my earlier post about race strengths, remain the same. Furthermore, in spite of the incredibly good results I obtained in my attempt to fit a gamma distribution to real data, there is a large gap between a series of single laps by a specific person on a very simple track and whatever happens when people take part in a real race. Depending on what one wants to use a performance model for, ways in which this gap might be narrowed become relevant. Here I will briefly discuss two aspects of this matter: tracks and repeated attempts.

When it comes to tracks, the Elo-like rating system I am working with is entirely indifferent: races are seen as events in which pipsqueaks compete and are ranked, with everything else, including the nature of the track on which they compete, being abstracted away. Turning our attention to the performance model, though, we might consider how the probability distribution of lap times might change according to the track. One way to approach that invokes a useful property of gamma distributions. Suppose we have, rather than a lap time distribution, gamma distributions for the section times that make up a lap, with the distributions being independent and having the same scale parameter but possibly different shape parameters. In that case, the lap time distribution will also be a gamma distribution, and its shape parameter will be the sum of the shape parameters for the sections. That points to a way of accounting for basic track characteristics in the performance model, namely using higher shape parameters for longer and/or harder tracks. As I mentioned earlier, higher shape parameters mean a sharper drop in the probability density as we move from the peak towards the ideal time, which intuitively reflects increased difficulty. (One way of thinking about this is seeing each track section as a potential source of mistakes which increase the lap time, and the overall lap time distribution as the result of combining these sources of mistakes.)
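The additivity of shape parameters is easy to check numerically. In the sketch below, a lap is made of three independent sections with a common scale; the section shapes (1.0, 0.5 and 1.5) and the scale of 0.2 are hypothetical values picked so that they add up to 3. The lap shape is then recovered from the simulated laps through the mean and variance, which for a gamma distribution are k * theta and k * theta^2 respectively.

```python
import random
import statistics


def simulate_laps(section_shapes, scale, laps=50000, seed=7):
    """Simulate lap gaps as sums of independent gamma section gaps (same scale)."""
    rng = random.Random(seed)
    return [
        sum(rng.gammavariate(k, scale) for k in section_shapes)
        for _ in range(laps)
    ]


sections = [1.0, 0.5, 1.5]  # hypothetical section shape parameters
scale = 0.2                 # hypothetical common scale parameter

times = simulate_laps(sections, scale)
mean = statistics.mean(times)
var = statistics.variance(times)

estimated_shape = mean * mean / var  # should be close to sum(sections)
estimated_scale = var / mean         # should be close to the common scale
print(f"sum of section shapes: {sum(sections)}")
print(f"estimated lap shape:   {estimated_shape:.2f}")
print(f"estimated lap scale:   {estimated_scale:.3f}")
```

Matching the first two moments does not by itself prove the sum is gamma distributed (that follows from the theory), but it is a quick sanity check that the shapes add up as described.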

As for the matter of repeated attempts, we know that even in a NoRH race what reaches the scoreboard is not the result of a single attempt, but the best result from a set of completed laps. That being so, if we want to use the performance model to think about, for instance, racing strategy, it would make sense to consider, given a single attempt lap time distribution, what the distribution for the best lap time out of a number of completed attempts would look like. While I haven't worked out the details yet, playing with the formulas suggests the resulting distribution is not quite precisely a gamma distribution, but pretty close. As the number of attempts gets larger, the distribution becomes more symmetrical -- much like it happens when the shape parameter grows in a gamma distribution -- and gets squeezed towards zero -- as expected, as repeated attempts are supposed to improve the results! How fast do the results improve as the number of attempts increases, one might wonder? The preliminary calculations I have done starting from a k=3 single lap distribution suggest the expected gap to the ideal lap time is, approximately, inversely proportional to the cube root of the number of attempts. In other words, if you want to cut by half the gap to the ideal time, you should be ready to try eight times as many attempts! Note that I'm assuming 3 as the initial shape parameter, which, as the results of my four-corners minimal track suggest, should amount to a fairly easy track; for harder tracks the foreseen number of attempts should grow even faster.
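The cube root rule can be probed numerically without any simulation. For a non-negative quantity, the expected value equals the integral of the survival function (the probability of exceeding x), and the survival function of the best of n independent attempts is the single-lap survival function raised to the n-th power. For a gamma distribution with shape 3 and scale theta the single-lap survival function has the closed form exp(-u) * (1 + u + u^2/2), with u = x / theta. The sketch below integrates that with Simpson's rule; the function names, the integration range and the step count are arbitrary choices of mine, and independence between attempts is assumed.

```python
import math


def survival(x, scale=1.0):
    """P(gap > x) for a single lap with gamma shape 3 and the given scale."""
    u = x / scale
    return math.exp(-u) * (1.0 + u + u * u / 2.0)


def expected_best_gap(attempts, scale=1.0, upper=40.0, steps=40000):
    """Expected best gap out of `attempts` laps, via Simpson's rule.

    Integrates survival(x) ** attempts over [0, upper * scale];
    `steps` must be even.
    """
    h = upper * scale / steps

    def f(i):
        return survival(i * h, scale) ** attempts

    total = f(0) + f(steps)
    total += 4.0 * sum(f(i) for i in range(1, steps, 2))
    total += 2.0 * sum(f(i) for i in range(2, steps, 2))
    return total * h / 3.0


previous = None
for n in (1, 8, 64, 512, 4096):
    gap = expected_best_gap(n)
    note = "" if previous is None else f"  (ratio to previous: {gap / previous:.3f})"
    print(f"{n:5d} attempts: expected gap ~ {gap:.4f}{note}")
    previous = gap
```

Each step in the loop multiplies the number of attempts by eight, so under the cube root rule the printed ratios should approach one half as the number of attempts grows. For small numbers of attempts the rule is only a rough guide, since it comes from the behaviour of the distribution very close to the ideal lap.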

There should be plenty of interesting questions like those I have just mentioned that might be posed in terms of a performance model; I'm all ears for your suggestions!

Cas

Your analysis is getting much more precise and refined!
Reading what you say about how track sections could be analysed separately and how it's mostly about the opportunity to make mistakes that add to the total lap time, I imagine a pipsqueak's lap time as the perfect lap, plus the delay caused by each individual mistake. With this in mind and the division into sections, I can't help but consider that what obstacles are present on track has a lot to do with how many mistakes we make and of what magnitude. Then I wonder if this could be used to analyse the impact each individual track element (compared to a straightway of the same length) has on the driven time on a track. Of course, changing a straightway to a stunt not only adds "mistake time", but also "real perfect time" to the track's total perfect lap time. Then, if we make the measurement with many pipsqueaks, we would be able to isolate one from the other by comparing. In other words, we would be able to estimate that ideal perfect lap as well as the intrinsic added lap time for each stunt and, on the other end, get a magnitude to tell us how easy or hard it is for each pipsqueak to handle a certain stunt. Of course, with free-style races, it becomes a lot more complicated because detours are very hard to put in an equation. But would this be feasible with OWOOT?
Earth is my country. Science is my religion.

Duplode

Quote from: Cas on September 26, 2021, 01:47:00 AM
With this in mind and the division into sections, I can't help but consider that what obstacles are present on track has a lot to do with how many mistakes we make and of what magnitude. Then I wonder if this could be used to analyse the impact each individual track element (compared to a straightway of the same length) has on the driven time on a track.

Yup, that's the spirit. With carefully designed experiments, we might be able to obtain the gamma shape/k parameter for different elements/combos/mini-sectors, and combine them to obtain a k for a whole track. I think the main issue to be dealt with would be granularity: how finely can we divide a track while it remains reasonable to consider the sectors independent from each other? There is also the likely scenario of individual pipsqueaks obtaining somewhat different k values for the same combo due to different driving styles, though I guess that would be easier to paper over.

(On the matter of ideal lap times, and going in a slightly different direction, I have a few ideas on estimating them after a race, given the relevant lap times and pipsqueak gamma curves. I will write about that once I make some more progress with those plans.)