Balanced bonuses estimation

Started by Duplode, November 02, 2024, 01:54:10 AM

Duplode

Ever since ZakStunts started having multi-car races, the balanced bonuses question has been a matter of interest to pipsqueaks and track designers alike: which set of bonuses would give all cars even chances of winning? This post presents what I believe to be a sensible, comprehensive way of answering that question. A list of bonuses estimated for most cars featured in ZakStunts can be found towards the end of the post. Before that, a few words about my methods and assumptions.



A couple pre-season rule reviews ago, I suggested figuring out balanced bonuses by comparing pairs of cars, sorting the races they were involved in according to the ratios of the bonus multipliers (or the advantages, which is what I'm calling the differences of the multipliers' logarithms, and which are easier to work with as they are on a linear scale), and finding the point at which the winning car changes. While that idea is basically sound, there were a few obstacles when it came to applying it systematically:

  • Figuring out which races are relevant for each pair of cars. (Two years ago, I picked those in which at least one car of the pair reached the top 6. However, that might discard quite a lot of useful information, especially for the last two seasons, with close bonuses and single-car podiums.)
  • Reliably estimating the switch point between the cars. (Eyeballing only takes you so far, especially when there is significant overlap between the winning ranges of the cars, or with custom cars for which not many data points are available.)
  • Distilling the pairwise estimates into a single, overall estimate of the bonus for each car. (With 33 cars used in ZakStunts since 2008, we might have up to 32 different pairwise estimates for each car!)

For #1, after playing a bit with the available data, I have settled on picking matches (that is, comparisons between cars available for a specific race) in which at least one of the cars either came within 120% of the winning time or had at least 10% of the submitted replays. That seems to give a good balance between preserving information and excluding accidental results.
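The selection rule above could be sketched roughly as follows. This is only an illustration in Python, not the attached R code: the `CarEntry` structure and the function names are hypothetical, and it assumes the car's best time is measured the same way as the winning time it is compared against.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CarEntry:
    """One car's showing in one race (hypothetical structure, not from the attached data)."""
    best_time: Optional[float]  # best lap time in seconds, None if nobody used the car
    replays: int                # number of submitted replays driven with this car


def is_competitive(entry, winning_time, total_replays,
                   time_limit=1.20, replay_share=0.10):
    """True if the car came within 120% of the winning time or had at least 10% of the replays."""
    close_enough = (entry.best_time is not None
                    and entry.best_time <= time_limit * winning_time)
    popular_enough = (total_replays > 0
                      and entry.replays / total_replays >= replay_share)
    return close_enough or popular_enough


def is_relevant_match(entry_a, entry_b, winning_time, total_replays):
    """A match is kept if at least one of the two cars passes the test above."""
    return (is_competitive(entry_a, winning_time, total_replays)
            or is_competitive(entry_b, winning_time, total_replays))
```

Keeping the two thresholds as parameters makes it easy to check how sensitive the final estimates are to the 120% and 10% cut-offs.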

For #2, a simple and well-established solution is logistic regression: using the advantages and the record of wins and losses for a pair of cars, we can fit a model that estimates the winning probability for the given advantage. For instance, here is an updated version of the Carrera versus Lancia chart from my older post, with a fitted logistic curve:



The midpoint of the curve, the point at which the model predicts the cars have the same chance of winning, is at an advantage of +0.056, which would amount to a +5.4% relative bonus of the Carrera over the Lancia.
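For anyone who wants to see the moving parts, here is a minimal sketch of the pairwise fit in plain Python (again, not the attached R code). It fits the logistic curve by Newton's method, reads off the midpoint, and converts an advantage into a relative bonus. The conversion assumes that the advantage is the difference of the logarithms of the time multipliers and that a bonus of b multiplies lap times by (1 - b); under that assumption, +0.056 comes out at about 5.4%, in line with the figure above. All function names are made up for the example.

```python
import math


def fit_logistic(advantages, wins, iterations=50):
    """Fit P(win) = 1 / (1 + exp(-(b0 + b1 * advantage))) by Newton's method.

    advantages: advantage of the car of interest in each match.
    wins: 1 if that car won the match, 0 otherwise.
    Note: if the wins and losses do not overlap at all, no finite fit exists.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(advantages, wins):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of the (negated) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break                  # degenerate data, give up on further steps
        # Newton step: solve the 2x2 system H * step = g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def midpoint(b0, b1):
    """Advantage at which the fitted winning probability is exactly 50%."""
    return -b0 / b1


def advantage_to_relative_bonus(advantage):
    """Relative bonus matching a log-scale advantage (assumes time * (1 - bonus))."""
    return 1.0 - math.exp(-advantage)
```

Calling `advantage_to_relative_bonus(midpoint(b0, b1))` on the fitted coefficients then gives the relative bonus at which the two cars are predicted to be evenly matched.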

(Side note: while the logistic model has the great advantage of simplicity, there are other reasons to regard it as a reasonable functional form for our problem. In particular, the winning probability as a function of the advantage should in any case be a sigmoid, as it must go to either 100% or 0% as the bonuses grow apart from each other, and also must be symmetrical around the midpoint.)

Obstacle #3 was arguably the decisive one. Ideally, we'd want to use the full wealth of pairwise comparisons to reach an overall value for each car, but do so in a principled way. After trying a few ad hoc approaches with limited success, I learned there is a standard way of taking in the pairwise models all at once: the Bradley-Terry model. It is similar in a few ways to an Elo ranking but without evolution in time (so the abilities of the players are assumed to remain constant), and is commonly used to estimate the abilities of players or teams within a tournament. In our case, the cars are the "players", and each pairwise meeting in a race is a match. The bonus advantages, then, can be incorporated in the same way that, in more typical uses of Bradley-Terry, home advantage for sports teams is handled: as an added term to the car "ability" scores.
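To make the idea concrete, here is a small sketch of a Bradley-Terry fit with the bonus advantage as an added term, written in plain Python with simple gradient ascent. It is an illustration under my own simplifying assumptions rather than the attached R code: the first car in the list is used as a reference with ability zero, no error bars are computed, and a proper statistics package would be a better choice for real work. The model is P(winner beats loser) = sigmoid(ability[winner] - ability[loser] + beta * advantage).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_bradley_terry(matches, cars, steps=5000, rate=0.05):
    """Fit abilities and the advantage coefficient by plain gradient ascent.

    matches: list of (winner, loser, advantage of the winner over the loser).
    cars[0] is the reference car, with its ability pinned to zero.
    """
    ability = {car: 0.0 for car in cars}
    beta = 0.0
    for _ in range(steps):
        grad = {car: 0.0 for car in cars}
        grad_beta = 0.0
        for winner, loser, advantage in matches:
            p_win = sigmoid(ability[winner] - ability[loser] + beta * advantage)
            surprise = 1.0 - p_win  # derivative of log(p_win) w.r.t. the linear predictor
            grad[winner] += surprise
            grad[loser] -= surprise
            grad_beta += surprise * advantage
        for car in cars[1:]:        # the reference car is left at zero
            ability[car] += rate * grad[car]
        beta += rate * grad_beta
    return ability, beta


def balancing_advantage(ability, beta, car, other):
    """Advantage of `car` over `other` at which the model gives both a 50% chance."""
    return -(ability[car] - ability[other]) / beta
```

Dividing the ability differences by `beta` is what puts them on the same scale as the bonus advantages, so a weaker car ends up with a positive balancing advantage over a stronger one.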

With those pieces in place, fitting a Bradley-Terry model to the ZakStunts data is straightforward, giving us abilities for each of the cars, which are displayed with error bars on the chart below:



Up to a scaling factor, these abilities correspond to the intrinsic advantages for each car, and so they can be easily converted to balanced bonuses. You'll note I have excluded a few of the ZakStunts cars:

  • Acura, GTO, Indy and Vette, due to the notorious problem of powergear leading to too much variability from one track to the next. (For a rough idea of where they stand relative to the others, see the Magic Lamp experiment.)
  • Xylocaine, which, besides also being a powergear car, was never competitive in its one ZakStunts season so far.
  • Speedgate, whose driving technique evolved so much that it is questionable how representative its historical results are.

It might be prudent to take the estimates of cars which only had one or two seasons so far with a grain of salt, as the error bars in the chart above suggest. (Uncertainty is especially high for the 911 Turbo because the seasons in which it was competitive were at the height of the far-apart-bonuses era, which tends to make its matches less informative.)



With the i's dotted and the t's crossed, here are the balanced bonus estimates:



These percentages should not be taken as definitive: the methodology might be further refined, and there will always be more races to add, with different scenarios to try out the cars. In any case, they are an auspicious start!  :)



One direction that might be explored in future work is adding more predictors to the model. The Bradley-Terry framework allows bringing in extra factors that might be additional sources of advantage or disadvantage for the cars. We might collect data on track-specific (e.g. the proportion of non-asphalt elements) or even replay-specific (e.g. how much of a lap is spent accelerating) variables, incorporate them into the model and see if they interact with the car abilities. That might be one way to quantify situational advantages of, for instance, off-road cars, or even powergear cars.

Both the data and the (very unpolished!) R code used to fit the model and obtain the bonuses are attached, in case you want to reproduce the calculations or tinker with them.

Overdrijf

Wow, very interesting data. Not all of these results immediately make sense in my head, but the methodology seems very thorough. And it's kind of poetic that the "basically developed to be my ideal car" BMW ends up so close to my favorite original car in the Jaguar. I might go and suggest an NTT race between the two.

Argammon

Great Work!

I think it would be cool to make all cars start with their balanced bonuses in January 2025. The scoreboard would surely look interesting.  ;D

Duplode

Quote from: Argammon on November 06, 2024, 08:13:48 PM
I think it would be cool to make all cars start with their balanced bonuses in January 2025. The scoreboard would surely look interesting.  ;D

Not sure about doing it on ZakStunts (it's very nice to have continuity of bonuses from one year to the next), but otherwise that would be a fun race to have! Maybe it could be a special event of some sort?

Cas

Great work!  One thing I'd like to ask because I'm not sure I understood is whether cars are being compared only when driven by the same pipsqueak or by their combined results of all pipsqueaks that used them in each given race. I mean, some of us have been to most races, but some pipsqueaks have had periods of much higher participation, so that may have an impact.

Another thing is... would it be possible to solve the PG problem, more or less, by separating races on tracks that make it very hard to do PG from those that are easy to do PG on and treating them as two different cars?

Duplode

Very good questions!

Quote from: Cas on November 07, 2024, 06:41:33 PM
One thing I'd like to ask because I'm not sure I understood is whether cars are being compared only when driven by the same pipsqueak or by their combined results of all pipsqueaks that used them in each given race. I mean, some of us have been to most races, but some pipsqueaks have had periods of much higher participation, so that may have an impact.

It's the latter option: the comparisons are according to the best corrected lap time for each car (with unused cars losing by default). For instance, the ZCT258 match in that example regression plot was between my 46.31 (1:01.75) Lancia lap and Zapper's 55.50 (1:14.70) Carrera lap, and thus it is scored as a Lancia win. The lap times aren't used directly in the regression, but merely to pick the winners.
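In code, that scoring rule might look like the sketch below (plain Python, with a hypothetical function name). It assumes the first figure quoted for each lap above is the corrected time; how exact ties should be scored is not covered in this post, so the sketch simply returns no winner for them.

```python
def match_winner(best_corrected, car_a, car_b):
    """Pick the winner of a match between two cars in one race.

    best_corrected: dict mapping each car that was actually used to its best
    corrected lap time in seconds. Cars missing from the dict were unused.
    Returns the winning car, or None if neither car was used or the times tie.
    """
    time_a = best_corrected.get(car_a)
    time_b = best_corrected.get(car_b)
    if time_a is None and time_b is None:
        return None          # no information in this match
    if time_a is None:
        return car_b         # an unused car loses by default
    if time_b is None:
        return car_a
    if time_a == time_b:
        return None          # tie handling is an assumption of this sketch
    return car_a if time_a < time_b else car_b


# The ZCT258 example: 46.31 for the Lancia against 55.50 for the Carrera.
zct258 = {"Lancia": 46.31, "Carrera": 55.50}
print(match_winner(zct258, "Carrera", "Lancia"))  # prints Lancia
```

Only the identity of the winner is passed on to the regression, which is why the size of the gap between the two laps plays no role here.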

Quote from: Cas on November 07, 2024, 06:41:33 PM
Another thing is... would it be possible to solve the PG problem, more or less, by separating races on tracks that make it very hard to do PG from those that are easy to do PG on and treating them as two different cars?

In theory, yes. There are, however, two complications that are difficult to overcome with this approach. Firstly, non-PG races with powergear cars are historically rare at ZakStunts. Secondly, even among PG races there is plenty of variability when it comes to how much of each lap is spent on powergear.

An alternative approach that I see as more likely to succeed is choosing a continuous variable that can work as a proxy for powergear use, and then extending the model by adding interaction terms between it and the car choices. For instance, one candidate for being that proxy variable could be the average speed of the winning lap in a track (possibly divided by the winning car's top speed). As a bonus, besides helping with powergear, that approach might also allow quantifying how much better cars with low top speed do on slow tracks. 
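As a rough picture of what such an extension could look like, the sketch below adds a per-car slope on the proxy variable to the winning probability. This is just one possible way of writing the interaction, in plain Python with made-up names; nothing here has been fitted or tested against the ZakStunts data.

```python
import math


def win_probability(car_i, car_j, advantage, proxy, ability, beta, slope):
    """P(car_i beats car_j) with a track-level proxy interacting with car choice.

    advantage: bonus advantage of car_i over car_j.
    proxy: value of the proxy variable for the track (e.g. a speed ratio).
    ability[c]: baseline ability of car c.
    beta: coefficient of the bonus advantage.
    slope[c]: how much car c gains per unit of the proxy (0 if not listed).
    """
    z = (ability[car_i] - ability[car_j]
         + beta * advantage
         + (slope.get(car_i, 0.0) - slope.get(car_j, 0.0)) * proxy)
    return 1.0 / (1.0 + math.exp(-z))
```

With all slopes set to zero this falls back to the plain model, so the extra terms only matter for cars whose results actually depend on the proxy.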

Argammon

As an alternative, the continuous variable could measure the percentage of time the car has spent in power gear on a given lap. For the Corvette/GTO/Indy a speed above 225 mph (230?) may be a good proxy for that. So, for example, if the GTO drove faster than 225 mph for 1 minute on a 2-minute track the variable takes a value of 0.5.

This should not be difficult because cartography measures the speed continuously anyhow, or does it?  ::)

Duplode

That would be a very good variable, actually! It's easy to interpret for PG cars, and for non-PG cars we could just set it to zero and forget about it. For GTO and Vette 225 mph (the rigid PG limit) would be suitable, while for Indy and NSX I think it makes sense to use their flat track top speed (216 mph and 170 mph respectively).

Quote from: Argammon on Yesterday at 06:08:38 PM
This should not be difficult because cartography measures the speed continuously anyhow, or does it?  ::)

Yup, it's technically feasible to do. Cartography relies on repldump to obtain frame-by-frame data for the replay, so in principle it's just a matter of running repldump on the replays and having a little program to process the output and calculate the ratio. (Automating the whole sequence of steps is a sensible thing to do, because the number of replays to be analysed is potentially large, and also because repldump has to be run through DOSBox, making it a relatively slow tool for this sort of task.)
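The ratio calculation itself would be the easy part. Here is a minimal Python sketch, assuming the per-frame speeds have already been extracted into a list of mph values (the parsing of the actual repldump output is not shown, and I'm not assuming anything about its format) and that every frame lasts the same amount of time. The thresholds are the ones suggested above; the function and table names are made up.

```python
# Speed thresholds (mph) above which a frame counts as powergear driving.
PG_THRESHOLD_MPH = {
    "GTO": 225.0,    # rigid PG limit
    "Vette": 225.0,  # rigid PG limit
    "Indy": 216.0,   # flat track top speed
    "NSX": 170.0,    # flat track top speed
}


def powergear_fraction(car, speeds_mph):
    """Fraction of the lap's frames spent above the car's powergear threshold.

    speeds_mph: one speed value per frame, assumed to be equally spaced in time.
    Cars without a threshold (non-PG cars) always get 0.0.
    """
    threshold = PG_THRESHOLD_MPH.get(car)
    if threshold is None or not speeds_mph:
        return 0.0
    frames_above = sum(1 for speed in speeds_mph if speed > threshold)
    return frames_above / len(speeds_mph)
```

A GTO lap with half of its frames above 225 mph would get 0.5, matching the example given earlier in the thread.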