Topics - Duplode

Stunts Chat / Race strength estimation, revisited
« on: December 27, 2020, 10:52:35 AM »
For a long while, one of my favourite Stunts investigation topics has been the evaluation of race strength, be it merely for the enjoyment of historians and pundits, or to inform some kind of spiritual successor to Mark L. Rivers' SWR Ranking. Some of you might even remember my 2012 thread on the matter. Now, after a long time with this project on the back burner, I have made enough progress to feel like posting about it again. So, without further ado, here is a plot of race strengths covering the 236 ZakStunts races so far:

(Attached below is the Excel file this chart belongs to, so you can have a closer look at the data.)

While this chart might look rather like the ones I showed you years ago, there is one major difference: this time, a clearer procedure to obtain the data has led to values that are meaningful on their own. For instance, consider the massive spike you see just left of the middle of the chart. That is ZCT100, whose strength is around 70. According to the model underpinning the calculations, that number means a pipsqueak of Elo rating 1500 (which generally amounts to lower midfield) would, if they joined ZCT100, have a 1 in 70 chance of reaching a top five result on the scoreboard.

The numbers here aren't definitive yet, as I still want to check whether there is any useful tuning of parameters to be done, as well as to figure out how to estimate some of the involved uncertainties. In any case, I believe they look fairly reasonable. Within each season, the ranking of races is generally very sensible. Comparing different eras of ZakStunts is, as one might expect, trickier. In particular, I feel the model might be overrating the 2010 races a little bit. Also, it is hard to tell whether the model underrates races from the first few seasons (2001-2004) as it moves towards a steadier state. Still, the chart does seem to capture the evolutionary arcs of ZakStunts: a steady increase in the level of the competition over the initial years, culminating in the 2005-2006 high plateau, followed by a sharp drop in 2007, and so forth.

I will now outline how this new strength estimation works. Some of what follows might be of interest beyond mere technical curiosity, for parts of the procedure can be useful for other investigations in Stunts analytics.

(By the way, you can check the source code of my program on GitHub, if you are so inclined.)

When I set about resuming this investigation early this year, I decided to, instead of rolling yet another quirky algorithm from scratch, start from well-understood building blocks, so that, if nothing else, I would get something intelligible at the end. Balancing that principle with the known limitations of my chosen methods (and there are quite a few of them), I eventually ended up with the following pipeline of computations:

  • From the ZakStunts results, compute Elo ratings at every race.
  • Obtain, from the Elo ratings, victory probabilities against a hypothetical 1800-rated pipsqueak, and use those probabilities to parameterise a rough performance model, which amounts to the probability distribution of lap times (relative to an ideal lap) for a pipsqueak.
  • Add a fictitious 1500-rated pipsqueak to the list of race entrants, and either:
    • Use the performance model to implement a race result simulator, which spits out possible outcomes when given a list of pipsqueaks and their ratings, and run the simulation enough times to be able to give a reasonable estimate of the likelihood of a top 5 finish by the fictitious pipsqueak; or
    • Numerically integrate the appropriately weighted probability density for the fictitious pipsqueak to obtain, as far as the model allows, an exact result for said likelihood.

(Implicit in the above is that my code includes both an Elo rating calculator and a race result simulator, which can be put to use in other contexts with minimal effort.)

Let's look at each step a little closer. When it comes to ratings of competitors, Elo ratings are a pretty much universal starting point. They are mathematically simple and very well understood, which was a big plus given the plan I had at the outset. For our current purposes, though, the Elo system has one major disadvantage: it is designed for one-versus-one matches, and not for races. While it is certainly possible to approach a race as if it were the collection of all N*(N-1)/2 head-to-head matchups among the involved pipsqueaks, doing so disregards how the actual head-to-head comparisons are correlated with each other, as they all depend on the N pipsqueak laptimes. (To put it another way: if you beat, say, FinRok in a race, that means you have achieved a laptime good enough to beat FinRok, and so such a laptime will likely be good enough to defeat most other pipsqueaks.) All that correlation means there will be a lot of redundant information in the matchups, the practical consequence being that a single listfiller or otherwise atypical result can cause wild swings in a pipsqueak's rating. Trying to solve the problem by discarding most of the matchups (say, by only comparing a pipsqueak with their neighbours on the scoreboard) doesn't work well either: since we only have ~12 races a year to get data out of, that approach will make the ratings evolve too slowly to be of any use. Eventually, I settled for a compromise of only using matchups up to six positions away on the scoreboard (in either direction), which at least curtails some of the worst distortions in races with 20+ entrants. Besides that, my use of the Elo system is pretty standard. While new pipsqueaks are handled specially over their initial five races for the sake of fairer comparisons and faster steadying of ratings, that is not outside the norm (for instance, chess tournaments generally take similar measures).
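To make the windowed matchup idea concrete, here is a minimal Python sketch. It uses the textbook Elo expected-score formula; the K-factor of 16, the function names and the batch-style update are assumptions made for the example, not values or code taken from the actual program. Ties and the special handling of newcomers are left out.

```python
def expected_score(rating_a, rating_b):
    """Textbook Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_ratings(scoreboard, ratings, k=16.0, window=6):
    """Return post-race ratings for a scoreboard ordered from fastest to slowest.

    Only matchups between entrants at most `window` positions apart are used.
    The K-factor is a hypothetical value; all changes are computed from the
    pre-race ratings and applied together at the end.
    """
    deltas = {name: 0.0 for name in scoreboard}
    for i, winner in enumerate(scoreboard):
        # Entrants placed 1..window positions below the current one.
        for loser in scoreboard[i + 1 : i + 1 + window]:
            surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
            deltas[winner] += k * surprise
            deltas[loser] -= k * surprise
    return {name: ratings[name] + deltas[name] for name in scoreboard}


# Usage: eight equally rated entrants; first and last place are seven
# positions apart, so they are never compared directly.
board = ["p%d" % i for i in range(8)]
print(update_ratings(board, {name: 1500.0 for name in board}))
```

Since every matchup adds to one rating exactly what it takes from the other, the sum of the ratings on the scoreboard is unchanged by the update.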

Elo ratings are not enough to simulate race results, precisely because of the distinction between a collection of matchups and a single race discussed above. A simulation requires a model of the pipsqueak performance, so that individual simulated results for each pipsqueak can be put together in a scoreboard. One workaround to bridge this gap relies on victory probabilities. It is possible, given Elo ratings for a pair of pipsqueaks, to calculate how likely one is to defeat the other in a matchup. Similarly, if you have the laptime probability distributions for a pair of pipsqueaks, you can calculate how likely it is for one of them to be faster than the other. A few seat-of-the-pants assumptions later, we have a way to conjure a laptime probability distribution that corresponds to an Elo rating. As for the distributions, the ones I am using look like this:

This is a really primitive model, perhaps the simplest thing that could possibly work. It is simple enough that there are victory probability formulas that can be calculated with pen and paper. There is just one pipsqueak-dependent parameter. As said parameter increases, the distribution is compressed towards zero (the ideal laptime), which implies laptimes that are typically faster and obtained more consistently (in the plot above, the parameter is 1 for the blue curve and 2 for the red one). While I haven't seriously attempted to validate the model empirically, the features it does have match some of the intuition about laptimes. (On the matter of empirical validation, one might conceivably drive five laps on Default every day for a month and see how the resulting laptimes are spread. That would be a very interesting experiment, though for our immediate purposes the differences between RH and NoRH might become a confounding factor.)
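As an illustration of how a one-parameter laptime distribution can be tied to an Elo rating, here is a Python sketch that uses an exponential distribution as a stand-in. That choice is an assumption made purely for the example (the distribution family used in the post is not named in the text), and the function names and the 1500 anchor are hypothetical as well. With exponential gaps to the ideal lap, the chance of A being faster than B is rate_a / (rate_a + rate_b), which coincides with the Elo expected score when the rate grows as 10^(rating/400).

```python
import random


def rate_from_rating(rating, anchor=1500.0):
    """Exponential rate for a rating. Only ratios of rates matter for
    head-to-head results, so the anchor (hypothetical) is arbitrary."""
    return 10.0 ** ((rating - anchor) / 400.0)


def win_probability(rating_a, rating_b):
    """P(A's gap to the ideal lap is smaller than B's) for exponential gaps."""
    rate_a = rate_from_rating(rating_a)
    rate_b = rate_from_rating(rating_b)
    return rate_a / (rate_a + rate_b)


def sample_gap(rating, rng):
    """Draw one gap to the ideal lap; higher ratings give smaller typical gaps."""
    return rng.expovariate(rate_from_rating(rating))


# Usage: compare the closed-form probability with a quick empirical check.
rng = random.Random(1)
trials = 20_000
wins = sum(sample_gap(1900, rng) < sample_gap(1500, rng) for _ in range(trials))
print(win_probability(1900, 1500), wins / trials)
```

As in the description above, a larger parameter squeezes the distribution towards zero, so a higher-rated pipsqueak is both faster and more consistent in this toy model.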

Having the laptime distributions for all entrants in a race makes it possible to figure out a formula that can be used to, in principle, numerically compute victory and top-n probabilities against its set of pipsqueaks. In practice, it turns out that victory probabilities aren't a good race strength metric, as the results tend to be largely determined by a small handful of pipsqueaks with very high ratings. To my eyes, the top-5 probabilities are at the sweet spot for strength estimations. I originally believed calculating the probabilities by numerical integration would be too computationally expensive (as the number of integrals to be numerically calculated grows combinatorially as the n in top-n grows), so I used the alternative strategy of simulating the races and afterwards checking how often top-5 results happen. The chart at the top of the post was generated after 100,000 runs per race, that is, twenty-three million and six hundred thousand runs to cover all ZakStunts races, which took fifteen and a half minutes to perform on my laptop. Later, I figured out that, with sufficiently careful coding, the numerical method, which has the advantage of giving essentially exact results, is feasible for top-5 probabilities; accordingly, an alternative Excel file with those results is also attached. (The simulations remain useful for wider top-n ranges, or for quickly obtaining coarse results with 1,000 to 10,000 runs per race while tuning the analysis parameters.)
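Both routes to the top-5 probability can be sketched in a few lines of Python. The sketch below again assumes exponential laptime gaps as a stand-in for the real performance model, and all names are hypothetical. The simulation counts how often fewer than n entrants beat the probe pipsqueak; the integration substitutes u = exp(-probe_rate * t), which is uniform on (0, 1), and uses a small dynamic programme over the entrants instead of enumerating combinations. Reading the strength as the reciprocal of the probability follows the "1 in 70" interpretation given earlier.

```python
import random


def rate(rating, anchor=1500.0):
    """Exponential rate for a rating (assumed stand-in model, arbitrary anchor)."""
    return 10.0 ** ((rating - anchor) / 400.0)


def simulate_top_n(entrants, probe=1500.0, n=5, runs=20_000, seed=0):
    """Monte Carlo estimate of P(probe finishes in the top n)."""
    rng = random.Random(seed)
    probe_rate = rate(probe)
    rates = [rate(r) for r in entrants]
    hits = 0
    for _ in range(runs):
        probe_time = rng.expovariate(probe_rate)
        faster = sum(rng.expovariate(lam) < probe_time for lam in rates)
        hits += faster < n
    return hits / runs


def integrate_top_n(entrants, probe=1500.0, n=5, steps=2_000):
    """Midpoint-rule integral of P(at most n-1 entrants beat the probe)."""
    probe_rate = rate(probe)
    exponents = [rate(r) / probe_rate for r in entrants]
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) / steps  # u = exp(-probe_rate * t), uniform on (0, 1)
        # dist[j] = probability that exactly j entrants beat the probe, j < n
        dist = [1.0] + [0.0] * (n - 1)
        for e in exponents:
            q = 1.0 - u ** e  # chance that this entrant beats the probe
            for j in range(n - 1, 0, -1):
                dist[j] = dist[j] * (1.0 - q) + dist[j - 1] * q
            dist[0] *= 1.0 - q
        total += sum(dist)
    return total / steps


def strength(probability):
    """Race strength read as '1 in X' odds of a top-n finish."""
    return float("inf") if probability == 0 else 1.0 / probability


# Usage with a made-up field of ten entrants.
field = [1900, 1800, 1750, 1700, 1650, 1600, 1550, 1500, 1450, 1400]
print(simulate_top_n(field), integrate_top_n(field))
```

The cost of the integration grows with steps × entrants × n rather than combinatorially, which is one way (not necessarily the one used in the actual program) to keep the exact calculation affordable.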

To turn the discussion back to sporting matters, the troubles with wild rating swings and outlier results alluded to above brought me back to the question of listfillers, already raised by Bonzai Joe all those years ago. Left unchecked, a particularly weak listfiller in a busy race can wreak havoc upon the Elo rating of its unfortunate author. That ultimately compelled me to look for objective criteria according to which at least some of the obvious listfillers can be excluded. For the current purposes, I ultimately settled on the following three rules:

  • Results above 300% of the winning time and more than two standard deviations away from the average of laptimes are to be excluded. (The Bonzai Joe rule.)
  • GAR and NoRH replays are only counted if the fastest lap on the parallel scoreboard they belong to is, or would be, above the bottom quarter (rounding towards the top) of the scoreboard. (The Marco rule.)
  • For our current purposes, a car is deemed "competitive" if it can be found above the bottom quarter (rounding towards the top) of the scoreboard, or if it was used to defeat a pipsqueak using a competitive car whose lap was not excluded according to the previous two rules. Only laps driven with competitive cars count. (The Alan Rotoi rule.)
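The first rule in the list is simple enough to sketch in Python. Whether the standard deviation is the population or the sample one, and whether the suspect lap itself is included in the average, are not stated in the post; the sketch assumes the population standard deviation over all laptimes, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def bonzai_joe_excluded(laptimes):
    """Return the laptimes (in seconds) excluded by the first rule:
    above 300% of the winning time AND more than two standard
    deviations away from the average of all laptimes."""
    winning = min(laptimes)
    average = mean(laptimes)
    spread = pstdev(laptimes)  # assumption: population standard deviation
    return [
        t for t in laptimes
        if t > 3.0 * winning and abs(t - average) > 2.0 * spread
    ]


# Usage with made-up scoreboards: a lone outlier is excluded...
print(bonzai_joe_excluded([60, 62, 63, 65, 66, 70, 71, 75, 80, 400]))
# ...while slow laps on a broadly spread scoreboard are kept.
print(bonzai_joe_excluded([60, 90, 120, 150, 180, 200, 220]))
```

The second call shows the purpose of the extra condition: laps beyond the 300% cutoff survive when the whole scoreboard is widely spread.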

These rules were applied to the full list of ZakStunts race entries that I'm using (it was one of those quarantine days back in May). Disqualified race results were also removed; there were a few curious findings in that respect I should write about one of these days. (By the way, ghosts are not included in the calculations, regardless of what their race entries look like.)

(A footnote: the bar for applying the first rule above looks, at first, incredibly low. I considered using a lower percentage, like 250%, and would rather not have the frankly bizarre additional standard deviation condition. It turns out, however, that ZCT029, a difficult dual-way full Vette PG track from 2003, had an extraordinarily broad spectrum of laptimes, including several pipsqueaks with non-listfiller laps beyond the 300% cutoff which would have been excluded without the standard deviation test. Faced with such a peculiar scoreboard, I opted to err on the side of circumspection.)

It remains a tall order to find objective criteria to discard listfillers that won't exclude too many proper competitive laps as a collateral effect. Ultimately, if we were to establish new pipsqueak rankings, I suspect different use cases would call for different kinds of ratings. An Elo-like ranking is appropriate for race strength estimations, simulations and predictions, when what is needed is a picture of how well someone is racing at a specific moment in time. For comparing performances within the last several months, though, a ranking of weighted (for instance, by recency or race strengths) race scores within a time window, in the style of SWR, might prove more appropriate. With this kind of ranking, it becomes reasonable to have, for instance, ZakStunts-style worst results discards, which could definitely help dealing with listfillers.

Anyway, by now I probably should stop rambling, at least for a little while :D Questions, comments, criticism, suggestions about the metric and ideas on cool stuff to do with those algorithms are all welcome!

General Chat - ZSC / Car bonus rule change impact review
« on: December 24, 2020, 09:09:37 PM »
After my post on the 2020 LTB rule changes, I felt it might be of some use to also have a look at the effects of the 2019 tweaks to the car bonus system. I remember being a little worried back then about the slower bonus changes possibly unbalancing the rotation in favour of powergear cars. Let's see how that panned out.

Here is a summary of top 10 car usage for the 2017 and 2018 seasons, before the changes:

Car    Points 1-6  Points 1-10  1st  2nd  3rd  Podiums  Full Podiums  Occurrences  Notes
JAGU           67          299    2    3    2        3             1           33
ZF40           62          253    2    2    3        3             2           26
AUDI           56          216    2    2    2        2             2           18
CDOR           56          214    2    2    3        3             2           22
LM02           52          208    2    2    2        2             2           23
ANSX           42          138    2    2    1        2             1           10  Powergear
P962           41          151    2    1    2        2             1           14
COUN           34          126    2    1    1        2             1           12
ZPTR           26          109    1    1    1        1             1           12  2017 only
LANC           26          100    1    1    1        1             1            9
ZMP4           26           98    1    1    1        1             1            8  2018 only
PC04           26           95    1    1    1        1             1            9
NSKY           26           95    1    1    1        1             1            8
PMIN           26           94    1    1    1        1             1            7  Powergear
FGTO           26           71    2    1    0        2             0            5  Powergear
VETT           16           76    0    1    1        1             0            9  Powergear
RANG            9           46    0    1    0        1             0            6  2018 only
ZLET            7           27    0    0    1        1             0            2  2017 only

Total         624         2416   24   24   24       30            18          233

("Points 1-6" are assigned according to the scoring system F1 used in the 90's, 10-6-4-3-2-1, while "Points 1-10" use the current F1 system, 25-18-15-12-10-8-6-4-2-1.)

And here is the same data for 2019 and 2020:

Car    Points 1-6  Points 1-10  1st  2nd  3rd  Podiums  Full Podiums  Occurrences  Notes
PMIN           62          201    4    1    2        5             1           15  Powergear
P962           61          278    2    1    3        3             1           34
PC04           60          257    2    2    2        2             2           30
LANC           52          209    1    3    3        3             1           21
SUKA           46          173    2    2    1        2             1           18
RANG           45          178    2    2    1        2             1           18  2019 only
ANSX           45          163    2    1    2        2             1           14  Powergear
DBMW           37          180    0    2    3        3             0           22  2019 only
VETT           37          127    2    1    1        2             1            9  Powergear
DAUD           29          102    2    0    1        3             0            9
AUDI           29           90    1    2    1        2             0            6
COUN           27           95    1    2    0        2             0            8
ZTST           25          104    1    1    1        1             1           11  2020 only
ZF40           21           71    1    1    1        1             1            6
JAGU           16           73    0    1    1        1             0            7
LM02           16           65    1    0    0        1             0            7
DMCB           10           34    0    1    1        1             0            3  2020 only
FGTO            6           24    0    1    0        1             0            2  Powergear

Total         624         2424   24   24   24       37            11          240

While usage of powergear cars did increase, it turns out they were being underused before the changes, with the higher frequency of split podiums involving them perhaps masking that. The current scenario is arguably closer to parity, with PG cars having gone from 17.6% to 24.0% of the points in the 1-6 system. In any case, we are definitely not in 2008 territory, and I don't think there will be too many complaints if we keep getting an extra Indy-amenable race per year. (Back in 2008, as we tried to figure out how to make the bonus system run smoothly, the rotation was a touch too slow, resulting in nine (!) powergear car victories, including five all-PG podiums.)
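For anyone wanting to retrace the percentages, the arithmetic is easy to reproduce in Python from the rows marked "Powergear" in the two tables. The variable and function names are made up for the example; the numbers are the "Points 1-6" values from the tables.

```python
# Points for positions 1 to 6 under the 90's F1 scale used in the tables.
POINTS_1_6 = [10, 6, 4, 3, 2, 1]
RACES_PER_PERIOD = 24

# "Points 1-6" of the cars marked as Powergear in each table.
PG_2017_2018 = {"ANSX": 42, "PMIN": 26, "FGTO": 26, "VETT": 16}
PG_2019_2020 = {"PMIN": 62, "ANSX": 45, "VETT": 37, "FGTO": 6}


def pg_share(pg_points, total=sum(POINTS_1_6) * RACES_PER_PERIOD):
    """Percentage of all 1-6 points scored with powergear cars."""
    return 100.0 * sum(pg_points.values()) / total


# 24 races at 26 points each give the 624 shown in the Total rows.
print(sum(POINTS_1_6) * RACES_PER_PERIOD)
print(round(pg_share(PG_2017_2018), 1), round(pg_share(PG_2019_2020), 1))
```

That is 110 out of 624 points before the changes and 150 out of 624 after them.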

Besides the matter of powergear, another key aspect to look into is diversity of car choices within a race, as increasing it was taken as a goal in the pre-2019 discussions. The data confirms the revised system passed this test with flying colours. We went from 6 mixed podiums over the 24 races of 2017 and 2018 (with 4 of them involving PG cars) to 13 over the following 24 races (6 of them with PG cars). I'd say the current amount of multi-car races is close to ideal: on the one hand, having it way below half of the races, as it used to be a few years ago, felt like a missed opportunity; on the other hand, I suspect having it way above half could be a tad exhausting, given the increased strategical complexity.

Season 2019 / 2019-6 (White Leaf Desert)
« on: August 07, 2019, 01:09:14 AM »
We could do with a race thread right now, so here is one  :) The track is attached here, in case you need to download it before the site goes up again; the car for the race is the Jaguar.

I will probably have a second session sometime soon. I feel like I should give all those other paths a try...

General Chat - ZSC / Mid-00's NoRH ZakStunts replays
« on: August 03, 2019, 08:46:18 PM »
Back in 2006 and 2007 (and possibly in some earlier seasons), ZakStunts races sometimes had an unofficial NoRH scoreboard. For instance, I remember having taken part in a classic line NoRH parallel race in Z79/Default (you can find discussion of it in the Z79 forum thread, as well as in the December 2007 shoutbox archive); also, elsewhere on the forum Bonzai Joe mentions someone having driven a 1:16 NoRH lap on Z59.

Do those NoRH replays (not just the ones I mentioned) still exist anywhere?

General Chat - ZSC / Custom car bonuses and rotation
« on: July 28, 2019, 08:08:48 PM »
Something I noticed after the Z217 update: For the sake of peace of mind, I have docked six points from the Indy, and four points from the Ranger. While doing so, I had assumed that custom bonus changes would be reverted at the end of the race; however, looking at the bonus changes around Z212 (which had custom +6 for Carrera and +4 for Acura) shows that is not the case. In this case, I certainly didn't mean to set back the rotation of the Indy by six points.

I'm genuinely unsure whether we should change that behaviour. In any case, it is worth being aware of that if you are going to tweak bonuses.

Competition 2019 / ZCT217 - Pacific Coast Highway
« on: July 28, 2019, 01:06:17 PM »
Welcome to my guest track for this season! Compared to Z193, my track from two years ago, this one is a lot shorter (I cut off a ton of stuff that wasn't working well in between the drafts), and probably quite a bit easier as well. Still, I hope that finding the best lines will require a fair amount of exploration.

Competition 2019 / ZCT212 analysis
« on: April 07, 2019, 09:00:01 PM »
Top 4 analysis for ZCT212:

Cumulative times at the end of each sector (s):

0   0.00   0.00   0.00   0.00
1  22.30  22.55  22.70  22.70
2  39.20  39.45  39.95  40.25
3  53.60  54.20  54.75  55.55
4  60.10  60.65  62.15  61.95
5  68.55  68.90  71.15  70.15
6  81.60  82.50  85.90  86.65

Sector times (s):

0   0.00   0.00   0.00   0.00
1  22.30  22.55  22.70  22.70
2  16.90  16.90  17.25  17.55
3  14.40  14.75  14.80  15.30
4   6.50   6.45   7.40   6.40
5   8.45   8.25   9.00   8.20
6  13.05  13.60  14.75  16.50

Cumulative gaps to the first column (s):

0   0.00   0.00   0.00   0.00
1   0.00   0.25   0.40   0.40
2   0.00   0.25   0.75   1.05
3   0.00   0.60   1.15   1.95
4   0.00   0.55   2.05   1.85
5   0.00   0.35   2.60   1.60
6   0.00   0.90   4.30   5.05

Two remarks. Firstly, this was a tough track, with multiple spots where one subtle misstep (for instance, a fumbled gear change at the loopcut) could derail one's plans entirely. Carrera tracks do have a penchant for being unforgiving, one (in)famous example being Z86. Secondly, the final cut was absolutely decisive, as comparing FinRok's dream line by the first roof and Marco's spotless take on the second tunnel with everything else reveals. In particular, I have lost a podium place by not realising quite how much that would matter (my purple sectors 4 and 5 weren't worth much when followed by a somewhat careless final cut -- in fact, I had to abandon a partial replay with even better sectors 4 and 5 because I couldn't pull off a sufficiently good sector 6 before the deadline).
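The second and third tables of this analysis follow from the first by simple subtraction, which a short Python sketch can reproduce. The cumulative times are copied from the first table; the function names are made up for the example, and columns are referred to by position since the tables carry no headers.

```python
# Cumulative times (s) at the end of sectors 0..6, one column per replay.
CUMULATIVE = [
    [0.00, 0.00, 0.00, 0.00],
    [22.30, 22.55, 22.70, 22.70],
    [39.20, 39.45, 39.95, 40.25],
    [53.60, 54.20, 54.75, 55.55],
    [60.10, 60.65, 62.15, 61.95],
    [68.55, 68.90, 71.15, 70.15],
    [81.60, 82.50, 85.90, 86.65],
]


def sector_times(cumulative):
    """Time spent in each sector: difference between consecutive rows."""
    return [
        [round(after - before, 2) for before, after in zip(prev, cur)]
        for prev, cur in zip(cumulative, cumulative[1:])
    ]


def gaps_to_leader(cumulative):
    """Cumulative gap of every column to the first (winning) column."""
    return [[round(t - row[0], 2) for t in row] for row in cumulative]


def fastest_column(sectors, sector_number):
    """Zero-based index of the column with the best time in a sector (1-based)."""
    times = sectors[sector_number - 1]
    return times.index(min(times))


sectors = sector_times(CUMULATIVE)
print(sectors[3], fastest_column(sectors, 4))  # sector 4 times and best column
print(gaps_to_leader(CUMULATIVE)[-1])          # gaps at the finish line
```

Looking up the fastest column per sector is also a quick way to spot the "purple" sectors mentioned above.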

Competition 2019 / ZCT211 analysis
« on: March 19, 2019, 03:11:49 AM »
Here is a top 6 analysis for last month's race -- it's been a while since we last did that, and also it is a nice opportunity to try it out with multiple cars:

(From the second table on, Audi times were corrected by the car bonuses to fit the Ranger ones.)

Cumulative times as driven (s):

0    0.00    0.00    0.00    0.00    0.00    0.00
1   41.15   41.95   34.60   43.25   34.95   35.25
2   74.25   75.05   59.50   77.90   62.70   63.70
3  109.80  109.95   88.25  113.75   91.95   93.70
4  125.60  126.65  101.15  130.85  105.00  106.90
5  148.80  149.75  118.75  156.15  123.85  126.80

Correction factors applied to each column:

     1.00    1.00    1.29    1.00    1.29    1.29

Corrected cumulative times (s):

0    0.00    0.00    0.00    0.00    0.00    0.00
1   41.15   41.95   44.49   43.25   44.94   45.32
2   74.25   75.05   76.50   77.90   80.61   81.90
3  109.80  109.95  113.46  113.75  118.22  120.47
4  125.60  126.65  130.05  130.85  135.00  137.44
5  148.80  149.75  152.68  156.15  159.24  163.03

Corrected sector times (s):

0    0.00    0.00    0.00    0.00    0.00    0.00
1   41.15   41.95   44.49   43.25   44.94   45.32
2   33.10   33.10   32.01   34.65   35.68   36.58
3   35.55   34.90   36.96   35.85   37.61   38.57
4   15.80   16.70   16.59   17.10   16.78   16.97
5   23.20   23.10   22.63   25.30   24.24   25.59

Corrected cumulative gaps to the first column (s):

0    0.00    0.00    0.00    0.00    0.00    0.00
1    0.00    0.80    3.34    2.10    3.79    4.17
2    0.00    0.80    2.25    3.65    6.36    7.65
3    0.00    0.15    3.66    3.95    8.42   10.67
4    0.00    1.05    4.45    5.25    9.40   11.84
5    0.00    0.95    3.88    7.35   10.44   14.23

Some of the highlights:
  • Sectors 1 (GAR-esque) and 3 (flat out) do suggest a Ranger advantage under those bonuses.
  • Other sectors, however, gave the Audi pipsqueaks enough margin to (barely) remain in contention. There was the high speed outer line in sector 5, and, in my lap, one extra shortcut in sector 2 that wouldn't be as useful with the Ranger due to the lower attainable speeds.
  • While one would be forgiven for thinking nothing interesting would happen in sector 3, it turns out that Seeker managed to claw back almost the whole gap to Marco there, centisecond by centisecond.
  • Following that, Marco managed to fend off the challenge with a quite excellent line through sector 4 (the first dual-way sector).
  • Marco's lap also features an interesting inner dual-way cut through sector 5. Among the top 6, the only one to attempt a somewhat similar line was Heretic -- the alternative provided by the outer line, though, meant the inner line was rather less effective with the Audi than with the Ranger.
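As a side note on the bonus correction used in the tables, the corrected Audi times can be reproduced with a few lines of Python. The tables show the factor rounded to 1.29; the value 9/7 (about 1.2857) used below is an assumption, picked because it reproduces the corrected figures to the centisecond, and how it follows from the actual bonus percentages is not covered here. The names are hypothetical.

```python
# Assumed unrounded correction factor (shown as 1.29 in the tables).
AUDI_TO_RANGER = 9 / 7

# Cumulative times (s) of the third column, as driven with the Audi.
AUDI_AS_DRIVEN = [34.60, 59.50, 88.25, 101.15, 118.75]


def correct_times(times, factor=AUDI_TO_RANGER):
    """Scale as-driven times so they are comparable with the Ranger ones."""
    return [round(t * factor, 2) for t in times]


print(correct_times(AUDI_AS_DRIVEN))
```

Multiplying by the rounded 1.29 instead would give 44.63 for the first value rather than the 44.49 in the table, which is why the sketch uses the unrounded factor.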

Stunts Chat / Keyboard issues and ghosting
« on: November 30, 2018, 01:37:12 PM »
Folks sometimes report not being able to press some of the keys needed to drive simultaneously (I remember BJ talking about that in the past). Such issues are, in all likelihood, a problem with the keyboard itself, and not with the operating system or with the configuration of any software. The heart of the matter is that, unless you are using one of those fancy mechanical keyboards plugged into a PS/2 port, there will be an upper limit on how many simultaneous key presses can be registered (a limit which might vary depending on which parts of the keyboard you are using). Some cheaper keyboards are bad enough in that respect that they are unable to handle enough simultaneous key presses for Stunts to be played properly. The keyboard of my newish laptop, for instance, has this issue, and so I still need my old desktop keyboard to race. The keyword to use when looking for more information about this issue, or for choosing a keyboard adequate for Stunts, is "ghosting". This page provides an interactive test for your keyboard. In my case, it reveals my laptop keyboard can't handle "Down Arrow", "Right Arrow" and "Z" simultaneously, which is clearly unacceptable.

Motor sports, Racing / Formula One in 2019
« on: November 22, 2018, 11:46:40 AM »
Robert Kubica will be racing for Williams in 2019! Robert stallin Kubica!!   :o :) 8)

Competition 2018 / Z209 - pArAnOiA
« on: November 20, 2018, 04:51:12 AM »
A bona fide CTG arena track! This is auspicious. Later I will try GAR -- it is a must, given the name.

Stunts Forum & Portal / Competition Archive 2019
« on: November 12, 2018, 04:42:35 AM »
After months of procrastination and computer woes, I can finally announce that there will be a 2018 update to the Competition Archive. The plans are for it to be ready before the new year. Here are the competitions the update should cover (please do tell me if I forgot anything!):

  • ZakStunts (2016-2018)
  • ZakStunts GAR (2016-2018)
  • ZakStunts NoRH (2016)
  • R4I (2016)
  • R4K (2016-2018)
  • LeStunts (2017; Dorsal and Second)
  • Superkart Special Event (2010)

I think almost all of those either are somewhere in my computer already, or are otherwise easy to retrieve. In any case, I will ask you folks if anything turns out to be missing.

On an additional note, another thing I would like to do is setting up some architecture for the Archive more resilient than a bunch of folders, textfiles and spreadsheets loosely correlated with each other. I'm not sure I will have time to play with that before the release, though.

Competition 2019 / Cars and rules for 2019
« on: September 23, 2018, 02:25:00 AM »
Let's get it rolling! On the subject of cars, I would like to begin by making a pitch for two of them:
  • Superkart: I like the Kart, and would enjoy seeing it again at ZakStunts. If it comes up at an insufficiently twisty track during the season, we can always slap a -10% penalty on it. If you want to try it for yourself, head to the current race at Cas' competition.
  • Toyota Sprinter Trueno: It would be good to give one of the Toyotas a chance to shine. I feel the Trueno is a more interesting car than the Corolla, as due to its low real top speed it is closer to the LM002 than to all the other mid-slow cars. There is one problem: the Trueno has extra off-road grip similarly to the Lotus, but exaggerated to the point it becomes more bug than feature. However, as the Trueno was never used in a competition, it would be straightforward to prepare and distribute a bugfix release -- I will do just that if we decide to use it next season.
As both of my candidates are slow cars, it sounds sensible to bench the Skyline this time if either or both of them are approved for 2019.

There are all those other cars to consider, of course, and perhaps I'm overlooking some great candidate (you can find all of them, or at least the released ones, at Southern Cross). So please have your say  :)

Competition 2017 / ZCT193 - No Imagination
« on: August 07, 2017, 11:40:57 AM »
Behind the scenes: not only is this not the track I originally intended to make when I asked for the guest spot, it isn't even what it was supposed to be when I began working on it a couple days ago -- when I laid the first tile it was meant to be a USC-style "arena" track, but I never seem to get those right. In any case, it is a slow one. Hopefully not too slow  ::) :)

Competition 2017 / ZCT192 - Freestyle Fever
« on: August 07, 2017, 11:30:36 AM »
This track looked interesting -- too bad I managed to race so little through the month that I couldn't even explore weirder dual-way lines... time to watch the replays, I guess :)
