
Topics - Duplode

#1
Custom Cars with Stressed / Trying out Ryoma's cars
December 27, 2021, 02:55:11 AM
In the Cars and rules for 2022 topic, there was a consensus for having at least one of Ryoma's cars in the next season. For that to work smoothly, however, we need to do some testing in order to make an informed choice. To get things started, I will (try to) test one car from Ryoma's Mega every day, and report the results here. If some of you join me in doing that, I'm sure we'll get to figure out suitable choices in no time.

Here are the cars tested so far:



To begin with, here are some words on the Ferrari 456 GT (forum topic):

  • Gears: 6
  • Powergear: No
  • Flat track top speed: 196 mph
  • Real top speed: 218 mph
  • 0-60 mph: 5.1 s
  • Time to hill at Default (auto gears): 12.55 s (References: Countach - 12.40 s; Skyline - 12.45 s; Acura - 12.70 s)
  • Default test lap (NoRH, classic line): 1:32.45
  • First impressions: A moderately fast sports car, whose closest match is probably the Skyline (the 456 should have the upper hand on faster tracks, though). Next to other cars from that class, its handling feels nice and responsive, though there doesn't seem to be much extra grip, and it is easy to get overconfident on the corners.
#2
General Chat - ZSC / Race positions dataset
October 30, 2021, 08:36:20 PM
Extracting the race positions from all ZakStunts scoreboards is not a straightforward matter, even with database access. The positions are computed from the replays rather than stored, and there are all sorts of corner cases to look out for -- name changes, draws, disqualifications, and so forth. For my ratings and race strengths project, I opted to review all the scoreboards and prepare the dataset by hand. A CSV file with the data is available at my project's GitHub repository (attached below is a version of the same file as of ZCT243).

The Track, Racer and Rank columns of the CSV contain what they say on the tin. Pipsqueak aliases and name changes have already been unified. The Ghost column value is 1 if the pipsqueak is a known ghost, and 0 otherwise. Lastly, the Status column records the following occurrences about a race entry:

  • DSQ: Disqualification, from a 00's ZakStunts race (back then, disqualifications were handed out by setting the laptime to 9:99.99 rather than deleting the race entry outright).
  • INV: Invalid replay, discovered upon review.
  • MSC: CTG's 2014 races.
  • EX1: Laptime beyond the "300% and 2 SD" cutoff.
  • EX2: Lap driven under an alternative ruleset, such as GAR.
  • EX3: Lap driven with a clearly uncompetitive car.

Edit, 21:53 ZakStunts time: I have updated the attached file after re-adding a few 2014 season status notes I had accidentally deleted.
#3
In my previous thread on pipsqueak performance modelling, I reported optimistically on modelling lap times over repeated attempts with a gamma distribution. Back then, all I had for empirical evidence was a set of lap times driven in a minimal four-corners track. To better understand the model and the problem space, it would make sense to gather data on more realistic conditions. Soon enough, the perfect opportunity came up with Alpine, R4K's October race: great car, great (and not too difficult) track, and a real OWOOT NoRH race going on.

From October 11th to the deadline day, I had one (occasionally two) NoRH session on Alpine every day, and recorded all of my valid lap times. I tried my best to keep a consistent driving style, going for the most effective reproducible racing line I knew of, and not giving up on laps unless I crashed or left the track. The following set of box plots shows what my lap times on each session looked like (for a closer look at the data, you can look at the attached CSV file):



I would divide my racing into three main stages:

  • On sessions 1 to 5, I was still learning both car and track. In particular, I got under 83 seconds after figuring out a good line through the second and third corners, and under 82 seconds by understanding how the second fast chicane should be approached.
  • From sessions 6 to 14, there was largely stability, with perhaps some very slow improvement, culminating with an 81.00 on session 14.
  • On sessions 15 to 18, there was modest but marked improvement, which I attribute to either realising I could be a touch more aggressive with my lines upon rewatching the laps I had posted to R4K, or to being in better shape by no longer doing late night sessions. I drove my final R4K time of 80.55 early on session 15. After that, I only managed one more lap under 81 seconds: an 80.75 on session 18.

With the lap times at hand, the next step is attempting gamma fits on stretches in which I had consistent performance. A good place to start could be sessions 15-18, in which I was perhaps closer to my best. Here is a first attempt:

Fitting of the distribution ' gamma3 ' by maximum likelihood
Parameters :
        estimate Std. Error
shape  5.5682015 1.79934095
scale  0.3520632 0.07724456
thres 80.2265133 0.24903711
Loglikelihood:  -115.9133   AIC:  237.8267   BIC:  245.612
Correlation matrix:
           shape      scale      thres
shape  1.0000000 -0.9492797 -0.9041436
scale -0.9492797  1.0000000  0.7526068
thres -0.9041436  0.7526068  1.0000000


A recap on what the gamma parameters mean in our context:

  • thres, short for threshold, is the predicted ideal time, the one the model assumes it is impossible to go below.
  • shape indicates how hard it is to improve as you get closer to the ideal time. It is associated with track length and track/line difficulty.
  • scale reflects how consistently the pipsqueak manages to drive. Smaller values make for a narrower gamma curve, squeezed closer to the ideal time.
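To make the roles of the three parameters concrete, here is a minimal Python sketch of the density behind the fit. It assumes gamma3 is an ordinary gamma distribution shifted by thres; the function name gamma3_pdf is made up for illustration, and the numbers are the estimates from the fit above.

```python
import math

def gamma3_pdf(t, shape, scale, thres):
    """Density of a gamma distribution shifted so that it starts at thres."""
    gap = t - thres  # time lost relative to the ideal lap
    if gap <= 0:
        return 0.0   # the model rules out laps at or below the ideal time
    log_pdf = ((shape - 1) * math.log(gap) - gap / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

# Estimates from the sessions 15-18 fit above
shape, scale, thres = 5.5682015, 0.3520632, 80.2265133

# Most likely lap time (the peak of the curve), valid for shape > 1
mode = thres + (shape - 1) * scale

# Crude midpoint-rule check that the density integrates to about 1
step = 0.001
area = sum(gamma3_pdf(thres + (i + 0.5) * step, shape, scale, thres)
           for i in range(20000)) * step
print(f"mode = {mode:.2f} s, area under the curve = {area:.4f}")
```

Lowering scale pulls the peak towards thres, while raising shape pushes it away and makes the left side of the curve steeper.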
While the diagnostic plots show the gamma fit is pretty good, the parameter values are all over the place, as the following bootstrap confidence intervals indicate:

Nonparametric bootstrap medians and 95% percentile CI
          Median      2.5%      97.5%
shape  5.2026795  1.965190 10.9580942
scale  0.3649339  0.222748  0.6145803
thres 80.2609610 79.615182 81.0369308

Ultimately, it is a bit much to try fitting those three parameters at once with the available amount of data; there is too much wiggle room. From the three parameters, the one we are in a better position to estimate by other means is thres. The sum of the best sectors over the fifteen laps I had saved for posting to R4K is 80.05; that being so, 80.00 is a reasonable, if conservative, ideal lap estimate. Fixing thres to 80.00 results in the following gamma fit over sessions 15-18:
Fitting of the distribution ' gamma3 ' by maximum likelihood
Parameters :
       estimate Std. Error
shape 7.0919246 0.98618134
scale 0.3083916 0.04444141
Fixed parameters:
      value
thres    80
Loglikelihood:  -116.1671   AIC:  236.3341   BIC:  241.5244
Correlation matrix:
           shape      scale
shape  1.0000000 -0.9650934
scale -0.9650934  1.0000000


The 95% confidence intervals look a fair bit tamer now:
Nonparametric bootstrap medians and 95% percentile CI
         Median      2.5%      97.5%
shape 7.2104854 5.4284800 10.1456150
scale 0.3034129 0.2069173  0.4084368
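The fix-the-threshold-then-bootstrap procedure can be sketched in a few lines of Python. Note the swaps: the fits quoted here use maximum likelihood, while this sketch uses the simpler method of moments, and the lap times are synthetic draws rather than the Alpine data. The function names are made up for illustration.

```python
import random
import statistics

def fit_gap_moments(laps, thres):
    """Method-of-moments shape and scale for the gaps to a fixed threshold."""
    gaps = [t - thres for t in laps]
    mean = statistics.fmean(gaps)
    var = statistics.variance(gaps)
    return mean * mean / var, var / mean  # (shape, scale)

def bootstrap_ci(laps, thres, rounds=1000, seed=0):
    """Nonparametric bootstrap medians and 95% percentile intervals."""
    rng = random.Random(seed)
    shapes, scales = [], []
    for _ in range(rounds):
        # Resample the laps with replacement and refit
        resample = rng.choices(laps, k=len(laps))
        shape, scale = fit_gap_moments(resample, thres)
        shapes.append(shape)
        scales.append(scale)

    def summarise(values):
        values.sort()
        lo = values[int(0.025 * len(values))]
        hi = values[int(0.975 * len(values)) - 1]
        return statistics.median(values), lo, hi

    return summarise(shapes), summarise(scales)

# Synthetic laps, NOT the Alpine data: 100 draws from a gamma with
# shape 7.1 and scale 0.31, shifted by an ideal time of 80.00
rng = random.Random(42)
laps = [80.0 + rng.gammavariate(7.1, 0.31) for _ in range(100)]

print("shape, scale:", fit_gap_moments(laps, 80.0))
print("bootstrap (median, 2.5%, 97.5%):", bootstrap_ci(laps, 80.0))
```

With only a hundred laps the intervals stay fairly wide even with thres fixed, which is in line with what the bootstrap output shows.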


While thres is the easiest parameter to estimate through different means, the one it would arguably be more interesting to keep fixed is shape, as it is supposed to be the one most closely associated with the nature of the track. Given the more believable estimate of shape we have just obtained for sessions 15-18, it is worth trying fits for other sets of sessions with shape fixed at 7.1. Here is a fit for sessions 6-9...

Fitting of the distribution ' gamma3 ' by maximum likelihood
Parameters :
        estimate Std. Error
scale  0.4312479  0.0320152
thres 80.1501134  0.1939291
Fixed parameters:
      value
shape   7.1
Loglikelihood:  -142.3025   AIC:  288.605   BIC:  293.6915
Correlation matrix:
           scale      thres
scale  1.0000000 -0.8532844
thres -0.8532844  1.0000000


Nonparametric bootstrap medians and 95% percentile CI
          Median       2.5%      97.5%
scale  0.4276147  0.3703891  0.4924684
thres 80.1722529 79.8464586 80.5086957


... and 10-14:

Fitting of the distribution ' gamma3 ' by maximum likelihood
Parameters :
        estimate Std. Error
scale  0.3832479 0.02554952
thres 80.0632570 0.15585001
Fixed parameters:
      value
shape   7.1
Loglikelihood:  -171.2663   AIC:  346.5326   BIC:  352.1242
Correlation matrix:
           scale      thres
scale  1.0000000 -0.8591264
thres -0.8591264  1.0000000


Nonparametric bootstrap medians and 95% percentile CI
          Median       2.5%      97.5%
scale  0.3800133  0.3260546  0.4442169
thres 80.0765083 79.7532830 80.3593560


Keeping the shape fixed, the fits point to a noticeable reduction of the scale parameter (suggesting more confident and consistent driving) across the sets of sessions. The difference in thres (which might be ascribed to refinements of the driving line) is too small to be clearly meaningful.
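One way to read those numbers is to convert each fit into the mean and spread of lap times it implies. The sketch below only relies on standard gamma distribution facts (mean = shape × scale, standard deviation = sqrt(shape) × scale, both measured from the threshold); the parameter values are the estimates quoted above.

```python
import math

# (shape, scale, thres) estimates quoted above for each stretch of sessions
fits = {
    "6-9":   (7.1, 0.4312479, 80.1501134),
    "10-14": (7.1, 0.3832479, 80.0632570),
    "15-18": (7.0919246, 0.3083916, 80.0),
}

summary = {}
for sessions, (shape, scale, thres) in fits.items():
    mean_lap = thres + shape * scale   # gamma mean, shifted by the threshold
    spread = math.sqrt(shape) * scale  # gamma standard deviation
    summary[sessions] = (mean_lap, spread)
    print(f"sessions {sessions}: mean {mean_lap:.2f} s, sd {spread:.2f} s")
```

Both the implied mean lap and the implied spread shrink from one stretch to the next, which is the same trend the scale parameter shows.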

Parameter estimation for my proposed model is challenging: pinning down those three significantly correlated values without huge amounts of lap time data is nontrivial, and it can be tricky to extrapolate from one day to the next, let alone across different pipsqueaks or driving lines... On the flip side, the results of the experiment do suggest the gamma distribution is a reasonable model for lap times, and that it is worth it to keep pursuing this idea.
#4
Competition 2021 / ZCT244 - Crazy Eight
October 18, 2021, 03:41:01 AM
Z244 has a two-tile switching shortcut, which is illustrated by the attached diagram and Trueno lap. As per tradition (ZCT202, ZCT146, ZCT136, etc.), now it's that time in which we sit down and decide whether it should be forbidden.

My take: while this shortcut isn't as destructive as the ones seen in past cases (it affects less than half of the track, and might not even be advantageous with faster cars), I think it should be forbidden anyway (it is as ugly as any of those two-tile cuts, and ruins a perfectly good section of the track). For a more objective rationale, we might invoke the "it shouldn't be allowed to drive the same section of track in both directions" principle, originally suggested by afullo back in ZCT202.
#5
General Chat - R4K / R4K15 - Alpine
October 14, 2021, 05:42:44 AM
As some of you have already found out, this is a very enjoyable track, which fits the car and the rules like a glove. It's a great opportunity to try out the Corvette CERV III!

(Incidentally, this race is also perfect for carrying out my performance model experiments, so expect a lot of NoRH laps from me this month :D)
#6
Stunts Chat / Räcer performance modelling: an update
September 25, 2021, 06:39:25 AM
Here is a long-delayed progress update on my race strength estimation project, more specifically on its performance modelling side. Besides improving my ratings framework, the findings I will talk about here might even bring extra insight about racing in practice. How so? Let's find out!

What do you mean by "performance modelling"?

I will start with a recap of what I was busy with at the beginning of the year. The primary goal back then was setting up a metric for the strength of races. To a first approximation, how strong a race is depends on how strong its field of pipsqueaks is -- how many people take part, and how well they are racing at the time. The latter part can be dealt with through an Elo rating. However, while Elo ratings can be a decent indicator of the expected performance of pipsqueaks, they don't easily translate to a meaningful rating for race strengths. That's because the Elo system is based on head-to-head matches, while in a race everyone involved is competing at once. For that reason, my chosen strategy was using the Elo ratings to tune probabilistic models of how well each pipsqueak was likely to do in a certain race, and use said models to compute how well someone would be likely to do if entering that race.

To define the model, I used the one probability distribution with sensible characteristics that I happened to recall. It looks like this:



The horizontal coordinate is for lap times, in arbitrary units since concrete information about the tracks is not being dealt with. Time zero, at the leftmost edge of the plot, stands for the ideal lap, and so moving rightwards means increasing deviations from the perfect lap. The vertical coordinate is for the probability (strictly speaking, the probability density) of reaching each lap time. In the example plot above, there are distributions for two pipsqueaks. The red curve pipsqueak is expected (though not guaranteed) to perform better than the blue curve one, as their probability density is squeezed closer to zero overall. Given these two distributions, winning probabilities for red and blue can be obtained; with more pipsqueaks, it is also possible to calculate things like the probability of one pipsqueak reaching a top five result against the others (that is, losing to at most four of them).

It is also worth reviewing what I meant by this model having "sensible characteristics". Firstly, the probability density at zero time is zero, which means the perfect lap is taken as an unattainable ideal. Secondly, the drop leftwards from the peak reflects how improvement becomes increasingly hard as the perfect lap is approached. Thirdly, the gradual decay on the tail to the right of the peak expresses the range of possible mistakes that might affect a lap, even quite big ones.

OK, so what's new?

I managed to improve my approach a lot after learning that the model I had chosen is a gamma distribution. The chart below, taken from Wikipedia, shows a few example distributions from that family of probability distributions:


Gamma distribution pdf
Cburnett, CC BY-SA 3.0 <http://creativecommons.org/licenses/by-sa/3.0/>, via Wikimedia Commons


In particular, the blue curve here with k=3 is (save for a scaling factor) the same as the distributions I had been using (note the visual similarity with the curves from my example plot above). k, the shape parameter, controls the shape of the curve: distributions with larger k are more symmetrical around the peak, and drop off more sharply towards zero on its left. (The other parameter, theta, is the scale parameter, which controls how compressed or stretched out the curve is -- it is the one I let vary according to the Elo ratings, while keeping the shape parameter fixed.)

The Wikipedia examples also illustrate how setting k=1 collapses the distribution to a plain exponential. Exponential distributions could conceivably be used instead of the more plausible-looking ones with higher k that I chose. The interesting thing about that possibility is how the winning probabilities for a pair of pipsqueaks with exponential distributions of lap times look exactly like those in the Elo formulas, with the scale parameter being inversely proportional to 10^(R/400), where R is the Elo rating. Going the other way around, it is possible to replace the conventional Elo winning probability formula with the one for my favoured k=3 model and, with a few tweaks to the system to keep things in scale, have everything working as smoothly as before -- except without the need for an awkward conversion from the Elo ratings to a seemingly unrelated model, and with a much better understanding of what is going on.
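Here is a minimal Python sketch of that correspondence. The closed form for integer k comes from a standard identity for the beta distribution and is not necessarily the formula used in the ratings program; the scale_from_rating mapping is likewise an assumption, chosen so that the k=1 case reproduces the conventional Elo formula exactly.

```python
from math import comb

def elo_win_probability(rating_a, rating_b):
    """Conventional Elo expected score of A against B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def gamma_win_probability(scale_a, scale_b, k):
    """P(A laps faster than B) when both lap times are gamma distributed
    with the same integer shape k and scales scale_a and scale_b."""
    q = scale_b / (scale_a + scale_b)
    n = 2 * k - 1
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(k, n + 1))

def scale_from_rating(rating):
    """Assumed mapping: the scale shrinks by a factor of 10 per 400 points."""
    return 10 ** (-rating / 400)

a, b = 1700, 1500  # made-up ratings
print("Elo:      ", elo_win_probability(a, b))
print("gamma k=1:", gamma_win_probability(scale_from_rating(a), scale_from_rating(b), k=1))
print("gamma k=3:", gamma_win_probability(scale_from_rating(a), scale_from_rating(b), k=3))
```

With the same scales, the k=3 formula gives the stronger pipsqueak a larger edge than the k=1 (Elo) one, which is one reason why a switch to k=3 would need some retuning to keep ratings on a familiar scale.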

And why should any of that be taken seriously?

So far all I have reported here was me toying with equations, with nothing to show that the talk about performance models corresponds to reality. At this point, a little serendipity comes into play.

One of the Stunts experiments I wanted to do back in January was investigating the cornering speed differences between left and right turns. As alluded to elsewhere, I prepared two minimal four-large-corners tracks, one clockwise and one anticlockwise, and went about driving them repeatedly with the Corvette. After seeing no difference whatsoever over quite a few attempts, it dawned on me that I had been doing the tests with the 1990 Mindscape version for no particular reason (I happened to be playing on that version for the DOSReloaded.de competition at the time). Once I switched to 1991 Broderbund the difference immediately became obvious. In any case, at that point I already had recorded a few dozen lap times around those tracks on 1990 Mindscape, so I figured I might as well keep driving a few laps per day until I had enough data to see if the empirical distribution of lap times actually looks like a gamma distribution...

The plots below show the histogram of the lap times (224 completed laps -- attempts in which I left the track or crashed were discarded) and the fitting of a gamma distribution to it. In order to fit a gamma distribution to the data, it is necessary to guess what the ideal lap time is, so that the zero time on the model can be set. I have assumed 23.25 (which is 0.05 below the best I managed) as the ideal lap time, and so the "data"/"quantile" values on the plots should be read as time gaps to 23.25.



That looks fantastic! Incredibly, even the fitted value for the shape parameter was 3.35, which is pretty close to my almost arbitrary initial choice of k=3 for the model. (I should note that a fitted k further away from 3 wouldn't actually be a problem, for reasons I will soon get to; what truly matters is that a gamma distribution is a reasonable fit for the observed lap times.)

What else is there to investigate?

Given what I have learned so far about the model (through not only the investigation described here but also various attempts at tuning my rating system), I believe gamma distributions with shape parameter 3 form a good basis for an Elo-like ranking of pipsqueaks and a race strength metric derived from it. It remains a pretty bare bones model, and the limitations of using a single parameter as an evolving rating that stands for pipsqueak performances, which I had alluded to in my earlier post about race strengths, remain the same. Furthermore, in spite of the incredibly good results I obtained in my attempt to fit a gamma distribution to real data, there is a large gap between a series of single laps by a specific person on a very simple track and whatever happens when people take part in a real race. Depending on what one wants to use a performance model for, ways in which this gap might be narrowed become relevant. Here I will briefly discuss two aspects of this matter: tracks and repeated attempts.

When it comes to tracks, the Elo-like rating system I am working with is entirely indifferent: races are seen as events in which pipsqueaks compete and are ranked, with everything else, including the nature of the track on which they compete, being abstracted away. Turning our attention to the performance model, though, we might consider how the probability distribution of lap times might change according to the track. One way to approach that invokes a useful property of gamma distributions. Suppose we have, rather than a lap time distribution, gamma distributions for the section times that make up a lap, with the distributions having the same scale parameter but possibly different shape parameters. In that case, the lap time distribution will also be a gamma distribution, and its shape parameter will be the sum of the shape parameters for the sections. That points to a way of accounting for basic track characteristics in the performance model, namely using higher shape parameters for longer and/or harder tracks. As I mentioned earlier, higher shape parameters mean a sharper drop in the probability density as we move from the peak towards the ideal time, which intuitively reflects increased difficulty. (One way of thinking about this is seeing each track section as a potential source of mistakes which increase the lap time, and the overall lap time distribution as the result of combining these sources of mistakes.)
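The additivity property is easy to check by simulation. In the Python sketch below, the three section shapes and the shared scale are arbitrary made-up values; the check compares the mean and variance of the simulated lap gaps with those of a single gamma distribution whose shape is the sum of the section shapes.

```python
import random
import statistics

rng = random.Random(2021)
scale = 0.4                        # shared by every section (arbitrary value)
section_shapes = [1.5, 2.0, 3.5]   # hypothetical sections of one lap

# Simulate lap gaps as sums of independent section gaps
laps = [sum(rng.gammavariate(k, scale) for k in section_shapes)
        for _ in range(50000)]

total_shape = sum(section_shapes)
# A gamma with shape k and scale s has mean k*s and variance k*s^2
print("mean:    ", statistics.fmean(laps), "vs", total_shape * scale)
print("variance:", statistics.variance(laps), "vs", total_shape * scale**2)
```

Matching two moments is of course weaker than matching the whole distribution, but it is a quick sanity check that independence of the sections (assumed in the simulation) gives the expected result.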

As for the matter of repeated attempts, we know that even in a NoRH race what reaches the scoreboard is not the result of a single attempt, but the best result from a set of completed laps. That being so, if we want to use the performance model to think about, for instance, racing strategy, it would make sense to consider, given a single attempt lap time distribution, what the distribution for the best lap time out of a number of completed attempts would look like. While I haven't worked out the details yet, playing with the formulas suggests the resulting distribution is very close to, though not quite precisely, a gamma distribution. As the number of attempts gets larger, the distribution becomes more symmetrical -- much like it happens when the shape parameter grows in a gamma distribution -- and gets squeezed towards zero -- as expected, as repeated attempts are supposed to improve the results! How fast do the results improve as the number of attempts increases, one might wonder? The preliminary calculations I have done starting from a k=3 single lap distribution suggest the expected gap to the ideal lap time is, approximately, inversely proportional to the cube root of the number of attempts. In other words, if you want to cut by half the gap to the ideal time, you should be ready to try eight times as many attempts! Note that I'm assuming 3 as the initial shape parameter, which, as the results of my four-corners minimal track suggest, should amount to a fairly easy track; for harder tracks the foreseen number of attempts should grow even faster.
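The cube root rule can be checked numerically. The Python sketch below assumes independent attempts with a shape 3, scale 1 gamma distribution for the single-lap gap, and uses the fact that the expected value of a non-negative quantity is the integral of its survival function; expected_best_gap is a made-up name.

```python
import math

def expected_best_gap(attempts, step=0.0005, upper=40.0):
    """Expected gap to the ideal lap for the best of `attempts` laps, with
    single-lap gaps following a gamma distribution (shape 3, scale 1)."""
    total = 0.0
    for i in range(int(upper / step)):
        t = (i + 0.5) * step
        # log of P(a single lap is slower than t) for shape 3, scale 1
        log_survival = -t + math.log1p(t + t * t / 2)
        # P(all attempts are slower than t); integrating it gives the mean
        total += math.exp(attempts * log_survival)
    return total * step

for n in (1, 8, 64, 512, 4096):
    print(n, round(expected_best_gap(n), 4))
```

The ratio between consecutive rows approaches 2 as the number of attempts grows, though it is noticeably larger than 2 for small counts. Treating attempts as independent draws from one fixed distribution ignores learning and fatigue, so this is best read as a rough guide.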

There should be plenty of interesting questions like those I have just mentioned that might be posed in terms of a performance model; I'm all ears for your suggestions!
#7
Custom Cars with Stressed / BMW 850 CSi (E31)
April 18, 2021, 07:06:30 PM
Okay, let's make this officially a thing!

I already have a dashboard generously contributed by Ryoma, a 3D model prototype from 2010 (which I will reevaluate later in the week, when I reboot into Windows), and a torque curve (obtained from this page, and which matches this BMW page about the engine). I already know in which direction I want to go with the RES, so I will begin working on and testing it today.
#8
General Chat - ZSC / GAR rules draft
March 24, 2021, 06:16:12 AM
I have drafted a set of written GAR rules for use in ZakStunts, to hopefully clarify edge cases and give us a unified document to refer to. Here is a link to it. I plan to have it kept as a draft at least until the Z236 deadline, so that you can review it. In the meantime, I will prepare screenshots to illustrate some of the rules. Criticism and suggestions on all aspects of the draft are most welcome.

To begin with, I would highlight the following rules as deserving extra scrutiny:

  • Topics recently discussed at the shoutbox: 3.28.1 (airtime on banked roads), 3.55.2 (entry and exit of l/r corks), 3.42.2 (jumping over tunnels), 3.71.1 (jumping over slalom blocks), 3.44.4 (airtime inside pipes).
  • Extra rules I am proposing, mostly based on analogies: 3.40.2 (loop exit), 3.44.2 to 3.44.6 (various edge cases involving pipes), 3.4a.1 and 3.4b.1 (extent of the track on crossroads and splits), 3.6d.2 (edge cases involving highway dividers), 3.73.2 (dodging slalom blocks by the outside).
(Don't worry, there aren't hundreds of rules, as the high numbers might suggest; it's just that I have used track element hex codes for indexing.)
#9
Stunts Chat / Penalty oddities
March 23, 2021, 02:57:35 AM
A while ago, GTAMan pointed out a penalty time glitch that cropped up in one of Marco's lives (here is a direct link). It indeed is a new one for me; AFAIK it hadn't been documented yet. Here's the glitch: if the final track element is a split (and not a rejoin) connected to the finish line through the straight path, three seconds of penalty are always given, as if the final track element had been skipped. I guess it took so long for anyone to notice it because the only non-decorative reason for having a split immediately before the finish line is if you are making a Le Stunts track, and in that case no one will actually cross the finish line during the race.

This glitch joins a list of known penalty time oddities, one in which all entries so far have to do with splits and multi-way tracks:

  • Instant finish: if the first element after the start line is a dual way split, the entire track can be cut without penalty by driving back to it. Exploiting this one is generally forbidden through rules and precedents (Z85 was one race in which the issue had to be raised), though competitions relying on a checkpoint system independent from penalty time have on occasion accepted it. Two examples I know of are Funny (SDR-RH 2007) and FTT0111 (FTT 2008).
  • Three-element cut: a split doesn't count for penalty checking if the track is rejoined through the non-straight path. I had forgotten about this one until watching my recent USL round 3 lap and noticing that I fully skipped the split at the 180° turn, by a gap of a few metres. Given what we know about the penalty algorithm, I suspect this one is closely related to the glitch in Marco's live.
  • Dual way switching: Quoting the Wiki, "on tracks where the road splits, it is possible to leave one of the paths and re-enter the track through any point of the other one without penalty time, provided at least one track element is crossed before the paths rejoin". Examples abound; entire classes of track designs are shaped by this one.

Then there's Nagasaki, a USC 2014 track full of splits and crossroads which featured bizarre shortcuts skipping large parts of the track. I still have no clue about what was going on there.
#10
(We surprisingly didn't have a non-locked thread solely dedicated to Stressed, so I'm creating one.)

An updated Stressed executable for Windows is now available through Southern Cross and the sticky post at the custom cars subforum. It fixes the GAME.PRE issue reported the other day by GTAMan, and incorporates a number of other improvements which had only been available by building Stressed from the source code. Quoting the changelog in the aforementioned sticky, here is a list of them:

  • Prevent flickering when switching between resources.
  • Added more text resource ids from GAME.PRE/MAIN.RES/TEDIT.PRE.
  • Safeguards against overwriting packed files when saving.
  • Fixed ".sfx" unpacked extension.
  • Improvements to resource size guessing.
  • Default option for parsing unknown resources is to treat as raw data.
  • Import and export of binary data of raw resources as binary data.
  • Added a "move to" action to the materials list.
  • Adjustments for building with QT 5.15 and MSVC 2019.
#11
This list is primarily here to make sure that, as I gradually update the mods page, I don't end up missing any new car  :)

Turbo from Pole Position (uploaded in 2021-03-08)
Ferrari Pinin (uploaded in 2021-03-08)
Porsche 962C Le Mans (uploaded in 2021-03-08)
Ferrari 637 Indy (uploaded in 2021-03-08)
Fiat 500 Abarth (uploaded in 2021-03-16)
Audi Quattro Sport (Group B edition) (uploaded in 2021-03-16)
Porsche 959 (uploaded in 2021-03-16)
Lancia Delta HF Integrale Evo (2021-03-05 update) (uploaded in 2021-03-08)
Lancia Thema 8.32 (2021-03-07 update) (uploaded in 2021-03-08)
Ferrari Mythos (2021-05-07 update) (uploaded in 2021-03-16)
Maserati Shamal (2021-03-11 update) (uploaded in 2021-03-16)
Ferrari 641/2 (uploaded in 2021-03-29)
Ferrari 288 GTO (2021-03-07 update)
Lamborghini Diablo (2021-03-12 update) (uploaded in 2021-03-29)
Ferrari Modulo (uploaded in 2021-03-29)
Lancia Delta Integrale Evo 3 "Viola" (uploaded in 2021-03-29)
Lamborghini LM002 (revamp)
Cizeta V16T (uploaded in 2021-03-29)
Vector W8 (uploaded in 2021-05-08)
Bugatti EB110 (uploaded in 2021-05-08)
Jaguar XJ220 (uploaded in 2021-05-08)
Vector WX3 (uploaded in 2021-05-08)
Citroen CX 25 (2021-02-28 update)
Ferrari 308 Quattrovalvole (2021-02-21 update)
Ferrari 328 GTB/S (2021-02-28 update)
Ferrari 408 (2021-02-28 update)
Ferrari 512 TR (2021-02-28 update)
Ferrari 512 M (2021-02-28 update)
Ferrari GTO Evoluzione (2021-02-28 update)
Ferrari Testarossa (2021-02-28 update)
Lamborghini Bravo (2021-03-15 update)
Alfa Romeo 164 Procar (uploaded in 2021-05-08)
Alfa Romeo SE048SP (uploaded in 2021-05-08)
McLaren F1 (uploaded in 2021-05-08)
Caterham Super Seven JPE
Dodge Stealth R/T Turbo
Dodge Challenger (1970)
Dodge Stealth R/T KIFT
Lancia ECV1
Lancia ECV2
Dodge Viper RT/10
Ferrari 348 TB/TS
#12
Stunts Chat / Race strength estimation, revisited
December 27, 2020, 10:52:35 AM
For a long while, one of my favourite Stunts investigation topics has been the evaluation of race strength, be it merely for the enjoyment of historians and pundits, or to inform some kind of spiritual successor to Mark L. Rivers' SWR Ranking. Some of you might even remember my 2012 thread on the matter. Now, after a long time with this project on the back burner, I have made enough progress to feel like posting about it again. So, without further ado, here is a plot of race strengths covering the 236 ZakStunts races so far:



(Attached below is the Excel file this chart belongs to, so you can have a closer look at the data.)

While this chart might look rather like the ones I showed you years ago, there is one major difference: this time, a clearer procedure to obtain the data has led to values that are meaningful on their own. For instance, consider the massive spike you see just left of the middle of the chart. That is ZCT100, whose strength is around 70. According to the model underpinning the calculations, that number means a pipsqueak of Elo rating 1500 (which generally amounts to lower midfield) would, if they joined ZCT100, have a 1 in 70 chance of reaching a top five result on the scoreboard.

The numbers here aren't definitive yet, as I still want to check whether there is any useful tuning of parameters to be done, as well as to figure out how to estimate some of the involved uncertainties. In any case, I believe they look fairly reasonable. Within each season, the ranking of races is generally very sensible. Comparing different eras of ZakStunts is, as one might expect, trickier. In particular, I feel the model might be overrating the 2010 races a little bit. Also, it is hard to tell whether the model underrates races from the first few seasons (2001-2004) as it moves towards a steadier state. Still, the chart does seem to capture the evolutionary arcs of ZakStunts: a steady increase in the level of the competition over the initial years, culminating in the 2005-2006 high plateau, followed by a sharp drop in 2007, and so forth.




I will now outline how this new strength estimation works. Some of what follows might be of interest beyond mere technical curiosity, for parts of the procedure can be useful for other investigations in Stunts analytics.

(By the way, you can check the source code of my program on GitHub, if you are so inclined.)

When I set about resuming this investigation early this year, I decided to, instead of rolling yet another quirky algorithm from scratch, start from well-understood building blocks, so that, if nothing else, I would get something intelligible at the end. Balancing that principle with the known limitations of my chosen methods (and there are quite a few of them), I eventually ended up with the following pipeline of computations:


  • From the ZakStunts results, compute Elo ratings at every race.
  • Obtain, from the Elo ratings, victory probabilities against a hypothetical 1800-rated pipsqueak, and use those probabilities to parameterise a rough performance model, which amounts to the probability distribution of lap times (relative to an ideal lap) for a pipsqueak.
  • Add a fictitious 1500-rated pipsqueak to the list of race entrants, and either:

    • Use the performance model to implement a race result simulator, which spits out possible outcomes when given a list of pipsqueaks and their ratings, and run the simulation enough times to be able to give a reasonable estimate of the likelihood of a top 5 finish by the fictitious pipsqueak; or
    • Numerically integrate the appropriately weighted probability density for the fictitious pipsqueak to obtain, as far as the model allows, an exact result for said likelihood.

(Implicit in the above is that my code includes both an Elo rating calculator and a race result simulator, which can be put to use in other contexts with minimal effort.)

Let's look at each step a little closer. When it comes to ratings of competitors, Elo ratings are a pretty much universal starting point. They are mathematically simple and very well understood, which was a big plus given the plan I had at the outset. For our current purposes, though, the Elo system has one major disadvantage: it is designed for one-versus-one matches, and not for races. While it is certainly possible to approach a race as if it were the collection of all N*(N-1)/2 head-to-head matchups among the involved pipsqueaks, doing so disregards how the actual head-to-head comparisons are correlated with each other, as they all depend on the N pipsqueak laptimes. (To put it another way: if you beat, say, FinRok in a race, that means you have achieved a laptime good enough to beat FinRok, and so such a laptime will likely be good enough to defeat most other pipsqueaks.) All that correlation means there will be a lot of redundant information in the matchups, the practical consequence being that a single listfiller or otherwise atypical result can cause wild swings in a pipsqueak's rating. Trying to solve the problem by discarding most of the matchups (say, by only comparing a pipsqueak with their neighbours on the scoreboard) doesn't work well either: since we only have ~12 races a year to get data out of, that approach will make the ratings evolve too slowly to be of any use. Eventually, I settled on a compromise of only using matchups up to six positions away on the scoreboard (in either direction), which at least curtails some of the worst distortions in races with 20+ entrants. Besides that, my use of the Elo system is pretty standard. While new pipsqueaks are handled specially over their initial five races for the sake of fairer comparisons and faster steadying of ratings, that is not outside the norm (for instance, chess tournaments generally take similar measures).
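To make the windowed matchup idea concrete, here is a minimal Python sketch. It is not the code from the GitHub repository: the K-factor of 16, the function names, and the decision to ignore draws and the special handling of newcomers are all assumptions made for illustration. Only the standard Elo expectation formula and the six-position window come from the description above.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A defeats B in a single matchup."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_ratings(scoreboard, ratings, k=16.0, window=6):
    """Return new ratings after one race.

    scoreboard: names ordered from first to last place (no draws).
    ratings:    dict mapping each name to its rating before the race.
    k, window:  K-factor (assumed value) and the maximum scoreboard
                distance at which a matchup is still counted.
    """
    deltas = {name: 0.0 for name in scoreboard}
    for i, winner in enumerate(scoreboard):
        # Pair each entrant with those up to `window` places below;
        # every pair within the window is visited exactly once.
        for j in range(i + 1, min(i + window + 1, len(scoreboard))):
            loser = scoreboard[j]
            surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
            deltas[winner] += k * surprise
            deltas[loser] -= k * surprise
    return {name: ratings[name] + deltas[name] for name in scoreboard}


# Hypothetical eight-entrant race with everyone starting at 1500.
names = list("ABCDEFGH")
after = update_ratings(names, {name: 1500.0 for name in names})
```

In this toy race the winner and the last-placed entrant are seven places apart, so they are never compared directly, which is the whole point of the window.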

Elo ratings are not enough to simulate race results, precisely because of the distinction between a collection of matchups and a single race discussed above. A simulation requires a model of the pipsqueak performance, so that individual simulated results for each pipsqueak can be put together in a scoreboard. One workaround to bridge this gap relies on victory probabilities. It is possible, given Elo ratings for a pair of pipsqueaks, to calculate how likely one is to defeat the other in a matchup. Similarly, if you have the laptime probability distributions for a pair of pipsqueaks, you can calculate how likely it is for one of them to be faster than the other. A few seat-of-the-pants assumptions later, we have a way to conjure a laptime probability distribution that corresponds to an Elo rating. As for the distributions, the ones I am using look like this:



This is a really primitive model, perhaps the simplest thing that could possibly work. It is simple enough that there are victory probability formulas that can be calculated with pen and paper. There is just one pipsqueak-dependent parameter. As said parameter increases, the distribution is compressed towards zero (the ideal laptime), which implies laptimes that are typically faster and obtained more consistently (in the plot above, the parameter is 1 for the blue curve and 2 for the red one). While I haven't seriously attempted to validate the model empirically, the features it does have match some of the intuition about laptimes. (On the matter of empirical validation, one might conceivably drive five laps on Default every day for a month and see how the resulting laptimes are spread. That would be a very interesting experiment, though for our immediate purposes the differences between RH and NoRH might become a confounding factor.)

Having the laptime distributions for all entrants in a race makes it possible to figure out a formula that can be used to, in principle, numerically compute victory and top-n probabilities against its set of pipsqueaks. In practice, it turns out that victory probabilities aren't a good race strength metric, as the results tend to be largely determined by a small handful of pipsqueaks with very high ratings. To my eyes, the top-5 probabilities are at the sweet spot for strength estimations. I originally believed calculating the probabilities by numerical integration would be too computationally expensive (as the number of integrals to be numerically calculated grows combinatorially as the n in top-n grows), so I used the alternative strategy of simulating the races and checking afterwards how often top-5 results happen. The chart at the top of the post was generated after 100,000 runs per race, that is, twenty-three million and six hundred thousand runs to cover all ZakStunts races, which took fifteen and a half minutes to perform on my laptop. Later, I figured out that, with sufficiently careful coding, the numerical method, which has the advantage of giving essentially exact results, is feasible for top-5 probabilities; accordingly, an alternative Excel file with those results is also attached. (The simulations remain useful for wider top-n ranges, or for quickly obtaining coarse results with 1,000 to 10,000 runs per race while tuning the analysis parameters.)
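The simulation strategy can be sketched in Python as follows. Note the big assumption: the post does not name the laptime distribution, so this sketch swaps in an exponential distribution as a stand-in, purely because it has a single parameter and a pen-and-paper victory formula, P(A faster than B) = a / (a + b) for rates a and b. Fixing the rate of the 1800-rated reference at 1 then gives a rate of 10^((R - 1800) / 400) for rating R. All function names are hypothetical, and the real program may well do things differently.

```python
import random

REFERENCE_RATING = 1800.0  # the hypothetical reference pipsqueak


def rate_from_elo(rating):
    """Rate of the stand-in exponential laptime distribution.

    Chosen so that a / (a + 1) equals the Elo expectation of beating
    the 1800-rated reference, whose own rate is fixed at 1.
    """
    return 10.0 ** ((rating - REFERENCE_RATING) / 400.0)


def top_n_probability(entrant_ratings, probe_rating=1500.0, n=5,
                      runs=10_000, seed=0):
    """Estimate how often a probe entrant finishes in the top n."""
    rng = random.Random(seed)
    probe_rate = rate_from_elo(probe_rating)
    rates = [rate_from_elo(r) for r in entrant_ratings]
    hits = 0
    for _ in range(runs):
        # One simulated race: draw a laptime (relative to the ideal
        # lap) for the probe and count the entrants who beat it.
        probe_time = rng.expovariate(probe_rate)
        faster = sum(1 for rate in rates
                     if rng.expovariate(rate) < probe_time)
        if faster < n:
            hits += 1
    return hits / runs


# Hypothetical field of twelve entrants; the strength is the
# reciprocal of the estimated top-5 probability.
field = [2100, 2000, 1950, 1900, 1850, 1800,
         1750, 1700, 1650, 1600, 1550, 1500]
p = top_n_probability(field)
strength = 1.0 / p if p > 0 else float("inf")
```

With few runs the estimate is coarse, and for very strong races the probe might never reach the top 5 at all, hence the guard against dividing by zero.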




To turn the discussion back to sporting matters, the troubles with wild rating swings and outlier results alluded to above brought me back to the question of listfillers, already raised by Bonzai Joe all those years ago. Left unchecked, a particularly weak listfiller in a busy race can wreak havoc upon the Elo rating of its unfortunate author. That ultimately compelled me to look for objective criteria according to which at least some of the obvious listfillers can be excluded. For the current purposes, I ultimately settled on the following three rules:


  • Results above 300% of the winning time and more than two standard deviations away from the average of laptimes are to be excluded. (The Bonzai Joe rule.)
  • GAR and NoRH replays are only counted if the fastest lap on the parallel scoreboard they belong to is, or would be, above the bottom quarter (rounding towards the top) of the scoreboard. (The Marco rule.)
  • For our current purposes, a car is deemed "competitive" if it can be found above the bottom quarter (rounding towards the top) of the scoreboard, or if it was used to defeat a pipsqueak using a competitive car whose lap was not excluded according to the previous two rules. Only laps driven with competitive cars count. (The Alan Rotoi rule.)
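As an illustration, the first of the three rules could be coded roughly like this. The sketch assumes the population standard deviation over all laptimes on the scoreboard, and reads "away from the average" as "slower than the average"; the post does not settle either detail. The laptimes are made up.

```python
from statistics import mean, pstdev


def bonzai_joe_filter(laptimes, ratio=3.0, sigmas=2.0):
    """Split laptimes (in seconds) into (kept, excluded) lists.

    A lap is excluded only if it is BOTH above `ratio` times the
    winning time AND more than `sigmas` standard deviations slower
    than the average laptime.
    """
    winner = min(laptimes)
    average = mean(laptimes)
    deviation = pstdev(laptimes)  # assumption: population std. dev.
    kept, excluded = [], []
    for lap in laptimes:
        beyond_cutoff = lap > ratio * winner
        outlier = lap > average + sigmas * deviation
        (excluded if beyond_cutoff and outlier else kept).append(lap)
    return kept, excluded


# Hypothetical tight scoreboard with one obvious listfiller.
tight = [60, 61, 62, 63, 64, 65, 66, 67, 68, 400]
kept_tight, excluded_tight = bonzai_joe_filter(tight)

# Hypothetical broad scoreboard: some laps are beyond 300% of the
# winning time, but none is an outlier, so all of them are kept.
broad = [60, 100, 150, 190, 200, 210]
kept_broad, excluded_broad = bonzai_joe_filter(broad)
```

The second scoreboard shows why both conditions are needed: with the 300% cutoff alone, its three slowest laps would have been thrown out.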

These rules were applied to the full list of ZakStunts race entries that I'm using (it was one of those quarantine days back in May). Disqualified race results were also removed; there were a few curious findings in that respect I should write about one of these days. (By the way, ghosts are not included in the calculations, regardless of what their race entries look like.)

(A footnote: the bar for applying the first rule above looks, at first, incredibly low. I considered using a lower percentage, like 250%, and would rather not have the frankly bizarre additional standard deviation condition. It turns out, however, that ZCT029, a difficult dual-way full Vette PG track from 2003, had an extraordinarily broad spectrum of laptimes, including several pipsqueaks with non-listfiller laps beyond the 300% cutoff which would have been excluded without the standard deviation test. Faced with such a peculiar scoreboard, I opted to err on the side of circumspection.)

It remains a tall order to find objective criteria to discard listfillers that won't exclude too many proper competitive laps as a collateral effect. Ultimately, if we were to establish new pipsqueak rankings, I suspect different use cases would call for different kinds of ratings. An Elo-like ranking is appropriate for race strength estimations, simulations and predictions, when what is needed is a picture of how well someone is racing at a specific moment in time. For comparing performances within the last several months, though, a ranking of weighted (for instance, by recency or race strengths) race scores within a time window, in the style of SWR, might prove more appropriate. With this kind of ranking, it becomes reasonable to have, for instance, ZakStunts-style worst results discards, which could definitely help dealing with listfillers.

Anyway, by now I probably should stop rambling, at least for a little while :D Questions, comments, criticism, suggestions about the metric and ideas on cool stuff to do with those algorithms are all welcome!
#13
General Chat - ZSC / Car bonus rule change impact review
December 24, 2020, 09:09:37 PM
After my post on the 2020 LTB rule changes, I felt it might be of some use to also have a look at the effects of the 2019 tweaks to the car bonus system. I remember being a little worried back then about the slower bonus changes possibly unbalancing the rotation in favour of powergear cars. Let's see how that panned out.

Here is a summary of top 10 car usage for the 2017 and 2018 seasons, before the changes:


Car    Points 1-6  Points 1-10  1st  2nd  3rd  Podiums  Full Podiums  Occurrences  Notes
JAGU           67          299    2    3    2        3             1           33
ZF40           62          253    2    2    3        3             2           26
AUDI           56          216    2    2    2        2             2           18
CDOR           56          214    2    2    3        3             2           22
LM02           52          208    2    2    2        2             2           23
ANSX           42          138    2    2    1        2             1           10  Powergear
P962           41          151    2    1    2        2             1           14
COUN           34          126    2    1    1        2             1           12
ZPTR           26          109    1    1    1        1             1           12  2017 only
LANC           26          100    1    1    1        1             1            9

ZMP4           26           98    1    1    1        1             1            8  2018 only
PC04           26           95    1    1    1        1             1            9
NSKY           26           95    1    1    1        1             1            8
PMIN           26           94    1    1    1        1             1            7  Powergear
FGTO           26           71    2    1    0        2             0            5  Powergear
VETT           16           76    0    1    1        1             0            9  Powergear
RANG            9           46    0    1    0        1             0            6  2018 only
ZLET            7           27    0    0    1        1             0            2  2017 only

Total         624         2416   24   24   24       30            18          233


("Points 1-6" are assigned according to the scoring system F1 used in the 90's, 10-6-4-3-2-1, while "Points 1-10" use the current F1 system, 25-18-15-12-10-8-6-4-2-1.)
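For anyone wanting to redo the tallies, here is a small Python sketch of the two scoring scales. The function name and the sample finishing order are made up for illustration; only the point values come from the note above.

```python
POINTS_1_6 = [10, 6, 4, 3, 2, 1]
POINTS_1_10 = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def car_points(cars_by_position, scale):
    """Add up the points scored by each car in a single race.

    cars_by_position: car codes in finishing order (first place first).
    scale:            one of the two point scales above.
    """
    totals = {}
    for car, points in zip(cars_by_position, scale):
        totals[car] = totals.get(car, 0) + points
    return totals


# Hypothetical finishing order, not taken from an actual race.
example = car_points(["JAGU", "JAGU", "ZF40", "JAGU"], POINTS_1_6)

# Sanity check against the "Total" rows: 24 races, full top six each.
total_1_6 = 24 * sum(POINTS_1_6)
```

A full top six is worth 26 points, so 24 races add up to 624, matching the "Points 1-6" totals.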

And here is the same data for 2019 and 2020:


Car    Points 1-6  Points 1-10  1st  2nd  3rd  Podiums  Full Podiums  Occurrences  Notes
PMIN           62          201    4    1    2        5             1           15  Powergear
P962           61          278    2    1    3        3             1           34
PC04           60          257    2    2    2        2             2           30
LANC           52          209    1    3    3        3             1           21
SUKA           46          173    2    2    1        2             1           18
RANG           45          178    2    2    1        2             1           18  2019 only
ANSX           45          163    2    1    2        2             1           14  Powergear
DBMW           37          180    0    2    3        3             0           22  2019 only
VETT           37          127    2    1    1        2             1            9  Powergear
DAUD           29          102    2    0    1        3             0            9

AUDI           29           90    1    2    1        2             0            6
COUN           27           95    1    2    0        2             0            8
ZTST           25          104    1    1    1        1             1           11  2020 only
ZF40           21           71    1    1    1        1             1            6
JAGU           16           73    0    1    1        1             0            7
LM02           16           65    1    0    0        1             0            7
DMCB           10           34    0    1    1        1             0            3  2020 only
FGTO            6           24    0    1    0        1             0            2  Powergear

Total         624         2424   24   24   24       37            11          240


While usage of powergear cars did increase, it turns out they were being underused before the changes, with the higher frequency of split podiums involving them perhaps masking that. The current scenario is arguably closer to parity, with PG cars having gone from 17.6% to 24.0% of the points in the 1-6 system. In any case, we are definitely not in 2008 territory, and I don't think there will be too many complaints if we keep getting an extra Indy-amenable race per year. (Back in 2008, as we tried to figure out how to make the bonus system run smoothly, the rotation was a touch too slow, resulting in nine (!) powergear car victories, including five all-PG podiums.)
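The two percentages can be rechecked from the "Points 1-6" column of the tables above, as in this short Python sketch (the variable and function names are arbitrary):

```python
# "Points 1-6" of the cars marked as Powergear in each table.
PG_POINTS_2017_18 = {"ANSX": 42, "PMIN": 26, "FGTO": 26, "VETT": 16}
PG_POINTS_2019_20 = {"PMIN": 62, "ANSX": 45, "VETT": 37, "FGTO": 6}
TOTAL_POINTS = 624  # "Total" row, identical in both tables


def share(points, total=TOTAL_POINTS):
    """Percentage of all points scored by the given cars."""
    return 100.0 * sum(points.values()) / total


before = round(share(PG_POINTS_2017_18), 1)  # 110 of 624 points
after = round(share(PG_POINTS_2019_20), 1)   # 150 of 624 points
```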

Besides the matter of powergear, another key aspect to look into is diversity of car choices within a race, as increasing it was taken as a goal in the pre-2019 discussions. The data confirms the revised system passed this test with flying colours. We went from 6 mixed podiums over the 24 races of 2017 and 2018 (with 4 of them involving PG cars) to 13 over the following 24 races (6 of them with PG cars). I'd say the current amount of multi-car races is close to ideal: on the one hand, having it way below half of the races, as it used to be a few years ago, felt like a missed opportunity; on the other hand, I suspect having it way above half could be a tad exhausting, given the increased strategic complexity.
#14
Season 2019 / 2019-6 (White Leaf Desert)
August 07, 2019, 01:09:14 AM
We could do with a race thread right now, so here is one  :) The track is attached here, in case you need to download it before the site goes up again; the car for the race is the Jaguar.

I will probably have a second session sometime soon. I feel like I should give all those other paths a try...
#15
General Chat - ZSC / Mid-00's NoRH ZakStunts replays
August 03, 2019, 08:46:18 PM
Back in 2006 and 2007 (and possibly in some earlier seasons), ZakStunts races sometimes had an unofficial NoRH scoreboard. For instance, I remember having taken part in a classic line NoRH parallel race in Z79/Default (you can find discussion of it in the Z79 forum thread, as well as in the December 2007 shoutbox archive); also, elsewhere on the forum Bonzai Joe mentions someone having driven a 1:16 NoRH lap on Z59.

Do those NoRH replays (not just the ones I mentioned) still exist anywhere?
#16
Something I noticed after the Z217 update: For the sake of peace of mind, I have docked six points from the Indy, and four points from the Ranger. While doing so, I had assumed that custom bonus changes would be reverted at the end of the race; however, looking at the bonus changes around Z212 (which had custom +6 for Carrera and +4 for Acura) shows that is not the case. In this case, I certainly didn't mean to set back the rotation of the Indy by six points.

I'm genuinely unsure whether we should change that behaviour. In any case, it is worth being aware of that if you are going to tweak bonuses.
#17
Competition 2019 / ZCT217 - Pacific Coast Highway
July 28, 2019, 01:06:17 PM
Welcome to my guest track for this season! Compared to Z193, my track from two years ago, this one is a lot shorter (I cut off a ton of stuff that wasn't working well in between the drafts), and probably quite a bit easier as well. Still, I hope that finding the best lines will require a fair amount of exploration.
#18
Competition 2019 / ZCT212 analysis
April 07, 2019, 09:00:01 PM
Top 4 analysis for ZCT212:


Splits FIN MAR CTG DUP
0 0.00 0.00 0.00 0.00
1 22.30 22.55 22.70 22.70
2 39.20 39.45 39.95 40.25
3 53.60 54.20 54.75 55.55
4 60.10 60.65 62.15 61.95
5 68.55 68.90 71.15 70.15
6 81.60 82.50 85.90 86.65

Sectors FIN MAR CTG DUP
0 0.00 0.00 0.00 0.00
1 22.30 22.55 22.70 22.70
2 16.90 16.90 17.25 17.55
3 14.40 14.75 14.80 15.30
4 6.50 6.45 7.40 6.40
5 8.45 8.25 9.00 8.20
6 13.05 13.60 14.75 16.50

Gaps FIN MAR CTG DUP
0 0.00 0.00 0.00 0.00
1 0.00 0.25 0.40 0.40
2 0.00 0.25 0.75 1.05
3 0.00 0.60 1.15 1.95
4 0.00 0.55 2.05 1.85
5 0.00 0.35 2.60 1.60
6 0.00 0.90 4.30 5.05






Two remarks. Firstly, this was a tough track, with multiple spots where one subtle misstep (for instance, a fumbled gear change at the loopcut) could derail one's plans entirely. Carrera tracks do have a penchant for being unforgiving, one (in)famous example being Z86. Secondly, the final cut was absolutely decisive, as comparing FinRok's dream line by the first roof and Marco's spotless take on the second tunnel with everything else reveals. In particular, I have lost a podium place by not realising quite how much that would matter (my purple sectors 4 and 5 weren't worth much when followed by a somewhat careless final cut -- in fact, I had to abandon a partial replay with even better sectors 4 and 5 because I couldn't pull off a sufficiently good sector 6 before the deadline).
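For reference, the "Sectors" and "Gaps" tables above follow from the "Splits" table by plain subtraction. Here is a minimal Python sketch using the ZCT212 splits; the function names are made up for illustration.

```python
# Cumulative split times (seconds) from the ZCT212 "Splits" table.
SPLITS = {
    "FIN": [0.00, 22.30, 39.20, 53.60, 60.10, 68.55, 81.60],
    "MAR": [0.00, 22.55, 39.45, 54.20, 60.65, 68.90, 82.50],
    "CTG": [0.00, 22.70, 39.95, 54.75, 62.15, 71.15, 85.90],
    "DUP": [0.00, 22.70, 40.25, 55.55, 61.95, 70.15, 86.65],
}


def sectors(splits):
    """Sector times: differences between consecutive splits."""
    return [0.0] + [round(b - a, 2) for a, b in zip(splits, splits[1:])]


def gaps(splits, leader_splits):
    """Gap to the leader at each split."""
    return [round(s - l, 2) for s, l in zip(splits, leader_splits)]


def fastest_by_sector(all_splits):
    """Name of the fastest driver in each sector, from sector 1 on.

    Ties go to whoever is listed first in `all_splits`.
    """
    table = {name: sectors(s) for name, s in all_splits.items()}
    count = len(next(iter(table.values())))
    return [min(table, key=lambda name: table[name][i])
            for i in range(1, count)]


dup_sectors = sectors(SPLITS["DUP"])
dup_gaps = gaps(SPLITS["DUP"], SPLITS["FIN"])
purple = fastest_by_sector(SPLITS)
```

Run on this data, the fastest sectors 4 and 5 both come out as DUP, in line with the remark about purple sectors.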
#19
Competition 2019 / ZCT211 analysis
March 19, 2019, 03:11:49 AM
Here is a top 6 analysis for last month's race -- it's been a while since we last did that, and also it is a nice opportunity to try it out with multiple cars:

(From the second table on, Audi times were corrected by the car bonuses to fit the Ranger ones.)


Splits MAR SEE DUP DRE OVE HER
0 0.00 0.00 0.00 0.00 0.00 0.00
1 41.15 41.95 34.60 43.25 34.95 35.25
2 74.25 75.05 59.50 77.90 62.70 63.70
3 109.80 109.95 88.25 113.75 91.95 93.70
4 125.60 126.65 101.15 130.85 105.00 106.90
5 148.80 149.75 118.75 156.15 123.85 126.80

Bonus MAR SEE DUP DRE OVE HER
1.00 1.00 1.29 1.00 1.29 1.29

Corr. Splits MAR SEE DUP DRE OVE HER
0 0.00 0.00 0.00 0.00 0.00 0.00
1 41.15 41.95 44.49 43.25 44.94 45.32
2 74.25 75.05 76.50 77.90 80.61 81.90
3 109.80 109.95 113.46 113.75 118.22 120.47
4 125.60 126.65 130.05 130.85 135.00 137.44
5 148.80 149.75 152.68 156.15 159.24 163.03

Sectors MAR SEE DUP DRE OVE HER
0 0.00 0.00 0.00 0.00 0.00 0.00
1 41.15 41.95 44.49 43.25 44.94 45.32
2 33.10 33.10 32.01 34.65 35.68 36.58
3 35.55 34.90 36.96 35.85 37.61 38.57
4 15.80 16.70 16.59 17.10 16.78 16.97
5 23.20 23.10 22.63 25.30 24.24 25.59

Gaps MAR SEE DUP DRE OVE HER
0 0.00 0.00 0.00 0.00 0.00 0.00
1 0.00 0.80 3.34 2.10 3.79 4.17
2 0.00 0.80 2.25 3.65 6.36 7.65
3 0.00 0.15 3.66 3.95 8.42 10.67
4 0.00 1.05 4.45 5.25 9.40 11.84
5 0.00 0.95 3.88 7.35 10.44 14.23






Some of the highlights:

  • Sectors 1 (GAR-esque) and 3 (flat out) do suggest a Ranger advantage under those bonuses.
  • Other sectors, however, gave the Audi pipsqueaks enough margin to (barely) remain in contention. There was the high speed outer line in sector 5, and, in my lap, one extra shortcut in sector 2 that wouldn't be as useful with the Ranger due to the lower attainable speeds.
  • While one would be forgiven for thinking nothing interesting would happen in sector 3, it turns out that Seeker managed to claw back almost the whole gap to Marco there, centisecond by centisecond.
  • Following that, Marco managed to fend off the challenge with a quite excellent line through sector 4 (the first dual-way sector).
  • Marco's lap also features an interesting inner dual-way cut through sector 5. Among the top 6, the only one to attempt a somewhat similar line was Heretic -- the alternative provided by the outer line, though, meant the inner line was rather less effective with the Audi than with the Ranger.
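A note on reproducing the "Corr. Splits" table above: multiplying the Audi splits by the displayed factor of 1.29 does not quite give the tabulated values (34.60 × 1.29 would be 44.63, not 44.49), so the 1.29 is presumably rounded. In the Python sketch below, the factor 9/7 (about 1.2857) is an assumption inferred from the tables, not something stated in the post; it does reproduce the corrected DUP column to the centisecond.

```python
# Assumption: exact correction factor inferred from the tables
# (displayed there as 1.29); not stated anywhere in the post.
CORRECTION = 9 / 7

# Uncorrected Audi splits (seconds) for DUP, from the "Splits" table.
DUP_SPLITS = [0.00, 34.60, 59.50, 88.25, 101.15, 118.75]


def corrected(splits, factor=CORRECTION):
    """Scale the splits of one car to be comparable with another's."""
    return [round(s * factor, 2) for s in splits]


dup_corrected = corrected(DUP_SPLITS)
```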
#20
Stunts Chat / Keyboard issues and ghosting
November 30, 2018, 01:37:12 PM
Folks sometimes report not being able to press some of the keys needed to drive simultaneously (I remember BJ talking about that in the past). Such issues are, in all likelihood, a problem with the keyboard itself, and not with the operating system or with the configuration of any software. The heart of the matter is that, unless you are using one of those fancy mechanical keyboards plugged into a PS/2 port, there will be an upper limit on how many simultaneous key presses can be registered (a limit which might vary depending on which parts of the keyboard you are using). Some cheaper keyboards are bad enough in that respect that they are unable to handle enough simultaneous key presses for Stunts to be played properly. The keyboard of my newish laptop, for instance, has this issue, and so I still need my old desktop keyboard to race. The keyword to use when looking for more information about this issue, or for choosing a keyboard adequate for Stunts, is "ghosting". This page provides an interactive test for your keyboard. In my case, it reveals my laptop keyboard can't handle "Down Arrow", "Right Arrow" and "Z" simultaneously, which is clearly unacceptable.