Elo-like ratings for ZakStunts: The Folyami Project

Started by Duplode, January 28, 2023, 07:24:25 AM

Duplode

I'm delighted to finally release the ZakStunts Folyami ratings, an Elo-like rating system for ZakStunts! You can check the ratings right now at the Southern Cross site. Below is a quick Q&A about the ratings -- if you have extra questions, feel free to ask them!

Why are there two rankings?

Given that Folyami ratings are pretty dynamic, responding rather quickly to changes in form/current performance, it felt appropriate to have something more permanent to go along with the ranking of current ratings. That being so, there is also a historical ranking, which lists the highest rating ever reached by each pipsqueak.

Why am I not showing up in the rankings?

There are basically two possibilities:

  • Firstly, at least five completed races are necessary to be included in the rankings, so that ratings have at least a few rounds to stabilise.

  • Secondly, pipsqueaks leave the current ranking after four races of inactivity, and rejoin upon returning to the competition. (Note that there are a few rules for discarding race results that aim at removing unrepresentative ones, such as those reached with an obviously uncompetitive car. That being so, race entries might, in special circumstances, not be counted for the ratings.)

So don't worry: no one gets permanently excluded from the rankings, and you just have to keep racing to (re)join them  :)
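
For anyone who prefers to see the two rules above as code, here is a minimal Python sketch. It is only an illustration, not the actual Folyami implementation: the function name `in_current_ranking`, the constants, and the reading of "four races of inactivity" as four consecutive races without a counting entry are all assumptions made for the example.

```python
# Hypothetical sketch of the ranking inclusion rules described above.
MIN_RACES = 5      # counting races needed before a racer is listed
MAX_INACTIVE = 4   # assumed: consecutive missed races that remove a racer


def in_current_ranking(took_part: list[bool]) -> bool:
    """Return True if a racer would appear in the current ranking.

    took_part[i] is True when the racer has a counting entry in race i,
    with races ordered from oldest to most recent.
    """
    # Rule 1: at least five counting races in total.
    if sum(took_part) < MIN_RACES:
        return False

    # Rule 2: count the consecutive races missed at the end of the history.
    missed = 0
    for entry in reversed(took_part):
        if entry:
            break
        missed += 1
    return missed < MAX_INACTIVE
```

With these assumptions, a racer with five entries followed by four missed races drops out of the current ranking, and is listed again as soon as the next counting entry arrives.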

How do the ratings work?

Here are links to a summary of the ZakStunts-specific aspects of the rankings and a technical overview of the rating system. (I have attached the latter here as a PDF, in case you find that easier to read.)

Can other competitions be included in the ratings?

The Folyami project began as an offshoot of my earlier investigations about race strength, which is why the ratings came into being as ZakStunts-only. Ideally, it would be nice to follow illustrious predecessors such as WRL and SWR and make Folyami an omni-rating covering all Stunts competitions. While I do want to explore ways of achieving that at some point, it can't help but be a project for the long term. Not only are there decades of competition results to be reviewed and formatted, but harmonising the competitions into a single system could also prove challenging, especially given how much the currently active competitions differ from each other.

Will there be an update of the race strength ranking?

Sure! I will add race strengths to the site as soon as I figure out a few details about how best to present the data. By the way, if you have any suggestions for additional features and visualisations for the ratings and their historical data, please do let me know!

Argammon

#1
Wow, impressive!  :)

I did not read how the system works, so please allow me a stupid question:

The current ratings make sense to me, but I am surprised about the historical ones. Stunts champions of the past like Roy, Bonsai Joe, and Alain have surprisingly low all-time best ratings.

Could there be an issue with rating inflation?

Edit: It would be really cool if there was a separate rating for each car. For example, is it true that I underperform with IMSA cars or is that just a myth? But I guess we don't have enough data for any reliable single-car ratings.


Duplode

#2
Quote from: Argammon on January 28, 2023, 07:38:12 AM
The current ratings make sense to me, but I am surprised about the historical ones. Stunts champions of the past like Roy, Bonsai Joe, and Alain have surprisingly low all-time best ratings.

Could there be an issue with rating inflation?

Excellent question! The system probably underrates results from 2001 and 2002 a bit, simply because there was too little time for the ratings to settle. Besides that, I think the difference is mainly that, in later years, longer periods of dominance, earlier champions being defeated on-track by upstarts, and comebacks at the highest level (see e.g. Renato Biker) have become more common. Here are two charts to better illustrate the trends. The first one shows the ratings of current ranking leaders:

[Attachment: chart of the ratings of current ranking leaders]

The second shows the mean rating at each race, taken over the pipsqueaks who were active in the previous 12 rounds and had reached the 5-race initial cutoff:

[Attachment: chart of mean ratings at each race]

Quote from: Argammon on January 28, 2023, 07:38:12 AM
Edit: It would be really cool if there was a separate rating for each car. For example, is it true that I underperform with IMSA cars or is that just a myth? But I guess we don't have enough data for any reliable single-car ratings.

Though the results wouldn't be very reliable indeed, that would be a fun thing to try! One detail is that we'd have to decide whether races like Z82 or Z98 should count as IMSA races.

Argammon

#3
 not unproblematic, but the results could change Ayrton's GTO rating, BJ's Jaguar rating etc

Cas

Earth is my country. Science is my religion.

KyLiE

Excellent work!  I really appreciate the effort that you put into this.  I'll be sure to keep an eye on my rating in the future! :)

Duplode

Thanks @Cas and @KyLiE ! On keeping an eye on your rating and tracking your progress, right now the only feature (if I can call it that!) the page has related to that is the handmade "personal best" announcement in the header. If you have suggestions on what could be added to make it easier to follow, I'm all ears!

Argammon

Quote from: Duplode on January 29, 2023, 01:10:42 PM
Thanks @Cas and @KyLiE ! On keeping an eye on your rating and tracking your progress, right now the only feature (if I can call it that!) the page has related to that is the handmade "personal best" announcement in the header. If you have suggestions on what could be added to make it easier to follow, I'm all ears!

There could be a rating graph for each pipsqueak. Yeah, I love creating work for you.  :o

Overdrijf

Very impressive. I would almost encourage you to publish it in some scientific journal for statistics.

Cas

"30-year-old car game becomes world famous after a scientific paper is published on tournament statistics"  8)

Overdrijf

No seriously, applied to something like real car races, or horse races, or, well, you get the point, lots of races. Maybe athletics events, open water swimming, sailing. Preferably stuff where a record time doesn't tell enough of a story because there's a different track every time, different weather, the field of competitors keeps changing. Which second rate cycling teams get access to the Tour de France this year? Is Verstappen more dominant than Schumacher was? Which kayakers do we call for the Ötz Trophy? I am sure there are already lots of systems for tracking the performance of these things over time, and I have no idea how they compare to this version. But it is one of those things for which there oddly enough doesn't seem to be a single good standard.

The Folyami system seems remarkably well thought out, almost too good to be worth it for a community of a few dozen people playing a game. That YouTube account had better be bringing in tons of new people, if we have a powerful new tool like this to rank them with.

Cas

This rank system is worth a video and/or wiki article, by the way

Duplode

@Argammon A history graph for each pipsqueak could be done; it's something I'll consider doing in the mid term. The main thing to deal with is that I will surely want to generate the 90-odd pages/graphs automatically, which means I will have to do a little bit of integration between the program that generates the rankings and the program that generates the Southern Cross pages.

@Overdrijf Good point about variability of conditions. I think I could use both kinds of examples: variable conditions, to see how the ratings fare, and stable conditions, to test the underlying hypothesis of the gamma performance model. Also, when I go looking for test cases, it is probably better not to focus much on motorsport, as it tends to involve too many confounding factors (for the ratings, there are car and team differences; for the performance model, there's changing car behaviour due to fuel load and tyre wear).

(There's also the matter of finding metrics that express how good the ratings actually are. Though I have done a little bit of that using NDCG, it would probably be sensible to dig deeper into the topic.)
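
For readers unfamiliar with NDCG (normalised discounted cumulative gain), here is a minimal Python sketch of the textbook definition. It is not necessarily the variant used for evaluating the Folyami ratings: treating the pre-race rating order as the prediction, deriving a relevance score from the actual race result, and the names `dcg`, `ndcg`, and the placeholder racers are all assumptions made for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: gains at lower positions count less."""
    # Position 0 is the top of the list, hence the log2(pos + 2) discount.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(predicted_order, relevance):
    """NDCG of a predicted ordering.

    predicted_order: racers sorted by pre-race rating, best first (assumed).
    relevance: dict mapping racer -> score, higher meaning a better
               actual result in the race (assumed scoring scheme).
    """
    gains = [relevance[racer] for racer in predicted_order]
    ideal_dcg = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Placeholder data: three hypothetical racers and made-up relevance scores.
relevance = {"racer_a": 3, "racer_b": 2, "racer_c": 1}
perfect = ndcg(["racer_a", "racer_b", "racer_c"], relevance)
reversed_order = ndcg(["racer_c", "racer_b", "racer_a"], relevance)
```

A prediction that matches the actual finishing order scores 1, and the score decreases as the predicted order strays from it, with mistakes near the top of the list penalised the most.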

Duplode

#13
The ratings have been updated for ZCT258! This is the first public update of the rankings. This round, we have seen two pipsqueaks returning to the ranking, Friker and Ryoma, and Argammon further improving his personal best from ZCT257. No new entries in the rankings quite yet, though Erik needs just one more counting race entry for that.

Overdrijf

#14
I'm not sure if it would be a bit silly here, but one of the best bits about Elo systems is that little dopamine hit from winning a match and gaining 30 Elo. Would that be something that could work here, like a column for last month's rating or for how much you went up since last month?