Unstable replays at ZakStunts

Started by Duplode, April 10, 2023, 02:48:47 AM


Duplode

2023-04-22 update: The proposal below is being updated to better reflect our current understanding of unstable replays. To avoid invalidating the next several replies, removed passages will be struck out, while additions will be in italics. For a rundown on when instability arises and how to work around it using the replay controls, see this post.



I'd like to start a discussion about the unstable replays rule, motivated by the replay invalidations at ZCT260 (see the shoutbox posts from April 8th and 9th for what happened there, and the end note at the bottom of this post for an explanation of what unstable replays are). I will open it by putting a rule change proposal on the table for your (and dreadnaut in particular, of course) consideration.

The proposal

Change the relevant passage of the rule which makes unstable replays invalid (rules page, section "The System", bullet point #2) from:

"[...] and [a replay] should play correctly when loaded in Stunts, from 'Options → Load replay'."

To:

"[...] and [a replay] should show as complete upon loading in Stunts from 'Options → Load replay', rewinding to just before the finish line, and continuing from there."

2023-04-22 update: While checking how the replay ends upon loading from Options should be the primary means of validation, the current procedure of watching the lap from the beginning, with multiple cameras if necessary, should be kept as a fallback. That being so, it might be appropriate to concisely describe the validation procedure and how it is supposed to work in a separate item.

Rationale

My proposal tries to strike a compromise, keeping replay viewing and validation reasonably easy while excluding as few valid-as-driven replays as possible. To expand a bit on that, I'll consider the two goals which, as far as I understand it, motivated the 2021 rule change:

  • Ensuring competition management can validate replays by following a simple procedure with reproducible steps.
  • Avoiding confusing replay watchers who play laps from the competition archives.

Goal #1 is fully addressed by my proposal. In fact, as far as manager convenience goes it even improves upon the status quo, as it is no longer necessary to watch the whole lap to be sure it is valid -- just loading from Options, rewinding a little bit, and continuing to the evaluation screen is enough. (This streamlined procedure is what I have always assumed managers short on time would default to on race closing, which partly explains my mix-up when replying to Friker at the shoutbox on April 2nd.) Besides that, unstable replays which are valid according to this procedure (i.e. the vast majority of them) can be watched to the end at normal speed through the well-known replay controls trick of rewinding a little shortly after the point of divergence and resuming play. They are also reproduced correctly by the repldump tool and usually, with an adequate choice of starting frame, when played at double speed as well. (And conversely, Overdrijf's extraordinarily hard to verify replay from ZCT232, which triggered the rule change, fails the streamlined verification.)

2023-04-22 update: Considering what we have learned over the past two weeks about unstable replays -- in particular, that divergent timelines are system dependent, and that replays such as Overdrijf's ZCT232 one aren't as rare as previously thought -- it makes sense, as suggested by Friker, to keep the current, play-from-the-beginning validation as a fallback method. It is fairer to drivers to accept as many unstable replays as we reasonably can, and keeping the current method as a fallback won't noticeably increase admin workload. The main caveat is that, since divergent timelines are system dependent, if a replay is only finished on a divergent timeline and not on the load-from-Options/fast-forward one, there is no guarantee that the manager will be able to watch the lap being completed. That being so, it seems appropriate to keep the new wording of the rule as it was in the original proposal, recommending that pipsqueaks check whether their laps show up as complete upon loading from Options -- if they don't, there is a nonzero chance of them being ruled invalid, even though the fallback validation might rescue them.
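
To make the two-tier procedure concrete, here is a minimal sketch of the decision logic in Python. It only encodes what the manager observes; the names (Verdict, validate_replay) are made up for illustration and are not part of any existing tool.

```python
from enum import Enum
from typing import Iterable


class Verdict(Enum):
    VALID = "valid"
    VALID_BY_FALLBACK = "valid (fallback)"
    INVALID = "invalid"


def validate_replay(completes_on_load: bool,
                    fallback_viewings: Iterable[bool] = ()) -> Verdict:
    """Combine the manager's observations into a verdict.

    completes_on_load: the replay shows as complete after loading from
        Options, rewinding to just before the finish line, and continuing.
    fallback_viewings: one boolean per play-from-the-beginning attempt
        (for example, one per camera), True if the lap was seen finishing.
    """
    if completes_on_load:
        return Verdict.VALID
    # The fallback is only consulted when the primary check fails.
    if any(fallback_viewings):
        return Verdict.VALID_BY_FALLBACK
    return Verdict.INVALID


print(validate_replay(True))
print(validate_replay(False, [False, True]))
print(validate_replay(False, [False]))
```

Since the fallback depends on the system the manager is using, a "valid (fallback)" verdict is the one case in which two managers might disagree.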

As for goal #2, there are two aspects to consider. Firstly, there being a reproducible way for managers to validate replays means everyone else can check them in some way, and as pointed out above that also translates to reasonable ways of actually watching the laps. Secondly, I feel we should not penalise, in multiple ways, pipsqueaks in the here and now for the hypothetical benefit of a future visitor who might be confused by archival replays -- especially considering that the pipsqueaks have valid-as-driven laps and are at no fault, and that we can provide easy access to information that clarifies the matter to visitors (through Wiki articles, replay package readmes, and so on).

If need be, I can go into further detail on the points above, and in particular on what makes it so taxing to verify replays in the way specified by the current rule. But I have talked for long enough for an opening post, so it's time to give the floor to you. 



End note: what is an unstable replay?

An unstable replay is a lap, usually a RH powergear one, which plays differently in-game depending on how the replay controls are used. In the most common scenario, the player completes the lap normally without noticing anything strange, and the replay shows up as complete upon loading from the Options menu and looking at the end of the tape. However, rewinding to the beginning and playing at normal speed shows the car as crashing or veering off track.

Friker

So, I would like to repeat the question from ZakStunts's Shoutbox: If you would start a new drive and input exact same inputs as in a replay - do you think/know that result of that driving is such 1. exactly as it would play in a replay from beginning/2. exactly as last position of replay when loaded from a file?

For me the answer is key to my stance on this topic. I would like to have a rule in accordance with what would happen if the input from a replay was driven during gameplay (and I silently hope it is the second option :) ).

Until the confirmation of what would happen I would like to relax the rule to Overdrijf's proposal - "counting any replay that can be played in any way as valid".


Daniel3D

It is option two. But, when you are Replay handling you basically are constantly time traveling. So the game has to recalculate from the point you enter the timeline. That can give a different result because of a rounded position calculation at that point. That can produce a slight offset, that is different when replaying the replay.
That little difference can be the difference between finishing and crashing.
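
To illustrate the idea (and only the idea), here is a toy model in Python. It is not Stunts code, and the actual cause of the instability is not known; it just assumes, hypothetically, that the simulation carries a sub-unit remainder which is lost whenever play resumes from a cut. All names and numbers are invented.

```python
SUBUNITS = 256  # hypothetical sub-unit resolution of the position
FINISH = 3      # hypothetical finish line, in whole units


def step(pos, frac, speed):
    """Advance one frame, carrying the sub-unit remainder forward."""
    total = frac + speed
    return pos + total // SUBUNITS, total % SUBUNITS


def play(frames, speed, pos=0, frac=0):
    """Play a number of frames without interruption."""
    for _ in range(frames):
        pos, frac = step(pos, frac, speed)
    return pos, frac


def play_with_cut(frames, cut_at, speed):
    """Play up to the cut, drop the remainder, then resume to the end."""
    pos, _dropped = play(cut_at, speed)
    return play(frames - cut_at, speed, pos=pos, frac=0)


straight, _ = play(10, 100)
resumed, _ = play_with_cut(10, 5, 100)
print(straight >= FINISH, resumed >= FINISH)  # prints: True False
```

Same inputs, same number of frames, but the uninterrupted run reaches the finish line while the one resumed from a cut falls one unit short.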
Edison once said,
"I have not failed 10,000 times,
I've successfully found 10,000 ways that will not work."
---------
Currently running over 20 separate instances of Stunts
---------
Check out the STUNTS resources on my Mega (globe icon)

Friker

Quote from: Daniel3D on April 10, 2023, 05:09:11 PM: It is option two.
Are you sure? How do you know? That would mean that the current rule does not match "reality" - the state that would be reached if "a replay would be driven".
Quote from: Daniel3D on April 10, 2023, 05:09:11 PM: But, when you are Replay handling you basically are constantly time traveling. So the game has to recalculate from the point you enter the timeline.
I was under the impression that when you are doing RH the position is always recomputed from the beginning - at least when you rewind backwards (that's the reason why it is slower to rewind than to move forward).
Quote from: Daniel3D on April 10, 2023, 05:09:11 PM: That can produce a slight offset, that is different when replaying the replay. That little difference can be the difference between finishing and crashing.
What exactly is producing a slight offset, and relative to which other event is it an offset?

I am sorry for asking so much, but I would really like to understand the different ways of simulation/how the Stunts engine works.

Daniel3D

You create a cut at the position you stop. For normal gameplay that is not an issue because the game is precise enough at that level. In a high-speed PG lap it might not be precise enough.

dreadnaut

#5
Quote from: Friker on April 10, 2023, 03:43:19 PM: If you would start a new drive and input exact same inputs as in a replay - do you think/know that result of that driving is such 1. exactly as it would play in a replay from beginning/2. exactly as last position of replay when loaded from a file?

I'm a bit confused about this question, doubly so as Daniel mentioned replay handling which didn't seem to be what the question was about 🤔

I think the answer is "always 1, sometimes both". Entering movements (driving!) via keyboard, mouse, joystick or replay file appears to be equivalent. The fast-forward option used when loading a replay (without the lap actually appearing on screen) seems to handle things differently: when you load a replay and see the end result*, some unstable replays show a crash screen.

(*) with some actions you load the replay and see the starting line instead

Quote from: Duplode on April 10, 2023, 02:48:47 AM: I'd like to start a discussion about the unstable replays rule, motivated by the replay invalidations at ZCT260

Thank you for starting this thread Duplode, and for the thorough proposal. There is one bit missing though, which I think is important to direct the discussion: what is the problem we are trying to solve with this rule change?
 

Daniel3D

#6
Quote from: dreadnaut on April 10, 2023, 06:43:56 PM:
Quote from: Friker on April 10, 2023, 03:43:19 PM: If you would start a new drive and input exact same inputs as in a replay
I'm a bit confused about this question, doubly so as Daniel mentioned replay handling which didn't seem to be what the question was about 🤔
Quote: assumption is the mother of all fuckups
I assumed the replay in the example being a RH PG replay, because the original proposal is based upon an issue that is (exclusively?) related to RH PG replays.

And I may have read/understood the question wrong.

Frieshansen

Although I have already benefited from a loose interpretation, I still think that a valid replay should run completely from start to finish. But we need a well-defined way to determine that.

My solution would be a closed and clearly defined reference system on which the replays are played. Only if they run completely through there, the replay is valid. The reference system should reflect as well as possible an installation of stunts on a real computer of that time. Nobody knows on which system we will play in 20 years and therefore it is good to orientate on the original - that will always be the reference.

In my opinion DOSBox is out of the question, because it is not very close to the hardware. Better would be the very accurate emulator PCem (but only because it can hardly be realized with a real computer :) ). The perfect solution for me would be a virtual system, to which the replay is transferred via an API. It then outputs whether the replay is valid or not. Maybe even with a video. The whole thing could then also be used in a competition so you know short time after the upload for sure if the replay is valid.

I understand the point that it's difficult to spot mistakes while creating the replay. But just like, for example, sometimes a penalty-time isn't spotted until too late after a difficult passage, that's part of stunts for me, too.

Whatever we decide in the end, I think it is important that there is a clear procedure / rule with which everyone can live and which leaves no more room for maneuver, so that we don't need these discussions about such replays anymore. As long as the system described above does not exist, I am for the current rule.

Friker

Quote from: dreadnaut on April 10, 2023, 06:43:56 PM: Entering movements (driving!) via keyboard, mouse, joystick or replay file appears to be equivalent. The fast-forward option used when loading a replay (without the lap actually appearing on screen) seems to handle things differently.
How do you know that the first sentence is true? That is what I am asking - there are two options - driving the same inputs as in a replay produces output either 1. as during replay (load replay, rewind to beginning, press play) or 2. as during "fast-forwarding" (even with one-step/0.05s forwarding - load replay, rewind to beginning, press fast-forward for one frame, repeat pressing fast-forward for one frame until finish). The point of my question is that I would personally prefer if the final ruling on valid replays was based on (what would be the result of) inputs from the replay driven during a new drive (because ultimately they were driven and input during racing). That would result in: if my 1. option is true then I would agree with the current ruling, if my 2. option is true then I would agree with the new proposal (load replay+rewind 1.05s).

But for that I would like to know how engine works. Not only guessing. We as community have Stunts code reversed so I would assume that there are some folks who know answer for this question for sure/can debug code to prove either option. (And I would be glad to learn this magic so if you can teach me I am ready to listen. :) )

Quote from: Frieshansen on April 10, 2023, 07:28:32 PM: But we need a well-defined way to determine that.
I would say that if the folks who know how the Stunts code works answer, then we would have a pretty deterministic way. I have pretty strong feeling that if it is a rounding error/overflow this is either 100 percent reproducible on all machines (it is some kind of forgotten register which is not re-set in assembly when you press play instead of fast-forwarding) or 100 percent different on different real-life machines (I can only assume different execution of operations in a given processor/overflowing of registers on different systems) thus there is no "ultimate true system" (even with PCem) so we would end up with one arbitrary chosen and in that case I would choose DOSBox packed in official .zip file not because of accuracy but because of accessibility - even checking if a replay is valid on my computer was PITA - handling it via some API would be ridiculous for me.

Quote from: Daniel3D on April 10, 2023, 07:28:21 PM: ... an issue that is (exclusively?) related to RH PG replays.
Maybe there are some high speed magic carpets which behave differently?

dreadnaut

#9
Quote from: Friker on April 10, 2023, 08:35:47 PM: How do you know that the first sentence is true?

Because it contains the word appears ;D  Jokes aside, I would expect the code to have one "game loop" that connects inputs, timings, simulation, and graphics code. The fast-forward feature of the game removes timings and graphics, and joins together (replay) inputs and simulation. That's likely to be a separate loop, which can therefore behave differently. The alternative (both fast-forward and game are handled by the game loop) would require a bunch of conditions in the game code, which would make it more complicated and slower — it's possible of course, but seems inconsistent with the optimisation practices of the 80s and early 90s.
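
A toy sketch of that "two loops" guess, in Python. Everything in it is hypothetical (the names, the scratch value, the edge-case condition); none of it is taken from the Stunts code. It only shows how two loops sharing the same simulation step can disagree when something outside the step leaks into it.

```python
from dataclasses import dataclass


@dataclass
class Game:
    pos: int = 0
    scratch: int = 0  # hypothetical shared value the simulation assumes is zero


def simulate(game, inp):
    """Core update, shared by both loops."""
    game.pos += inp + game.scratch
    game.scratch = 0


def render(game):
    """Hypothetical side effect: in a rare edge case, drawing leaves a
    value behind in the shared scratch."""
    if game.pos % 7 == 0:
        game.scratch = 1


def game_loop(inputs):
    """Inputs, simulation, and graphics together (timing omitted)."""
    game = Game()
    for inp in inputs:
        simulate(game, inp)
        render(game)
    return game.pos


def fast_forward(inputs):
    """Inputs and simulation only, with no graphics."""
    game = Game()
    for inp in inputs:
        simulate(game, inp)
    return game.pos


print(game_loop([1, 2, 3]), fast_forward([1, 2, 3]))        # prints: 6 6
print(game_loop([3, 4, 5, 2]), fast_forward([3, 4, 5, 2]))  # prints: 15 14
```

Most input sequences never hit the edge case and both loops agree; the occasional one that does hit it ends up in a different place depending on which loop ran it.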


Quote from: Friker on April 10, 2023, 08:35:47 PM: We as community have Stunts code reversed so I would assume that there are some folks who know answer for this question for sure/can debug code to prove either option.

I would compare our current knowledge to: we took apart an amazing mechanical clock, and we are looking at the single gears, or small sets of gears. That is quite far from understanding how the complications and movements behave in the live clock, but it's the starting point to get there.

The single pieces of code are meant to do one thing, but how they all interact together to create two different results... I don't think we understand that yet.

Frieshansen

Quote from: Friker on April 10, 2023, 08:35:47 PM: I have pretty strong feeling that if it is a rounding error/overflow this is either 100 percent reproducible on all machines (it is some kind of forgotten register which is not re-set in assembly when you press play instead of fast-forwarding) or 100 percent different on different real-life machines (I can only assume different execution of operations in a given processor/overflowing of registers on different systems) thus there is no "ultimate true system" (even with PCem) so we would end up with one arbitrary chosen and in that case I would choose DOSBox packed in official .zip file not because of accuracy but because of accessibility - even checking if a replay is valid on my computer was PITA - handling it via some API would be ridiculous for me.

My problem with DOSBox is that it plays some replays differently on different computers despite having the same version and config. That's why it's out of the question for me for a predictable system.

Duplode

#11
It seems I did succeed in starting a discussion -- great!  :D

Quote from: dreadnaut on April 10, 2023, 06:43:56 PM: Thank you for starting this thread Duplode, and for the thorough proposal. There is one bit missing though, which I think is important to direct the discussion: what is the problem we are trying to solve with this rule change?

In ZCT260, we have seen laps from two pipsqueaks being deleted even though they were valid as driven. (In what follows, I will use "valid" as shorthand for "valid as driven" -- more on that in a moment.) I believe we should strive to have as few valid laps deleted as we reasonably can. In particular, if there is a simple validation procedure that doesn't report invalid laps as valid, and reports valid laps as invalid far less often than the current procedure, we should switch to using it -- ergo, my proposal.

The concept of "valid as driven" that I'm using is a really simple one. It is certain that, when Friker drove his deleted lap, he saw the evaluation screen at the end of the session, and it said "Elapsed time: 1:56.95". If we were sitting by his side at that moment, we'd accept that was a valid result, and no one would care about how the replay would be reproduced, or any of the technical minutiae we've been poring over here. In brief: the evaluation screen shown to the driver is the ultimate source of validity built in the game, and we should honour it to the maximum possible extent.

Relative to the evaluation screen, the replay file is secondary, derived, accessory -- a means of transmission, useful for when you can't be in the same room as your fellow pipsqueaks. Replay files are pretty reliable overall, but not completely so: watching an unstable replay will sometimes mislead one into thinking a lap driven to the finish by a pipsqueak was left incomplete. Fortunately, we are able to easily detect most such inaccuracies, which is what I'm suggesting should be done.

Quote from: Friker on April 10, 2023, 03:43:19 PM: If you would start a new drive and input exact same inputs as in a replay - do you think/know that result of that driving is such 1. exactly as it would play in a replay from beginning/2. exactly as last position of replay when loaded from a file?

We don't know for sure, and it's likely that will only change when the code of the game gets analysed in enough detail for us to find exactly what causes the underlying bug. repldump plays unstable replays correctly (that is, to the finish line). That, however, is not enough to be sure that #2 holds: repldump only includes the core parts of the game loop needed to update the gamestate data structure, and as dreadnaut notes the bug might lie in an interaction with something outside of this core.

In any case, there is a deeper issue at play here. I don't think the hypothetical verification procedure you suggest (starting a new drive and entering the exact same inputs as recorded in the replay file) is actually the gold standard we should abide by. Since ZakStunts is an RH competition (as @Daniel3D has underlined), that procedure is not completely faithful to how the lap was actually driven, as it treats all replays as if they were driven without RH. If, in the course of an RH racing session, a timeline divergence affects a lap that the pipsqueak will eventually drive to completion, the divergence is an integral part of the gameplay, and not merely an artefact of replay watching. What matters at the end of the day is that the pipsqueak completed the lap and reached an evaluation screen displaying a lap time, thus making it valid.

Quote from: Frieshansen on April 10, 2023, 07:28:32 PM: My solution would be a closed and clearly defined reference system on which the replays are played. Only if they run completely through there, the replay is valid. The reference system should reflect as well as possible an installation of stunts on a real computer of that time. Nobody knows on which system we will play in 20 years and therefore it is good to orientate on the original - that will always be the reference.

In my opinion, this is not a direction we can afford moving in, for the reason Friker alludes to: most people rely on DOSBox to play, and so relying on a different and uncommon setup for validation would make racing less accessible. (Still, the issue of replays playing differently across DOSBox-running computers does sound worth investigating. Do you have any example replays at hand?)

Daniel3D

#12
Ok.
So, if I understand correctly.
Quote: repldump only includes the core parts of the game loop needed to update the gamestate data structure,
That is the same code the game uses upon loading a replay.
  • This is the replay as driven.

When replaying the replay the game has to draw the environment and interactions again, and it can (apparently) treat edge case calculations differently.
  • Replaying can change the outcome just as opponents behaviour can vary.

So. Loading a replay and rewinding enough to see the finish line should verify the replay as "valid as driven".

Can repldump be used to create an automated replay validator?
Check if the replay ends in the neighborhood of the Finish?
Or even check if checkpoint areas are passed?

Duplode

Quote from: Daniel3D on April 11, 2023, 08:58:53 AM: Can repldump be used to create an automated replay validator?

Yes, in principle this can be done through repldump. Its output has all the information we might care about a lap, including the information needed to know if the lap was completed.

There are two caveats I find important to mention. Firstly, the result of such a validator is not truer than the in-game experience of the pipsqueak: as per my argument above, the evaluation screen reigns supreme, and the accuracy of a validator is a matter of how well it can reproduce it. Secondly, to the best of my knowledge a repldump validator will give the same results as the streamlined verification procedure that I'm proposing. In any case, a repldump validator would certainly be an improvement over the status quo on this issue, as it would confirm more valid replays as such than the current procedure.
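
As a rough sketch of what the checking side of such a validator could look like, here is some Python. It assumes the per-frame data has already been extracted into records with crashed and finished flags; that record layout (the Frame class and its fields) is a made-up stand-in, not repldump's actual output format, which would have to be mapped onto it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Frame:
    """Hypothetical per-frame record; the field names are invented."""
    index: int
    crashed: bool
    finished: bool


def finishing_frame(frames: Iterable[Frame]) -> Optional[int]:
    """Return the index of the frame where the lap is completed,
    or None if the car crashes first or never reaches the finish."""
    for frame in frames:
        if frame.crashed:
            return None
        if frame.finished:
            return frame.index
    return None


completed = [Frame(0, False, False), Frame(1, False, False), Frame(2, False, True)]
wrecked = [Frame(0, False, False), Frame(1, True, False), Frame(2, False, True)]
print(finishing_frame(completed))  # prints: 2
print(finishing_frame(wrecked))    # prints: None
```

Checkpoint or penalty checks could be added in the same style, provided the extracted data carries that information.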

Daniel3D

I thought of the repldump-based replay verifier as a tool to aid moderation.