Unstable replays at ZakStunts

Started by Duplode, April 10, 2023, 02:48:47 AM

Duplode

#60
@KyLiE As it stands, the main issue with the current rule is that sometimes it will be impossible for the driver to ensure the replay plays back correctly, as the bug won't happen on their computer, but will on someone else's. This has already happened in practice with Frieshansen at ZCT251. (I still stand by my earlier arguments, but this is much starker.)

I like the idea of an indicator in the site for replays that are known to be unstable. If the rule changes and such a feature gets implemented, I volunteer for reviewing laps from the past.

And yeah, more tests on legacy hardware are always welcome!  :)  The #1 test to do, I think, would be with Frieshansen's ZCT251 replays linked to above, as we're seeing a lot of variation from system to system with them. But the deleted ZCT260 replays, the Competition Archive ones I attached a couple posts above, or anything else you feel like trying are fair game, too.



@dreadnaut My current understanding is that the non-buggy timeline is the fast-forward/load-from-Options one, as that is the one which remains consistent across tests, while the watching-with-graphics timeline can change across systems and be manipulated through DOSBox cycles and graphics settings. As for legacy hardware versus DOSBox on low cycles, I think what happens is that you need to run the game on very slow hardware to eliminate the divergences from normal playback, and even legacy machines are often too fast for that. For instance, the configuration menus of DOSBox-X offer the following table of correspondences between processors and DOSBox cycles:

[Attachment unavailable: screenshot of the DOSBox-X table relating processors to cycle counts.]

The question about replay controls is an interesting one, which I think we might study by what happens around the divergence points. I'll also try to think of a specific experiment for that, probably involving an infinite powerslide track easy enough to be driven NoRH.

Daniel3D

Quote from: Duplode on April 16, 2023, 04:03:45 PM
The question about replay controls is an interesting one, which I think we might study by what happens around the divergence points. I'll also try to think of a specific experiment for that, probably involving an infinite powerslide track easy enough to be driven NoRH.
For that CAS has made a useful tool.
It exports Player input from the replay.
So it is easy to see what the Player did around the divergence..
Edison once said,
"I have not failed 10,000 times,
I've successfully found 10,000 ways that will not work."
---------
Currently running over 20 separate instances of Stunts
---------
Check out the STUNTS resources on my Mega (globe icon)

Cas

There's another one that allows you to see the keys live. Not sure if I passed that one
Earth is my country. Science is my religion.

Friker

Quote from: Duplode on April 16, 2023, 06:05:00 AM
I plan to do further analysis on these replays, as well as on the other ones shared on this thread, including running repldump on them all -- the working hypothesis being that repldump output always matches the load-from-Options outcome.
I can confirm this: I looked at the Restunts C code.
Repldump uses the same initialization as Stunts (check the init lines before the linked ones as well; they also match):
restore_gamestate(0);
restore_gamestate(gameconfig.game_recordedframes);

The only difference is framespersec = 20; in Repldump
vs framespersec = gameconfig.game_framespersec; in Stunts

Quote from: Duplode on April 16, 2023, 06:05:00 AM
Almost all of those thirteen replays load properly from the Options menu, and diverge when played from the beginning. The sole, glorious exception is Gutix's ACT13 replay...
This quite strongly supports my second option (relax the current rule).

Quote from: Daniel3D on April 16, 2023, 05:06:34 PM
Quote from: Duplode on April 16, 2023, 04:03:45 PM
The question about replay controls is an interesting one, which I think we might study by what happens around the divergence points. I'll also try to think of a specific experiment for that, probably involving an infinite powerslide track easy enough to be driven NoRH.
For that CAS has made a useful tool.
It exports Player input from the replay.
So it is easy to see what the Player did around the divergence..
This can be seen with an ordinary hex editor. And in most cases the input is really not interesting. :) In my last race, it may also vary: I think I got to the 1:46.00 point with slightly different inputs.

Friker

Quote from: Duplode on April 16, 2023, 04:03:45 PM
As for legacy hardware versus DOSBox on low cycles, I think what happens is that you need to run the game on very slow hardware to eliminate the divergences from normal playback, and even legacy machines are often too fast for that. For instance, the configuration menus of DOSBox-X offer the following table of correspondences between processors and DOSBox cycles:
This is like a godsend. Thank you for sharing this screenshot. I was not sure how low I should go with the CPU speed in PCem, especially because 286s are completely different beasts to set up from the 486s I was used to. So I tried creating a 12 MHz 286 machine (~1510 cycles on DOSBox) and watched my replay.

Guess what - It did not crash. So the result is the same as rewind 1.05s on faster machines..
Fritest - finishes at 1:57.85
Fritest2 - Crashed into water. Went swimming with the fishes :P

These findings don't directly support any proposed ruling; they only show that at slow CPU speeds the Play button behaves differently than at higher CPU speeds. At best they show that rewinding 1.05s (or more, in the case of Fritest) is the most consistent method (and the fastest) of all of them.

Duplode

#65
Let me begin this message with some good news: testing with a broader range of example replays has allowed me to better understand how divergences interact with the replay controls, and why some replays are harder to play correctly than others. Knowing such things, in turn, should enable us to quickly identify the reachable timelines of a replay. Properly reporting those findings will require a more careful write-up, which I plan to do no later than Friday.

In the meantime, let's discuss one additional factor that hasn't been mentioned in this thread yet: cameras!

(Preliminary note: keep in mind that these results are what I'm seeing on my computer, with my usual DOSBox settings, and it's perfectly possible for you to see something else. The four example replays I'll mention are attached -- if you feel like checking them, reports of tests on different systems are always welcome!)

This tangent begins with Overdrijf's ZCT232 replay. Yesterday, while resuming an attempt to find the divergence point in it, it initially seemed I was getting different results from one day to the next. Thanks to a passing remark by @Daniel3D several days ago, I was able to quickly realise what was going on: in my system, playing from the beginning leads to a crash on the F2 camera, but not on the other ones! This replay has three main timelines:

  • Crashes into the right bridge wall at 31.25;
  • Crashes into the left bridge wall at 31.25; and
  • Successfully jumps over the bridge and goes on to complete the lap.

I get timeline #1 by loading from options as in my proposal (or fast-forwarding the tape beyond the divergence), timeline #2 by playing (at normal speed) from the beginning on the F2 camera, and timeline #3 by playing from the beginning on the other cameras. Since I naively had been doing my tests always on F2, there was no way I'd get to see the successful timeline unless I coincidentally changed cameras.

Now, while this is a bizarre effect, one might hope that knowing the confounding factor would make things more predictable. But alas:

  • "Maybe it's specifically the F2 camera which is buggy." Nope: Frieshansen's final ZCT251 replay (the one actually on the scoreboard, not one of those shared here) crashes at 1:12.85 from Options, crashes at 1:05.85 when watched from the beginning on F1, F2 or F3, and completes the lap only when watched on F4. (In contrast, Frieshansen's withdrawn replays from the same race load and play normally in my system.)
  • "Okay, so perhaps the F4 camera always works." Nope again: Alan's final ZCT256 replay (which AFAIK hadn't been regarded as unstable up to now) loads correctly from Options, makes a too aggressive cut at the final sector and gets 36 seconds of penalty when watched on F1, and misses the finish line altogether on the other three cameras.
  • "Well, if nothing else switching cameras gives us a way of dealing with complicated cases like Overdrijf's at ZCT232." Sometimes, but not always: Alan's final ZCT244 replay (the one which wasn't deleted) sinks at 58.35 when loaded from Options, and sinks at 1:09.65 when watched from the beginning on any camera. I still haven't figured out how to make that one play to the end here. (In contrast, the deleted 1:13.80 replay shows up as complete when loaded from Options.)

The conclusion I draw from these examples is that, in addition to the other issues discussed so far, the current rule doesn't actually make it easier to watch replays from the archives.



Quote from: Friker on April 18, 2023, 02:02:50 PM
Quote from: Duplode on April 16, 2023, 06:05:00 AM
Almost all of those thirteen replays load properly from the Options menu, and diverge when played from the beginning. The sole, glorious exception is Gutix's ACT13 replay...
This quite strongly supports my second option (relax the current rule).

Indeed. At this point, loading from Options (as in my original proposal) as the primary check and playing from the beginning (with multiple cameras if necessary) as a fallback looks like the natural thing to do. On the one hand, the load-from-Options timeline is, to the best of our knowledge, both the one consistent timeline and that most likely to complete successfully. On the other hand, playing from the beginning whenever necessary allows confirming a few more completed laps as being so, and won't significantly increase the validation workload from what it is now.

Cas

I'm watching Overdrijf's ZCT232 replay and I'm getting exactly the same as you when loading it from options and playing it with F2. Just as you describe, on different sides of the bridge at the exact same time. However, playing with F3 with the default setup from the beginning at normal speed, I see the car going near the start/finish line at 0:36.45, which triggers finishing, that is, Stunts stops reading inputs from the replay and so the car slows down to a stop, almost crashing into a boat (it touches it slowly and bounces, then comes to a rest). I then tried it with the other cameras at normal speed and all yield the same result. Using double speed from the beginning, all four cameras crash at the bridge, but only F4 crashes at the beginning of the bridge. The other three crash on the right side of the bridge. I'm using 20000 cycles.

It seems to me that, when Stunts calculates the physics, strangely, instead of using a single deterministic cycle, it is using two parallel "times". That is, on one side, there's a cycle reading inputs and on another, there's a cycle executing the changes. Because they're not the same timeline, they can drift and this is more likely to happen when more things are going on inside the loop, such as rendering, but because the calculation never takes "zero" time, it can happen anytime. It'd be very interesting to get to read how the code does this, because it's really unusual. Why not use a simple single-timeline loop?  I guess it must be part of a strategy to make the game run faster on slower machines. I don't know. If this is the thing, I see no other solution but what you suggested: first, evaluate loading from Options, which is more likely to work, and if it doesn't, go and check the other ones.

Duplode

Here goes the bigger post I promised on Wednesday. I have considered starting a separate thread for it, but at the end of the day it belongs here: there's a ton of relevant context in earlier replies (and it wouldn't have been possible without the input of you all). I'll eventually get to shape it into a Wiki article or two, to keep the information easily accessible.

The main goal here is showing ways to quickly find and watch reachable timelines for a replay in your system. In the process, at least one previously undocumented aspect of the replay system will be reported on.

Definitions

I'll start with a few definitions, just to avoid mixing up similar sounding things. Let's begin with the names of the replay controls, largely taken from the game manual:



Next, a few notions about unstable replays:

  • Fast-forward timeline: the events in a replay as seen by fast-forwarding it from the beginning, loading it from the Options menu, or reproducing it with repldump. To the best of our knowledge, the fast-forward timeline is the same across systems and settings. It is this timeline my proposed verification method checks. (Note the relevant replay control is fast forward, and not fast play/"double speed".)
  • Divergent timeline: a sequence of events in a replay different from the one in the fast-forward timeline. A divergent timeline might be seen by playing the replay (that is, watching with the play button), or through some other use of the replay controls. Whether it will actually be seen can depend on the computer running the game, the DOSBox cycles setting, camera choice, graphics settings, and heaven knows what else.
  • Divergence point: A point in the replay at which two or more timelines diverge. Divergence points are typically (perhaps always) vertex points in a powergear slide, in which the car stops for a moment before shooting into a different direction (note the converse is presumably not true: there are many powergear slides which, as far as we know, don't lead to divergences). By convention, I will report divergence points using the last frame before the timelines split (for instance, Friker's deleted ZCT260 replay has a divergence point at 1:46.00).
  • Unstable replay: A replay known to have a divergence point and multiple timelines when reproduced in some way on at least one system. (Given the way factors external to the game affect the observations, "known to be unstable" is arguably more accurate than "unstable", but let's try to keep the language straightforward.)
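Since divergence points are reported as timestamps, it can help to convert them to frame numbers when comparing notes. The sketch below is only an illustration: it assumes 20 frames per second (which matches the 0.05 s steps of the timestamps quoted in this thread), and the helper names are hypothetical rather than taken from any existing tool.

```python
FRAMES_PER_SECOND = 20  # assumption: timestamps in this thread step by 0.05 s
HUNDREDTHS_PER_FRAME = 100 // FRAMES_PER_SECOND  # 5 hundredths of a second


def timestamp_to_frame(timestamp: str) -> int:
    """Convert 'M:SS.cc' or 'SS.cc' into a frame number (hypothetical helper)."""
    minutes, _, seconds = timestamp.rpartition(":")
    whole, _, fraction = seconds.partition(".")
    hundredths = int((fraction + "00")[:2])
    total = (int(minutes or 0) * 60 + int(whole)) * 100 + hundredths
    if total % HUNDREDTHS_PER_FRAME:
        raise ValueError(f"{timestamp!r} does not fall on a frame boundary")
    return total // HUNDREDTHS_PER_FRAME


def frame_to_timestamp(frame: int) -> str:
    """Convert a frame number back into 'M:SS.cc'."""
    minutes, rest = divmod(frame * HUNDREDTHS_PER_FRAME, 6000)
    seconds, hundredths = divmod(rest, 100)
    return f"{minutes}:{seconds:02d}.{hundredths:02d}"
```

For instance, under that assumption `timestamp_to_frame("1:46.00")` is frame 2120, and a 30-second checkpoint interval corresponds to 600 frames.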

How rewinding works

Why are some divergences so easy to fix that you barely notice them, while others are recalcitrant enough to be nightmare fuel? The answer lies in a replay mechanic that is an integral part of RH racing: rewinding.

The first thing to note is that the game does not rewind a replay by moving backwards in time. Rather, it fast-forwards the tape from the beginning to the point you want to reach. That is why rewinding is so slow on the 1990 Broderbund (Stunts 1.0) version of the game (and gets slower the further ahead the tape is). To improve on that, later versions store checkpoints of the game state every thirty seconds, so that there is no need to fast-forward from the beginning, but only from the latest checkpoint. For a demonstration, rewind frame by frame across a xx:00 or xx:30 timestamp. You'll notice the game takes noticeably longer to update once you go behind the checkpoint:
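The checkpoint mechanism can be illustrated with a toy model. This is a sketch under stated assumptions, not the game's code: `step` is a made-up deterministic update standing in for the physics, the `Tape` class is hypothetical, and the 600-frame interval assumes 20 frames per second.

```python
CHECKPOINT_INTERVAL = 600  # frames; 30 s at an assumed 20 frames per second


def step(state: int, key: int) -> int:
    """Toy deterministic update standing in for the real physics."""
    return (state * 31 + key) % 1_000_003


class Tape:
    """Hypothetical replay tape that seeks by re-simulating from a checkpoint."""

    def __init__(self, inputs):
        self.inputs = list(inputs)
        self.checkpoints = {0: 0}  # frame number -> saved state

    def seek(self, target: int):
        """Return (state at target, number of frames re-simulated)."""
        # Start from the latest checkpoint at or before the target frame.
        start = max(f for f in self.checkpoints if f <= target)
        state = self.checkpoints[start]
        for frame in range(start, target):
            state = step(state, self.inputs[frame])
            if (frame + 1) % CHECKPOINT_INTERVAL == 0:
                self.checkpoints[frame + 1] = state
        return state, target - start
```

In this model, seeking to frame 1201 after having visited frame 1300 re-simulates a single frame, while seeking to frame 1199 re-simulates 599 frames from the previous checkpoint, which mirrors the slowdown seen when rewinding behind a xx:00 or xx:30 timestamp.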



That's a nifty optimisation if replays always play in consistent ways. If they don't, though, things get interesting. For instance, if you are on a divergent timeline, and the divergence point lies behind a checkpoint, there will be a visible discontinuity when you rewind past the checkpoint. Depending on how far away checkpoint and divergence are, the discontinuity can be very obvious, as in Usrin's ZCTP03 replay around 2:00 (divergence at 1:38.95)...



... or incredibly subtle, as in Overdrijf's ZCT232 replay around 0:30 (divergence at 28.05):



Why rewinding fixes replays (sometimes)

To keep things simple, for now let's just consider replays with a single divergence point. Suppose that you're on a divergent timeline. Since rewinding amounts to fast-forwarding from the nearest checkpoint, any rewinding within the 30-second window between checkpoints which contains the divergence point switches to the fast-forward timeline. That being so, if said window lies behind a checkpoint, you see discontinuities like the ones above upon crossing it, and rewinding without crossing it will preserve the divergence. However, if you're already inside the window, any use of the rewind button will instantly snap you back into the fast-forward timeline. For instance, here is what would happen were I to attempt fixing this wayward jump in the divergent timeline of my ZCT099 replay (divergence at 42.00):



This explains why most unstable RH replays complete the lap successfully in the fast-forward timeline. What typically happens is that the pipsqueak enters a divergent timeline while driving and then, a few seconds later but before reaching a checkpoint, rewinds slightly to fix something. At this point, the replay switches to the fast-forward timeline (a change that might be barely noticeable, depending on how close the divergence point is), and the pipsqueak completes the lap on it. For the same reason, replays driven like that are easily played to the end: once the spectator notices something amiss, like an unexpected crash, any rewinding will put the replay back into the fast-forward timeline, which will be successful.
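The rule described above can be written down as a small predicate. It is a sketch resting on several assumptions: a single divergence point, 20 frames per second, checkpoints every 600 frames that were saved while on the divergent timeline, and a hypothetical function name.

```python
CHECKPOINT_INTERVAL = 600  # frames; 30 s at an assumed 20 frames per second


def timeline_after_rewind(divergence_frame: int, target_frame: int) -> str:
    """Predict the timeline seen after rewinding to target_frame.

    Assumes the viewer is currently on the divergent timeline and that the
    replay has exactly one divergence point.
    """
    if target_frame <= divergence_frame:
        return "shared"  # the timelines have not split yet at this frame
    # Rewinding re-simulates from the latest checkpoint at or before the target.
    checkpoint = (target_frame // CHECKPOINT_INTERVAL) * CHECKPOINT_INTERVAL
    if checkpoint > divergence_frame:
        return "divergent"  # the checkpoint already holds divergent state
    return "fast-forward"  # re-simulation passes through the divergence
```

With a divergence at 42.00 (frame 840), rewinding to 45.00 (frame 900) gives `"fast-forward"`. With a divergence at 28.05 (frame 561), rewinding to 31.00 (frame 620) gives `"divergent"`, whereas rewinding to 29.00 (frame 580) gives `"fast-forward"`.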

Instability usually gets harder to handle, though, if the divergence point is just before a checkpoint. The most familiar example is Overdrijf's replay at ZCT232. Here is a demo of its successful divergent timeline:



My conjecture about what happened in this case (with apologies to @Overdrijf if I'm guessing it wrong) is that Overdrijf entered the divergent timeline at 28.05, completed the jump over the bridge on the first attempt and found no need to rewind past the 30.00 checkpoint for the rest of the session, until the lap was done. That being so, the lap was completed on the divergent timeline. To watch that timeline, though, you need to play the lap from 28.05 or earlier and hope for the best.

Two corollaries are worth mentioning here:

  • Firstly, attempting to "fix" an unstable replay with a successful fast-forward timeline can make things worse if the successful timeline becomes a divergent one which is hard to reproduce. This is likely what happened to @Frieshansen 's laps on ZCT251.
  • Secondly, any NoRH unstable replay must be completed on the divergent timeline. Since NoRH unstable replays are very unlikely to arise in practice, given how instability is associated with powerslides, I had to prepare a set piece; the resulting replay is attached to this post. You'll readily notice there's a moat surrounding the middle of the map. In the divergent timeline, which is the one I have driven NoRH, the car sinks from inside the moat at 2:09.15, whereas in the fast-forward timeline it sinks from outside the moat at 1:56.95. The divergence point comes very early at 30.50, and for a long time after that the differences between the timelines are barely noticeable by the naked eye. Below are two example screenshots at 1:20.00 (left is divergent, right is fast-forward):




That being so, we can turn to @dreadnaut 's question from last weekend:

Quote from: dreadnaut on April 16, 2023, 11:01:14 AM
I have one more question: can we generate an unstable replay without using the "forward" button of the replay controls?

Yes: divergences can happen as you drive a lap or watch a replay without anything being done with the replay controls, to the point that unstable NoRH replays are possible. Broadly speaking, the role of the replay controls can be best described as providing a way to switch timelines, not to create them.

The alternative divergent timeline

The "broadly speaking" qualifier was added to the paragraph just above due to a kind of divergent timeline which is unlikely to arise except by toying with the replay controls. It was discovered by @Argammon , with the defining example being the parked-by-the-beach ending of Friker's deleted ZCT260 replay. If you advance the tape to the divergence point (in this case, 1:46.00) and play to the end, you will get a divergent timeline. However, fast-forwarding a single frame ahead from the divergence point and playing from there can result in a slightly different divergence. With Friker's replay, the margins are thin enough that such a subtle change leads to a finish completely different from what happens in the two main timelines.

(As a reminder of how divergent timelines depend on cameras, it is worth mentioning that I don't get the alternative timeline of Friker's replay on the F1 camera, only on the other ones.)

Fast play (aka double speed)

What about fast play -- is it a viable way of watching unstable replays, as has been suggested in the past? It largely is, though, not for the first time, it's a little more complicated than we might have expected.

In the examples I've seen so far, fast play follows either the fast-forward timeline or the (main) divergent one, depending on where fast play is started. Assuming a starting point earlier than the divergence, we have:

  • If the timestamps of the starting point and divergence point end with the same digit, it will follow the divergent timeline.
  • Otherwise, it will follow the fast-forward timeline.

For instance, fast-playing Friker's deleted ZCT260 replay from the beginning (0:00.00) leads to the (crashing) divergent timeline, as the divergence point is 1:46.00, while starting from 0:00.05 leads to the (successful) fast-forward timeline.

Fast play works by stepping through each frame of the simulation normally, as it must be, but only rendering graphics for every other frame (note how the timestamps in the replay bar advance by 0.10 s while using fast play). That we seemingly get different results from fast play depending on whether graphics get rendered at the divergence point is one more piece of evidence suggesting that divergences fundamentally have to do with graphics rendering in the game loop.
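The "same last digit" rule is equivalent to asking whether the divergence frame is one of the frames that fast play renders. A minimal sketch, assuming 20 frames per second (so each frame is 0.05 s), a single divergence point, and the observations reported so far; the function name is hypothetical:

```python
def fast_play_timeline(start_frame: int, divergence_frame: int) -> str:
    """Predict which timeline fast play follows when started at start_frame.

    Fast play simulates every frame but renders only every other one,
    counting from the starting frame. The observed pattern is that the
    divergent timeline shows up when the divergence frame gets rendered.
    """
    if start_frame >= divergence_frame:
        raise ValueError("the starting point must be earlier than the divergence")
    rendered_at_divergence = (divergence_frame - start_frame) % 2 == 0
    return "divergent" if rendered_at_divergence else "fast-forward"
```

For a divergence at 1:46.00 (frame 2120), starting at 0:00.00 (frame 0) yields `"divergent"` and starting at 0:00.05 (frame 1) yields `"fast-forward"`, in line with the example above.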

Navigating unstable replays

It's about time to try and distill practical advice from the information above:

  • Playing an unstable replay from the beginning, or from any point before the divergence, will lead to a divergent timeline. What this timeline will look like might depend on a host of factors, most notably your system and the chosen camera.
  • Fast playing an unstable replay from either the beginning or 0:00.05, depending on where the divergence point is, will lead to the fast-forward timeline.
  • Loading the replay from the Options menu and fast-forwarding the tape from the beginning to the end lead to the same result: the final state of the fast-forward timeline.
  • Loading from Options and rewinding will keep you in the fast-forward timeline. (Fast-forwarding is okay as long as you don't stop at the divergence point, nor play through it.)
  • The divergence point will almost certainly be at the vertex of some powerslide, when the car changes direction suddenly.
  • To find the exact frame of the divergence point, load from Options (or fast-forward to the end), rewind to a candidate frame, and play from there. If you stay on the fast-forward timeline by doing so, the divergence is further back. If you switch to the divergent timeline, either you stopped exactly at the divergence point (and redoing it on the next frame will lead to the fast-forward timeline) or the divergence is further ahead.
  • Watching both the fast-forward and the divergent timeline in the manner described above will usually be enough to give you an upper boundary for where the divergence point can be (it must be before the timelines become visibly different). If somehow that isn't helpful, you can systematically find the 30-second window the divergence must be in by playing from the beginning, pausing at ~29.95, ~59.95, etc., and rewinding a single frame at each stop until you see a timeline change.
  • If you see an unexpected crash while playing an unstable replay, and rewinding doesn't instantly switch timelines, chances are the divergence point is on the other side of a 30-second checkpoint. (A straightforward example is my ZCT232 replay: divergence point at 27.95, crashes on the divergent timeline at 33.05. Overdrijf's ZCT232 replay is similar, but more confusing still because the fast-forward timeline isn't the successful one.)
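The search for the exact divergence frame described in the list above lends itself to bisection, which keeps the number of manual checks low. The sketch below assumes a single divergence point and that playing from a given frame behaves reproducibly on your system; `plays_divergent` is a hypothetical callback standing for the manual check (load from Options, rewind to the candidate frame, play, and report whether the divergent timeline shows up).

```python
from typing import Callable


def find_divergence_frame(
    plays_divergent: Callable[[int], bool], low: int, high: int
) -> int:
    """Bisect for the last frame from which playing leads to the divergent timeline.

    low:  a frame known to lead to the divergent timeline (e.g. the beginning).
    high: a frame known to stay on the fast-forward timeline (e.g. a point
          after the timelines have become visibly different).
    """
    while high - low > 1:
        mid = (low + high) // 2
        if plays_divergent(mid):
            low = mid   # divergence is at mid or further ahead
        else:
            high = mid  # divergence is further back
    return low
```

With the divergence somewhere in the first 31 seconds (frames 0 to 620 at an assumed 20 frames per second), this takes at most ten checks instead of stepping through the tape frame by frame.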

Daniel3D

Quote from: Cas on April 22, 2023, 01:38:37 AM
However, playing with F3 with the default setup from the beginning at normal speed, I see the car going near the start/finish line at 0:36.45, which triggers finishing, that is, Stunts stops reading inputs from the replay....
That should not happen I believe. The game stops reading input when finish is triggered and the 20 ticks are counted..

Overdrijf

Quote from: Duplode on April 22, 2023, 10:34:40 AM
My conjecture about what happened in this case (with apologies to @Overdrijf if I'm guessing it wrong) is that Overdrijf entered the divergent timeline at 28.05, completed the jump over the bridge on the first attempt and found no need to rewind past the 30.00 checkpoint for the rest of the session, until the lap was done. That being so, the lap was completed on the divergent timeline. To watch that timeline, though, you need to play the lap from 28.05 or earlier and hope for the best.

That sounds about right. It wouldn't have been my first attempt at that section, but I'll happily believe that it was the first attempt that happened after getting the divergence. I tend to go "hah, that trick worked, awesome, next part". I might go back to have certain inputs pressed during the jump, but the 0:30.00 checkpoint is pretty early in this jump, so I probably just didn't go that far.

Duplode

#70
Quote from: Daniel3D on April 22, 2023, 11:00:36 AM
That should not happen I believe. The game stops reading input when finish is triggered and the 20 ticks are counted..

The 20 ticks countdown only happens while driving. If you're watching a replay on a timeline that terminates early, the tape will continue for as many frames as there were in the saved replay even after the lap is finished, in the way Cas described.

By the way, that's indeed a new timeline for me, @Cas  :o I still want to try running the game on different environments here (e.g. QEMU + DOS) to see any of the timelines being reported here that I can't access through DOSBox -- or even brand new ones!

Also, it's about time we do justice to Overdrijf's ZCT232 replay. Contrary to what I had been saying before, there is nothing pathological about it: just a bit of unlucky timing (getting a divergence right before a replay checkpoint, as detailed in my previous reply) leading to the lap being completed on a divergent timeline, one which played from the beginning to the finish on his system.

dreadnaut

#71
I'm worried about derailing the awesome work going on here ;D  Still, I think there's not really much use in the rule as-is: it's technically 'unfounded', does not match what the [visible] community sees as a priority, and is a pain for pipsqueaks.

Options I can think of:

1. Keep rule as-is - Because I'm evil! ::)

2. Relax to Duplode's original proposal - If I understand correctly, if a replay loads to completion, it will also pass the "rewind just before finish line" test, so I'm not sure if this is useful

3. Relax to "replay is valid if it loads to completion" - would still exclude "load to crash" (e.g., Overdrijf's ZCT232 replay)

4. Relax to "replay is valid if it loads or plays to completion" - casts a wider net, easily validated?

5. Return to pre-2021 rules: "replay is valid if it can be shown to complete" - would cover both fast-forward and divergent timeline replays; we could require the driver to supply the correct steps to make validation of tricky cases easier.

Any others I have missed?

Daniel3D

Quote from: dreadnaut on April 22, 2023, 10:52:30 PM
I'm worried to derail the awesome work going on here ;D
Maybe split the topic and move all research to reverse engineering?

Daniel3D

#73
Quote from: dreadnaut on April 22, 2023, 10:52:30 PM
Options I can think of:


  • 1 | Keep rule as-is
  • 2 | Relax to Duplode's original proposal
  • 3 | Relax to "replay is valid if it loads to completion"
  • 4 | Relax to "replay is valid if it loads or plays to completion"
  • 5 | Return to pre-2021 rules: "replay is valid if it can be shown to complete"
On options 4 and 5: that is difficult.
We would need some number X of people verifying that the replay plays to the finish.

On 2: rewinding a bit and playing would filter out false positives. A replay ending on the road without crashing doesn't necessarily mean that it passed the finish. That is the usefulness of rewinding a bit.

Duplode

#74
@dreadnaut No derailment there, just a ton of switches in this stretch of railroad  :D Hand on heart, two weeks ago I didn't foresee there being so much investigation in this thread -- looking back, though, it was all for the best. Anyway, let's zigzag back to the rules. I think your options largely cover it; below are some quick notes on them.

Options #2 ("Duplode's original proposal") and #3 ("replay is valid if it loads to completion") are the same. As @Daniel3D notes, the rewinding part is just meant to catch laps that look okay but aren't (e.g. passing just wide of the finish tile, or having undeclared penalty time).

Option #4 ("replay is valid if it loads or plays to completion") is Friker's compromise: validate according to my proposal and, if it fails, try playing from the beginning, on multiple cameras if necessary. It is my favourite option at the moment. Much like you say, it casts a wider net while keeping the whole of the validation process in your hands. The one extra subtlety this option has is that, if it is adopted, pipsqueaks will only be 100% sure their replay will be deemed valid if it loads from Options -- they can send a replay which doesn't, but there's a nonzero risk of it being rejected due to you being unable to play it. That might be worth an explicit mention in the rules.
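To make the order of checks in option #4 concrete, here is a minimal sketch of the procedure. It does not automate anything: `loads_to_completion` and `plays_to_completion` are hypothetical callbacks standing for what the validator observes in the game, and the camera names simply follow the F1 to F4 keys mentioned in this thread.

```python
from typing import Callable, Iterable


def validate_replay(
    loads_to_completion: Callable[[], bool],
    plays_to_completion: Callable[[str], bool],
    cameras: Iterable[str] = ("F1", "F2", "F3", "F4"),
) -> str:
    """Option #4: load from Options first, then fall back to playing."""
    # Primary check: the fast-forward timeline, expected to be the same everywhere.
    if loads_to_completion():
        return "valid (fast-forward timeline)"
    # Fallback: play from the beginning, trying each camera in turn.
    for camera in cameras:
        if plays_to_completion(camera):
            return f"valid (divergent timeline, {camera} camera)"
    return "rejected"
```

A replay that loads from Options is accepted without any playback. One that only completes when played on some camera is accepted as well, but only if the validator's own system happens to reach that timeline, which is the residual risk mentioned above.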

As for #5 ("replay is valid if it can be shown to complete"), while I certainly like how it is the most liberal option of all, the procedural untidiness of allowing external proof (videos? testimony from other pipsqueaks?) does count against it. Though I think it could be a workable solution, at this moment #4 might offer a better balance between leniency and clarity.

(Ideally, we would figure out the technical means to easily explore all possible timelines, including the ones not directly available by running the game with one's usual setup, thus covering the external proof requirement of #5 in a fully reproducible way, but we aren't there yet.)