About ZakStunts's handling of TRK files and my editor Cas-Stunts

Started by Cas, August 12, 2016, 06:49:05 AM

Cas

I'll go in order:

  • Yes. If the track was originally created with format byte zero (i.e., with another editor), Cas-Stunts will keep this format byte as the default, unless the user forces a change at run-time by changing the saving format. Of course, Cas-Stunts won't look for meta-data if this byte is zero, but I have an idea for the future about that (**). On the other hand, if a track is originally created with Cas-Stunts, the format byte will not be zero, but it doesn't matter because whatever value it has, it will be kept. In particular, if the track is submitted to ZakStunts, I assume it will be in split format, so this value will always be 152 (constant) and therefore will not create problems with the hash.
  • Yes. It reads the whole first 1802 bytes as a string. I thought about skipping the format-byte, but then thought it makes more sense to include it. I can change that if you think it's better, though. This is still a beta version.
  • Cas-Stunts currently does not allow entering more than 64 characters for each of those strings. Yet, if some other program generates the meta-data and creates a longer string, Cas-Stunts will still read strings up to a maximum size of 32767 bytes (because of the signed short value, which is signed in case I want to include some form of error detection or anything like that). Anyway, the whole meta-data should never exceed 12 kilobytes, because when saved as an overlay, that would make Stunts overwrite past the replay area in memory. When loading the strings, Cas-Stunts currently truncates them to the first 64 characters.
  • I'll post the hash function below.
  • I'd had a similar idea to the centralised ZakStunts registry. I believe I could make Cas-Stunts access the ZakStunts database in the future if that is implemented, but I also think I should not rely solely on that. I mean, what if the site one day goes offline? All data will be lost. What if the IP changes for some reason? Cas-Stunts will have to be updated. What if I'm not here? Somebody else will have to delve into my code to fix it. (**) I have the idea of including an offline registry with Cas-Stunts that would contain meta-data for the built-in tracks and could also include that for ZakStunts tracks. If ZakStunts were not available or there were no internet, then Cas-Stunts could fall back to the offline registry.
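
The string handling described in the list above (a signed 16-bit size value, a 64-character editing limit, truncation on load) could be sketched roughly as follows. This is only an illustration in Python: the layout (a little-endian signed short length immediately followed by the string bytes), the Latin-1 decoding and the names read_meta_string and MAX_FIELD_CHARS are assumptions, not the actual Cas-Stunts format.

```python
import struct

MAX_FIELD_CHARS = 64  # current editing limit mentioned above


def read_meta_string(buf: bytes, offset: int = 0) -> tuple:
    """Read one length-prefixed string; return (text, next_offset).

    Assumed layout (hypothetical): signed 16-bit little-endian length,
    followed by that many bytes of text.
    """
    (length,) = struct.unpack_from("<h", buf, offset)
    if length < 0:
        # In this sketch, negative values are simply rejected.
        raise ValueError("negative string length")
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("string runs past the end of the meta-data block")
    text = buf[start:end].decode("latin-1")
    # Longer strings written by other programs are read, then truncated.
    return text[:MAX_FIELD_CHARS], end
```

Because the length is a signed short, the largest string such a reader can accept is 32767 bytes, which matches the limit quoted above.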

The hash function I'm using is this:
'Generate a 32-bit hash value out of the given string
Function Hash32(content As String) As ULong
    Dim i As Short, hash As ULong   'hash starts at zero

    For i = 1 To Len(content)
        hash Or= 1                              'force the hash odd
        hash *= (i + Asc(Mid(content, i, 1)))   'one-based position plus byte value
    Next i

    Return hash
End Function


In other words, we start with hash = 0. For each of the 1802 iterations, the hash is made odd (to avoid accumulation of trailing zeroes), then multiplied by the (one-based) position plus the ASCII value of the byte at that position. This is the simplest function I could think of that has these properties:


  • It will quickly fill and overflow the 32 bits, making the hash more random
  • Adding a constant to one byte and subtracting it from another, or similar actions, will not generate the same hash
  • Swapping two bytes will not generate the same hash
  • No particular value (zero, for instance) will cause the procedure to converge to a constant (usually zero)
  • Despite the repeated multiplications, factors will not accumulate (especially 2, since we're using binary), for the hash is forced odd at each iteration

If you find a problem with this function, I really would like to know it :P
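
For anyone who wants to experiment with the function without FreeBASIC, here is a minimal Python sketch of the same procedure. It assumes that ULong is a 32-bit unsigned integer whose multiplication wraps around on overflow, and that the string is treated as raw bytes; the name hash32 is just a transliteration.

```python
def hash32(content: bytes) -> int:
    """Sketch of Hash32: 32-bit multiplicative hash over a byte string."""
    h = 0
    for i, b in enumerate(content, start=1):
        h |= 1                            # force the hash odd
        h = (h * (i + b)) & 0xFFFFFFFF    # position plus byte value, wrapped to 32 bits
    return h


print(hash32(b"AB"))  # 4556
print(hash32(b"BA"))  # 4489
```

In this small example, swapping the two bytes does change the result, as the list above intends.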
Earth is my country. Science is my religion.

Duplode

Thank you for the clarifications. Right now only one of those issues concerns me to any extent: the format byte. To present a better case, I tried to think of concrete scenarios where it might lead to complications. I think I might have found a not too contrived one, and I wonder how you see it.

Let's say I want to archive metadata for the tracks of my old Southern Cross competition. Suppose that there are two popular ways of long-term metadata archival: Cas-Stunts and an online ZakStunts database, and that I want to use both, one serving as a fallback for the other. The ZakStunts database would be indexed by a 1802-byte hash, and it wouldn't read the Cas-Stunts binary metadata (at least not directly -- there might be a shared export format, etc.). Now, a natural thing to do would be using Cas-Stunts to enter the metadata and save it in the split format. If I do so, however, the format byte will be changed. Why does that matter? Because if I then add the track to the online database it will go under a different hash than the original TRK, which was available from the competition site up to that point. A pipsqueak who happens to have the old file and tries to check it against the database will get a false negative. Furthermore, I can't even circumvent the issue by writing a tool that creates standalone SMD files, as Cas-Stunts will not load them without the format byte change. If the format byte is not changed, however, there is no possible mismatch. While this issue is more obviously relevant to the split format, I believe it also applies, to a lesser extent, to the combined format. Suppose I pick an old track, add combined metadata to it, drive a lap on the modified track and extract a TRK from the replay. If the format byte is left unchanged, the TRK at the end of the process would be identical to the original one, which is a small but IMO significant advantage.

In a nutshell, then, I believe the format byte should never be changed by Cas-Stunts. It seems to me a small price to pay for eliminating any potential incompatibilities with other tools as far as the 1802 bytes are concerned -- even with tools unaware of Cas-Stunts, and even when dealing with tracks older than Cas-Stunts itself.

As I said in the beginning, I don't think the other points bring any more issues. The 32-bit hash question will become a moot point if you accept my main suggestion, but in any case that hash is meant for Cas-Stunts consumption, and other tools won't have to deal with it except when creating or validating metadata in the Cas-Stunts format, so it shouldn't really concern anyone. As for the field length, I first thought of 256 as a generous value (more than three 80-char lines of text!) that would be enough for even quite extended comments. A maximum length of 256 for all current variable length fields would amount to (if I counted it right) a maximum block size of 1086 bytes, almost 10% of the hard limit. It may sound like nothing, but who knows how many extra fields might be added in the future... 64 might be a fine value too, or perhaps there might be different limits for each field.

Cas

I had written a very long message. I'm editing it now, to make it more concise. I think your example is very good and we should continue to use these thought experiments to explain our positions. I already have an idea of a solution. It'd be good to chat and be able to discuss with more clarity, but I'll state my thinking here first:


  • I think the format byte can easily be left as zero if the track is saved in "combined" mode, since the file length makes it clear there is meta-data. For split files, I think leaving it at zero will force the editor to always look things up in the databases, and if the SMD file is lost, which is very likely to happen (same as when you forget to copy a HIG file), you will not know there was meta-data. Not a good thing. This is my thought experiment.
  • In the case you describe, you're talking about an old track that has been raced before, probably never going to be modified again, with a name that won't change. For tracks like this and for the tracks at ZakStunts, it makes sense that meta-data be stored in a read-only registry that could be online and/or distributed with a tool like Cas-Stunts, for offline access. For new tracks, I think it's a lot more sensible to keep the info in the most comfortable format for the editor and for passing it without data loss. The best format that serves this purpose is the combined one. It is the safest. This is an important point I've been thinking of. We must admit that the format Dreadnaut proposes is indeed the safest for ZakStunts and archival, but also the format I encourage (combined) is clearly the safest for editing and passing track files.
  • I believe we must seek a solution not by trying to change each other's original concepts, but by trying to make these two concepts compatible. Tracks would be edited, kept and passed in combined format until they are ready to be used, then submitted to the ZakStunts registry (and possibly a tournament) in the format Dreadnaut considers best. Once a file is not to be modified anymore, meta-data will be read from the registry. Before that, it will be read from the overlay. This is an idea. What do you think? I would like to discuss this carefully in a chat.
  • Also, if ZakStunts implements a registry system where everybody can submit tracks, it would be great if I could integrate Cas-Stunts with it, for example, making it possible to submit tracks from within Cas-Stunts to the website or to download them and their meta-data from there. Cas-Stunts would send the data in ZakStunts format and on retrieval would store it back in Cas-Stunts format. For fixed files (no longer to be changed), the files would still be 1802 bytes on the hard drive.
  • About field length. I agree. 256 bytes would be optimal. I used 64 because it was easy to do at the moment. Can change it easily.
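
The workflow proposed in the list above (overlay while a track is still being edited, registry once it is final) could look roughly like this. Everything here is a sketch under assumptions: the registry is modelled as a plain dictionary keyed by a SHA-1 of the 1802 track bytes, and find_metadata and TRACK_SIZE are hypothetical names, not part of Cas-Stunts or ZakStunts.

```python
import hashlib

TRACK_SIZE = 1802  # bytes in a standard TRK file


def find_metadata(file_bytes: bytes, registry: dict):
    """Return (source, metadata) for a track file.

    An overlay (any bytes after the first 1802) takes precedence;
    otherwise the registry is consulted using a hash of the track bytes.
    """
    if len(file_bytes) < TRACK_SIZE:
        raise ValueError("not a complete track file")
    track, overlay = file_bytes[:TRACK_SIZE], file_bytes[TRACK_SIZE:]
    if overlay:
        return "overlay", overlay          # combined format, still editable
    key = hashlib.sha1(track).hexdigest()  # registry key is an assumption
    if key in registry:
        return "registry", registry[key]   # fixed track, read-only registry
    return "none", None
```

The same lookup could try an online registry first and an offline copy second; only the dictionary passed in would change.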

Duplode

Quote from: Cas on August 29, 2016, 05:28:29 AM
For split files, I think leaving it at zero will force the editor to always look up in the data-bases and if the SMD file is lost, which is something very likely to happen (same as when you forget to copy a HIG file), you will not know there was meta-data.

That is one objection I hadn't thought of. It is certainly worth considering, and after pondering it a bit I still don't have a settled opinion about it...

Quote from: Cas on August 29, 2016, 05:28:29 AM
It'd be good to chat and be able to discuss with more clarity

It would indeed -- this discussion is growing faster than linearly already  :) I believe our forum chat room -- http://forum.stunts.hu/chat/ -- would be a convenient meeting place. I will be online there from around 11 PM onward.


dreadnaut

Ah, the headache. As we knew, Stunts does not store Bliss metadata in the replay, but it turns out it does store the value of the reserved byte. I made the mistake of being "open" in the code: I simply ignored that byte in the tracks and only rejected tracks above the standard size.

Now, I've spent the last three hours trying to clean up the mess: I have tracks ending in 150, 151, 152, and... 2 and 3, which should be allowed values. But they all come from Bliss: I made ZCT208 Invaders with the editor, told it to save without metadata, and it's the one ending with 3.

That would be easy to clear: zero the last byte, update the file hashes... but! For the same track, I have replays containing the same track data, but with different reserved bytes. So I also have to scrub all the stored replays, zeroing the last byte of the track, etc. That's going to be at least another hour.

All this brings me back to non-standard standards, the in-track metadata, and the discussion in this thread :(

It's a Bad Idea™, can we get rid of it?


Cas

I can shine a little light on the issue....

First, some history on how these extra bytes ended up that way. A long time ago, in the first version of the editor, which nobody here ever had, I wanted a way for a track made with it to be recognised. I saw that any trailing bytes were lost when the track was edited with the internal editor, but not the 1802nd byte, so I decided I'd set that byte to 149. You won't see any track file ending in this number because this editor was never used by the community. When I made the second version of the editor, which became public, I changed that to 150, to tell the two apart. So far, so good. This one shouldn't be an issue, but later on, I made a mistake.

At some point, we realised there was a conflict between trailing data in the track file and ZakStunts, besides other programs that would fail to run properly unless the file was exactly 1802 bytes long. Because of this, I created a second format, "split", so that one could save in either format depending on convenience. And here I made a terrible mistake. I thought it was important for each format to have a different final byte. Not only that, but I also thought 150 should be left behind as an old format. So format byte 151 would go to overlaid files, whereas format byte 152 would go to split files. Some time later, I realised this was a mistake, because now the same track could be saved with different format bytes, so the 1802 bytes were not guaranteed to be the same. What I did was fix this to a single byte value, but this was yet another new one, 153. Right now, Bliss only produces files ending in 153.

How to solve this
For tracks that have extra bytes 0, 150 or 153, there is actually no problem, even though we'd all like all those numbers to be the same. There is no problem because the extra byte is passed to replays and there exists only one of these values for the same track. The internal editor and Track Blaster both keep the extra byte value. But bytes 151 and 152 are a problem. My opinion is that we can't just change them all to zero and expect it to be fine, because if we later encounter a replay made with the other version, we'd have a new problem. From my point of view, the very simplest way of solving this is the following:

1 - Upon encountering track files ending in 0, 150 or 153, leave them as they are. Changing the byte would create a new version of the same file, with another trailing byte. I admit it'd be better if they were all the same, but if we change them now, we'd make it worse.
2 - When a track file comes up with bytes 151 or 152, no matter what we do, we'll still have a problem. Fortunately, there are not many of them. So if you would like this number to be 0, change it. You're not making things any worse. Yet, this is not enough to solve the problem. Point 3 is the fundamental one:
3 - When comparing tracks or a track and a replay, compare only the first 1801 bytes, not the whole 1802 typical track length. This single action solves the whole issue and it makes the greatest sense. In a way, the whole problem is never a problem as long as we understand tracks as being made of 1801 bytes. Byte 1802 never has had any impact on what the track is. Stunts ignores this byte. So why go crazy about it? I don't mean to argue. Just think about it for a moment. No solution is simpler than that. Of course, hashes should also be computed over the first 1801 bytes.
4 - As for me, Bliss currently creates tracks only with final byte 153. This should not cause any problem, but if you wish, I can make the next version use byte 0 instead. Tracks opened with Bliss and then saved again currently keep the same byte. I don't remember how the obsolete values 151 and 152 are saved after editing in the new editor, but I can make sure that, in the new version, all track files with a last byte of 151 or 152 are saved with last byte 0, and new ones always. This should help for the future. I offer an apology for this mistake, but I ask not to be blamed for more than I actually did.
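
Point 3 above is small enough to sketch in a few lines. This is an illustration in Python, not code from Bliss or ZakStunts; the function names are made up, and SHA-1 is used only as a stand-in for whatever hash a tool prefers.

```python
import hashlib

TRACK_SIZE = 1802              # full track length, including the last byte
SIGNIFICANT = TRACK_SIZE - 1   # the 1801 bytes that define the track


def same_track(a: bytes, b: bytes) -> bool:
    """Compare two tracks, ignoring byte 1802 and any trailing data."""
    if len(a) < TRACK_SIZE or len(b) < TRACK_SIZE:
        raise ValueError("track data is shorter than 1802 bytes")
    return a[:SIGNIFICANT] == b[:SIGNIFICANT]


def track_hash(data: bytes) -> str:
    """Hash only the first 1801 bytes of a track."""
    if len(data) < TRACK_SIZE:
        raise ValueError("track data is shorter than 1802 bytes")
    return hashlib.sha1(data[:SIGNIFICANT]).hexdigest()
```

With this approach, two copies of a track that end in 153 and 0 compare equal and share a hash, while the files themselves stay untouched.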

You already know my philosophical view of the matter: there is no "standard" here; there never was. Only changes that cause effects and changes that don't. Track Blaster introduced a great deal of changes that weren't previously allowed and nobody came screaming about standards violations. Quite the contrary, people were happy to embrace "illusion tracks" and road on water, even though they did conflict with the internal editor when one would accidentally open the file with it. What I've done is just changing a single byte that does nothing. While I did make the mistake I described above, I've created no mess. If you just blame me, we'll just keep on exchanging messages like this and it will be useless.

So... do we proceed like that?  Any other ideas?  What can I do to help?

dreadnaut

As grumpy as my previous message could be, you should notice that I did not blame you, I described the problems I encountered. Note that the last byte of ZCT208 is actually 3, not 153, so I'm not sure what happened there.

I am not going to redefine the hash of a track as the hash on the first 1801 bytes, because that's not what the hash of a file is, and doing so would confuse anyone trying to compare tracks and unaware of the special definition.

What I'm going to do is remove the lenient check and just zero the last byte of every track uploaded on ZakStunts. I'll probably have to do the same for replays as well. Which means that instead of calling sha1_file on the file and being done, I have to load the replay, check the Stunts version, zero the correct byte, and calculate the hash on the data.
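
For plain track files, the normalisation described above might look like the following Python sketch. It covers only the TRK case; locating the track inside a replay depends on the Stunts version and is not attempted here. The function name is hypothetical.

```python
import hashlib

TRACK_SIZE = 1802  # bytes in a standard TRK file


def normalised_track_hash(data: bytes) -> str:
    """Zero the last track byte, then hash the full 1802 bytes."""
    if len(data) != TRACK_SIZE:
        raise ValueError("expected exactly 1802 bytes")
    cleaned = data[:-1] + b"\x00"   # last byte forced to zero
    return hashlib.sha1(cleaned).hexdigest()
```

The result is still an ordinary SHA-1 of a 1802-byte file, so anyone who zeroes the last byte of their own copy gets the same value with a standard tool.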

I would ask you to update Bliss to set the last byte to zero, all the time. If you want to append metadata, you can still do it: have the word at the 1803rd position be a magic number that you can use to recognise the record, followed by the same data as before.


Scratch that, I'll do the opposite and take the concept of reserved byte as strictly as possible. Software does not need to know what it is for, it just needs to persist the value, as Stunts does.

Make sure that Bliss writes one specific value there when you create a new track, if you must, and we're good. However, double check the code: ZCT208 ends with 3, while ZCT200 and ZCT201 end with 2. All the other tracks in ZakStunts end with 150, 151, or 152, which means people need to update to the latest version.
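
The rule in the last two paragraphs (persist the reserved byte of an existing track, write one fixed value for new ones) can be sketched as below. This is a Python illustration only; NEW_TRACK_MARKER uses 153 because that is the value mentioned in this thread for current Bliss versions, and the function name is invented.

```python
TRACK_SIZE = 1802
NEW_TRACK_MARKER = 153  # value the thread reports for new Bliss tracks


def build_track(body: bytes, original: bytes = None) -> bytes:
    """Assemble 1802 track bytes from 1801 edited bytes.

    If the track was loaded from an existing file, its last byte is
    persisted unchanged; only brand-new tracks get the marker value.
    """
    if len(body) != TRACK_SIZE - 1:
        raise ValueError("track body must be 1801 bytes")
    if original is not None:
        if len(original) < TRACK_SIZE:
            raise ValueError("original track is shorter than 1802 bytes")
        reserved = original[TRACK_SIZE - 1]   # keep whatever was there
    else:
        reserved = NEW_TRACK_MARKER
    return body + bytes([reserved])
```

A tool written this way never needs to know what the byte means, which is the point of treating it as reserved.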


P.S. —In the end I would still feel more comfortable if that value were zero. Setting a non-zero value creates a precedent, and someone might write a program that changes that number to something else. For example, an automatic track downloader that stamps its tracks with 99, because hey, there's a spare byte, and there's no standard. And then we would have different versions of the same track, for no good reason.

P.P.S —apologies, there seems to be some extra bitterness above, but it's been a long day; thanks for not raising the stakes and keeping the conversation on thinking about solutions!

Cas

Sorry, mate. I read this thread and I was about to leave home at that time, so I replied with all the heat in my head. When I cooled down, I wasn't home to change anything. I'm a little susceptible to bitterness.

I had noticed before that some files were being saved with last byte 2 or 3. The first time, I thought somebody had changed this value for some reason. Even Bliss wouldn't recognise it. Then I found a bug in Bliss and fixed it. I think most pipsqueaks currently have a version new enough so that it doesn't happen anymore. It's a bug. I never meant for those values to be output.

Alright, next version, I will switch this byte to zero, but... I will make Bliss not change the last byte on preexisting track files. Otherwise, I'd be making matters worse. In the meantime, with the current version, as I said, all new tracks should only have the value 153, so that shouldn't be a problem.

A rational reply to your slightly bitter "P.S.". Yes, you're right, but that supposed person creating that supposed tool wouldn't be doing what I did. They would be "modifying" a previous value. Bliss doesn't do that. It just uses a value for new tracks that's not the same as that of tracks created with the internal editor. In short, I do understand what you hate about this byte being changed. I am just not doing it.

And about the hash. I guess you're using a function to which you pass a whole file and it returns the hash. In that case, it could be complicated to work around the last byte. I was imagining something more like what I did in Bliss (I wrote my own hash function and tested it to have balanced sensitivity). Well, I guess the number one priority is to make sure the value is fixed. That, we already have with the current version of Bliss. Even though I, personally, see no problem in this byte having a value other than zero as long as it is fixed, I will change it to zero as I described, in future versions.

dreadnaut

Quote from: Cas on February 24, 2019, 08:33:31 PM
A rational reply to your slightly bitter "P.S.". Yes, you're right, but that supposed person creating that supposed tool wouldn't be doing what I did. They would be "modifying" a previous value. Bliss doesn't do that. It just uses a value for new tracks that's not the same as that of tracks created with the internal editor. In short, I do understand what you hate about this byte being changed. I am just not doing it.

Yes, once I realized that with Bliss you are creating new tracks, that made more sense. Which is why you could leave 153 as a marker. As you said, though, when a tool reads an existing track it should not change the last byte when it saves it again.

I'll double check to make sure that ZakStunts follows that rule, and leave all the tracks and replays as they are, because they match the tracks that all the pipsqueaks have on their computers. And I'll go and download the most recent Bliss :)

And sorry again for the grumpy text above :|
