Topic: About ZakStunts's handling of TRK files and my editor Cas-Stunts  (Read 1142 times)

Cas

« Reply #15 on: August 28, 2016, 08:04:42 AM »
I'll go in order:
  • Yes. If the track was originally created with a format byte of zero (i.e., with another editor), Cas-Stunts will keep that format byte by default, unless the user forces a change at run time by switching the save format. Of course, Cas-Stunts won't look for meta-data if this byte is zero, but I have an idea for the future about that (**). On the other hand, if a track is originally created with Cas-Stunts, the format byte will not be zero, but that doesn't matter because whatever value it has will be kept. In particular, if the track is submitted to ZakStunts, I assume it will be in split format, so this value will always be 152 (a constant) and therefore won't create problems with the hash.
  • Yes. It reads the whole first 1802 bytes as a string. I thought about skipping the format-byte, but then thought it makes more sense to include it. I can change that if you think it's better, though. This is still a beta version.
  • Cas-Stunts currently won't let you enter more than 64 characters for each of those strings. Yet, if some other program generates the meta-data and creates a longer string, Cas-Stunts will still manage to read strings up to a maximum size of 32767 bytes (because of the signed short value, which is signed in case I want to include some form of error detection or anything like that). Anyway, the whole meta-data should never exceed 12 kilobytes, because when saved as an overlay, anything larger would make Stunts write past the replay area in memory. When loading the strings, Cas-Stunts currently truncates them to the first 64 characters.
  • I'll post the hash function below.
  • I'd had a similar idea to the centralised ZakStunts registry. I believe I could make Cas-Stunts access the ZakStunts database in the future if that is implemented, but I also think I should not rely solely on that. I mean, what if the site one day goes offline?  All data will be lost. What if the IP changes for some reason?  Cas-Stunts will have to be updated. What if I'm not here? Somebody else will have to delve into my code to fix it. (**) I have the idea of including an offline registry with Cas-Stunts that would contain meta-data for the built-in tracks and could also include that for ZakStunts tracks. If ZakStunts were not available or there were no internet, then Cas-Stunts could fall back to the offline registry.
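
To make the string handling from the third point concrete, here is a minimal Python sketch. The byte layout is an assumption for illustration only: a little-endian signed 16-bit length followed by that many bytes of text. The thread only says that a signed short is involved and that strings are truncated to 64 characters on load. The names `read_meta_string` and `MAX_CHARS` are hypothetical, and rejecting negative lengths is just one possible way to treat the reserved sign bit.

```python
import struct

MAX_CHARS = 64  # current per-string limit mentioned in the post


def read_meta_string(buf: bytes, offset: int = 0, max_chars: int = MAX_CHARS):
    """Read one length-prefixed string and return (text, next_offset).

    Assumed layout (not confirmed by the thread): a little-endian signed
    16-bit length, then that many bytes of text.
    """
    (length,) = struct.unpack_from("<h", buf, offset)
    if length < 0:
        # Negative lengths are treated as reserved in this sketch.
        raise ValueError("negative length is reserved")
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("string runs past the end of the buffer")
    # latin-1 maps every byte to exactly one character, so decoding cannot fail.
    text = buf[start:end].decode("latin-1")
    # The whole string is consumed, but only the first max_chars are kept.
    return text[:max_chars], end
```

A longer string written by another tool is still consumed in full, so the next field starts at the right offset even though only the first 64 characters are kept.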

The hash function I'm using is this:
Code:
'Generate a 32-bit hash value out of the given string
Function Hash32(content As String) As ULong
    Dim i As Short, hash As ULong   'hash starts at 0

    For i = 1 To Len(content)
        hash Or= 1                              'force the hash odd
        hash *= (i + Asc(Mid(content, i, 1)))   'multiply by position + byte value
    Next i

    Return hash
End Function

In other words, we start with hash = 0. For each of the 1802 iterations, the hash is made odd (to avoid accumulation of trailing zeroes), then multiplied by the (one-based) position plus the ASCII value of the byte at that position. This is the simplest function I could think of that has these properties:

  • It will quickly fill and overflow the 32 bits, making the hash more random
  • Adding a constant to one byte and subtracting it from another or similar actions will not generate the same hash
  • Swapping two bytes will not generate the same hash
  • No particular value (zero, for instance) will cause the procedure to converge to a constant (usually zero)
  • Despite the repeated multiplications, factors will not accumulate (especially 2, since we're using binary), for the hash is forced odd at each iteration
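
For anyone who wants to check values outside FreeBASIC, here is a Python transcription of the function. It is a sketch that assumes `ULong` wraps at 32 bits and that the input is handled as raw bytes; the name `hash32` is mine.

```python
def hash32(content: bytes) -> int:
    """Python transcription of Hash32, assuming ULong wraps at 32 bits."""
    h = 0
    for i, byte in enumerate(content, start=1):
        h |= 1                             # force the hash odd
        h = (h * (i + byte)) & 0xFFFFFFFF  # multiply, keep the low 32 bits
    return h


print(hash32(b"AB"))  # 4556
print(hash32(b"BA"))  # 4489
```

The two calls show the byte-swap property on a tiny input: "AB" gives 67 * 68 = 4556, while "BA" gives 67 * 67 = 4489.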

If you find a problem with this function, I really would like to know it :P
Earth is my country. Science is my religion.

Duplode

« Reply #16 on: August 29, 2016, 12:55:13 AM »
Thank you for the clarifications. Right now only one of those issues concerns me to any extent: the format byte. To present a better case, I tried to think of concrete scenarios where it might lead to complications. I think I might have found a not too contrived one, and I wonder how you see it.

Let's say I want to archive metadata for the tracks of my old Southern Cross competition. Suppose that there are two popular ways of long-term metadata archival, Cas-Stunts and an online ZakStunts database, and that I want to use both, one serving as a fallback for the other. The ZakStunts database would be indexed by a hash of the 1802 track bytes, and it wouldn't read the Cas-Stunts binary metadata (at least not directly -- there might be a shared export format, etc.). Now, a natural thing to do would be using Cas-Stunts to enter the metadata and save it in the split format. If I do so, however, the format byte will be changed. Why does that matter? Because if I then add the track to the online database it will go under a different hash than the original TRK, which was available from the competition site up to that point. A pipsqueak who happens to have the old file and tries to check it against the database will get a false negative. Furthermore, I can't even circumvent the issue by writing a tool that creates standalone SMD files, as Cas-Stunts will not load them without the format byte change. If the format byte is not changed, however, there is no possible mismatch.

While this issue is more obviously relevant to the split format, I believe it also applies, to a lesser extent, to the combined format. Suppose I pick an old track, add combined metadata to it, drive a lap on the modified track and extract a TRK from the replay. If the format byte is left unchanged, the TRK at the end of the process would be identical to the original one, which is a small but IMO significant advantage.

In a nutshell, then, I believe the format byte should never be changed by Cas-Stunts. It seems to me a small price to pay for eliminating any potential incompatibilities with other tools as far as the 1802 bytes are concerned -- even with tools unaware of Cas-Stunts, and even when dealing with tracks older than Cas-Stunts itself.
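
The false negative can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: the database key is taken to be SHA-1 over the first 1802 bytes (the actual ZakStunts hash has not been specified), the format byte is assumed to be the last of those 1802 bytes, and `registry_key` is a hypothetical name. The value 152 is the split-format constant mentioned earlier in the thread.

```python
import hashlib

TRACK_SIZE = 1802
FORMAT_BYTE_OFFSET = TRACK_SIZE - 1  # assumption: format byte is the last byte
SPLIT_FORMAT = 152                   # split-format value mentioned in the thread


def registry_key(trk: bytes) -> str:
    """Hypothetical database key: SHA-1 over the first 1802 bytes."""
    if len(trk) < TRACK_SIZE:
        raise ValueError("not a complete track")
    return hashlib.sha1(trk[:TRACK_SIZE]).hexdigest()


original = bytes(TRACK_SIZE)                # stand-in for an old TRK, format byte 0
resaved = bytearray(original)
resaved[FORMAT_BYTE_OFFSET] = SPLIT_FORMAT  # the one byte a split-format save changes

print(registry_key(original) == registry_key(bytes(resaved)))  # False
```

Every track byte is identical, yet the two files land under different keys, which is exactly the mismatch described above.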

As I said in the beginning, I don't think the other points bring any more issues. The 32-bit hash question will become a moot point if you accept my main suggestion, but in any case that hash is meant for Cas-Stunts consumption, and other tools won't have to deal with it except when creating or validating metadata in the Cas-Stunts format, so it shouldn't really concern anyone. As for the field length, I first thought of 256 as a generous value (more than three 80-char lines of text!) that would be enough for even quite extended comments. A maximum length of 256 for all current variable length fields would amount to (if I counted it right) a maximum block size of 1086 bytes, about 9% of the hard limit. It may sound like nothing, but who knows how many extra fields might be added in the future... 64 might be a fine value too, or perhaps there might be different limits for each field.
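
As a quick check of that proportion, assuming "12 kilobytes" means 12 * 1024 bytes and taking the 1086-byte count at face value:

```python
HARD_LIMIT = 12 * 1024  # assumption: 12 kilobytes = 12288 bytes
MAX_BLOCK = 1086        # block size quoted above for 256-byte fields

share = MAX_BLOCK / HARD_LIMIT
print(f"{share:.1%}")   # 8.8%
```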

Cas

« Reply #17 on: August 29, 2016, 05:28:29 AM »
I had written a very long message. I'm editing it now, to make it more concise. I think your example is very good and we should continue to use these thought experiments to explain our positions. I already have an idea of a solution. It'd be good to chat and be able to discuss with more clarity, but I'll state my thinking here first:

  • I think the format byte can easily be left as zero if the track is saved in "combined" mode, since the file length makes it clear there's meta-data. For split files, I think leaving it at zero will force the editor to always look up in the data-bases and if the SMD file is lost, which is something very likely to happen (same as when you forget to copy a HIG file), you will not know there was meta-data. Not a good thing. This is my thought experiment.
  • In the case you describe, you're talking about an old track that has been raced before, probably never going to be modified again, with a name that won't change. For tracks like this and for the tracks at ZakStunts, it makes sense that meta-data be stored in a read-only registry that could be online and/or distributed with a tool like Cas-Stunts, for offline access. For new tracks, I think it's a lot more sensible to keep the info in the most comfortable format for the editor and for passing it without data loss. The best format that serves this purpose is the combined one. It is the safest. This is an important point I've been thinking of. We must admit that the format Dreadnaut proposes is indeed the safest for ZakStunts and archival, but also the format I encourage (combined) is clearly the safest for editing and passing track files.
  • I believe we must seek a solution not by trying to change each other's original concepts, but by trying to make these two concepts compatible. Tracks would be edited, kept and passed around in combined format until they are ready to be used, then submitted to the ZakStunts registry (and possibly tournament) in the format Dreadnaut considers best. Once a file is not to be modified anymore, meta-data will be read from the registry. Before that, it will be read from the overlay. This is an idea. What do you think?  I would like to discuss this carefully in a chat.
  • Also, if ZakStunts implements a registry system where everybody can submit tracks, it would be great if I could integrate Cas-Stunts with it, for example, making it possible to submit tracks from within Cas-Stunts to the website or to download them and their meta-data from there. Cas-Stunts would send the data in ZakStunts format and on retrieval would store it back in Cas-Stunts format. For fixed files (no longer to be changed), the files would still be 1802 bytes on the hard drive.
  • About field length: I agree. 256 bytes would be optimal. I used 64 because it was easy to do at the moment. I can change it easily.
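
The lookup order I have in mind could be sketched like this in Python. Everything here is hypothetical and only illustrates the idea: `find_metadata`, the `online_lookup` callable and the `offline_registry` dictionary are made-up interfaces, and the registries are keyed directly by the 1802 track bytes just to keep the sketch short.

```python
TRACK_SIZE = 1802


def find_metadata(trk, online_lookup, offline_registry):
    """Return (source, metadata) following the proposed order.

    trk: full file contents.
    online_lookup: callable taking the 1802 track bytes and returning
        metadata or None; it may raise OSError when the site is unreachable.
    offline_registry: dict keyed by the 1802 track bytes.
    All three interfaces are hypothetical.
    """
    track, overlay = trk[:TRACK_SIZE], trk[TRACK_SIZE:]
    if overlay:
        # Combined format: anything past byte 1802 is the meta-data overlay.
        return "overlay", overlay
    try:
        found = online_lookup(track)
    except OSError:
        found = None  # site unreachable, fall through to the offline copy
    if found is not None:
        return "online", found
    if track in offline_registry:
        return "offline", offline_registry[track]
    return "none", None
```

A track still being edited carries its overlay and never touches a registry; a fixed 1802-byte file goes to the online registry first and to the bundled offline one if that fails.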
« Last Edit: August 29, 2016, 06:52:33 AM by Cas »

Duplode

« Reply #18 on: August 29, 2016, 08:29:02 PM »
> For split files, I think leaving it at zero will force the editor to always look up in the data-bases and if the SMD file is lost, which is something very likely to happen (same as when you forget to copy a HIG file), you will not know there was meta-data.

That is one objection I hadn't thought of. It is certainly worth considering, and after pondering it a bit I still don't have a settled opinion...

> It'd be good to chat and be able to discuss with more clarity

It would indeed -- this discussion is growing faster than linearly already  :) I believe our forum chat room -- http://forum.stunts.hu/chat/ -- would be a convenient meeting place. I will be online there from around 11 PM onward.

Cas

« Reply #19 on: August 30, 2016, 03:56:44 AM »
Very good chat :)  There I am

Duplode

« Reply #20 on: August 30, 2016, 04:26:40 AM »
I'm there now (a little late, sorry).