I'm looking at the assembly source code. We've already been working on it to make the needle colour mod and the contents are pretty clear locally, but I'm very lost as regards the general structure. I'd appreciate it if you guys could point me in the right general direction. It'd help me locate some things and get organised.
I can see that there are a number of segxxx.asm files, which contain most of the code. For each of these, there's also a corresponding segxxx.inc. But there's also dseg.asm and dseg.inc, and there's custom.inc, segments.asm, structs.inc and dseg.map. I think the map file, like the obj files, is something that's produced during compilation and can be ignored, but as for the other files, I'm not quite sure what they represent. My main question here is: where does the code start?
it's a medium-model (https://devblogs.microsoft.com/oldnewthing/20200728-00/?p=104012) DOS exe
that means multiple code segments (far calls to code if it is in another segment) and a single data segment (so data is always NEAR addressed, no data-segment changes needed)
the original Stunts is based on "Microsoft C 5.1" (from ~1988, years before the Visual C stuff)
so there is some stuff in the code that comes from the standard library and the compiler - for example the code around the main function etc.
the assembler source is currently assemble-able with TASM only (would be nice to port it to WASM/MASM/UASM to be able to build under any system - it just takes time, it's not super hard - minor differences)
the splitting into segment files was primarily done for a better overview/separation and to make it easier to check whether an assembled object is binary exact to the original segment block
due to tiny differences between the assemblers (optimization features) or redundant instructions, some code can be expressed with different opcodes of different sizes - which will corrupt non-symbolic offsets - so it was necessary to check first that the resulting exe is absolutely identical to the original
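A minimal sketch of such a binary-exact check, assuming both segment blocks have already been read into memory. The names first_mismatch and kIdentical are made up for this illustration and are not part of the Stunts build scripts. As a concrete case of two encodings for one instruction: `mov ax, bx` can be assembled as 89 D8 or as 8B C3.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel returned when both blocks are byte-for-byte identical.
constexpr std::size_t kIdentical = static_cast<std::size_t>(-1);

// Returns the offset of the first differing byte between the original
// segment block and the reassembled one, or kIdentical if they match.
// A length difference counts as a mismatch at the end of the shorter block.
std::size_t first_mismatch(const std::vector<std::uint8_t> &original,
                           const std::vector<std::uint8_t> &rebuilt) {
    const std::size_t common =
        original.size() < rebuilt.size() ? original.size() : rebuilt.size();
    for (std::size_t i = 0; i < common; ++i) {
        if (original[i] != rebuilt[i]) {
            return i;
        }
    }
    return original.size() == rebuilt.size() ? kIdentical : common;
}
```

Any result other than kIdentical means the assembler chose a different encoding (or inserted/dropped bytes) somewhere at or after that offset.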
the segment inc files are forward declarations, so it is easy to give every segment access to the "globals" around
the other inc and asm files are more for making it assemble-able or for collecting type definitions (that due to their nature are not based on code) outside of the code
most (98%) of the code is generated with a script from IDA Pro - so changes to the IDA Database (IDB) will result in differently generated code
the replacing of assembler functions with C functions is also done in this script
there are some build types - the pure original assembler (directly based on the IDA information), and a variant where the standard library is used and combined with already ported C functions (that are much easier to read than the assembler functions)
so there was never handwritten assembler code - everything (except your changes) is fully automatically generated by IDA
IDA-Screenshot: https://pasteboard.co/AnVaANHb0Qq3.png
Thank you so much. Even being something automatically generated, it does help me a lot to have a context on its structure to better follow it and understand it as I work on it. For sure, porting to other assemblers would be cool.
Things I would like to do would be of the sort of easily inserting things knowing that they won't break the code (because of alignment, like it happened while we were trying to build with the dual-colour needle at first) and extracting parts of the code replacing them with others that would take their pointers, etc. It'd also help me analyse how some things are internally done that I could later use for inspiration in creating another engine. There's a lot of work in that original code.
Easily identifying functions that were part of the C run-time library and separate them from Stunts-specific functions also helps navigate the sea of code.
the code-segment count is defined by the Microsoft compiler/linker
seg010 is for example std library code, most others are game code; the splitting is mostly up to the compiler or how the libs were designed at the start. there seem to be (not proven) parts that are fully assembler based (maybe the engine, but it could also be that only some functions, not whole segments, are pure assembler)
Quote: "Things I would like to do would be of the sort of easily inserting things knowing that they won't break the code"
alignment isn't a real problem here - just changing the offsets of code is
hard to tell as long as there is no deep analysis of non-symbolic offsets in the code
evil stuff like addressing a variable by using another variable's symbol plus an offset etc.
the very first routine that gets run when the exe starts is this one (it is the first code that gets jumped to after DOS has loaded the exe and done the relocation)
seg010:0012 start proc near
this is the pre-main routine that sets up the stdlib stuff etc. and calls the user main function
this is the standard int main(argc,argv,envp)
seg000:0000 ; int __cdecl stuntsmain(int p_argc, const char **p_argv, const char *envp)
and code like this for example is problematic
seg010:007C mov di, 55CAh ; offset in dseg where uninitialized data starts
seg010:007F mov cx, 0AD20h ; original size/end of dseg
seg010:0082 sub cx, di
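A small sketch of why those two immediates are dangerous, assuming (as in a typical C startup) that the range they describe is zero-filled afterwards. The scenario of inserting data into dseg and the function names are hypothetical; only the two constants come from the listing.

```cpp
#include <cassert>
#include <cstdint>

// Immediates copied from the seg010 listing above.
constexpr std::uint16_t kBssStart = 0x55CA; // mov di, 55CAh
constexpr std::uint16_t kDsegEnd  = 0xAD20; // mov cx, 0AD20h

// sub cx, di: size of the range the startup code works on.
constexpr std::uint16_t bytes_to_clear(std::uint16_t start, std::uint16_t end) {
    return static_cast<std::uint16_t>(end - start);
}

// Hypothetical scenario: `inserted` bytes of initialised data are added to
// dseg in front of the uninitialised area. The real area moves up by that
// amount, but the two immediates in seg010 stay where they were.
// Returns how many bytes of initialised data the stale range would cover.
std::uint16_t initialised_bytes_wiped(std::uint16_t inserted) {
    const std::uint16_t size = bytes_to_clear(kBssStart, kDsegEnd);
    assert(inserted <= size); // the sketch only models small insertions
    // [kBssStart, kBssStart + inserted) now holds real data but is still
    // inside the hard-coded range.
    return inserted;
}
```

With symbolic offsets the assembler would recompute both values on every build; with plain numbers nothing warns you that the range no longer matches the data.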
the real problem is: you cannot easily test if your changes are correct by just playing the game and looking for bugs - it's super easy to introduce bugs without noticing, and only the second or third extension, weeks or months later, will trigger them hard enough - working on pure disassembled code is much, much harder than working on real handwritten assembler :(
For sure! There are no "intentions" in disassembled code, so whenever you change something, you have to do it in a way that forces everything to fail if there's any mistake. Alignment issues do not necessarily result from purposeful alignment by the compiler. Like you say, sometimes a reference can be pretty dangerous to work with and then, when you make a change, it looks like an alignment problem. In practice, it is, but it was not meant to be.
The part that we can do rather comfortably is... well... reading the code. 3D physics are going to be really hard to read because I expect fixed point arithmetic there, which doesn't look nice in assembler.
Quote from: Cas on September 02, 2022, 10:30:28 PM: "Alignment issues do not necessarily result from purposeful alignment by the compiler."
alignment means the deliberate positioning of data/code to fulfil "other" needs: hardware (bus) or, for example, DOS API needs - the normal positioning of data and code in an executable is usually not called "alignment" (inside structs there is also "padding")
alignment is hardly needed in 16-bit code on x86 (but very much needed on SPARC, for example)
dwords and words are not aligned in stunts (unlike in a normal 32-bit windows/linux program) - usually the old compilers just ignored padding in any form
and most handwritten assembler code also ignored any form of alignment or padding - everything is packed together with no space between
so we normally talk about offsets that change
16 bytes is the size of a paragraph in real mode. As you mention, there never is an actual need for alignment, but if the code has been programmed carelessly or optimised for speed, sometimes the calculation made to locate a certain variable in memory is not rigorous and is based on the assumption that the offset will be zero relative to the paragraph. So what I mean is actually aligned to the paragraph boundary.
I can't tell, in the case of Stunts here, what exactly the calculation is for a certain part of the code, but I can tell you what we experienced with Daniël when we were trying to get the needle colour code running.
The first code did not require any displacement from the original because only a word at a fixed location had to be changed. That worked flawlessly. But when we wanted to insert new code, I knew there was a chance of it not working because of Stunts expecting the code at a certain location. When we tried it, the game ran, but there was corruption in the video. You could see that the images were there, but all graphics positions were being miscalculated. Then, since the game did not crash, I had to think that it could be solved by finding the padding necessary to achieve alignment with the paragraph. So we added a padding byte one after another and on each compilation, the result changed, but still failed, until we found the sweet spot.
Again, I know this is not something that has to do with the compiler or the architecture. It's Stunts code that's assuming this alignment, so it's "software" alignment. But well... in the end, the result was that. Maybe if you play with it, you can recognise better than I how it is being calculated, but I assure you that it's there!
so we talk about segment alignment (i think that is your case here)
and sometimes about offsets that are partially (or fully) non-symbolic
seg011 segment byte public 'STUNTSC' use16
the "byte" in the segment definition means no alignment, so the assembler will not put in alignment bytes before it - but the segment needs to start at a paragraph address (one divisible by 16) so that far code/data dependencies can work
https://docs.microsoft.com/en-us/cpp/assembler/masm/segment?view=msvc-170
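The paragraph arithmetic behind this can be sketched as follows. It is only an illustration of the 16-byte rule in real mode; the function names are invented here, and nothing in it is taken from the Stunts source or from TASM.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kParagraph = 16; // bytes per real-mode paragraph

// Padding bytes needed in front of a segment so that it starts on a
// paragraph boundary (0 if it is already aligned).
constexpr std::uint32_t padding_to_paragraph(std::uint32_t linear) {
    return (kParagraph - linear % kParagraph) % kParagraph;
}

// Segment register value for a paragraph-aligned linear address, so that
// offset 0 inside the segment really is the first byte of the segment.
std::uint16_t segment_of(std::uint32_t linear) {
    assert(linear % kParagraph == 0); // only aligned addresses qualify
    assert(linear < 0x100000);        // 1 MiB real-mode address space
    return static_cast<std::uint16_t>(linear / kParagraph);
}
```

For example, a block ending at linear address 0x1231 needs 15 padding bytes before the next segment can start at 0x1240. Inserting a single byte of new code earlier in the file changes that count, which matches the "add one byte at a time until it works" experience described above.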
the reason for having "byte" as segment alignment here is due to disassembling
the disassembler can't detect if the code works or not - or if the code uses some magic trick to make it work at runtime - so it falls back to 1:1 reversing aka "keep the byte offsets intact" - that means there are sometimes alignment bytes around the segment that the disassembler didn't detect as segment alignment (there are so many possibilities that the disassembler just uses the one that always works)
seg011 segment PARA public 'STUNTSC' use16
would force the assembler to always align the segment to a paragraph address - but the hard-coded bytes before/after the segments need to be adjusted to get the very same binary again
that would be maybe a first step
the partially or full inner non-symbolic-offset problems are different
O. I like that idea. Not that I fully understand what you mean but I like it anyway.
If this works on unmodified code and does not break if we insert the extended needle color code (if I understand correctly)
Then we may have an easier way to introduce new code..like a showroom that accepts all cars.
Honestly, I'm not familiar with the directives of Turbo Assembler, so I just tried to align the bytes manually. And I agree that, when reading the code, we can't tell if something starts at a location because the previous block was padded with extra bytes or because it just ended there, so we just align to the byte. It makes sense.
Of course, even having seen it work, I don't feel 100% sure that all is OK with the module we compiled for the needle colour. That is, some parts of the code assume that this block starts with the paragraph. Another, at some other point, might expect it to be, for some reason, located within a certain fixed distance of some other block, and perhaps this assumption didn't come up in the tests we've made. We can't tell. But I think it's pretty safe to assume this is it and it will continue to work. Working with disassembled code, you never know if you broke it if it seems to work well.
Anyway, being able to follow the structure of the whole thing will be very helpful.
As regards replacing the showroom, this should be no different from the needle module. That is, yes, we'd be dropping a portion of original code and plugging something else there, but the technique to do it is basically the same. Again, the same thing can happen.
I never seem to get the time, but I'll try to give the code a deeper read :)
Quote from: Cas on September 04, 2022, 08:45:26 PM: "I never seem to get the time, but I'll try to give the code a deeper read :)"
The code is well named for the most part.
I still intend to make a document (spreadsheet probably) that just documents the segments with simple descriptions of the functions and where calls go to and other connections.
For the menus that is quite easy.
For the engine (biggest part) it's more difficult. More undefined code there.
Quote from: Daniel3D on September 04, 2022, 12:11:55 PM: "If this works on unmodified code..."
that is what i wrote :)
Quote: "but the hard-coded bytes before/after the segments need to be adjusted to get the very same binary again"
Quote from: Cas on September 04, 2022, 08:45:26 PM: "we can't tell if something starts at a location because the previous block was padded with extra bytes or because it just ended there, so we just align to the byte"
one needs to check if removing these bytes + segment align=para works - it should, but that isn't just a matter of replacing byte with para - every segment needs to be checked in the resulting exe to prove that it works - because you said it needs to be aligned, and para was very common in those days
Quote from: llm on September 05, 2022, 10:25:59 AM: "that is what i wrote :)"
I thought so, but I wanted the confirmation, I guess ::)
But that's relatively easy to do, I guess.
I (of course) don't know where to place this code. But i can compile and help the check.
Quote from: Daniel3D on September 05, 2022, 01:56:38 PM: "But that's relatively easy to do, I guess."
more or less - one needs to change "byte" to "para" (starting with the very last segment, going up) - assemble and check if anything in the exe changed, then do the next segment, one by one - producing as many exes as there are segments
seg041.asm
dseg.asm
seg039.asm
seg038.asm
...
seg000.asm
from last to first because then it's easier to see in a hex editor which parts change further down; if you start with seg000 everything will change, because the segments are ordered seg000, ... , seg039, dseg, seg041 (segment 40 is the dseg (data segment))
it will mostly result in a few bytes more (or fewer) around segment ends/beginnings than in the previous exe - which need to be removed (added) in the assembler source
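The per-segment comparison can also be done with a few lines of code instead of eyeballing a hex editor. This is a rough sketch, assuming the previous and the new exe are already loaded into byte vectors; changed_ranges and ByteRange are names invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Half-open byte range [first, second) inside the exe image.
using ByteRange = std::pair<std::size_t, std::size_t>;

// Lists every run of differing bytes between two builds. A size difference
// is reported as one extra range covering the tail of the longer file.
std::vector<ByteRange> changed_ranges(const std::vector<std::uint8_t> &before,
                                      const std::vector<std::uint8_t> &after) {
    std::vector<ByteRange> out;
    const std::size_t common = std::min(before.size(), after.size());
    std::size_t i = 0;
    while (i < common) {
        if (before[i] == after[i]) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        while (i < common && before[i] != after[i]) {
            ++i;
        }
        out.emplace_back(start, i);
    }
    if (before.size() != after.size()) {
        out.emplace_back(common, std::max(before.size(), after.size()));
    }
    return out;
}
```

This is a plain position-by-position comparison, so once a byte is inserted or removed, most of what follows shows up as changed. That is the reason for going from the last segment to the first: the changed area stays small and close to the segment being tested.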
So if i understand correctly. Starting with seg041 (which is a small one luckily)
.model medium
nosmart
.stack 8000
include structs.inc
include custom.inc
include seg000.inc
include seg001.inc
include seg002.inc
include seg003.inc
include seg004.inc
include seg005.inc
include seg006.inc
include seg007.inc
include seg008.inc
include seg009.inc
include seg010.inc
include seg011.inc
include seg012.inc
include seg013.inc
include seg014.inc
include seg015.inc
include seg016.inc
include seg017.inc
include seg018.inc
include seg019.inc
include seg020.inc
include seg021.inc
include seg022.inc
include seg023.inc
include seg024.inc
include seg025.inc
include seg026.inc
include seg027.inc
include seg028.inc
include seg029.inc
include seg030.inc
include seg031.inc
include seg032.inc
include seg033.inc
include seg034.inc
include seg035.inc
include seg036.inc
include seg037.inc
include seg038.inc
include seg039.inc
include dseg.inc
include seg041.inc
seg041 segment byte public 'STACK' use16
assume cs:seg041
assume es:nothing, ss:nothing, ds:dseg
seg041 ends
end
The line
seg041 segment byte public 'STACK' use16
has to be
seg041 segment para public 'STACK' use16
Do they need to be changed in the ASM file and the INC file simultaneously for each segment?
Quote: "Do they need to be changed in the ASM file and the INC file simultaneously for each segment?"
yes all segment names/definitions need to be equal
AND you should base it on the original asm - i have no idea if you and Cas checked whether the exe changed in an unwanted way with your extension (which does not mean in any way that it would crash; it could be that it just works but is still wrong)
Cas and I used asmorig in all cases. Also checked the executable against game.exe (the execombined executable)
All changes in the code are local, and besides an offset with the last modification there are no differences other than what is to be expected, as far as I can tell.
We didn't touch the version with c code substitution because that brings extra uncertainty to the mix.
We kept it as close to unmodified original as possible.
Now we know more we could try the same thing with c code. See what effect that has and if that works better or not.
Quote from: Daniel3D on September 06, 2022, 11:37:42 AM: "Cas and I used asmorig in all cases. Also checked the executable against game.exe (the execombined executable)"
i mean the unchanged asmorig - just to prevent other problems with your extensions
(it's a clean test that does not need to happen in an already changed source)
Quote from: Daniel3D on September 06, 2022, 11:37:42 AM: "All changes in the code are local, and besides an offset with the last modification there are no differences other than what is to be expected, as far as I can tell."
your changes must move code and offsets - because new code can't be of size 0 (or did Cas do magic?)
Quote from: Daniel3D on September 06, 2022, 11:37:42 AM: "We didn't touch the version with c code substitution because that brings extra uncertainty to the mix. We kept it as close to unmodified original as possible. Now we know more we could try the same thing with c code. See what effect that has and if that works better or not."
it's very possible to use c code only for your extensions - adding a new segment, for example, and some patching; the other c ports are not needed in order to use c code for your stuff
Oh, right. For this new thing we will use unchanged original code.
Every modification was made as stand alone if possible.
Except for the multiple needle color, all modifications are only values, so no offset changes.
Only the modification that makes it possible to give both meter needles different colors introduces new code.
The change of default car is pure values.
The position of the main menu buttons is pure values.
The first color change for both needles is a redirect to a different source of the same length.
As Daniel3D pointed out, only the dual colour mod pushes new code into the file. Not sure what "in an unwanted way" would be. I mean, if it crashes, it's clear it's unwanted, ha, ha... but if it doesn't, then we'd have to agree on what is unwanted. Yes, it could be "unstable", like, it doesn't crash, but may crash, sometimes.... But so far, that hasn't happened. If at any time, an instability were found, I'd recommend using the single colour mod instead.
Even the dual colour mod is too simple to justify using a C compiler.
It is possible to have zero-length code... in a way. This is what many viruses do. You analyse the original code, then make it shorter by either using more space-efficient code or dropping things that are not being used. Finally, use the newly available space to introduce new code.
And about magic... the guys at flatassembler.net, some appear to have obtained their degrees at Hogwarts :o
Quote from: Cas on September 06, 2022, 08:44:39 PM: "It is possible to have zero-length code... in a way. This is what many viruses do. You analyse the original code, then make it shorter by either using more space-efficient code or dropping things that are not being used. Finally, use the newly available space to introduce new code."
i'm well aware of that and use it actively in some of my reversing projects
i've been in the assembler, C/C++, reversing and disassembling area more or less full time for the last 15 years - there is nearly nothing i haven't touched in that time :)
did it work to change the segment alignment to para?
Quote from: llm on September 09, 2022, 11:18:02 AM: "did it work to change the segment alignment to para?"
I don't know. I can't do much until I get a new laptop or fix the other one..
I haven't tried changing that.
Oh, it's great that you've kept your C and assembly polished throughout the years. I've had times and times. As you know, I've been wanting to start working more in C, but the lack of native graphics (native to the compiler, that is) has been stopping me.
One thing I've been considering is to use a workaround: make myself a wrapper library for XLib that looks like graphics.h more or less. Then of course, my programs would only compile for GNU/Linux, but because of the wrapper, another wrapper could be made, compatible with it, that would fall back to, say, SDL, on the Windows platform. This way, I'm not forced to use a shared library in the OS I use, but people are able to use my software whatever way they prefer. For DOS, I would just make an assembly version of the "wrapper" that would go directly to the hardware.
Quote from: Cas on September 10, 2022, 04:08:18 AM: "Oh, it's great that you've kept your C and assembly polished throughout the years."
i'm working in this business - it would be sad if i stopped polishing it :)
Quote from: Cas on September 10, 2022, 04:08:18 AM: "As you know, I've been wanting to start working more in C, but the lack of native graphics (native to the compiler, that is) has been stopping me."
ever tried Allegro? https://liballeg.org
Version 4.2 supports DOS (Allegro 4.4+ removed the support) https://liballeg.org/readme.html
Quote:
Allegro 4
Allegro 4 is the classic library, whose API is backwards compatible all the way back to Allegro 2.0 for DOS/DJGPP (1996). It is no longer actively developed, but we still apply patches sent to us by contributors, mainly to fix minor bugs. Every so often we will make new releases.
Allegro 4.4 supports the following platforms:
Unix/Linux
Windows (MSVC, MinGW, Cygwin)
MacOS X
Haiku/BeOS
PSP (currently in git repository only)
The older Allegro 4.2 branch additionally supports:
Windows (Borland)
QNX
DOS (DJGPP, Watcom)
i don't know if it's worth investing time in this area to come up with your own solution
just wrap the stuff in your graphs.h/c and replace it later when everything is working - the best pixel drawing routine is for nothing if it's not called to do something magical :)
I've heard about Allegro and have been browsing its site many times, but never actually worked with it. If it can be statically compiled and doesn't get too big, it could work for me. Since the old version supports DOS, I reckon it does fit in that category.
I have a problem with including third party software in my work and it has to do with licensing. In part, it's just because of something I don't know very well, but it's also a matter of discomfort. I'll explain...
I usually make my programs GPLv3 unless I'm making them only for myself, in which case I don't apply a license (all rights reserved). Of course, this means that I can only use libraries that are compatible with the GPLv3. I think Allegro is. Now, the thing is, when I later distribute my program, I have to make the source available, but then this means I have to also make the library source available, since it's part of the program code now. Am I correct?
Including all the source is very untidy and complicates the compiling for whoever downloads the source, but if I include a precompiled version of the library, the package gets even bigger. So I don't know what's the best solution for this. To make matters worse, if the library has dependencies, now I have to include the source code of those dependencies as well. Each dependency will be written in a different style and have its compilation parameters, etc.
Now say I'm wrong and I can just leave out the source code that I didn't make, then this is what brings me the discomfort: I know I wouldn't really be providing the user with all they need to compile the program, so I feel it's cheating.
So... that's my issue with third party libraries, ha, ha. What would you do about this? How would you handle it?
As a user i like the option.
So basically a three step download.
- Just the program with what it needs to run.
- with your source and references to third parties used,
- With source and third party stuff as you used them.
The last should be a static addition because all code and changes are in the second developer version.
But sometimes third party stuff disappears, and being able to supply it is smart.
I don't know if i describe it correctly because i always go for option one.
But this is what I see when i look for software.
Right. When I am the user and I download software, I rarely download the source unless I want to compile it for some reason or examine the code. Most of the free software code around today is written in horrible languages such as java, ha, ha or using lots of OOP, so studying the source normally isn't of any use to me. Then it's good to have an option like your #1.
Of course, if I split it, then people have to be aware that they can't redistribute package #1 without also putting the other packages available at the same location.
Quote from: Cas on September 11, 2022, 10:33:20 PM: "Right. When I am the user and I download software, I rarely download the source unless I want to compile it for some reason or examine the code. Most of the free software code around today is written in horrible languages such as java, ha, ha or using lots of OOP, so studying the source normally isn't of any use to me. Then it's good to have an option like your #1. Of course, if I split it, then people have to be aware that they can't redistribute package #1 without also putting the other packages available at the same location."
you provide the source for anyone interested in reading it.
If they want to compile it from your source, then they need to get the third party stuff, either from you or directly from the source (if they want to work with your code, the latest version of the libraries is probably advisable)
therefore option 2 and 3.
If somebody wants to redistribute package 1, 2 or 3 they have to refer to the other options as well (as is stated in the documentation), but they can point to the source (you). So a redirect would suffice in my opinion.
Uhm... maybe that's legal, I don't know, but I don't feel it to be correct. I mean, if I don't provide the source myself, then whatever site I point to might go down or remove it or change it and then I'd be providing an incomplete source.
As far as I understand you have to provide access to the third party source.
Therefore there is option 3, which includes the content.
But you don't have to keep it updated, you provide what you used. In option two you provide information so one can get the original (maybe updated). And can get support for that part.
Quote from: llm on September 09, 2022, 11:18:02 AM: "did it work to change the segment alignment to para?"
i had little time so i did an all or nothing approach.
Changing all files creates a near copy, but there are many bit differences, and it does not run.
A screenshot of a visual check included.
There is also an offset further in the file.
But i now have all obj files. So i can try again, one segment at a time.
Quote from: Daniel3D on September 23, 2022, 12:04:33 PM: "i had little time so i did an all or nothing approach. Changing all files creates a near copy, but there are many bit differences, and it does not run."
expected result :)
Quote from: llm on September 30, 2022, 08:13:04 AM: "expected result :)"
Yes and no.
After fixing a typo I did it again and although it still doesn't work, i did get a clear error message.
I forgot to write it down but it was something along the lines of it failing to read the contents of sdmain.vsh..
So maybe..
The correct version is already in the post above, just didn't go into details because I had no time for it at that moment.
Quote from: Daniel3D on September 30, 2022, 09:11:34 AM: "Yes and no. After fixing a typo I did it again and although it still doesn't work, i did get a clear error message. I forgot to write it down but it was something along the lines of it failing to read the contents of sdmain.vsh.. So maybe.. The correct version is already in the post above, just didn't go into details because I had no time for it at that moment."
you will get random problems when not being binary equal - and you will not be able to test all effects (it's impossible) - not creating a binary-compatible version (which is easy, and it is clear what to do) is like asking for random trouble somewhere across the complete code - anytime in the future - the 100% binary-equal version is correct by design
there is no partial "it seems to work" :) - every move of code is just wrong (which does not mean in any form that the game will crash - but it's still wrong: for example subtle errors in the physics engine, speed, drawing etc., an unlimited number of silly invisible bugs)
That is true. That's why, about the needle mod, I recommended the fall back to the single colour mod, which is exactly identical to the original except for a word that we know exactly what it does. The bi-colour mod "seems to work" and probably does, but we will never be done testing. Yet, we had to do it that way because there was no other way in this case, and a full rewrite of the game to C or a recreation with a new engine are the only options if we want a stable modded game (unless we could get to the original source, but it's many times been said it doesn't exist anymore).
Quote from: Cas on October 01, 2022, 04:45:58 PM: "Yet, we had to do it that way because there was no other way in this case, and a full rewrite of the game to C or a recreation with a new engine are the only options if we want a stable modded game"
it's still possible but you need to be very careful - don't add code in between that moves other code, always try to be binary equal, or not equal only in very small, well-known parts
for example - adding code only by link-virus behavior: add a new segment, ignore relocation table changes, patch calls into the code (save the original code) - recover the original code after running the new code - this way you can add large amounts of code without changing too much
or search the code for non-symbolic offsets and fix them to symbolic ones, then you're able to change everything without problems - but that could be time consuming
but this is all in all very "tinker"
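The "patch a call in, keep the original bytes, put them back later" idea can be sketched like this. It is a rough illustration working on a memory image, not on the exe file: the segment word is written as-is, whereas inside an exe on disk that word would normally need a relocation entry, which this sketch does not handle. patch_far_call, restore_bytes and SavedBytes are hypothetical names; the only hard fact used is the x86 encoding of a direct far call (opcode 9A, then 16-bit offset, then 16-bit segment).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Direct far call: opcode 0x9A + 16-bit offset + 16-bit segment.
constexpr std::size_t kFarCallSize = 5;

using SavedBytes = std::array<std::uint8_t, kFarCallSize>;

// Overwrites 5 bytes at `at` with "call far seg:off" and returns the bytes
// that were there before, so the original code can be restored later.
SavedBytes patch_far_call(std::vector<std::uint8_t> &image, std::size_t at,
                          std::uint16_t seg, std::uint16_t off) {
    assert(at + kFarCallSize <= image.size());
    SavedBytes saved{};
    for (std::size_t i = 0; i < kFarCallSize; ++i) {
        saved[i] = image[at + i];
    }
    image[at + 0] = 0x9A;
    image[at + 1] = static_cast<std::uint8_t>(off & 0xFF); // offset, low byte first
    image[at + 2] = static_cast<std::uint8_t>(off >> 8);
    image[at + 3] = static_cast<std::uint8_t>(seg & 0xFF); // then the segment
    image[at + 4] = static_cast<std::uint8_t>(seg >> 8);
    return saved;
}

// Puts the saved original bytes back in place.
void restore_bytes(std::vector<std::uint8_t> &image, std::size_t at,
                   const SavedBytes &saved) {
    assert(at + kFarCallSize <= image.size());
    for (std::size_t i = 0; i < kFarCallSize; ++i) {
        image[at + i] = saved[i];
    }
}
```

Because exactly five bytes are replaced and later restored, nothing after the patch site moves, which is the whole point of this approach. The patch site still has to start on an instruction boundary and cover whole instructions, and that has to be checked by hand.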
my current solution for modifying games more or less safely is using dosbox as a backend
for example: i'm able to hook function calls and overwrite code parts, which is very good for porting because you can port a function while the function is in use by the emulated code
for example the data compression routine of the Alpha Waves game: disassembled in IDA, then converted to my tiny "emulator" that fakes the minimal aspects of the x86 code to ease the porting to C
emu_t just got some registers, memory and methods that look like the original
asm and behave like the original asm code - but its just C/C++ code
this function gets called by dosbox when the emulated code actually wanted to call the original
16bit code, i can debug, step through it, log data, write unit-tests etc.
this is my third try to port that function properly - before, i tried in plain assembler and in 16-bit C, but subtle micro differences crept in: the port seemed to work, but until now it only handled 95% of the data
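To make the function that follows easier to read, here is a heavily reduced guess at what such an emulator type could look like. It is not llm's actual emu_t: mini_emu_t, read16, write16 and clear_table are invented for this sketch, only the zero flag is modelled, and the direction flag is assumed to be clear. The method is called xor_ with a trailing underscore because `xor` and `and` are alternative operator tokens in standard C++ and cannot be used as member names there.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, minimal stand-in for the emu_t described above.
struct mini_emu_t {
    std::uint16_t ax = 0, cx = 0, di = 0;
    std::uint16_t ds = 0, es = 0, ss = 0, sp = 0;
    bool zf = false; // zero flag, the only flag modelled here
    std::vector<std::uint8_t> mem = std::vector<std::uint8_t>(0x100000); // 1 MiB

    // Real-mode address translation: segment * 16 + offset, wrapped to 1 MiB.
    static std::uint32_t linear(std::uint16_t seg, std::uint16_t off) {
        return ((static_cast<std::uint32_t>(seg) << 4) + off) & 0xFFFFFu;
    }
    std::uint16_t read16(std::uint16_t seg, std::uint16_t off) const {
        const std::uint8_t lo = mem[linear(seg, off)];
        const std::uint8_t hi = mem[linear(seg, static_cast<std::uint16_t>(off + 1))];
        return static_cast<std::uint16_t>(lo | (hi << 8));
    }
    void write16(std::uint16_t seg, std::uint16_t off, std::uint16_t value) {
        mem[linear(seg, off)] = static_cast<std::uint8_t>(value & 0xFF);
        mem[linear(seg, static_cast<std::uint16_t>(off + 1))] =
            static_cast<std::uint8_t>(value >> 8);
    }
    void push(std::uint16_t value) {
        sp = static_cast<std::uint16_t>(sp - 2);
        write16(ss, sp, value);
    }
    void pop(std::uint16_t &reg) {
        reg = read16(ss, sp);
        sp = static_cast<std::uint16_t>(sp + 2);
    }
    void xor_(std::uint16_t &dst, std::uint16_t src) {
        dst = static_cast<std::uint16_t>(dst ^ src);
        zf = (dst == 0);
    }
    bool jnz() const { return !zf; }
    // rep stosw with the direction flag clear: store ax at es:di, cx times.
    void rep_stosw() {
        while (cx != 0) {
            write16(es, di, ax);
            di = static_cast<std::uint16_t>(di + 2);
            --cx;
        }
    }
};

// Same instruction sequence as the opening of the routine below:
// zero 0x80 words at ds:0x301 while preserving es and di.
void clear_table(mini_emu_t &e) {
    e.push(e.es);
    e.push(e.di);
    e.cx = 0x80;
    e.ax = e.ds;
    e.es = e.ax;
    e.di = 0x301;
    e.xor_(e.ax, e.ax);
    e.rep_stosw();
    e.pop(e.di);
    e.pop(e.es);
}
```

The point of this style is that each C++ line still corresponds to one original instruction, so the port can be compared against the disassembly line by line before it is rewritten into more natural C.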
void UNCOMPRESS_sub_1BAE7(emu_t &e)
{
start:
// zero-fill 0x80 words (0x100 bytes) at ds:0x301, preserving es:di
e.push(e.es);
e.push(e.di);
e.cx = 0x80;
e.ax = e.ds;
e.es = e.ax;
e.di = 0x301;
e.xor(e.ax, e.ax);
e.rep_stosw();
e.pop(e.di);
e.pop(e.es);
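// normalise es:di: move the whole paragraphs of (di - base) into es, keeping di within 16 bytes of the base word stored at cs:0xBAA2
e.sub(e.di, *e.word_ptr(e.cs, 0xBAA2));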
e.ax = e.di;
e.shr(e.ax, 1);
e.shr(e.ax, 1);
e.shr(e.ax, 1);
e.shr(e.ax, 1);
e.cx = e.es;
e.add(e.cx, e.ax);
e.es = e.cx;
e.and (e.di, 0x0F);
e.add(e.di, *e.word_ptr(e.cs, 0xBAA2));
e.push(e.ds);
e.push(e.es);
e.push(e.si);
e.push(e.di);
e.cx = 4;
e.di = 0xBA9A; // offset byte_1BA9A; ???
e.ax = e.cs; // seg seg000 // cs register; ???
e.es = e.ax;
e.lds(e.si, *e.dword_ptr(e.cs, 0xBAA4));
e.ax = e.si;
e.shr(e.ax, 1);
e.shr(e.ax, 1);
e.shr(e.ax, 1);
e.shr(e.ax, 1);
e.dx = e.ds;
e.add(e.ax, e.dx);
e.ds = e.ax;
e.and (e.si, 0x0F);
*e.word_ptr(e.cs, 0xBAA4) = e.si;
*e.word_ptr(e.cs, 0xBAA4 + 2) = e.ds;
e.add(*e.word_ptr(e.cs, 0xBAA4), e.cx);
e.rep_movsb();
e.pop(e.di);
e.pop(e.si);
e.pop(e.es);
e.pop(e.ds);
e.dx = *e.word_ptr(e.cs, 0xBA9C);
e.inc(e.dx);
e.cmp(*e.byte_ptr(e.cs, 0xBA9A), 0);
if (e.jnz())
goto loc_1BB63;
goto loc_1BC52;
// ---------------------------------------------------------------------------
loc_1BB63:
e.push(e.ds);
e.push(e.es);
e.push(e.di);
e.xor (e.ch, e.ch);
e.cl = *e.byte_ptr(e.cs, 0xBA9A);
e.di = 0x201;
e.ax = e.ds;
e.es = e.ax;
e.ds = *e.word_ptr(e.cs, 0xBAA4 + 2);
e.si = *e.word_ptr(e.cs, 0xBAA4);
e.add(*e.word_ptr(e.cs, 0xBAA4), e.cx);
e.rep_movsb();
e.cl = *e.byte_ptr(e.cs, 0xBA9A);
e.xor (e.ch, e.ch);
e.di = 1;
e.add(*e.word_ptr(e.cs, 0xBAA4), e.cx);
e.rep_movsb();
e.cl = *e.byte_ptr(e.cs, 0xBA9A);
e.di = 0x101;
e.add(*e.word_ptr(e.cs, 0xBAA4), e.cx);
e.rep_movsb();
e.pop(e.di);
e.pop(e.es);
e.pop(e.ds);
e.xor (e.ch, e.ch);
e.cl = *e.byte_ptr(e.cs, 0xBA9A);
e.xor (e.ah, e.ah);
e.bx = 1;
loc_1BBB4:
e.al = *e.byte_ptr(e.ds, e.bx + 0x200);
e.si = e.ax;
e.dl = *e.byte_ptr(e.ds, e.si + 0x301);
*e.byte_ptr(e.ds, e.bx + 0x402) = e.dl;
*e.byte_ptr(e.ds, e.si + 0x301) = e.bl;
e.inc(e.bx);
if (e.loop())
goto loc_1BBB4;
e.dx = *e.word_ptr(e.cs, 0xBA9C);
e.inc(e.dx);
e.cx = 1;
loc_1BBD2:
e.dec(e.dx);
if (e.jnz())
goto loc_1BBE1;
loc_1BBD5:
e.cmp(*e.byte_ptr(e.cs, 0xBA9B), 0);
if (e.jz())
goto locret_1BBE0;
goto start;
// ---------------------------------------------------------------------------
locret_1BBE0:
return;
// ---------------------------------------------------------------------------
loc_1BBE1:
e.push(e.ds);
e.si = *e.word_ptr(e.cs, 0xBAA4 + 2);
e.ds = e.si;
e.si = *e.word_ptr(e.cs, 0xBAA4);
e.lodsb();
*e.word_ptr(e.cs, 0xBAA4) = e.si;
e.pop(e.ds);
e.bx = e.ax;
e.cmp(*e.byte_ptr(e.ds, e.bx + 0x301), 0);
if (e.jnz())
goto loc_1BC01;
e.stosb();
goto loc_1BBD2;
// ---------------------------------------------------------------------------
loc_1BC01:
e.bl = *e.byte_ptr(e.ds, e.bx + 0x301);
e.xor (e.ax, e.ax);
e.push(e.ax);
goto loc_1BC35;
// ---------------------------------------------------------------------------
loop_x:
e.bp = e.ax;
e.cmp(*e.byte_ptr(e.ds, e.bp + 0x301), 0);
if (e.jz())
goto loc_1BC44;
e.cmp(e.bl, *e.byte_ptr(e.ds, e.bp + 0x301));
if (e.ja())
goto loc_1BC30;
e.al = e.bl;
e.bl = *e.byte_ptr(e.ds, e.bp + 0x301);
loc_1BC22:
e.bl = *e.byte_ptr(e.ds, e.bx + 0x402);
e.or (e.bl, e.bl);
if (e.jz())
goto loc_1BC42;
e.cmp(e.bl, e.al);
if (e.jb())
goto loc_1BC35;
goto loc_1BC22;
// ---------------------------------------------------------------------------
loc_1BC30:
e.bl = *e.byte_ptr(e.ds, e.bp + 0x301);
loc_1BC35:
e.al = *e.byte_ptr(e.ds, e.bx + 0x100);
e.ah = e.bl;
e.push(e.ax);
e.xor (e.ah, e.ah);
e.al = *e.byte_ptr(e.ds, e.bx);
goto loop_x;
// ---------------------------------------------------------------------------
loc_1BC42:
e.ax = e.bp;
loc_1BC44:
e.stosb();
e.pop(e.ax);
e.or (e.ax, e.ax);
if (e.jnz())
goto loc_1BC4C;
goto loc_1BBD2;
// ---------------------------------------------------------------------------
loc_1BC4C:
e.bl = e.ah;
e.xor (e.ah, e.ah);
goto loop_x;
// ---------------------------------------------------------------------------
loc_1BC52:
e.push(e.ds);
e.push(e.es);
e.cx = *e.word_ptr(e.cs, 0xBA9C);
e.push(e.cx);
e.ds = *e.word_ptr(e.cs, 0xBAA4 + 2);
e.si = *e.word_ptr(e.cs, 0xBAA4);
e.add(*e.word_ptr(e.cs, 0xBAA4), e.cx);
e.rep_movsb();
e.pop(e.cx);
e.pop(e.es);
e.pop(e.ds);
goto loc_1BBD5;
}
from my VS2019 IDE - uncompress is started - the 16bit DOS game Alpha Waves is waiting for the uncompressed data - based on my 32bit C++ code :)
so i changed dosbox in a way that calls inside the emulated code are hooked and replaced by my own C++ code; this way i can replace code piece by piece
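To make the idea of such a tiny "emulator" concrete, here is a minimal sketch of what a type like emu_t could look like. Everything in it is an assumption for illustration: `mini_emu_t`, its member names and the `xor_` spelling are hypothetical, and the real emu_t in the linked dosbox branch may be built quite differently. It only models a few registers, 1 MiB of real-mode memory and helpers that mirror a handful of the instructions used in the listing above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the emu_t described above (not the real type).
struct mini_emu_t {
    uint16_t ax = 0, cx = 0, di = 0, es = 0, ss = 0, sp = 0xFFFE;
    bool zf = false;  // zero flag, updated by the arithmetic helpers
    std::vector<uint8_t> mem = std::vector<uint8_t>(1 << 20);  // 1 MiB address space

    // Real-mode address translation: segment * 16 + offset, wrapped to 1 MiB.
    static uint32_t linear(uint16_t seg, uint16_t off) {
        return ((static_cast<uint32_t>(seg) << 4) + off) & 0xFFFFF;
    }
    uint16_t read16(uint16_t seg, uint16_t off) const {
        const uint16_t next = static_cast<uint16_t>(off + 1);  // offsets wrap at 64 KiB
        return static_cast<uint16_t>(mem[linear(seg, off)] | (mem[linear(seg, next)] << 8));
    }
    void write16(uint16_t seg, uint16_t off, uint16_t value) {
        const uint16_t next = static_cast<uint16_t>(off + 1);
        mem[linear(seg, off)] = static_cast<uint8_t>(value & 0xFF);
        mem[linear(seg, next)] = static_cast<uint8_t>(value >> 8);
    }

    void push(uint16_t value) { sp = static_cast<uint16_t>(sp - 2); write16(ss, sp, value); }
    void pop(uint16_t &reg) { reg = read16(ss, sp); sp = static_cast<uint16_t>(sp + 2); }
    void xor_(uint16_t &dst, uint16_t src) { dst ^= src; zf = (dst == 0); }
    void shr(uint16_t &dst, unsigned count) {
        assert(count < 16);
        dst = static_cast<uint16_t>(dst >> count);
        zf = (dst == 0);
    }
    // "rep stosw" with the direction flag clear: store ax at es:di, cx times.
    void rep_stosw() {
        while (cx != 0) {
            write16(es, di, ax);
            di = static_cast<uint16_t>(di + 2);
            --cx;
        }
    }
};
```

With helpers like these, the first lines of the ported routine (clear 0x80 words starting at offset 0x301) read almost like the original assembler while being ordinary C++ that can be stepped through in a debugger.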
https://imgur.com/a/vwhrMzY (use this link for a larger image)
(https://i.imgur.com/tWLPOrB.png)
It is indeed difficult to tell if the extended color needle mod is 100% functioning like stunts 1.1.
For this reason i use it in the ccc.
Every replay that is checked is done in zakstunts and ccc version.
I play only the ccc version on my chromebook.
So far nothing found, all replays checked are the same in both versions.
Still, use at your own risk.
That is really interesting to see! Inline assembly emulation. That would help solve lots of things!
As I said, the dual-colour needle mod is unverifiable. I chose to do it that way because it was a way in which it could be implemented with the code we have, but while it has worked so far, building on top of it with the same approach would accumulate more and more likelihood of failure and is not acceptable.
On the other hand, the interaction is good. What I mean is, if I later rewrite this some other way, but it does exactly the same thing with the same variables, the new mod would, in practical terms, be the same as this, so what we've done is just "one implementation" of the mod. I could redo it virus-like and we already have a live proof of concept.
I had already thought in the past about the possibility of just inserting a call instead of the direct code and putting the main code somewhere else, but what I don't know is if the new segment will end up at the end of the whole program code, because if it's somewhere in the middle, still some program code would be moved down. I don't know much about Turbo Assembler and how it does its thing. If I were sure about it, I could use that.
On the other hand, there's another idea I'm having right now which could simplify all this. Instead of inserting new code within the compiled program... this is DOS! No memory protection, no difference between data and code. And while this is bad for many things, it has its advantages. How about I create a TSR that hooks up a custom interrupt and have Stunts call this interrupt as an API. The hooking would be small and I could do it virus-like, but then, every other bigger mod would just be part of the TSR, not the main program, so nothing would be moved down! What's more, I could make the TSR be a mod hub where other mods can be plugged in. When I have a moment, I'll start working on that.
Quote from: Cas on October 04, 2022, 03:21:56 AMThat is really interesting to see! Inline assembly emulation. That would help solve lots of things!
what things do you think about, different than my things?
Quote from: Cas on October 04, 2022, 03:21:56 AMI had already thought in the past about the possibility of just inserting a call instead of the direct code and putting the main code somewhere else, but what I don't know is if the new segment will end up at the end of the whole program code, because if it's somewhere in the middle, still some program code would be moved down. I don't know much about Turbo Assembler and how it does its thing. If I were sure about it, I could use that.
the order in which the segments are re-stated in the inc file is the order of the segments in the executable
new code (with segment relations) will change the relocation table but that isn't a problem
Quote from: Cas on October 04, 2022, 03:21:56 AMOn the other hand, there's another idea I'm having right now which could simplify all this. Instead of inserting new code within the compiled program... this is DOS! No memory protection, no difference between data and code. And while this is bad for many things, it has its advantages. How about I create a TSR that hooks up a custom interrupt and have Stunts call this interrupt as an API. The hooking would be small and I could do it virus-like, but then, every other bigger mod would just be part of the TSR, not the main program, so nothing would be moved down! What's more, I could make the TSR be a mod hub where other mods can be plugged in. When I have a moment, I'll start working on that.
a TSR will not change the problem of moving code around
(which is solvable by link-virus style programming)
it makes no real sense to stay away from the source by adding a patch system with mods that still can't be created without
decent knowledge of the code and positions
Quote from: llm on October 04, 2022, 08:37:17 AMa TSR will not change the problem of moving code around
(which is solvable by link-virus style programming)
it makes no real sense to stay away from the source by adding a patch system with mods that still can't be created without
decent knowledge of the code and positions
That is where my idea for a replacement main menu came from. We only need to know how it stores global variables (player car, opponent and car, track and graphics settings).
Then we can use the new menu as tsr.
We could add a simple redirect in the code and compile it. It would make further messing with the code unnecessary for menu related changes.
We could use a extended car showroom
A version of bliss as track editor
Make default car and track changeable.
Quote from: Daniel3D on October 04, 2022, 01:31:09 PMQuote from: llm on October 04, 2022, 08:37:17 AMa TSR will not change the problem of moving code around
(which is solvable by link-virus style programming)
it makes no real sense to stay away from the source by adding a patch system with mods that still can't be created without
decent knowledge of the code and positions
That is where my idea for a replacement main menu came from. We only need to know how it stores global variables (player car, opponent and car, track and graphics settings).
Then we can use the new menu as tsr.
We could add a simple redirect in the code and compile it. It would make further messing with the code unnecessary for menu related changes.
i don't see any advantage in using a TSR (unless you are ONLY changing variable values like a game cheat does, which is currently also the easiest change in the source)
patching the code at runtime is just as error-prone as doing it in source - because there is no difference
as long as you cleanly check what your code change actually changes - that goes for source changes as well as for TSR changes; a TSR isn't some sort of magical isolation - and yes, i have developed some TSRs and runtime patchers - but only because i did not have the source around
I know little about this subject. But it seems to me that reverse engineering the entire code will not happen anytime soon.
But knowing what we do about the code , we are able to make pretty good TSR's or runtime patches.
It is probably a better choice than changing the half understood source.
It may be a quick and dirty solution but it might increase the playability of the game.
Quote from: Daniel3D on October 04, 2022, 07:20:41 PMI know little about this subject. But it seems to me that reverse engineering the entire code will not happen anytime soon.
TSR + runtime patching is a completely different story from reversing the entire code - the two are unrelated
Quote from: Daniel3D on October 04, 2022, 07:20:41 PMBut knowing what we do about the code , we are able to make pretty good TSR's or runtime patches.
It is probably a better choice than changing the half understood source.
It may be a quick and dirty solution but it might increase the playability of the game.
the fixes you install with the TSR can be done more easily and safely in the source itself; the TSR does not help to prevent ANY of the technical problems (anything that's wrong at source time is also wrong at runtime)
and the changes to the main source should already be separated by IFDEFs to keep it clean
a TSR isn't something that can magically change the nature of the executable in
a way that makes modifications more trivial/easier to do; a runtime approach makes it even more complex
you can do the very same inside the source, but much more safely, without playing around with non-symbolic offsets to patch etc.
but that is all based on missing knowledge about how assembling or executables work - too much guessing and assuming; it is absolutely clear what's needed and how time consuming it is, for every case we talked about (and porting to C is just one of 8 possible further steps) - but you currently have no chance (though your knowledge is already growing) to make the right decision, because that needs a deeper level of understanding, without tricks and trial & error
i'm telling you the best approach, but you aren't able to understand it yet; you want to go further, ignoring my tips, because they also fail very quickly (due to missing knowledge etc. on your side) - as i told you before - reverse engineering is the top class of development - all problems combined at one point :)
i hope to find more time in the future to help "finishing" the project
I know there is a huge difference between a TSR and reverse engineering the entire code.
Doing it in code is better, but just as unsafe.
And I am very much aware of things I don't know. I am also terribly excited and impatient..
I feel like a kid that's poking the fire to see the "fireflies"... bound to be burnt.
One question before I shut up about it for a while..
If one would redirect to code outside the source, could we then use memory outside the 640K? Like high memory or even ram? And could that code run in a higher resolution than 320x200?
it's a learning experience. I learn by doing best. And that includes breaking it over and over again..
I prefer one of the 8 paths you mentioned, and if that is not included a good rewrite of the engine is also fine by me.
QuoteAnd I am very much aware of things I don't know. I am also terribly excited and impatient..
me too, started this project with others over 10 years ago :)
QuoteOne question before I shut up about it for a while..
you don't need to shut up :)
QuoteIf one would redirect to code outside the source, could we then use memory outside the 640K?
Like high memory or even ram?
you mean extended-ram? because everything is ram - we are currently in "conventional" ram
here is a memory-map: https://www.phatcode.net/res/155/images/fig1-1.png
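As a small illustration of why a real-mode program is tied to that memory map: a segment:offset pair is turned into a physical address as segment * 16 + offset. The sketch below shows that arithmetic; the function names and the `kConventionalEnd` constant are made up for this example.

```cpp
#include <cstdint>

// Real-mode address translation: physical = segment * 16 + offset.
constexpr uint32_t linear_address(uint16_t segment, uint16_t offset) {
    return (static_cast<uint32_t>(segment) << 4) + offset;
}

// Conventional memory ends at 640 KiB; video memory starts at A000:0000.
constexpr uint32_t kConventionalEnd = 0xA0000;

constexpr bool in_conventional_memory(uint16_t segment, uint16_t offset) {
    return linear_address(segment, offset) < kConventionalEnd;
}

static_assert(linear_address(0xA000, 0x0000) == 0xA0000, "start of VGA memory");
// The largest value a 16-bit segment:offset pair can express:
static_assert(linear_address(0xFFFF, 0xFFFF) == 0x10FFEF, "just under 1 MiB + 64 KiB");
```

Whatever lies above that last address (extended memory) cannot be named by a plain 16-bit segment:offset pair, which is why a real-mode program needs extra mechanisms to reach it.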
QuoteAnd could that code run in a higher resolution than 320x200?
your question is confusing:
code does not "run" in a resolution, and the position and size where code gets loaded are irrelevant for the resolution
video modes use different "memory layouts" for displaying pixels; some of them are easy, some of them
need you to split the color of a pixel over multiple planes, switching via graphics card registers for each draw; some need banking - so parts of the video ram need to be transferred between graphics card and conventional ram - due to its size
stunts in MCGA/VGA mode uses the more or less easy Mode 13h, 320x200x256 colors, so every byte starting
at 0xA000:0 is a pixel with one of the 256 palette colors
drawing a pixel in C in Mode 13h looks like this
//y = 0..199
//x = 0..319
uint8_t far* video_ram = MK_FP(0xA000, 0);
video_ram[y*320+x] = palette_color; // one row is 320 bytes
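The snippet above needs a 16-bit DOS compiler with far pointers. A portable sketch of the same addressing, assuming a row pitch of 320 bytes and using a plain array as a stand-in for the memory at A000:0000, could look like this (the names `Framebuffer`, `pixel_offset` and `put_pixel` are made up for the example):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int kWidth = 320;   // pixels (and bytes) per row in Mode 13h
constexpr int kHeight = 200;  // rows

// Stand-in for the 64000 bytes of video memory at A000:0000.
using Framebuffer = std::array<uint8_t, kWidth * kHeight>;

// Byte offset of pixel (x, y): whole rows above it, plus the column.
constexpr std::size_t pixel_offset(int x, int y) {
    return static_cast<std::size_t>(y) * kWidth + static_cast<std::size_t>(x);
}

inline void put_pixel(Framebuffer &fb, int x, int y, uint8_t palette_index) {
    assert(x >= 0 && x < kWidth && y >= 0 && y < kHeight);
    fb[pixel_offset(x, y)] = palette_index;
}
```

The constant 320 in `pixel_offset` is exactly the kind of value that ends up baked into many routines of a game.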
the problem now is that these 320 and 200 values are hidden somewhere in the code
sometimes not even as the values themselves but as 4*80 or something - everything is possible
linear block operations for moving textures etc. - so it's not just a matter of looking for put-pixel; doing it like that is just too slow for the video output
switching to a different video mode is possible but you need to change the drawing routines
and their local buffers, and whatever else is resolution dependent, for pre-drawing etc. - it's never just put-pixel but also the algorithm data behind it
for example VESA mode 640x480x256 seems to fit (it roughly doubles the resolution) - but this mode needs so much ram (~300kb)
that you need to switch banks while drawing (because stunts isn't a protected mode program that would allow linear access); the bank switch is a super-super-slow BIOS call in real mode - so every pixel drawing and polygon drawing/filling routine needs to be fixed, and every trick based on "it will always be 320x200" is now a problem, and without - it's a mess
and all graphics need to be doubled, or else they will be drawn not tiny but (due to the x doubling) distorted
and the resulting graphics can contain holes, because the current low-resolution line drawing and polygon filling can be a little imprecise
which becomes more and more of a problem as the resolution gets bigger
and the collision detection code could also be resolution dependent ....
it could be simple, e.g. if everything is always drawn as polygons, then not everything needs to be changed - but an analysis of the gfx engine is
needed for that; maybe only the driving in higher resolution would be possible at first
it's way more complex than switching the segment from byte to para alignment :)
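To put numbers on the banking problem mentioned for 640x480x256: the frame needs 640 * 480 = 307200 bytes, but real-mode code sees video memory through a 64 KiB window. The sketch below assumes a window size and bank granularity of exactly 64 KiB (real cards report their own granularity, so treat that as an assumption) and computes which bank a pixel falls into. `BankedAddress` and `banked_address` are hypothetical names.

```cpp
#include <cstdint>

constexpr uint32_t kPitch = 640;            // bytes per row at 8 bits per pixel
constexpr uint32_t kRows = 480;
constexpr uint32_t kWindowSize = 0x10000;   // assumed 64 KiB window and granularity

struct BankedAddress {
    uint32_t bank;    // which 64 KiB slice of video memory must be mapped in
    uint16_t offset;  // offset inside the window at A000:0000
};

constexpr BankedAddress banked_address(uint32_t x, uint32_t y) {
    return BankedAddress{(y * kPitch + x) / kWindowSize,
                         static_cast<uint16_t>((y * kPitch + x) % kWindowSize)};
}

// Number of banks needed to cover the whole frame (rounded up).
constexpr uint32_t kBanksNeeded = (kPitch * kRows + kWindowSize - 1) / kWindowSize;

static_assert(kPitch * kRows == 307200, "frame size in bytes");
static_assert(kBanksNeeded == 5, "five 64 KiB banks cover 307200 bytes");
// Row 102 starts in bank 0 and crosses into bank 1 at x = 256.
static_assert(banked_address(255, 102).bank == 0, "still in bank 0");
static_assert(banked_address(256, 102).bank == 1, "bank changes mid-row");
```

The last two checks show why this hurts: a bank boundary can fall in the middle of a scanline, so any routine that writes a horizontal run of pixels has to be prepared to switch banks partway through.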
Quoteit's a learning experience. I learn by doing best. And that includes breaking it over and over again..
you need to understand that there is stuff that can eventually break (and that is also hard to test) and stuff that, when done right, cannot break - not everything is trial & error; there are some parts of this workload that can be done in an uncluttered fashion - and these are the parts we should not break as well through misunderstanding; for example, changing segment alignment from byte to para is one of these clean things
and all that would be much more doable with C code (still not super easy), so the biggest workload gives us the biggest win - as usual in the reversing world :/
FYI: someone is doing that with Duke Nukem 2: https://github.com/lethal-guitar/Duke2Reconstructed - a full C reverse with 100% binary compatibility - i am so envious :)
The "War-Story" on Twitter with all the dirty details: https://twitter.com/lethal_guitar/status/1575123187360227335 - read the follow up posts
I know a bit of how memory works. But just don't know the right terms.
I know that stunts in its present situation is locked in base (conventional) memory.
And stunts does not have a separate graphics, physics and collision engine but has those jumbled into one pile of code.
That tangle will be difficult to unravel even if the code is ported to c.
That will take time.
My question therefore was that, if we plug in external code (by whatever means we can do effectively) are we still restricted to the memory allocation and graphics limitations of STUNTS?
You are absolutely right about the best ways to do this. I'm just curious.
For example. One thing I would like to change on short-term is increasing the 32 car limit.
This is not easily changed in the source. I know what part of the code handles this. There is a load function that loads 32 cars. Setting a lower number works as expected. Setting a higher number creates an error when trying to load a car beyond 32.
That would imply that the memory allocated for this is fixed at 32 times the 4 characters of the car. Whatever space that takes is a multiple of 8 so it makes sense. And going beyond that the game tries to read data that is reserved for other things.
Changing what memory is used is not possible in its current state. So increasing it is not possible within the current source.
Can you make a hack for this?
My idea was writing a new main menu with a new car/opponent selector and a different track selector /editor. I want to make this to increase playability and maybe increase interest in both the competitions and the reverse engineering. Draw new people by showing off the things we have done.
QuoteAnd stunts does not have a separate graphics, physics and collision engine but has those jumbled into one pile of code.
it isn't called an "engine" because of physical separation, and no developer mixes up such code
they could still be very separate in the assembler code (i never analysed them)
QuoteThat tangle will be difficult to unravel even if the code is ported to c.
the C code would have 5 to 8 times fewer lines - that would help a lot
the problem is: you've never built something like that/or even the simplest form of that (according to your posts) on your own
so you've got nearly no background to estimate anything, you're just assuming on a very very high level
QuoteMy question therefore was that, if we plug in external code (by whatever means we can do effectively)
are we still restricted to the memory allocation and graphics limitations of STUNTS?
this is also a very strange question - nothing in a computer will just enhance the capability of a program - only if the constraints are memory, hard disk space or cpu power, as for example with a database system
yes we will still be restricted - because the code itself limits it to these constraints; someone wrote code that is constrained that way and the compiler generated cpu code based on those constraints to make it small and fast. it's like a house built of concrete where every dependency (offset) is fixed; there are dynamic parts in there that could be changed more easily - but not because that's a standard, only because of how this specific thing in the game was implemented
but i don't know if it makes sense to explain stuff like that to you, because even a senior-level C application developer (without hardware or assembler background) couldn't follow easily - it could be that you get it all wrong or differently than intended :( - not your fault
QuoteFor example. One thing I would like to change on short-term is increasing the 32 car limit.
This is not easily changed in the source. I know what part of the code handles this. There is a load function that loads 32 cars. Setting a lower number works as expected. Setting a higher number creates an error when trying to load a car beyond 32.
That would imply that the memory allocated for this is fixed at 32 times the 4 characters of the car. Whatever space that takes is a multiple of 8 so it makes sense. And going beyond that the game tries to read data that is reserved for other things.
Changing what memory is used is not possible in its current state. So increasing it is not possible within the current source.
that means the needed change isn't at one point but spread over several functions
you need to expand the space for the cars - that will move data and code - therefore moving data and code has to be
safe, which means segments should be para aligned and all non-symbolic offsets should be found; every change will always come
back to that point
QuoteCan you make a hack for this?
technically i could probably do it, but i have no time to dig through the code for maybe weeks (and i want to work more on stunts)
Quote from: llm on October 05, 2022, 06:59:00 PMthe problem is: you've never built something like that/or even the simplest form of that (according to your posts) on your own
so you've got nearly no background to estimate anything, you're just assuming on a very very high level
That is very true.
In this case I'd like to be proven wrong..
I also assumed that multi-player would not be possible. But that kind of seems to be possible without much effort.
What can we make of that with more effort?
Quote from: Daniel3D on October 05, 2022, 08:30:30 PMI also assumed that multi-player would not be possible. But that kind of seems to be possible without much effort.
What can we make of that with more effort?
there is a lot possible for a full-blown developer with time :)
the multiplayer stunts (https://github.com/kurtis2222/stuntsmp) by kurtis2221 uses CheatEngine (or direct memory read/write) to look through the dosbox exe into Stunts' memory - technically it is like my extension of the dosbox source (the code hooking, variable watching), working with a specific version of dosbox+stunts, and then he adds TCP/IP from outside to control the keys and set the opponents' variables; i don't think he is patching or changing the game size at all (but the project source is very small and readable - so work through it yourself), but his
solution only works because of the 32/64bit dosbox environment; doing that in pure DOS would be much harder. i'm not too deep into it, but i can technically explain what he is doing and how, on a very deep level :)
i'm also using dosbox for doing different stuff - because the 16bit environment is so limited that i'd rather work in the 32bit code of dosbox to "help" stunts do the stuff i want (first win: even if the game is in segmented memory, my own 32bit code works in pure linear style, so i can do things that are not possible in the pure stunts code, like writing data larger than 64k in a for loop, copying memory of that size etc., tracing data from every point in the code into a text file ...) - but that is also only doable with a good background of coding skills, which kurtis2221 obviously has
BUT don't get me wrong here: extending dosbox eases SOME modifications/ideas, but not by default - extending game features
that are usable/visible in game is still not easy, because dosbox will not help with changing the code of the game - it's only easier to override
behavior (sometimes) - but i don't know how you can do it without software development knowledge
working on a disassembled C based game of that size without any deep knowledge about how programming works is just very hard
look at my super-simple: dosbox/stunts branch: https://github.com/LowLevelMahn/dosbox-staging/tree/main_stunts_tests/src/_stunts
i'm using that code for tracing all data that gets output to the sound driver - it's super easy code and all the knowledge one needs
is the very same that kurtis2221 uses - know the offsets of functions/variables + how to hook stuff
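The "know the offsets + how to hook" idea can be sketched as a lookup table from a segment:offset address to a native replacement function. This is purely an illustration of the principle: `cpu_state_t`, `hook_table_t`, `install` and `dispatch` are hypothetical names and are not part of the DOSBox source, and where an emulator would actually call `dispatch` is left open.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical, heavily reduced view of the emulated CPU.
struct cpu_state_t {
    uint16_t ax = 0;
    uint16_t cs = 0;
    uint16_t ip = 0;
};

using hook_fn = std::function<void(cpu_state_t &)>;

class hook_table_t {
public:
    // Register a native replacement for the routine at seg:off.
    void install(uint16_t seg, uint16_t off, hook_fn fn) {
        hooks_[std::make_pair(seg, off)] = std::move(fn);
    }

    // Intended to be called when emulated code is about to execute at cs:ip.
    // Returns true if a native replacement ran instead of the 16-bit code.
    bool dispatch(cpu_state_t &cpu) const {
        const auto key = std::make_pair(cpu.cs, cpu.ip);
        const auto it = hooks_.find(key);
        if (it == hooks_.end()) {
            return false;  // no hook: let the emulator run the original code
        }
        it->second(cpu);
        return true;
    }

private:
    std::map<std::pair<uint16_t, uint16_t>, hook_fn> hooks_;
};
```

A table like this only works for one exact build of the game, because the segment:offset keys are non-symbolic addresses taken from the disassembly.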
or my also simple AlphaWaves branch of dosbox: https://github.com/LowLevelMahn/dosbox-staging/tree/main_alpha_waves_tests/src/_alpha_waves
which uses my tiny "emulator" in the uncompress function: https://github.com/LowLevelMahn/dosbox-staging/blob/b89837740a5ab65fc28548440b8de3d7c3516a49/src/_alpha_waves/_alpha_waves.cpp#L773
and this is the C-Port of that routine: https://github.com/LowLevelMahn/dosbox-staging/blob/b89837740a5ab65fc28548440b8de3d7c3516a49/src/_alpha_waves/_alpha_waves.cpp#L495
most of the time these reverse engineering projects combine several strategies to reach the ultimate goal - that is totally
different from real C/C++ source based projects, where this kind of combined strategy is just not needed at all
Quote from: llm on October 05, 2022, 06:59:00 PMsafe, which means segments should para aligned and all non-symbolic offsets should be found, every change will always come
back to that point
"non-symbolic offset" Like these lines?
mov di, 55CAh ; offset in dseg where uninitialized data starts
mov cx, 0AD20h ; original size/end of dseg
Link to the code in Bitbucket (https://bitbucket.org/dreadnaut/restunts/src/master/src/restunts/asmorig/seg010.asm#lines-216)
Quote from: Daniel3D on October 10, 2022, 10:37:39 AM"non-symbolic offset" Like these lines?
mov di, 55CAh ; offset in dseg where uninitialized data starts
mov cx, 0AD20h ; original size/end of dseg
yes, stuff like that - there could be many of these - or just a few - but every one of them prevents the code from being movable - or at least makes it harder
so more or less every magic value that is a offset - but not
add sp,some-value
that is for stack-cleanup
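As a starting point for hunting such values, here is a rough sketch of a scanner that flags `mov reg, <hex>h` lines in assembler text. It is only a heuristic under stated assumptions: the regular expression, the `Candidate` struct and the `min_value` threshold are made up for this example, it knows nothing about segments, and it will report plain constants (such as a size) as well as real offsets, so every hit still has to be checked by hand.

```cpp
#include <cstddef>
#include <cstdint>
#include <regex>
#include <string>
#include <vector>

struct Candidate {
    std::size_t line_no;  // 1-based line number in the scanned text
    std::string reg;      // destination register, e.g. "di"
    uint16_t value;       // the hex immediate
};

// Flags lines of the form "mov <16-bit reg>, <hex>h" with a large immediate.
// Lines using "offset <symbol>" or decimal immediates are not matched.
inline std::vector<Candidate> find_offset_candidates(
        const std::vector<std::string> &lines, uint16_t min_value = 0x100) {
    static const std::regex pattern(
        R"(^\s*mov\s+([a-z]{2}),\s*([0-9][0-9A-Fa-f]*)h\b)", std::regex::icase);
    std::vector<Candidate> found;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::smatch match;
        if (!std::regex_search(lines[i], match, pattern)) {
            continue;
        }
        const unsigned long value = std::stoul(match[2].str(), nullptr, 16);
        if (value < min_value || value > 0xFFFF) {
            continue;  // small constants are rarely offsets
        }
        Candidate c{i + 1, match[1].str(), static_cast<uint16_t>(value)};
        found.push_back(c);
    }
    return found;
}
```

Run over the two lines quoted above, it reports both `55CAh` and `0AD20h`, which matches the point that such a list contains false positives and is only a to-do list for manual review.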
Quote from: llm on October 10, 2022, 11:55:43 AMQuote from: Daniel3D on October 10, 2022, 10:37:39 AM"non-symbolic offset" Like these lines?
mov di, 55CAh ; offset in dseg where uninitialized data starts
mov cx, 0AD20h ; original size/end of dseg
yes, stuff like that - there could be many of these - or just a few - but every one of them prevents the code from being movable - or at least makes it harder
so more or less every magic value that is a offset - but not
add sp,some-value
that is for stack-cleanup
Assuming that the hex value is a number that corresponds to the line it seems to be a little off.
But i guess that the start of the file should be excluded from the line count as that is compiler info.
Or am i missing the bat again ;)
Quote from: Daniel3D on October 10, 2022, 12:55:19 PMAssuming that the hex value is a number that corresponds to the line it seems to be a little off.
But i guess that the start of the file should be excluded from the line count as that is compiler info.
the hex value does NOT correspond to a line number, NEVER - the hex value is a byte offset into the loaded image (the part behind the exe header), counted from the start of its segment; it depends on the size of the code that sits before it, and every asm command has a different size in binary
for example: this is an assembler routine (not from stunts) - on the left is the binary offset, then the binary code and the corresponding asm source
IDA-Offset | Binary code | Assembler source
| |
seg000:BDF4 | | sub_1BDF4 proc near
seg000:BDF4 | |
seg000:BDF4 | 06 | push es
seg000:BDF5 | 1E | push ds
seg000:BDF6 | 56 | push si
seg000:BDF7 | 57 | push di
seg000:BDF8 | 8D 36 76 BC | lea si, ds:0BC76h
seg000:BDFC | B9 06 00 | mov cx, 6
seg000:BDFF | |
seg000:BDFF | | loc_1BDFF:
seg000:BDFF | 83 C6 04 | add si, 4
seg000:BE02 | 2E 8B 04 | mov ax, cs:[si]
seg000:BE05 | 2E 0B 44 02 | or ax, cs:[si+2]
seg000:BE09 | 74 02 | jz short loc_1BE0D
seg000:BE0B | E2 F2 | loop loc_1BDFF
seg000:BE0D | |
seg000:BE0D | | loc_1BE0D:
seg000:BE0D | 2E 89 1C | mov cs:[si], bx
seg000:BE10 | 2E 89 7C 02 | mov cs:[si+2], di
seg000:BE14 | 5F | pop di
seg000:BE15 | 5E | pop si
seg000:BE16 | 1F | pop ds
seg000:BE17 | 07 | pop es
seg000:BE18 | C3 | retn
seg000:BE18 | | sub_1BDF4 endp
Shellstorm disassembly of the same Binary code without symbolic offsets
0x0000000000000000: 06 push es
0x0000000000000001: 1E push ds
0x0000000000000002: 56 push si
0x0000000000000003: 57 push di
0x0000000000000004: 8D 36 76 BC lea si, [0xbc76]
0x0000000000000008: B9 06 00 mov cx, 6
0x000000000000000b: 83 C6 04 add si, 4
0x000000000000000e: 2E 8B 04 mov ax, word ptr cs:[si]
0x0000000000000011: 2E 0B 44 02 or ax, word ptr cs:[si + 2]
0x0000000000000015: 74 02 je 0x19 <-- jmp offset
0x0000000000000017: E2 F2 loop 0xb <-- jmp offset
0x0000000000000019: 2E 89 1C mov word ptr cs:[si], bx
0x000000000000001c: 2E 89 7C 02 mov word ptr cs:[si + 2], di
0x0000000000000020: 5F pop di
0x0000000000000021: 5E pop si
0x0000000000000022: 1F pop ds
0x0000000000000023: 07 pop es
0x0000000000000024: C3 ret
so "lea si, ds:0BC76h" is encoded in the exe as {8D 36 76 BC}
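The bytes in the two listings above can be checked by hand. A short jump or `loop` stores a signed 8-bit distance counted from the end of the 2-byte instruction, and a 16-bit immediate is stored low byte first. The helper names below are made up for this sketch; the numbers come straight from the listings.

```cpp
#include <cstdint>

// Target of a 2-byte short jump (opcode + rel8) located at instr_offset.
// rel8 is a signed distance counted from the end of the instruction.
constexpr uint16_t short_jump_target(uint16_t instr_offset, uint8_t rel8) {
    return static_cast<uint16_t>(instr_offset + 2 + static_cast<int8_t>(rel8));
}

// 16-bit values are stored little-endian: low byte first.
constexpr uint16_t read_le16(uint8_t low, uint8_t high) {
    return static_cast<uint16_t>(low | (high << 8));
}

// "74 02" at BE09: jz short loc_1BE0D
static_assert(short_jump_target(0xBE09, 0x02) == 0xBE0D, "jz target");
// "E2 F2" at BE0B: loop loc_1BDFF (0xF2 is -14)
static_assert(short_jump_target(0xBE0B, 0xF2) == 0xBDFF, "loop target");
// "8D 36 76 BC": lea si, ds:0BC76h
static_assert(read_le16(0x76, 0xBC) == 0xBC76, "immediate of the lea");
```

The jumps are relative, so they survive when the whole routine moves; the `0BC76h` in the `lea` is an absolute offset and does not, which is why it has to become a symbol before code can be moved safely.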
Quote from: llm on October 10, 2022, 01:23:38 PMthe hex value does NOT correspond to a line number, NEVER - the hex value is a byte offset into the loaded image (the part behind the exe header), counted from the start of its segment; it depends on the size of the code that sits before it, and every asm command has a different size in binary
I was fearing that.
So to make a symbolic offset out of it you must first find the correct byte offset and locate it in the assembly code?
Looking at your example i guess it is not very difficult for you. But i understand why they are not all done.
If i find more (i now have an idea of what they look like) and they are not commented as such I'll make a note of it.
Quote from: Daniel3D on October 10, 2022, 01:43:11 PMSo to make a symbolic offset out of it you must first find the correct byte offset and locate it in the assembly code?
that's why people use IDA or Ghidra for reversing - they keep the assembler source view and the binary code in sync - so you can more easily see what an offset could target
Quote from: Daniel3D on October 10, 2022, 01:43:11 PMLooking at your example i guess it is not very difficult for you. But i understand why they are not all done.
it can be difficult because sometimes offsets are calculated over several lines of assembler code
which could just as well be some sort of 3d point calculation - it's not always easy to tell the difference
Quote from: Daniel3D on October 10, 2022, 01:43:11 PMIf i find more (i now have an idea of what they look like) and they are not commented as such I'll make a note of it.
great
for example
mov di, 55CAh
written as
mov di, offset word_40D3A
should produce the very same executable (binary equal)
from IDA-Editor:
dseg:55C8 db 0FFh
dseg:55C9 db 0
dseg:55CA word_40D3A dw 0 ; DATA XREF: end_hiscore+638↑w
dseg:55CA ; end_hiscore+656↑r start+6A↑o
dseg:55CC word_40D3C dw 0 ; DATA XREF: end_hiscore+63E↑w
dseg:55CC ; end_hiscore+6C1↑r
dseg:55CE word_40D3E dw 0 ; DATA XREF: end_hiscore+644↑w
more or less easy in IDA Pro - but first you need to know that this really is an offset value
and which segment the offset targets - in this case, visible by looking at the code above,
it seems to be dseg - so it's an offset to a variable in the data segment (some copy/init operation is done)
IDA always shows the binary information (offsets, opcodes) in parallel to the disassembly: https://imgur.com/fsUvtVI
that's the primary reason for using a professional tool for reverse engineering, and also the reason for using an IDA script
to produce the asm code - any finding can result in multiple changes across the asm files - for example, you find a common type
and start using it in IDA - IDA will use that information to extend other parts of the disassembly, resolving more and more
that is not easy with dead-end assembler code - and it is a huge part of the reversing process
IDA Pro is not an assembler editor (you can't change anything in the assembler code), it is just a tool to help reverse engineering - so
cross references, graphs, deep analysis etc.; you can add types and structs and annotate the found functions, giving IDA more info on
how to disassemble stuff it didn't understand by itself
That is very cool. Way too advanced for me at this point in time.
Like you, time is limited.
But i have my strengths(perseverance ;) ) . So i will make an effort to locate the non symbolic offsets.
There will be false positives.
Quote from: Daniel3D on October 10, 2022, 07:22:24 PM
There will be false positives.
if you change a non-symbolic offset to a symbolic one and compare the exe before/after, no bit should have changed - then it could still be the wrong symbol, but it cannot harm the gameplay because the exe is unchanged; doing such changes without checking before/after is like playing roulette to win bugs without any need
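The before/after check described here boils down to a byte-for-byte comparison of two executables. A minimal sketch in C, assuming both files have already been read into memory; `first_difference` is a hypothetical helper name, not something from the restunts tooling.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first differing byte, or -1 if both buffers
   have the same length and the same content. A length mismatch counts
   as a difference at the end of the shorter buffer. */
long first_difference(const unsigned char *a, size_t a_len,
                      const unsigned char *b, size_t b_len)
{
    assert(a != NULL && b != NULL);
    size_t common = a_len < b_len ? a_len : b_len;
    for (size_t i = 0; i < common; i++) {
        if (a[i] != b[i])
            return (long)i;     /* first byte where the two images disagree */
    }
    return a_len == b_len ? -1L : (long)common;
}
```

A result of -1 means the symbolic replacement assembled to exactly the same bytes. In practice an existing tool such as `fc /b` (DOS/Windows) or `cmp` (Unix) does the same job without writing any code.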
Quote from: llm on October 11, 2022, 08:02:20 AM
Quote from: Daniel3D on October 10, 2022, 07:22:24 PM
There will be false positives.
if you change a non-symbolic offset to a symbolic one and compare the exe before/after, no bit should have changed - then it could still be the wrong symbol, but it cannot harm the gameplay because the exe is unchanged; doing such changes without checking before/after is like playing roulette to win bugs without any need
I only intend to catalogue them.
I don't have IDA, and probably not the knowledge to verify without a doubt.
I'll see what I can find in a few hours.
With why I think it is one or why not.
If I am right often I'll continue. If not I'll leave it to the pros.. 😅
Quote from: Daniel3D on October 11, 2022, 08:15:07 AM
I only intend to catalogue them.
this regex finds most of the magic-value numbers that could be offsets; only global offsets are relevant
(\,|\-|\+)\s*((0[a-fA-F0-9]*|[1-9][a-fA-F0-9]*)h|[0-9])
I'm using that with Notepad++ (but other editors with regex support should also work)
searching all asmorig asm files
removing all "add or sub sp,VALUE" + defines reduces the list to ~13,000, but most of the findings
are value sets or something
as usual - a huge mess of assembler code :(
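For anyone without a regex-capable editor, the core of that search can be sketched as a small hand-written scanner in C. This is only an approximation of the regex above and makes its own assumptions: it handles just the `h`-suffixed form (not the single-decimal-digit alternative), it stops at `;` to skip comments, and the name `find_hex_immediate` is invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Look for the first operand of the form ", 1234h", "+ 1234h" or
   "- 1234h" in one line of assembler text. Returns 1 and stores the
   value on a hit, 0 otherwise. Text behind ';' is treated as comment. */
int find_hex_immediate(const char *line, unsigned long *value)
{
    assert(line != NULL && value != NULL);
    for (const char *p = line; *p != '\0' && *p != ';'; p++) {
        if (*p != ',' && *p != '+' && *p != '-')
            continue;
        const char *q = p + 1;
        while (*q == ' ' || *q == '\t')
            q++;
        if (!isdigit((unsigned char)*q))
            continue;               /* hex literals must start with 0-9 */
        unsigned long v = 0;
        while (isxdigit((unsigned char)*q)) {
            int c = tolower((unsigned char)*q);
            v = v * 16 + (unsigned long)(c <= '9' ? c - '0' : c - 'a' + 10);
            q++;
        }
        if (*q == 'h' || *q == 'H') {   /* only the h-suffixed form counts */
            *value = v;
            return 1;
        }
    }
    return 0;
}
```

Calling it once per source line gives the same kind of raw hit list as the editor search; deciding which hits are real offsets still needs the manual review discussed in this thread.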
Quote from: llm on October 11, 2022, 09:04:12 AM
Quote from: Daniel3D on October 11, 2022, 08:15:07 AM
I only intend to catalogue them.
this regex finds most of the magic-value numbers that could be offsets; only global offsets are relevant
(\,|\-|\+)\s*((0[a-fA-F0-9]*|[1-9][a-fA-F0-9]*)h|[0-9])
I'm using that with Notepad++ (but other editors with regex support should also work)
searching all asmorig asm files
removing all "add or sub sp,VALUE" + defines reduces the list to ~13,000, but most of the findings
are value sets or something
as usual - a huge mess of assembler code :(
indeed over 18,000 hits in Notepad++
many are clearly not an offset or already defined (I say after viewing about 50 ::) )
But I'll have a look anyway.. thanks for the regex..
First line of interest.. 8)
seg000 Line 607: mov ax, 0FFFFh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg000.asm#lines-607)
_ask_dos:
sub ax, ax
push ax
push ax
push dialogarg2
mov ax, 0FFFFh
push ax
push ax
mov ax, offset aDos ; "dos"
push ax
if this is one there are 85 other hits on "ax, 0FFFFh"
seg000 Line 1060: mov ax, 0AC74h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg000.asm#lines-1060)
mov ax, offset aGsta; "gsta"
push ax
push [bp+var_38]
push [bp+var_3A]
call locate_shape_alt
add sp, 6
push dx
push ax
mov ax, 0AC74h
push ax
call copy_string
add sp, 6
push word_407D6
push word_407D4
mov ax, 4Ch ; 'L'
if this is one there are 13 other hits on "ax, 0AC74h"
This is the last for now. Enough to test if I am finding them correctly .. And to see if it is useful..
(I will make more compact logs of others I find when useful to continue. I did it this way, so you can easily see if I make obvious mistakes)
seg000 Line 1795: mov ax, 0FFFEh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg000.asm#lines-1795)
call shape3d_load_all
mov ax, 0C8h ; 'È'
push ax
mov ax, 140h
push ax
mov ax, 28h ; '('
push ax
push ax
call set_projection
add sp, 8
mov ax, 0FFFEh
push ax
call init_game_state
add sp, 2
call sprite_copy_wnd_to_1
push skybox_grd_color
call sprite_clear_1_color
if this is one there are 3 other hits on "ax, 0FFFEh"
Quote from: Daniel3D on October 11, 2022, 10:42:57 AM
First line of interest.. 8)
seg000 Line 607: mov ax, 0FFFFh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg000.asm#lines-607)
_ask_dos:
sub ax, ax
push ax
push ax
push dialogarg2
mov ax, 0FFFFh
push ax
push ax
mov ax, offset aDos ; "dos"
push ax
if this is one there are 85 other hits on "ax, 0FFFFh"
that is very likely just -1; in assembler everything looks unsigned, but that does not
mean that a value IS unsigned, and -1 is very unlikely to be an offset :)
see online-conversion:
https://cryptii.com/pipes/integer-converter
https://imgur.com/BiCqyoI
Quote from: Daniel3D on October 11, 2022, 10:47:13 AM
seg000 Line 1060: mov ax, 0AC74h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg000.asm#lines-1060)
mov ax, offset aGsta; "gsta"
push ax
push [bp+var_38]
push [bp+var_3A]
call locate_shape_alt
add sp, 6
push dx
push ax
mov ax, 0AC74h
push ax
call copy_string
add sp, 6
push word_407D6
push word_407D4
mov ax, 4Ch ; 'L'
if this is one there are 13 other hits on "ax, 0AC74h"
0AC74h is very likely an offset into the data segment, to some string or something
you need to analyse copy_string - in IDA you would annotate the parameters of copy_string so IDA can infer further
Quote from: Daniel3D on October 11, 2022, 10:51:24 AM
This is the last for now. Enough to test if I am finding them correctly .. And to see if it is useful..
(I will make more compact logs of others I find when useful to continue. I did it this way, so you can easily see if I make obvious mistakes)
seg000 Line 1795: mov ax, 0FFFEh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg000.asm#lines-1795)
call shape3d_load_all
mov ax, 0C8h ; 'È'
push ax
mov ax, 140h
push ax
mov ax, 28h ; '('
push ax
push ax
call set_projection
add sp, 8
mov ax, 0FFFEh
push ax
call init_game_state
add sp, 2
call sprite_copy_wnd_to_1
push skybox_grd_color
call sprite_clear_1_color
if this is one there are 3 other hits on "ax, 0FFFEh"
0FFFEh does not look like a valid offset - just too big, and 0FFFEh as signed is -2 - so it's maybe some sort
of parameter, or really the value 65534
you need to understand hex/dec, signed/unsigned and type sizes very well to get a "feeling" for what that number could be - combined with knowledge about the called functions
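The signed/unsigned reading of a 16-bit word can be checked with a few lines of C. This is just a sketch of the two's-complement rule; the helper name `as_signed16` is made up for the example.

```c
#include <assert.h>

/* Interpret a 16-bit word as a signed two's-complement number:
   values from 8000h upward represent value - 65536. */
long as_signed16(unsigned long value)
{
    assert(value <= 0xFFFFUL);      /* only 16-bit words are meaningful here */
    return value >= 0x8000UL ? (long)value - 0x10000L : (long)value;
}
```

With this, 0FFFFh reads as -1 and 0FFFEh as -2, while 55CAh stays 21962 either way. A value like 0AC74h reads as -21388 when signed, which is not an obviously meaningful number, so the unsigned reading as a data segment offset is the more plausible one.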
just to give you a feeling what the code does in one of your examples:
seg000:053A _ask_dos: ; CODE XREF: stuntsmain+43D↑j
seg000:053A sub ax, ax
seg000:053C push ax ; show_dialog param 9
seg000:053D push ax ; show_dialog param 8
seg000:053E push dialogarg2 ; show_dialog param 7
seg000:0542 mov ax, 0FFFFh
seg000:0545 push ax ; show_dialog param 6
seg000:0546 push ax ; show_dialog param 5
seg000:0547 mov ax, offset aDos ; "dos"
seg000:054A push ax ; locate_text_res param 3
seg000:054B push word ptr mainresptr+2 ; locate_text_res param 2
seg000:054F push word ptr mainresptr ; locate_text_res param 1
seg000:0553 call locate_text_res
seg000:0558 add sp, 6 ; 6 bytes removed from the stack (due to the previous 3 pushes of 2 bytes each)
seg000:055B push dx ; show_dialog param 4
seg000:055C push ax ; show_dialog param 3
seg000:055D mov ax, 1
seg000:0560 push ax ; show_dialog param 2
seg000:0561 mov ax, 2
seg000:0564 push ax ; show_dialog param 1
seg000:0565 call show_dialog
seg000:056A add sp, 12h ; 12h = 18 bytes removed from the stack (due to the previous 9 pushes)
this is the C-port of that asm-code
locate_text_res(mainresptr.offset, mainresptr.segment, "dos"); // sets dx and ax (could be a ptr)
show_dialog(2, 1, ax, dx, -1, -1, dialogarg2, 0, 0);
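On the "could be a ptr" remark: if dx:ax really holds a far pointer, the usual convention is segment in DX and offset in AX (that convention is an assumption here, it is not confirmed by the listing itself). The real-mode address such a pair refers to can be sketched in C like this; `far_to_linear` is a hypothetical helper name.

```c
#include <assert.h>

/* Real-mode address arithmetic: linear = segment * 16 + offset. */
unsigned long far_to_linear(unsigned segment, unsigned offset)
{
    assert(segment <= 0xFFFFu && offset <= 0xFFFFu);   /* both are 16-bit words */
    return ((unsigned long)segment << 4) + offset;
}
```

This is also why show_dialog receives the result as two separate 16-bit parameters: a far pointer does not fit into one register or one push on a 16-bit machine.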
Quote from: llm on October 11, 2022, 11:15:47 AM
Quote from: Daniel3D on October 11, 2022, 10:47:13 AM
seg000 Line 1060: mov ax, 0AC74h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg000.asm#lines-1060)
mov ax, offset aGsta; "gsta"
push ax
push [bp+var_38]
push [bp+var_3A]
call locate_shape_alt
add sp, 6
push dx
push ax
mov ax, 0AC74h
push ax
call copy_string
add sp, 6
push word_407D6
push word_407D4
mov ax, 4Ch ; 'L'
if this is one there are 13 other hits on "ax, 0AC74h"
0AC74h is very likely an offset into the data segment, to some string or something
you need to analyse copy_string - in IDA you would annotate the parameters of copy_string so IDA can infer further
Ok. This was the one I had a good feeling about. The other two felt too tidy, too deliberate..
I'm not starting with IDA. I'm just going to try and find them all.
It's up to you and other pros to double check and change them.
Quote from: Daniel3D on October 11, 2022, 12:38:31 PM
I'm not starting with ida.
would be the easiest - but IDA is commercial, costs ~400$ in the home edition
I would love to go back to IDA Freeware 5 (the only free version that still supports DOS)
official download available on the ScummVM homepage: https://www.scummvm.org/news/20180331/
but upgrading the IDA database (idb) is a one-way ticket - and I'm currently working with 6.8
but you should install the freeware - it gives you a good idea how it all works, even if it is not the latest IDA - most reversing projects use this freeware version (or Ghidra - which is sometimes problematic with segment/offset support)
Quote from: llm on October 11, 2022, 12:47:52 PM
Quote from: Daniel3D on October 11, 2022, 12:38:31 PM
I'm not starting with ida.
would be the easiest - but IDA is commercial, costs ~400$ in the home edition
Oh, I would love to learn more about this.
But with the current state of my knowledge everything has to be checked anyway.
But i can significantly reduce the amount of options.
This should be one too. (https://bitbucket.org/dreadnaut/restunts/src/master/src/restunts/asmorig/seg010.asm#lines-188)
Lucky find on my phone..
Quote from: Daniel3D on October 11, 2022, 01:03:15 PM
This should be one too. (https://bitbucket.org/dreadnaut/restunts/src/master/src/restunts/asmorig/seg010.asm#lines-188)
Lucky find on my phone..
no, that's the DOS API (int 21h, function ah=4Ch = exit program, with return code al=0FFh, i.e. -1)
http://www.osfree.org/doku/en:docs:dos:api:int21:4c
could be written as
mov ah,4Ch
mov al,0FFh ; -1
int 21h
or
mov ax,4CFFh
int 21h
you always need to analyse the context around a little - everything in assembler is more or less global, typeless (pointer, value, ... everything is possible)
C port of that is
exit(-1);
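The reason both spellings are equivalent is that AH and AL are simply the high and low byte of AX. A small C sketch of that split (the helper names are made up for the example):

```c
#include <assert.h>

/* AH is the high byte of AX, AL is the low byte. */
unsigned high_byte(unsigned ax)
{
    assert(ax <= 0xFFFFu);
    return (ax >> 8) & 0xFFu;
}

unsigned low_byte(unsigned ax)
{
    assert(ax <= 0xFFFFu);
    return ax & 0xFFu;
}

/* Combine AH and AL back into the 16-bit AX value. */
unsigned make_ax(unsigned ah, unsigned al)
{
    assert(ah <= 0xFFu && al <= 0xFFu);
    return (ah << 8) | al;
}
```

So 4CFFh is not one number here but two packed bytes: function 4Ch in AH and exit code 0FFh in AL, which is why it only looks like an offset.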
Alright,..
As far as I understand now, the non-symbolic offsets point to a location in the binary with a hex value.
Looking at your regex I removed some parts to filter out results that I wouldn't recognize as an offset anyway.
resulting in : ((0[a-fA-F0-9]*|[1-9][a-fA-F0-9]*)h) I suppose that will match only the hex values.
On that note, I looked at the first 12 segments. And I have come up with this..
This is everything resembling an offset I can find..
seg000 Line 2185: mov ax, 95F8h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg000.asm#lines-2185)
seg003 Line 860: mov ax, 0AE6h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg003.asm#lines-860) ; 2790 .. probably nothing
seg003 Line 3996: mov ax, 4650h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg003.asm#lines-3996) 1/3 in short space.. probably nothing
seg003 Line 4002: mov ax, 3A98h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg003.asm#lines-4002) 2/3 in short space.. probably nothing
seg003 Line 4012: mov ax, 0B9B0h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg003.asm#lines-4012) 3/3 in short space.. probably nothing
seg003 Line 5847: mov ax, 0AA0Eh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg003.asm#lines-5847)
seg005 Line 1282.84,96: mov ax, 0AA5Eh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg005.asm#lines-1282) also 3 in a few lines probably nothing
seg007 Line 123: cmp si, 6AD0h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg007.asm#lines-123)
seg007 Line 844: add bx, 6364h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg007.asm#lines-844) ; 25444 ??
seg007 Line 861: mov ax, 33BCh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg007.asm#lines-861)
seg009 Line 2201: mov ax, 95F8h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg009.asm#lines-2201)
seg010 Line 835: mov si, 54C6h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg010.asm#lines-835)
seg010 Line 980: mov bx, 36BAh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg010.asm#lines-980)// these two and several others all between 3000H and 4000h
seg010 Line 999: mov si, 36D0h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg010.asm#lines-999)\\ until Line 1460 // probably nothing. just in case
seg010 Line 2066: mov ax, 37EAh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg010.asm#lines-2066)
seg010 Line 2071: mov ax, 37F1h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg010.asm#lines-2071)
seg010 Line 3184: mov di, 360Ah (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg010.asm#lines-3184)
seg010 Line 3206: cmp si, 365Ah (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg010.asm#lines-3206)
seg010 Line 3244: cmp si, 365Ah (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg010.asm#lines-3244)
seg010 Line 3868: mov ax, 43FDh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg010.asm#lines-3868)
seg010 Line 3878: add ax, 9EC3h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg010.asm#lines-3878)
seg012 Line 7499: mov ax, 49A0h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg012.asm#lines-7499)
seg012 Line 7882: mov ax, 4BC6h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg012.asm#lines-7882)
seg012 Line 9800: sub cx, 425Ch (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg012.asm#lines-9800)
others not an offset but could be mistaken.
seg001 Line 944: mov ax, 1E00h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg001.asm#lines-944) ; i think based on contents: full trk grid length/4
seg001 Line 2407: cmp [bx+di+CARSTATE.car_rc1], 5AEBh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg001.asm#lines-944) ; "23275" ???
Assuming I didn't miss any.. (which is a bold assumption, I am aware of that) the number of offsets is not too many, even if all of these are actually correct.
I'm about 2/3 through the results now. I hope I have not missed any. I like to have the code clean of them.
Furthermore, I could not find any in segment 16 to 26. But I am only looking at big hex values that are not 0FFsomething or round thousands in Dec..
(you don't have to explain every possible hit or miss, but I do hope it is of use when somebody has time to work on the source)
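The filter described above (big values, not 0FFxx, not round thousands in decimal) can be written down as a rough pre-filter in C. Everything in it is a heuristic assumption for illustration: the 0100h threshold for "big", the extra check of the signed reading, and the name `looks_like_offset_candidate`. It only narrows the list; it proves nothing about any single value.

```c
#include <assert.h>

/* Rough pre-filter for 16-bit immediates that might be data offsets.
   Returns 1 for "worth a manual look", 0 for "probably a plain number". */
int looks_like_offset_candidate(unsigned value)
{
    assert(value <= 0xFFFFu);
    long as_signed = value >= 0x8000u ? (long)value - 0x10000L : (long)value;

    if (value < 0x0100u)
        return 0;   /* small constants; the threshold is an assumption */
    if (value >= 0xFF00u)
        return 0;   /* 0FFxxh: most likely a small negative number */
    if (value % 1000u == 0 || as_signed % 1000 == 0)
        return 0;   /* round thousands in decimal, unsigned or signed */
    return 1;
}
```

Applied to the list above, this drops 4650h (18000), 3A98h (15000) and 0B9B0h (-18000 when read as signed), which matches the "probably nothing" notes, and keeps values such as 0AC74h and 95F8h for review.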
seg012 Line 10106: mov ax, ds:4E92h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg012.asm#lines-10106) : 20114..
seg012 Line 10252: cmp bx, ds:3C56h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg012.asm#lines-10252) : 15446
seg012 Line 13214: mov ax, 5416h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg012.asm#lines-13214) : 21526
seg012 Line 18575: mov ss:54AAh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg012.asm#lines-18575) : 21674
seg012 Line 18597: mov cl, ss:54ABh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg012.asm#lines-18597) : 21675
seg014 Line 116: mov ax, 2D41h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg014.asm#lines-116) : 11585
seg014 Line 193: mov ax, 393Eh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg014.asm#lines-193) : 14654
seg015 Line 194: mov ax, 3E17h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg015.asm#lines-194) : 15895
seg015 Line 249: mov ax, 3333h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg015.asm#lines-249) : 13107
seg027 Line 271: mov di, 8224h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-271) : 33316
seg027 Line 313: mov di, 0A2B6h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-313) : 41654
seg027 Line 510: mov si, 81FCh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-510) : 33276
seg027 Line 569: mov di, 86BCh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-569) : 34492
seg027 Line 744: mov si, 86E0h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-744) : 34528
seg027 Line 1383: mov ax, 4EC6h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-1383) : 20166
seg027 Line 2705: mov ax, 4FA3h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-2705) : 20387
seg027 Line 2714: mov ax, 4FD5h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-2714) : 20437
seg027 Line 2719: mov di, 8214h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-2719) : sp? 33300
seg027 Line 2720: mov word ptr [bp-6], 81FCh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-2720) : 33276
seg027 Line 2728: mov ax, 4FFBh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-2728) : 20475
seg027 Line 2738: mov ax, 5010h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-2738) : 20496
seg027 Line 2748: mov di, 0A2C2h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-2748) --
seg027 Line 2750: mov word ptr [bp-4], 0A2BEh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-2750) --
seg027 Line 2752: mov word ptr [bp-8], 0A2B7h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-2752) --
seg027 Line 2769: mov ax, 501Dh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-2769) --
seg028 Line 237: add ax, 81FCh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg028.asm#lines-237) ref to dseg
seg028 Line 262: mov ax, 728Eh (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg028.asm#lines-262) --
don't know...
seg027 Line 2224: mov di, 0A2B7h (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg027.asm#lines-2224) ?? 5 similar in 5 lines,
Line 2224: mov di, 0A2B7h
Line 2225: mov [bp+var_6], 0A2B6h
Line 2226: mov [bp+var_8], 0A2B8h
Line 2227: mov [bp+var_A], 0A2C6h
Line 2228: mov [bp+var_C], 0A2E2h
nice findings - I will have a look and check which of these are real offsets - but I think at least 50% are very likely offsets
Quote from: llm on October 13, 2022, 01:17:24 PM
but i think at least 50% are very likely offsets
That is a motivating estimate.
again, for your daily training :)
what happens if offsets are not symbolic?
0x3440 func0
0x3440 mov ax,0x3456
0x3442 call XYZ
0x3448
0x3450 func1
0x3451 some code <-- the above non-symbolic offset will get wrong if you add/remove code here
0x3452
0x3453
0x3454
0x3455
0x3456: dw some_value 234
func0
mov ax,offset some_value
call XYZ
func1
some code <-- the above symbolic offset will not get wrong if you add/remove code here
dw some_value 234
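The same effect can be reproduced in C, as a sketch under simplified assumptions: two struct layouts stand in for the segment before and after code was added, `HARDCODED_OFFSET` plays the role of the non-symbolic 0x3456, and `offsetof` plays the role of `offset some_value`. All names are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two layouts of the same "segment": image_v2 has three extra bytes in
   front of some_value, like code that was added to func1. */
struct image_v1 { unsigned char func1[6]; unsigned char some_value[2]; };
struct image_v2 { unsigned char func1[9]; unsigned char some_value[2]; };

#define HARDCODED_OFFSET 6   /* correct for image_v1 only */

/* Read a little-endian 16-bit word at a byte offset inside an image. */
unsigned read_word(const void *image, size_t offset)
{
    assert(image != NULL);
    const unsigned char *p = (const unsigned char *)image + offset;
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}
```

Reading at `HARDCODED_OFFSET` finds the value in the first layout but lands inside func1 in the second, while `offsetof(struct image_v2, some_value)` follows the data to its new position, just as the assembler recomputes a symbolic offset on every build.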
Quote from: llm on October 14, 2022, 09:10:47 AM
what happens if offsets are not symbolic?
0x3440 func0
0x3440 mov ax,0x3456
0x3442 call XYZ
0x3448
0x3450 func1
0x3451 some code <-- the above non-symbolic offset will get wrong if you add/remove code here
0x3452 added code
0x3453 added code
0x3454 added code
0x3455
0x3456: Something entirely different (not: dw some_value 234)
0x3457
0x3458
0x3459: dw some_value 234
Like this?
Then func0 fails. I know. That is why getting rid of them is important.
Just like making it para 16 and removing the alignment bytes; that I can do myself, I think.
I have more time next year. I will do everything I can to have those two finished next year.
It will benefit reversing the code, but also modding the code.
And it does not matter where Func0 is. If the new code is before the location that Func0 is looking for it fails.
The function itself is not affected by the changed offset, but it fails because the required data has moved. Like..
0x3448
0x3450 func1
0x3451 some code <-- the above non-symbolic offset will get wrong if you add/remove code here
0x3452 added code
0x3453 added code
0x3454 added code
0x3455
0x3456: Something entirely different (not: dw some_value 234)
0x3457
0x3458
0x3459: dw some_value 234
0
0
more...
0
0
0x4440 func0
0x4440 mov ax,0x3456
0x4442 call XYZ
There is a function that loads horizons. That function gets its filenames from Dseg.
You can not add a name there because it would shift everything behind it and make all code that uses non-symbolic offsets fail. If we fix that, we can easily (probably not ::) ) create new horizons.
It's a trivial, unimportant change, but a nice small project to make and to kinda test stability.
Quote from: Daniel3D on October 14, 2022, 12:16:40 PM
Quote from: llm on October 14, 2022, 09:10:47 AM
what happens if offsets are not symbolic?
0x3440 func0
0x3440 mov ax,0x3456
0x3442 call XYZ
0x3448
0x3450 func1
0x3451 some code <-- the above non-symbolic offset will get wrong if you add/remove code here
0x3452 added code
0x3453 added code
0x3454 added code
0x3455
0x3456: Something entirely different (not: dw some_value 234)
0x3457
0x3458
0x3459: dw some_value 234
Like this?
Then func0 fails. I know. That is why getting rid of them is important.
yes 100% correct - but "fails" isn't defined here - it could be that the algorithm still works because it's just not that strict about its input, or there is an identical or nearly identical value at the target offset
think of values like 0, 255, -1 or similar; they are very common, so it "could" still work
Quote from: Daniel3D on October 14, 2022, 12:42:16 PM
And it does not matter where Func0 is. If the new code is before the location that Func0 is looking for it fails.
yes 100% correct - it's not called "fails", but "undefined behavior"
it's not clear what happens when the value gets read from the wrong offset - nearly everything is possible - like a random problem generator: it could be that there is always 0 and the correct code always wanted 0, or there is an ever-changing value that most of the time is in a range the function can work with, producing no visual or audio glitches, maybe some strange physics behavior while driving in a special way
Quote from: Daniel3D on October 14, 2022, 12:49:10 PM
There is a function that loads horizons. That function gets its filenames from Dseg.
You can not add a name there because it would shift everything behind it and make all code that uses non-symbolic offsets fail. If we fix that, we can easily (probably not ::) ) create new horizons.
It's a trivial, unimportant change, but a nice small project to make and to kinda test stability.
yes, but you could add it to the end of the data segment, and move the stack segment a little, for example
- the stack location is only needed in a very early stage of the game while initializing, and this offset is already symbolic; there are some options that do not corrupt the offsets, but fully symbolic is always the best we can have
Quote from: llm on October 14, 2022, 01:56:05 PM
There is a function that loads horizons. That function gets its filenames from Dseg.
what is that function called? always be precise :)
Quote from: llm on October 14, 2022, 05:19:16 PM
Quote from: llm on October 14, 2022, 01:56:05 PM
There is a function that loads horizons. That function gets its filenames from Dseg.
what is that function called? always be precise :)
I don't remember precisely. And I can't access the code now. But I found it by searching for aDesert I believe (could be another horizon file name, but I think it is this one). It is, I think, also a misnamed variable because only the desert is used in the code, but it points to the first horizon file name with the same name, and the others follow at a regular offset. I followed the trail of this variable and found the part where the horizon is loaded. It was more than a year ago, so I don't remember exactly.. I'll see if I have some notes on it,..
EDIT : a few minutes later...
Found a reference to the code. It is difficult to read, and I may be very mistaken, but I believe that it is where the horizon is loaded, it points default to the first one Desert.
Seg003 line 6120 (https://bitbucket.org/dreadnaut/restunts/src/aa1e714a66f8f9bd0d78bb1c0c3ab6b69252721d/src/restunts/asmorig/seg003.asm#lines-6120)
loc_1D7C8:
push cs
call near ptr unload_skybox
mov al, [bp+arg_0]
mov byte_46167, al
mov byte_3B8F6, 1
cbw
mov cx, ax
shl ax, 1
shl ax, 1
shl ax, 1
add ax, cx
add ax, offset aDesert; "desert"
push ax
call file_load_shape2d_fatal_thunk
add sp, 2
mov skybox_res_ofs, ax
mov skybox_res_seg, dx
mov ax, offset skyboxes
push ax
mov ax, offset aScensce2sce3sce4; "scensce2sce3sce4"
push ax
push dx
push skybox_res_ofs
call locate_many_resources
The game reads the TRK file and loads the landscape byte into its corresponding memory region. Then, this byte is read at some point and is passed to a function to load the background graphic. The function has to multiply this value by 9 and then add it to a memory offset which is aDesert. This will cause the resulting value to point to the first letter of the landscape graphics file name
It reads the filename aDesert in Dseg (https://bitbucket.org/dreadnaut/restunts/src/master/src/restunts/asmorig/dseg.asm#lines-1824)
aDesert db 100
db 101
db 115
db 101
db 114
db 116
db 0
db 0
db 0
aTropical db 116
db 114
db 111
db 112
db 105
db 99
db 97
db 108
db 0
aAlpine db 97
db 108
db 112
db 105
db 110
db 101
db 0
db 0
db 0
aCity db 99
db 105
db 116
db 121
db 0
db 0
db 0
db 0
db 0
aCountry db 99
db 111
db 117
db 110
db 116
db 114
db 121
db 0
db 0
db 0
hillHeightConsts dw 0
shl ax, 1
shl ax, 1
shl ax, 1
add ax, cx
Those three "shl ax, 1": that command shifts the bits in AX to the left by one. It is done three times, so it multiplies by eight. At the end, "add ax, cx" adds the original value once more, completing the multiplication by nine.
*Learned that from CAS. 8)
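The shift-and-add sequence can be replayed step by step in C to check that it really yields index * 9. A minimal sketch; the names `cbw` and `name_slot_offset` are chosen for this example, and the limit of five entries is taken from the five file names in the dseg listing above.

```c
#include <assert.h>

/* Sign-extend a byte into a word the way CBW does (AL -> AX). */
int cbw(unsigned char al)
{
    return al >= 0x80 ? (int)al - 0x100 : (int)al;
}

/* Mirror of the asm: cbw / mov cx, ax / shl ax, 1 (x3) / add ax, cx.
   Returns the byte distance from the first name to entry `index`. */
unsigned name_slot_offset(unsigned char index)
{
    assert(index < 5);                  /* the table holds five 9-byte slots */
    unsigned ax = (unsigned)cbw(index) & 0xFFFFu;
    unsigned cx = ax;                   /* keep the original value */
    ax = (ax << 1) & 0xFFFFu;           /* x2 */
    ax = (ax << 1) & 0xFFFFu;           /* x4 */
    ax = (ax << 1) & 0xFFFFu;           /* x8 */
    ax = (ax + cx) & 0xFFFFu;           /* x8 + x1 = x9 */
    return ax;
}
```

Adding the result to the address of aDesert gives the start of the selected file name: index 0 stays on "desert", index 1 lands 9 bytes further on "tropical", and so on.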
Quote from: Daniel3D on October 15, 2022, 10:44:16 PM
*Learned that from CAS. 8)
yes correct:
seg003:38BC mov al, [bp+arg_0] <-- al = arg0
seg003:38BF mov byte_46167, al
seg003:38C2 mov byte_3B8F6, 1
seg003:38C7 cbw <-- ax = sign-extended(al)
seg003:38C8 mov cx, ax
seg003:38CA shl ax, 1
seg003:38CC shl ax, 1
seg003:38CE shl ax, 1
seg003:38D0 add ax, cx
seg003:38D2 add ax, offset aDesert ; "desert"
seg003:38D5 push ax <-- first parameter of file_load_shape2d_fatal_thunk
seg003:38D6 call file_load_shape2d_fatal_thunk
CBW: https://c9x.me/x86/html/file_module_x86_id_27.html
it's ax = 9 * cbw(arg0) + offset aDesert
so in C that would be like
aDesert[arg0*9]
and aDesert could be just the first element - nothing to do with desert specifically
or some other strange way to address an array or a member inside of aDesert
and the "9" is the max size of the string
dseg:0140 aDefault db 'DEFAULT',0
dseg:0148 db 0
dseg:0149 db 0
==> table with 5 strings of 8+1 bytes each
dseg:014A aDesert db 'desert',0,0,0 ; DATA XREF: sub_1D7A2+40↑o
dseg:0153 aTropical db 'tropical',0
dseg:015C aAlpine db 'alpine',0,0,0
dseg:0165 aCity db 'city',0,0,0,0,0
dseg:016E aCountry db 'country',0,0
so in C that would be "char background[5][9]" and arg0 is then 0-4
dseg:0177 db 0
dseg:0178 db 0
dseg:0179 db 0
in C++ that would be exactly (and 100% binary equal)
using background_name_t = char[9];
const background_name_t background_names[5] // the missing 0 bytes are implicitly added because these are C strings in a global array
{
"desert",
"tropical",
"alpine",
"city",
"country"
};
so C/C++ knows that every entry in background_names is a 9-byte string - so
the arithmetic of multiplying by 9 is done implicitly - based on the type definition
a pointer to background_names is equal to background_names at the position of "desert"
that's the reason IDA thinks the code offsets aDesert directly
but the code just refers to the whole table
and then just
file_load_shape2d_fatal_thunk(background_names[arg0]);
the same as
ax = 9 * cbw(arg0) + offset aDesert
push ax
call file_load_shape2d_fatal_thunk
as you can see the complexity reduce is big, comparing C with asm :)
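To make the implicit multiply-by-9 concrete, here is a minimal C sketch. It is not taken from the game source, and the helper names (scaled_index, entry_offset, c_entry_distance, check_table_layout) are made up for illustration. It redoes the shl/add sequence by hand and compares it with what plain C array indexing computes, assuming the table starts at dseg:014A as in the listing above.

```c
#include <assert.h>
#include <stddef.h>

/* Same table as in the post: 5 entries of 9 bytes (8 chars + terminator). */
typedef char background_name_t[9];

static const background_name_t background_names[5] = {
    "desert", "tropical", "alpine", "city", "country"
};

/* What the compiler emitted: ax = (i << 3) + i, i.e. i * 9 without a MUL. */
unsigned scaled_index(unsigned i)
{
    unsigned ax = i;
    unsigned cx = ax;   /* mov cx, ax */
    ax <<= 1;           /* shl ax, 1  */
    ax <<= 1;           /* shl ax, 1  */
    ax <<= 1;           /* shl ax, 1  */
    ax += cx;           /* add ax, cx */
    return ax;
}

/* Offset of entry i in the data segment.
   ASSUMPTION: the table starts at dseg:014A, as in the listing. */
unsigned entry_offset(unsigned i)
{
    return 0x014Au + scaled_index(i);
}

/* The same distance computed the C way: bytes from the table start. */
size_t c_entry_distance(unsigned i)
{
    return (size_t)((const char *)background_names[i]
                    - (const char *)background_names);
}

/* Hand-written shifts and C indexing must agree for every entry. */
void check_table_layout(void)
{
    unsigned i;
    for (i = 0; i < 5; ++i)
        assert(scaled_index(i) == c_entry_distance(i));
    assert(entry_offset(1) == 0x0153u);  /* aTropical in the listing */
    assert(entry_offset(4) == 0x016Eu);  /* aCountry in the listing  */
}
```

Calling check_table_layout() passes silently, which matches the dseg listing: each label sits exactly 9 bytes after the previous one.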
compiling the following C code with the original Stunts 16-bit compiler "Microsoft C 5.1" gives nearly the same code
#include <string.h>
typedef char background_name_t[9];
const background_name_t background_names[5] =
{
"desert",
"tropical",
"alpine",
"city",
"country"
};
int main(int argc, char* argv[])
{
return strlen(background_names[argc]);
}
the generated assembler code for this small snippet looks very much like the original code (or can be tuned to look exactly the same)
seg000:0010 ; int __cdecl main(int argc, const char **argv, const char **envp)
seg000:0010 _main proc near ; CODE XREF: start+8D↓p
seg000:0010
seg000:0010 arg_0 = word ptr 4
seg000:0010
seg000:0010 push bp
seg000:0011 mov bp, sp
seg000:0013 xor ax, ax
seg000:0015 call __chkstk
here is your original assembler code (without the cbw, because argc is already an int) as a result of my C/C++ code
seg000:0018 mov ax, [bp+arg_0]
seg000:001B mov cx, ax
seg000:001D shl ax, 1
seg000:001F shl ax, 1
seg000:0021 shl ax, 1
seg000:0023 add ax, cx
seg000:0025 add ax, offset aDesert ; "desert"
seg000:0028 push ax ; char *
seg000:0029 call _strlen
seg000:002C add sp, 2
seg000:002F pop bp
seg000:0030 retn
seg000:0030 _main endp
the data-segment part of the background table is also 100% binary identical
dseg:003C db 43h ; C
dseg:003D db 6Fh ; o
dseg:003E db 72h ; r
dseg:003F db 70h ; p
dseg:0040 db 11h
dseg:0041 db 0
dseg:0042 aDesert db 'desert',0 ; DATA XREF: _main+15↑o
dseg:0049 db 0
dseg:004A db 0
dseg:004B db 74h ; t
dseg:004C db 72h ; r
dseg:004D db 6Fh ; o
dseg:004E db 70h ; p
dseg:004F db 69h ; i
dseg:0050 db 63h ; c
dseg:0051 db 61h ; a
dseg:0052 db 6Ch ; l
dseg:0053 db 0
dseg:0054 db 61h ; a
dseg:0055 db 6Ch ; l
dseg:0056 db 70h ; p
dseg:0057 db 69h ; i
dseg:0058 db 6Eh ; n
dseg:0059 db 65h ; e
dseg:005A db 0
dseg:005B db 0
dseg:005C db 0
dseg:005D db 63h ; c
dseg:005E db 69h ; i
dseg:005F db 74h ; t
dseg:0060 db 79h ; y
dseg:0061 db 0
dseg:0062 db 0
dseg:0063 db 0
dseg:0064 db 0
dseg:0065 db 0
dseg:0066 db 63h ; c
dseg:0067 db 6Fh ; o
dseg:0068 db 75h ; u
dseg:0069 db 6Eh ; n
dseg:006A db 74h ; t
dseg:006B db 72h ; r
dseg:006C db 79h ; y
dseg:006D db 0
dseg:006E db 0
dseg:006F db 0
dseg:0070 word_105D0 dw 0 ; DATA XREF: start+4A↑w
Thank you. That really makes it clearer. I had kind of deduced the functionality, but this is a lot more detailed.
My guess is that once the non-symbolic offsets are fixed and the para alignment (do I say that correctly? You know what I mean) is done, it may be very easy to expand the horizons.
Quote from: Daniel3D on October 16, 2022, 03:28:12 PMThank you. That really makes it clearer. I had kind of deduced the functionality, but this is a lot more detailed.
the more you understand the better...
Quote from: Daniel3D on October 16, 2022, 03:28:12 PMMy guess is that once the non-symbolic offsets are fixed and the para alignment (do I say that correctly? You know what I mean) is done, it may be very easy to expand the horizons.
it would reduce problems a lot
i'm currently a little bit confused about the state of some functions in asmorig - some of the functions you've shown me are full of unused labels, which clutter the asm code a little
these labels do not exist if i freshly analyze the current game exe with IDA - i need to find out what they are for
Can it be that IDA has mistaken them for labels and that they are just values?
I don't know how much IDA has evolved since the first disassembly. Also, from what I've read about the process, I have a feeling that you have a bit more experience with this. So maybe your settings create a cleaner result.
That would be unfortunate, because it would mean that it is smart to redo the entire process. And a lot of research and analysis has been done that would have to be copied and checked.
Quote from: Daniel3D on October 16, 2022, 06:23:19 PMCan it be that IDA has mistaken them for labels and that they are just values?
normally not - i also can't find any code that uses these labels - they are just there...
Quote from: Daniel3D on October 16, 2022, 06:23:19 PMI don't know how much IDA has evolved since the first disassembly. Also, from what I've read about the process, I have a feeling that you have a bit more experience with this. So maybe your settings create a cleaner result.
even an old version of IDA doesn't create these labels, strange - but they are not everywhere, only in some functions
Quote from: Daniel3D on October 16, 2022, 06:23:19 PMThat would be unfortunate, because it would mean that it is smart to redo the entire process. And a lot of research and analysis has been done that would have to be copied and checked.
IDA is able to store the analysis results as a script - everything in IDA is script-based - for reproducibility
sadly that doesn't work very well for downgrading to the freeware version :(
another thing i've found is that very few functions are typed - except for the 3d engine, everything in Stunts is C-based, so every C function follows the cdecl calling convention (https://en.wikibooks.org/wiki/X86_Disassembly/Calling_Conventions#CDECL): the parameters are pushed onto the stack in reverse order - very well defined
normally you start very early to annotate the functions in the IDA disassembly as clean cdecl - that helps IDA to infer more about the code and spread type info over it
it seems that we started with that but never did it for most of the functions - that makes the code
harder to read - it would be a big win to annotate the C functions properly, and even for non-cdecl functions there is the __usercall (https://www.hex-rays.com/products/ida/support/idadoc/1361.shtml) feature of IDA that allows annotating registers etc. as parameters, to describe the "interface" of a pure-assembler function better
my goal is to write a simple IDA script that contains all functions + names + signatures
and global structs and their usage, to feed it to IDA at a very early stage of the analysis, so IDA can infer more
maybe it's also possible to use this script variant with Ghidra or the freeware version of IDA - to make it easier to play with the information in open-source or freeware tools
the cdecl information would be enough for me to trace every cdecl call of the game in my dosbox extension - it's formalized enough that
i just need the signature and then i'm able to print the parameter contents and return values - that sometimes helps to understand better
what the code is doing (a trace of every function call)
take a function like this:
seg016:0002 locate_many_resources proc far ; CODE XREF: load_intro_resources+2A↑P
seg016:0002 ; run_opponent_menu+4A↑P
seg016:0002 ; load_skybox+60↑P
seg016:0002 ; load_sdgame2_shapes+2C↑P
seg016:0002 ; setup_intro+2E↑P
seg016:0002 ; setup_car_shapes+9C↑P
seg016:0002 ; setup_car_shapes+B4↑P
seg016:0002 ; setup_car_shapes+D3↑P
seg016:0002 ; loop_game+34↑P
seg016:0002 ; load_tracks_menu_shapes:loc_2A2E3↑P
seg016:0002 ; load_tracks_menu_shapes:loc_2A2F9↑P
seg016:0002 ; load_tracks_menu_shapes+53↑P
seg016:0002
seg016:0002 arg_0 = word ptr 6
seg016:0002 arg_2 = word ptr 8
seg016:0002 arg_4 = word ptr 0Ah
seg016:0002 arg_6 = word ptr 0Ch
seg016:0002
seg016:0002 push bp
seg016:0003
seg016:0003 loc_367B3:
seg016:0003 mov bp, sp
seg016:0005
seg016:0005 loc_367B5:
seg016:0005 jmp short loc_367D9
seg016:0005 ; ---------------------------------------------------------------------------
seg016:0007 align 2
seg016:0008
seg016:0008 loc_367B8: ; CODE XREF: locate_many_resources+2D↓j
seg016:0008 push [bp+arg_4]
seg016:000B
seg016:000B loc_367BB:
seg016:000B push [bp+arg_2]
seg016:000E
seg016:000E loc_367BE:
seg016:000E push [bp+arg_0]
seg016:0011
seg016:0011 loc_367C1:
seg016:0011 call locate_shape_fatal
seg016:0016
seg016:0016 loc_367C6:
seg016:0016 add sp, 6
seg016:0019
seg016:0019 loc_367C9:
seg016:0019 mov bx, [bp+arg_6]
seg016:001C
seg016:001C loc_367CC:
seg016:001C add [bp+arg_6], 4
seg016:0020
seg016:0020 loc_367D0:
seg016:0020 mov [bx], ax
seg016:0022 mov [bx+2], dx
seg016:0025 add [bp+arg_4], 4
seg016:0029
seg016:0029 loc_367D9: ; CODE XREF: locate_many_resources:loc_367B5↑j
seg016:0029 mov bx, [bp+arg_4]
seg016:002C
seg016:002C loc_367DC:
seg016:002C cmp byte ptr [bx], 0
seg016:002F jnz short loc_367B8
seg016:0031 pop bp
seg016:0032 retf
seg016:0032 locate_many_resources endp
most of the inner labels are completely unused
for comparison, a fresh IDA import:
seg016:0002 sub_367B2 proc far ; CODE XREF: sub_10786+2A↑P
seg016:0002 ; sub_1293C+4A↑P ...
seg016:0002
seg016:0002 arg_0 = word ptr 6
seg016:0002 arg_2 = word ptr 8
seg016:0002 arg_4 = word ptr 0Ah
seg016:0002 arg_6 = word ptr 0Ch
seg016:0002
seg016:0002 push bp
seg016:0003 mov bp, sp
seg016:0005 jmp short loc_367D9
seg016:0005 ; ---------------------------------------------------------------------------
seg016:0007 nop
seg016:0008
seg016:0008 loc_367B8: ; CODE XREF: sub_367B2+2D↓j
seg016:0008 push [bp+arg_4]
seg016:000B push [bp+arg_2]
seg016:000E push [bp+arg_0]
seg016:0011 call sub_30F9D
seg016:0016 add sp, 6
seg016:0019 mov bx, [bp+arg_6]
seg016:001C add [bp+arg_6], 4
seg016:0020 mov [bx], ax
seg016:0022 mov [bx+2], dx
seg016:0025 add [bp+arg_4], 4
seg016:0029
seg016:0029 loc_367D9: ; CODE XREF: sub_367B2+3↑j
seg016:0029 mov bx, [bp+arg_4]
seg016:002C cmp byte ptr [bx], 0
seg016:002F jnz short loc_367B8
seg016:0031 pop bp
seg016:0032 retf
seg016:0032 sub_367B2 endp
and something for you to learn - how this cdecl stack stuff for function calls works:
seg016:0008 push [bp+arg_4] ; 2 byte push - parameter 2
seg016:000B push [bp+arg_2] ; 2 byte push - parameter 1
seg016:000E push [bp+arg_0] ; 2 byte push - parameter 0
seg016:0011 call sub_30F9D
seg016:0016 add sp, 6 ; 3*2
the add sp,6 after the call means that the caller moves the stack pointer back up by 6 bytes (that is where the parameters of sub_30F9D were lying)
and so cleans them off the stack - so sub_30F9D is very likely a cdecl function, because with cdecl the caller has to do that - and 3 pushes = 3 parameters
and the 6 bytes come from the 3 pushes of 2 bytes each before the call
this 80(1)86 code only allows 2-byte pushes onto the stack - so even bytes are pushed as words
but there are also 32-bit values (for example a far pointer with segment+offset) that are pushed in parts
in C this is for example "void test(int far* value)" -> segment/offset on the stack as 2 pushes
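A small C sketch of that bookkeeping, under stated assumptions: the type far_ptr_t and the functions far_to_linear, cdecl_cleanup_bytes and check_cleanup are hypothetical names, not anything from the Stunts source. It models a far pointer as two 16-bit words (offset in the lower word, segment in the upper one, which as far as I know is how it ends up on the stack when the segment is pushed first) and computes the value you would expect to see in the add sp,N after a cdecl call.

```c
#include <assert.h>
#include <stdint.h>

/* A real-mode far pointer: 16-bit segment plus 16-bit offset. */
typedef struct {
    uint16_t offset;   /* lower word  */
    uint16_t segment;  /* upper word  */
} far_ptr_t;

/* Linear address the CPU forms in real mode: segment * 16 + offset. */
uint32_t far_to_linear(far_ptr_t p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}

/* Bytes the caller removes with "add sp, N" after a cdecl call:
   every push is one 16-bit word, a far pointer costs two pushes. */
unsigned cdecl_cleanup_bytes(unsigned word_args, unsigned far_ptr_args)
{
    return 2u * word_args + 4u * far_ptr_args;
}

/* The two cases mentioned in the post. */
void check_cleanup(void)
{
    assert(cdecl_cleanup_bytes(3, 0) == 6); /* three word parameters   */
    assert(cdecl_cleanup_bytes(0, 1) == 4); /* test(int far *value)    */
}
```

So three word parameters give the add sp,6 seen above, while a single far pointer parameter would show up as two pushes followed by add sp,4.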
Quote from: llm on October 16, 2022, 08:31:02 PMQuoteI don't know how much the ida has evolved since the first disassembly. Also from what I've read about the process I have a feeling that you have a bit more experience with this. So maybe your settings create a cleaner result..
even an old version of IDA doesn't create these labels, strange - but they are not everywere only some functions
Is it possible to "fix" these functions with your disassembled code? (I still have to process the rest of the code, maybe I can do that Wednesday or Friday.) If both versions create a bit-perfect assembly, then they should be interchangeable, right?
Quote from: llm on October 16, 2022, 08:31:02 PMmy goal is it to write a simple IDA script that contains all functions + names + signatures
and global structs and its usage to feed it at a very early state of analyse to IDA, so IDA can infer more
maybe its also possible to use this script-variant on Ghidra or the freeware version of IDA - to make it more easy to play with the information in open source or freewaret tools
I'm personally not a fan of freeware if you have to sacrifice functionality, I rather use a cracked version so that the Pro's can keep using the good stuff. If they complain about me using a crack I just quit or if i'm really doing important stuff (that's a joke :P ;D ) we could consider a VPN and remote desktop account. But to make it easier for others to join in the project, it could be a good option.
Quote from: llm on October 16, 2022, 08:31:02 PManother thing that i've found is that very few functions are typed - except the 3d engine everything in stunts in C based
Kevin said as much in the interview a few years ago. And I also noticed a difference between the menu structure code and that of the game's 3D engine. Which makes a lot of sense, because the engine needs the most optimized code.
Quote from: llm on October 17, 2022, 10:24:39 AMand something to learn for you - how this cdecl,stack stuff for function calls work:
I kinda get what you mean, but this is a few steps too advanced for me. I don't really know how memory stacking works. I have a vague impression, but that is part literal and part logical and most likely a big part wrong.. 8)
Quote from: llm on October 16, 2022, 08:31:02 PManother thing that i've found is that very few functions are typed - except the 3d engine everything in stunts in C based so every function from C following the cdecl calling convention (https://en.wikibooks.org/wiki/X86_Disassembly/Calling_Conventions#CDECL), reverse order stack pushes for the parameters - very well defined
Ok, Reading this:
QuoteThe calling function cleans the stack. This allows CDECL functions to have variable-length argument lists (aka variadic functions). For this reason the number of arguments is not appended to the name of the function by the compiler, and the assembler and the linker are therefore unable to determine if an incorrect number of arguments is used.
Is this kind of optimization the reason that it is difficult to reverse assembly back to C? (after it is assembled, compiled, decompiled, disassembled and converted to C) I probably have the steps wrong or mixed but (again >) you know what I mean. 8)
Quote from: Daniel3D on October 17, 2022, 10:55:35 AMI kinda get what you mean, but this is a few steps too advanced for me. I don't really know how memory stacking works. I have a vague impression, but that is part literal and part logical and most likely a big part wrong.. 8)
the stack space is located by the segment register ss, with register sp as the offset
the stack grows down - normally the stack is at the end of the exe, so it grows down towards the code; the stack should never reach the code space, but it could if there are bugs (aka stack overflow)
push means: put this word value on the stack
pop means: get it back from the stack - the so-called LIFO principle - last in, first out
push 1
push 2
push 3
pop ax => 3
pop bx => 2
pop cx => 1
thats it, no further magic
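The same push/pop behaviour as a minimal C model, for anyone who prefers reading C. This is only a sketch: stack16_t, stack_init, push16 and pop16 are invented names, the 16-byte array stands in for the stack segment, and the asserts play the role of the overflow/underflow bugs mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 16u

typedef struct {
    uint8_t  mem[STACK_BYTES]; /* stands in for the memory at ss */
    uint16_t sp;               /* offset of the current top of stack */
} stack16_t;

/* An empty stack: sp points just past the end, because it grows down. */
void stack_init(stack16_t *s)
{
    s->sp = STACK_BYTES;
}

/* push: sp goes DOWN by 2, then the word is stored little-endian at ss:sp. */
void push16(stack16_t *s, uint16_t value)
{
    assert(s->sp >= 2);                       /* would be a stack overflow */
    s->sp -= 2;
    s->mem[s->sp]     = (uint8_t)(value & 0xFF);
    s->mem[s->sp + 1] = (uint8_t)(value >> 8);
}

/* pop: read the word at ss:sp, then sp goes UP by 2. */
uint16_t pop16(stack16_t *s)
{
    uint16_t value;
    assert(s->sp + 2u <= STACK_BYTES);        /* nothing left to pop */
    value = (uint16_t)(s->mem[s->sp] | (s->mem[s->sp + 1] << 8));
    s->sp += 2;
    return value;
}
```

Pushing 1, 2, 3 and popping three times hands the values back as 3, 2, 1, and sp ends where it started. An add sp,6 does the same as three pops, it just throws the values away.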
Quote from: Daniel3D on October 17, 2022, 10:36:21 AMIs it possible to "fix" these functions with your disassembled code. (I still have to process the rest of the code, maybe i can do that Wednesday or Friday). If both versions create a bit perfect assembly then they should be interchangeable right?
sadly not directly - the IDA database (IDB) is not really mergeable - it doesn't cleanly follow a source-only principle (much more than every other tool i know, but still not enough) - but i think it will be ok in the end
Quote from: Daniel3D on October 17, 2022, 11:10:32 AMIs this kind of optimization the reason that it is difficult to reverse assembly back to C? (after it is assembled, compiled, decompiled, disassembled and converted to C) I probably have the steps wrong or mixed but (again >) you know what I mean. 8)
that's the primary reason with today's compilers: they optimize the code so damn hard that you can't even find the functions anymore (inlining etc.) - old 1990 compilers luckily weren't that advanced :)
so at least for Stunts, every C function (which implies the cdecl calling convention) is more or less directly "visible", including the parameters etc., because there is nearly no optimization
the pure assembler functions (written in assembler in the original), like the 3d engine, don't need to follow any calling convention and can pass function parameters in any technically possible way - using registers, evil stack filling, etc. - these are harder to detect: stack pushes are very easy to see, registers set somewhere before the call not so much - you need to read the function code to understand whether a register is a parameter; for a cdecl function you just need to look for add sp,VALUE after a call and some pushes before it, and it's absolutely clear (in the case of Stunts) that it is a cdecl C function
so i think every call ..., add sp,VALUE in the code is a cdecl C function call
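That rule of thumb can be sketched in a few lines of C. This is only a heuristic illustration, not an existing tool: cdecl_arg_words and check_heuristic are hypothetical names, and the sketch assumes the address prefix (seg016:0011 etc.) has already been stripped from each listing line and that a trailing "h" marks a hex number, as in the listings above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* If `line` is a call and `next` is the "add sp, N" cleanup after it,
   return the number of 16-bit words the caller pushed; otherwise -1. */
int cdecl_arg_words(const char *line, const char *next)
{
    char operand[16];
    char *end;
    unsigned long bytes;
    size_t len;
    int base = 10;

    if (strncmp(line, "call ", 5) != 0)
        return -1;
    if (sscanf(next, "add sp, %15s", operand) != 1)
        return -1;

    /* Listing style: "6" is decimal, "0Ah" is hex with an h suffix. */
    len = strlen(operand);
    if (operand[len - 1] == 'h' || operand[len - 1] == 'H') {
        operand[len - 1] = '\0';
        base = 16;
    }
    bytes = strtoul(operand, &end, base);
    if (end == operand || *end != '\0')
        return -1;
    return (int)(bytes / 2);
}

/* The two calls that appear in the listings of this thread. */
void check_heuristic(void)
{
    assert(cdecl_arg_words("call sub_30F9D", "add sp, 6") == 3);
    assert(cdecl_arg_words("call _strlen", "add sp, 2") == 1);
    assert(cdecl_arg_words("call sub_30F9D", "pop bp") == -1);
}
```

The result counts words, not parameters: a far pointer or a long would show up as two words. A call without any cleanup after it (no parameters, or a non-cdecl assembler routine) simply returns -1 here.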
sorry
... For this reason the number of arguments is not appended to the name of the function by the compiler, and the assembler and the linker are therefore unable to determine if an incorrect number of arguments is used...
is that text from me? because it talks about name mangling, which means the signature types of the function are also attached in a special way to the function name - but that does not happen for cdecl C functions - so it's not relevant here
and this "problem" only happens with variadic parameters - that means functions like printf with an open parameter count - these variadic parameters
are rarely used in normal code - so also not relevant here
Quote from: llm on October 17, 2022, 12:25:28 PMsorry
... For this reason the number of arguments is not appended to the name of the function by the compiler, and the assembler and the linker are therefore unable to determine if an incorrect number of arguments is used...
is that text from me? because that talks about name-mangling, that means the signature types of the function are also attached in a special way to the function name - but that does not happen for cdecl C functions - so its not relevant here
and this "problem" only happen with variadic parameters - that means functions like printf with an open parameter count - these variadic parameters
are nearly never used in normal code - so also not relevant here
No, it is from the link you sent:
QuoteCDECL
In the CDECL calling convention the following holds:
Arguments are passed on the stack in Right-to-Left order, and return values are passed in eax.
The calling function cleans the stack. This allows CDECL functions to have variable-length argument lists (aka variadic functions). For this reason the number of arguments is not appended to the name of the function by the compiler, and the assembler and the linker are therefore unable to determine if an incorrect number of arguments is used.
Quote from: llm on October 16, 2022, 03:41:54 PMim currently a little bit confused about the current state of some functions in the asmorig - some of the functions you've showed me are full of unused labels, messing the asm code a little
these labels do not exists if i freshly analyze the current game exe with IDA - need to find out what these labels are for
The code has things that even i find strange, like in seg000:
loc_143BB:
cmp ax, 4D00h
jnz short loc_143C3
jmp loc_144A4
loc_143C3:
jmp loc_14188
loc_143C6:
cmp [bp+var_selectedmenu], 0
jnz sh
I guess this could be written as:
loc_143BB:
cmp ax, 4D00h
jnz short loc_14188 ;loc_143C3
jmp loc_144A4
;loc_143C3:
;jmp loc_14188
loc_143C6:
cmp [bp+var_selectedmenu], 0
jnz sh
Quote from: Daniel3D on October 20, 2022, 09:43:28 AMQuote from: llm on October 16, 2022, 03:41:54 PMim currently a little bit confused about the current state of some functions in the asmorig - some of the functions you've showed me are full of unused labels, messing the asm code a little
these labels do not exists if i freshly analyze the current game exe with IDA - need to find out what these labels are for
The code has things that even i find strange, like in seg000:
you're right - it could be written as you said
sometimes assembler is so much code that minor details like these get lost while
developing, because it still works - it seems to have been assembler code in the first place, or
some strange C code with gotos in the original - the C code of that 2000-line monster would be somewhere around 200-300 lines or less, i think
or in Kevin Pickell's words:
QuoteIt was my first 3d game and I made many mistakes
Take into account that conditional jumps are always short on the 8086 (longer ones exist, but only from the 80386 on, if I'm not mistaken). This means that a conditional jump can't reach further than about 128 bytes from its location. This is why they're used for branching, while unconditional jumps are used for moving between different regions of code. Only when the target is close enough can you jump to it directly with the conditional jump.
Cas is correct, i forgot that detail
so your logic is correct, Daniel, but the CPU still needs different code
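To check this for the seg000 snippet, here is a small C sketch of the range test. The function names are made up, and it rests on two assumptions: that the loc_XXXXX label names are the linear addresses IDA assigned, and that cmp ax, 4D00h is a 3-byte instruction, which puts the jnz at 143BEh (consistent with the next label being loc_143C3 after a 2-byte jnz and a 3-byte jmp).

```c
#include <assert.h>
#include <stdint.h>

/* A short conditional jump is 2 bytes: opcode + signed 8-bit displacement.
   The displacement is counted from the address of the NEXT instruction. */
long short_jump_displacement(uint32_t jcc_addr, uint32_t target)
{
    return (long)target - (long)(jcc_addr + 2);
}

/* Non-zero if the target can be reached by a short (rel8) jump. */
int fits_short_jump(uint32_t jcc_addr, uint32_t target)
{
    long d = short_jump_displacement(jcc_addr, target);
    return d >= -128 && d <= 127;
}

/* ASSUMPTION: the jnz sits at 143BEh (loc_143BB + 3-byte cmp). */
void check_seg000_jump(void)
{
    assert(fits_short_jump(0x143BEu, 0x143C3u));   /* original target  */
    assert(!fits_short_jump(0x143BEu, 0x14188u));  /* proposed target  */
    assert(short_jump_displacement(0x143BEu, 0x14188u) == -568);
}
```

Under those assumptions loc_14188 is 568 bytes behind the jump, far outside the -128..+127 window, so the two-step jnz short loc_143C3 / jmp loc_14188 is needed there.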
Quote from: llm on October 22, 2022, 08:31:59 AMCas is correct, i've forgot that detail
so your logic is correct Daniel but the CPU still needs different code
Nobody can know everything. But you two are such opposites in your experience that you have a very broad knowledge spectrum together.
The important thing is what we know, not what we like, so it's good to be able to complement each other
Quote from: llm on October 16, 2022, 03:41:54 PMim currently a little bit confused about the current state of some functions in the asmorig - some of the functions you've showed me are full of unused labels, messing the asm code a little
these labels do not exists if i freshly analyze the current game exe with IDA - need to find out what these labels are for
found the reason for that: IDA has a "Display assembly lines/basic block boundaries" feature for the disassembly - these strange labels get generated if that option is activated - sadly that can't be reverted
Quote from: llm on October 30, 2022, 09:50:22 AMQuote from: llm on October 16, 2022, 03:41:54 PMim currently a little bit confused about the current state of some functions in the asmorig - some of the functions you've showed me are full of unused labels, messing the asm code a little
these labels do not exists if i freshly analyze the current game exe with IDA - need to find out what these labels are for
found the reason for that: IDA got a "Display assembly lines/basic block boundaries" feature for the disassembling - these strange lables get generated if that option is activated - sadly that feature can't be reverted
There are a lot of them (if I read correctly).
Is it possible to redo it while keeping the labels and comments that have been made?
Quote from: Daniel3D on October 30, 2022, 11:49:25 AMThere are a lot of them (if i read correctly)
Is it possible to redo it while maintaining the labels and comments that are made?
Quotesadly that feature can't be reverted
but i will check if there is some other option to revert it