Wanting to understand Restunts source code structure

Started by Cas, August 28, 2022, 11:24:08 PM


Cas

I'm looking at the assembly source code. We've already been working on it to make the needle colour mod and the contents are pretty clear locally, but I'm very lost as regards the general structure. I'd appreciate it if you guys could point me in the right direction. It'd help me locate some things and organise.

I can see that there are a number of segxxx.asm files, which contain most of the code. For each of these, there's also a corresponding segxxx.inc. But there's also dseg.asm and dseg.inc, and there's custom.inc, segments.asm, structs.inc and dseg.map. I think the map file, like the obj files, is something that's produced during compilation and can be ignored, but I'm not quite sure what the other files represent. My main question here is: where does the code start?
Earth is my country. Science is my religion.

llm

it's a Medium-Model (https://devblogs.microsoft.com/oldnewthing/20200728-00/?p=104012) DOS EXE
that means multiple code segments (far calls to code in another segment) and a single data segment (so data is always NEAR addressed, no data-segment changes needed)
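
As a rough illustration of what "far" and "near" mean here, the following sketch computes the real-mode linear address from a segment:offset pair. The function names and the sample values are made up for the example, and the 20-bit wrap-around assumes a plain 8086-style address bus.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits (8086 assumption)."""
    return ((segment << 4) + offset) & 0xFFFFF


def near_data_address(ds: int, offset: int) -> int:
    """A NEAR data reference carries only the offset; the single data segment (DS) is implied."""
    return linear_address(ds, offset)


def far_code_address(cs: int, ip: int) -> int:
    """A FAR call carries both the target code segment and the offset inside it."""
    return linear_address(cs, ip)
```

In a medium-model program every data access looks like `near_data_address(ds, off)` with the same `ds`, while calls between code segments need the full segment:offset pair.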

the original Stunts is based on "Microsoft C 5.1" (from ~1988, years before the Visual C stuff)
so there is some stuff in the code that comes from the standard-library and the compiler - for example the
code around the main function etc.

the assembler source is currently assemble-able with TASM only (it would be nice to port it to WASM/MASM/UASM to be able to build under any system - it just takes time, it's not super hard - minor differences)

the splitting into segment files was primarily done for better overview/separation and to make it easier to check whether an assembled object
is binary exact to the original segment block (due to tiny differences between the assemblers (optimization features) or redundant commands, some code can be expressed with different opcodes of different sizes - which would corrupt non-symbolic offsets), so it was necessary to check first that the resulting exe is absolutely identical to the original
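
A binary-exact check of this kind can be sketched in a few lines. This is only an illustration, not the script used in the project; the function names and the file paths passed to `compare_files` are hypothetical.

```python
from pathlib import Path


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if both are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One image is a strict prefix of the other: they differ where the shorter ends.
        return min(len(a), len(b))
    return None


def compare_files(original_path: str, rebuilt_path: str):
    """Compare two files on disk, e.g. the original exe and a freshly assembled one."""
    return first_difference(Path(original_path).read_bytes(),
                            Path(rebuilt_path).read_bytes())
```

A result of `None` means the rebuilt image matches byte for byte; any other value is the first offset worth inspecting in a hex editor.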

the segment inc files are forward declarations, so it is easy to give every segment access to the "globals" around

the other inc and asm files are more for making it assemble-able or for collecting type definitions (that, due to their nature, are not based on code) outside of the code

most (98%) of the code is generated with a script from IDA Pro - so changes to the IDA database (IDB) will result in differently generated code
the overloading with C functions is also done in this script

there are some build types - the pure original assembler (directly based on the IDA information) and a variant where the standard library is used and combined with already ported C functions (that are much easier to read than the assembler functions)

so there was never any handwritten assembler code - everything (except your changes) is fully automatically generated by IDA

IDA-Screenshot: https://pasteboard.co/AnVaANHb0Qq3.png

Cas

Thank you so much. Even being something automatically generated, it does help me a lot to have a context on its structure to better follow it and understand it as I work on it. For sure, porting to other assemblers would be cool.

Things I would like to do would be of the sort of easily inserting things knowing that they won't break the code (because of alignment, like it happened while we were trying to build with the dual-colour needle at first) and extracting parts of the code replacing them with others that would take their pointers, etc. It'd also help me analyse how some things are internally done that I could later use for inspiration in creating another engine. There's a lot of work in that original code.

Easily identifying functions that were part of the C run-time library and separate them from Stunts-specific functions also helps navigate the sea of code.

llm

the code-segment count is defined by the Microsoft compiler/linker
seg010 is for example std library code, most others are game code; the splitting is mostly up to the compiler or how the libs were designed at the start. there seem to be (not proven) parts that are fully assembler based (maybe the engine, but it could also be that only some functions, not whole segments, are pure assembler)

Quote: Things I would like to do would be of the sort of easily inserting things knowing that they won't break the code

alignment isn't a real problem here - just changing the offsets of code is

hard to tell as long as there is no deep analysis of non-symbolic offsets in the code
evil stuff like addressing a variable by using another variable's symbol plus an offset etc.
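
To make the "another variable's symbol plus an offset" problem concrete, here is a small model of a packed data layout. All variable names and sizes are invented for the illustration; nothing here is taken from the actual dseg.

```python
def layout(variables):
    """Assign packed offsets (no padding) to (name, size) pairs, in declaration order."""
    offsets = {}
    cursor = 0
    for name, size in variables:
        offsets[name] = cursor
        cursor += size
    return offsets


def resolve_via_neighbour(offsets, base, delta):
    """Model of code that reaches a variable as 'base symbol + constant'."""
    return offsets[base] + delta


# Hypothetical original layout: the code reads var_c as "var_a + 4".
original = layout([("var_a", 2), ("var_b", 2), ("var_c", 4)])

# The same layout after a 2-byte variable is inserted behind var_a.
modified = layout([("var_a", 2), ("new_var", 2), ("var_b", 2), ("var_c", 4)])

hit_before = resolve_via_neighbour(original, "var_a", 4)  # lands on var_c
hit_after = resolve_via_neighbour(modified, "var_a", 4)   # now lands on var_b
```

The symbol `var_a` is relocated correctly in both cases, but the constant 4 is not, so the access silently moves to the wrong variable without any assembler error.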

the very first routine that runs when the exe starts is this one (it is the first code that gets jumped to after DOS has loaded the exe and done the relocation)

seg010:0012 start          proc near
this is the pre-main routine that sets up the stdlib stuff etc. and calls the user main function

this is the standard int main(argc,argv,envp)
seg000:0000 ; int __cdecl stuntsmain(int p_argc, const char **p_argv, const char *envp)
and code like this for example is problematic

seg010:007C                 mov     di, 55CAh       ; offset in dseg where uninitialized data starts
seg010:007F                 mov     cx, 0AD20h      ; original size/end of dseg
seg010:0082                 sub     cx, di
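
The arithmetic in that snippet, and why the two constants are fragile, can be sketched as follows. The two hex values come from the listing above; the 4-byte insertion is a hypothetical change used only to show the effect.

```python
UNINIT_START = 0x55CA  # value loaded into di in the listing above
DSEG_END = 0xAD20      # value loaded into cx in the listing above


def region_length(start: int, end: int) -> int:
    """Model of 'sub cx, di' on 16-bit registers: end - start, wrapped to 16 bits."""
    return (end - start) & 0xFFFF


original_length = region_length(UNINIT_START, DSEG_END)

# Hypothetical change: 4 bytes of initialized data are added before the region.
# The real boundaries move, but the constants in the code stay what they were.
INSERTED = 4
true_start = UNINIT_START + INSERTED
true_end = DSEG_END + INSERTED
stale_start_error = true_start - UNINIT_START  # how far off the hard-coded start now is
```

Because 55CAh and 0AD20h are plain numbers rather than symbols, the assembler has no way to adjust them when the data segment grows, which is what makes code like this problematic.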

the real problem is: you cannot easily test whether your changes are correct by just playing the game and looking for bugs - it's so super easy to
introduce bugs without noticing, and only the second or third extension - weeks/months later - will trigger them hard enough etc. - working on
pure disassembled code is much, much harder than working on real handwritten assembler :(

Cas

For sure! There are no "intentions" in disassembled code, so whenever you change something, you have to do it in a way that forces everything to fail if there's any mistake. Alignment issues do not necessarily result from purposeful alignment by the compiler. Like you say, sometimes a reference can be pretty dangerous to work with and then, when you make a change, it looks like an alignment problem. In practice, it is, but it was not meant to be.

The part that we can do rather comfortably is... well... reading the code. 3D physics are going to be really hard to read because I expect fixed point arithmetic there, which doesn't look nice in assembler.

llm

Quote from: Cas on September 02, 2022, 10:30:28 PM
Alignment issues not necessarily result from purposeful alignment by the compiler.

alignment means the direct positioning of data/code to fulfill "others" needs: hardware (bus) or, for example, DOS API needs - the normal positioning of data and code in an executable is usually not called "alignment" (inside of structs there is also "padding")

alignment is hardly ever needed in 16-bit code on x86 (but very much needed on SPARC, for example)

dwords and words are not aligned in stunts (as they would be in a normal 32-bit windows/linux program) - usually the old compilers just ignored padding in any form
and most handwritten assembler code also ignored any form of alignment or padding - everything is packed together with no space in between

so we normally talk about offsets that change


Cas

A paragraph in real mode is 16 bytes. As you mention, there never is an actual need for alignment, but if the code has been programmed carelessly or optimised for speed, sometimes the calculation made to locate a certain variable in memory is not rigorous and is based on the assumption that the offset will be zero relative to the paragraph. So what I mean is actually alignment to the paragraph boundary.

I can't tell, in the case of Stunts here, what exactly the calculation is for a certain part of the code, but I can tell you what we experienced with Daniël when we were trying to get the needle colour code running.

The first code did not require any displacement from the original because only a word at a fixed location had to be changed. That worked flawlessly. But when we wanted to insert new code, I knew there was a chance of it not working because of Stunts expecting the code at a certain location. When we tried it, the game ran, but there was corruption in the video. You could see that the images were there, but all graphics positions were being miscalculated. Then, since the game did not crash, I had to think that it could be solved by finding the padding necessary to achieve alignment with the paragraph. So we added one padding byte after another and, on each compilation, the result changed but still failed, until we found the sweet spot.

Again, I know this is not something that has to do with the compiler or the architecture. It's Stunts code that's assuming this alignment, so it's "software" alignment. But well... in the end, the result was that. Maybe if you play with it, you can recognise better than I how it is being calculated, but I assure you that it's there!
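
If the assumption really is that a block starts on a paragraph boundary, the amount of padding can be computed instead of found by trial and error. This is a minimal sketch under that assumption; the offsets used are made-up examples, not positions measured in the Stunts executable.

```python
PARAGRAPH = 16  # size of a real-mode paragraph in bytes


def padding_to_paragraph(offset: int) -> int:
    """Number of filler bytes needed so that 'offset' becomes a multiple of 16."""
    return (-offset) % PARAGRAPH


def padded_offset(offset: int) -> int:
    """Offset of the next paragraph boundary at or after 'offset'."""
    return offset + padding_to_paragraph(offset)
```

For example, a block that would otherwise start at offset 1233h needs 13 filler bytes to begin at 1240h, while one already at 1230h needs none.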

llm

so we talk about segment alignment (i think that is your case here)
and sometimes about offsets that are partially (or fully) non-symbolic

seg011 segment byte public 'STUNTSC' use16
the "byte" in the segment definition means no alignment, so the assembler
will not put in alignment bytes before it - but the segment needs to start
at a paragraph address (one divisible by 16) so that far code/data dependencies can work

https://docs.microsoft.com/en-us/cpp/assembler/masm/segment?view=msvc-170

the reason for having "byte" as segment alignment here is due to disassembling
the disassembler can't detect whether the code works or not - or whether the code uses some magic trick to make it work at runtime - so it falls back to 1:1 reversing, aka "keep the byte offsets intact". that means there are sometimes alignment bytes around the segment that the disassembler didn't detect as segment alignment (there are so many possibilities that the disassembler just uses the one that always works)

seg011 segment PARA public 'STUNTSC' use16
would force the assembler to always align the segment to a paragraph address - but the hard-coded bytes introduced before/after the segments need to be adjusted to get the very same binary again
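
The difference between the two align types can be modelled with a tiny layout function. This is a simplified sketch of what an assembler/linker does when placing segments one after another; the segment names and sizes are invented, and the table lists the usual MASM-style align types.

```python
ALIGNMENTS = {"byte": 1, "word": 2, "dword": 4, "para": 16, "page": 256}


def align_up(address: int, alignment: int) -> int:
    """Round 'address' up to the next multiple of 'alignment'."""
    return (address + alignment - 1) // alignment * alignment


def place_segments(segments, start=0):
    """segments: list of (name, align_type, size). Returns {name: (start, padding)}."""
    placed = {}
    cursor = start
    for name, align_type, size in segments:
        begin = align_up(cursor, ALIGNMENTS[align_type])
        placed[name] = (begin, begin - cursor)
        cursor = begin + size
    return placed


# Hypothetical two-segment image, first with "byte", then with "para" on the second.
packed = place_segments([("seg_a", "byte", 0x25), ("seg_b", "byte", 0x10)])
aligned = place_segments([("seg_a", "byte", 0x25), ("seg_b", "para", 0x10)])
```

With "byte" the second segment starts right where the first ends (25h); with "para" the tool inserts 11 filler bytes and starts it at 30h. Any filler bytes already present in the disassembled source would have to be removed by the same amount to keep the image identical.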

that would be maybe a first step

the partially or fully non-symbolic inner offset problems are different


Daniel3D

O. I like that idea. Not that I fully understand what you mean but I like it anyway.

If this works on unmodified code and does not break when we insert the extended needle colour code (if I understand correctly),
then we may have an easier way to introduce new code... like a showroom that accepts all cars.
Edison once said,
"I have not failed 10,000 times,
I've successfully found 10,000 ways that will not work."
---------
Currently running over 20 separate instances of Stunts
---------
Check out the STUNTS resources on my Mega (globe icon)

Cas

Honestly, I'm not familiar with the directives of Turbo Assembler, so I just tried to align the bytes manually. And I agree that, when reading the code, we can't tell if something starts at a location because the previous block was padded with extra bytes or because it just ended there, so we just align to the byte. It makes sense.

Of course, even having seen it work, I don't feel 100% sure that all is OK with the module we compiled for the needle colour. That is, some parts of the code assume that this block starts with the paragraph. Another, at some other point, might expect it to be, for some reason, located within a certain fixed distance of some other block and perhaps this assumption didn't fail in the tests we've made. We can't tell. But I think it's pretty safe to assume this is it and it will continue to work. Working with disassembled code, you never know if you broke it if it seems to work well.

Anyway, being able to follow the structure of the whole thing will be very helpful.

As regards replacing the showroom, this should be no different from the needle module. That is, yes, we'd be dropping a portion of original code and plugging something else there, but the technique to do it is basically the same. Again, the same thing can happen.

I never seem to get the time, but I'll try to give the code a deeper read :)

Daniel3D

Quote from: Cas on September 04, 2022, 08:45:26 PM
I never seem to get the time, but I'll try to give the code a deeper read :)

The code is well named for the most part.
I still intend to make a document (spreadsheet probably) that just documents the segments with simple descriptions of the functions and where calls go to and other connections.

For the menus that is quite easy.
For the engine (the biggest part) it's more difficult. More undefined code there.

llm

Quote from: Daniel3D on September 04, 2022, 12:11:55 PM
If this works on unmodified code...

that is what i wrote :)

Quote: but the hard introduced bytes before/after the segments needs to be adjust to get the very same binary again

llm

Quote from: Cas on September 04, 2022, 08:45:26 PM
we can't tell if something starts at a location because the previous block was padded with extra bytes or because it just ended there, so we just align to the byte

one needs to check whether removing these bytes + segment align=para works - it should, but that isn't just a matter of replacing byte with para - every segment needs to be checked in the resulting exe to prove that it works - because you said it needs to be aligned, and para was very common in those days


Daniel3D

Quote from: llm on September 05, 2022, 10:25:59 AM
Quote from: Daniel3D on September 04, 2022, 12:11:55 PM
If this works on unmodified code...

that is what i wrote :)

I thought so, but I wanted the confirmation, I guess   ::)

But that's relatively easy to do, I guess.
I (of course) don't know where to place this code. But I can compile and help with the check.

llm

Quote from: Daniel3D on September 05, 2022, 01:56:38 PM
But that's relative easy to do i guess .

more or less - one needs to change "byte" to "para" (starting with the very last segment and going up), assemble and check whether anything in the exe changed, then do the next segment, one by one - producing as many exes as there are segments

seg041.asm
dseg.asm
seg039.asm
seg038.asm
...
seg000.asm

from last to first because then it's easier to see in a hex editor which parts change further down; if you start with seg000 everything will change, because the segments are ordered seg000, ..., seg039, dseg, seg041 (segment 40 is the dseg (data segment))

it will mostly result in a few bytes more (or fewer) around the segment ends/beginnings than in the previous exe - which then need to be removed (or added) in the assembler source
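
The bookkeeping for that procedure can be sketched like this. The helper names are hypothetical, the file naming follows the list above, and the regular expression only assumes segment directives of the form shown earlier in the thread; assembling and comparing the resulting exes is left out.

```python
import re


def segment_order(last_segment: int = 41, data_segment_index: int = 40):
    """File names from the last segment up to seg000, with dseg.asm in slot 40."""
    names = []
    for index in range(last_segment, -1, -1):
        if index == data_segment_index:
            names.append("dseg.asm")
        else:
            names.append(f"seg{index:03d}.asm")
    return names


# Matches e.g.: seg011 segment byte public 'STUNTSC' use16
SEGMENT_DIRECTIVE = re.compile(r"^(\s*\w+\s+segment\s+)byte\b", re.IGNORECASE)


def byte_to_para(line: str) -> str:
    """Switch the align type of a segment directive from byte to para; other lines pass through."""
    return SEGMENT_DIRECTIVE.sub(r"\g<1>para", line, count=1)
```

Walking `segment_order()` and applying `byte_to_para` to one more file per build gives the series of exes to inspect in the hex editor, each differing from the previous one only from the changed segment downwards.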