News:

Herr Otto Partz says you're all nothing but pipsqueaks!

Main Menu
Menu

Show posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.

Show posts Menu

Messages - llm

#16
Wow, great Stuff

Do you think it could be possible to export all that information to created a Blender movie from replays?
#17
Quote from: Daniel3D on October 30, 2022, 11:49:25 AMThere are a lot of them (if i read correctly)
Is it possible to redo it while maintaining the labels and comments that are made?

Quotesadly that feature can't be reverted

but i check if there is some other option to revert it
#18
Quote from: llm on October 16, 2022, 03:41:54 PMim currently a little bit confused about the current state of some functions in the asmorig - some of the functions you've showed me are full of unused labels, messing the asm code a little
these labels do not exists if i freshly analyze the current game exe with IDA - need to find out what these labels are for

found the reason for that: IDA got a "Display assembly lines/basic block boundaries" feature for the disassembling - these strange lables get generated if that option is activated - sadly that feature can't be reverted
#19
Cas is correct, i've forgot that detail

so your logic is correct Daniel but the CPU still needs different code

#20
Quote from: Daniel3D on October 20, 2022, 09:43:28 AM
Quote from: llm on October 16, 2022, 03:41:54 PMim currently a little bit confused about the current state of some functions in the asmorig - some of the functions you've showed me are full of unused labels, messing the asm code a little
these labels do not exists if i freshly analyze the current game exe with IDA - need to find out what these labels are for

The code has things that even i find strange, like in seg000:
loc_143BB:
    cmp     ax, 4D00h
    [u]jnz     short loc_143C3[/u]
    jmp     loc_144A4
loc_143C3:
    jmp     loc_14188
loc_143C6:
    cmp     [bp+var_selectedmenu], 0
    jnz     sh

I guess this could be written as:
loc_143BB:
    cmp     ax, 4D00h
    jnz     short loc_14188            ;loc_143C3
    jmp     loc_144A4
                                       ;loc_143C3:
                                       ;jmp     loc_14188
loc_143C6:
    cmp     [bp+var_selectedmenu], 0
    jnz     sh

you're right - could be written as you said

sometimes assembler is that much of code that minor details like these gets lost while
developing because it still works - seems to be assembler-code in the first place or
some strage C code with gotos in original - the C code of that 2000 lines monster would be somewhere around <200-300 lines i think

or in Kevin Pickell words:

QuoteIt was my first 3d game and I made many mistakes
#21
sorry

... For this reason the number of arguments is not appended to the name of the function by the compiler, and the assembler and the linker are therefore unable to determine if an incorrect number of arguments is used...
is that text from me? because that talks about name-mangling, that means the signature types of the function are also attached in a special way to the function name - but that does not happen for cdecl C functions - so its not relevant here

and this "problem" only happen with variadic parameters - that means functions like printf with an open parameter count - these variadic parameters
are nearly never used in normal code - so also not relevant here
#22
Quote from: Daniel3D on October 17, 2022, 11:10:32 AMIs this kind of optimization the reason that it is difficult to reverse assembly back to C? (after it is assembled, compiled, decompiled, disassembled and converted to C) I probably have the steps wrong or mixed but (again >) you know what I mean.  8)

thats the primary reason with todays compilers, they optimize it the code so damn hard that you even can't find the functions anymore (inlineing etc.) - old 1990 compilers lucky weren't that advanced :)
so at least for Stunts - every C function (that implise cdecl calling convention) is more or less directly "seeable" also the parameters etc. because there is nearly no optimization

the pure assembler based functions (written in assembler in original) like the 3d engine doesn't need to follow any calling convention and can transport function-parameters in any technical possible way - using registers, evil stack filling, etc. - this are harder to detect - because stack pushes are very easy to see, some registers sets somewhere before the call not that much - you need to read the function code to understand if a register is a parameter, for a cdecl function you just need to look for add sp,VALUE after a call and some pushes before and its absolutley clear (in the case of stunts) that it is a cdecl C function
so i thing every call ..., add sp,VALUE is a cdecl C function call in the code
#23
Quote from: Daniel3D on October 17, 2022, 10:36:21 AMIs it possible to "fix" these functions with your disassembled code. (I still have to process the rest of the code, maybe i can do that Wednesday or Friday). If both versions create a bit perfect assembly then they should be interchangeable right?

sadly not direct - the IDA Database (IDB) is not really good merge-able - doesn't cleany follow source-only principe (much more then every tool i know, but still not enough) - but i think i will be ok in the end
#24
Quote from: Daniel3D on October 17, 2022, 10:55:35 AMI kinda get what you mean, but this is a few steps too advanced for me. I don't really know how memory stacking works. I have a vague impression, but that is part literal and part logical and most likely a big part wrong..  8)

that stack space is located by segment: ss and register sp for offset
the stack grows down - normaly the stack is at the end of the exe, so i grows down in direction of the code the stack should never reach the code space - but could if there a bugs (aka stack-overflow)

push means put this word value on stack
pop means get it back from stack - so called LIFO principe - last-in-first-out


push 1
push 2
push 3
pop ax => 3
pop bx => 2
pop cx => 1

thats it, no further magic
 
#25
and something to learn for you - how this cdecl,stack stuff for function calls work:

seg016:0008                 push    [bp+arg_4] ; 2 byte push - parameter 2
seg016:000B                 push    [bp+arg_2] ; 2 byte push - parameter 1
seg016:000E                 push    [bp+arg_0] ; 2 byte push - parameter 0
seg016:0011                 call    sub_30F9D
seg016:0016                 add     sp, 6 ; 3*2

the add sp,6 after the call means that the stack-pointer (where the parameter of sub_30F9D laying)
cleanups 6 bytes from the stack - so sub_30F9D is very likely a cdecl function - because these needs to do that - and 3 pushes = 3 parameter
and the 6 bytes are comming from 3 pushes a' 2 bytes before

this 80(1)86 code only allows 2 byte pushes onto the stack - so even bytes are pushed as words
but there are also 32bit values (for example far-ptr with segment+offset) that are pushed as parts
in C is this for example a void "test(int far* value)" -> segment/offset on stack as 2 pushes
#26
such a function

seg016:0002 locate_many_resources proc far          ; CODE XREF: load_intro_resources+2A␘P
seg016:0002                                        ; run_opponent_menu+4A␘P
seg016:0002                                        ; load_skybox+60␘P
seg016:0002                                        ; load_sdgame2_shapes+2C␘P
seg016:0002                                        ; setup_intro+2E␘P
seg016:0002                                        ; setup_car_shapes+9C␘P
seg016:0002                                        ; setup_car_shapes+B4␘P
seg016:0002                                        ; setup_car_shapes+D3␘P
seg016:0002                                        ; loop_game+34␘P
seg016:0002                                        ; load_tracks_menu_shapes:loc_2A2E3␘P
seg016:0002                                        ; load_tracks_menu_shapes:loc_2A2F9␘P
seg016:0002                                        ; load_tracks_menu_shapes+53␘P
seg016:0002
seg016:0002 arg_0          = word ptr  6
seg016:0002 arg_2          = word ptr  8
seg016:0002 arg_4          = word ptr  0Ah
seg016:0002 arg_6          = word ptr  0Ch
seg016:0002
seg016:0002                push    bp
seg016:0003
seg016:0003 loc_367B3:
seg016:0003                mov    bp, sp
seg016:0005
seg016:0005 loc_367B5:
seg016:0005                jmp    short loc_367D9
seg016:0005 ; ---------------------------------------------------------------------------
seg016:0007                align 2
seg016:0008
seg016:0008 loc_367B8:                              ; CODE XREF: locate_many_resources+2D␙j
seg016:0008                push    [bp+arg_4]
seg016:000B
seg016:000B loc_367BB:
seg016:000B                push    [bp+arg_2]
seg016:000E
seg016:000E loc_367BE:
seg016:000E                push    [bp+arg_0]
seg016:0011
seg016:0011 loc_367C1:
seg016:0011                call    locate_shape_fatal
seg016:0016
seg016:0016 loc_367C6:
seg016:0016                add    sp, 6
seg016:0019
seg016:0019 loc_367C9:
seg016:0019                mov    bx, [bp+arg_6]
seg016:001C
seg016:001C loc_367CC:
seg016:001C                add    [bp+arg_6], 4
seg016:0020
seg016:0020 loc_367D0:
seg016:0020                mov    [bx], ax
seg016:0022                mov    [bx+2], dx
seg016:0025                add    [bp+arg_4], 4
seg016:0029
seg016:0029 loc_367D9:                              ; CODE XREF: locate_many_resources:loc_367B5␘j
seg016:0029                mov    bx, [bp+arg_4]
seg016:002C
seg016:002C loc_367DC:
seg016:002C                cmp    byte ptr [bx], 0
seg016:002F                jnz    short loc_367B8
seg016:0031                pop    bp
seg016:0032                retf
seg016:0032 locate_many_resources endp

most of the inner labels are complete unused

fresh IDA import

seg016:0002 sub_367B2       proc far                ; CODE XREF: sub_10786+2A␘P
seg016:0002                                         ; sub_1293C+4A␘P ...
seg016:0002
seg016:0002 arg_0           = word ptr  6
seg016:0002 arg_2           = word ptr  8
seg016:0002 arg_4           = word ptr  0Ah
seg016:0002 arg_6           = word ptr  0Ch
seg016:0002
seg016:0002                 push    bp
seg016:0003                 mov     bp, sp
seg016:0005                 jmp     short loc_367D9
seg016:0005 ; ---------------------------------------------------------------------------
seg016:0007                 nop
seg016:0008
seg016:0008 loc_367B8:                              ; CODE XREF: sub_367B2+2D␙j
seg016:0008                 push    [bp+arg_4]
seg016:000B                 push    [bp+arg_2]
seg016:000E                 push    [bp+arg_0]
seg016:0011                 call    sub_30F9D
seg016:0016                 add     sp, 6
seg016:0019                 mov     bx, [bp+arg_6]
seg016:001C                 add     [bp+arg_6], 4
seg016:0020                 mov     [bx], ax
seg016:0022                 mov     [bx+2], dx
seg016:0025                 add     [bp+arg_4], 4
seg016:0029
seg016:0029 loc_367D9:                              ; CODE XREF: sub_367B2+3␘j
seg016:0029                 mov     bx, [bp+arg_4]
seg016:002C                 cmp     byte ptr [bx], 0
seg016:002F                 jnz     short loc_367B8
seg016:0031                 pop     bp
seg016:0032                 retf
seg016:0032 sub_367B2       endp
#27
Quote from: Daniel3D on October 16, 2022, 06:23:19 PMCan it be that the ida has mistaken them for labels and that they are just values?

normaly not - i also can't find any code that uses that lables - they are just there...

Quote from: Daniel3D on October 16, 2022, 06:23:19 PMI don't know how much the ida has evolved since the first disassembly. Also from what I've read about the process I have a feeling that you have a bit more experience with this. So maybe your settings create a cleaner result..

even an old version of IDA doesn't create these labels, strange - but they are not everywere only some functions

Quote from: Daniel3D on October 16, 2022, 06:23:19 PMThat would be unfortunate because that would mean that it is smart to redo the entire process. And there has been done a lot of research and analyzing that has to be copied and checked.

IDA is able to store the analyze results as script - everythingin IDA is script-based - for reproduciblity
sadly it doesn't work very good for downgrading to the freeware version :(

another thing that i've found is that very few functions are typed - except the 3d engine everything in stunts in C based so every function from C following the cdecl calling convention (https://en.wikibooks.org/wiki/X86_Disassembly/Calling_Conventions#CDECL), reverse order stack pushes for the parameters - very well defined

normaly you start very early to annotate the functions in the disassembly with IDA to be clean cdecl defined - that helps IDA to infere more about the code and spread type infos over the code
it seems that we started with that but never done it for most of the functions - that makes the code more
harder to read - it would be a big win to annotate the C-functions properly, and even for non cdecl functions there is the __usercall (https://www.hex-rays.com/products/ida/support/idadoc/1361.shtml) feature of IDA that allows to annotated registers etc. as parameter to descripe the "interface" of a pure-assembler function better

my goal is it to write a simple IDA script that contains all functions + names + signatures
and global structs and its usage to feed it at a very early state of analyse to IDA, so IDA can infer more
maybe its also possible to use this script-variant on Ghidra or the freeware version of IDA - to make it more easy to play with the information in open source or freewaret tools

the cdecl information would be enough for me to trace every cdecl call from the game in my dosbox extension - its formalized enough that
i just need the signature and then im able to print what the parameter content and return values are - that helps sometimes to understand better
what the code is doing (a trace every function call)
#28
Quote from: Daniel3D on October 16, 2022, 03:28:12 PMThank you. That really makes it clearer. I kind of deducted the functionality but this is a lot more detailed.

the more you understand the better...

Quote from: Daniel3D on October 16, 2022, 03:28:12 PMMy guess is that if the non symbolic offsets are fixed and the para alignment (do i say that correctly? You know what I mean) is done. Then it may be very easy to expand the horizons.

it would reduce problems alot

im currently a little bit confused about the current state of some functions in the asmorig - some of the functions you've showed me are full of unused labels, messing the asm code a little
these labels do not exists if i freshly analyze the current game exe with IDA - need to find out what these labels are for
#29
Quote from: Daniel3D on October 15, 2022, 10:44:16 PM*Learned that form CAS.  8)

yes correct:

seg003:38BC                mov    al, [bp+arg_0] <-- al = arg0
seg003:38BF                mov    byte_46167, al
seg003:38C2                mov    byte_3B8F6, 1
seg003:38C7                cbw    <== ax = signe-extended(al)
seg003:38C8                mov    cx, ax
seg003:38CA                shl    ax, 1
seg003:38CC                shl    ax, 1
seg003:38CE                shl    ax, 1
seg003:38D0                add    ax, cx
seg003:38D2                add    ax, offset aDesert ; "desert"
seg003:38D5                push    ax <-- first parameter of file_load_shape2d_fatal_thunk
seg003:38D6                call    file_load_shape2d_fatal_thunk

CBW: https://c9x.me/x86/html/file_module_x86_id_27.html

its ax = 9 * cbw(arg0) + offset aDesert

so in C that would be like

aDesert[arg0*9]

and the aDesert could be just the first element - but nothing todo with desert

or some other strange way to adress a array or member inside of aDesert

and the "9" is the max size of the string

dseg:0140 aDefault        db 'DEFAULT',0
dseg:0148                db    0
dseg:0149                db    0

==> table with 5, 8+1 byte strings
dseg:014A aDesert        db 'desert',0,0,0      ; DATA XREF: sub_1D7A2+40␘o
dseg:0153 aTropical      db 'tropical',0
dseg:015C aAlpine        db 'alpine',0,0,0
dseg:0165 aCity              db 'city',0,0,0,0,0
dseg:016E aCountry        db 'country',0,0

so in C that would be "char[9] background[5]" and arg0 is then 0-4

dseg:0177                db    0
dseg:0178                db    0
dseg:0179                db    0

in C++ that would be exactly (and 100% binary equal)

using background_name_t = char[9];
const background_name_t background_names[5] // the missing 0 is implicitly added due to beeing a c-string and a global var
{
 "desert",
 "tropical",
 "alpine",
 "city",
 "country"
};

so C/C++ knows that every entry in background_names is a 9 byte string - so
the arithmetic of multiplying by 9 is done implicit - based on the type definition

ptr to background_names is equal to background_names at position of "desert"
thats the reason that IDA thinks the code offsets aDesert directly
but the code just referes the whole table

and then just

file_load_shape2d_fatal_thunk(background_names[arg0]);

the same as

ax = 9 * cbw(arg0) + offset aDesert
push ax
call file_load_shape2d_fatal_thunk

as you can see the complexity reduce is big, comparing C with asm :)

compiling this C/C++ code with the original Stunts 16bit compiler "Microsoft C 5.1" reveals this code

#include <string.h>

typedef char background_name_t[9];

const background_name_t background_names[5] =
{
 "desert",
 "tropical",
 "alpine",
 "city",
 "country"
};

int main(int argc, char* argv[])
{
 return strlen(background_names[argc]);
}

the generated assembler code for this small snipped looks very much like the original code (or can be tuned to look exact the same)

seg000:0010 ; int __cdecl main(int argc, const char **argv, const char **envp)
seg000:0010 _main          proc near              ; CODE XREF: start+8D␙p
seg000:0010
seg000:0010 arg_0          = word ptr  4
seg000:0010
seg000:0010                push    bp
seg000:0011                mov    bp, sp
seg000:0013                xor    ax, ax
seg000:0015                call    __chkstk

here is your original assembler code (ignoring cbw) as a result from my C/C++ code
seg000:0018                mov    ax, [bp+arg_0]
seg000:001B                mov    cx, ax
seg000:001D                shl    ax, 1
seg000:001F                shl    ax, 1
seg000:0021                shl    ax, 1
seg000:0023                add    ax, cx
seg000:0025                add    ax, offset aDesert ; "desert"
seg000:0028                push    ax              ; char *

seg000:0029                call    _strlen
seg000:002C                add    sp, 2
seg000:002F                pop    bp
seg000:0030                retn
seg000:0030 _main          endp

also the data-segment part of the background tables is 100% binary identical

dseg:003C                db  43h ; C
dseg:003D                db  6Fh ; o
dseg:003E                db  72h ; r
dseg:003F                db  70h ; p
dseg:0040                db  11h
dseg:0041                db    0
dseg:0042 aDesert        db 'desert',0          ; DATA XREF: _main+15␘o
dseg:0049                db    0
dseg:004A                db    0
dseg:004B                db  74h ; t
dseg:004C                db  72h ; r
dseg:004D                db  6Fh ; o
dseg:004E                db  70h ; p
dseg:004F                db  69h ; i
dseg:0050                db  63h ; c
dseg:0051                db  61h ; a
dseg:0052                db  6Ch ; l
dseg:0053                db    0
dseg:0054                db  61h ; a
dseg:0055                db  6Ch ; l
dseg:0056                db  70h ; p
dseg:0057                db  69h ; i
dseg:0058                db  6Eh ; n
dseg:0059                db  65h ; e
dseg:005A                db    0
dseg:005B                db    0
dseg:005C                db    0
dseg:005D                db  63h ; c
dseg:005E                db  69h ; i
dseg:005F                db  74h ; t
dseg:0060                db  79h ; y
dseg:0061                db    0
dseg:0062                db    0
dseg:0063                db    0
dseg:0064                db    0
dseg:0065                db    0
dseg:0066                db  63h ; c
dseg:0067                db  6Fh ; o
dseg:0068                db  75h ; u
dseg:0069                db  6Eh ; n
dseg:006A                db  74h ; t
dseg:006B                db  72h ; r
dseg:006C                db  79h ; y
dseg:006D                db    0
dseg:006E                db    0
dseg:006F                db    0
dseg:0070 word_105D0      dw 0                    ; DATA XREF: start+4A␘w
#30
Quote from: llm on October 14, 2022, 01:56:05 PMThere is a function that loads horizons. That function gets its filenames from Dseg.

how is that function called? be always precise :)