Reconstructing Wizard101 Source Code Using IDAPython

By AmJayden

Introduction

IDAPython is a powerful tool that can be used to automate tasks in IDA. In this post, I will demonstrate how you can use IDAPython to automate your reverse-engineering workflow. To do this, I will be using IDAPython to reconstruct the file hierarchy of Wizard101, a popular MMORPG developed by KingsIsle Entertainment, as well as rename every function in the game whose name we can recover.

Setting Up IDAPython

For this post, I’ll be writing my IDAPython scripts in Visual Studio Code using the Python extension. The extension can provide IntelliSense for IDAPython once you point it at the “python” folder of your IDA installation.

Once we have Visual Studio Code and the Python extension, we can create a new folder for our Wizard101 scripts, and we’ll open it in Visual Studio Code.

[Image: Setting up our scripts folder]

Now that the folder is open, we’ll need to configure the Python extension to use IDAPython. To do this, we need to create a new folder named “.vscode” in our project directory, then we need to make a new file named “settings.json” under the new .vscode folder. In this file, we’ll add the following:

{
    "python.analysis.extraPaths": [
        "{INSERT YOUR IDA PATH HERE}/python/3"
    ]
}

Once we’ve done this, the Python extension will be able to provide IntelliSense and code completion for IDAPython, and we can start writing our scripts.

[Image: Configuring the Python extension]

So, let’s get started! First, we’ll make a new file named “reconstructor.py” in our project directory. In this file, we’ll add the following skeleton code:

import idaapi
import idautils
import idc

class Reconstructor:
    def __init__(self):
        pass

    def reconstruct_files(self, base_directory):
        import os
        pass

    def rename_functions(self, comment_filenames=True):
        pass

if __name__ == "__main__":
    reconstructor = Reconstructor()
    reconstructor.rename_functions()
    reconstructor.reconstruct_files("D:\\wizard101-src\\")

As of now, this code does nothing. It simply provides the interface for us to use our reconstructor and gives us a base implementation of the functions.

If you’re following along, at this point your file structure should look like this:

  • .vscode
    • settings.json
  • reconstructor.py

Now that we have an idea of how our code will be structured, let’s get to reversing so we can do some actual work!

Reversing Wizard101

Upon opening Wizard101 in IDA, it shouldn’t take very long for you to notice strings that look like this:

[Image: Wizard101 strings]

Each group appears to consist of a file path, a function name, and an assertion message. What we’ve found are the strings that are passed to Wizard101’s debug assertion function!

[Image: Call to debug_assert]

This is useful for us because every function that calls debug_assert passes its full name, as well as the file it’s located in, to that function. This means that if we look at all functions that call debug_assert, we can get a list of their names and the files they’re located in, and use those to reconstruct the source code and file hierarchy of Wizard101.
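
To make the goal concrete, here is a minimal, IDA-free sketch of the bookkeeping we're aiming for. The file names, function names, and addresses are made-up placeholders rather than real Wizard101 data, and group_by_file is a hypothetical helper, not part of IDAPython.

```python
def group_by_file(assert_calls):
    """Group (file_path, function_name, function_address) triples by file."""
    files = {}
    for file_path, function_name, function_address in assert_calls:
        # One inner dictionary per source file, keyed by function name.
        files.setdefault(file_path, {})[function_name] = function_address
    return files


# Placeholder data standing in for what the debug_assert calls would give us.
sample_calls = [
    (r"C:\Code\Wizard101\WizardDev\Example\Client.cpp", "Client::Connect", 0x1000),
    (r"C:\Code\Wizard101\WizardDev\Example\Client.cpp", "Client::Close", 0x1100),
    (r"C:\Code\Wizard101\WizardDev\Example\Util.cpp", "Util::Hash", 0x2000),
]

for path, functions in group_by_file(sample_calls).items():
    print(path, sorted(functions))
```

The nested dictionary (file path, then function name, then address) is the shape the rest of the script works with.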

Getting a List of Functions

Now that we know what we need to do, let’s get started writing our script.

First, we need the address of the debug_assert function. Simply double-click the function call in IDA, go to the disassembly, and copy the address on the left.

[Image: Getting the address of debug_assert]

Now, let’s modify the __init__ constructor of our Reconstructor class to take the address of the debug_assert function as a parameter.

    class Reconstructor:
        def __init__(self, debug_assert_fn):
            pass

And the call to the constructor:

    if __name__ == "__main__":
        DEBUG_ASSERT_ADDRESS = 0x141130E70
        reconstructor = Reconstructor(DEBUG_ASSERT_ADDRESS)
        reconstructor.rename_functions()
        reconstructor.reconstruct_files("D:\\wizard101-src\\")

Now, within the constructor, we can use IDAPython’s idautils module to get a list of all functions that reference debug_assert:

    class Reconstructor:
        def __init__(self, debug_assert_fn):
            # The following code iterates over all xrefs to `debug_assert` and if the xref is within a function, maps the start address of the function to the xref's address.
            # This is okay and works even if `debug_assert` is called multiple times in the function because the function will always only have one name and file path.
            referencing_fns = {
                fn.start_ea: x.frm
                for x in idautils.XrefsTo(debug_assert_fn)
                if (fn := idaapi.get_func(x.frm))
            }
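
The comprehension keeps only one call site per function because a later xref with the same key overwrites the earlier one. A small stand-in using plain tuples shows the effect; the addresses are made up and collapse_call_sites is a hypothetical name, so no IDA is needed to run it.

```python
def collapse_call_sites(pairs):
    """Map each function start to the last call site seen for it."""
    return {function_start: call_site for function_start, call_site in pairs}


# (function_start, call_site) pairs; the first function calls debug_assert twice.
xref_pairs = [
    (0x1000, 0x1010),
    (0x1000, 0x1040),
    (0x2000, 0x2008),
]

referencing_fns = collapse_call_sites(xref_pairs)
print({hex(start): hex(site) for start, site in referencing_fns.items()})
```

Which call site survives doesn't matter here, since every debug_assert call inside one function carries the same name and file path.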

Now we have a dictionary that maps each function’s start address to the address of a debug_assert call inside it. Let’s use it to get the names of the functions in the game.

Getting the Function Names

First, we have to walk through all of the calls and parse the parameters of each call. If a parameter is a static string, we’ll add it to our map of file and function names.

To simplify this, we’ll make two helper functions under Reconstructor.

The first, _map_files, takes a dictionary of function xrefs and maps it into a dictionary of file names and function names. The second, _get_string, takes the address of an instruction that moves a string and returns the string being moved.

    class Reconstructor:
        @staticmethod
        def _get_string(ea):
            # decodes the instruction at the given address
            instr = idautils.DecodeInstruction(ea)
            if not instr:
                print(f"Failed to decode instruction at {hex(ea)}")
                return None

            string_address = None

            # gets the name of the instruction
            match instr.get_canon_mnem():
                case "mov":
                    match instr.Op2.type:
                        case idaapi.o_mem:  # Memory operand (0x2)
                            # a memory operand stores its address in `addr` (`value` is only used by immediates)
                            string_address = instr.Op2.addr
                        case idaapi.o_reg:  # Register (0x1)
                            # since the address of the string is stored in a register, we need to find the instruction that moves the string into the register
                            # we'll do this by walking backwards from the current instruction until we find a lea into that register
                            reg_number = instr.Op2.reg
                            prev_ea = ea
                            fn_start = idaapi.get_func(ea).start_ea

                            while True:
                                # decodes the instruction before the current instruction
                                prev = idautils.DecodePreviousInstruction(prev_ea)

                                # if it fails to decode, or the instruction is before the start of the function, we'll break out of the loop
                                # this means the address of the string comes from somewhere else
                                if not prev or prev.ea < fn_start:
                                    break

                                prev_ea = prev.ea

                                # if the instruction is a lea to our register number and the second operand is a memory operand, then the memory operand is the string address
                                if (
                                    prev.get_canon_mnem() == "lea"
                                    and prev.Op1.reg == reg_number
                                    and prev.Op2.type == idaapi.o_mem
                                ):
                                    string_address = prev.Op2.addr
                                    break
                case "lea":
                    string_address = instr.Op2.addr

            if not string_address:
                print(f"Failed to locate string reference at {hex(ea)}")
                return None

            # reads the string at the string address
            contents = idaapi.get_strlit_contents(
                string_address,
                idaapi.get_max_strlit_length(string_address, idaapi.STRTYPE_C),
                idaapi.STRTYPE_C,
            )
            if contents is None:
                print(f"Failed to read string at {hex(string_address)}")
                return None

            # converts the raw bytes into a python string
            return contents.decode("utf-8", errors="replace")
    
        @staticmethod
        def _map_files(xrefs):
            files = {}
            for fn, xref in xrefs.items():
                # gets a list of addresses to the last instructions that modified the parameters
                arg_locations = idaapi.get_arg_addrs(xref)

                # make sure that the function succeeded and there are at least 7 parameters, as debug_assert has many parameters.
                if arg_locations and len(arg_locations) > 6:
                    # the name of the function is the 6th parameter
                    function_name = Reconstructor._get_string(arg_locations[5])

                    # the file path is the 7th parameter
                    file_path = Reconstructor._get_string(arg_locations[6])

                    if function_name and file_path:
                        # groups the functions by the file they belong to
                        files.setdefault(file_path, {})[function_name] = fn

            return files
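
The backward search in _get_string is easier to follow without the IDA types in the way. The sketch below models instructions as plain tuples; Insn and find_string_address are hypothetical names used only for illustration, and the register numbers and addresses are made up. Like the real helper, it scans linearly and ignores control flow, so treat it as a heuristic rather than a full data-flow analysis.

```python
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    mnem: str                 # mnemonic, e.g. "lea" or "mov"
    dest_reg: Optional[int]   # register written by the instruction, if any
    src_addr: Optional[int]   # memory address in the source operand, if any


def find_string_address(instructions, index, reg):
    """Walk backwards from instructions[index] to the nearest lea that loads reg."""
    for insn in reversed(instructions[:index]):
        if insn.mnem == "lea" and insn.dest_reg == reg and insn.src_addr is not None:
            return insn.src_addr
    return None  # the register was loaded somewhere we can't see


# A toy function body: register 0 is loaded twice, register 1 once.
body = [
    Insn("lea", 0, 0x5000),
    Insn("lea", 1, 0x6000),
    Insn("lea", 0, 0x7000),
    Insn("mov", None, None),  # stands in for the instruction that stores the argument
]

nearest = find_string_address(body, 3, 0)
print(hex(nearest))
```

Scanning from the nearest instruction outward means the most recent load of the register is the one that gets picked.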

Reconstructing the File Hierarchy

Now that we have the core logic for our reconstructor, let’s actually implement the reconstruction.

    class Reconstructor:
        def __init__(self, debug_assert_fn):
            # The following code iterates over all xrefs to `debug_assert` and if the xref is within a function, maps the start address of the function to the xref's address.
            # This is okay and works even if `debug_assert` is called multiple times in the function because the function will always only have one name and file path.
            referencing_fns = {
                fn.start_ea: x.frm
                for x in idautils.XrefsTo(debug_assert_fn)
                if (fn := idaapi.get_func(x.frm))
            }
    
            self.files = Reconstructor._map_files(referencing_fns)
    
        def reconstruct_files(self, base_directory):
            import os
            for file, functions in self.files.items():
                # when looking at calls to debug_assert, you can see that all of the file paths are prefixed with "C:\Code\Wizard101\WizardDev\"
                # we don't want their junk in the file path so we'll make it relative.
                rel_path = file.replace("C:\\Code\\Wizard101\\WizardDev\\", "")
    
                file_path = os.path.join(base_directory, rel_path)
    
                # gets the directory of the file path
                file_directory = os.path.dirname(file_path)

                # creates the directory if it doesn't exist
                if file_directory:
                    os.makedirs(file_directory, exist_ok=True)
    
                with open(file_path, "w") as f:
                    # list of decompiled function sources
                    funcs = []
                    for fn in functions.values():
                        try:
                            # decompiles the function
                            decompiled = idaapi.decompile(fn)

                            if decompiled:
                                # add it to the list of decompiled funcs
                                funcs.append(str(decompiled))
                        except idaapi.DecompilationFailure:
                            print(f"Failed to decompile function at {hex(fn)}")
                            continue

                    # write all decompiled functions to the file once, after the loop, separating them by 2 newlines.
                    f.write("\n\n".join(funcs))
    
        def rename_functions(self, comment_filenames=True):
            for file, functions in self.files.items():
                for fn_name, fn_ea in functions.items():
                    # renames the function
                    idc.set_name(fn_ea, fn_name, idaapi.SN_NOCHECK | idaapi.SN_FORCE)
    
                    if comment_filenames:
                        # comments the function with the file name
                        idc.set_func_cmt(fn_ea, file, 1)
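
One edge case in reconstruct_files is worth knowing about: if a recovered path doesn't start with the expected prefix, str.replace leaves it untouched, and os.path.join discards base_directory whenever its second argument is an absolute path. Below is a minimal sketch of a stricter mapping. It uses pathlib.PureWindowsPath so it runs on any operating system; SOURCE_ROOT and to_output_path are hypothetical names, and the file names are placeholders.

```python
from pathlib import PureWindowsPath

# Prefix seen in the debug_assert file paths.
SOURCE_ROOT = PureWindowsPath(r"C:\Code\Wizard101\WizardDev")


def to_output_path(embedded_path, base_directory):
    """Map a path embedded in the binary to a location under base_directory."""
    path = PureWindowsPath(embedded_path)
    try:
        # Normal case: strip the developers' source root.
        relative = path.relative_to(SOURCE_ROOT)
    except ValueError:
        # Unexpected root: drop the drive/anchor so the result still lands
        # under base_directory instead of escaping it.
        relative = PureWindowsPath(*path.parts[1:]) if path.anchor else path
    return PureWindowsPath(base_directory) / relative


print(to_output_path(r"C:\Code\Wizard101\WizardDev\Example\Client.cpp", "D:\\wizard101-src\\"))
print(to_output_path(r"E:\Other\Util.cpp", "D:\\wizard101-src\\"))
```

Whether files with an unexpected root should be kept at all is a judgment call; this sketch keeps them but forces them under the output directory.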

Conclusion

Now that we have our reconstructor, we can run it and see what happens!

[Image: Reconstructing Wizard101]

As you can see, we’ve successfully reconstructed the file hierarchy of Wizard101 and renamed most of the functions in the IDB. Stay tuned for the next post, where we’ll use these named functions to reverse Wizard101’s networking and enable its debug Lua API.

Source Code

You can find the source code for this post’s IDAPython script here.