Reconstructing Wizard101 Source Code Using IDAPython


Introduction

IDAPython is a powerful tool that can be used to automate tasks in IDA. In this post, I will demonstrate how you can use IDAPython to automate your reverse-engineering workflow. To do this, I will be using IDAPython to reconstruct the file hierarchy of Wizard101, a popular MMORPG developed by KingsIsle Entertainment, as well as rename all of the functions in the game.

Setting Up IDAPython

I’ll be writing my IDAPython scripts in Visual Studio Code using the Python extension. This extension can provide IntelliSense for IDAPython once you give it the path of your IDA installation’s “python” folder.

Once we have Visual Studio Code and the Python extension, we can create a new folder for our Wizard101 scripts, and we’ll open it in Visual Studio Code.

Now that the folder is open, we’ll need to configure the Python extension to use IDAPython. To do this, we need to create a new folder named “.vscode” in our project directory, then we need to make a new file named “settings.json” under the new .vscode folder. In this file, we’ll add the following:

{
    "python.analysis.extraPaths": [
        "{INSERT YOUR IDA PATH HERE}/python/3"
    ]
}

With this in place, the Python extension can provide IntelliSense and code completion for IDAPython, and we can start writing our scripts.
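
If you would rather not create the settings file by hand, a short script can write it for you. This is only a sketch: the helper name write_vscode_settings and the IDA path in the usage note are hypothetical, and it overwrites any existing settings.json rather than merging into it.

```python
import json
from pathlib import Path


def write_vscode_settings(project_dir, ida_dir):
    """Write .vscode/settings.json so the Python extension can see IDAPython."""
    vscode_dir = Path(project_dir) / ".vscode"
    vscode_dir.mkdir(parents=True, exist_ok=True)

    settings = {
        # Same key and "python/3" subfolder as in the hand-written file.
        "python.analysis.extraPaths": [f"{ida_dir}/python/3"],
    }

    settings_path = vscode_dir / "settings.json"
    settings_path.write_text(json.dumps(settings, indent=4), encoding="utf-8")
    return settings_path
```

Calling write_vscode_settings(".", "C:/Program Files/IDA Pro") from the project folder would produce the same file as above, with your own IDA path substituted for the example one.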

So, let’s get started! First, we’ll make a new file named “reconstructor.py” in our project directory. In this file, we’ll add the following skeleton code:

import idaapi
import idautils
import idc

class Reconstructor:
    def __init__(self):
        pass

    def reconstruct_files(self, base_directory):
        import os
        pass

    def rename_functions(self, comment_filenames=True):
        pass

if __name__ == "__main__":
    reconstructor = Reconstructor()
    reconstructor.rename_functions()
    reconstructor.reconstruct_files("D:\\wizard101-src\\")

As of now, this code does nothing. It simply provides the interface for us to use our reconstructor and gives us a base implementation of the functions.

If you’re following along, at this point your file structure should look like this:

.vscode/settings.json

reconstructor.py

Now that we have an idea of how our code will be structured, let’s get to reversing so we can do some actual work!

Reversing Wizard101

Upon opening Wizard101 in IDA, it shouldn’t take long to notice groups of strings that appear to be a file path, a function name, and an assertion message.

What we’ve found are the strings that are passed to Wizard101’s debug assertion function, debug_assert!

This is useful for us because every function that calls debug_assert passes its full name, as well as the file it’s located in, to debug_assert. This means that if we look at all functions that call debug_assert, we can get a list of all their names and the files they’re located in, and use those to reconstruct the source code and file hierarchy of Wizard101.
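
To make the idea concrete before touching IDA, here is a rough sketch of how such (file path, function name) pairs fold into a per-file listing. The file and function names are made up for illustration, and group_by_file is a hypothetical helper, not part of the reconstructor.

```python
from collections import defaultdict

# Hypothetical (file path, function name) pairs, standing in for the strings
# passed to debug_assert. The file and function names are invented.
SAMPLE_ASSERTS = [
    ("C:\\Code\\Wizard101\\WizardDev\\Example\\Foo.cpp", "Foo::Update"),
    ("C:\\Code\\Wizard101\\WizardDev\\Example\\Foo.cpp", "Foo::Reset"),
    ("C:\\Code\\Wizard101\\WizardDev\\Example\\Bar.cpp", "Bar::Load"),
    ("C:\\Code\\Wizard101\\WizardDev\\Example\\Foo.cpp", "Foo::Update"),
]


def group_by_file(pairs):
    """Group function names under the source file they were asserted from."""
    files = defaultdict(list)
    for file_path, function_name in pairs:
        # A function may assert more than once; list it only once per file.
        if function_name not in files[file_path]:
            files[file_path].append(function_name)
    return dict(files)


for path, names in group_by_file(SAMPLE_ASSERTS).items():
    print(path, "->", names)
```

Each key is a candidate source file and each value is the list of functions we can place in it, which is the shape of data the rest of this post builds up from the real binary.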

Getting a List of Functions

Now that we know what we need to do, let’s get started writing our script.

First, we need the address of the debug_assert function. Double-click the function call in IDA, go to the disassembly, and copy the address on the left.

Now, let’s modify the __init__ constructor of our Reconstructor class to take the address of the debug_assert function as a parameter.

class Reconstructor:
    def __init__(self, debug_assert_fn):
        pass

And the call to the constructor:

if __name__ == "__main__":
    DEBUG_ASSERT_ADDRESS = 0x141130E70
    reconstructor = Reconstructor(DEBUG_ASSERT_ADDRESS)
    reconstructor.rename_functions()
    reconstructor.reconstruct_files("D:\\wizard101-src\\")

Now, within the constructor we can use IDAPython’s idautils module to get a list of all functions that reference debug_assert:

class Reconstructor:
    def __init__(self, debug_assert_fn):
        # The following code iterates over all xrefs to `debug_assert` and if the xref is within a function, maps the start address of the function to the xref's address.
        # This is okay and works even if `debug_assert` is called multiple times in the function because the function will always only have one name and file path.
        referencing_fns = {
            fn.start_ea: x.frm
            for x in idautils.XrefsTo(debug_assert_fn)
            if (fn := idaapi.get_func(x.frm))
        }

We now have a dictionary that maps each function’s start address to the address of a debug_assert call inside it. Next, we’ll use it to get the names of the functions in the game.
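
The way that dictionary comprehension collapses several calls in one function into a single entry can be checked outside IDA. The sketch below swaps the IDA objects for simple stand-ins; fake_get_func and every address in it are made up for illustration.

```python
from types import SimpleNamespace

# Stand-ins for IDA objects: each xref has a `frm` address, and each function
# covers a half-open [start_ea, end_ea) range. All addresses are made up.
FUNCTIONS = [
    SimpleNamespace(start_ea=0x1000, end_ea=0x1100),
    SimpleNamespace(start_ea=0x2000, end_ea=0x2080),
]
XREFS = [
    SimpleNamespace(frm=0x1010),  # first call inside the function at 0x1000
    SimpleNamespace(frm=0x10F0),  # second call inside the same function
    SimpleNamespace(frm=0x2040),  # call inside the function at 0x2000
    SimpleNamespace(frm=0x3000),  # not inside any known function
]


def fake_get_func(ea):
    """Hypothetical replacement for idaapi.get_func: containing function or None."""
    for fn in FUNCTIONS:
        if fn.start_ea <= ea < fn.end_ea:
            return fn
    return None


# Same shape as the comprehension in the constructor.
referencing_fns = {
    fn.start_ea: x.frm
    for x in XREFS
    if (fn := fake_get_func(x.frm))
}

print({hex(k): hex(v) for k, v in referencing_fns.items()})
```

The result has two entries: the function at 0x1000 keeps only its later call, and the xref at 0x3000 is dropped because it isn’t inside a function. Either call would do, since both pass the same name and file path.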

Getting the Function Names

First, we have to walk through all of the calls and parse the arguments of each call. If an argument is a static string, we’ll add it to the list of function and file names.

To simplify this, we’ll make two helper functions under Reconstructor.

The first, _map_files, takes a dictionary of function xrefs and maps them into a dictionary of filenames and function names. The second, _get_string, takes the address of an instruction that moves a string and returns the string being moved.

class Reconstructor:
    @staticmethod
    def _get_string(ea):
        # decodes the instruction at the given address
        instr = idautils.DecodeInstruction(ea)
        string_address = None

        # gets the name of the instruction
        match instr.get_canon_mnem():
            case "mov":
                match instr.Op2.type:
                    case idaapi.o_mem:  # Memory operand
                        # string_address is the value of the memory operand
                        string_address = instr.Op2.value
                    case idaapi.o_reg:  # Register
                        # since the address of the string is stored in a register, we need to find the instruction that moves the string into the register
                        # we'll do this by iterating over all instructions before the current instruction and checking if they move a string into the register
                        reg_number = instr.Op2.reg

                        instructions = []
                        prev_ea = ea
                        fn_start = idaapi.get_func(ea).start_ea

                        while True:
                            # decodes the instruction before the current instruction
                            instruction = idautils.DecodePreviousInstruction(prev_ea)
                            
                            # if it fails to decode, or the instruction is before the start of the function, we'll break out of the loop
                            # this means the instruction is not in the function and the address of the string comes from somewhere else
                            if not instruction or instruction.ea < fn_start:
                                break

                            prev_ea = instruction.ea

                            instructions.append(instruction)

                        for instr in instructions:
                            # if the instruction is a lea to our register number and the second operand is a memory operand, then the memory operand is the string address
                            if (
                                instr.get_canon_mnem() == "lea"
                                and instr.Op1.reg == reg_number
                                and instr.Op2.type == idaapi.o_mem
                            ):
                                string_address = instr.Op2.addr
                                break
            case "lea":
                string_address = instr.Op2.addr

        if not string_address:
            print(f"Failed to locate string reference at {hex(ea)}")
            return None

        # reads the string at the string address and converts it into a python string
        contents = idaapi.get_strlit_contents(
            string_address,
            idaapi.get_max_strlit_length(string_address, idaapi.STRTYPE_C),
            idaapi.STRTYPE_C,
        )

        # get_strlit_contents returns None if no string could be read
        return contents.decode("utf-8") if contents else None

    @staticmethod
    def _map_files(xrefs):
        files = {}
        for fn, xref in xrefs.items():
            # gets a list of addresses to the last instructions that modified the parameters
            arg_locations = idaapi.get_arg_addrs(xref)

            # make sure that the function succeeded and there are at least 7 parameters, as debug_assert has many parameters.
            if arg_locations and len(arg_locations) > 6:
                # the name of the function is the 6th parameter
                function_name = Reconstructor._get_string(arg_locations[5])

                # the file path is the 7th parameter
                file_path = Reconstructor._get_string(arg_locations[6])

                if function_name and file_path:
                    if file_path not in files:
                        files[file_path] = {}

                    files[file_path][function_name] = fn
        
        return files
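
The register case in _get_string is the least obvious part, so here is a simplified model of the backwards scan that runs without IDA. The Insn tuple, its field names, and the sample instructions are all hypothetical and far coarser than IDA’s decoded instructions. Like the original, it takes the nearest earlier lea into the register and does not track whether the register is overwritten in between.

```python
from collections import namedtuple

# Simplified stand-in for a decoded instruction. Field names are hypothetical:
# dst is a register name or "stack", src is a register name or an int address.
Insn = namedtuple("Insn", ["ea", "mnem", "dst", "src"])

# Made-up function body, listed in address order.
BODY = [
    Insn(0x1000, "lea", "rax", 0x5000),   # rax = address of the string
    Insn(0x1007, "lea", "rcx", 0x6000),   # unrelated register
    Insn(0x100E, "mov", "stack", "rax"),  # argument written via a register
]


def resolve_string_address(body, index):
    """Find the address loaded into the register used by body[index]."""
    insn = body[index]
    if insn.mnem == "lea":
        # A lea carries the address directly.
        return insn.src
    if insn.mnem != "mov" or not isinstance(insn.src, str):
        return None

    # Walk backwards from the mov, nearest instruction first, looking for the
    # lea that loaded the source register.
    for prev in reversed(body[:index]):
        if prev.mnem == "lea" and prev.dst == insn.src:
            return prev.src
    return None


print(hex(resolve_string_address(BODY, 2)))
```

For the mov at index 2, the scan skips the lea into rcx and returns the address loaded into rax.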

Reconstructing the File Hierarchy

Now that we have the core logic for our reconstructor, let’s actually implement the reconstruction.

class Reconstructor:
    def __init__(self, debug_assert_fn):
        # The following code iterates over all xrefs to `debug_assert` and if the xref is within a function, maps the start address of the function to the xref's address.
        # This is okay and works even if `debug_assert` is called multiple times in the function because the function will always only have one name and file path.
        referencing_fns = {
            fn.start_ea: x.frm
            for x in idautils.XrefsTo(debug_assert_fn)
            if (fn := idaapi.get_func(x.frm))
        }

        self.files = Reconstructor._map_files(referencing_fns)

    def reconstruct_files(self, base_directory):
        import os
        for file, functions in self.files.items():
            # when looking at calls to debug_assert, you can see that all of the file paths are prefixed with "C:\Code\Wizard101\WizardDev\"
            # we don't want their junk in the file path so we'll make it relative.
            rel_path = file.replace("C:\\Code\\Wizard101\\WizardDev\\", "")

            file_path = os.path.join(base_directory, rel_path)

            # gets the directory of the file path
            file_directory = os.path.dirname(file_path)

            # creates the directory if it doesn't exist
            os.makedirs(file_directory, exist_ok=True)

            with open(file_path, "w") as f:
                # list of decompiled function sources
                funcs = []
                for fn in functions.values():
                    try:
                        # decompiles the function
                        decompiled = idaapi.decompile(fn)

                        if decompiled:
                            # add it to the list of decompiled funcs
                            funcs.append(str(decompiled))
                    except idaapi.DecompilationFailure:
                        print(f"Failed to decompile function at {hex(fn)}")
                        continue

                # once every function has been processed, write all decompiled
                # functions to the file, separating them by 2 newlines.
                f.write("\n\n".join(funcs))

    def rename_functions(self, comment_filenames=True):
        for file, functions in self.files.items():
            for fn_name, fn_ea in functions.items():
                # renames the function
                idc.set_name(fn_ea, fn_name, idaapi.SN_NOCHECK | idaapi.SN_FORCE)

                if comment_filenames:
                    # comments the function with the file name
                    idc.set_func_cmt(fn_ea, file, 1)
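
The path handling in reconstruct_files can be tried on its own. The sketch below uses ntpath so that Windows-style paths behave the same on any platform. The output_paths helper and the sample file name are hypothetical, and the fallback for paths without the expected prefix is my own assumption rather than something the original script does.

```python
import ntpath

# Prefix seen in the file paths passed to debug_assert.
SOURCE_PREFIX = "C:\\Code\\Wizard101\\WizardDev\\"


def output_paths(original_path, base_directory):
    """Return (directory, file path) under base_directory for one source file."""
    # Strip the prefix only when it is actually at the start of the path.
    if original_path.startswith(SOURCE_PREFIX):
        rel_path = original_path[len(SOURCE_PREFIX):]
    else:
        # Assumption: fall back to the bare file name. An absolute path passed
        # to join() would otherwise discard base_directory entirely.
        rel_path = ntpath.basename(original_path)

    file_path = ntpath.join(base_directory, rel_path)
    return ntpath.dirname(file_path), file_path


directory, path = output_paths(
    "C:\\Code\\Wizard101\\WizardDev\\Example\\Foo.cpp",  # made-up file name
    "D:\\wizard101-src\\",
)
print(directory)
print(path)
```

The fallback matters because joining a base directory with an absolute path keeps only the absolute path, so a file path without the prefix would be written outside the output folder.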

Conclusion

Now that we have our reconstructor, we can run it and see what happens!

With that, we’ve successfully reconstructed the file hierarchy of Wizard101 and renamed most of the functions in the IDB. Stay tuned for the next post, where we’ll use these named functions to reverse Wizard101’s networking and enable their debug Lua API.

Source Code

You can find the source code to this post’s IDAPython script here.