Reconstructing Wizard101 Source Code Using IDAPython
By AmJayden
Introduction
IDAPython is a powerful tool that can be used to automate tasks in IDA. In this post, I will demonstrate how you can use IDAPython to automate your reverse-engineering workflow. To do this, I will be using IDAPython to reconstruct the file hierarchy of Wizard101, a popular MMORPG developed by KingsIsle Entertainment, as well as rename every function in the game whose name we can recover.
Setting Up IDAPython
To do this, I’ll be writing my IDAPython scripts in Visual Studio Code using the Python extension. This extension can provide IntelliSense for IDAPython once you specify the path of your IDA installation’s “python” folder.
Once we have Visual Studio Code and the Python extension, we can create a new folder for our Wizard101 scripts, and we’ll open it in Visual Studio Code.
Now that the folder is open, we’ll need to configure the Python extension to use IDAPython. To do this, we create a new folder named “.vscode” in our project directory, then a new file named “settings.json” inside that folder. In this file, we’ll add the following:
{
    "python.analysis.extraPaths": [
        "{INSERT YOUR IDA PATH HERE}/python/3"
    ]
}
Once we’ve done this, the Python extension will be able to provide IntelliSense and code completion for IDAPython, and we can start writing our scripts.
So, let’s get started! First, we’ll make a new file named “reconstructor.py” in our project directory. In this file, we’ll add the following skeleton code:
import idaapi
import idautils
import idc


class Reconstructor:
    def __init__(self):
        pass

    def reconstruct_files(self, base_directory):
        import os

        pass

    def rename_functions(self, comment_filenames=True):
        pass


if __name__ == "__main__":
    reconstructor = Reconstructor()
    reconstructor.rename_functions()
    reconstructor.reconstruct_files("D:\\wizard101-src\\")
As of now, this code does nothing. It simply provides the interface for us to use our reconstructor and gives us a base implementation of the functions.
If you’re following along, at this point your file structure should look like this:
- .vscode
  - settings.json
- reconstructor.py
Now that we have an idea of how our code will be structured, let’s get to reversing so we can do some actual work!
Reversing Wizard101
Upon opening Wizard101 in IDA, it shouldn’t take very long for you to notice strings that look like this:
They appear to contain a file path, a function name, and an assertion message. What we’ve found are strings that are passed to Wizard101’s debug assertion function!
This is useful for us because every function that calls debug_assert passes its full name, as well as the file it is located in, to the function. This means that if we look at all functions that call debug_assert, we can get a list of their names and the files they’re located in, and use those to reconstruct the source code and file hierarchy of Wizard101.
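To make the idea concrete, here is a minimal sketch, independent of IDA, of how (file path, function name, address) records recovered from assertion calls could be grouped into a per-file table. The sample paths, names, and addresses are made up for illustration and do not come from the game.

```python
from collections import defaultdict


def group_by_file(records):
    """Group (file_path, function_name, address) records by source file."""
    files = defaultdict(dict)
    for file_path, function_name, address in records:
        files[file_path][function_name] = address
    return dict(files)


# Hypothetical records, shaped like what the assertion strings would give us.
sample_records = [
    ("src\\Example\\Widget.cpp", "Widget::Update", 0x1000),
    ("src\\Example\\Widget.cpp", "Widget::Render", 0x1040),
    ("src\\Example\\Socket.cpp", "Socket::Send", 0x2000),
]

grouped = group_by_file(sample_records)
for path, functions in grouped.items():
    print(path, sorted(functions))
```

Each key is a source file and each value maps the functions found in that file to their addresses, which is all the information needed to lay the functions out on disk later.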
Getting a List of Functions
Now that we know what we need to do, let’s get started writing our script.
First, we need the address of the debug_assert function. Simply double-click the function call in IDA, go to the disassembly, and copy the address on the left.
Now, let’s modify the __init__ constructor of our Reconstructor class to take the address of the debug_assert function as a parameter.
class Reconstructor:
    def __init__(self, debug_assert_fn):
        pass
And the call to the constructor:
if __name__ == "__main__":
    DEBUG_ASSERT_ADDRESS = 0x141130E70

    reconstructor = Reconstructor(DEBUG_ASSERT_ADDRESS)
    reconstructor.rename_functions()
    reconstructor.reconstruct_files("D:\\wizard101-src\\")
Now, within the constructor, we can use IDAPython’s idautils module to get a list of all functions that reference debug_assert:
class Reconstructor:
    def __init__(self, debug_assert_fn):
        # Iterate over all xrefs to `debug_assert`; if an xref is within a
        # function, map the start address of that function to the xref's address.
        # This works even if `debug_assert` is called multiple times in the
        # function, because a function always has only one name and file path.
        referencing_fns = {
            fn.start_ea: x.frm
            for x in idautils.XrefsTo(debug_assert_fn)
            if (fn := idaapi.get_func(x.frm))
        }
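The comprehension above can be tried outside IDA with stand-ins. In this sketch, `Xref`, `Func`, and `fake_get_func` are hypothetical replacements for IDA's xref objects and `idaapi.get_func`, and every address is invented. It shows that xrefs outside any function are skipped and that a function containing several calls keeps only the last xref seen.

```python
from collections import namedtuple

Xref = namedtuple("Xref", "frm")
Func = namedtuple("Func", "start_ea end_ea")

# Invented function ranges; real ones would come from the IDA database.
FUNCS = [Func(0x1000, 0x1100), Func(0x2000, 0x2080)]


def fake_get_func(ea):
    """Return the function containing ea, or None (like idaapi.get_func)."""
    for fn in FUNCS:
        if fn.start_ea <= ea < fn.end_ea:
            return fn
    return None


# Two xrefs in the first function, one in the second, one outside any function.
xrefs = [Xref(0x1010), Xref(0x10F0), Xref(0x2040), Xref(0x3000)]

referencing_fns = {
    fn.start_ea: x.frm
    for x in xrefs
    if (fn := fake_get_func(x.frm))
}

print({hex(start): hex(call) for start, call in referencing_fns.items()})
```

Keying the dictionary by function start is what collapses repeated calls into a single entry.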
We now have a dictionary that maps each function’s start address to the address of a debug_assert call inside it. We can use this to get a list of function names in the game, so let’s do that now.
Getting the Function Names
First, we have to walk through all of the calls and parse the parameters of each call. If a parameter is a static string, we’ll add it to the list of function and file names.
To simplify this, we’ll make two helper functions under Reconstructor. One is named _map_files; it takes a dictionary of function xrefs and maps them into a dictionary of filenames and function names. The other is named _get_string; it takes the address of an instruction that moves a string and returns the string being moved.
class Reconstructor:
    @staticmethod
    def _get_string(ea):
        # decodes the instruction at the given address
        instr = idautils.DecodeInstruction(ea)
        if not instr:
            # get_arg_addrs can report an address that cannot be decoded
            print(f"Failed to decode instruction at {hex(ea)}")
            return None

        string_address = None

        # gets the name of the instruction
        match instr.get_canon_mnem():
            case "mov":
                match instr.Op2.type:
                    case idaapi.o_mem:  # memory operand (0x2)
                        # for a memory operand, the address is stored in `addr`
                        string_address = instr.Op2.addr
                    case idaapi.o_reg:  # register operand (0x1)
                        # the address of the string is stored in a register, so we
                        # walk backwards through the function looking for the
                        # nearest instruction that loads a string into that register
                        reg_number = instr.Op2.reg
                        fn_start = idaapi.get_func(ea).start_ea
                        prev_ea = ea

                        while True:
                            # decodes the instruction before the current instruction
                            prev = idautils.DecodePreviousInstruction(prev_ea)

                            # if it fails to decode, or the instruction is before the
                            # start of the function, the address of the string comes
                            # from somewhere else, so we stop looking
                            if not prev or prev.ea < fn_start:
                                break

                            # a lea into our register whose second operand is a memory
                            # operand gives us the string address
                            if (
                                prev.get_canon_mnem() == "lea"
                                and prev.Op1.reg == reg_number
                                and prev.Op2.type == idaapi.o_mem
                            ):
                                string_address = prev.Op2.addr
                                break

                            prev_ea = prev.ea
            case "lea":
                string_address = instr.Op2.addr

        if not string_address:
            print(f"Failed to locate string reference at {hex(ea)}")
            return None

        # reads the string at the string address
        contents = idaapi.get_strlit_contents(
            string_address,
            idaapi.get_max_strlit_length(string_address, idaapi.STRTYPE_C),
            idaapi.STRTYPE_C,
        )
        if contents is None:
            print(f"Failed to read string at {hex(string_address)}")
            return None

        # converts it into a python string
        return contents.decode("utf-8")

    @staticmethod
    def _map_files(xrefs):
        files = {}

        for fn, xref in xrefs.items():
            # gets a list of addresses to the last instructions that modified the parameters
            arg_locations = idaapi.get_arg_addrs(xref)

            # make sure that the call succeeded and there are at least 7 parameters,
            # as debug_assert has many parameters
            if arg_locations and len(arg_locations) > 6:
                # the name of the function is the 6th parameter
                function_name = Reconstructor._get_string(arg_locations[5])
                # the file path is the 7th parameter
                file_path = Reconstructor._get_string(arg_locations[6])

                if function_name and file_path:
                    if file_path not in files:
                        files[file_path] = {}
                    files[file_path][function_name] = fn

        return files
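The register case in _get_string is a backward scan: starting from the instruction that sets up the argument, walk toward the function start and stop at the nearest `lea` that writes the register in question. Here is a simplified sketch of that scan using plain tuples instead of IDA's decoded instructions. The `Insn` type, the addresses, and the register names are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    ea: int        # instruction address
    mnem: str      # mnemonic, e.g. "lea" or "mov"
    dst: str       # destination register name
    src: object    # source: a register name (str) or an address (int)


def find_string_address(insns, start_index, reg) -> Optional[int]:
    """Scan backwards from start_index for the nearest `lea reg, <address>`."""
    for insn in reversed(insns[:start_index]):
        if insn.mnem == "lea" and insn.dst == reg and isinstance(insn.src, int):
            return insn.src
    return None


# Hypothetical function body: rax is loaded twice, then copied into r9.
body = [
    Insn(0x1000, "lea", "rax", 0x5000),
    Insn(0x1007, "lea", "rcx", 0x6000),
    Insn(0x100E, "lea", "rax", 0x7000),
    Insn(0x1015, "mov", "r9", "rax"),
]

# The `mov r9, rax` at index 3 takes its value from rax.
address = find_string_address(body, 3, body[3].src)
print(hex(address) if address is not None else "not found")
```

A linear scan like this ignores control flow, so it is a heuristic that suits straight-line argument setup; when nothing matches, reporting the failure is safer than guessing.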
Reconstructing the File Hierarchy
Now that we have the core logic for our reconstructor, let’s actually implement the reconstruction.
class Reconstructor:
    def __init__(self, debug_assert_fn):
        # Iterate over all xrefs to `debug_assert`; if an xref is within a
        # function, map the start address of that function to the xref's address.
        # This works even if `debug_assert` is called multiple times in the
        # function, because a function always has only one name and file path.
        referencing_fns = {
            fn.start_ea: x.frm
            for x in idautils.XrefsTo(debug_assert_fn)
            if (fn := idaapi.get_func(x.frm))
        }

        self.files = Reconstructor._map_files(referencing_fns)

    def reconstruct_files(self, base_directory):
        import os

        for file, functions in self.files.items():
            # when looking at calls to debug_assert, you can see that all of the
            # file paths are prefixed with "C:\Code\Wizard101\WizardDev\"
            # we don't want their junk in the file path so we'll make it relative.
            rel_path = file.replace("C:\\Code\\Wizard101\\WizardDev\\", "")
            file_path = os.path.join(base_directory, rel_path)

            # gets the directory of the file path
            file_directory = os.path.dirname(file_path)

            # creates the directory if it doesn't exist
            os.makedirs(file_directory, exist_ok=True)

            with open(file_path, "w") as f:
                # list of decompiled function sources
                funcs = []

                for fn in functions.values():
                    try:
                        # decompiles the function
                        decompiled = idaapi.decompile(fn)

                        if decompiled:
                            # add it to the list of decompiled funcs
                            funcs.append(str(decompiled))
                    except idaapi.DecompilationFailure:
                        print(f"Failed to decompile function at {hex(fn)}")
                        continue

                # write all decompiled functions to the file, separating them by 2 newlines.
                f.write("\n\n".join(funcs))

    def rename_functions(self, comment_filenames=True):
        for file, functions in self.files.items():
            for fn_name, fn_ea in functions.items():
                # renames the function
                idc.set_name(fn_ea, fn_name, idaapi.SN_NOCHECK | idaapi.SN_FORCE)

                if comment_filenames:
                    # comments the function with the file name (1 = repeatable comment)
                    idc.set_func_cmt(fn_ea, file, 1)
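The path handling in reconstruct_files can be checked on its own, without IDA. The sketch below is one possible variant, not the script's actual code: it uses `pathlib.PureWindowsPath` so the Windows-style paths behave the same on any host OS. The `output_path` helper is hypothetical, and "Example\Demo.cpp" is a made-up file name.

```python
from pathlib import PureWindowsPath

# Prefix seen in the assertion strings, as noted in the script's comments.
SOURCE_PREFIX = PureWindowsPath(r"C:\Code\Wizard101\WizardDev")


def output_path(base_directory, original_path):
    """Map an original source path to a location under base_directory."""
    original = PureWindowsPath(original_path)
    try:
        relative = original.relative_to(SOURCE_PREFIX)
    except ValueError:
        # Not under the expected prefix: drop only the drive/root, if any.
        parts = original.parts[1:] if original.anchor else original.parts
        relative = PureWindowsPath(*parts)
    return PureWindowsPath(base_directory) / relative


# "Example\Demo.cpp" is an invented file name used only for this demonstration.
target = output_path(
    "D:\\wizard101-src\\",
    r"C:\Code\Wizard101\WizardDev\Example\Demo.cpp",
)
print(target)         # full output path
print(target.parent)  # directory that would need to exist before writing
```

Compared with a plain string replace, `relative_to` only strips the prefix when it really is a leading path component, and the fallback keeps files from unexpected locations inside the output directory.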
Conclusion
Now that we have our reconstructor, we can run it and see what happens!
As you can see, we’ve successfully reconstructed the file hierarchy of Wizard101 and renamed most of the functions in the IDB. Stay tuned for the next post, where we’ll use these named functions to reverse Wizard101’s networking and enable its debug Lua API.
Source Code
You can find the source code to this post’s IDAPython script here.