Interz0ne 5
From MRL Wiki
Interzone 5 - Stage 3
Overview
Stage 3 consisted of a binary running on the target host, 192.168.51.20. The binary listened on port 3000 and implemented a custom protocol insecurely.
Binary Analysis
Loading the binary into HTE and IDA showed that we would need IDA's graphical analysis tools to understand its execution flow. Selecting the main function and pressing F12 produced the following flow graph: Media:graph1.gdl
The critical instruction in the first block is cmp [ebp+arg_0], which tests the value of argc [1]. The following instruction, jg short loc_80498EC, jumps past the usage message when argc is greater than 1; otherwise execution falls through and the usage message is printed. The use of jg (jump if greater) instead of ja (jump if above) indicates that argc is compared as a signed integer, which is to be expected.
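To see why jg implies a signed comparison, here is a small C sketch of the two branch conditions after a cmp. We assume the comparison is against 1, as the argc <= 1 condition suggests; the immediate operand is not shown above. The two conditions only disagree when a value has its top bit set, which a signed comparison reads as negative.

```c
#include <stdbool.h>

/* After "cmp a, b":
   jg is taken when a > b, comparing as signed integers;
   ja is taken when a > b, comparing as unsigned integers. */
static bool jg_taken(int a, int b)           { return a > b; }
static bool ja_taken(unsigned a, unsigned b) { return a > b; }
```

For argc == 2 both branches are taken; for a bit pattern like 0xFFFFFFFF, jg treats it as -1 (not greater than 1) while ja treats it as a huge unsigned value (above 1).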
The edge between the usage block and the third block at the bottom of the graph suggests that execution continues after the usage message is printed. However, selecting the usage function and pressing F12 again lets us examine it: Media:graph2.gdl. We find that usage calls exit, which terminates the program. We therefore treat the usage block as terminal and continue our analysis at loc_80498EC.
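A rough C reconstruction of the control flow so far might look like the sketch below. The function name main_prologue, the usage text, and the exit status are hypothetical placeholders; only the argc test and the fact that usage ends in a call to exit come from the analysis.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: the real usage text and exit status are not shown above. */
static void usage(const char *prog)
{
    fprintf(stderr, "usage: %s <arg>\n", prog);
    exit(1);                 /* terminal: control never returns to the caller */
}

/* Hypothetical name standing in for the first blocks of main(). */
static int main_prologue(int argc, char *argv[])
{
    if (argc <= 1)           /* jg not taken: fall through to the usage block */
        usage(argv[0]);
    /* loc_80498EC: analysis continues here */
    return 0;
}
```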
Examining this basic block, we find that values are assigned to the numblock and block symbols. However, selecting them and pressing the x key shows no other references to them. The adjacent symbols, environ and children, are also never referenced by address (which could have allowed indirect access to numblock and block).
.bss:0804A200                 public input_buffer
.bss:0804A200 input_buffer    db 800h dup(?)     ; DATA XREF: authenticate+16↑o
.bss:0804A200                                    ; authenticate+40↑o ...
.bss:0804AA00                 public environ
.bss:0804AA00 environ         dd ?               ; DATA XREF: _start+15↑w
.bss:0804AA04                 public block
.bss:0804AA04 block           dd ?               ; DATA XREF: main+41↑w
.bss:0804AA08                 public numblock
.bss:0804AA08 numblock        dd ?               ; DATA XREF: main+36↑w
.bss:0804AA0C                 public children
.bss:0804AA0C children        dd ?
.bss:0804AA10                 public sig
.bss:0804AA10 sig             dd ?               ; DATA XREF: _sighandler+6↑w
.bss:0804AA10                                    ; init:loc_8048F67↑w ...
This means they are probably "write only" variables and will not affect the behavior of the program.
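As a sanity check on the listing, the sketch below mirrors the .bss layout as a C struct and recomputes each symbol's address from the base of input_buffer. The struct, its field names, and the use of uint32_t for each four-byte dd slot are our own assumptions for illustration (pointers are four bytes on the i386 target). It confirms that the symbols are packed back to back, which is why references to the neighboring symbols are worth checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the .bss layout in the IDA listing.
   uint32_t stands in for each 4-byte "dd ?" slot. */
struct bss_layout {
    char     input_buffer[0x800];  /* db 800h dup(?) */
    uint32_t environ_slot;         /* environ */
    uint32_t block;
    uint32_t numblock;
    uint32_t children;
    uint32_t sig;
};

#define BSS_BASE 0x0804A200u       /* address of input_buffer */

/* Virtual address of a field, assuming the struct starts at input_buffer. */
#define BSS_ADDR(field) \
    (BSS_BASE + (uint32_t)offsetof(struct bss_layout, field))
```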
Then we examine this block:
.text:08049902                 sub     esp, 4
.text:08049905                 push    0Ah
.text:08049907                 push    0
.text:08049909                 mov     eax, [ebp+arg_4]
.text:0804990C                 add     eax, 4
.text:0804990F                 push    dword ptr [eax]
.text:08049911                 call    _strtoul
We work backwards from the call, because function arguments are pushed onto the stack in reverse order [2]. Working backwards will allow us to reconstruct the arguments on the stack.
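The toy model below, written for this walkthrough rather than taken from any tool, replays the three pushes in program order on a downward-growing stack of 32-bit slots. It shows why the last value pushed ends up as the first argument at [esp] when the call executes. The pointer value used for argv[1] in the example is made up.

```c
#include <stdint.h>

#define STACK_SLOTS 16

/* Toy model of the i386 stack: 32-bit slots, growing toward index 0.
   This is an illustration, not an emulator. */
struct toy_stack {
    uint32_t slot[STACK_SLOTS];
    int esp;                        /* index of the current top of stack */
};

static void toy_init(struct toy_stack *s) { s->esp = STACK_SLOTS; }

/* push: move esp down one slot (4 bytes), then store the value there */
static void toy_push(struct toy_stack *s, uint32_t value)
{
    s->slot[--s->esp] = value;
}

/* At the moment of the call, argument n (0-based) sits at [esp + 4*n]. */
static uint32_t toy_arg(const struct toy_stack *s, int n)
{
    return s->slot[s->esp + n];
}

/* Replays the pushes from the listing in program order. */
static void toy_push_strtoul_args(struct toy_stack *s, uint32_t argv1)
{
    toy_push(s, 0x0A);   /* push 0Ah: pushed first, ends up as the 3rd argument */
    toy_push(s, 0);      /* push 0: 2nd argument */
    toy_push(s, argv1);  /* push dword ptr [eax]: pushed last, 1st argument */
}
```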
The three instructions immediately before the call compute a value and push it onto the stack as the first argument to strtoul(). We reverse-engineer them as follows:
mov eax, [ebp+arg_4]
ebp+arg_4 is the stack address of the second argument to main(), char *argv[], which is a pointer to an array of character pointers. The brackets load the value stored at that address, so eax receives argv itself: a pointer to argv[0], which by convention points to the name of the binary [3]. dia1.png
add eax, 4
This adds 4 to eax. Each element of argv is a pointer, and a pointer is four bytes on this platform, so eax now points to argv[1], the element exactly four bytes after argv[0] in memory. In C terms, eax holds argv + 1, that is, &argv[1].
push dword ptr [eax]
The brackets dereference eax, loading the four-byte value it points to, and that value is pushed onto the stack as the first argument to strtoul(). Because eax points to argv[1], the first argument to strtoul is argv[1]. The "dword ptr" part tells the assembler that the memory operand is 32 bits wide, since a bare [eax] does not specify a size.
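The same three steps can be expressed as C pointer operations. This is a sketch: first_strtoul_arg is a hypothetical helper name, and the "add eax, 4" corresponds to advancing by one element because sizeof(char *) is 4 on the i386 target.

```c
/* Mirrors the three instructions before "call _strtoul". */
static char *first_strtoul_arg(char **argv)
{
    char **eax = argv;   /* mov eax, [ebp+arg_4]: load argv itself */
    eax = eax + 1;       /* add eax, 4: advance one pointer, giving &argv[1] */
    return *eax;         /* push dword ptr [eax]: the value pushed is argv[1] */
}
```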
Continuing backwards, the next two push instructions push constant values onto the stack as the second and third arguments to strtoul: push 0 is the second argument (a NULL end pointer), and push 0Ah is the third (the base).
sub esp, 4
Here the compiler reserves four bytes of stack space. To figure out why, we must track how esp changes from the beginning of the basic block.
We do this by analyzing the block in the forward direction. dia3.png We find that, up until the call, 0x10 (decimal 16) bytes have been allocated on the stack. Because the calling convention on this platform (cdecl) is "caller cleanup", the calling code is responsible for releasing any temporary stack storage allocated for arguments. Therefore, we expect at least 0xc (decimal 12) bytes, one four-byte slot for each of the three arguments, to be released from the stack after the call.
.text:08049916 add esp, 10h
0x10 (decimal 16) bytes are released. The extra four bytes compensate for the sub esp, 4 instruction: the compiler is "balancing the books" by returning the stack to its original state.
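The forward accounting described above can be written out as a table of esp changes per instruction. The struct and helper below are illustrative only; the deltas assume 4-byte pushes on i386 and that the callee returns with a plain ret, so the call instruction nets to zero.

```c
#include <stddef.h>

/* esp_delta is the change to esp in bytes (negative = allocation). */
struct insn {
    const char *text;
    int esp_delta;
};

static const struct insn strtoul_block[] = {
    { "sub esp, 4",            -4 },
    { "push 0Ah",              -4 },
    { "push 0",                -4 },
    { "mov eax, [ebp+arg_4]",   0 },
    { "add eax, 4",             0 },
    { "push dword ptr [eax]",  -4 },
    /* call pushes a return address and the callee's ret pops it: net 0 */
    { "call _strtoul",          0 },
    { "add esp, 10h",       +0x10 },
};

/* Sum of esp changes over the first n instructions of a block. */
static int esp_change(const struct insn *b, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += b[i].esp_delta;
    return total;
}
```

Summing through the call gives -0x10; summing the whole block gives 0, which is the "books balanced" result.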
But why did the compiler subtract those four bytes from esp in the first place? There are many possible reasons (it could be keeping the stack aligned to an 8-byte boundary, or simply generating unoptimized code), but figuring out which one isn't productive for reverse engineering. Analyzing what happens to the newly allocated stack space, we find that those four bytes are never written, so they contain garbage. We conclude that the stack has returned to its original state and move on. dia4.png
We now check to make sure that all expectations of a function call have been fulfilled:
Expectations for a function call:
- Valid and expected arguments are pushed onto the stack in the correct order.
- A call to the desired function is present.
- The stack is cleaned up after the call.
With practice, you will learn to apply these checks automatically when reading disassembly. When these expectations are violated, double-check your work. If they still appear to be violated, obfuscation may be present; in that case, go through each instruction in detail, keeping track of the processor state, to determine the true behavior of the code.
So the call is strtoul(argv[1], 0, 0x0a), or in more idiomatic C, strtoul(argv[1], NULL, 10).
Reading the strtoul documentation ("man strtoul"), we find that it converts the string pointed to by argv[1] into a number. The 0x0a (decimal 10) argument tells strtoul to treat argv[1] as a base-10 number (as opposed to octal or another base). The return value of the call, the numeric version of argv[1], will be found in eax [4].
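To see what the base argument changes, the sketch below wraps the reconstructed call and contrasts it with base 0, which lets strtoul auto-detect octal ("0") and hex ("0x") prefixes. The wrapper name parse_arg_base10 is hypothetical; the binary itself only uses base 10.

```c
#include <stdlib.h>

/* The reconstructed call: strtoul(argv[1], NULL, 10). */
static unsigned long parse_arg_base10(const char *s)
{
    return strtoul(s, NULL, 10);   /* NULL: the caller ignores the end pointer */
}
```

For example, "010" yields 10 with base 10 but 8 with base 0, and "0x10" yields 0 with base 10 because parsing stops at the "x".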
mov [ebp+var_E], ax
What's going on here? We first consider the source operand: ax. ax is the low 16 bits of eax, and eax currently contains the numeric version of argv[1]. But what about the high 16 bits? Looking down several instructions, we see that eax is not read again before the lea eax, [ebp+var_28] instruction overwrites it. So the high 16 bits are thrown away. Discarding them tells us that the programmer converted the unsigned long returned by strtoul() to a 16-bit quantity, for example by casting it or by assigning it to a 16-bit variable.
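The truncation can be modeled in C by converting the result to uint16_t, which keeps the value modulo 65536. keep_low16 is a hypothetical name; it stands in for the strtoul call plus the store of ax into var_E.

```c
#include <stdint.h>
#include <stdlib.h>

/* Models "mov [ebp+var_E], ax": only the low 16 bits of the result survive. */
static uint16_t keep_low16(const char *s)
{
    unsigned long v = strtoul(s, NULL, 10);  /* full result, returned in eax */
    return (uint16_t)v;                      /* ax: high bits discarded */
}
```

For example, an input of 70000 would be stored as 4464 (70000 - 65536).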
Knowing that argv[1] is expected to fit in 16 bits, can you figure out what the argument is? Hint: look at what the usage message tells the user to give as the first argument to the program.
Applying these analysis techniques to the next several instructions, we find that some constants are put into some of the stack space "above" main().
[1] argc being the first argument to int main(int argc, char *argv[]).
[2] For FreeBSD ELF on x86. This may or may not be true on other platforms.
[3] argv[0] being a pointer to the name of the command, by convention. See the execve() man page for details.
[4] The return value is in eax on most x86-based platforms, unless it is larger than four bytes. In that case, consult the ABI or calling convention documentation to determine how to interpret the return value.