Format String Bugs

By now, it's time for a break from buffer overflows. In this tutorial, we'll be discussing a different sort of vulnerability called a format string bug. A format string is an array of characters (a buffer) that a program uses to describe how information should be formatted. For example, if you were writing a spreadsheet-style program, it would be nice to format the output in clean rows and columns. Format strings and their associated functions in stdio.h allow programmers to do this sort of thing in comfort. However, if the coder makes a mistake and allows that format string to fall under user (and thus attacker) control, very bad things can happen.

The vulnerable program source code


Function Arguments

As usual, we'll start with a little theory and put it to use later. You should have a pretty good grasp of how stack frames and function calls work by now. If not, then have another look at the other tutorials and stare at those pretty diagrams a bit more. There is an element of those stack frames that I haven't talked about much: arguments. Arguments are the addresses and values that are fed to a function when it is called. The easiest way to think about these is that they are pushed onto the top of the stack in right-to-left order (when looking at the call in C). This happens right before the function is called. Once the function is given control of the program's execution flow, it pops these values/addresses off the stack and uses them for whatever purpose was intended.

The truth about how this works is just slightly more complicated. Let's say funcA calls funcB. When the code compiles, the compiler examines all of the function calls made in funcA and notes the call with the most arguments. This is used to determine where the top of the stack should be when funcA is entered in the first place. By doing this, it makes room for any function call arguments without having to push/pop them and move ESP (the pointer to the top of the stack). Instead of pushing arguments, the values/addresses are simply moved to locations on the stack relative to ESP. My guess is that doing it this way is less computationally expensive than actually pushing/popping everything to facilitate function calls. Also, be aware that there will likely be some padding between this argument space and the function's local variables.



The Target

Today's vulnerable program is pretty similar to the one in the first tutorial. First, it generates a random number between 0 and 99 inclusive called magicNumber. Next, it asks the user for their name and says hello. Finally, it asks for an "access code", converts what the user types into an integer, and stores it in the variable userCode. At the end, it compares the user-supplied access code with the random number to see if they match. An attacker has a 1 in 100 chance of guessing the random number and winning; let's try to do better.

You may also have noticed that this program is a little beefed up in terms of security. The order in which the local variables are declared means that we can't overflow that nice buffer to change magicNumber. Also, the keyboard input function is different: fgets reads the user's input but never stores more than the buffer size it is given. That means we won't be able to overflow anything here. We'll have to use a different trick to break this app.
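The original source isn't reproduced here, so below is a rough sketch of what a program matching this description might look like. Treat all of it as an assumption rather than the real target: the function name run_target, the buffer sizes, the prompt text, and the use of atoi are made up for illustration, and the streams and magic number are passed in as parameters only so the sketch is easy to exercise. The compiler is also free to place the locals in a different order than they are declared.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reconstruction, NOT the original source.
 * Returns 1 if the access code matched the magic number, 0 otherwise. */
int run_target(FILE *in, FILE *out, int magic)
{
    /* Declaration order mirrors the description above; the actual
     * stack layout is up to the compiler. */
    int userCode = (int)0xBBBBBBBB; /* marker value, easy to spot in a dump */
    int magicNumber = magic;        /* the secret, expected to be 0..99 */
    char localStr[128];             /* name buffer (size is an assumption) */
    char codeStr[16];               /* access code as typed (assumption) */

    fprintf(out, "What is your name? ");
    if (fgets(localStr, sizeof localStr, in) == NULL)
        return 0;                   /* fgets never writes past the buffer */

    fprintf(out, "Hello ");
    fprintf(out, localStr);         /* BUG: user input used as the format string */

    fprintf(out, "Access code? ");
    if (fgets(codeStr, sizeof codeStr, in) == NULL)
        return 0;
    userCode = atoi(codeStr);       /* convert the typed text to an integer */

    if (userCode == magicNumber) {
        fprintf(out, "You win!\n");
        return 1;
    }
    fprintf(out, "Wrong code.\n");
    return 0;
}
```

A real main would presumably seed the generator and call something like run_target(stdin, stdout, rand() % 100). Most compilers will warn about the marked fprintf line, which is a good hint that something is off.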



Format Strings

Before we go hacking this code up, we have to go over what format strings are. From now on, I'll be discussing them as they relate to the printf function. Be aware that other functions use format strings too, but printf is a very common context for them. Printf takes at least one argument: a pointer to the format string. It then uses this string to determine whether it needs more arguments. The printf function walks through the string one character at a time and outputs what it finds to the screen. For example, if the string is "Hello", then it will write that to the screen. There are some special characters that make the format string what it is. Whenever printf encounters a "%" (percent) symbol, it interprets the following character non-literally. There are several of these formatting characters and each one performs a different function.
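To make that walk through the string concrete, here is a toy formatter. It is only a sketch and not how any real C library implements printf: the name mini_format is made up, it writes into a buffer instead of the screen, and it understands nothing but "%i", "%s" and "%%". The part to notice is that the function only fetches another argument (with va_arg) when the format string tells it to.

```c
#include <stdarg.h>
#include <stdio.h>

/* Toy formatter (hypothetical helper). Copies fmt into out, expanding
 * %i, %s and %%. Returns the number of characters stored, not counting
 * the terminating NUL. */
int mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list args;
    size_t len = 0;

    if (cap == 0)
        return 0;

    va_start(args, fmt);
    for (const char *p = fmt; *p != '\0' && len + 1 < cap; p++) {
        int n = 0;

        if (*p != '%') {            /* ordinary character: copy it literally */
            out[len++] = *p;
            continue;
        }
        p++;                        /* look at the character after '%' */
        if (*p == '\0')
            break;                  /* lone '%' at the end of the string */

        switch (*p) {
        case 'i':                   /* fetch the next argument as an int */
            n = snprintf(out + len, cap - len, "%d", va_arg(args, int));
            break;
        case 's':                   /* fetch the next argument as a string pointer */
            n = snprintf(out + len, cap - len, "%s", va_arg(args, const char *));
            break;
        case '%':                   /* "%%" is a literal percent sign */
            n = snprintf(out + len, cap - len, "%%");
            break;
        default:                    /* unknown specifier: silently skipped */
            break;
        }
        if (n > 0)
            len += (size_t)n;
        if (len >= cap)
            len = cap - 1;          /* output was truncated */
    }
    va_end(args);

    out[len] = '\0';
    return (int)len;
}
```

Calling mini_format(buf, sizeof buf, "Hello %i", 42) leaves "Hello 42" in buf. Nothing in the function checks how many arguments the caller actually supplied; it simply trusts the format string.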

For example, "%s" tells printf to pop another argument off the stack as a pointer and print the string found at that address. If it encounters "%i" in the format string, printf will pop an integer off the stack and print it to the screen. Other specifiers like "%d" and "%x" pop values off the stack and display them to the user in various ways. The odd one out is "%n": it doesn't display anything, but stores the number of characters written so far at the address given by its argument. This function is very handy for debugging and showing the values stored in variables as well. So if we declare the local variable test as an integer, fill it with the value 42, and call printf("Hello %i", test); the program will proudly display Hello 42.
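Here is a small sketch of a couple of these specifiers in action. It uses snprintf, printf's sibling that writes into a buffer instead of the screen, purely so the result is easy to inspect. The helper names greet and hex_word are made up for this example.

```c
#include <stdio.h>

/* Same call as in the text, but formatted into a buffer. */
int greet(char *out, size_t cap, int test)
{
    return snprintf(out, cap, "Hello %i", test);
}

/* "%08x" prints an unsigned int as hexadecimal, padded with
 * leading zeros to at least 8 digits. */
int hex_word(char *out, size_t cap, unsigned int value)
{
    return snprintf(out, cap, "%08x", value);
}
```

With test set to 42, greet produces "Hello 42". Passing 12 to hex_word produces "0000000c", which is the shape of output we'll be reading later on.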



An Easy Mistake

Remember that as long as there are no formatting characters, printf will just literally print the characters in the format string. This means that if the programmer filled a buffer called testStr with Hello World and then called printf(testStr); it would print Hello World as expected. But just because it works, that doesn't mean it's secure. What if the user were asked for keyboard input that was used to fill testStr? Because testStr is being used by the printf function as a format string, the user could insert their own formatting characters with malicious intent.

What if the user typed "Hello %i" when asked for the input used to fill the testStr buffer? There are no more arguments in the printf function call, but printf has no way of knowing that. It will just pop an integer value off the stack and display it! This process can be repeated by adding more "%i" specifiers to the format string. This means that an attacker can keep reading values on the stack until they run out of buffer space for the malicious format string. Remember that the stack frame of the function that calls printf lies directly beyond printf's arguments (and associated padding). This makes a function that calls printf with a format string under attacker control particularly vulnerable to having its local variables read by that attacker.
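For contrast, here is a minimal sketch of the safe way to print user-supplied text: keep the format string fixed and pass the user's text as an argument. The helper name echo_safely is made up, and it formats into a buffer with snprintf so the result can be checked. The dangerous form is shown only as a comment.

```c
#include <stdio.h>

/* Safe: the format string is the constant "%s", and the user's text is
 * plain data. Any '%' characters in userText are copied literally. */
int echo_safely(char *out, size_t cap, const char *userText)
{
    return snprintf(out, cap, "%s", userText);
}

/* Unsafe equivalent, deliberately left as a comment:
 *
 *     snprintf(out, cap, userText);
 *
 * Here userText itself is the format string, so a "%i" typed by the
 * user makes the function fetch an argument that was never passed.
 */
```

Given the input "Hello %i", echo_safely stores exactly "Hello %i". The same fix applies to the example above: printf("%s", testStr); instead of printf(testStr);.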



Guessing the Access Code

When you're asked for a username, try typing the following:

AAAA %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x

This will cause the program to say Hello followed by AAAA, of course. Next, the printf function will encounter all those "%08x" specifiers. There aren't any arguments left in the printf argument list, so it has no choice but to keep popping values off the stack anyway. Each time, it pops off 4 bytes and displays them as a hexadecimal number. By looking at the program source, you can probably guess that magicNumber lies right between userCode (value: 0xBBBBBBBB) and localStr (the buffer holding the string above) on the stack. So check out the output you generated with that format string and look for those values. Toward the end of the output, you'll see something like bbbbbbbb 0000000c 41414141. You might know that 0x41414141 is the hexadecimal representation of "AAAA". This, combined with the presence of those 0xBB bytes, makes it likely that our random number this time around is 0x0C. So convert it to decimal (12) and try it as the access code! It was correct, and the program tells us that we win. There are even more powerful ways to use format strings, but being able to read values from the stack without overflowing anything is a pretty good addition to our toolbox for now.
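If you'd rather not pick through the dump by eye, the search can be sketched in a few lines of C. This is only an illustration under the assumptions in the text: that the secret sits between a bbbbbbbb word and a 41414141 word in the output. The function name find_magic is made up, and the scanner is deliberately crude: it treats any run of hex digits as a word.

```c
#include <stdlib.h>

/* Scans text for three consecutive hex words of the form
 *     bbbbbbbb <word> 41414141
 * and returns <word> as a number, or -1 if the pattern is absent. */
long find_magic(const char *dump)
{
    unsigned long prev2 = 0, prev1 = 0; /* the two words seen before this one */
    int seen = 0;                       /* how many words parsed so far */
    const char *p = dump;

    while (*p != '\0') {
        char *end;
        unsigned long word = strtoul(p, &end, 16);

        if (end == p) {                 /* no hex digits here: skip one char */
            p++;
            continue;
        }
        if (seen >= 2 && prev2 == 0xBBBBBBBBul && word == 0x41414141ul)
            return (long)prev1;         /* the word sandwiched in the middle */

        prev2 = prev1;
        prev1 = word;
        seen++;
        p = end;
    }
    return -1;
}
```

Feeding it "bbbbbbbb 0000000c 41414141" returns 12, the same access code we worked out by hand.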