Decoding Corrupt BASE64 Strings, (Sun, Sep 27th)

I was asked to help with the decoding of a BASE64 string that my tool could not handle.

The issue was the following: this particular BASE64 string was corrupt, its length was not a multiple of 4. In BASE64, 64 characters are used to do the encoding: each BASE64 character represents 6 bits. When one byte (8 bits) is encoded, 2 BASE64 characters are needed (6 + 2 bytes). To indicate that the last 4 bits of the second BASE64 character should be discarded, 2 padding characters are added (==).

For example, the ASCII character I (8 bits) is represented by 2 BASE64 characters (SQ) followed by 2 padding characters (==). This gives SQ==: 4 bytes long.

When 2 bytes are encoded (16 bits), 3 BASE64 characters are needed (3 * 6 = 18 bits) and 2 bits should be discarded (one padding character =), thus 4 characters are used.

And when 3 bytes are encoded (24 bits), 4 base64 characters are needed (4 * 6 = 24 bits).

Conclusion: valid BASE64 strings have a length that is a multiple of 4.


My tool can handle BASE64 strings that have a length that is not a multiple of 4.

Here is an example. BASE64 string 12345678 is 8 characters long: is able to recognize this BASE64 string, and decode it.

Let’s add one character, resulting in a BASE64 string with a length that is not a multiple of 4 (length of 9 characters): does not recognize this BASE64 string.

We can help to recognize this string, by using option -p. This option takes a Python function, that will be used to process the detected strings before they are decoded. In this case, we will use Python function L4, a function I defined: it truncates strings to a length that is a multiple of 4.

Using this function L4 with option -p, we can decode the corrupt string:

Didier Stevens
Senior handler
Microsoft MVP

(c) SANS Internet Storm Center. Creative Commons Attribution-Noncommercial 3.0 United States License.

Reposted from SANS. View original.