Tuesday, December 3, 2013

Binwally: Directory tree diff tool using Fuzzy Hashing

For this post, I'll discuss the concept of directory tree and binary diffing, and how it can be used to find potential vulnerabilities and security issues that were (silently) patched in firmware images.

Silent patching is a big deal, as we don't have many security researchers like Spender around. It is a common practice among companies that create software and firmware for embedded devices. Changelogs for new firmware releases often contain little information about security issues, describing the changes as "bugfixes" or "enhancements": we get no CVEs and we don't know how critical the flaws are.

In addition, you may occasionally find a reference to the string 'Ac1db1tch3z' in your code (which means you got a free vulnerability assessment), or your employee Joel might forget to remove a backdoor from the firmware. Diffing the contents of previous firmware versions can help you find out when such backdoors were first installed, modified and/or removed.

Let me introduce Binwally: a simple script that performs directory tree diffing, using Fuzzy Hashing (ssdeep) to compute a matching score between binaries.

Binwally says "no" to Silent Patching

Fuzzy Hashing

Fuzzy Hashing, also known as context triggered piecewise hashing (CTPH), can match inputs that have homologies. Such inputs have sequences of identical bytes in the same order, although the bytes in between these sequences may differ in both content and length. The concept was introduced by Andrew Tridgell, and the best-known tool is ssdeep, created by Jesse Kornblum.
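To make "context triggered" more concrete, here is a minimal, simplified CTPH sketch in Python. It is not ssdeep: it computes a signature at a single block size only, and the names ctph and choose_block_size are hypothetical. It follows spamsum's general structure: a rolling hash over a 7-byte window decides where a chunk ends (the "trigger"), and an FNV-style hash of each chunk contributes one base64 character to the signature. Because chunk boundaries depend only on local content, an edit only changes the characters of the chunks it touches.

```python
import string

B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
WINDOW = 7              # rolling-hash window, in bytes
SIG_LEN = 64            # target number of signature characters
MIN_BLOCK = 3           # smallest block size
MASK = 0xFFFFFFFF       # keep all arithmetic in 32 bits
FNV_PRIME = 0x01000193  # 32-bit FNV prime
HASH_INIT = 0x28021967  # non-zero start value for each chunk hash


def choose_block_size(n: int) -> int:
    """Smallest b = MIN_BLOCK * 2**k such that b * SIG_LEN >= n."""
    b = MIN_BLOCK
    while b * SIG_LEN < n:
        b *= 2
    return b


def ctph(data: bytes, block_size: int = 0) -> str:
    """Simplified CTPH signature "<block size>:<chars>" at a single block size."""
    b = block_size or choose_block_size(len(data))
    window = [0] * WINDOW
    h1 = h2 = h3 = 0  # rolling-hash state
    piece = HASH_INIT  # hash of the current chunk
    piece_len = 0
    sig = []
    for i, c in enumerate(data):
        # Rolling hash over the last WINDOW bytes: only local context matters.
        h2 = (h2 - h1 + WINDOW * c) & MASK
        h1 = (h1 + c - window[i % WINDOW]) & MASK
        window[i % WINDOW] = c
        h3 = ((h3 << 5) & MASK) ^ c
        rolling = (h1 + h2 + h3) & MASK
        # Piecewise (FNV-style) hash of the current chunk.
        piece = ((piece * FNV_PRIME) & MASK) ^ c
        piece_len += 1
        # Trigger: the rolling hash marks a chunk boundary.
        if rolling % b == b - 1:
            sig.append(B64[piece % 64])
            piece, piece_len = HASH_INIT, 0
    if piece_len:  # emit a character for the trailing partial chunk
        sig.append(B64[piece % 64])
    return f"{b}:{''.join(sig)}"
```

With this sketch, appending a byte to a file (as in the foo.txt/bar.txt example) changes at most the last character of the signature, while an MD5 digest changes completely.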

The usage example outlined on ssdeep's homepage summarizes it well:
$ ls -l foo.txt
-rw-r--r--   1 jessekor  jessekor  240 Oct 25 08:01 foo.txt
$ cp foo.txt bar.txt
$ echo 1 >> bar.txt

A cryptographic hashing algorithm like MD5 can't be used to match these files; they have wildly different hashes.
$ md5deep foo.txt bar.txt
7b3e9e08ecc391f2da684dd784c5af7c  /Users/jessekornblum/foo.txt
32436c952f0f4c53bea1dc955a081de4  /Users/jessekornblum/bar.txt

But fuzzy hashing can! We compute the fuzzy hash of one file and use the matching mode to match the other one.
$ ssdeep -b foo.txt > hashes.txt
$ ssdeep -bm hashes.txt bar.txt
bar.txt matches foo.txt (64)

The number at the end of the line is a match score, or a weighted measure of how similar these files are. The higher the number, the more similar the files.
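The score comes from comparing two signatures as strings. Below is a minimal sketch of that idea, loosely modeled on ssdeep: two signatures score 0 unless they share a common 7-character substring; otherwise the score drops as their edit distance grows. The real ssdeep formula weights edit operations differently, compares signatures at two block sizes and caps scores for small inputs, so these numbers won't match ssdeep's output. match_score and its helpers are hypothetical names.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete and substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def has_common_substring(a: str, b: str, k: int = 7) -> bool:
    """True if a and b share at least one substring of length k."""
    chunks = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in chunks for i in range(len(b) - k + 1))


def match_score(sig_a: str, sig_b: str) -> int:
    """Simplified 0-100 similarity of two fuzzy-hash signature strings."""
    if sig_a == sig_b:
        return 100
    if not has_common_substring(sig_a, sig_b):
        return 0  # no shared run of 7 characters: treat as unrelated
    dist = levenshtein(sig_a, sig_b)
    return 100 - (100 * dist) // (len(sig_a) + len(sig_b))
```

For example, match_score("abcdefghij", "abcdefghiX") gives 95 (one substitution over 20 characters), while two signatures with no common 7-character run score 0 no matter how close their lengths are.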

Binwally

Binwally is a simple Python script that uses this concept to diff directory trees in order to find different, unique and matching files, displaying an overall score for the results. It is based on diffall.py from the book Programming Python (4th Edition) and requires python-ssdeep, a Python wrapper for ssdeep (which is written in C). You can download the script from my GitHub, following the link below:


The code is pretty straightforward: it takes two directories (or files) as arguments and reports which files are unique, which match and which differ, along with their match scores. It still needs some improvement (for example, the overall matching score is based on the number of files and doesn't take file sizes into account), but it works fine for what it aims to accomplish.
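The core idea of a directory tree diff fits in a few dozen lines. This is a sketch, not Binwally's code: it walks both trees, skips symlinks, classifies each relative path as unique, "match" (identical bytes) or "differs", and averages the per-file scores into an overall score with unique files counting as 0. That averaging is only an assumption about how a file-count-based score might work. The similarity function is pluggable; by default it uses difflib.SequenceMatcher as a stand-in for ssdeep, which is fine for small text files but slow on large binaries.

```python
import difflib
import os
from typing import Callable, Dict, List, Tuple

Result = Tuple[str, str, int]  # (relative path, status, score 0-100)


def similarity_ratio(a: bytes, b: bytes) -> int:
    """Stand-in 0-100 similarity using difflib (NOT a fuzzy hash)."""
    return int(round(100 * difflib.SequenceMatcher(None, a, b).ratio()))


def list_files(root: str) -> Dict[str, str]:
    """Map relative path -> full path for regular files, skipping symlinks."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            files[os.path.relpath(full, root)] = full
    return files


def tree_diff(dir1: str, dir2: str,
              similarity: Callable[[bytes, bytes], int] = similarity_ratio
              ) -> Tuple[List[Result], float]:
    """Classify every file in either tree and compute an overall score."""
    left, right = list_files(dir1), list_files(dir2)
    results: List[Result] = []
    for rel in sorted(left.keys() | right.keys()):
        if rel not in right:
            results.append((rel, "unique (dir1)", 0))
        elif rel not in left:
            results.append((rel, "unique (dir2)", 0))
        else:
            with open(left[rel], "rb") as f1, open(right[rel], "rb") as f2:
                a, b = f1.read(), f2.read()
            if a == b:
                results.append((rel, "match", 100))
            else:
                results.append((rel, "differs", similarity(a, b)))
    # Assumed scoring: plain average over all files seen in either tree.
    overall = sum(s for _, _, s in results) / len(results) if results else 100.0
    return results, overall
```

With python-ssdeep installed, passing similarity=lambda a, b: ssdeep.compare(ssdeep.hash(a), ssdeep.hash(b)) would swap the difflib stand-in for real fuzzy-hash scores.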


Comparing two directory trees from firmware images unsquashed using Binwalk and firmware-mod-kit:


You can already do this with WinMerge, but it doesn't display a matching score, and it isn't command-line based or scriptable. You can check my previous post describing how to use it to diff firmware images.

Binwally is best used with Binwalk, which is why I'll talk to devttys0 about merging it into his tool (maybe as a new command-line switch under the Binary Diffing options). Binwalk already supports binary diffing (-H switch), but it only compares whole files and firmware images. The problem is that firmware images are usually packed, encrypted and/or compressed. When you unpack them and compare the extracted files and their directory tree, you get much more valuable information. If you disassemble the code and compare the results again, you get even better data; this is what BinDiff from Zynamics/Google does pretty well. The Insinuator blog has a nice example of how to use BinDiff for RE.

Binwally Usage: Dissecting DLink Backdoor Patch

So you may have heard recently that some DLink routers had a backdoor and that a security update was issued to address the vulnerability.

According to Bruce Schneier, we should "Trust but verify", and that's what we are going to do here. First, let's download the backdoored version (v1.13) and the patched version (v1.14) from DLink's FTP server. The next step is to extract the firmware images (binwalk -e DIR100A1_FW114WWB02.bix DIR100_v5.0.0EUb3_patch02.bix) and compare the directory trees using Binwally:

$ python binwally.py _DIR100_v5.0.0EUb3_patch02.bix.extracted/ _DIR100A1_FW114WWB02.bix.extracted/


I removed the matching files and symlinks for readability; the analysis is now narrowed down to a small set of files. According to the release notes, a minor PPPoE dial-up issue was also fixed, which may be why "/bin/pppd" differs.

Some files, such as "/www/Home/bsc_lan.htm", have a matching score of 100 even though their content and MD5 hash differ. This is due to the nature of Fuzzy Hashing: the small modification was not enough to change the fuzzy hash value. It's important to note that files with a "match" result do have identical content and also have a matching score of 100, so a score of 100 alone doesn't prove that two files are identical.


There's a new shell script in the patched 1.14 firmware, located at "/etc/wdhttp.sh". It seems that Joel "do not know how to write sash loop command ugly code":


BusyBox was another binary whose content differed. Running both versions under QEMU shows that they have the same version (v1.0.0-pre2) but different compile dates (2011.09.15 and 2013.10.31).
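If you'd rather not boot the binaries under QEMU, the version and build date can often be read straight from the file, since BusyBox embeds a banner string. The sketch below assumes the banner looks like "BusyBox v<version> (<build date>)". That matches common BusyBox builds but may vary between versions and vendor patches, and busybox_banner is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumed banner layout: "BusyBox v<version> (<build date/time>)"
BANNER_RE = re.compile(rb"BusyBox v([0-9A-Za-z.\-]+) \(([^)\x00]{1,64})\)")


def busybox_banner(data: bytes) -> Optional[Tuple[str, str]]:
    """Return (version, build date) from an embedded banner, or None."""
    m = BANNER_RE.search(data)
    if m is None:
        return None
    return m.group(1).decode("ascii"), m.group(2).decode("ascii", "replace")


def busybox_banner_from_file(path: str) -> Optional[Tuple[str, str]]:
    """Read a binary from disk and look for the banner."""
    with open(path, "rb") as f:
        return busybox_banner(f.read())
```

Running busybox_banner_from_file on /bin/busybox from both extracted trees would show whether the build dates differ without executing anything, assuming the banner is present and unmodified.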



According to devttys0's analysis, the binary "/bin/webs" contained the backdoor function (if you haven't read his analysis yet, read it here). Binwally returned a match score of 0 because it was unable to find similar patterns. The binaries have different sizes and were probably compiled using different toolchains, so their contents sit at different offsets, as shown in the WinMerge diff:
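Why does a byte-level comparison fall apart here? Inserting even a few bytes shifts every later offset, so a naive offset-by-offset comparison flags almost the entire file, while an alignment-based comparison still sees the shared content. The sketch below demonstrates this on synthetic data, using difflib.SequenceMatcher for the alignment; the function names are hypothetical. A different toolchain goes further than shifting offsets, because it changes the instructions themselves, which is why neither approach helps much with /bin/webs.

```python
import difflib
import random


def positional_mismatch(a: bytes, b: bytes) -> float:
    """Fraction of offsets whose bytes differ in a naive offset-by-offset compare."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 1 - same / n


def aligned_similarity(a: bytes, b: bytes) -> float:
    """Similarity after aligning matching runs, regardless of their offsets."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.randrange(256) for _ in range(1024))
    # Simulate a patch that inserts 4 bytes near the start of the file.
    patched = original[:100] + b"\x90\x90\x90\x90" + original[100:]
    print(f"positional mismatch: {positional_mismatch(original, patched):.2f}")
    print(f"aligned similarity:  {aligned_similarity(original, patched):.3f}")
```

In this demo, roughly nine out of ten offsets compare as different, yet the aligned similarity stays above 0.99.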


Starting with v1.3.0 beta, Binwalk can display 3D visualizations of binary data, so let's have a look at how the two binaries differ in 3D:
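As I understand Binwalk's 3D view, it plots groups of consecutive bytes as (x, y, z) points, so files with similar content produce similar point clouds. That reading is an assumption, not a description of Binwalk's code. The sketch below applies the same idea numerically: it turns each file into a set of byte triples and reports their Jaccard similarity (shared points divided by all points). Triples are taken at fixed 3-byte steps, so a shift that isn't a multiple of 3 changes the points even when the content is the same.

```python
from typing import Set, Tuple

Point = Tuple[int, int, int]


def byte_triples(data: bytes) -> Set[Point]:
    """Treat each non-overlapping group of 3 bytes as an (x, y, z) point."""
    return {tuple(data[i:i + 3]) for i in range(0, len(data) - 2, 3)}


def point_cloud_jaccard(a: bytes, b: bytes) -> float:
    """Shared points / all points between the two files' point clouds."""
    pa, pb = byte_triples(a), byte_triples(b)
    if not pa and not pb:
        return 1.0
    return len(pa & pb) / len(pa | pb)
```

A value near 1.0 means the two plots would look alike; a low value matches the visibly different clouds of two builds of the same program.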


It's time to use an approach other than byte comparison and fuzzy hashing. BinDiff uses a graph-theoretical approach to compare executables by identifying identical and similar functions. We first need to analyze both files in IDA to create the required IDB files. After loading both files into BinDiff, we notice a high level of similarity between the call graphs:
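Below is a toy version of one idea that graph-based diffing builds on: matching functions whose flow graphs have the same shape, using a (basic blocks, edges, calls) triple as the fingerprint. BinDiff's real matching combines many more signatures and propagates matches through the call graph, so treat this as an illustration only. The Function record, its fields and the sample data in the usage note are all hypothetical; in practice the counts would come from IDA.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Function:
    name: str
    blocks: int  # basic blocks in the flow graph
    edges: int   # edges in the flow graph
    calls: int   # outgoing calls in the call graph


Fingerprint = Tuple[int, int, int]


def fingerprint(f: Function) -> Fingerprint:
    return (f.blocks, f.edges, f.calls)


def unique_by_fingerprint(funcs: List[Function]) -> Dict[Fingerprint, Function]:
    """Keep only fingerprints owned by exactly one function (drop ambiguous ones)."""
    buckets: Dict[Fingerprint, List[Function]] = defaultdict(list)
    for f in funcs:
        buckets[fingerprint(f)].append(f)
    return {fp: fs[0] for fp, fs in buckets.items() if len(fs) == 1}


def match_functions(old: List[Function], new: List[Function]):
    """Two passes: same name first, then unique structural fingerprint.
    Returns (matches, changed, unmatched_old, unmatched_new)."""
    matches: List[Tuple[Function, Function, str]] = []
    new_by_name = {f.name: f for f in new}
    rest_old = []
    for f in old:
        g = new_by_name.pop(f.name, None)
        if g is not None:
            matches.append((f, g, "name"))
        else:
            rest_old.append(f)
    # Pass 2: pair leftovers whose flow-graph shape is unique on both sides.
    idx_new = unique_by_fingerprint(list(new_by_name.values()))
    for fp, f in unique_by_fingerprint(rest_old).items():
        if fp in idx_new:
            matches.append((f, idx_new[fp], "structure"))
    # Matched but differently shaped: the functions worth inspecting.
    changed = [(f, g) for f, g, _ in matches if fingerprint(f) != fingerprint(g)]
    matched_old = {f.name for f, _, _ in matches}
    matched_new = {g.name for _, g, _ in matches}
    unmatched_old = [f for f in old if f.name not in matched_old]
    unmatched_new = [g for g in new if g.name not in matched_new]
    return matches, changed, unmatched_old, unmatched_new
```

Given made-up inputs where alpha_auth_check exists in both builds but gained blocks and edges, the matcher pairs it by name and lists it under changed, which is exactly the kind of function you would then open in the flow graph view.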


Let's focus on the previously backdoored function "alpha_auth_check":


We can easily spot the difference by displaying the flow graph:


Zooming in (courtesy of NSA):



It seems that Joel's "xmlset_roodkcableoj28840ybtide" is gone; say hello to "iNteLalsEtvaLuewitHoutnAme". And yes, it seems that Joel (and the binaries that can reconfigure the device's settings) can now only access the device from 127.0.0.1 =)


Conclusion

Binary and directory tree diffing is a powerful technique for reverse engineering and for detecting a potential compromise of a system, as long as you have a "known template" to compare against. In the context of embedded systems, it reveals modified files, settings and directories, narrowing the analysis down to a small set of data when comparing different firmware images.

To all the vendors out there: it's important to be transparent about what's being fixed and to tell end users how critical the issues are. And please, leave the backdooring job to the guys who "read the constitution" and are paid for it, OK?