Finding Bugs in the Steem Blockchain with Fuzz Testing

  • Using American Fuzzy Lop on the JSON parsing library contained in the Steem blockchain implementation found a latent bug.
  • Fortunately, this bug is not exploitable in practice, though it may cause Steem to incorrectly report end-of-file on nonconforming JSON input, or crash on a malformed configuration file.

What is Fuzz Testing?

Fuzz testing applies many different inputs to a program or function in order to explore its behavior, potentially inducing crashes or memory access violations. Some fuzz testers are coverage-driven: they instrument the code and use the resulting coverage information when generating new inputs, seeking to maximize the number of paths through the code that they explore. American Fuzzy Lop (AFL) is a well-known fuzz testing framework, with many production bugs to its credit. It is often the standard by which new fuzz testing techniques are measured.
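The coverage feedback loop can be sketched in a few lines. This is a toy and not how AFL is implemented: AFL instruments the program at compile time, whereas here a hypothetical run_target function records hand-assigned branch IDs. The names run_target, mutate, fuzz, and FuzzResult are invented for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy target: reports which branches an input reaches.
// Branch IDs (0..3) are arbitrary labels chosen for this sketch.
std::set<int> run_target(const std::string& input) {
    std::set<int> hit{0};
    if (!input.empty() && input[0] == '{') {
        hit.insert(1);
        if (input.size() > 1 && input[1] == '"') {
            hit.insert(2);
            if (input.size() > 2 && input[2] == '}') hit.insert(3);
        }
    }
    return hit;
}

// Overwrite one randomly chosen byte with a random 7-bit value.
std::string mutate(std::string input, std::mt19937& rng) {
    if (input.empty()) input.push_back(' ');
    const std::size_t pos = rng() % input.size();
    input[pos] = static_cast<char>(rng() % 128);
    return input;
}

struct FuzzResult {
    std::vector<std::string> corpus;  // inputs worth mutating further
    std::set<int> coverage;           // union of all branches reached so far
};

// Coverage-guided loop: a mutated input joins the corpus only if it
// reaches at least one branch that no earlier input reached.
FuzzResult fuzz(const std::string& seed, std::size_t iterations, std::uint32_t rng_seed) {
    std::mt19937 rng(rng_seed);
    FuzzResult r;
    r.corpus.push_back(seed);
    r.coverage = run_target(seed);
    for (std::size_t i = 0; i < iterations; ++i) {
        const std::size_t idx = rng() % r.corpus.size();
        std::string child = mutate(r.corpus[idx], rng);
        bool novel = false;
        for (int branch : run_target(child)) {
            if (r.coverage.insert(branch).second) novel = true;
        }
        if (novel) r.corpus.push_back(std::move(child));
    }
    return r;
}
```

Calling fuzz("abc", 20000, 1) keeps only those mutated inputs that reached a new branch, so the corpus can never hold more entries than there are covered branches. An undirected random fuzzer would discard that feedback and have to guess all three leading bytes at once.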

Steem is a "social blockchain" which allows users to upvote content that has been uploaded to the blockchain in the form of posts or comments. It provides a JSON-RPC based API that was the focus of this investigation; the actual peer-to-peer blockchain traffic uses a different representation. The Steem backend is written in C++ and makes heavy use of Boost libraries.

This article will explain the process of using AFL on a portion of the Steem source code, and investigate the bug discovered.

Fuzz-Testing the Steem JSON Library

The source code for steemd, the Steem blockchain server implementation, is available at https://github.com/steemit/steem. Parsing libraries usually provide a fruitful entry point for fuzzing, so I located the json library, which resides in fc ("Fast-Compiling"), a library of utilities originally written by Dan Larimer. The original code resides here: https://github.com/bytemaster/fc, though many modifications have been made to the version in the Steem repository.

This test covers only a small, relatively accessible portion of the steemd implementation: only one file of about 800 lines, in a total code base of more than 200,000 lines.

1. Create an Instrumented Library

For this first investigation, we instrumented only the FC library. To create a build directory pointing at the FC subdirectory:

mkdir fuzz-testing
cd fuzz-testing
cmake ../steem/libraries/fc/

Within that directory, the C++ compiler should be switched to the instrumenting compiler provided with AFL, for example by editing these entries in the generated CMakeCache.txt:

//CXX compiler
CMAKE_CXX_COMPILER:FILEPATH=/usr/local/bin/afl-g++
//Flags used by the compiler during all build types.
CMAKE_CXX_FLAGS:STRING=-std=gnu++14

The library was successfully compiled in isolation. (Optional features of AFL that check for memory overruns were not used in this investigation.)

2. Create Harness Program

AFL operates on a program that accepts standard input, or a filename to read. The following test harness accepts a filename and uses the FC JSON parser on that file. No validation of the resulting object is attempted, and a parsing exception is acceptable, since many of the examples AFL produces will not be valid JSON!

#include <fc/variant.hpp>
#include <fc/io/json.hpp>
#include <fc/filesystem.hpp>
#include <fc/exception/exception.hpp>
#include <iostream>

using fc::path;
using fc::json;
using fc::variant;

int main( int argc, char *argv[] ) {
  if ( argc < 2 ) {
      return 1;
  }
 
  path inputFile( argv[1] );
  try {
    variant v = json::from_file( inputFile );
  } catch ( fc::exception &e ) {
    std::cout << e.to_string() << "\n";
    return 1;
  }
  return 0;
}

Harness program for fc::json::from_file.

This program was compiled and linked against the instrumented libfc.a.

3. Run AFL With Seed Example and Dictionary

AFL comes with a "dictionary" of JSON elements which help it make modifications to the input which are likely to be well-formed JSON. It must be provided with a corpus of examples to start testing. For this test, the corpus was a single package.json file from a Node library.

While AFL is running, it provides an update on the number of code paths explored, and how many times the program under test crashed or hung. As you can see, AFL found a crash within less than a minute of run time:


AFL output (not in its original glorious color)

Inputs that cause crashes are saved by AFL in a directory so that they can be re-used for analysis.

The Crash

The crash occurred due to excessive memory usage; AFL runs the test harness with a 50MB memory limit by default. Examination of the program behavior under the test case found by AFL revealed that the JSON parser was stuck in the following loop:

   template<typename T>
   fc::string stringFromStream( T& in )

...

         while( true )
         {

            switch( c = in.peek() )
            {
               case '\\':
                  token << parseEscape( in );
                  break;
               case 0x04:
                  FC_THROW_EXCEPTION( parse_error_exception, "EOF before closing '\"' in string '${token}'",
                                                   ("token", token.str() ) );
               case '"':
                  in.get();
                  return token.str();
               default:
                  token << c;
                  in.get();
            }
         }

Direct link to Steem source: https://github.com/steemit/steem/blob/7ebe3f8bddf9e58c943618f55136db6330dd95a0/libraries/fc/src/io/json.cpp#L104

The test case inducing this behavior consisted of an opening quote followed by no closing quote before end of file, for example:

 "Y

The loop above erroneously checks for 0x04 rather than EOF, and as a result the token buffer grows without bound, until the memory limit causes the program to crash. It appears that the intent was to catch "Control-D" from interactive streams, but this does not catch premature end-of-file from file-based streams.
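The difference between checking for 0x04 and checking for end-of-file can be reproduced with the standard library alone. The sketch below is a simplified stand-in for the loop above, not the FC code: it assumes a std::istream, omits escape handling, and adds a hypothetical max_len cap so that the runaway case stops with an exception instead of exhausting memory. The names read_quoted and EofCheck are invented for this example.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Which end-of-input test the loop performs.
enum class EofCheck { Byte0x04Only, StreamEof };

// Reads the body of a quoted string; the opening quote is assumed to have
// been consumed already. max_len is a safety cap added for this sketch only.
std::string read_quoted(std::istream& in, EofCheck mode, std::size_t max_len) {
    std::string token;
    while (true) {
        // peek() returns traits_type::eof() (typically -1) once input is exhausted.
        const int c = in.peek();
        if (mode == EofCheck::StreamEof && c == std::char_traits<char>::eof()) {
            throw std::runtime_error("EOF before closing quote");
        }
        if (c == 0x04) {
            throw std::runtime_error("EOF before closing quote");
        }
        if (c == '"') {
            in.get();
            return token;
        }
        if (token.size() >= max_len) {
            throw std::length_error("token grew past the cap");
        }
        // With Byte0x04Only, the EOF value falls through to here and is
        // appended as if it were an ordinary character, forever.
        token.push_back(static_cast<char>(c));
        in.get();
    }
}
```

With the input Y (opening quote already consumed), the Byte0x04Only mode keeps appending until the cap is reached, while the StreamEof mode reports the missing closing quote as soon as the input runs out.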

Analysis

Providing an invalid string of this form as a JSON-RPC call to a running instance of steemd resulted in a correct error message in the response. This is because in the steemd implementation, the method above is used only with a string stream implementation provided by the FC library, which uses an exception rather than a special return value to signal end-of-file. As a result, malformed input from an HTTP request is correctly handled.
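A small sketch may make the distinction concrete. ThrowingStringStream below is a hypothetical stand-in for a stream that throws at end of input; it is not the FC class, and its interface is assumed. Run through a simplified version of the same loop (escape handling omitted), an unterminated string ends with an exception instead of unbounded growth.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stream that signals end of input by throwing,
// rather than by returning a sentinel value such as -1.
class ThrowingStringStream {
public:
    explicit ThrowingStringStream(std::string data) : data_(std::move(data)) {}

    char peek() const {
        if (pos_ >= data_.size()) throw std::out_of_range("end of stream");
        return data_[pos_];
    }

    char get() {
        const char c = peek();
        ++pos_;
        return c;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Simplified form of the string-reading loop; the opening quote is assumed
// to have been consumed. Note that it never tests for end of input itself.
template <typename Stream>
std::string string_from_stream(Stream& in) {
    std::string token;
    while (true) {
        const char c = in.peek();  // throws here once the input is exhausted
        if (c == 0x04) throw std::runtime_error("EOF before closing quote");
        if (c == '"') {
            in.get();
            return token;
        }
        token.push_back(c);
        in.get();
    }
}
```

Because peek() throws, the loop cannot spin on exhausted input. The same loop paired with a stream that returns a sentinel value has no such protection.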

The JSON parser will incorrectly report an unexpected end of file if the value 0x04 appears in the string. According to the JSON definition, control characters with values less than 0x20 are not allowed within a quoted string, so this appears to be harmless. The FC JSON parser does not enforce this restriction for other control characters, though, such as embedded nulls.
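For comparison, the rule in the JSON grammar is small enough to sketch. The helpers below are illustrative only and are not part of FC; they encode the RFC 8259 rule that bytes below 0x20, the double quote, and the backslash may not appear unescaped inside a string.

```cpp
#include <cstddef>
#include <string>

// True if the byte may appear unescaped inside a JSON string (RFC 8259).
bool allowed_unescaped(unsigned char c) {
    return c >= 0x20 && c != '"' && c != '\\';
}

// Index of the first control character (below 0x20) in the raw body of a
// string, or std::string::npos if there is none.
std::size_t first_control_char(const std::string& body) {
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (static_cast<unsigned char>(body[i]) < 0x20) return i;
    }
    return std::string::npos;
}
```

A strict parser would reject both 0x04 and an embedded null by this rule, which is why treating 0x04 as an error is harmless for conforming input.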

Other uses of the JSON parser within Steem call the same method as the test harness, and so may cause a crash. But these uses are only local, such as configuration or wallet files. A fix is probably appropriate in case a different stream implementation is used in the future which does not share the exception-on-EOF behavior.

In this case, fuzz testing found a bug which had been latent in the code for many years. The original version of FC from 5 years ago does not contain the check for 0x04, but does exhibit the same behavior when used with an input stream that does not signal EOF with an exception: https://github.com/bytemaster/fc/blob/6ac0085e4046de02cee85aca14ffb29307dae0a4/src/io/json.cpp#L64

While other code within the JSON parsing library makes the same assumption that an exception will signal end of file, it appears the quoted string parsing code above is the only case where the -1 reported by in.peek() or in.get() is treated as a valid character.

This bug was reported to Steem's security team on December 2, 2018. Further investigation of the Steem blockchain found a different vulnerability (now patched) which will be discussed in a later article.

Fuzz.ai

Fuzz.ai is an early-stage startup dedicated to making software correctness tools easier to use. Fuzzers, model checkers, and property-based testing can make software more robust, expose security vulnerabilities, and speed development.

While in this case the bug was harmless, similar bugs can lead to security vulnerabilities such as denial-of-service attacks. For high-value software such as blockchain implementations, it is critical to make tools like AFL part of the engineering workflow. Attackers will certainly run them, so defenders must too.