Developers' Guide: How Wax Lost Weight 🏋️

in #hivedev
Authored by @mtyszczak

Hey Hive community! 👋

As promised in this article, below is detailed info on how we did it. Enjoy!

🤓 For the Nerds: Technical Deep Dive

We imagine you might be interested in how we achieved such a significant size improvement in a project as complex as the Wax library (which mixes core Hive C++ code with TypeScript/JavaScript interfaces). Below, we share the key steps and ideas behind this "magic" change.

Our old motto is: It's not a big deal to achieve complex goals by using lots of source code and dependencies—the real trick is simplicity and a minimal amount of code (both source and generated binary) to limit maintenance, resource consumption, and execution costs...

Since the size of the Wax NPM package was mostly dominated by the internally held WASM code, focusing on reducing this part was the obvious decision.

Step 1: Improved emSDK Toolchain

The Emscripten toolchain we use to build the C++ portion of Wax targeting WASM has been improving significantly lately. No surprise there—WASM is one of the cutting-edge technologies these days.
The obvious approach was to upgrade from the 3.0 series and early 4.0 releases we used in the past, which hadn't provided any significant improvements. But...

Yes—there were some changes that allowed us to go further, where previously it was very troublesome.

How did we do it? Well, our infrastructure is mostly based on Docker images, which allow us to efficiently manage our development environments by combining external tool upgrades with our own changes.

We updated the common-ci-configuration module to use a new emsdk Docker image (version 4.0.22 series). Although the upgrade didn't immediately yield low-hanging fruit or instant benefits, it offered—in my humble opinion—something much more important: the ability to analyze WASM binaries using other WASM tools. We had tried this a few months earlier and it was impossible—the produced WASM binary was simply incompatible with third-party WASM tooling, whether built in Rust or elsewhere (like WABT).


Step 2: emsdk 4.0.22 with -Oz Optimized System Libraries

What changed in common-ci-configuration:

  • Now all system libraries (especially Boost, OpenSSL) are compiled with the -Oz optimization level. Some of these libraries are built on our side to provide better compatibility with the underlying C++ code shared directly with the Hive protocol library.
  • This propagates size savings across ALL linked code, not just our source.

The above trivial step—which, honestly, we had missed when we first discovered how much space could be saved (in the direct Wax code build) by using the -Oz switch instead of the default -O3 in Release WASM builds—saved close to a few hundred kilobytes. But the results were still disappointing: the binary still exceeded 5 MB. 😥


Step 3: Additional wasm-opt Binaryen Passes

HA ‼️ The new compiler, thanks to newer underlying Binaryen tools (responsible for binary code generation), also offered the ability to perform post-build optimizations at low risk. This is the big one! We added a custom set of Binaryen optimization passes that run after the initial WASM compilation:

# ts/wasm/src/CMakeLists.txt

DEFINE_WASM_TARGET_FOR( wax
  TARGET_ENVIRONMENT "web"
  LINK_LIBRARIES hive_protocol
  LINK_OPTIONS -sNO_DYNAMIC_EXECUTION=1 -sALLOW_MEMORY_GROWTH=1
+ -sBINARYEN_EXTRA_PASSES="--all-features,-Oz,--strip-debug,--strip-dwarf,--strip-producers,--remove-unused-names,--remove-unused-module-elements,--vacuum,--duplicate-function-elimination,--merge-similar-functions"
)

DEFINE_WASM_TARGET_FOR( wax
  TARGET_ENVIRONMENT "node"
  LINK_LIBRARIES hive_protocol
  LINK_OPTIONS -sALLOW_MEMORY_GROWTH=1
+ -sBINARYEN_EXTRA_PASSES="--all-features,-Oz,--strip-debug,--strip-dwarf,--strip-producers,--remove-unused-names,--remove-unused-module-elements,--vacuum,--duplicate-function-elimination,--merge-similar-functions"
)

That was the first breath of fresh spring! (Even though winter was coming. 😊) 500 kB less! 🍾

Breakdown of each pass:

| Pass | Description |
|------|-------------|
| `--all-features` | Enable all WASM features for maximum optimization opportunities |
| `-Oz` | Optimize aggressively for size |
| `--strip-debug` | Remove debug information |
| `--strip-dwarf` | Remove DWARF debugging data |
| `--strip-producers` | Remove producer section (compiler metadata) |
| `--remove-unused-names` | Strip unnecessary function/variable names |
| `--remove-unused-module-elements` | Dead code elimination at module level |
| `--vacuum` | Remove obviously unneeded code |
| `--duplicate-function-elimination` | Merge identical functions into one |
| `--merge-similar-functions` | Merge functions that differ only slightly |

Result: ~500 kB reduction from this single change! 🎯

But what to do next? The black box.

Yes, this was one of our biggest problems: how could we look inside the WASM binary package, how could we split it logically, and—most importantly—how could we associate it with the source code to answer the question: what led the compiler to produce so much binary code?


Step 4: Black Magic: What's Under the Black Box Cover? Simple Answer: The Evil.

Here we could offer some not-so-kind words about the code generated by the Emscripten compiler—was it guilty of producing tons of trash? Not exactly. We needed an exorcist—a.k.a. a hacker. :-) OK, let's get back down to earth: we needed tools. And we got them—thanks to the compiler upgrade mentioned above.

Due to the standardized format of the produced WASM binary, analysis by third-party tools became possible. twiggy is our hero...

It was the tool that allowed us to peer into the produced binary. And what we saw... was a big surprise. 😃

🔬 How to Analyze WASM Binary Size (Developer Guide)

To make things easier for everyone doing Wax development in our group, we added professional binary analysis tools to our emsdk Docker image to help with optimization efforts.

Available Tools

| Tool | Description | Best For |
|------|-------------|----------|
| `twiggy` | WASM-specific size profiler | Function-level analysis, dominators |
| `bloaty` | Google's binary size profiler | Compile units, DWARF analysis |
| `wasm-objdump` | WABT disassembler | Section breakdown, raw details |
| `wasm-opt` | Binaryen optimizer | Metrics, optimization passes |

Investigation Workflow

Step 1: Get section breakdown

wasm-objdump -h wax.common.wasm

Step 2: Get overall metrics

wasm-opt --all-features --metrics wax.common.wasm

Example output:

[funcs]        : 9405
[memory-data]  : 1777632    # 🔴 Static data - often the biggest culprit!
Try            : 31448      # 🔴 Exception handling blocks
Rethrow        : 24515      # 🔴 Exception rethrowing

Step 3: Find top size consumers

twiggy top -n 50 wax.common.wasm

Example output:

 Shallow Bytes │ Shallow % │ Item
───────────────┼───────────┼──────────────────────
       1114324 ┊    24.42% ┊ data[83]          # 🔴 1.1MB data segment!
        185555 ┊     4.07% ┊ data[0]
         28393 ┊     0.62% ┊ code[174]         # Largest function
         14475 ┊     0.32% ┊ code[4568]

Step 4: Analyze dominators (what removal would save)

twiggy dominators -n 50 wax.common.wasm

Step 5: Look for common bloat patterns

# Search for exception handling, RTTI, Boost, fc library
twiggy top wax.common.wasm | grep -E "(cxa_|typeinfo|boost|fc::|std::)"

# Extract strings to find embedded data
strings wax.common.wasm | sort | uniq -c | sort -rn | head -100

# Find type names, error messages
strings wax.common.wasm | grep -E "hive::|fc::|error|exception" | head -50

YEAH—we're not blind anymore!

Now it was time to figure out what was contained in this huge data segment...

Interpreting Data Segments

You can see large data[N] segments in twiggy output (the example below dumps part of the biggest one):

# Check if it's strings or binary data
wasm-opt --all-features --print wax.common.wasm > wax.wat
grep -A 5 "(data (;83;)" wax.wat | head -20

The question was: what kind of data could be there? Potential answers:

  • Readable strings → Error messages, type names, JSON keys
  • Binary data (escape sequences like \a8\cde) → Crypto tables, lookup tables

A quick conversation with an AI chat about potential sources of such large data segments led us to secp256k1, which has precompiled lookup tables—and fortunately, options to reduce its size:

For crypto tables (like secp256k1), the build step shown below allows reducing precomputation data size by ~1 MB:

# When building secp256k1
./configure --with-ecmult-window=4 --with-ecmult-gen-precision=2

We also created a small local script to improve the workflow and speed up iterations during the optimizations:

Quick Size Check Script

#!/bin/bash
WASM_FILE="$1"

echo "=== Section Sizes ==="
wasm-objdump -h "$WASM_FILE"

echo -e "\n=== Metrics ==="
wasm-opt --all-features --metrics "$WASM_FILE"

echo -e "\n=== Top 20 Size Consumers ==="
twiggy top -n 20 "$WASM_FILE"

echo -e "\n=== Potential Bloat Patterns ==="
twiggy top "$WASM_FILE" | grep -E "(cxa_|typeinfo|boost|fc::|exception)" || echo "None found"

We gained this magical skill to look into the produced binary. But we still needed to associate the binary data with C++ source code parts and calculate/estimate their impact on WASM size. To do this, we simply re-enabled debug symbol generation in our WASM (which made it huge again, but also allowed us to find the sources matching each binary code section).

Here are the results:

Common Bloat Sources in C++ → WASM

| Source | Typical Impact | How to Detect |
|--------|----------------|---------------|
| secp256k1 precomputation tables | 500 kB–1.5 MB | Large `data[N]` segments with binary data |
| C++ exceptions | 100 kB–500 kB+ | Look for `__cxa_throw`, high Try/Rethrow counts |
| RTTI (`dynamic_cast`) | 50 kB–200 kB | Look for `typeinfo`, `__dynamic_cast` |
| iostream/formatting | 200 kB–500 kB | Look for `std::ostream`, `basic_string` |
| FC library reflection | 200 kB–1 MB | Look for `fc::` namespace, string literals |
| Template instantiations | Varies | Repeated patterns with different types |
| Debug symbols/names | 500 kB–2 MB | Check "name" section with `wasm-objdump -h` |

The above analysis and further digging into the C++ code brought us a few other optimization ideas...

Step 5: Centralized Exception Handling in the fc Library

We already knew that enabling exception handling in Emscripten builds led to a significant binary size increase. It was easy to verify by simply removing the dedicated switches from the compiler command line. The problem was that we needed exceptions to make the code work correctly.
To understand why these changes helped, some deeper knowledge is needed—specifically about the ways the compiler chooses to generate exception handler code and perform code generation for C++ templates. Using another great tool, Clang Build Analyzer, showed us exactly how many template instantiations are performed by the compiler. This led us to fc::raw::pack/fc::raw::unpack and the reflection visitor, which had thousands of instantiations. Some of them are eliminated at the end by the linker, but a lot of binary code was still generated from them.

Further quick tests (done in a trivial way by simply eliminating parts of code from such template functions) allowed us to identify the culprit. 👉

The Problem:
The FC_RETHROW_EXCEPTIONS macro was being used inline in template code, specifically in fc::raw::unpack. Since templates are instantiated for every type, this meant:

  • ~3 catch handlers generated per template instantiation
  • Hundreds of template instantiations = thousands of duplicate exception handlers
  • Massive code bloat in the final binary

The Solution:
We introduced the unpack_error_handler class, which moves the exception handling logic from inline template code to a single non-template function whose code is generated only once. You can find its definition here.

Below is an illustration of the idea: how to preserve runtime behavior while eliminating binary code bloat.

// Before: Exception handling duplicated in every template instantiation,
// leading to significant code bloat
template<typename Stream, typename T>
void unpack( Stream& s, T& v, uint32_t, bool limit_is_disabled ) {
    try {
        // unpack logic
        unpack_object_visitor<Stream,T> visitor( v, s, limit_is_disabled );
        fc::reflector<T>::visit( visitor );
    }
    FC_RETHROW_EXCEPTIONS(warn, "error unpacking ${type}",
                          ("type", fc::get_typename<T>::name()))
}

// After: Exception handling centralized—a single exception handler implemented
// inside the `call` method wraps client code passed as a functor (lambda expression)
template<typename Stream, typename T>
void unpack( Stream& s, T& v, uint32_t, bool limit_is_disabled ) {
    unpack_object_visitor<Stream,T> visitor( v, s, limit_is_disabled );
    unpack_error_handler error_handler( fc::get_typename<T>::name() );
    error_handler.call(
        /// Actual code to call: real unpack body
        [&visitor]() { fc::reflector<T>::visit( visitor ); },
        /// Helper code to improve error messaging and provide the name of the
        /// field being processed when the exception was thrown
        [&visitor]() { return visitor.get_field_name(); }
    );
}

Result: Significant reduction in binary size by eliminating redundant exception handling code across all protocol type serialization.
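The error handler class itself isn't shown above. Below is a minimal sketch of the underlying idea only—not the actual fc definition (the class shape, member names, and the use of `std::function` are illustrative assumptions): the try/catch lives in a single non-template method, so the compiler emits the exception-handling machinery exactly once instead of once per template instantiation.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Sketch of a centralized error handler (names assumed, not fc's code).
class unpack_error_handler {
public:
  explicit unpack_error_handler(std::string type_name)
    : type_name_(std::move(type_name)) {}

  // Non-template: the catch block below is generated only once,
  // no matter how many unpack<T> instantiations call it.
  void call(const std::function<void()>& body,
            const std::function<std::string()>& field_name) {
    try {
      body();
    } catch (const std::exception& e) {
      throw std::runtime_error("error unpacking " + type_name_ +
                               " at field " + field_name() + ": " + e.what());
    }
  }

private:
  std::string type_name_;
};
```

The template callers pay only for building two small closures; all the costly landing-pad code is shared.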


Step 6: Excluding Virtual Operations via HIVE_PROTOCOL_SKIP_VOPS

The above template code generation analysis led us to another idea: cutting things down by reducing the set of blockchain operations to include only those that can be part of a transaction.

Virtual operations (vops) are operations generated by the blockchain itself—they're never submitted by users. Examples include author_reward_operation, curation_reward_operation, fill_order_operation, etc.

The Problem:
The hive::protocol::operation type is an fc::static_variant containing ALL operation types. Each type in the variant generates:

  • Serialization/deserialization code
  • Visitor handlers
  • Type metadata

With 43 virtual operation types that Wax never needs to create or sign, that's a lot of dead code!

The Solution:
We added a compile definition that conditionally excludes virtual operations from the operation variant:

# ts/wasm/src/CMakeLists.txt (line 19)
add_compile_definitions(HIVE_PROTOCOL_SKIP_VOPS)

This triggers the #ifndef HIVE_PROTOCOL_SKIP_VOPS guards in the protocol headers:

// core/operations_fwd.hpp

// Regular operations (always included)
transfer_operation,
vote_operation,
comment_operation,
// ... 49 regular operations total

#ifndef HIVE_PROTOCOL_SKIP_VOPS
        ,
        /// virtual operations below this point (EXCLUDED from WASM build!)
        fill_convert_request_operation,
        author_reward_operation,
        curation_reward_operation,
        comment_reward_operation,
        // ... 43 virtual operations total
#endif

Result: Removing 43 types from the static variant eliminates all their associated template instantiations, significantly reducing binary size.


Step 7: Minimal OpenSSL Build for WASM

As is typical in an optimization process, eliminating larger bottlenecks (large binary code sections in our case) uncovered other culprits and made it worthwhile to try optimizing them as well.

OpenSSL is notoriously large, but we only need a small subset of its functionality for cryptographic operations in Wax.

Commit: 3fc2ac36 - Optimize OpenSSL build for minimal WASM size

Disabled Features:

| Category | Disabled |
|----------|----------|
| Protocols | SSL, TLS, DTLS, SSL3 |
| Algorithms | DSA, DH, RC2, RC4, RC5, Blowfish |
| Features | CMS, OCSP, SRP, PSK, timestamps, CT, compression |
| I/O | stdio, sockets |
| Debug | Error strings, auto error init, filenames |

Compiler Flags Applied:

-fvisibility=hidden      # Hide internal symbols
-ffunction-sections      # Place each function in its own section
-fdata-sections          # Place each data item in its own section
--gc-sections            # Garbage collect unused sections at link time

Result: Dramatically smaller OpenSSL footprint—only the essential crypto primitives remain!


🔮 Future Optimization Ideas (Internal Notes)

We're not done yet! Here are potential optimizations we're exploring for future releases:

1. Strip fc Error Messages & File References

Potential savings: ~150 kB

The fc library embeds error messages with __FILE__ and __func__ macros for debugging. Stripping these could save significant space, but we're not ready yet—we don't have all error hashes properly handled for production error reporting.

Note: Just removing __FILE__ and __func__ alone saves only ~1 kB—the real savings come from stripping the full error message strings.

2. LTO (Link-Time Optimization)

Status: ❌ Not viable currently

We tested -flto=thin but it actually increased binary size. This is counterintuitive but can happen when LTO inlines aggressively or when the optimizer makes different trade-offs. This needs more investigation.

3. Replace std::string with a Lightweight Alternative

Potential approach: std::string → Custom class → fc::string

The standard library string implementation carries overhead. A minimal string class tailored for our use cases could reduce both code size and memory footprint.
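One possible direction (purely illustrative—this is not `fc::string` or any planned implementation) is a non-owning view type for the many read-only call sites, which avoids allocation and pulls in far less code than `std::string`:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical lightweight string view (name and API invented for
// illustration): a pointer + length pair with no ownership, no
// allocation, and no iostream/locale machinery dragged in.
class lite_string_view {
public:
  lite_string_view(const char* s) : data_(s), size_(std::strlen(s)) {}
  size_t size() const { return size_; }
  const char* data() const { return data_; }
  bool operator==(const lite_string_view& o) const {
    return size_ == o.size_ && std::memcmp(data_, o.data_, size_) == 0;
  }
private:
  const char* data_;
  size_t size_;
};
```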

4. Replace fc::static_variant with Virtual Visitor Pattern

Proposed by: @small.minion

Instead of using fc::static_variant (which generates template code for every type combination), we could:

  • Create a standard abstract base class
  • Implement the classical virtual visitor pattern
  • Trade some runtime performance for significant code size reduction

This would eliminate the template instantiation explosion that static variants cause.
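As a sketch of the proposed direction (all type names below are invented for illustration, not Hive protocol types), the classic virtual-visitor pattern replaces per-type template expansion with a single vtable dispatch per operation:

```cpp
#include <memory>
#include <vector>

struct operation_visitor;

// One abstract base instead of fc::static_variant over all operation types.
struct operation_base {
  virtual ~operation_base() = default;
  virtual void accept(operation_visitor& v) const = 0;
};

struct transfer_op;
struct vote_op;

// The visitor declares one virtual per operation; no template code is
// generated per (operation, visitor) combination.
struct operation_visitor {
  virtual ~operation_visitor() = default;
  virtual void visit(const transfer_op&) = 0;
  virtual void visit(const vote_op&) = 0;
};

struct transfer_op : operation_base {
  void accept(operation_visitor& v) const override { v.visit(*this); }
};
struct vote_op : operation_base {
  void accept(operation_visitor& v) const override { v.visit(*this); }
};

// Example concrete visitor: counts operations by kind.
struct counter : operation_visitor {
  int transfers = 0, votes = 0;
  void visit(const transfer_op&) override { ++transfers; }
  void visit(const vote_op&) override { ++votes; }
};
```

The cost is a virtual call per visit (and heap-allocated operations), traded for code generated once per visitor rather than once per type combination.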

5. Non-Template Stream Operations

Idea: Replace template<typename Stream> with void*-based streams

Current stream operations are templated, generating separate code for each stream type. Using type-erased streams (via void* or abstract base) would reduce code duplication at the cost of some type safety.
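A minimal sketch of what type-erased streams could look like (the `byte_sink`/`pack_string` names are assumptions for illustration, not fc APIs): one abstract write interface replaces the `template<typename Stream>` parameter, so serialization helpers are compiled once for all sinks.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Abstract sink: erases the concrete stream type behind one interface.
struct byte_sink {
  virtual ~byte_sink() = default;
  virtual void write(const void* data, size_t len) = 0;
};

// One concrete sink backed by a growable buffer.
struct vector_sink : byte_sink {
  std::vector<char> buffer;
  void write(const void* data, size_t len) override {
    const char* p = static_cast<const char*>(data);
    buffer.insert(buffer.end(), p, p + len);
  }
};

// Non-template serializer: emitted once, usable with any byte_sink,
// instead of re-instantiated for every concrete stream type.
inline void pack_string(byte_sink& s, const std::string& v) {
  uint32_t n = static_cast<uint32_t>(v.size());
  s.write(&n, sizeof(n));   // length prefix
  s.write(v.data(), n);     // payload
}
```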

6. Separate binary_view + fc::raw into a Dedicated WASM Module

Potential savings: ~400 kB (estimated, may overlap with other optimizations)

The binary view functionality and raw serialization could be split into a separate WASM module that's loaded on demand. This would:

  • Reduce initial bundle size
  • Allow lazy loading for features not always needed
  • Enable independent optimization of each module

7. Explicit FC_REFLECT Instantiation

Approach: Define all reflection code in a single .cpp file

Currently, FC_REFLECT macros may instantiate template code in multiple translation units. Consolidating all reflection definitions into one compilation unit could enable better dead code elimination and reduce duplicate instantiations.
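The mechanism behind this idea is standard C++ explicit instantiation. A single-file illustration (the `pack`/`my_op` names are made up; in reality the `extern template` line would live in a header and the instantiation in the one dedicated .cpp):

```cpp
#include <cstring>
#include <vector>

// --- "header" part: generic template, visible everywhere ---
template<typename T>
void pack(std::vector<char>& out, const T& v) {
  const char* p = reinterpret_cast<const char*>(&v);
  out.insert(out.end(), p, p + sizeof(T));
}

struct my_op { int amount; };

// Suppress implicit instantiation in every including translation unit:
extern template void pack<my_op>(std::vector<char>&, const my_op&);

// --- the single .cpp part: the only place the code is generated ---
template void pack<my_op>(std::vector<char>&, const my_op&);
```

With one canonical instantiation per reflected type, the linker no longer has to deduplicate copies emitted by many translation units, and unused instantiations are easier to eliminate outright.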

8. Awaiting Other Compiler and WASM Improvements

This is the most tedious part for us. 🤣 I'm joking, of course. 😊 But honestly, our hopes are not unfounded. The last few months brought another WASM standard: WASM 3.0. There is an important note related to exception handling, which is a crucial part of every modern language:

> Previously, there was no efficient way to compile exception handling to Wasm

The introduction of new standardized WASM features gives us hope that this will be done much better. The deeper integration with the JS stack that they promise also looks great.


The programming adventure our task turned into ultimately reduced the Wax package size from 5.6 MB to 3.4 MB (uncompressed).

Happy building! 🐝

@mtyszczak from @thebeedevs