Debugging with Sanitizers

Warning

Sanitizers can only be used with the upstream wasm backend. If you are using fastcomp, you need to upgrade before sanitizers will work. See here for instructions.

Undefined Behaviour Sanitizer

Clang’s undefined behavior sanitizer (UBSan) is available for use with Emscripten. This makes it much easier to catch bugs in your code.

To use UBSan, simply pass -fsanitize=undefined to emcc or em++.

Catching Null Dereference

By default, with Emscripten, deferencing a null pointer does not immediately cause a segmentation fault, unlike traditional platforms. Instead, it checks a magic cookie stored at address 0 at the end of the program execution.

This only detects null pointer writes, and not reads, and it’s rather difficult to find where the null pointer write occurred.

Emscripten provides a SAFE_HEAP mode, which can be activated by running emcc with -s SAFE_HEAP=1. This will catch both null pointer reads and writes, and causes an exception, but it is slower, and does not tell you the exact line numbers unless you compile with -g.

UBSan will tell you exactly where the null deference happened, and works for both reads and writes, with much less performance penalty than SAFE_HEAP.

Consider the following program, null-assign.c:

int main(void) {
    int *a = 0;
    *a = 0;
}

Without UBSan or SAFE_HEAP, you get an error when the program exits:

$ emcc null-assign.c
$ node a.out.js
Runtime error: The application has corrupted its heap memory area (address zero)!

With UBSan, you get the exact line number where this happened:

$ emcc -fsanitize=undefined null-assign.c
$ node a.out.js
null-assign.c:3:5: runtime error: store to null pointer of type 'int'
Runtime error: The application has corrupted its heap memory area (address zero)!

Consider the following program, null-read.c:

int main(void) {
    int *a = 0, b;
    b = *a;
}

Without UBSan or SAFE_HEAP, there is no feedback:

$ emcc null-read.c
$ node a.out.js
$

With UBSan, you get the exact line number where this happened:

$ emcc -fsanitize=undefined null-assign.c
$ node a.out.js
null-read.c:3:9: runtime error: load of null pointer of type 'int'

Minimal Runtime

UBSan’s runtime is non-trivial, and its use can unnecessarily increase the attack surface. For this reason, there is a minimal UBSan runtime that is designed for production uses.

The minimal runtime is supported by Emscripten. To use it, pass the flag -fsanitize-minimal-runtime in addition to your -fsanitize flag.

$ emcc -fsanitize=null -fsanitize-minimal-runtime null-read.c
$ node a.out.js
ubsan: type-mismatch
$ emcc -fsanitize=null -fsanitize-minimal-runtime null-assign.c
$ node a.out.js
ubsan: type-mismatch
Runtime error: The application has corrupted its heap memory area (address zero)!

Address Sanitizer

Clang’s address sanitizer (ASan) is also available for use with Emscripten. This makes it much easier to catch buffer overflows, memory leaks, and other related bugs in your code.

To use ASan, simply pass -fsanitize=address to emcc or em++.

You probably need to increase TOTAL_MEMORY to at least 64 MB or pass -s ALLOW_MEMORY_GROWTH so that ASan has enough memory to start. Otherwise, you will receive an error message that looks something like:

Cannot enlarge memory arrays to size 55152640 bytes (OOM). Either (1) compile with -s TOTAL_MEMORY=X with X higher than the current value 50331648, (2) compile with -s ALLOW_MEMORY_GROWTH=1 which allows increasing the size at runtime, or (3) if you want malloc to return NULL (0) instead of this abort, compile with -s ABORTING_MALLOC=0

ASan fully supports multi-thread environments.

Examples

Here are some examples of how AddressSanitizer can be used to help find bugs.

Buffer Overflow

Consider buffer_overflow.c:

#include <string.h>

int main(void) {
  char x[10];
  memset(x, 0, 11);
}
$ emcc -g4 -fsanitize=address -s ALLOW_MEMORY_GROWTH buffer_overflow.c
$ node a.out.js
=================================================================
==42==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x02965e5a at pc 0x000015f0 bp 0x02965a30 sp 0x02965a30
WRITE of size 11 at 0x02965e5a thread T0
    #0 0x15f0 in __asan_memset+0x15f0 (a.out.wasm+0x15f0)
    #1 0xc46 in __original_main stack_buffer_overflow.c:5:3
    #2 0xcbc in main+0xcbc (a.out.wasm+0xcbc)
    #3 0x800019bc in Object.Module._main a.out.js:6588:32
    #4 0x80001aeb in Object.callMain a.out.js:6891:30
    #5 0x80001b25 in doRun a.out.js:6949:60
    #6 0x80001b33 in run a.out.js:6963:5
    #7 0x80001ad6 in runCaller a.out.js:6870:29

Address 0x02965e5a is located in stack of thread T0 at offset 26 in frame
    #0 0x11  (a.out.wasm+0x11)

  This frame has 1 object(s):
    [16, 26) 'x' (line 4) <== Memory access at offset 26 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (a.out.wasm+0x15ef)
...

Use After Free

Consider use_after_free.cpp:

int main() {
  int *array = new int[100];
  delete [] array;
  return array[0];
}
$ em++ -g4 -fsanitize=address -s ALLOW_MEMORY_GROWTH use_after_free.cpp
$ node a.out.js
=================================================================
==42==ERROR: AddressSanitizer: heap-use-after-free on address 0x03203e40 at pc 0x00000c1b bp 0x02965e70 sp 0x02965e7c
READ of size 4 at 0x03203e40 thread T0
    #0 0xc1b in __original_main use_after_free.cpp:4:10
    #1 0xc48 in main+0xc48 (a.out.wasm+0xc48)

0x03203e40 is located 0 bytes inside of 400-byte region [0x03203e40,0x03203fd0)
freed by thread T0 here:
    #0 0x5fe8 in operator delete[](void*)+0x5fe8 (a.out.wasm+0x5fe8)
    #1 0xb76 in __original_main use_after_free.cpp:3:3
    #2 0xc48 in main+0xc48 (a.out.wasm+0xc48)
    #3 0x800019b5 in Object.Module._main a.out.js:6581:32
    #4 0x80001ade in Object.callMain a.out.js:6878:30
    #5 0x80001b18 in doRun a.out.js:6936:60
    #6 0x80001b26 in run a.out.js:6950:5
    #7 0x80001ac9 in runCaller a.out.js:6857:29

previously allocated by thread T0 here:
    #0 0x5db4 in operator new[](unsigned long)+0x5db4 (a.out.wasm+0x5db4)
    #1 0xb41 in __original_main use_after_free.cpp:2:16
    #2 0xc48 in main+0xc48 (a.out.wasm+0xc48)
    #3 0x800019b5 in Object.Module._main a.out.js:6581:32
    #4 0x80001ade in Object.callMain a.out.js:6878:30
    #5 0x80001b18 in doRun a.out.js:6936:60
    #6 0x80001b26 in run a.out.js:6950:5
    #7 0x80001ac9 in runCaller a.out.js:6857:29

SUMMARY: AddressSanitizer: heap-use-after-free (a.out.wasm+0xc1a)
...

Memory Leaks

Consider leak.cpp:

int main() {
  new int[10];
}
$ em++ -g4 -fsanitize=address -s ALLOW_MEMORY_GROWTH -s EXIT_RUNTIME leak.cpp
$ node a.out.js

=================================================================
==42==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x5ce5 in operator new[](unsigned long)+0x5ce5 (a.out.wasm+0x5ce5)
    #1 0xb24 in __original_main leak.cpp:2:3
    #2 0xb3a in main+0xb3a (a.out.wasm+0xb3a)
    #3 0x800019b8 in Object.Module._main a.out.js:6584:32
    #4 0x80001ae1 in Object.callMain a.out.js:6881:30
    #5 0x80001b1b in doRun a.out.js:6939:60
    #6 0x80001b29 in run a.out.js:6953:5
    #7 0x80001acc in runCaller a.out.js:6860:29

SUMMARY: AddressSanitizer: 40 byte(s) leaked in 1 allocation(s).

Note that since leak checks take place at program exit, you must use -s EXIT_RUNTIME, or invoke __lsan::DoLeakCheck manually.

You can detect that AddressSanitizer is enabled and run __lsan::DoLeakCheck by doing:

#if defined(__has_feature)
#if __has_feature(address_sanitizer)
  // code for ASan-enabled builds
  __lsan::DoLeakCheck();
#endif
#endif

Also, if you only want to check for memory leaks, you may use -fsanitize=leak instead of -fsanitize=address. -fsanitize=leak does not instrument all memory accesses, and as a result is much faster than -fsanitize=address.

Use After Return

Consider use_after_return.c:

#include <stdio.h>

const char *__asan_default_options() {
  return "detect_stack_use_after_return=1";
}

int *f() {
  int buf[10];
  return buf;
}

int main() {
  *f() = 1;
}

Note that to do this check, you have to use the ASan option detect_stack_use_after_return. You may enable this option by declaring a function called __asan_default_options like the example, or you can define Module['ASAN_OPTIONS'] = 'detect_stack_use_after_return=1' in the generated JavaScript. --pre-js is helpful here.

This option is fairly expensive because it converts stack allocations into heap allocations, and these allocations are not reused so that future accesses can cause traps. Hence, it is not enabled by default.

$ emcc -g4 -fsanitize=address -s ALLOW_MEMORY_GROWTH=1 use_after_return.c
$ node a.out.js
=================================================================
==42==ERROR: AddressSanitizer: stack-use-after-return on address 0x02a95010 at pc 0x00000d90 bp 0x02965f70 sp 0x02965f7c
WRITE of size 4 at 0x02a95010 thread T0
    #0 0xd90 in __original_main use_after_return.c:13:10
    #1 0xe0a in main+0xe0a (a.out.wasm+0xe0a)

Address 0x02a95010 is located in stack of thread T0 at offset 16 in frame
    #0 0x11  (a.out.wasm+0x11)

  This frame has 1 object(s):
    [16, 56) 'buf' (line 8) <== Memory access at offset 16 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-return (a.out.wasm+0xd8f)
...

Configuration

ASan can be configured via a --pre-js file:

Module.ASAN_OPTIONS = 'option1=a:option2=b';

For example, put the above snippet with your options into asan_options.js, and compile with --pre-js asan_options.js.

For standalone LSan, use Module.LSAN_OPTIONS instead.

For a detailed understanding of the flags, see the ASan documentation. Please be warned that most flag combinations are not tested and may or may not work.

Disabling malloc/free Stack Traces

In a program that uses malloc/free (or their C++ equivalent, operator new/operator delete) very frequently, taking a stack trace at all invocations to malloc/free can be very expensive. As a result, if you find your program to be very slow when using ASan, you can try using the option malloc_context_size=0, like this:

Module.ASAN_OPTIONS = 'malloc_context_size=0';

This prevents ASan from reporting the location of memory leaks, as well as offer insight into where the memory for a heap-based memory error originated, but may provide tremendous speed ups.