This article provides information that is relevant to people who want to contribute to Emscripten. We welcome contributions from anyone that is interested in helping out!
The information will be less relevant if you’re just using Emscripten, but may still be of interest.
For contributing to core Emscripten code, such as emcc.py, you don’t need to build any binaries, as emcc.py is written in Python and the core JS generation is in JavaScript. You do still need binaries for LLVM and Binaryen, which you can get using the emsdk:
emsdk install tot
emsdk activate tot
This will install the latest “tip-of-tree” binaries needed to run Emscripten. You can use these emsdk-provided binaries with a git checkout of the Emscripten repository. To do this, you can either edit your local .emscripten config file, or set EM_CONFIG=/path/to/emsdk/.emscripten in your environment.
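As a rough illustration of the environment-variable route, the sketch below builds an environment with EM_CONFIG set, which could then be passed along when launching emcc from a git checkout. The helper name env_with_em_config and the paths are hypothetical; only EM_CONFIG and the .emscripten file name come from the text above.

```python
import os


def env_with_em_config(emsdk_root, base_env=None):
    """Return a copy of the environment with EM_CONFIG pointing at the emsdk config."""
    env = dict(os.environ if base_env is None else base_env)
    # The emsdk-provided config file is <emsdk>/.emscripten, as described above.
    env["EM_CONFIG"] = os.path.join(emsdk_root, ".emscripten")
    return env


# Hypothetical usage (paths are placeholders, not real locations):
#   subprocess.run(["./emcc", "hello.c"], cwd="/path/to/emscripten",
#                  env=env_with_em_config("/path/to/emsdk"), check=True)
```

Copying the environment rather than mutating os.environ keeps the setting local to the commands you launch with it.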
If you do want to contribute to LLVM or Binaryen, or to test modifications to them, you can build them from source.
The Emscripten main repository is https://github.com/emscripten-core/emscripten.
Aside from the Emscripten repo, the other codebases of interest are LLVM and Binaryen, which Emscripten invokes and which have their own repos.
Patches should be submitted as pull requests in the normal way on GitHub.
When submitting patches, please:
- Add an automatic test if you add any new functionality or fix a bug. Search test/*.py for related tests, as often the simplest thing is to add to an existing one. If you’re not sure how to test your code, feel free to ask.
- Pay attention to our coding style specified in .clang-format.
We normally squash and merge PRs, which means the PR turns into a single commit on the target branch. Because of that, it’s ok to have merge commits in the PR itself, as they get removed. Please put a good description for the final commit in the PR description, and we’ll use it when squashing.
One of the core developers will review a pull request before merging it. If several days pass without any comments on your PR, please comment on the PR, which will ping them. (If that happens, sorry! Sometimes things get missed.)
The Emscripten Compiler Frontend (emcc) is a Python script that manages the entire compilation process:
- emcc calls Clang to compile C++ and wasm-ld to link it. It builds and integrates with the Emscripten system libraries, both the compiled ones and the ones implemented in JS.
- emcc then calls emscripten.py, which performs the final transformation to Wasm (including invoking wasm-emscripten-finalize from Binaryen) and calls the JS compiler (see src/compiler.js and related files), which emits the JS.
- If optimizing Wasm, emcc will then call wasm-opt, run meta-dce, and do other useful things. It will also run JS optimizations on the JS that is emitted alongside the Wasm.
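To make the ordering of these steps easier to see, here is a purely illustrative model of the pipeline as a list. It is not how emcc is implemented; the function name emcc_stages and the stage descriptions are just a summary of the steps above.

```python
def emcc_stages(optimize):
    """List the high-level steps described above, in order (illustrative only)."""
    stages = [
        "clang: compile C++",
        "wasm-ld: link, integrating the Emscripten system libraries",
        "emscripten.py: final Wasm transformation (wasm-emscripten-finalize)",
        "src/compiler.js: emit the JS",
    ]
    if optimize:
        # These steps only happen when optimizing.
        stages += ["wasm-opt", "meta-dce", "JS optimizations"]
    return stages


for stage in emcc_stages(optimize=True):
    print(stage)
```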
Emscripten has a comprehensive test suite, which covers virtually all Emscripten functionality. These tests are run on CI automatically when you create a pull request, and they should all pass. If you run into trouble with a test failure you can’t fix, please let the developers know.
If you find a regression, bisection is often the fastest way to figure out what went wrong. This is true not just for finding an actual regression in Emscripten but also if your project stopped working when you upgraded, and you need to investigate whether it’s an Emscripten regression or something else. The rest of this section covers bisection on Emscripten itself. It is hopefully useful both for people using Emscripten and for Emscripten developers.
If you have a large bisection range - for example, that covers more than one version of Emscripten - then you probably have changes across multiple repos (Emscripten, LLVM, and Binaryen). In that case the easiest and fastest thing is to bisect using emsdk builds. Each step of the bisection will download a build produced by the emscripten releases builders. Using this approach you don’t need to compile anything yourself, so it can be very fast!
To do this, you need a basic understanding of Emscripten’s release process. The key idea is that:
emsdk install [HASH]
can install an arbitrary build of Emscripten from any point in the past (assuming the build succeeded). Each build is identified by a hash (a long string of numbers and characters), which is the hash of a commit in the releases repo. The mapping of Emscripten release numbers to such hashes is tracked by emscripten-releases-tags.json in the emsdk repo.
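If you only know version numbers, a lookup along these lines can turn them into hashes. This is a sketch that assumes the JSON file has a top-level "releases" object mapping version numbers to hashes; check the file in your emsdk checkout, as its layout may differ. The helper names are hypothetical.

```python
import json


def load_tags(path):
    """Read emscripten-releases-tags.json from an emsdk checkout."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def hash_for_version(tags, version):
    """Look up the releases-repo hash recorded for an Emscripten version number."""
    # Assumption: tags["releases"] maps version strings to commit hashes.
    releases = tags["releases"]
    if version not in releases:
        raise KeyError(f"no build hash recorded for version {version}")
    return releases[version]
```

Calling hash_for_version(load_tags("/path/to/emsdk/emscripten-releases-tags.json"), VERSION) for the last known good and first known bad versions gives the two endpoints for the bisection.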
With that background, the bisection process would look like this:
- Find the hashes to bisect between. You may already know them if you found the problem on tot builds. If instead you only know Emscripten version numbers, use emscripten-releases-tags.json to find the hashes.
- Using those hashes, do a normal git bisect on the emscripten-releases repo.
- In each step of the bisection, download the binary build for the current commit hash (in the emscripten-releases repo that you are bisecting on) using emsdk install HASH. Then test your code and do git bisect good or git bisect bad accordingly, and keep bisecting until you find the first bad commit.
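The steps above can be partly automated. The sketch below shows one possible shape for a single bisection step; it is not an official tool. The function names are hypothetical, the emsdk activate call is an assumption (the text above only mentions emsdk install HASH), and the exit codes follow the convention of git bisect run (0 for good, 125 for skip, other small non-zero values for bad).

```python
import subprocess


def current_commit(releases_repo):
    """Return the commit hash currently checked out in the releases repo."""
    out = subprocess.run(["git", "rev-parse", "HEAD"], cwd=releases_repo,
                         check=True, capture_output=True, text=True)
    return out.stdout.strip()


def install_commands(emsdk, commit_hash):
    """Commands that fetch the prebuilt binaries for one bisection step."""
    # "emsdk install HASH" comes from the text; the activate step is an assumption.
    return [[emsdk, "install", commit_hash], [emsdk, "activate", commit_hash]]


def bisect_exit_code(test_passed, build_available=True):
    """Map a step's outcome to an exit code that `git bisect run` understands."""
    if not build_available:
        return 125  # ask git bisect to skip this commit
    return 0 if test_passed else 1


def run_step(releases_repo, emsdk, test_cmd):
    """Install the build for the current commit, run the test, return an exit code."""
    commit = current_commit(releases_repo)
    for cmd in install_commands(emsdk, commit):
        if subprocess.run(cmd).returncode != 0:
            # No usable build for this commit (for example, the build failed).
            return bisect_exit_code(False, build_available=False)
    return bisect_exit_code(subprocess.run(test_cmd).returncode == 0)
```

A small wrapper script that calls run_step and passes its result to sys.exit could then be handed to git bisect run inside the emscripten-releases checkout. The test command needs to pick up the newly installed build, for example through EM_CONFIG.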
The first bad commit is a single change in the releases repo. That commit will generally update a single sub-repo (Emscripten, LLVM, or Binaryen) to add one or more new changes. Often that list will be very short or even a single commit, and you can see which actual commit caused the problem. When filing a bug, mentioning such a bisection result can greatly speed things up (even if that commit contains multiple changes).
If that commit contains multiple changes then you can optionally bisect further on the specific repo (as all the changes will normally be in just one of them, with the others kept fixed). Doing this will require rebuilding locally, which was not needed in the main bisection described in this section.
If you change the layout of C structs or modify C defines that are used in JS library code, you may also need to update the struct info file that lists them. Any time that file is modified or a struct layout is changed, you will need to run ./tools/gen_struct_info.py to re-generate the information used by the JS libraries. The test_gen_struct_info test will fail if you forget to do this.