An exploration of ES bindings
The things the JavaScript ecosystem makes you do... You know, for fun!
If today were the best day ever, what would you be doing? I bet "importing CJS scripts in ESM modules" was the answer at the top of your mind. Loved the redundancy in "ESM modules", by the way.
So, let's paint this picture: you've got the daunting task of developing a web bundler akin to webpack or esbuild. Yup, you fancy a challenge. You've made the executive decision to leave all CJS behind, supporting only ESM moving forward. After all, who wouldn't yearn for the browser to start pulling its weight in resolving scripts?
By "supporting only ESM", you mean that you'd want to be spec-compliant and support all ESM-compatible import statements. This means that you'd want this to work:
// some-module.mjs
import { foo, baz, jazz } from 'bar';
import * as Bar from 'bar';
import BarDefault from 'bar'; // a default import — what the spec says should work
That is, effectively, importing from a CJS script that has no ESM exports at all (which is perfectly CJS-compliant):
// bar.js
module.exports = {
  foo: 'foo',
  baz: 'baz',
  jazz: 'jazz'
}
As you are also deeply aware, ESM semantics don't really allow you to do that; this was the moment the ecosystem started its "Great Divide" (there was no such thing, but I dig drama). You can only import things that were exported from another module. The TC39 proposal adds the context and motivation behind why. But comparing this to Cargo (or even Go?) makes evident how crucial a language's module and package management systems are to its success.
Back to our problem: consider a particular CJS package whose exports are defined in the following index.js:
// some_pkg/index.js
Object.defineProperty(exports, 'someValue', { value: 42, writable: true });
exports.setSomeValue = function (value) {
  exports.someValue = value;
};
How would you make those exported names available for an ESM import? Bundlers often wrap the CJS module by adding a runtime function that collects all the exported names and binds them to a local variable, which is then exported as the default export:
import { __commonJS } from './esbuild-runtime.js';
var require_some_pkg = __commonJS({
  "node_modules/some_pkg/index.js"(exports, module) {
    Object.defineProperty(exports, 'someValue', { value: 42, writable: true });
    exports.setSomeValue = function (value) {
      exports.someValue = value;
    };
  }
})
export default require_some_pkg();
This is the approach esbuild takes, and it works pretty well. Primarily because default imports work "for free", and namespace imports kinda work. "Kinda" because the module is nested in a default property:
import somepkg from 'some_pkg';
console.dir(somepkg);
// Object{setSomeValue: [Function: setSomeValue], default: {someValue: 42}}
import * as somepkg from 'some_pkg';
console.dir(somepkg);
// Module { default: Object { setSomeValue: [Function: setSomeValue], default: { someValue: 42 } } }
Well, the job is done! (Checks scrollbar and is halfway through...) Let's try this now:
import { someValue } from 'some_pkg';
// Uncaught SyntaxError: The requested module '/src/somepkg/index.js' does not provide an export named 'someValue'
🤡
A SyntaxError? That is what is thrown when a parsing error happens. Right?
This exposes a bit of how ESM imports are implemented: module resolution happens before any code runs, so the engine can figure out whether each import can be satisfied. This particular error is an early error, since this is not a dynamic import, i.e. one that is only resolvable at runtime.
In other words, before evaluation, the engine parses all dependencies to determine the names exported from each module. In our case, someValue was not exported from some_pkg/index.js, and that is treated as incorrect syntax rather than incorrect (module) semantics, which would only be detected at a later stage.
Back to our problem, how could we fix this? To make things even more interesting, let's add some constraints:
- We can't edit the importer, only the exporter, i.e. we can't change the ESM code that is importing stuff, only the code that exports stuff
- We don't have control over how the entire imported module's package is being bundled, e.g. whether all of its modules are bundled in a single chunk or not
One naive, but perhaps good enough, approach:
- While traversing all source code, for each module, look for imports of `some_pkg`
- List all named imports of `some_pkg` found
- While building `some_pkg` (specifically, during the transformation step), explicitly add an export statement covering every named import found, e.g. `export { namedImport, fooImport, barImport }`
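The collection step above can be sketched in a few lines. This is a deliberately naive, hypothetical pass that scans raw source strings with a regex; a real bundler would walk the AST instead:

```javascript
// Hypothetical module sources; a real bundler would get these from its graph.
const sources = [
  `import { someValue } from 'some_pkg';`,
  `import { someValue, setSomeValue } from 'some_pkg';`,
];

// Collect every named import of `some_pkg` across all modules, deduplicated.
const namedImports = new Set();
const importRe = /import\s*\{([^}]*)\}\s*from\s*['"]some_pkg['"]/g;

for (const src of sources) {
  for (const match of src.matchAll(importRe)) {
    match[1].split(',').forEach((name) => namedImports.add(name.trim()));
  }
}

// The export statement to inject while transforming `some_pkg`:
console.log(`export { ${[...namedImports].join(', ')} };`);
// → export { someValue, setSomeValue };
```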
Extending our previous example, this would work out like this:
import { __commonJS } from './esbuild-runtime.js';
var require_some_pkg = __commonJS({
  "node_modules/some_pkg/index.js"(exports, module) {
    Object.defineProperty(exports, 'someValue', { value: 42, writable: true });
    exports.setSomeValue = function (value) {
      exports.someValue = value;
    };
  }
})
--- export default require_some_pkg();
+++ var __tmp = require_some_pkg();
+++ export default __tmp;
+++ var someValue = __tmp.someValue;
+++ var setSomeValue = __tmp.setSomeValue;
+++ export { someValue, setSomeValue };
And that works! Well, as long as we bundle all content of some_pkg in the same chunk.
import { someValue } from 'some_pkg';
console.log(someValue); // 42
Yeah. That works! I think we are done here. Or are we?
import { someValue, setSomeValue } from 'some_pkg';
console.log(someValue); // 42
setSomeValue(43);
console.log(someValue); // 42 :(
Oops. Because we assigned someValue to a local variable (dereferenced it), the local variable is a different reference than the one that setSomeValue is effectively mutating. In our example, the former is someValue and the latter is __tmp.someValue.
If only we could do this...
export { __tmp.someValue as someValue, __tmp.setSomeValue as setSomeValue };
But that is not legal JavaScript.
To recap: when we call setSomeValue, we are mutating exports.someValue, which is the binding setSomeValue references and which is accessible through __tmp.someValue. We are not mutating the local variable someValue in the bundled output's module, which is what the importer actually reads.
One way to solve this is to try to intercept access to the property by installing a getter/setter pair with Object.defineProperty:
import { __commonJS } from './chunk-2J7J7Y4O.js';
var require_some_pkg = __commonJS({/* omitted for brevity */})
var __tmp = require_some_pkg();
var _someValue = __tmp.someValue; // backing slot for the accessor below
Object.defineProperty(__tmp, 'someValue', {
  get() {
    return _someValue;
  },
  set(v) {
    _someValue = v;
  }
})
var someValue = __tmp.someValue;
var setSomeValue = __tmp.setSomeValue;
export { someValue, setSomeValue };
That works! Until you bump into packages that are also using Object.defineProperty to define their exports. And, as you may be aware, Object.defineProperty throws a TypeError if you try to redefine a property that was defined as non-configurable (which is the default).
Considering that aliasing simply won't work here, because we need to be able to mutate the original binding and we don't own it, we need a different approach.
I ended up using a Proxy to intercept property access at the time the exports names (properties) are being extracted. That happens in esbuild's __commonJS util.
If you try to read that function in its minified form, it can take a few minutes to understand what it is doing. We already know where it is used: it is a wrapper around CJS exports. Let's unpack it:
function __commonJS_r(cb, mod) {
  return () => {
    // bail early if we already collected the exports of this module
    if (mod) return mod.exports
    // in our example, firstProp resolves to "node_modules/some_pkg/index.js"
    var firstProp = __getOwnPropNames(cb)[0]
    var importCallback = cb[firstProp]
    // the import callback runs the wrapped CJS module, passing `mod.exports`,
    // so every property set on the CJS `exports` object lands on `mod.exports`
    importCallback((mod = { exports: {} }).exports, mod)
    return mod.exports
  }
}
We would need to intercept that mod.exports object. We can do that by using a Proxy:
// live bindings that are being exported
let __someValue, __setSomeValue;
// the proxy target is a fresh object standing in for the CJS `exports` object
var genProxy = () => new Proxy({}, {
  // whenever we try to read a property that is a known named export,
  // we return the value of the local variable that holds the live binding
  get(target, prop) {
    if (prop === 'someValue') {
      return __someValue;
    }
    return target[prop];
  },
  // whenever a known named export is assigned, keep the live binding in sync
  set(target, prop, value) {
    if (prop === 'someValue') {
      __someValue = value;
    }
    return Reflect.set(target, prop, value);
  },
  // CJS modules may define exports via Object.defineProperty;
  // keep the live binding in sync in that case too
  defineProperty(target, prop, descriptor) {
    if (prop === 'someValue' && 'value' in descriptor) {
      __someValue = descriptor.value;
    }
    return Reflect.defineProperty(target, prop, descriptor);
  }
})
And we hook that proxy whenever we are about to extract the named exports from the CJS module:
var __tmpCommonJS = (cb, mod) => () => {
  if (mod) return mod.exports
  var firstProp = __getOwnPropNames(cb)[0]
  var importCallback = cb[firstProp]
  importCallback((mod = { exports: genProxy() }).exports, mod)
  // ^ instead of a plain object, we bind the named exports to the proxy
  return mod.exports
};
Stitching everything together, the final solution looks a bit like this. It's simplified for the sake of "brevity"; given that we'd have dynamic named exports, things can be made more generic, though less readable.
// live bindings that are being exported
let __someValue, __setSomeValue;
// the proxy target is a fresh object standing in for the CJS `exports` object
var genProxy = () => new Proxy({}, {
  // whenever we try to read a property that is a known named export,
  // we return the value of the local variable that holds the live binding
  get(target, prop) {
    if (prop === 'someValue') {
      return __someValue;
    }
    return target[prop];
  },
  // whenever a known named export is assigned, keep the live binding in sync
  set(target, prop, value) {
    if (prop === 'someValue') {
      __someValue = value;
    }
    return Reflect.set(target, prop, value);
  },
  // CJS modules may define exports via Object.defineProperty;
  // keep the live binding in sync in that case too
  defineProperty(target, prop, descriptor) {
    if (prop === 'someValue' && 'value' in descriptor) {
      __someValue = descriptor.value;
    }
    return Reflect.defineProperty(target, prop, descriptor);
  }
})
var __tmpCommonJS = (cb, mod) => () => {
  if (mod) return mod.exports
  var firstProp = __getOwnPropNames(cb)[0]
  var importCallback = cb[firstProp]
  importCallback((mod = { exports: genProxy() }).exports, mod)
  // ^ instead of a plain object, we bind the named exports to the proxy
  return mod.exports
};
var require_some_pkg = __tmpCommonJS({/* omitted for brevity*/})
var __tmp = require_some_pkg();
// initialize the live bindings from the proxy
__someValue = __tmp.someValue;
__setSomeValue = __tmp.setSomeValue;
export { __someValue as someValue, __setSomeValue as setSomeValue };
Can you spot a few problems with the solution above? Yeah, a few drawbacks:
Aliasing (or renaming) when re-exporting is not supported. Consider the scenario where somepkg imports someValue from both somepkg/a.js and somepkg/b.js:
- Assume that it re-exports the name `someValue` from `somepkg/a.js`
- Assume that it re-exports the name `someValue` from `somepkg/b.js`, aliasing it to `anotherValue`
- Our proxy solution would shadow the first `someValue` binding with the second one, and we would end up with `undefined` when trying to access `import { someValue } from 'somepkg'`
Direct imports into code-split packages are not supported. Consider the scenario where somepkg's code was split into three bundles: sp-chunk-a.js, sp-chunk-b.js and sp-index.js:
- Assume that someValue was defined in sp-chunk-a.js and imported in sp-index.js (to be re-exported as one of somepkg's module exports)
- Assume that one user source file (say src-a.js) has a direct import: `import { someValue } from 'somepkg/chunk-a.js'`
- Assume that another user source file (say src-b.js) has a package import: `import { someValue, setSomeValue } from 'somepkg'`
- Now assume that src-b.js calls setSomeValue('foo') and then console.log(someValue); its output would be foo
- But if src-a.js then calls console.log(someValue), its output would be 42 (the initial value)
Because of these, this is not the final solution. That one is left as an exercise to the reader 🙂
This was a fun exploration of how bindings work in JS, and of how bundlers need to go out of their way to make them work in a way that is compatible with the multitude of module specs in the wild.