Optimizing Graphing Performance on the Web
My personal quest for the ultimate Mandelbrot set viewer
For me to stay engaged as an engineer, I need to try new things. Programming is fun when there’s a puzzle to solve or a new way to express an idea. When I’m trying to learn a new tool, I’ll often go searching for a problem that I can use it on. Needless to say, when I try to find a project for an unfamiliar tool (rather than vice-versa), I often end up building something for which the only positive description is “creative”. Other times though, I have a project that I’ve wanted to do for a while, and a technology that I’ve always wanted to learn, and they serendipitously work well together. Hopefully this is one of those stories.
When I decided to build a Mandelbrot set viewer, it was not because I felt the world needed one more implementation on top of the many that already exist. Instead, it was because I had a Rust/WebAssembly/Web Worker-shaped peg that I wanted to work with, and a feeling that the Mandelbrot set was the right shaped hole to hammer it into. Luckily for me, the Mandelbrot set was actually a great fit, and through my project I learned a lot about these new technologies and web performance in general.
Tl;dr If you want to skip ahead to the final result, here’s the viewer: http://mandelbrot.rk0.xyz/ and here’s the associated GitHub repo: https://github.com/RyanKadri/mandelbrot-viewer.
What is the Mandelbrot Set?
Disclaimer: I am not a mathematician. For a more in-depth overview of the Mandelbrot set, check out https://www.youtube.com/watch?v=FFftmWSzgmk.
The Mandelbrot set is a mathematically-defined construct that is based on a very simple formula.
It can be plotted to produce an infinitely complex shape that looks like this:
If you were to progressively zoom in on the border of the black region, you would find an infinitely deep world of interesting topological features. With the idea of exploration in mind, I wanted to build a web-based tool, inspired by Google Maps, that lets the user move around the plot, zoom in or out, and play with some of the algorithm parameters to see how they affect the rendering. Like Google Maps, I wanted the user to be able to click and drag to pan, or scroll to zoom. I’d suggest using the viewer above to try it out.
How Does the Mandelbrot Set Work?
For all the complexity in the Mandelbrot plot, it is defined by this simple function:

f(z) = z² + c
and the following two details:
- The function is applied iteratively for every pixel in the graph and the pixel color is determined by the result.
- The function is evaluated using complex numbers rather than plain real numbers. Each pixel in the image corresponds to a point in the complex plane.
The function and these two details are the main source of visual complexity in the plot above. If you think that some additional “information” must be needed to calculate a shape with so much visual detail, you would not be alone. The fact that such simple rules can generate such interesting structures is one of the early inspirations for the study of Chaos Theory.
Complex Numbers
The following section is an overview of complex numbers. If you are already familiar with the topic, feel free to skip ahead to the next section.
Complex numbers are numbers that have a Real component and an Imaginary component.
- Real numbers are the normal numbers of everyday life. They include everything from negative infinity to positive infinity and all the fractions / decimals / irrational numbers in between.
- Imaginary numbers are a bit different. They allow you to answer the question “What is the square root of a negative number?” If it is unclear why the answer to that question is not a real number, try to think of a number that, when multiplied by itself, is -4. The basic “unit” of imaginary numbers is i, the square root of -1. Other imaginary numbers are expressed as multiples of i.
A complex number is the sum of a real number and an imaginary number, written in the form a + bi, where a is the real part and b is the imaginary part.
In a graphical sense, a complex number can be visualized as a point in space. Whereas the real numbers have a “number line,” the complex numbers have a number plane. A complex number can be represented as the point (a,b) in the complex plane. The horizontal axis represents the size of the real part and the vertical axis represents the size of the imaginary part.
Complex numbers can be added, subtracted, and multiplied, but it takes a bit more thought than with plain real numbers. To add complex numbers, separately add the real parts and the imaginary parts. To multiply complex numbers, use the FOIL Method to expand the product and collect the new real and imaginary parts: (a + bi)(c + di) = (ac − bd) + (ad + bc)i.
Finally, complex numbers have an absolute value. Graphically, this is the length of the line from the origin to the point on the complex plane, and it can be calculated using the standard formula for the length of a 2D vector: |a + bi| = √(a² + b²).
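To make those rules concrete, here is a quick worked example in TypeScript (purely illustrative; the viewer's own helper functions appear later in the post):

// A quick concrete check of the rules above (plain numbers, no helper library needed):
// (1 + 2i) + (3 + 4i) = (1 + 3) + (2 + 4)i = 4 + 6i
// (1 + 2i) * (3 + 4i) = (1*3 - 2*4) + (1*4 + 2*3)i = -5 + 10i
// |3 + 4i| = sqrt(3*3 + 4*4) = 5
const productReal = 1 * 3 - 2 * 4;             // -5
const productImag = 1 * 4 + 2 * 3;             // 10
const magnitude = Math.sqrt(3 * 3 + 4 * 4);    // 5
console.log(productReal, productImag, magnitude);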
Calculating the Mandelbrot Set
To plot the Mandelbrot set, you iterate through each pixel in your plotting space and calculate its color. The plotting space represents a rectangle in the complex plane and every pixel in the plot represents an individual complex number. To decide the color of a pixel, take our function from above, plug in 0 for z, and plug in the coordinates of your pixel for c. Evaluate the result as a complex value. Then do the calculation again, replacing z with the previous result and leaving c as your original pixel’s coordinates. The color of a pixel is determined by how fast the absolute value of z grows as you repeat this process infinitely. For instance, if c = 1, you will calculate the following partial sequence:

0, 1, 2, 5, 26, 677, …
In this case, z grows quickly and the pixel would get a color that represents fast growth. Doing the same calculation with c = -1 gives the sequence 0, -1, 0, -1, …, which never grows at all, so that pixel would be given a contrasting color.
For our viewer, rather than iterating to infinity, we might step through the sequence at most 500 times. If the absolute value of z ever gets larger than 2, we conclude that the input diverges and color the pixel blue. If z never gets larger than 2 within those iterations, we treat the point as non-diverging and color it black.
There are a few important takeaways about what this algorithm will look like when converted to code:
- Complex number multiplication takes 4 multiplications and 2 additions, and the + c term adds 2 more additions per iteration. Pixels that correspond to a non-diverging point will take the full ~500 iterations of our function.
- A 640x640 plot has ~409k pixels. For some plotting regions, most coordinates will not diverge and will require all 500 iterations to confirm that. Pixels that diverge (and diverge quickly) are much cheaper to calculate.
- Therefore, some renderings of the Mandelbrot set may require hundreds of millions or even billions of arithmetic operations (a rough estimate is sketched below).
- Most modern processors run on the order of a few billion cycles per second in a single execution thread / core. Therefore, plotting the Mandelbrot set may be a computational challenge.
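Here is that rough estimate as code. The per-iteration operation count is my own approximation, not a measured number:

// Back-of-envelope cost of one 640x640 plot (an estimate, not a measurement)
const pixels = 640 * 640;          // 409,600 pixels
const maxIterations = 500;         // worst case: the point never diverges
const opsPerIteration = 8;         // roughly 4 multiplications + 4 additions

// Upper bound, assuming every pixel runs the full 500 iterations
const worstCaseOps = pixels * maxIterations * opsPerIteration;
console.log(worstCaseOps.toLocaleString()); // "1,638,400,000"

Even if only a fraction of the pixels hit the iteration cap, that lands comfortably in the hundreds-of-millions-to-billions range.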
A Simple (Naive?) Mandelbrot Set Implementation
To get started on this project, I decided to plot a still image of the Mandelbrot set synchronously and without thinking about performance. Like a good programmer, I modeled complex numbers as objects with one field for the real part and one for the imaginary part. For complex number arithmetic, I wrote pure functions that take in complex numbers and return new complex numbers. I used the Canvas API to plot the Mandelbrot set since I wanted the ability to draw pixels directly. I ended up not pulling in any dependencies for complex arithmetic or graphing because my use case was simple to implement and I wanted full control to tune for performance.
Here’s an abridged sample of my initial Mandelbrot set implementation. If you want to see the full plotting logic, check out my GitHub page above:
function plotChunkNaiveJS(/* Params */) {
    // ... Setup

    // Iterate through each coordinate in the plot. Determine how many iterations
    // a pixel takes to diverge. Choose a pixel color based on that iteration count.
    for(let imagStep = 0; imagStep < height; imagStep ++) {
        // Canvas orders pixels from top left to bottom right in reading order.
        // The imaginary coordinate part is max_imag - (row_num x height_per_row)
        const imagComp = minImag + imagRange - imagStep * imagInc;
        for(let realStep = 0; realStep < width; realStep ++) {
            const realComp = minReal + realStep * realInc;
            const zInit = z(realComp, imagComp);
            const iterations = iterateMandelbrotNaive(zInit, options);
            // Choose a pixel color based on the number of iterations it took for
            // iterateMandelbrotNaive to diverge
            drawPixel(buffer, realStep, imagStep, iterations, chunkSize, options);
        }
    }
}
function iterateMandelbrotNaive(coord: Complex, options: PlotOptions): number {
    let zn = z(0, 0);
    const { maxIterations, divergenceBound } = options;
    for(let n = 0; n < maxIterations; n++) {
        zn = add(mult(zn, zn), coord);
        // Compare squared magnitudes so we can skip the square root
        if(absSq(zn) > divergenceBound ** 2) {
            return n;
        }
    }
    return maxIterations;
}
// Complex numbers are modeled as plain objects with a real and an imaginary part
interface Complex {
    real: number;
    imag: number;
}

function z(real: number, imag: number): Complex {
    return { real, imag };
}

function mult(c1: Complex, c2: Complex): Complex {
    return {
        real: c1.real * c2.real - c1.imag * c2.imag,
        imag: c1.real * c2.imag + c1.imag * c2.real
    };
}

function add(c1: Complex, c2: Complex): Complex {
    return {
        real: c1.real + c2.real,
        imag: c1.imag + c2.imag
    };
}

function absSq(n: Complex): number {
    // Squared absolute value: avoids an unnecessary Math.sqrt in the hot loop
    return n.real * n.real + n.imag * n.imag;
}
This leaves out some of the plotting logistics but captures the main idea of calculating the Mandelbrot set. The final drawPixel function call takes the iteration count and uses it to decide the pixel’s color. Also note that the iterateMandelbrotNaive function will short circuit when the absolute value of the complex number gets too large.
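drawPixel itself is not shown above. A minimal stand-in, assuming the buffer is the pixel array from a Canvas ImageData object (the real coloring logic in the repo is more involved), might look something like this:

// A minimal, hypothetical drawPixel: maps the iteration count to a blue shade
// and writes RGBA bytes into an ImageData-style buffer (4 bytes per pixel).
function drawPixel(
    buffer: Uint8ClampedArray,      // e.g. ImageData.data
    x: number,
    y: number,
    iterations: number,
    chunkSize: { width: number; height: number },
    options: { maxIterations: number }
) {
    const offset = (y * chunkSize.width + x) * 4;
    const inSet = iterations === options.maxIterations;
    // Points assumed to be in the set are black; everything else fades toward blue
    // based on how quickly it diverged.
    const blue = inSet ? 0 : Math.floor(255 * (1 - iterations / options.maxIterations));
    buffer[offset] = 0;          // red
    buffer[offset + 1] = 0;      // green
    buffer[offset + 2] = blue;   // blue
    buffer[offset + 3] = 255;    // alpha (fully opaque)
}

From there, drawing is just a matter of wrapping the buffer in an ImageData and handing it to the canvas context with putImageData.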
So how well does this initial attempt at plotting the Mandelbrot set work? The code above is functionally correct but takes a decent amount of time to compute. If I use this approach to plot the Mandelbrot set on my 1080x1920 monitor and a MacBook Pro, it takes around 7.5 seconds to render after page load has completed. Timings will vary depending on your specific hardware but it at least shows me that I’ll need to pay attention to performance. Remember that in this simple implementation, all of this calculation is done synchronously. A 7 second synchronous wait is always going to feel slow.
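If you want to reproduce a rough number like that without opening the Performance tab, a bare-bones timing harness is enough (this is illustrative; the 7.5 second figure above came from my machine and will differ on yours):

// Crude wall-clock timing around the synchronous plotting call
const start = performance.now();
plotChunkNaiveJS(/* same params as above */);
const elapsedMs = performance.now() - start;
console.log(`Plotted in ${(elapsedMs / 1000).toFixed(2)} s`);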
Improving the Naive Approach
So how do we improve the slow naive approach? A good place to start troubleshooting web performance problems is in the developer tools. Here is a screenshot of the flame chart for what the browser is doing while calculating the Mandelbrot set.
If you want to follow along with the performance troubleshooting, try using the live Mandelbrot viewer and select “Naive JS” and then uncheck “Calculate on Worker Threads”. Finally, open the developer tools, start recording a session in the Performance tab, and hit redraw. This should give a good performance metric since it is purely dependent on already-downloaded client-side code.
As a sanity check, it is a good sign that the main bulk of the rendering time is spent in functions that I’ve written (vs browser rendering). The next thing we’ll want to figure out is where the browser is spending the majority of its time. For this, I can look at the “Bottom-Up” view in the same performance tab:
This “Bottom-Up” view is pretty unusual. The browser is spending more than 3 seconds doing garbage collection (Minor GC in the screenshot) for a 7 second calculation. Garbage collection in modern browsers is pretty efficient and doesn’t normally take this long. Thinking a bit longer on the naive implementation above though, this result makes a lot of sense. Calculating the Mandelbrot set involves a huge number of complex number operations. By trying to adhere to functional programming practices and making each complex number a new object, I hit a huge performance trap. Every complex math operation created a new object that eventually needed to be garbage collected. In normal applications, object allocation is cheap and fast but since I was creating (up to) hundreds of millions of objects, it could add up.
So how could I reduce object allocation and therefore garbage collection? One option would be to drop the idea of immutability in my math operations. For instance, my math operations could modify one of the objects that I pass in. That felt sloppy though. Which operand would I modify? Would that choice be safe as I refactor my code? Besides, it just felt wrong to have a math operation called “add” that modifies one of the numbers. Another option would have been to use a somewhat exotic JS memory management technique like Object Pools. That felt like overkill for a fairly simple project and would’ve required additional dependencies and/or restructuring my code.
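For context, here is a rough sketch of what that object-pool idea could have looked like. This is hypothetical code to illustrate the technique, not something the viewer actually uses:

// A tiny, hypothetical object pool for Complex values. Instead of allocating a
// new object for every math operation, callers borrow a recycled one and return
// it when they are done, so the garbage collector has nothing to clean up.
class ComplexPool {
    private free: Complex[] = [];

    acquire(real: number, imag: number): Complex {
        const c = this.free.pop() ?? { real: 0, imag: 0 };
        c.real = real;
        c.imag = imag;
        return c;
    }

    release(c: Complex) {
        this.free.push(c);
    }
}

Every math helper would then need to carefully acquire and release scratch values, which is exactly the kind of bookkeeping that felt like overkill for this project.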
The optimization approach I decided to go with was not the cleanest, but I felt it was the best compromise between simplicity, readability, and performance. I opted to break the complex number object back into its primitive parts and pass them around as two separate parameters. If I wanted to reduce object allocation, I could just avoid objects altogether!
Here’s an example of what my code looked like with no objects:
function plotChunkJs(/* Params */) {
    const { minReal, realRange, minImag, imagRange } = plot;
    const { height, width } = chunkSize;
    const realInc = realRange / width;
    const imagInc = imagRange / height;
    for(let imagStep = 0; imagStep < height; imagStep ++) {
        const imagComp = (minImag + imagRange) - imagStep * imagInc;
        for(let realStep = 0; realStep < width; realStep ++) {
            const realComp = minReal + realStep * realInc;
            const iterations = iterateMandlebrot(realComp, imagComp, options);
            drawPixel(buffer, realStep, imagStep, iterations, chunkSize, options);
        }
    }
}
function iterateMandlebrot(realComp: number, imagComp: number, options: PlotOptions): number {
    const { maxIterations, divergenceBound } = options;
    const div2 = divergenceBound ** 2;
    let real = 0, imag = 0;
    for(let n = 0; n < maxIterations; n++) {
        const newReal = real * real - imag * imag + realComp;
        imag = 2 * imag * real + imagComp;
        real = newReal;
        if(real ** 2 + imag ** 2 > div2) {
            return n;
        }
    }
    return maxIterations;
}
The Mandelbrot set algorithm is simple enough that using separate primitives doesn’t hurt the code readability too much. Using primitives directly does, however, have a large effect on performance. Here’s another view of the developer tools using this optimized JavaScript:
All of the yellow frills at the bottom of the flame chart are gone and the Bottom-Up view shows no time spent in Garbage Collection at all!
No Garbage Collection in a long-running blocking task may seem surprising for developers who work in memory-managed languages, but it makes sense in this case. I am not creating any JavaScript objects so nothing needs to be collected. Rather, all of my primitive variables are stored directly on the stack and removed when a given stack frame is completed.
The last sentence is not necessarily 100% true, but does seem to be correct in the current version of the Chrome browser. JavaScript is a memory-managed language and the language specification does not actually say anything about how JavaScript lays out memory. A sufficiently advanced interpreter / compiler would actually be free to analyze my code and decide to allocate my original Complex object type on the stack. At the moment, it does not seem like browsers do this, but if they ever get smart enough to do that, I would expect my “naive” implementation of the Mandelbrot set to get much closer to my “optimized” version. For now, we can say that my optimized JavaScript avoids garbage collection by allocating on the stack.
More Exotic Solutions: WebAssembly
Because this whole project started out as a way to learn about performance optimization, I did not want to stop at writing optimized JavaScript. I heard whisperings of a new language on the web. One that would right many of JavaScript’s wrongs. One that would open the Web to all the other languages that had been subjugated under JavaScript’s harsh rule. That language was WebAssembly.
WebAssembly is a lot like the JVM’s bytecode. It is not something that you would generally write by hand but it can be a compilation target from many different languages. WebAssembly, like bytecode or assembly language, is much closer to the actual machine code that executes on your computer. It has access to a linear block of memory, uses no garbage collection, and has few higher-level programming language features.
On the other hand, like JavaScript, WebAssembly in the browser does not get direct access to the underlying operating system and is bound by the standard browser sandbox. Also, unlike machine code generated from C, WebAssembly guarantees safety from certain classes of memory vulnerabilities, such as smashing the call stack or hijacking control flow.
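To make the JavaScript side of that relationship concrete, this is roughly how a browser loads and calls a raw WebAssembly module. The module URL here is a placeholder; my project relies on wasm-bindgen’s generated glue rather than doing this by hand:

// Fetch, compile, and instantiate a .wasm module with the standard browser API
async function loadWasmModule() {
    const { instance } = await WebAssembly.instantiateStreaming(
        fetch("mandelbrot.wasm"), // placeholder URL
        {}                        // import object: anything the module expects from JS
    );
    // Exports show up as plain callable functions plus the module's linear memory
    console.log(Object.keys(instance.exports));
}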
Based on these characteristics, I guessed that WebAssembly might fit nicely into my Mandelbrot viewer project. I was looking for a really efficient way to do mathematical calculations and almost every casual comparison of programming languages I’ve read has said that compiled, statically typed languages tend to be faster than interpreted languages (think Java vs Python). Likewise, languages with data structures and programming constructs that are closer to the real memory and CPU architecture of a computer tend to be faster than those with more abstract structures (think C vs Haskell). Modern JavaScript fits into a weird in-between space between compiled and interpreted languages (check out JIT compilers for more info) and it definitely has some higher-level data structure complexity with closures and prototypes. Maybe WebAssembly can do better.
To create a WebAssembly version of the Mandelbrot viewer, I had to pick a language that would compile to WebAssembly. Luckily, Rust fit the bill perfectly. Diving into what Rust is all about or why it is awesome would require another few blog posts but at a high level, Rust is statically typed, low-level enough to write performant code, and most interestingly, provides memory management without a garbage collector. Also, with a tool called wasm-bindgen, you can compile Rust into WebAssembly with automatically generated wrappers to help call it from TypeScript. To compare WebAssembly and JavaScript “fairly”, I mostly just ported my TypeScript code directly to equivalent Rust. Here’s what that looked like:
impl Plot {
    pub fn calc_pixels(&mut self) {
        let mut index = 0;
        let real_step = self.real_range / self.pixel_width as f64;
        let imag_step = self.imag_range / self.pixel_height as f64;
        for row in 0..self.pixel_height {
            let imag_start = self.min_imag + self.imag_range - imag_step * row as f64;
            for col in 0..self.pixel_width {
                let real_start = self.min_real + real_step * col as f64;
                let iterations = self.iterate_mandelbrot(real_start, imag_start);
                self.draw_pixel(index, iterations);
                index += 1;
            }
        }
    }

    fn iterate_mandelbrot(&self, real_start: f64, imag_start: f64) -> u32 {
        let mut real = 0.0;
        let mut imag = 0.0;
        for n in 0..self.max_iterations {
            let new_real = real * real - imag * imag + real_start;
            imag = 2.0 * real * imag + imag_start;
            real = new_real;
            if real * real + imag * imag > self.divergence_bound {
                return n;
            }
        }
        self.max_iterations
    }
}
Disclaimer: I am pretty new to Rust and this is intentionally a 1-1 port. Please be gentle.
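Calling into the compiled module from TypeScript goes through wasm-bindgen’s generated wrappers. The module path and constructor arguments below are assumptions on my part (the real glue code lives in the GitHub repo); calc_pixels is the method from the Rust snippet above:

// A sketch of using the wasm-bindgen output, not the repo's exact code.
// With the bundler target, Webpack takes care of fetching the .wasm file.
import { Plot } from "../pkg/mandelbrot"; // assumed output path of wasm-bindgen

function plotWithWasm() {
    // Constructor arguments are assumed; the real Plot takes the plot bounds,
    // pixel dimensions, and iteration options in some form.
    const plot = new Plot(/* plot bounds, pixel dimensions, options */);
    plot.calc_pixels(); // runs the Rust loops shown above inside WebAssembly
    // The resulting pixel data lives in WebAssembly linear memory and still needs
    // to be copied into an ImageData buffer before it can be drawn to the canvas.
}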
Once I finished porting my Mandelbrot algorithm to Rust, I used wasm-bindgen to compile it to WebAssembly and Webpack to bundle it up with my JavaScript. This was the moment of truth. I ran my Mandelbrot calculation through the performance tools again and saw… not much. WebAssembly ran at about the same speed as my optimized JavaScript in Firefox and actually ran about 10% slower in Chrome [1]. This was very surprising! I was only context-switching between the languages once and I was not doing any unnecessary memory copying. I also ran the WebAssembly code multiple times after the page had completely loaded so there should not have been any sneaky network/startup-related issues. Unsure what was going on, I looked deeper at the performance tools. The Bottom-Up view showed me mostly the same profile as my optimized JavaScript. There was almost no garbage collection. Almost all the computation time after I hit the redraw button to start the plotting process was spent doing the actual task of running code.
So why would my compiled, low-level, statically typed code be doing worse than the interpreted, high-level, dynamically typed wilderness that is JavaScript?
[1] In Safari, the WebAssembly implementation ran 4 to 5 times faster than the JavaScript, but the JavaScript ran 4 to 5 times slower than it did in Chrome and Firefox, so the top-speed results were similar.
Long Live JavaScript
While looking at the performance tab for my WebAssembly code, I realized that Rust-generated WebAssembly is not slow. Rather, JavaScript is really fast. Going back to my point earlier, it is not totally accurate to call JavaScript an interpreted language. In modern browsers, JavaScript is interpreted when quick startup time is important. When performance is critical, the browser will transparently compile bits of JavaScript into highly optimized machine code. Also, while JavaScript is a dynamically typed language, my optimized code was highly static in nature. I did not create any Franken-Objects with tacked-on properties or rely on deep prototype trees. My optimized code did what computers have been optimized to do since they were created: it ran mathematical functions with primitive, simple arguments. Code like mine was probably among the first kinds of code that browsers’ optimizing compilers learned to handle well.
JavaScript is a somewhat unique programming language in that it was automatically granted a huge market share by being the only client-side language directly supported by browsers. Because the web is so ubiquitous, there has been huge demand to aggressively optimize JavaScript wherever possible. Some of the most popular JavaScript runtimes have large teams of expert engineers who have been optimizing the language for many years.
That is not to say that WebAssembly will not catch up to JavaScript. WebAssembly is a fairly new technology that has not had the chance to be heavily optimized in the same way. It also has characteristics that genuinely could give it an edge over JavaScript. Some of the features of JavaScript that I avoided (dynamically changing object shapes, for instance) can show up anywhere in a JavaScript codebase and can really hurt the compiler’s ability to optimize. JavaScript engines also have a number of performance “deoptimizations” where unusual code structures can stop optimizing compilers from producing efficient code. A much stricter language like Rust can often stop those situations from coming up in the first place.
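To illustrate the kind of shape change I mean, here is a generic example (not code from this project) of the pattern that can trip up a JavaScript engine:

// All of these objects start with the same "shape": { real, imag }
const fast = { real: 1, imag: 2 };

// Tacking on a property later gives this object a different hidden shape,
// so a hot function that receives both kinds can no longer assume one layout.
const slow: { real: number; imag: number; label?: string } = { real: 1, imag: 2 };
slow.label = "I just changed this object's shape";

function magnitudeSquared(c: { real: number; imag: number }) {
    return c.real * c.real + c.imag * c.imag;
}

magnitudeSquared(fast); // the engine can specialize for a single object layout
magnitudeSquared(slow); // mixing layouts at this call site can force a slower, generic path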
Rust also has a few performance benefits that were relevant to my Mandelbrot plotting algorithm. Because Rust is statically typed and has more control over memory layout, it can already take advantage of a few optimizations that modern JavaScript engines like V8 miss. For instance, I should be able to port my “naive” JavaScript approach to Rust and have it run just as efficiently as my primitive-math Rust (although unfortunately not as fast as my optimized JavaScript). If I define a “Complex” type in Rust, I have language-level guarantees that the object will only ever have one floating point field for the real part of the number and one floating point field for the imaginary part. With two simple fields, Rust can put my complex number directly on the stack, just like JavaScript could for primitives. This means that in Rust, I can have my cake and eat it too: I can write clean code that uses objects and not lose performance to Garbage Collection. In a simple project like my Mandelbrot set grapher, this might not be a huge deal. In projects with more complicated logic, being able to write clean code that still avoids garbage collection might be a huge win.
Takeaways
Over the course of this project, I have had the chance to break a few of the standard rules of clean coding. Rather than group related data into an object, I represented it as primitives passed around in separate variables. Rather than writing single-responsibility helper functions to do calculations, I did my math inline. I also brought a new programming language and compilation target into my toolbox to try to squeeze as much single-threaded performance from the browser as I could. For most “normal” web applications, this amount of jumping through hoops for performance is not really necessary. Premature optimization is the root of many problems, and apps are normally slowed down far more by I/O than by computation. The web is constantly taking on more responsibilities though, and it is becoming possible to run more computationally intensive applications in the browser. 3D games, client-side image processing, and other tasks that are normally reserved for native applications seem to be gaining a foothold in the browser. This is where tuning for computational performance may become important again.
Optimized JavaScript has a lot of potential for fast computation, but its new partner WebAssembly might open the door for a whole new set of ported applications written in Rust and other previously native-only languages. Although WebAssembly may not currently give us enormous gains over highly optimized JavaScript, it will only get faster with time. More importantly, for a large class of complex programs, statically typed, performance-focused languages like Rust will push developers to write code that ports naturally to highly performant WebAssembly. That development pathway may unlock many new browser capabilities in the years to come. It’s exciting to see what we will build.