
Reaction-Diffusion Compute Shader in WebGPU


WebGPU Reaction-Diffusion Compute Shader demo screenshot

WebGPU represents a big step forward in web graphics technology, enabling web pages to make use of a device's GPU for enhanced rendering capabilities. It's a practical upgrade that improves the performance of web graphics, building upon the foundation laid by WebGL.

Initially launched in Google Chrome in April 2023, WebGPU is gradually expanding to other browsers such as Safari and Firefox. While still in development, its potential is evident.

With WebGPU, developers can create compelling 3D graphics on HTML canvases and perform GPU computations efficiently. It comes with its own language, WGSL, simplifying development.

In this tutorial we'll jump straight to a very specific WebGPU technique: using compute shaders for image effects. If you'd like to get a solid understanding of WebGPU first, I highly recommend the following introductory tutorials before continuing with this one: Your first WebGPU app and WebGPU Fundamentals.

If you'd like to learn about the specifics of the reaction-diffusion algorithm, check out these resources: Reaction-Diffusion Tutorial by Karl Sims and Reaction Diffusion Algorithm in p5.js by The Coding Train.

At the moment the demo only runs in Chrome, so here is a short video of what it should look like:

Browser Support:

  • Chrome: Supported from version 113+
  • Firefox: Not supported
  • Internet Explorer: Not supported
  • Safari: Not supported
  • Opera: Not supported

Overview

In this tutorial, we'll explore a key aspect of WebGPU: leveraging compute shaders for image effects. Coming from a WebGL background, I found it rather difficult to understand how to efficiently use compute shaders for image effects that involve a convolution with a filter kernel (e.g., a Gaussian blur). So in this tutorial, I'll focus on one method of using compute shaders for such purposes. The method I present is based on the image blur sample from the great WebGPU samples website.

Program Structure

In this tutorial we'll only go into the details of a few interesting parts of the demo application. However, I hope that you can find your way around the source code with the help of the inline comments.

The main building blocks are two WebGPU pipelines:

  • A compute pipeline that runs multiple iterations of the reaction-diffusion algorithm (js/rd-compute.js and js/shader/rd-compute-shader.js).
  • A render pipeline that takes the result of the compute pipeline and creates the final composition by rendering a fullscreen triangle (js/composite.js and js/shader/composite-shader.js).

WebGPU is a very chatty API, and to make it a little easier to work with, I use the webgpu-utils library by Gregg Tavares. Additionally, I've included the float16 library by Kenta Moriuchi, which is used to create and update the storage textures for the compute pipeline.
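To give a rough idea of what that looks like in practice, here is a minimal sketch of handling a uniform struct with webgpu-utils. The Params struct, its fields and the surrounding variables (device, width, height) are illustrative, not the demo's actual layout:

import { makeShaderDataDefinitions, makeStructuredView } from 'webgpu-utils';

// illustrative WGSL snippet – the demo's actual uniform struct differs
const shaderCode = /* wgsl */ `
  struct Params {
    dims: vec2f,
    feed: f32,
    kill: f32
  };
  @group(0) @binding(0) var<uniform> params: Params;
`;

// parse the WGSL to get sizes and offsets, then create a typed view and a matching buffer
const defs = makeShaderDataDefinitions(shaderCode);
const paramsView = makeStructuredView(defs.uniforms.params);
const paramsBuffer = device.createBuffer({
  size: paramsView.arrayBuffer.byteLength,
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});

// set values by name and upload them in one call
paramsView.set({ dims: [width, height], feed: 0.037, kill: 0.06 });
device.queue.writeBuffer(paramsBuffer, 0, paramsView.arrayBuffer);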

Compute Workflow

A common method of running a reaction-diffusion simulation on the GPU is to use something that I believe is called “texture ping-ponging”. This involves creating two textures: one holds the current state of the simulation to be read, and the other stores the result of the current iteration. After each iteration the textures are swapped.

This method can also be implemented in WebGL using a fragment shader and framebuffers. However, in WebGPU we can achieve the same thing using a compute shader and storage textures as buffers. The advantage of this is that we can write directly to any pixel within the texture we want. We also get the performance benefits that come with compute shaders.
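On the JavaScript side, the ping-pong could be encoded roughly like this. This is only a sketch, assuming two bind groups have been prepared that reference the two storage textures in opposite read/write order (all names are mine, not the demo's):

// bindGroups[0] reads textureA and writes textureB, bindGroups[1] the other way round
function runIterations(device, computePipeline, bindGroups, dispatchSize, iterations) {
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(computePipeline);

  for (let i = 0; i < iterations; i++) {
    // swap the read/write roles by alternating the bind group
    pass.setBindGroup(0, bindGroups[i % 2]);
    pass.dispatchWorkgroups(dispatchSize[0], dispatchSize[1], 1);
  }

  pass.end();
  device.queue.submit([encoder.finish()]);
}

Because usage is validated per dispatch within a compute pass, alternating the read and write textures inside a single pass like this is valid.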

Initialisation

The first thing to do is to initialise the pipeline with all the necessary layout descriptors. In addition, all buffers, textures, and bind groups have to be set up. The webgpu-utils library really saves a lot of work here.

WebGPU doesn't allow you to change the size of buffers or textures once they've been created. So we have to distinguish between buffers that don't change in size (e.g., uniforms) and buffers that change in certain situations (e.g., textures when the canvas is resized). For the latter, we need a way to recreate them and dispose of the old ones when necessary.
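A sketch of what such a recreation step could look like, assuming rgba16float storage textures at a quarter of the canvas size (the helper name and variables are illustrative):

// hypothetical helper: (re)create the two ping-pong textures whenever the canvas size changes
function createRdTextures(device, canvas, scale = 0.25) {
  const width = Math.max(1, Math.floor(canvas.width * scale));
  const height = Math.max(1, Math.floor(canvas.height * scale));

  return [0, 1].map(() =>
    device.createTexture({
      size: [width, height],
      format: 'rgba16float',
      usage:
        GPUTextureUsage.TEXTURE_BINDING |  // read during the next iteration
        GPUTextureUsage.STORAGE_BINDING |  // written by the compute shader
        GPUTextureUsage.COPY_DST,          // allows uploading an initial state
    })
  );
}

// on resize: dispose of the old textures, create new ones, then rebuild the affected bind groups
oldTextures.forEach((texture) => texture.destroy());
const rdTextures = createRdTextures(device, canvas);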

All textures used for the reaction-diffusion simulation are a fraction of the size of the canvas (e.g., a quarter of the canvas size). Having fewer pixels to process frees up computing resources for more iterations, allowing a faster simulation with relatively little visual loss.

In addition to the two textures involved in the “texture ping-ponging”, there is a third texture in the demo which I call the seed texture. This texture contains the image data of an HTML canvas on which the clock letters are drawn. The seed texture is used as a kind of influence map for the reaction-diffusion simulation to visualise the clock letters. This texture, as well as the corresponding HTML canvas, must also be recreated/resized when the WebGPU canvas gets resized.
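The upload of the seed canvas could, for example, be done by converting its pixel data to 16-bit floats with the float16 library and writing it into the texture. This is only a sketch of one possible approach, with illustrative names (seedCanvas, seedTexture, simWidth, simHeight), and it assumes the seed texture is rgba16float and was created with COPY_DST usage:

import { Float16Array } from '@petamoriken/float16';

// read the seed canvas pixels and convert them to 16-bit floats for an rgba16float texture
const ctx = seedCanvas.getContext('2d');
const { data } = ctx.getImageData(0, 0, simWidth, simHeight); // RGBA bytes, 0..255
const texels = new Float16Array(data.length);
for (let i = 0; i < data.length; i++) {
  texels[i] = data[i] / 255; // normalise to 0..1
}

// upload the converted pixels to the seed texture
device.queue.writeTexture(
  { texture: seedTexture },
  texels.buffer,
  { bytesPerRow: simWidth * 4 * 2 }, // 4 channels * 2 bytes per channel
  [simWidth, simHeight]
);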

Running the Simulation

With all the necessary initialisation done, we can focus on actually running the reaction-diffusion simulation using a compute shader. Let's start by reviewing some general aspects of compute shaders.

Each invocation of a compute shader processes a number of threads in parallel. The number of threads is defined by the compute shader's workgroup size. The number of invocations of the shader is defined by the dispatch size (total number of threads = workgroup size * dispatch size).

These size values are specified in three dimensions. So a compute shader that processes 64 threads in parallel might look something like this:

@compute @workgroup_size(8, 8, 1) fn compute() {}

Running this shader 256 times, which makes 16,384 threads, requires a dispatch size like this:

pass.dispatchWorkgroups(16, 16, 1);

The reaction-diffusion simulation requires us to address every pixel of the textures. One way to achieve this is to use a workgroup size of 1 and a dispatch size equal to the total number of pixels (which would somewhat imitate a fragment shader). However, this would not be very performant, because multiple threads within a workgroup are faster than individual dispatches.

Alternatively, one might suggest using a workgroup size equal to the number of pixels and dispatching it only once (dispatch size = 1). Yet this isn't possible, because the maximum workgroup size is limited. A general recommendation for WebGPU is to choose a workgroup size of 64. This requires that we divide the number of pixels in the texture into blocks the size of a workgroup (= 64 pixels) and dispatch enough workgroups to cover the entire texture. This will rarely work out exactly, but our shader can handle that.
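In code, finding the dispatch size boils down to dividing the texture size by the workgroup size and rounding up. A sketch, assuming an 8×8 workgroup and texture dimensions in textureWidth/textureHeight:

const workgroupSize = [8, 8]; // 64 threads per workgroup

// round up so that the workgroups fully cover the texture,
// even if its dimensions aren't multiples of 8
const dispatchX = Math.ceil(textureWidth / workgroupSize[0]);
const dispatchY = Math.ceil(textureHeight / workgroupSize[1]);

pass.dispatchWorkgroups(dispatchX, dispatchY, 1);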

So now we have a constant value for the size of a workgroup and a way to find the appropriate dispatch size to run our simulation. But there is more we can optimise.

Pixels per Thread

To make each workgroup cover a larger area (more pixels), we introduce a tile size. The tile size defines how many pixels each individual thread processes. This requires a nested for loop within the shader, so we want to keep the tile size very small (e.g., 2×2).

Pixel Cache

An important step in running the reaction-diffusion simulation is the convolution with the Laplacian kernel, which is a 3×3 matrix. So, for each pixel we process, we have to read all 9 pixels that the kernel covers in order to perform the calculation. Due to the kernel overlap from pixel to pixel, there would be a lot of redundant texture reads.

Fortunately, compute shaders allow us to share memory across threads. So we can create what I call a pixel cache. The idea (taken from the image blur sample) is that each thread reads the pixels of its tile and writes them to the cache. Once every thread of the workgroup has stored its pixels in the cache (we ensure this with a workgroup barrier), the actual processing only needs the prefetched pixels from the cache. Hence it doesn't require any further texture reads. The structure of the compute function might look something like this:

// the pixel cache shared across all threads of the workgroup
var<workgroup> cache: array<array<vec4f, 128>, 128>;

@compute @workgroup_size(8, 8, 1)
fn compute_main(/* ...builtin variables */ ) {

  // add the pixels of this thread's tile to the cache
  for (var c=0u; c<2; c++) {
    for (var r=0u; r<2; r++) {
      // ... calculate the pixel coords from the builtin variables
      // store the pixel value in the cache
      cache[y][x] = value;
    }
  }

  // don't continue until all threads have reached this point
  workgroupBarrier();

  // process every pixel of this thread's tile
  for (var c=0u; c<2; c++) {
    for (var r=0u; r<2; r++) {
      // ...perform the reaction-diffusion algorithm
      textureStore(/* ... */);
    }
  }
}

But there's another tricky aspect we have to watch out for: the kernel convolution requires us to read more pixels than we ultimately process. We could extend the pixel cache size. However, the size of the memory shared by the threads of a workgroup is limited to 16,384 bytes. Therefore we have to decrease the dispatch size by (kernelSize - 1) / 2 on each side. Hopefully the following illustration makes these steps clearer.
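Putting the workgroup size, the tile size and the kernel border together, one way to compute the dispatch size could look like the following sketch (assuming an 8×8 workgroup, a 2×2 tile and a 3×3 kernel; this is my reading of the adjustment, not the demo's exact code):

const workgroupSize = [8, 8];
const tileSize = [2, 2];
const kernelSize = 3;

// each workgroup caches a full block of pixels ...
const blockSize = [
  workgroupSize[0] * tileSize[0], // 16
  workgroupSize[1] * tileSize[1], // 16
];

// ... but only writes the inner part, i.e. (kernelSize - 1) / 2 pixels less on each side
const border = (kernelSize - 1) / 2; // 1
const effectiveBlock = [
  blockSize[0] - 2 * border, // 14
  blockSize[1] - 2 * border, // 14
];

const dispatchX = Math.ceil(textureWidth / effectiveBlock[0]);
const dispatchY = Math.ceil(textureHeight / effectiveBlock[1]);
pass.dispatchWorkgroups(dispatchX, dispatchY, 1);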

UV Distortion

One drawback of using a compute shader compared to the fragment shader solution is that you cannot use a sampler for the storage textures within a compute shader (you can only load integer pixel coordinates). If you want to animate the simulation by moving the texture space (i.e., distorting the UV coordinates in fractional increments), you have to do the sampling yourself.

One way to deal with this is to use a manual bilinear sampling function. The sampling function used in the demo is based on the one shown here, with some adjustments for use within a compute shader. This allows us to sample fractional pixel values:

fn texture2D_bilinear(t: texture_2d<f32>, coord: vec2f, dims: vec2u) -> vec4f {
    let f: vec2f = fract(coord);
    let samplePos: vec2u = vec2u(coord + (0.5 - f));
    let tl: vec4f = textureLoad(t, clamp(samplePos, vec2u(1, 1), dims), 0);
    let tr: vec4f = textureLoad(t, clamp(samplePos + vec2u(1, 0), vec2u(1, 1), dims), 0);
    let bl: vec4f = textureLoad(t, clamp(samplePos + vec2u(0, 1), vec2u(1, 1), dims), 0);
    let br: vec4f = textureLoad(t, clamp(samplePos + vec2u(1, 1), vec2u(1, 1), dims), 0);
    let tA: vec4f = mix(tl, tr, f.x);
    let tB: vec4f = mix(bl, br, f.x);
    return mix(tA, tB, f.y);
}

This is how the pulsating movement of the simulation from the centre, which can be seen in the demo, was created.

Parameter Animation

One of the things I really like about reaction-diffusion is the variety of different patterns you can get by changing just a few parameters. If you then animate these changes over time or in response to user interaction, you can get really interesting effects. In the demo, for example, some parameters change depending on the distance from the centre or the speed of the pointer.
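As a rough illustration of the idea (the demo's actual parameters and mappings differ), such an animation can be as simple as rewriting a few uniform values before each batch of iterations, reusing the structured view from the webgpu-utils sketch above and a hypothetical pointerSpeed value:

// hypothetical per-frame update: vary the kill rate with the pointer speed
// and the feed rate over time, then upload the uniforms again
function updateParams(device, paramsBuffer, paramsView, time, pointerSpeed) {
  const feed = 0.035 + 0.004 * Math.sin(time * 0.5);
  const kill = 0.06 + Math.min(pointerSpeed, 1) * 0.003;

  paramsView.set({ feed, kill });
  device.queue.writeBuffer(paramsBuffer, 0, paramsView.arrayBuffer);
}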

Composition Rendering

With the reaction-diffusion simulation done, the only thing left is to draw the result to the screen. This is the job of the composition render pipeline.
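Host-side, such a pass boils down to binding the simulation result and drawing three vertices, with the fullscreen triangle generated in the vertex shader. A sketch with illustrative names (compositePipeline, compositeBindGroup):

const encoder = device.createCommandEncoder();
const pass = encoder.beginRenderPass({
  colorAttachments: [{
    view: context.getCurrentTexture().createView(),
    loadOp: 'clear',
    clearValue: { r: 0, g: 0, b: 0, a: 1 },
    storeOp: 'store',
  }],
});

pass.setPipeline(compositePipeline);
pass.setBindGroup(0, compositeBindGroup); // reaction-diffusion result, seed texture, sampler, uniforms
pass.draw(3); // fullscreen triangle generated in the vertex shader
pass.end();
device.queue.submit([encoder.finish()]);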

I just want to give a brief overview of the steps involved in the demo application. However, these depend very much on the style you want to achieve. Here are the main adjustments made during the composition pass of the demo:

  1. Bulge distortion: Before sampling the reaction-diffusion result texture, a bulge distortion is applied to the UV coordinates (based on this shadertoy code). This adds a sense of depth to the scene.
  2. Color: A color palette is applied (from Inigo Quilez).
  3. Emboss filter: A simple emboss effect gives the “veins” some volume.
  4. Fake iridescence: This subtle effect is based on a different color palette, but is applied to the negative space of the emboss result. The fake iridescence makes the scene look a little more vibrant.
  5. Vignette: A vignette overlay is used to darken the edges.

Conclusion

As far as performance is concerned, I've created a very basic performance test comparing a fragment shader variant with the compute variant (including bilinear sampling). At least on my machine, the compute variant is a lot faster. The performance tests are in a separate folder of the repository – only a flag in main.js needs to be changed to compare fragment with compute (GPU time measured with the timestamp-query API).

I'm still very new to WebGPU development. If anything in my tutorial can be improved or isn't correct, I'd be happy to hear about it.

Unfortunately, I couldn't go into every detail and could only explain the idea behind using a compute shader to run a reaction-diffusion simulation rather superficially. But I hope you enjoyed this tutorial and that you can take a little something away with you for your own projects. Thanks for reading!


