
Inefficient compute pipeline caching mechanism in WebGPU #32735

@DomenicoBruzzese

Description


I tried to understand all the relevant bits of the codebase but please do let me know if I'm missing something here.

Objective: I need to run the same compute shader multiple times, each time with different input/output buffers. In WebGPU, you would ideally batch as many dispatches as possible into a single command encoder before submitting it to the device.

Problem: Three.js makes this pattern inefficient or impossible, because compute pipelines are cached by the ComputeNode instance ID rather than by the underlying shader logic and binding layout.

If I want to run the same compute logic on 1,000 different objects (buffers), I must create 1,000 ComputeNode instances. Because Pipelines.js uses computeNode.id as part of the cache key, Three.js then compiles and creates 1,000 separate GPUComputePipeline objects, which is really wasteful, especially for long and complex compute shaders.

renderer.compute( arrayOfNodes ) does help with batching, since a single encoder is used across all passes and results in a single device submit; however, it does not solve the pipeline duplication issue described above.

Solution

The renderer should detect that multiple ComputeNode instances share the same TSL logic (and thus the same WGSL source and BindGroupLayout). It should reuse the existing GPUComputePipeline regardless of the ComputeNode.id.
This would allow me to:

  1. Define a TSL shader function once.
  2. Instantiate it multiple times with different input buffers.
  3. Batch execution with renderer.compute([node1, node2, ...]).
  4. End up with 1 pipeline creation and N dispatches within a single command encoder (optimal batching for performance).
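As a rough sketch of the desired end state, here is a plain-JavaScript mock (not the real Three.js internals — `getPipeline`, `compute`, and the node shape are all hypothetical stand-ins) showing one pipeline creation and N dispatches when the cache is keyed by shader source plus binding layout:

```javascript
// Mocked pipeline cache keyed by shader source + binding layout (not node id).
const pipelineCache = new Map();
let pipelinesCreated = 0;

function getPipeline( node ) {
	const key = node.shaderSource + ',' + node.bindings.map( b => b.type ).join( '|' );
	if ( ! pipelineCache.has( key ) ) {
		pipelinesCreated ++;
		pipelineCache.set( key, { key } ); // stands in for a GPUComputePipeline
	}
	return pipelineCache.get( key );
}

// Mocked "renderer.compute( arrayOfNodes )": one encoder, N dispatches.
function compute( nodes ) {
	const encoder = { dispatches: 0 };
	for ( const node of nodes ) {
		getPipeline( node ); // reused after the first call
		encoder.dispatches ++;
	}
	return encoder; // a single submit would follow here
}

const nodes = Array.from( { length: 1000 }, ( _, i ) => ( {
	shaderSource: 'fn main() { ... }', // same WGSL for every instance
	bindings: [ { type: 'storageBuffer', buffer: 'data-' + i } ], // different buffers
} ) );

const encoder = compute( nodes );
console.log( pipelinesCreated, encoder.dispatches ); // 1 1000
```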

Alternatives

To my understanding, there is no alternative other than patching the library and/or reaching into the device internals to effectively write it out in raw WebGPU.

If you reuse the same ComputeNode, swap the array of storage buffers, and then set .needsUpdate = true, you can't batch multiple calls with different inputs. If you instead create gigantic input buffers to store the inputs and outputs of multiple passes, you might run out of memory once the amount of work you're doing is non-trivial, which makes this avenue a non-option in a lot of circumstances.

Please do let me know if I'm missing something

Additional context

The issue lies in the caching mechanism:

```js
_getComputeCacheKey( computeNode, stageCompute ) {

	return computeNode.id + ',' + stageCompute.id;

}
```

which will always create a new pipeline even if the shader code and bindings layout didn't change. This unfortunately prevents the optimal batching strategy, since we'll end up creating the same pipeline many times (imagine 300 passes of a very complex shader, each potentially requiring recompilation, even though the pipeline and layout are identical).
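To make the duplication concrete, here is a minimal plain-JavaScript simulation of the id-based key (a mock, not the actual Pipelines.js code — `MockComputeNode` and `getPipeline` are hypothetical names):

```javascript
// Hypothetical stand-in for a ComputeNode: unique id, shared shader source.
let nextId = 0;
class MockComputeNode {
	constructor( shaderSource ) {
		this.id = nextId ++;
		this.shaderSource = shaderSource; // same WGSL for every instance
	}
}

const pipelineCache = new Map();

function getPipeline( computeNode ) {
	const key = computeNode.id; // cache key depends on the node instance
	if ( ! pipelineCache.has( key ) ) {
		pipelineCache.set( key, { source: computeNode.shaderSource } ); // "pipeline"
	}
	return pipelineCache.get( key );
}

// 1,000 nodes sharing the same shader still produce 1,000 "pipelines".
const nodes = Array.from( { length: 1000 }, () => new MockComputeNode( 'fn main() { ... }' ) );
nodes.forEach( getPipeline );
console.log( pipelineCache.size ); // 1000, despite identical shader source
```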

A potential solution could be to use instead:

```js
_getComputeCacheKey( stageCompute, bindings ) {

	return stageCompute.id + ',' + this.backend.getComputeBindingsLayoutKey( bindings );

}
```

stageCompute is already indexed by the WGSL source itself, so its id doesn't change if the code is the same. Creating a key from the layout of the bindings (being very careful to exclude the id of the binding itself and use only the generated layout key) would give pipelines the correct caching behavior and let us optimally batch/schedule multiple passes.
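A toy illustration of the layout-key idea, assuming the helper serializes only structural properties (type, access) and deliberately ignores per-binding ids (the function name mirrors the proposal above, but its body here is an assumption, not the real backend implementation):

```javascript
// Structural properties only; binding ids are deliberately excluded from the key.
function getComputeBindingsLayoutKey( bindings ) {
	return bindings.map( b => b.type + ':' + b.access ).join( '|' );
}

const bindingsA = [
	{ id: 1, type: 'storageBuffer', access: 'read' },
	{ id: 2, type: 'storageBuffer', access: 'write' },
];
const bindingsB = [
	{ id: 73, type: 'storageBuffer', access: 'read' },  // different ids,
	{ id: 74, type: 'storageBuffer', access: 'write' }, // same structure
];

const keyA = getComputeBindingsLayoutKey( bindingsA );
const keyB = getComputeBindingsLayoutKey( bindingsB );
console.log( keyA === keyB ); // true → the same pipeline can be reused
```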
