Parallel Acceleration of Reduce

CUDA

1. Implemented using support for divergent branches (thread divergence) together with block-level synchronization.


2. Further in-depth optimizations: https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf
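
Item 1 can be illustrated with a minimal sketch, assuming it refers to the simplest interleaved-addressing form of a block reduction. Each block copies its slice into shared memory. At every step only the threads whose index is a multiple of twice the stride do an addition, which is the divergent branch, and `__syncthreads()` acts as the block-level barrier between steps. The names `reduce_block` and `reduce_sum`, the `int` element type, and the block size of 256 are illustrative assumptions, not taken from these notes. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

// Hypothetical kernel: each block reduces up to blockDim.x elements of `in`
// into one partial sum written to out[blockIdx.x].
// Assumes blockDim.x is a power of two.
__global__ void reduce_block(const int* in, int* out, unsigned int n) {
    extern __shared__ int sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Load one element per thread; pad out-of-range slots with 0.
    sdata[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Tree reduction in shared memory with a doubling stride.
    for (unsigned int s = 1; s < blockDim.x; s *= 2) {
        // Divergent branch: only every (2*s)-th thread is active.
        if (tid % (2 * s) == 0) {
            sdata[tid] += sdata[tid + s];
        }
        // Block-level barrier so the next step sees this step's sums.
        __syncthreads();
    }

    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: launches the kernel repeatedly until a single
// value remains, then copies it back to the host.
int reduce_sum(const std::vector<int>& host) {
    if (host.empty()) return 0;

    const unsigned int threads = 256;  // assumed block size (power of two)
    unsigned int n = static_cast<unsigned int>(host.size());
    unsigned int blocks = (n + threads - 1) / threads;

    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    while (true) {
        reduce_block<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_out, n);
        if (blocks == 1) break;
        // The partial sums become the input of the next pass.
        std::swap(d_in, d_out);
        n = blocks;
        blocks = (n + threads - 1) / threads;
    }

    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling `reduce_sum(std::vector<int>(1000, 1))` from host code should return the sum of the thousand ones. The modulo test keeps the active threads scattered across warps, which is one of the inefficiencies the linked reduction slides go on to remove.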

DSA/ASIC


Revision #5
Created 2025-01-13 03:34:38 UTC by Colin
Updated 2026-04-29 07:33:25 UTC by Colin