1. Handle warp divergence and synchronize the threads within a block (`__syncthreads()`).
2. Further in-depth optimizations: https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf
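
The two points above can be sketched as a block-level sum reduction. This is only a minimal illustration in the spirit of the linked NVIDIA slides, not the original author's code: the names `reduce_block` and `reduce_sum` are hypothetical, the block size is assumed to be a power of two, and CUDA error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each block sums up to 2 * blockDim.x inputs into one partial result.
// Assumes blockDim.x is a power of two (hypothetical kernel, for illustration).
__global__ void reduce_block(const float* in, float* out, unsigned int n) {
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x * 2 + tid;

    // Every thread adds two elements while loading, so no thread starts idle.
    float v = 0.0f;
    if (i < n) v = in[i];
    if (i + blockDim.x < n) v += in[i + blockDim.x];
    sdata[tid] = v;
    __syncthreads();  // block synchronization: all loads are visible

    // Sequential addressing: the active threads are always tid < s, a
    // contiguous range, so a warp is either fully active or fully idle
    // until s drops below the warp size.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();  // outside the branch, so every thread reaches it
    }

    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Host helper (hypothetical): launches one pass and sums the per-block
// partial results on the CPU.
float reduce_sum(const std::vector<float>& h_in) {
    const unsigned int threads = 256;  // power of two, as assumed above
    const unsigned int n = static_cast<unsigned int>(h_in.size());
    if (n == 0) return 0.0f;
    const unsigned int blocks = (n + threads * 2 - 1) / (threads * 2);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Third launch parameter: dynamic shared memory, one float per thread.
    reduce_block<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);  // also waits for the kernel to finish
    cudaFree(d_in);
    cudaFree(d_out);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Calling `reduce_sum` on a vector of `1 << 20` ones should return `1048576.0f`, since every partial sum is a small integer that `float` represents exactly. The slides linked in item 2 go further, for example by unrolling the last warp and processing several elements per thread.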