Triton

Triton: a high-level kernel development language
 The aim of Triton is to provide an open-source environment to write fast code at higher productivity than CUDA, but also with higher flexibility than other existing DSLs. 
 https://github.com/openai/triton 
 https://triton-lang.org/main/index.html 
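 Triton is best understood as a domain-specific language for developing operators on AI accelerators. To map a kernel written in the Triton language onto executable code for a specific piece of hardware, a corresponding Triton compiler has to be designed and built. So when we say "Triton", we implicitly mean the combination of two things: the Triton language plus the Triton compiler.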
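 Triton's core design idea is block-wise programming: everything above the block level belongs to the user, and everything inside a block is handled automatically by the Triton compiler. In other words, the user decides how the work is split into blocks and what each block computes, while the optimization details inside a block are left to the Triton compiler.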
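 To make the split of responsibilities concrete, here is a minimal sketch that emulates the block-wise model in plain Python, so it runs without a GPU or a Triton install. It is not Triton code: `add_program`, `launch`, and the `BLOCK_SIZE` value are hypothetical names chosen for illustration, and the comments point to the Triton constructs (`tl.program_id`, `tl.arange`, masked `tl.load` / `tl.store`, `triton.cdiv`) that would play the same role in a real kernel.

```python
# Plain-Python emulation of block-wise programming (illustrative only).
BLOCK_SIZE = 4  # hypothetical value; in Triton this would be a tl.constexpr


def add_program(pid, x, y, out, n_elements):
    """User-level code: the work of ONE program instance on ONE block."""
    # Triton counterpart: offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    offsets = [pid * BLOCK_SIZE + i for i in range(BLOCK_SIZE)]
    # Triton counterpart: mask = offsets < n_elements
    # The mask guards the last block when n_elements is not a multiple
    # of BLOCK_SIZE.
    mask = [off < n_elements for off in offsets]
    # Triton counterpart: masked tl.load, elementwise add, masked tl.store.
    # In Triton the user does not write this per-element loop; how the
    # block is executed internally is the compiler's job.
    for off, ok in zip(offsets, mask):
        if ok:
            out[off] = x[off] + y[off]


def launch(x, y):
    """Launch one program instance per block, as a grid launch would."""
    n = len(x)
    out = [0] * n
    grid = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # same as triton.cdiv(n, BLOCK_SIZE)
    for pid in range(grid):  # on a GPU these instances run in parallel
        add_program(pid, x, y, out, n)
    return out


print(launch([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]))
```

 The user-visible part is only the per-block logic (which offsets a block owns, and the mask for the ragged tail). Everything the inner loop stands in for is what Triton hands to the compiler.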
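 In Triton's current implementation, the optimization passes mainly cover common techniques for optimizing compute kernels on NVIDIA GPUs: coalescing, which helps vectorize memory accesses; pipelining / prefetching, which mitigates the gap between compute throughput and memory access speed; and swizzling, which avoids bank conflicts when accessing shared memory.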
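 As an illustration of why swizzling removes bank conflicts, the sketch below models shared memory as 32 banks of 4-byte words and counts how many threads of a 32-thread warp hit the same bank when reading one column of a 32x32 tile. It uses a generic XOR swizzle, which is an assumption made for illustration and not necessarily the layout the Triton compiler emits; `bank_plain`, `bank_swizzled`, and `max_conflict` are hypothetical helper names.

```python
# Toy model of shared-memory bank conflicts (illustrative assumptions:
# 32 banks, one 4-byte word per bank slot, a 32x32 tile of 4-byte elements).
NUM_BANKS = 32
TILE = 32


def bank_plain(row, col):
    """Bank of element (row, col) in a plain row-major layout."""
    return (row * TILE + col) % NUM_BANKS


def bank_swizzled(row, col):
    """Bank of element (row, col) when the column index is XOR-ed with the row."""
    return (row * TILE + (col ^ row)) % NUM_BANKS


def max_conflict(bank_fn, col):
    """Worst-case number of threads hitting one bank when thread t reads (t, col)."""
    counts = [0] * NUM_BANKS
    for row in range(TILE):  # thread t of the warp reads row t
        counts[bank_fn(row, col)] += 1
    return max(counts)


print("plain layout   :", max_conflict(bank_plain, 5))
print("swizzled layout:", max_conflict(bank_swizzled, 5))
```

 In the plain layout every element of a column maps to the same bank, so the column read is serialized. With the XOR swizzle each row shifts its columns to a different bank, so the same read touches 32 distinct banks.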