- Implemented a baseline many-core parallel algorithm for triangle counting using general matrix multiplication (GEMM)
- Optimized the algorithm with GPU-specialized sparse matrix techniques, cutting runtime by a factor of 2-3
- Further optimized the algorithm with thread coarsening and privatization, for an additional 25% speedup
- Counted the triangles in a graph with 200K nodes within 50 s
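
For illustration only, the matrix-multiplication formulation of triangle counting mentioned above can be sketched on the CPU in plain Python. This is not the project's GPU code; the function names `adjacency_from_edges`, `matmul`, and `count_triangles` are hypothetical, and the sketch assumes an undirected simple graph (no self-loops, no duplicate edges) stored as a dense 0/1 adjacency matrix.

```python
def adjacency_from_edges(n, edges):
    """Build a dense symmetric 0/1 adjacency matrix for an undirected graph."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = 1
        adj[v][u] = 1
    return adj


def matmul(a, b):
    """Naive dense product of two square matrices (stand-in for a GEMM call)."""
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if a[i][k]:  # skip zero entries; a sparse format would avoid storing them
                for j in range(n):
                    out[i][j] += a[i][k] * b[k][j]
    return out


def count_triangles(adj):
    """Count triangles as sum((A @ A) * A) / 6 for a symmetric 0/1 matrix A."""
    n = len(adj)
    a2 = matmul(adj, adj)  # a2[i][j] = number of length-2 paths from i to j
    # Masking with A keeps only the paths closed by an edge (i, j).
    closed = sum(a2[i][j] * adj[i][j] for i in range(n) for j in range(n))
    # Each triangle is counted once per ordered pair of its vertices: 6 times.
    return closed // 6


# Complete graph on 4 vertices: every 3-vertex subset forms a triangle.
k4 = adjacency_from_edges(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
print(count_triangles(k4))  # prints 4
```

The dense product costs O(n^3) work, which is why real graphs with few edges per node benefit from a sparse representation that only touches the nonzero entries.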