High Performance GPU Computing

Posted by Susie on May 15, 2019
  • Implemented a baseline many-core parallel algorithm for triangle counting using general matrix multiplication (GEMM)
  • Optimized the algorithm with GPU-specialized sparse matrix operations, cutting runtime by a factor of 2-3
  • Optimized the algorithm further with thread coarsening and privatization, improving runtime by another 25%
  • Counted the triangles in a 200K-node graph within 50 s
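
The two formulations above can be illustrated with a small CPU-side sketch. This is not the GPU code from the project. It only shows the arithmetic that a dense matrix-multiplication baseline and a sparse version might compute, assuming an undirected simple graph. The function names (count_triangles_dense, to_csr, count_triangles_csr) and the choice of CSR with sorted neighbor lists are assumptions made for illustration. Thread coarsening and privatization are GPU kernel techniques and are not shown here.

```python
def count_triangles_dense(adj):
    """Dense baseline: triangles = sum((A @ A) * A) / 6 for a symmetric 0/1 matrix A."""
    n = len(adj)
    total = 0
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                # (A @ A)[i][j] is the number of length-2 paths from i to j.
                total += sum(adj[i][k] * adj[k][j] for k in range(n))
    # Each triangle is seen once per ordered pair of its vertices: 3 * 2 = 6 times.
    return total // 6


def to_csr(n, edges):
    """Build CSR arrays (hypothetical helper), keeping each edge once as low -> high."""
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[min(u, v)].append(max(u, v))
    row_ptr, col_idx = [0], []
    for row in neighbors:
        col_idx.extend(sorted(set(row)))  # sorted, de-duplicated neighbor list
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def count_triangles_csr(row_ptr, col_idx):
    """Sparse version: for each edge (u, v), intersect the sorted lists of u and v."""
    total = 0
    for u in range(len(row_ptr) - 1):
        for e in range(row_ptr[u], row_ptr[u + 1]):
            v = col_idx[e]
            i, i_end = row_ptr[u], row_ptr[u + 1]
            j, j_end = row_ptr[v], row_ptr[v + 1]
            # Two-pointer intersection; each common neighbor closes one triangle.
            while i < i_end and j < j_end:
                if col_idx[i] == col_idx[j]:
                    total += 1
                    i += 1
                    j += 1
                elif col_idx[i] < col_idx[j]:
                    i += 1
                else:
                    j += 1
    return total


# Usage: the complete graph on 4 nodes contains 4 triangles.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
row_ptr, col_idx = to_csr(4, edges)
triangles = count_triangles_csr(row_ptr, col_idx)
```

Because each edge is stored only from its lower-numbered to its higher-numbered endpoint, the sparse version counts every triangle exactly once and does work proportional to the neighbor lists rather than to the full n x n matrix. That is one plausible reason a sparse formulation would outperform a dense GEMM baseline on large, sparse graphs.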