1. PARSEC benchmark

2. Hybrid programs with MPI, CUDA ...

3. Start to look at MPI-3

4.
   - Traditional multi-core, weak memory: non-blocking data structures
   - Disjoint memory spaces (CPU, GPU): CUDA

5. Lock-free data structure benchmarks
   http://www.cse.iitk.ac.in/users/mainakc/lockfree.html
   http://htor.inf.ethz.ch/publications//img/hoefler-dsde-protocols.pdf
   Look at Algorithm 2 (Nonblocking consensus): it uses MPI-3 non-blocking collectives.

6. Hybrid examples from courses:
   - UW course, CSS 534: Parallel Programming in Grid and Cloud, Programming Tasks. HW 2 is a good example:
     courses.washington.edu/css534/prog/prog2.pdf
   - A Georgia Tech course, CSE 6230: HPC Tools and Apps, and the relevant assignment:
     stumptown.cc.gt.atl.ga.us:8080/cse6230-hpcta-fa09/hw3.pdf
   - A course at Cornell:
     http://www.cac.cornell.edu/education/Training/Intro/Hybrid-090529.pdf
   - Another course, ITCS 4145 Cluster Computing, assignment 4:
     coitweb.uncc.edu/~abw/ITCS4145S13/Assignments/assign4S13.pdf
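
Sketch for the "non blocking data structure" entry under item 4: a minimal Treiber-style lock-free stack in C++ with explicit acquire/release ordering, as a starting point for experiments on weak-memory multi-core machines. It is not taken from any of the linked pages. The names LockFreeStack and run_stack_demo are made up for this note, and the "retire popped nodes until destruction" scheme is a simplifying assumption that avoids ABA and use-after-free without a real memory reclamation scheme.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Treiber-style lock-free stack (illustrative sketch).
// Popped nodes are parked on a "retired" list and freed only in the
// destructor, so no address is reused while threads are running. This
// sidesteps ABA and use-after-free, at the cost of holding memory until
// the stack is destroyed.
class LockFreeStack {
    struct Node {
        int value;
        Node* next;          // link in the live stack
        Node* retired_next;  // link in the retired list
    };
    std::atomic<Node*> head_{nullptr};
    std::atomic<Node*> retired_{nullptr};

public:
    LockFreeStack() = default;
    LockFreeStack(const LockFreeStack&) = delete;
    LockFreeStack& operator=(const LockFreeStack&) = delete;

    ~LockFreeStack() {
        // Assumes all worker threads have been joined before destruction.
        for (Node* n = head_.load(); n != nullptr;) {
            Node* next = n->next;
            delete n;
            n = next;
        }
        for (Node* n = retired_.load(); n != nullptr;) {
            Node* next = n->retired_next;
            delete n;
            n = next;
        }
    }

    void push(int value) {
        Node* node = new Node{value, head_.load(std::memory_order_relaxed), nullptr};
        // Release ordering publishes the node's fields to any thread that
        // later acquires head_. On failure, node->next is refreshed with the
        // current head and the CAS is retried.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    bool pop(int& out) {
        Node* node = head_.load(std::memory_order_acquire);
        // On failure, node is refreshed with the current head. Reading
        // node->next is safe because nodes are never freed while in use.
        while (node != nullptr &&
               !head_.compare_exchange_weak(node, node->next,
                                            std::memory_order_acquire,
                                            std::memory_order_acquire)) {
        }
        if (node == nullptr) return false;  // stack was empty
        out = node->value;
        // Park the node on the retired list instead of deleting it.
        node->retired_next = retired_.load(std::memory_order_relaxed);
        while (!retired_.compare_exchange_weak(node->retired_next, node,
                                               std::memory_order_release,
                                               std::memory_order_relaxed)) {
        }
        return true;
    }
};

// Hypothetical driver: `threads` workers each push 1..per_thread, then
// `threads` workers drain the stack concurrently. Returns the sum of all
// popped values.
long long run_stack_demo(int threads, int per_thread) {
    LockFreeStack stack;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&stack, per_thread] {
            for (int i = 1; i <= per_thread; ++i) stack.push(i);
        });
    for (auto& w : workers) w.join();
    workers.clear();

    std::atomic<long long> sum{0};
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&stack, &sum] {
            int v;
            while (stack.pop(v)) sum.fetch_add(v, std::memory_order_relaxed);
        });
    for (auto& w : workers) w.join();

    int leftover;
    assert(!stack.pop(leftover));  // stack is empty once all poppers return
    return sum.load();
}
```

Calling run_stack_demo(4, 1000) from a main function should return 2002000 (4 × 500500) if no push or pop is lost. Build with something like -std=c++17 -pthread. A benchmark-grade version would need a real reclamation scheme (hazard pointers, epochs) in place of the retired list.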