1. PARSEC Benchmark
   http://parsec.cs.princeton.edu

2. Hybrid benchmarks (with MPI, CUDA, ...)
   - CORAL
     https://asc.llnl.gov/CORAL-benchmarks
   - Multi-zone NAS Parallel Benchmarks
     https://www.nas.nasa.gov/assets/pdf/techreports/2003/nas-03-010.pdf
     https://www.nas.nasa.gov/cgi-bin/software/download

3. Start to look at MPI-3

4. Memory
   - Traditional multi-core weak memory: non-blocking data structures
     http://htor.inf.ethz.ch/publications//img/hoefler-dsde-protocols.pdf
     Algorithm 2: Nonblocking consensus
     - uses MPI-3 non-blocking collectives
   - Disjoint memory spaces (CPU, GPU): CUDA

5. Lock-free data structure benchmarks
   http://www.cse.iitk.ac.in/users/mainakc/lockfree.html

6. (Matt) Hybrid examples from courses:
   - UW course: CSS 534: Parallel Programming in Grid and Cloud, Programming Tasks. HW 2 is a good example:
     http://courses.washington.edu/css534/prog/prog2.pdf
   - A Georgia Tech course: CSE 6230: HPC Tools and Apps, and the relevant assignment:
     http://stumptown.cc.gt.atl.ga.us:8080/cse6230-hpcta-fa09/hw3.pdf
   - A course at Cornell:
     http://www.cac.cornell.edu/education/Training/Intro/Hybrid-090529.pdf
   - Another course, ITCS 4145 Cluster Computing, and its assignment 4:
     http://coitweb.uncc.edu/~abw/ITCS4145S13/Assignments/assign4S13.pdf
   - A course in Sweden:
     http://www.pdc.kth.se/education/tutorials/mpi/hybrid-lab/advanced-programming-lab-hybrid-openmp-mpi-programming

7. (Steve) Paper on breadth-first search using MPI+OpenMP:
   http://dx.doi.org/10.1109/CLUSTER.2012.29
   The authors mention they might release their software.