Case study: integer sort from CORAL benchmarks
https://asc.llnl.gov/CORAL-benchmarks/#intsort
The benchmark exercises OpenMP threads, MPI-IO, Posix IO, and all-to-all communications with both small and large message sizes. The majority of the time is spent in IO.
It has three major phases:
- Hash all elements to
Nbins, whereN = # of nodes. Note the benchmark uses1MPI rank per node. By the end of this phase, each bin contains all elements falling within a sub range. For example, the first bin contains numbers ranging from0toT/N. Each bin is then stored as a “bin file”. In this phase, each rank performs three steps:
- Read a part (proportional to the DRAM size) of the input file (each rank reads a different part)
- Hash this part locally according to the bins defined above.
- Exchange elements among MPI ranks (all-to-all communication) to send them to the MPI rank that owns the corresponding bin.
- Each rank writes the received elements to its “bin file”.
- Sort each "bin file" individually in each node. (Note: we assume the numbers are uniformly distributed so there is little concern for load balancing). No MPI communications. Each MPI rank treats its bin file as a collection of chunks, each chunk is of
page_size(page_sizeis recommended to be no smaller than the page size of the file system).
- Sort chunks in parallel (with OpenMP)
- Merge every M chunks, repeat until there is only 1 chunk left. This chunk is the sorted bin file.
Note:
M = DRAM_size/page_size, so the number of merging stages islog_M_(T/N/page_size)
- Concatenate all sorted bin files together into a single file.
Last modified
12 years ago
Last modified on 02/12/14 16:48:24
Note:
See TracWiki
for help on using the wiki.
