==============================================================================
                   __   __ ___________                 _
                   \ \ / //  ___| ___ \               | |
                    \ V / \ `--.| |_/ / ___ _ __   ___| |__
                    /   \  `--. \ ___ \/ _ \ '_ \ / __| '_ \
                   / /^\ \/\__/ / |_/ /  __/ | | | (__| | | |
                   \/   \/\____/\____/ \___|_| |_|\___|_| |_|

                                   Version 13

==============================================================================
Contact Information
==============================================================================

Organization:     Center for Exascale Simulation of Advanced Reactors (CESAR)
                  Argonne National Laboratory

Development Lead: John Tramm <jtramm@anl.gov>
                  Ron Rahaman <rahaman@anl.gov>
                  Amanda Lund <alund@anl.gov>

==============================================================================
What is XSBench?
==============================================================================

XSBench is a mini-app representing a key computational kernel of the
Monte Carlo neutronics application OpenMC.

A full explanation of the theory and purpose of XSBench is provided in
docs/XSBench_Theory.pdf.

==============================================================================
Quick Start Guide
==============================================================================

Download----------------------------------------------------------------------

    For the most up-to-date version of XSBench, we recommend that you
    download from our git repository. You can either clone the repository
    from the command line or download a zip from our GitHub page.
    Alternatively, you can download a tar file from the CESAR website
    directly.

    Git Repository Clone:

        Use the following command to clone XSBench to your machine:

        >$ git clone https://github.com/jtramm/XSBench.git

        Once cloned, you can update the code to the newest version
        using the following command (when in the XSBench directory):

        >$ git pull

    Git Zip Download:

        Simply use the "zip download" option on our webpage at:

        https://github.com/jtramm/XSBench

    CESAR Tar Download:

        A tar of the XSBench source code is available
        on the CESAR website at the following URL:

        https://cesar.mcs.anl.gov/content/software/neutronics

        Once downloaded, you can extract XSBench using the following
        command on a Linux or Mac OS X system (the file name shown is for
        version 11; substitute the version you downloaded):

        >$ tar -xvf XSBench-11.tar

        This will extract the tar file into a directory called
        XSBench-11.

        To begin using the XSBench code, navigate to the src directory:

        >$ cd XSBench-11/src

Compilation-------------------------------------------------------------------

    To compile XSBench with default settings, use the following
    command:

    >$ make

Running XSBench---------------------------------------------------------------

    To run XSBench with default settings, use the following command:

    >$ ./XSBench

    For non-default settings, XSBench supports the following command line
    options:

    Usage: ./XSBench <options>
    Options include:
      -t <threads>     Number of OpenMP threads to run
      -s <size>        Size of H-M Benchmark to run (small, large, XL, XXL)
      -g <gridpoints>  Number of gridpoints per nuclide
      -l <lookups>     Number of Cross-section (XS) lookups
    Default (no arguments given) is equivalent to: -s large -l 15000000

    -t <threads>

        Sets the number of OpenMP threads to run. By default, XSBench
        runs one thread per hardware thread, so if the architecture
        supports hyperthreading, multiple threads will run per core.

        If running in MPI mode, this will be the number of threads
        per MPI rank.

    -s <size>

        Sets the size of the Hoogenboom-Martin reactor model. There
        are four options: 'small', 'large', 'XL', and 'XXL'. By default,
        the 'large' option is selected.

        The H-M size corresponds to the number of nuclides present
        in the fuel region. The small version has 34 fuel nuclides,
        whereas the large version has 321 fuel nuclides. The large
        version runs significantly slower, because the data structures
        are much larger and more nuclide lookups are required whenever
        a lookup occurs in a fuel material.

        The additional size options, "XL" and "XXL", do not directly
        correspond to any particular physical model. They are similar to
        the H-M "large" option, except that the number of gridpoints per
        nuclide has been increased greatly. This creates an extremely
        large energy grid data structure (XL: 120GB, XXL: 252GB), which is
        unlikely to fit on a single node, but is useful for
        experimentation on novel architectures.

    -g <gridpoints>

        Sets the number of gridpoints per nuclide. By default, this
        value is set to 11,303. This corresponds to the average number
        of actual gridpoints per nuclide in the H-M Large model as run
        by OpenMC with the actual ACE ENDF cross-section data.

        Note that this option overrides the default number of
        gridpoints set by the '-s' option.

    -l <lookups>

        Sets the number of cross-section (XS) lookups to perform. By
        default, this value is set to 15,000,000. Users may want to
        increase this value to extend the runtime of XSBench, for
        example to produce more reliable performance counter data,
        since a longer run decreases the percentage of runtime spent
        on initialization.

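The effect of the -l option on the initialization share of the runtime
can be illustrated with a little arithmetic. The sketch below is not part
of XSBench; the function name and its inputs (a fixed initialization time
and a constant cost per lookup) are assumptions made for illustration.

```c
#include <assert.h>

/* Fraction of total runtime spent in initialization, assuming a fixed
 * initialization time and a constant cost per lookup (both hypothetical
 * inputs, in seconds). */
double init_fraction(double init_seconds, double seconds_per_lookup,
                     long lookups)
{
    assert(init_seconds >= 0.0 && seconds_per_lookup > 0.0 && lookups > 0);
    double lookup_seconds = seconds_per_lookup * (double)lookups;
    return init_seconds / (init_seconds + lookup_seconds);
}
```

For example, with an assumed 10 seconds of initialization and 1
microsecond per lookup, the default 15,000,000 lookups leave about 40% of
the runtime in initialization, while 150,000,000 lookups reduce that to
about 6%.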
==============================================================================
Debugging, Optimization & Profiling
==============================================================================

There are also a number of switches that can be set in the makefile.

Here is a sample of the control panel at the top of the makefile:

COMPILER    = gnu
OPTIMIZE    = yes
DEBUG       = no
PROFILE     = no
MPI         = no
PAPI        = no
VEC_INFO    = no
VERIFY      = no
PAUSE       = no
BENCHMARK   = no
BINARY_DUMP = no
BINARY_READ = no

-> Optimization enables the -O3 optimization flag.

-> Debugging enables the -g flag.

-> Profiling enables the -pg flag.

-> MPI enables MPI support in the code.

-> The PAPI flag is explained below.

-> VEC_INFO enables some additional information regarding the success or
   failure of the compiler's use of vectorization techniques during
   compilation.

-> VERIFY enables a verification mode, the details of which are explained
   below.

-> Benchmark mode tests all possible thread configurations on the given
   computer. For example, if your computer supports 16 threads, XSBench
   will automatically run the lookup loop once for each thread count
   from 1 to 16.

-> Binary dump mode writes a binary file containing a randomized data set
   of cross sections. This can be used in tandem with the binary read mode
   to skip generation of cross section data every time the program is run.

-> Binary read mode reads the binary file created by the binary dump mode
   as a (usually) much faster substitute for randomly generating XS
   data on-the-fly. This mode is particularly useful if running on
   simulators where walltime minimization is extremely critical for
   logistical reasons.

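The benchmark mode described above amounts to a sweep over thread counts.
The following is a minimal sketch of such a sweep, not the code XSBench
uses: the callback type, the function names, and the ideal-scaling timing
model are all hypothetical, and reporting the fastest thread count is an
addition made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: runs one lookup loop with the given number
 * of threads and returns its elapsed time in seconds. */
typedef double (*lookup_loop_fn)(int nthreads);

/* Stand-in for a real timed lookup loop: models ideal scaling of a
 * 16-second serial run. A real callback would time the actual loop. */
double model_lookup_loop(int nthreads)
{
    return 16.0 / nthreads;
}

/* Runs the loop once for every thread count from 1 to max_threads,
 * stores each elapsed time in times[nthreads - 1], and returns the
 * thread count that gave the shortest time. */
int sweep_thread_counts(int max_threads, lookup_loop_fn run_loop,
                        double *times)
{
    assert(max_threads >= 1 && run_loop != NULL && times != NULL);
    int best = 1;
    for (int nthreads = 1; nthreads <= max_threads; nthreads++) {
        times[nthreads - 1] = run_loop(nthreads);
        if (times[nthreads - 1] < times[best - 1])
            best = nthreads;
    }
    return best;
}
```

Calling sweep_thread_counts(16, model_lookup_loop, times) with a
16-element times array performs 16 runs, one per thread count.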
==============================================================================
MPI Support
==============================================================================

While XSBench is primarily used to investigate "on node parallelism" issues,
some systems provide power & performance statistics batched in multi-node
configurations. To accommodate this, XSBench provides an MPI mode which
runs the code on all MPI ranks simultaneously. There is no decomposition
across ranks of any kind, and all ranks accomplish the same work. There is
only one point of MPI communication (a reduce) at the end, which aggregates
the timing statistics and averages them across MPI ranks before printing them
out.

MPI support can be enabled with the makefile flag "MPI". If you are not using
the mpicc wrapper on your system, you may need to alter the makefile to
make use of your desired compiler.

==============================================================================
Verification Support
==============================================================================

XSBench has the ability to verify that consistent and correct results are
achieved. This mode is enabled by altering the "VERIFY" setting to 'yes' in
the makefile, i.e.:

VERIFY = yes

Once enabled, the code will generate a hash of the results and display it
with the other data once the code has completed executing. This hash can
then be verified against hashes that other versions or configurations of
the code generate. For instance, running XSBench with 4 threads vs 8 threads
(on a machine that supports that configuration) should generate the
same hash number. Changing the model / run parameters should NOT generate
the same hash number (e.g., increasing the number of lookups, number
of gridpoints, etc., will result in different hashes).

Verification mode uses a RNG with a static seed. The randomized lookup
parameters are generated within a critical region. This ensures that the
same set of lookups is performed regardless of the number of threads
used. Then, after each lookup is completed, the lookup parameters and
the cross section vector are hashed together. This local hash is then
atomically added to a global running hash.

Note that the verification mode runs much slower, due to the use of
atomics within the threading loop.

Below are the expected checksums for default runs of each size (-s):

small : 74966788162
large : 74994938929

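The verification scheme described in this section (static seed, one hash
per lookup, addition into a global hash) yields the same result for any
thread count because unsigned addition is commutative. The sketch below
illustrates that property only. It is not XSBench's implementation: the
FNV-1a hash, the LCG, the seed value, and the stand-in "cross section"
data are all assumptions, and the workers are simulated sequentially
rather than run as OpenMP threads with atomic adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* FNV-1a over a byte buffer. A stand-in hash for illustration only. */
static uint64_t fnv1a(const void *data, size_t len)
{
    const unsigned char *bytes = data;
    uint64_t h = 14695981039346656037ULL;    /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= bytes[i];
        h *= 1099511628211ULL;               /* FNV prime */
    }
    return h;
}

/* Simple 64-bit LCG, so a fixed seed always gives the same sequence. */
static uint64_t next_random(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return *state;
}

/* Computes a global hash over n_lookups lookups split across n_workers
 * simulated workers. Worker w handles lookups w, w + n_workers, ... */
uint64_t verification_hash(size_t n_lookups, size_t n_workers)
{
    assert(n_lookups >= 1 && n_workers >= 1);
    uint64_t (*params)[2] = malloc(n_lookups * sizeof *params);
    assert(params != NULL);

    /* Draw all lookup parameters from one RNG stream with a static seed,
     * so the set of lookups does not depend on the worker count. */
    uint64_t rng = 42;                       /* hypothetical static seed */
    for (size_t i = 0; i < n_lookups; i++) {
        params[i][0] = next_random(&rng);    /* stand-in: sampled energy */
        params[i][1] = next_random(&rng);    /* stand-in: material */
    }

    uint64_t global_hash = 0;
    for (size_t w = 0; w < n_workers; w++) {
        for (size_t i = w; i < n_lookups; i += n_workers) {
            /* Parameters plus a stand-in "cross section" value. */
            uint64_t record[3] = { params[i][0], params[i][1],
                                   params[i][0] ^ params[i][1] };
            /* In threaded code this addition would be atomic. */
            global_hash += fnv1a(record, sizeof record);
        }
    }
    free(params);
    return global_hash;
}
```

With this sketch, verification_hash(1000, 4) and verification_hash(1000, 8)
return the same value, mirroring the 4-thread vs 8-thread comparison
above.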
==============================================================================
PAPI Performance Counters
==============================================================================

PAPI is a performance counting library that can offer information
regarding the frequency of specific events (such as memory loads, cache
misses, branch prediction failures, etc.) that occur when the code is
executed. XSBench supports use of these performance counters, although it
is left to the user to select the particular performance counters and
locations to instrument.

By default, PAPI is disabled.

To enable PAPI, set in the makefile:

PAPI = yes

Note that you may need to change the relevant library paths for PAPI to
work (as these are dependent on your machine). The library path can be
specified in the makefile, and the header path is specified in the
XSbench_header.h file.

To select the performance counters you are interested in, open
the file papi.c and alter the events[] array to the events
you would like to count.

==============================================================================
Binary File Support
==============================================================================

The flags:

BINARY_DUMP = no
BINARY_READ = no

can be set to yes in order to write or read a binary file containing
a randomized XS data set (both nuclide grids and unionized grids). This
feature may be extremely useful for users running on simulators where
walltime minimization is critical for logistical purposes, or for users
who are doing many sequential runs.

Note that identical input parameters (problem size, etc.) must be used
when reading and writing a binary file. No runtime checks are made
to validate that the file correctly corresponds to the selected input
parameters.

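The requirement for identical input parameters follows naturally from a
raw binary format. As a minimal sketch (the function names and the flat
array of doubles are assumptions, not XSBench's actual file layout), a
dump and read pair without any header might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes n doubles to path as raw bytes, with no header.
 * Returns 0 on success, -1 on error. */
int dump_grid(const char *path, const double *grid, size_t n)
{
    assert(path != NULL && grid != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(grid, sizeof grid[0], n, fp);
    int close_failed = fclose(fp);
    return (written == n && close_failed == 0) ? 0 : -1;
}

/* Reads n doubles from path. The caller must already know n, because
 * the file itself does not record it. Returns 0 on success, -1 on error. */
int read_grid(const char *path, double *grid, size_t n)
{
    assert(path != NULL && grid != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    size_t got = fread(grid, sizeof grid[0], n, fp);
    fclose(fp);
    return (got == n) ? 0 : -1;
}
```

Because the file carries no size or parameter information, a reader that
asks for fewer values than were written still succeeds, which is why the
same problem size must be used for both the dump and the read.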
==============================================================================
Running on ANL BlueGene/Q (Vesta & Mira)
==============================================================================

Compilation is done using the included makefile, as follows:

    >$ make MACHINE=bluegene

Note that the INFO macro in the XSbench_header.h file should be set to
0 when running on BG/Q to remove the run status portions of the output,
which cuts down on unnecessary file I/O, i.e.:

    #define INFO 0

Also, note that you may need to add the following line to your .soft
file in order to use the mpicc compiler wrapper:

    +mpiwrapper-gcc

Then, be sure to use the "resoft" command to update your software, i.e.:

    >$ resoft

When running in c16 mode, the maximum number of gridpoints per nuclide
is 900 (when running in "Large" mode). More points will exceed the 1GB
memory limit.

A basic test run on 1 node can be achieved (assuming you have an allocation)
using the makefile and the following command:

    >$ make bgqrun

Further information on queuing can be found at:

https://www.alcf.anl.gov/resource-guides/vesta-queuing

==============================================================================
Citing XSBench
==============================================================================

Papers citing the XSBench program in general should refer to:

J. R. Tramm, A. R. Siegel, T. Islam, and M. Schulz, “XSBench - The
Development and Verification of a Performance Abstraction for Monte
Carlo Reactor Analysis,” presented at PHYSOR 2014 - The Role
of Reactor Physics toward a Sustainable Future, Kyoto.

A PDF of this paper can be accessed directly at this link:

http://www.mcs.anl.gov/papers/P5064-0114.pdf

Bibtex Entry:

@inproceedings{Tramm:wy,
author = {Tramm, John R and Siegel, Andrew R and Islam, Tanzima and Schulz, Martin},
title = {{XSBench - The Development and Verification of a Performance Abstraction for Monte Carlo Reactor Analysis}},
booktitle = {PHYSOR 2014 - The Role of Reactor Physics toward a Sustainable Future},
address = {Kyoto}
}

==============================================================================