===================================================================== NEW IN VERSION 13 ===================================================================== - (Feature) Added in the ability for XSBench to write out a binary file containing a randomized XS dataset. The code is also capable of reading in this file instead of generating a new XS dataset each time the program is run. This feature may be useful for those running in simulation environments where walltime minimization is key for logistical reasons. - Minor refactoring/reorganization of code to make the code clearer and easier to read. After many updates, the code had become a little bloated and difficult to read, so a cleanup was in order. - Removed synthetic delay injection (via dummy FLOPS or loads). These were not very useful or accurate and had not been used by anyone after the initial analyses were done with them. As they were definitely adding to the code bloat of the program, they were removed. ===================================================================== NEW IN VERSION 12 ===================================================================== - (Bugfix) The XL and XXL runtime options didn't work correctly. The unionized energy grid overflowed the bounds of normal 4 byte integers, and actually required use of 8 byte integers. The variables "n_isotopes" and "n_gridpoints" have been refactored to 8 byte long integers. All variables that use n_isotopes and n_gridpoints as input have also been refactored to 8 byte longs. Note that a simple "patch" from version 11 to version 12 can be manually done by simply changing line 73 of GridInit.c to be a long instead of an int. The more thorough refactoring done in v12 is done to "future proof" the code. ===================================================================== NEW IN VERSION 11 ===================================================================== - Updated & greatly improved the PAPI capability of XSBench. Now events can be tallied during multi-core. See README for more info. - Added in option for thread sleep pause in between macro XS lookups. Very similar to adding dummy flops, but a little cleaner. With as small as a 0.1 ms sleep, we get linear scaling with threads. While this initially appears to confirm our initial suspicions regarding memory contention / latency problems, I think the delays resulting from the sleeps could potentially just be washing out the scaling numbers. Even with just 0.1, over 15 million lookups, the majority of the runtime (>90%) is just sleep, so scaling numbers aren't very expressive anymore. Need to implement timers that ignore the sleep parts. - Specified OpenMP schedule mode as 'dynamic'. This is the default on most systems, but now it's set explicitly since it's a lot faster than 'static' or other modes. - Added in a "benchmarking" mode, which will attempt all possible thread combinations between 1 <= nthreads <= max_threads. This helps to save considerable benchmarking time, as the data structures can be re-used between runs rather than regenerated each time. Benchmarking mode is enabled in the makefile. ===================================================================== NEW IN VERSION 10 ===================================================================== - Changed verification mode to be more portable. The verification strategy introduced in version 9 had discrepancies on different platforms and compilers. This was due to reliance on the compiler provided rand() function producing a different series of random numbers than other implementations. Also, there were some issues with the associativity of floating point arithmetic. These issues have now all been solved, and the verification hash is consistent across all tested platforms. - Revised "XL" size parameters, as well as adding in an "XXL" size option. The XL size now uses 120GB of XS data. The XXL mode uses 252GN of XS data. More details are in the verification section of the readme. ===================================================================== NEW IN VERSION 9 ===================================================================== - Added in new code verification mode. This can be toggled on in the makefile. When code is compiled and run, a hash of the results will be generated which can then be compared to other versions and configurations of XSBench. See readme for more details. - Moved PAPI def to makefile. Makes it easier to toggle. - Added -l command line option to set the number of cross section lookups performed by XSBench. ===================================================================== NEW IN VERSION 8 ===================================================================== - Simplified command line interface (CLI) read in process. XSBench now supports a more traditional CLI, as follows: Usage: ./XSBench Options include: -n Number of OpenMP threads to run -s Size of H-M Benchmark to run (small, large, XL) -g Number of gridpoints per isotope Default is equivalent to: -s large - Updated README with new CLI usage details. - Fixed several typos in the XSBench Theory PDF. ===================================================================== NEW IN VERSION 7 ===================================================================== - Added MPI support. Multithreaded run executes on all ranks. Problem size or data is not subdivided - the exact same problem is solved in parallel by all ranks. Only MPI communication is a single reduce at the end to aggregate timing data. To enable MPi mode, simply change the MPI flag in the makefile to "MPI = yes". Make sure mpicc is available on your system. - Added in "XL" size option for a giant 277 GB energy grid. This is unlikely to fit on a single node, but is useful for experimentation purposes. - Removed "BGQ mode" CLI argument option, as it wasn't being used by anything in the code anymore. ===================================================================== NEW IN VERSION 6 ===================================================================== - Fixed small bug in calculate_micro_xs() function. Occasionally, the index returned would be the last nuclidegridpoint for that nuclide, causing the "high" energy point to be off the end of the grid (likely into the next nuclide's energy grid). Added a check to correct for when this occurs. Note that this bug did not affect performance - only made the calculation of XS's more "correct". ===================================================================== NEW IN VERSION 5 ===================================================================== - Added ChangeLog - Moved source code files to src/ directory. - Updated README.txt file to enhance documentation - Added significant documentation with regards to theory in the docs/XSBench_Theory.pdf file. The README.txt file is now more of a quick-start & users guide, whereas the XSBench_Theory.pdf guide covers the details and theory behind the code. =====================================================================