Code Description

A. General description:

AMG2013 is a parallel algebraic multigrid solver for linear systems arising from
problems on unstructured grids.

See the following papers for details on the algorithm and its parallel
implementation/performance:

Van Emden Henson and Ulrike Meier Yang, "BoomerAMG: A Parallel Algebraic
Multigrid Solver and Preconditioner", Appl. Num. Math. 41 (2002),
pp. 155-177. Also available as LLNL technical report UCRL-JC-141495.

Hans De Sterck, Ulrike Meier Yang and Jeffrey Heys, "Reducing Complexity in
Parallel Algebraic Multigrid Preconditioners", SIAM Journal on Matrix Analysis
and Applications 27 (2006), pp. 1019-1039. Also available as LLNL technical
report UCRL-JRNL-206780.

Hans De Sterck, Robert D. Falgout, Josh W. Nolting and Ulrike Meier Yang,
"Distance-Two Interpolation for Parallel Algebraic Multigrid", Numerical
Linear Algebra with Applications 15 (2008), pp. 115-139. Also available as
LLNL technical report UCRL-JRNL-230844.

U. M. Yang, "On Long Range Interpolation Operators for Aggressive Coarsening",
Numer. Linear Algebra Appl. 17 (2010), pp. 453-472. Also available as LLNL
technical report LLNL-JRNL-417371.

A. H. Baker, R. D. Falgout, T. V. Kolev, and U. M. Yang, "Multigrid Smoothers
for Ultraparallel Computing", SIAM J. Sci. Comput. 33 (2011), pp. 2864-2887.
Also available as LLNL technical report LLNL-JRNL-473191.

The driver provided with AMG2013 builds linear systems for various
3-dimensional problems, which are described in Section D.

To determine when the solver has converged, the driver uses the
relative-residual stopping criterion

    ||r_k||_2 / ||b||_2 < tol

with tol = 10^-6.

B. Coding:

AMG2013 is written in ISO-C. It is an SPMD code which uses MPI. Parallelism is
achieved by data decomposition. The driver provided with AMG2013 achieves this
decomposition by simply subdividing the grid into logical P x Q x R (in 3D)
chunks of equal size.

C. Parallelism:

AMG2013 is a highly synchronous code. Its communication and computation
patterns exhibit the surface-to-volume relationship common to many parallel
scientific codes. Hence, parallel efficiency is largely determined by the size
of the data "chunks" mentioned above and by the speed of communication and
computation on the machine. AMG2013 is also memory-access bound, performing only
about 1-2 computations per memory access, so memory-access speeds will also have
a large impact on performance.

D. Test problems

Problem 1 (default): The default problem is a Laplace-type problem on an
unstructured domain with an anisotropy in one part. A 2-dimensional projection
of the grid with the corresponding 2-dimensional stencils is illustrated in the
file 'mg_grid_labels.pdf'. The problem is made 3-dimensional by extending the
domain uniformly in the z-direction. The default problem size is 384 unknowns,
but this is easily refined on the amg2013 command line (see "Running the Code"
for details). Suggestions for test runs are given in Section "Suggested Test
Runs".

Problem 2 (-laplace): Solves

    - cx u_xx - cy u_yy - cz u_zz = (1/h)^2

with Dirichlet boundary conditions of u = 0, where h is the mesh spacing in each
direction on the unit cube. Standard finite differences are used to discretize
the equations, yielding 7-pt stencils in 3D. This problem can also be used to
generate 2D or 1D problems by setting the length in one or two of the directions
(<nx>, <ny> or <nz>) to 1.

Problem 3 (-27pt): Solves a Laplace-type problem using a 27-point stencil.

Problem 4 (-jumps): Solves the PDE

    - a(x,y,z)(u_xx + u_yy + u_zz) = (1/h)^2

with Dirichlet boundary conditions of u = 0 on the unit cube, and

    a(x,y,z) = 1000 on [0.1,0.9] x [0.1,0.9] x [0.1,0.9]
             = 0.01 on the 8 corner cubes of size 0.1 x 0.1 x 0.1
             = 1    elsewhere

%==========================================================================
%==========================================================================

Important Kernels in this Distribution

Listed here are the important files used for the linear solver, both
preconditioner and solver. Files that take little time, such as wrappers
(files starting with HYPRE_), files that are not used during the suggested
runs, and files that are used only for the generation of the problems are not
included here. A complete listing of all directories and files, as well as a
short description of each directory, can be found in the next section.


In the 'krylov' directory:

    pcg.c                    functions for the conjugate gradient algorithm
    gmres.c                  functions for the GMRES algorithm

In the 'parcsr_ls' directory:

    par_amg.c                creation and parameter setup of the AMG
                             preconditioner object
    par_amg_setup.c          setup phase of the AMG preconditioner
    par_coarsen.c            various coarsening algorithms
    par_strength.c           computes a strength matrix for coarsening and
                             interpolation
    par_indepset.c           independent set function needed for coarsening
    par_interp.c             interpolation algorithms for solvers 0 and 3
    par_lr_interp.c          interpolation algorithms for solvers 1 and 4
    aux_interp.c             auxiliary functions needed in par_lr_interp.c
    par_multi_interp.c       interpolation algorithm for the fine level
                             in solvers 1 and 4
    par_rap.c                generates the coarse-grid operator
    par_rap_communication.c  sets up communication in par_rap.c
    par_amg_solve.c          solve phase of the AMG preconditioner
    par_cycle.c              AMG cycle
    par_relax.c              AMG smoothers

In the 'parcsr_mv' directory:

    par_csr_communication.c  communication routines for global partitioning
    new_commpkg.c            communication routines for assumed partitioning
    par_csr_assumed_part.c   communication routines for assumed partitioning
    par_csr_matrix.c         basic parallel matrix operations
    par_csr_matvec.c         parallel matrix-vector multiplication
    par_csr_matop.c          additional parallel matrix operations
    par_vector.c             basic parallel vector operations

In the 'seq_mv' directory:

    big_csr_matrix.c         basic sequential matrix operations
    csr_matrix.c             basic sequential matrix operations
    csr_matvec.c             sequential matrix-vector multiplication
    csr_matop.c              additional sequential matrix operations
    vector.c                 basic sequential vector operations

%==========================================================================
%==========================================================================

Files in this Distribution

NOTE: The AMG2013 code is derived directly from the hypre library, a large
linear solver library that is being developed in the Center for Applied
Scientific Computing (CASC) at LLNL.

In the amg2013 directory the following files are included:

    COPYING_LESSER
    COPYRIGHT
    HYPRE.h
    Makefile
    Makefile.include

The following subdirectories are also included:

    docs        documentation
    IJ_mv       linear-algebraic interface routines
    krylov      Krylov solvers, such as PCG and GMRES
    parcsr_ls   routines needed to generate solvers and preconditioners,
                as well as Problems 2-4
    parcsr_mv   parallel matrix and vector routines
                (ParCSR data structure)
    seq_mv      sequential matrix and vector routines
    sstruct_mv  semistructured matrix and vector routines - included
                to generate Problem 1
    struct_mv   structured matrix and vector routines - included to
                generate Problem 1
    test        driver and input file for Problem 1
    utilities   functions for memory allocation, timing, error codes,
                sorting, searching, etc.

In the 'docs' directory the following files are included:

    amg2013.readme
    mg_grid_labels.pdf

In the 'IJ_mv' directory the following files are included:

    aux_parcsr_matrix.c
    aux_parcsr_matrix.h
    aux_par_vector.c
    aux_par_vector.h
    headers.h
    HYPRE_IJMatrix.c
    HYPRE_IJ_mv.h
    HYPRE_IJVector.c
    IJMatrix.c
    IJ_matrix.h
    IJMatrix_parcsr.c
    IJ_mv.h
    IJVector.c
    IJ_vector.h
    IJVector_parcsr.c
    Makefile

In the 'krylov' directory the following files are included:

    all_krylov.h
    gmres.c
    gmres.h
    HYPRE_gmres.c
    HYPRE_MatvecFunctions.h
    HYPRE_pcg.c
    krylov.h
    Makefile
    pcg.c
    pcg.h

In the 'parcsr_ls' directory the following files are included:

    aux_interp.c
    aux_interp.h
    gen_redcs_mat.c
    headers.h
    HYPRE_parcsr_amg.c
    HYPRE_parcsr_gmres.c
    HYPRE_parcsr_ls.h
    HYPRE_parcsr_pcg.c
    Makefile
    par_amg.c
    par_amg.h
    par_amg_setup.c
    par_amg_solve.c
    par_cg_relax_wt.c
    par_coarsen.c
    par_coarse_parms.c
    parcsr_ls.h
    par_cycle.c
    par_difconv.c
    par_indepset.c
    par_interp.c
    par_jacobi_interp.c
    par_laplace_27pt.c
    par_laplace.c
    par_lr_interp.c
    par_multi_interp.c
    par_nodal_systems.c
    par_rap.c
    par_rap_communication.c
    par_relax.c
    par_relax_interface.c
    par_relax_more.c
    par_scaled_matnorm.c
    par_stats.c
    par_strength.c
    partial.c
    par_vardifconv.c
    pcg_par.c

In the 'parcsr_mv' directory the following files are included:

    headers.h
    HYPRE_parcsr_matrix.c
    HYPRE_parcsr_mv.h
    HYPRE_parcsr_vector.c
    Makefile
    new_commpkg.c
    new_commpkg.h
    par_csr_assumed_part.c
    par_csr_assumed_part.h
    par_csr_communication.c
    par_csr_communication.h
    par_csr_matop.c
    par_csr_matop_marked.c
    par_csr_matrix.c
    par_csr_matrix.h
    par_csr_matvec.c
    parcsr_mv.h
    par_vector.c
    par_vector.h

In the 'seq_mv' directory the following files are included:

    big_csr_matrix.c
    csr_matop.c
    csr_matrix.c
    csr_matrix.h
    csr_matvec.c
    genpart.c
    headers.h
    HYPRE_csr_matrix.c
    HYPRE_seq_mv.h
    HYPRE_vector.c
    Makefile
    seq_mv.h
    vector.c
    vector.h

In the 'sstruct_mv' directory the following files are included:

    box_map.c
    box_map.h
    headers.h
    HYPRE_sstruct_graph.c
    HYPRE_sstruct_grid.c
    HYPRE_sstruct_matrix.c
    HYPRE_sstruct_mv.h
    HYPRE_sstruct_stencil.c
    HYPRE_sstruct_vector.c
    Makefile
    sstruct_axpy.c
    sstruct_copy.c
    sstruct_graph.c
    sstruct_graph.h
    sstruct_grid.c
    sstruct_grid.h
    sstruct_innerprod.c
    sstruct_matrix.c
    sstruct_matrix.h
    sstruct_matvec.c
    sstruct_mv.h
    sstruct_overlap_innerprod.c
    sstruct_scale.c
    sstruct_stencil.c
    sstruct_stencil.h
    sstruct_vector.c
    sstruct_vector.h

In the 'struct_mv' directory the following files are included:

    assumed_part.c
    assumed_part.h
    box_algebra.c
    box_alloc.c
    box_boundary.c
    box.c
    box.h
    box_manager.c
    box_manager.h
    box_neighbors.c
    box_neighbors.h
    box_pthreads.h
    communication_info.c
    computation.c
    computation.h
    grow.c
    headers.h
    HYPRE_struct_grid.c
    HYPRE_struct_matrix.c
    HYPRE_struct_mv.h
    HYPRE_struct_stencil.c
    HYPRE_struct_vector.c
    Makefile
    new_assemble.c
    new_box_neighbors.c
    project.c
    struct_axpy.c
    struct_communication.c
    struct_communication.h
    struct_copy.c
    struct_grid.c
    struct_grid.h
    struct_innerprod.c
    struct_io.c
    struct_matrix.c
    struct_matrix.h
    struct_matrix_mask.c
    struct_matvec.c
    struct_mv.h
    struct_overlap_innerprod.c
    struct_scale.c
    struct_stencil.c
    struct_stencil.h
    struct_vector.c
    struct_vector.h

In the 'test' directory the following files are included:

    amg2013.c
    Makefile
    sstruct.in.MG.FD

In the 'utilities' directory the following files are included:

    amg_linklist.c
    amg_linklist.h
    binsearch.c
    exchange_data.c
    exchange_data.h
    exchange_data.README
    general.h
    hypre_error.c
    hypre_error.h
    hypre_memory.c
    hypre_memory.h
    hypre_qsort.c
    hypre_smp_forloop.h
    HYPRE_utilities.h
    Makefile
    memory_dmalloc.c
    mpistubs.c
    mpistubs.h
    qsplit.c
    random.c
    threading.c
    threading.h
    thread_mpistubs.c
    thread_mpistubs.h
    timer.c
    timing.c
    timing.h
    umalloc_local.c
    umalloc_local.h
    utilities.h

%==========================================================================
%==========================================================================

Building the Code

AMG2013 uses a simple Makefile system for building the code. All compiler and
link options are set by modifying the file 'amg2013/Makefile.include'
appropriately. This file is then included in each of the following makefiles:

    krylov/Makefile
    IJ_mv/Makefile
    parcsr_ls/Makefile
    parcsr_mv/Makefile
    seq_mv/Makefile
    sstruct_mv/Makefile
    struct_mv/Makefile
    test/Makefile
    utilities/Makefile

To build the code, first modify the 'Makefile.include' file appropriately, then
type (in the amg2013 directory)

    make

Other available targets are

    make clean      (deletes .o files)
    make veryclean  (deletes .o files, libraries, and executables)

To configure the code to run with:

    1 - MPI only, add '-DTIMER_USE_MPI' to the 'INCLUDE_CFLAGS' line
        in the 'Makefile.include' file and use a valid MPI.
    2 - OpenMP with MPI, add the vendor-dependent compilation flag for OpenMP.
    3 - the assumed partition (recommended for several thousand
        processors or more), add '-DHYPRE_NO_GLOBAL_PARTITION'.
    4 - problems that are larger than 2^31-1, add '-DHYPRE_LONG_LONG'.

%==========================================================================
%==========================================================================

Optimization and Improvement Challenges

This code is memory-access bound. We believe it would be very difficult to
obtain "good" cache reuse with an optimized version of the code.

%==========================================================================
%==========================================================================

Parallelism and Scalability Expectations

AMG2013 has been run on the following platforms:

    BG/Q   - up to over 1,000,000 MPI processes
    BG/P   - up to 125,000 MPI processes
    Sierra - up to 13,824 MPI processes
    and more

Consider increasing both problem size and number of processors in tandem.
On scalable architectures, time-to-solution for AMG2013 will initially
increase, then level off at a modest number of processors, remaining roughly
constant for larger numbers of processors. Iteration counts will also increase
slightly for small to modest-sized problems, then level off at a roughly
constant number for larger problem sizes.

For example, we get the following timing results (in seconds) for a 3D Laplace
problem with cx = cy = cz = 1.0, distributed on a logical P x Q x R processor
topology, with a fixed local problem size per process of 40 x 40 x 40:

    P x Q x R      procs    solver similar to solver 0
    ---------------------------------------------------
    16x16x16        4096     5.75
    20x20x20        8000     6.88
    32x32x32       32768     8.11
    45x45x45       91125    10.48
    50x50x50      125000    10.54

These results were obtained on BG/P using the assumed partition option
-DHYPRE_NO_GLOBAL_PARTITION and -DHYPRE_LONG_LONG.

%==========================================================================
%==========================================================================

Running the Code

The driver for AMG2013 is called `amg2013', and is located in the amg2013/test
subdirectory. Type

    mpirun -np 1 amg2013 -help

to get usage information. This prints out the following:

    Usage: amg2013 [<options>]

      -in <filename>    : input file (default is `sstruct.in.AMG.FD')

      -P <Px> <Py> <Pz> : define processor topology per part.
                          Note that for test problem 1, which has 8 parts,
                          this leads to 8*Px*Py*Pz MPI processes!
                          For all other test problems, the total number of
                          MPI processes is Px*Py*Pz.

      -pooldist <p>     : pool distribution to use

      -r <rx> <ry> <rz> : refine part(s) for the default problem
      -b <bx> <by> <bz> : refine and block part(s) for the default problem

      -n <nx> <ny> <nz> : define size per processor for problems on a cube
      -c <cx> <cy> <cz> : define anisotropies for the Laplace problem

      -laplace          : 3D Laplace problem on a cube
      -27pt             : problem with a 27-point stencil on a cube
      -jumps            : PDE with jumps on a cube

      -solver <ID>      : solver ID (default = 0)
                          0 - PCG with AMG precond
                          1 - PCG with diagonal scaling
                          2 - GMRES(10) with AMG precond
                          3 - GMRES(10) with diagonal scaling

      -printstats       : print out detailed info on the AMG preconditioner

      -printsystem      : print out the system

      -rhsfromcosine    : solution is a cosine function (default); can be
                          used for the default problem only
      -rhsone           : rhs is a vector with unit components

All of the arguments are optional. The most important option for the AMG2013
compact application is the `-P' option. It specifies the MPI process topology
on which to run.

For the default problem, there are two possible pool distributions, which
lead to different partitionings of the problem. Pool distribution 0 gives
each process a portion of one of the 8 parts of the test problem, thus
assigning disjoint subdomains to each process. Pool distribution 1 uses a
more natural partitioning, assigning each process a subdomain in one of
the 8 parts; it therefore requires the total number of processes to be a
multiple of 8, i.e., it needs to be run as follows:

    mpirun -np <N> amg2013 -pooldist 1 -P <Px> <Py> <Pz> ...

with <N> = 8*<Px>*<Py>*<Pz>.

Both partitionings lead to a load-balanced distribution of the original
problem. The problem size per MPI process can be increased using the `-r'
option, which defines the refinement factor for the grid on each process in
each direction, or the `-b' option, which increases the number of blocks per
process.

For the other three problems (laplace, 27pt and jumps), the `-n' option allows
one to specify the local problem size per MPI process, leading to a global
problem size of <Px>*<nx> by <Py>*<ny> by <Pz>*<nz>.

%==========================================================================
%==========================================================================

Timing Issues

If using MPI, the whole code is timed using the MPI timers. If not using MPI,
standard system timers are used. Timing results are printed to standard out,
and are divided into "Setup Phase" times and "Solve Phase" times. Timings for a
few individual routines are also printed out.

%==========================================================================
%==========================================================================

Memory Needed

AMG2013's memory needs are somewhat complicated to describe. They depend
strongly on the type of problem solved and the AMG options used. In general,
the solvers with diagonal scaling (1 and 3) need much less memory than the
solvers with the AMG preconditioner (0 and 2). When the '-printstats' option
is turned on, operator complexities <oc> are displayed. The operator
complexity is defined as the sum of the numbers of nonzeros of the original
matrix and all coarse-grid matrices, divided by the number of nonzeros of the
original matrix; i.e., the original matrix and the coarse-grid operators
together need about <oc> times as much space as the original matrix alone.
However, this does not include memory needed for interpolation operators,
communication, etc.

%==========================================================================
%==========================================================================

About the Data

AMG2013 requires one input file to generate the default problem; it is
located in the test directory. Apart from this, all control is on the
command line.

%==========================================================================
%==========================================================================

Expected Results

Consider the following run, which was compiled using the options
-DTIMER_USE_MPI and -DHYPRE_USING_OPENMP, linking with OpenMP, and setting
OMP_NUM_THREADS to 2:

    mpirun -np 8 amg2013 -pooldist 1 -P 1 1 1 -r 4 4 4 -printstats

This is what AMG2013 prints out:

=============================================
SStruct Interface:
=============================================
SStruct Interface:
SStruct Interface wall clock time = 0.014211 seconds
SStruct Interface cpu clock time = 0.010000 seconds

Number of MPI processes: 8 , Number of OpenMP threads: 2

BoomerAMG SETUP PARAMETERS:

 Max levels = 25
 Num levels = 6

 Strength Threshold = 0.250000
 Interpolation Truncation Factor = 0.000000
 Maximum Row Sum Threshold for Dependency Weakening = 0.900000

 Coarsening Type = HMIS
 Hybrid Coarsening (switch to CLJP when coarsening slows)
 measures are determined locally

 no. of levels of aggressive coarsening: 1

 Interpolation = extended+i interpolation

Operator Matrix Information:

             nonzero         entries per row        row sums
lev    rows  entries  sparse  min  max   avg       min         max
===================================================================
 0   82944   648936   0.000    4    9   7.8   -4.274e-15   3.000e+02
 1    8985   159896   0.002    4   45  17.8   -2.069e-13   9.293e+02
 2    2763    72864   0.010    6  121  26.4   -2.487e-14   1.668e+03
 3    1001    21100   0.021    3  167  21.1    8.298e-02   3.147e+03
 4     320     8354   0.082    2   79  26.1    1.938e-01   1.098e+02
 5      21      171   0.388    4   12   8.1    5.854e+00   6.784e+00


Interpolation Matrix Information:

                     entries/row     min        max          row sums
lev   rows x cols    min  max      weight     weight      min        max
=================================================================
 0  82944 x 8985      1   10    1.488e-02  9.980e-01   1.759e-01  1.000e+00
 1   8985 x 2763      1    4    8.769e-03  1.000e+00   1.624e-01  1.000e+00
 2   2763 x 1001      0    4    2.076e-03  1.000e+00   0.000e+00  1.000e+00
 3   1001 x  320      0    4   -4.281e-01  1.452e+00  -7.104e-03  1.000e+00
 4    320 x   21      0    4    2.627e-03  5.150e-02   0.000e+00  1.000e+00


     Complexity:    grid = 1.157817
                operator = 1.404331




BoomerAMG SOLVER PARAMETERS:

  Maximum number of cycles:         1
  Stopping Tolerance:               0.000000e+00
  Cycle type (1 = V, 2 = W, etc.):  1

  Relaxation Parameters:
   Visiting Grid:                     down   up  coarse
            Number of partial sweeps:    1    1     1
   Type 0=Jac, 3=hGS, 6=hSGS, 9=GE:      8    8     8
   Point types, partial sweeps (1=C, -1=F):
                  Pre-CG relaxation (down):   0
                   Post-CG relaxation (up):   0
                             Coarsest grid:   0

=============================================
Setup phase times:
=============================================
PCG Setup:
PCG Setup wall clock time = 0.066036 seconds
PCG Setup cpu clock time = 0.090000 seconds

System Size / Setup Phase Time: 1.674723e+06

=============================================
Solve phase times:
=============================================
PCG Solve:
PCG Solve wall clock time = 0.103601 seconds
PCG Solve cpu clock time = 0.140000 seconds

AMG2013 Benchmark version 1.0
Iterations = 8
Final Relative Residual Norm = 6.945422e-07

System Size * Iterations / Solve Phase Time: 8.539842e+06


%==========================================================================
%==========================================================================

Suggested Test Runs

1. For the default problem:

    mpirun -np <8*px*py*pz> amg2013 -pooldist 1 -r 12 12 12 -P px py pz

   This will generate a problem with 82,944 variables per MPI process, leading
   to a total system size of 663,552*px*py*pz.

    mpirun -np <8*px*py*pz> amg2013 -pooldist 1 -r 24 24 24 -P px py pz

   This will generate a problem with 663,552 variables per process, leading to
   a total system size of 5,308,416*px*py*pz, and solve it using conjugate
   gradient preconditioned with AMG. To use AMG-GMRES(10) instead, append
   -solver 2.

   The domain (for a 2-dimensional projection of the domain see
   mg_grid_labels.pdf) can be scaled up by increasing the values for px, py
   and pz.

2. For the 7pt 3D Laplace problem:

    mpirun -np <px*py*pz> amg2013 -laplace -n 40 40 40 -P px py pz

   This will generate a problem with 64,000 grid points per MPI process
   on a domain of size 40*px x 40*py x 40*pz.

    mpirun -np <px*py*pz> amg2013 -laplace -n 80 80 80 -P px py pz

   This will generate a problem with 512,000 grid points per MPI process
   on a domain of size 80*px x 80*py x 80*pz.

%==========================================================================
%==========================================================================

For further information on AMG2013 contact

    Ulrike Yang
    ph: (925) 422-2850
    email: umyang@llnl.gov

%==========================================================================
%==========================================================================

Release and Modification Record

LLNL code release number: UCRL-CODE-222953.

See the files COPYRIGHT and COPYING.LESSER for a complete copyright notice,
additional contact information, disclaimer and license.