
=== Worksharing and barriers ===

* `void $omp_barrier($omp_team team)`
  * performs a barrier only. Note, however, that a barrier is usually (always?) accompanied by a flush-all, so `$omp_barrier_and_flush` should be used instead.

* `void $omp_barrier_and_flush($omp_team team)`
  * combines a barrier with a flush of all shared objects owned by the team. Implicit in many OpenMP worksharing constructs.

* `$domain $omp_arrive_loop($omp_team team, int location, $domain loop_dom, $DecompositionStrategy strategy)`
  * called by a thread when it reaches an omp for loop. Returns the subset of the loop domain specifying the iterations that this thread will execute. The dimension of the returned domain equals the dimension of the given domain `loop_dom`.

* `$domain(1) $omp_arrive_sections($omp_team team, int location, int numSections)`
  * called by a thread when it reaches an omp sections construct. Returns the subset of the integers 0..numSections-1 specifying the indexes of the sections that this thread will execute. The sections are numbered from 0 in increasing order.

* `int $omp_arrive_single($omp_team team, int location)`
  * called by a thread when it reaches an omp single construct. Returns the thread ID of the thread that will execute the single construct.


== Worksharing model ==

This section describes how the system functions dealing with worksharing are implemented.

The global data structure `$omp_gteam` contains a FIFO queue for each thread. Each queue holds work-sharing records, one for each work-sharing or barrier construct encountered. A record contains the basic information about the construct, as provided by the arguments to the arrival function, together with the distribution chosen for that thread.

The constructs are much like MPI collective operations, and are modeled similarly.

When a thread arrives at one of these constructs, it invokes the relevant arrival function. At this point one can determine whether this thread is the first to arrive at the construct: if its queue is empty, it is the first; otherwise it is not, and the oldest entry in its queue is the entry corresponding to this construct.

When a thread is the first to arrive at a construct, a distribution is chosen for every thread and a record is created and enqueued in each thread's queue (including the caller's). The distributions can be chosen nondeterministically, possibly with some restrictions to achieve a tractability/soundness compromise. The record for the calling thread is then dequeued and the iterator returned.

If a thread is not the first to arrive, its record is dequeued and compared with the arguments given in the function call. They should match; if they do not, an error is reported, since a mismatch indicates that either threads encountered constructs in different orders or the loop parameters changed.


== Translations of specific directives ==

=== Translating `parallel` ===

`parallel`: this spawns a nondeterministic number of threads. We assume there is a constant `THREAD_MAX` defined somewhere. The number of threads created will be between 1 and `THREAD_MAX` (inclusive). Each thread is assigned an ID; the original ("master") thread has ID 0. All threads execute the parallel region.

{{{
float x; // shared
int y;   // private
#pragma omp parallel shared(x) private(y)
{
  ...
  x=5.2;
  y=3;
  ...
}
}}}

=>

{{{
float x;
int y;
{ // begin parallel construct
  int _nthreads = 1+$choose_int(THREAD_MAX);
  $omp_gteam gteam = $omp_gteam_create($here, _nthreads);
  $omp_gshared x_gshared = $omp_gshared_create(gteam, &x);

  $parfor (int _tid : {0.._nthreads-1}) {
    $omp_team team = $omp_team_create($here, gteam, _tid);
    $omp_shared x_shared = $omp_shared_create(team, x_gshared);
    int _y; // private variable

    ...
    { // "x=5.2":
      float tmp = 5.2;

      $omp_write(x_shared, x_shared->local, &tmp);
    }
    _y = 3;
    ...
    $omp_barrier_and_flush(team); // implicit at end of parallel region
    $omp_shared_destroy(x_shared);
    $omp_team_destroy(team);
  } // end $parfor
  $omp_gshared_destroy(x_gshared);
  $omp_gteam_destroy(gteam);
} // end parallel construct
}}}

All variables that occur in the parallel construct, i.e., in its lexical extent, must be determined to be either private or shared. This is determined by the clauses and by the default rules specified in the OpenMP Standard. Any variable declared within the construct itself is necessarily private.

For each private variable `y` not declared within the parallel construct, create a new variable `_y` of the same type, declared within the thread scope. If `y` is also firstprivate, then `_y` is initialized with the value of `y`, e.g. `int _y=y;`. Otherwise `_y` is uninitialized and so has an undefined value.

=== Translating `for` ===

Try to determine whether the loop iterations are independent; if they are, they can all be executed by a single thread. Otherwise:

{{{
#pragma omp for
for (i=0; i<n; i++)
  S
}}}

=>

{{{
{
  $domain loop_domain = {0..n-1};
  $domain(1) my_iters = ($domain(1))$omp_arrive_loop(team, FOR_LOC++, loop_domain, STRATEGY);

  $for (int i : my_iters) {
    translate(S);
  }
  $omp_barrier_and_flush(team);
}
}}}

We can vary the way the sub-domains are chosen to explore different tradeoffs and strategies. At one extreme, every possible partition can be explored; at the other, some fixed strategy such as round-robin with chunk size 1 can be used. This changes only the definition of `$omp_arrive_loop`, not the translation above.

{{{
#pragma omp parallel for collapse(3)
for (i=0; i<n; i++)
  for (j=0; j<m; j++)
    for (k=0; k<l; k++) {
      S
    }
}}}

=>

{{{
{
  $domain loop_domain = {0..n-1, 0..m-1, 0..l-1};
  $domain(3) my_iters = ($domain(3))$omp_arrive_loop(team, FOR_LOC++, loop_domain, STRATEGY);

  $for (int i, j, k : my_iters) {
    translate(S);
  }
  $omp_barrier_and_flush(team);
}
}}}

=== Translating `reduction` clause ===

{{{
#pragma omp for reduction(+:x,y)
for (i=a; i<b; i++) {
  S
}
}}}

=>

The following translation assumes that the `$omp_shared` objects for `x` and `y` (here `x_shared` and `y_shared`) have already been created.

{{{
{
  $domain loop_domain = {a..b-1};
  $domain(1) my_iters = ($domain(1))$omp_arrive_loop(team, FOR_LOC++, loop_domain, STRATEGY);
  double _x=0.0, _y=0.0; // private accumulators, not the local views of the shared variables x/y

  $for (int i : my_iters) {
    translate(S) but replace x with _x and y with _y;
  }
  $omp_apply_assoc(x_shared, CIVL_SUM, &_x);
  $omp_apply_assoc(y_shared, CIVL_SUM, &_y);
  $omp_barrier_and_flush(team);
}
}}}



=== Translating `sections` ===

Say there are `numSections` sections; this number is known statically.

{{{
#pragma omp sections
#pragma omp section
  S0
#pragma omp section
  S1
...
}}}

=>

{{{
{
  $domain(1) my_secs = $omp_arrive_sections(team, SEC_LOC++, numSections);

  $for (int i : my_secs) {
    switch (i) {
    case 0: {
      translate(S0);
      break;
    }
    case 1: {
      translate(S1);
      break;
    }
    ...
    } /* end of switch */
  } /* end of $for loop */
  $omp_barrier_and_flush(team);
}
}}}


=== Translating `single` ===

{{{
#pragma omp single
  S
}}}

=>

{{{
int owner = $omp_arrive_single(team, SINGLE_LOC++);

if (owner == _tid) {
  translate(S);
}
$omp_barrier_and_flush(team);
}}}


=== Translating `barrier` ===

{{{
#pragma omp barrier
}}}

=>

{{{
$omp_barrier_and_flush(team);
}}}

=== Translating `critical` ===

Use one lock for each critical name, plus one for the "no name" case. A thread must obtain the lock to enter the critical section and release it on exit. That is, if there are critical sections named a, b, and c, there should be global root-scope variables of boolean type named `_critical_noname`, `_critical_a`, etc.

{{{
#pragma omp critical(a)
  S
}}}

=>

{{{
...
_Bool _critical_a = $false;
.
.
.
$when (!_critical_a) _critical_a=$true;
translate(S);
_critical_a=$false;
}}}

=== Translating `atomic` ===

In general, reads and writes to shared variables will be processed using the protocols described above. However, if the operation occurs within an omp atomic construct, it is translated differently.

TODO: need to look up the rules on the different flavors of atomics.

If sequentially consistent atomic...

If non-sequentially consistent atomic...



=== Translating `ordered` ===

This can only be used inside an OpenMP `for` loop whose pragma has the `ordered` clause. (Check that.) It indicates that the specified region must be executed in iteration order.

In this case the system function must return an int iterator in which the ints occur in loop order.

{{{
#pragma omp for ordered
for (i=a; i<b; i++) {
  ...
  #pragma omp ordered
    S1
  ...
  #pragma omp ordered
    S2
  ...
}
}}}

=>

{{{
{
  $domain loop_domain = {a..b-1};
  $domain(1) my_iters = ($domain(1))$omp_arrive_loop(team, FOR_LOC++, loop_domain, STRATEGY);
  int order1=a, order2=a;

  $for (int i : my_iters) {
    ...
    $when (order1==i) {
      translate(S1);
      order1++;
    }
    ...
    $when (order2==i) {
      translate(S2);
      order2++;
    }
    ...
  }
}
}}}

=== Translating `master` ===

{{{
#pragma omp master
  S
}}}

=>

{{{
if (_tid == 0) {
  translate(S);
}
}}}

=== Translating `nowait` ===

Just leave out the `$omp_barrier_and_flush` at the end of the translated construct.
