(EC)2 2009: Challenge Problems

This web page presents challenge problems for the (EC)2 Challenge Problem session of the 2009 (EC)2 (Exploiting Concurrency Efficiently and Correctly) workshop. The objective is to collect specific problems that illustrate more general, difficult challenges that arise in attempting to develop or verify correct, efficient concurrent programs. Everyone is welcome to contribute problems by emailing ec2-org@googlegroups.com. Even more welcome are solutions to the problems, or ideas or insights concerning them. These may be presented and discussed at the workshop at 4:00 PM on June 26, 2009.


The Problems

Matrix multiplication: correctness

Contributor: Stephen Siegel

In this problem we are given a parallel, C/MPI-based implementation of matrix-matrix multiplication. It is based on an example from the book Using MPI: Portable Parallel Programming with the Message-Passing Interface by William Gropp, Ewing Lusk, and Anthony Skjellum. The goal is to prove that this program produces the correct output for any given input. (This includes showing the program terminates, does not deadlock, uses MPI correctly, etc.) The straigtforward sequential version of matrix multiplication is also given below. The problem can then be re-stated as follows: show that the two programs are functionally equivalent.

One of the challenging aspects of the problem is the combinatorial blow-up resulting from the different orders in which the workers can return their results to the manager. This can lead to extreme state explosion in explicit state enumeration techniques.

Sequential version:

#include <stdlib.h>
#include <stdio.h>
/* define N, L, M before compiling */

void printMatrix(int numRows, int numCols, double *m) {
  int i, j;
  for (i = 0; i < numRows; i++) {
    for (j = 0; j < numCols; j++) {
      printf("%6.1f ", m[i*numCols + j]);
    }
    printf("\n");
  }
  printf("\n");
  fflush(stdout);
}

void vecmat(double vector[L], double matrix[L][M], double result[M]) {
  int j, k;
  for (j = 0; j < M; j++)
    for (k = 0, result[j] = 0.0; k < L; k++)
      result[j] += vector[k]*matrix[k][j];
}

int main(int argc, char *argv[]) {
  int i, j, k;
  double a[N][L], b[L][M], c[N][M];

  /* read a, b somehow */
  for (i = 0; i < N; i++) vecmat(a[i], b, c[i]);
  printMatrix(N, M, &c[0][0]);
  return 0;
}
    

Parallel version:

#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>
/* define N, L, M before compiling */
#define comm MPI_COMM_WORLD

void printMatrix(int numRows, int numCols, double *m) {
  int i, j;
  for (i = 0; i < numRows; i++) {
    for (j = 0; j < numCols; j++) {
      printf("%6.1f ", m[i*numCols + j]);
    }
    printf("\n");
  }
  printf("\n");
  fflush(stdout);
}

void vecmat(double vector[L], double matrix[L][M], double result[M]) {
  int j, k;
  for (j = 0; j < M; j++)
    for (k = 0, result[j] = 0.0; k < L; k++)
      result[j] += vector[k]*matrix[k][j];
}

int main(int argc, char *argv[]) {
  int rank, nprocs, i, j;
  MPI_Status status;
  
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &nprocs);
  if (rank == 0) {
    double a[N][L], b[L][M], c[N][M], tmp[M];
    int count;

    /* read a, b somehow */
    MPI_Bcast(b, L*M, MPI_DOUBLE, 0, comm);
    for (count = 0; count < nprocs-1 && count < N; count++)
      MPI_Send(&a[count][0], L, MPI_DOUBLE, count+1, count+1, comm);
    for (i = 0; i < N; i++) {
      MPI_Recv(tmp, M, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &status);
      for (j = 0; j < M; j++) c[status.MPI_TAG-1][j] = tmp[j];
      if (count < N) {
	MPI_Send(&a[count][0], L, MPI_DOUBLE, status.MPI_SOURCE, count+1, comm);
	count++;
      }
    }
    for (i = 1; i < nprocs; i++) MPI_Send(NULL, 0, MPI_INT, i, 0, comm);
    printMatrix(N, M, &c[0][0]);
  } else {
    double b[L][M], in[L], out[M];

    MPI_Bcast(b, L*M, MPI_DOUBLE, 0, comm);
    while (1) {
      MPI_Recv(in, L, MPI_DOUBLE, 0, MPI_ANY_TAG, comm, &status);
      if (status.MPI_TAG == 0) break;
      vecmat(in, b, out);
      MPI_Send(out, M, MPI_DOUBLE, 0, status.MPI_TAG, comm);
    }
  }
  MPI_Finalize();
  return 0;
}
    

Concurrent Data Structures: Two Examples

Contributor: Rajeev Alur

Building tools to debug/verify concurrent data structures is a worthy goal for the CAV community. In particular, the objective should be to prove/disprove that a concurrent implementation, given as C/Java code, is linearizable. The book The Art of Multiprocessor Programming by Herlihy and Shavit, contains many such implementations. The following two instances are concrete challenges:
  1. The LockFreeList implementation in Chapter 9 is buggy (i.e. it is not linearizable). The bug was revealed by students during discussion of the algorithm in my course. It would be great if a model checker can discover the bug. The bug can be revealed by a bounded and small instance of the problem. However, the checker would need to model the semantics of various Java features in a reasonably detailed manner.
  2. The LockFreeHashTable in Chapter 13 seems correct. It is wonderfully interesting and complex. In particular, it allows resizing to proceed concurrently with search using a nontrivial ordering of keys. The challenge is first to come up with a compelling manual proof of the correctness in its most general form. The next step would be to come up with a more detailed proof in an automated theorem prover.


Proving Race-freedom for Data-Parallel Programs

Contributor: Madhusudan Parthasarathy

Race-freedom is an important generic correctness property for concurrent programs. The complexity of memory models compels banishing data-races for almost all, if not all, programs.

A challenging problem is to design programming languages that assure race-freedom by design. Since this seems hard, without severe restrictions on programmability, we could at least try to verify data-race freedom of programs that use relatively simple communication and coordination.

In particular, can we build manual (or automatic) proofs of race-freedom for at least non-trivial "data-parallel" programs? By data-parallel programs, I mean programs that exploit parallelism embarrassingly by computing in parallel on mostly disjoint parts of data. There is a rich class of implementations of concurrent algorithms available (JavaGrande, BarnesHut, Sorting, Triangulation, etc.); what do proofs for data-race freedom on these programs look like?


Dynamic Verification, Reduction, API Design, Formal Specification, and Runtime Control

Contributor : Ganesh Gopalakrishnan

It is unarguable that the bulk of concurrent programming will rely upon APIs -- often in a "tail that wags the dog" relationship - i.e., the programming language statements may often set up control / data plumbings and get out of the way, letting the API do the bulk of the work! If you don't see this happening already, consider these APIs : MPI, OpenMP, CUDA, Pthreads, various transaction memories, Intel Ct, Microsoft TPL, Microsoft Axum. Why is this a dominant trend? One important reason is that with the exploding base functionality provided by the new types of multi-core CPUs and libraries, APIs are the easiest means of encapsulating the functionality and make it available within a programming language. A number of dynamic verification tools already cater to verifying programs that are API-rich. A partial list includes: Verisoft, Java Pathfinder, Bogor, CHESS, ISP, Inspect, MoDist, MCC, and CREST. Unless we can verify programs that employ multiple APIs through push-button verification tools that look like debuggers, pedagogy as well as formal methods uptake by practitioners will be impacted.

Here are many general challenges and some specific ones:

See http://www.cs.utah.edu/formal_verification/Concurrency_Education/

Checking Equivalence of SPMD Programs Using Non-Interference

Contributor: Stavros Tripakis

The method outlined in the technical report below may be useful in handling some of the problems raised above, in particular "Proving Race-freedom for Data-Parallel Programs" and "Matrix multiplication: correctness".

http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-42.html