\section{Motivation}

Message-passing is used pervasively in high-performance computing
(HPC).  This is expected to continue for the foreseeable future.  For
example, a recent survey found that 93\% of the applications being
developed under the U.S.\ Department of Energy's Exascale Computing
Project are currently using or plan to use the Message Passing
Interface \cite{bernholdt-etal:2018:mpi_exascale}.\footnote{Table 4,
  Question 20, ``Do you expect the exascale version of your program to
  use MPI?''.  Based on the 77 projects which responded to the survey,
  out of a total of 97 active projects.  For the reported result,
  ``application'' includes application and software technology
  projects.}

%(MPI, \cite{mpi-forum:2015:mpi31})

These applications are run on the world's most powerful
supercomputers---machines which are extremely expensive to build,
operate, and maintain.  Time on these machines is very valuable, and
the machines consume an enormous amount of energy.  Time spent
debugging at scale, or re-running programs after a defect was
discovered at scale, wastes these resources.

Moreover, HPC projects deal with issues of fundamental importance to
science, engineering, and society more generally.  They include
applications to predict earthquake damage
\cite{johansen:2017:earthquake}, model the global climate
\cite{drake:2005:overview}, perform atomic-level simulations of
chemical and biological systems \cite{valiev:2010:nwchem}, and to
investigate the electronic structure of molecules
\cite{mathuriya:2017:embracingquantum}, among many others.  These
programs inform both profound scientific conclusions and decisions of
the utmost importance to society.

For all of these reasons, it is imperative to develop effective
methods for verifying correctness of message-passing HPC
programs. Indeed, there is a growing consensus in the HPC community
that better methods are needed to verify the correctness of HPC
applications \cite{gopalakrishnan-etal:2017:correctness}.

% you can cite many more things here about the correctness crisis

% (DOI: 10.1109/MCSE.2017.3421558) ...

%% various meanings of correctness
There are many aspects to the correctness of a program.  A programming
language, such as C, imposes certain \emph{generic} correctness
requirements that apply to all programs in that language.  For
example, any correct C program should not use an array index that lies
outside the bounds of the array, dereference a null pointer, or invoke
\texttt{free} on a pointer that was not returned by an earlier call to
\texttt{malloc}.  The use of concurrency APIs, such as MPI, imposes
further generic requirements; for example, a correct MPI program
should not deadlock, or have processes in a communicator invoke
collective operations in different orders.

While the work of this dissertation does apply to these generic
properties, the focus is more on \emph{program-specific} correctness
properties.  These properties assert that a program---or an individual
procedure within a program---should behave in the intended way, and
should compute what the developer expects.

By their very nature, program-specific properties require the
developer to \emph{specify} intended behavior.  While there are many
different approaches to specifying programs, one of the most
influential is the \emph{code contract}.  This approach was introduced
by Bertrand Meyer \cite{Meyer:1992:ADC} and realized in the Eiffel
programming language \cite{Meyer:1988:eiffel}, and has since
spread to many contexts. Larch \cite{vandevoorde:1994:specializedprocedures}
was one of the early specification languages and was applicable to
multiple programming languages, the Java Model Langauge
\cite{leavens:1999:jml} was developed to specify Java programs and has been
extremely influential, Spark \cite{spark} is a highly successful
specification language for Ada \cite{ada2012}, and ACSL
\cite{ACSL:Reference} is a specification language for C that is used
with the Frama-C analysis platform.

The most fundamental kind of code contract is the \emph{procedure
  contract}.  A procedure contract consists primarily of a pair of
\emph{pre-} and \emph{post-conditions}.  These conditions specify a
relation between the input and the output of the procedure.  The
contract asserts that \emph{if} the pre-condition holds when the
procedure is called, \emph{then} the post-condition will hold when the
procedure returns.  This fundamental semantic idea originated in Hoare
logic \cite{hoare:1969:axiomatic}, but is extended in several ways to
deal with procedure calls, side effects, and other complexities that
arise in real programming languages.

The contract approach enables modular and composite verification of
programs.  For every procedure in a program, one verifies whether it
satisfies its contract under the assumption that all other procedures
satisfy their contracts.  By doing so, the verification problem is
decomposed into a number of smaller and independent sub-problems.
This sort of decomposition, or something like it, is the only way that
verification has any chance of scaling to software of the considerable
size and complexity seen in modern HPC.

There are many different ways to ``verify'' that a procedure satisfies
its contracts.  These include the automatic insertion of assertions
for runtime checks, and automatic generation of test cases.  Such
approaches cannot verify with certainty, but only provide varying
degrees of evidence for the likelihood that a contract holds.

The most rigorous approach to contract verification is \emph{deductive
  reasoning}, which works by generating verification conditions that
are individual theorems, which, if proved, imply a contract holds.
Advances in automated theorem proving have eliminated much of the
human effort previously required to discharge the proof obligations,
but even the best tools still require significant manual guidance in
the form of invariants, lemmas, and various ``hints'' for the theorem
provers.  This methodology has been realized in many contract
verification tools, e.g.,
\cite{barnett:2006:boogie,gordon:1989:mechanizing,leino:1999:checkingjava,
  dannenberg:1982:formal, cuoq:2011:wp, filliatre:2013:why3,
  leino:2010:dafny, ball:2010:towards, barnett:2005:specsharp,
  beckert:2007:KeY}.

A common point of the majority of these tools is that they are only
suitable for sequential programs.  There has been some work extending
the ideas for shared-memory concurrent programs (e.g.,
\cite{cohen-etal:2009:vcc, blom:2017:vercors,
  amani:2017:complex}), but almost nothing for
message-passing parallelism.

In fact, it is not obvious that procedure contracts are suitable for
reasoning about message-passing programs.  First, in a message-passing
parallel program, the behavior of a procedure depends not only on its
inputs but also on the behaviors of other processes that run in
parallel.  Second, there is not a clear notion of pre- or post-state
because processes can enter and exit a given procedure at different
times.  Nevertheless, the need to ``divide and conquer'' the
verification task is as evident for message-passing programs as it is
for sequential programs.

\section{Contributions}

\subsection{A contract theory for message-passing programs}

The first contribution of this dissertation is to cope with the
problems above by developing a \emph{theory of contracts} for
message-passing programs.  The basic idea of this theory is to use a
contract to specify the collective behavior of a procedure that is
executed by all processes collectively.  We call such procedures the
\emph{collective-style procedures}.

This theory is applicable to programs containing non-collective-style
procedures, it simply does not provide a way to decompose the
specification/verification task for such procedures.  In other words,
if a collective-style procedure $f$ calls some non-collective-style
procedure $g$, then $f$ and $g$ will be specified and verified as a
unit: the user will write a contract for $f$ (but not write one for
$g$) and the verification of $f$'s contract will require the
definitions of both $f$ and $g$.

Obviously, this approach is only useful to the extent that there are
many collective-style procedures in real message-passing programs.  We
contend that such procedures are ubiquitous.  For starters, all of the
official MPI collective functions are collective-style, and the MPI
collectives are widely used.  As the ECP teams responded in
\cite{bernholdt-etal:2018:mpi_exascale}, 80\% of them use MPI
collective functions in their current applications and this percentage
increases to 82\% for exascale version of their applications.

Furthermore, we have examined source code in two large,
state-of-the-art MPI applications: the Monte Carlo particle transport
code OpenMC \cite{romano-etal:2015:openmc}, and a module
(\texttt{parcsr\U{}ls}) in the algebraic multigrid solver AMG
\cite{yang-etal:2017:amg}.  OpenMC consists of over 24 thousand lines
of C++ code, and the AMG module is over 35 thousand lines of C code.
Through a combination of static analysis and manual inspection, we
confirmed that every function in these codes either involves no MPI
communication (in which case the usual sequential contract techniques
can be applied) or is collective-style.

% For OpenMC, we confirmed that every direct call to an MPI
% communication function belongs to a collective-style function.  For
% the part of code in AMG, we confirmed that every direct call to MPI
% point-to-point communication functions belongs to a collective-style
% function.

\subsection{A method for verifying collective contracts}

The second contribution of this dissertation is a method for
automatically verifying that a collective-style function satisfies its
contract.  As explained above, deductive reasoning is typically used
for this sort of verification in the case of sequential programs.  The
main advantage of deductive reasoning is that it can perform
\emph{complete verification} for an arbitrary configuration of a
program.  But it costs the user's effort in specifying every single
function and loop in the program with a contract and a \emph{loop
  invariant}, respectively, and possibly additional annotations
specifying theories, lemmas, and so on. 

% Finding appropriate loop
% invariants is a notoriously difficult process.

Instead of deductive reasoning, our method is based on model checking
\cite{clarke:2000:modelchecking} and symbolic execution
\cite{king:1976:symbolic}.  The main advantage of this method is that it
requires no invariants or annotations beyond the procedure contracts,
and given those it is completely automated.  Furthermore, in order to
verify that a procedure $f$ satisfies its contract, it is not
necessary to have contracts for all procedures called by $f$.  The
called procedures without contracts will effectively be executed when
verifying $f$.   This enables a very incremental approach to the
specification and verification work.

The cost is that our method performs \emph{bounded verification}---it
requires small bounds to be placed on the number of processes, and
possibly on other parameters.  If the verification succeeds, it is
theoretically possible that a contract could be violated with a larger
number of processes.  The effectiveness of this approach relies on the
\emph{Small Scope Hypothesis} \cite{Jackson:2006:SAL}, which posits
that almost all defects can be found by exhaustively exploring all
possible behaviors in all small configurations of a system.

Both the contract theory and the verification method will be
formulated in this dissertation using a ``toy'' message-passing
language, which we call MiniMP.  Using a simple language with a
mathematically precise semantics is the best way to elucidate the
essential ideas, and also allows us to state and prove precise
statements about the soundness of the verification method.

% Considering the fact that the MPI standard in real world contains too
% much complexity, e.g., multiple communicators, various data types and
% C void pointers, the simplicity of MiniMP allows us to illustrate the
% ideas and prove their soundness with mathematical precisions.

\subsection{An MPI Specification Language}

The simple MiniMP language leaves out many of the ``accidental''
complexities of C and MPI.   These includes pointers and pointer 
arithmetic, specifying buffers through \code{void*} pointers and
byte-counts, use of multiple communicators, representing types and
reduction operations as values, and so on.   A practical specification
language for message-passing programs must deal with all these issues.

As the third contribution of this dissertation, we present such a
specification language for C/MPI programs.  The language extends the
existing ACSL language for sequential C.  It adds new ``MPI-aware''
primitives which realize many of the central concepts that arise
informally in the MPI Standard.  These include the notion of a
\emph{region} of memory specified by a \code{void*} pointer and size
(i.e., a buffer); a predicate for asserting that two buffers contain
the same typed values; predicates that compare memory across different
processes, and so on.  The contract system supports a significant
subset of MPI, including standard mode blocking point-to-point
communication, the wildcards \texttt{MPI\U{}ANY\U{}SOURCE} and
\texttt{MPI\U{}ANY\U{}TAG}, all blocking collective communication, all
kinds of pre-defined MPI data types and operations, and multiple
communicators.

We believe this new specification language has significant practical
potential, even without a corresponding verification tool.  For
example, it could be used to make formally precise the semantics of
the MPI collective functions, which are currently described using only
natural language in the MPI Standard.  It could also be used by
programmers simply to document the intended behavior of their code.
The language is sufficiently intuitive and readable that it can be
easily understood by programmers who are not experts in formal
verification.

\subsection{A Prototype MPI Contract-Based Verification Tool}

The fourth contribution of this dissertation is an actual verification
tool for C/MPI programs specified using the language described above.
The prototype tool implements the theoretical verification approach,
based on model checking and symbolic execution, described in Section
\ref{fillmein}.  

The tool was developed using a general verification framework, CIVL
\cite{siegel-etal:2015:civl_sc}.  The framework is based on an
intermediate verification language, CIVL-C, which provides a small set
of generic concurrency primitives on top of standard C.  We have added
an automatic transformer which consumes a C/MPI program with contracts
and produces a CIVL-C program.  The CIVL-C program contains assertions
and assumptions, in such a way that if all assertions hold on all
executions, then the original contract must hold.

This prototype implementation was evaluated with a number of simple
C/MPI programs, including MPI collective implementations used by CIVL
itself, advanced MPI collective algorithms and typical MPI
computational applications.


\begin{comment}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
A program is considered \emph{reliable}, if it always behave as
expected.  The expectation to a program is called the
\emph{specification} of the program.  A program is reliable if it
\emph{satisfies} its specification.  A specification describes the
\emph{properties} of a program without ambiguity.  For example, a
common property for message-passing parallel programs is deadlock
freedom; free of invalid pointer dereferencing is a common C
program property.

%% hard to debug and test, verification approaches are more effective:
The behaviors of a message-passing programs is asynchronous and
nondeterministic, so traditional software testing methods can hardly
cover all possible behaviors of a message-passing program
\cite{Pervez:2010:FMA:1712067.1712069, godefroid2008concurrency}.  The
research work in recent decades \cite{vakkalanka2008dynamic,
Yu:2018:CSE:3183440.3190336, khanna2018dynamic, siegel2011tass,
Luo:2017:VMP:3127024.3127032, Strout:2006:DAM:1156433.1157634} shows
that static analysis and formal verification approaches are effective
in finding bugs in message-passing concurrent systems.  These
approaches can roughly be divided into two kinds: \emph{automated} and
\emph{deductive}.

Automated analysis and verification approaches require little effort
from the user side and is able to check a set of pre-defined
\emph{standard} properties for a message-passing program.  For
example, the ISP \cite{vakkalanka2008dynamic} model checker, which
uses the model checking \cite{Clarke:2000:MC:332656} technique, takes
a message-passing program as an input and verifies whether the program
is free of deadlock or assertion violation.  Message-passing
applications usually have large codebases, hence automated approaches
are attractive to developers.

In contrast, deductive verification approaches \cite{luo2018towards,
Lopez:2015:PVM:2814270.2814302, oortwijn2016future} require the user
to provide a specification for the verifying program.  The
specification usually is written in a specific programming language so
that it can be understood by computers.  By writing a specification,
the user is able to express rich and customized properties.  A
deductive verification tool performs parametric verification for a
program against its specification in a composite and modular way.

Both kinds of the verification approaches have their disadvantages.
Automated approaches are mainly based on finite-state searching or
static analysis techniques.  The former technique requires the program
parameters to be bounded in small scopes.  The latter technique
produces false positives.  For deductive approaches, the human effort
that needs to be made is not only excessive but also requiring expert
knowledge.


%% what does this dissertation do ?
The main contribution of this dissertation is the attempt to combine
these two kinds of approaches to verify message-passing programs.  In
summary, our approach is to divide a message-passing program into a
set of \emph{collective-style} procedures.  The division is done via
assigning each collective-style procedure a specification.  By
verifying the satisfaction to the specification for each
\emph{collective-style} procedure independently, the whole program is
thus verified.  The verification of each procedure is based on
automated approaches.

%% the advantage of the combinatorial approach ?
Our approach has three main advantages.  First, it enables composite
and modular verification.  Verifying each of the divided procedures is
a much smaller problem than the verification of the whole program.
During the verification of a single procedure, all other procedures
will be assumed being verified and their verification results will be
re-used.  Second, instead of dividing a message-passing program into
arbitrary procedures, the collective-style based division allows the
user to pay more attention to the inputs and outputs of procedures
than the interaction among processes.  All the non-collective style
procedures are automatically executed during the verification.
Finally, encoding specifications in a programming language encourages
the developers to document and be careful with their code.


This approach is implemented in a general verification framework, CIVL
\cite{siegel-etal:2015:civl_sc}.  The implementation is evaluated with
MPI programs.  MPI, the Message-Passing Interface, is the
\emph{de-facto} standard of message-passing programming in HPC.  In
order to specify properties for MPI programs, we extended and designed
a \emph{contract language} for C/MPI programs from an existing
specification language for sequential C programs.  This language is
another contribution of this dissertation.
\end{comment}

\section{Outline}

The dissertation is composed of three parts.

Part I includes chapters for preliminary information. Chapter
\ref{chp:background-related} introduces background knowledge and
related work.  Chapter \ref{chp:notation} presents general notation
that will be used throughout the document.

In Part II, we describe a theory of message-passing contracts and
their verification using the simple MiniMP language.  The syntax and
semantics of the language proper are formally described in Chapter
\ref{chp:minimp}.  In Chapter \ref{chp:minimp:spec}, we present a
specification language for MiniMP and show how the correctness of a
MiniMP program can be expressed using this specification language.
Chapter \ref{chp:minimp:rules} presents an inference system, which
consists of four rules, for reasoning about MiniMP programs with
specifications.  Chapter \ref{chp:minimp:system} presents a system
based on that inference system for verifying MiniMP programs, using
model checking and symbolic execution techniques.

Part III focuses on bringing the theoretical approach to practice.
Chapter \ref{chp:mpi:model} gives an overview of the CIVL framework,
on which a contract system for verifying C/MPI programs will be built.
In Chapter \ref{chp:mpi:system}, we introduce our \emph{MPI contract
  language} and the implementation of an C/MPI contract system.
Chapter \ref{chp:mpi:eval} presents our evaluation of the contract
language and system.  It also includes a discussion of the
experimental results.

Finally, we conclude this dissertation in Chapter
\ref{chp:conclusion}.