In this chapter, we summarize the background and related research.
The background material includes an introduction to classical
Hoare-style verification approaches for sequential
(\S\ref{sec:background:hoarelogic}) and concurrent programs
(\S\ref{sec:background:formal4concurrency}), two semiformal
verification approaches, namely model checking
(\S\ref{sec:background:modelchecking}) and symbolic execution
(\S\ref{sec:background:symexe}), and the MPI standard
(\S\ref{sec:background:mpi}).  The state-of-the-art verification tools
for message-passing parallel programs, as well as their relations to
this dissertation, are described in the second part, which
consists of three sections: \S\ref{sec:related:modelcheckers} discusses
automated model checkers; \S\ref{sec:related:staticanalyzer}
gives an overview of other automated tools, including static analyzers
and runtime verifiers; finally, existing deductive verification tools
are briefly described in \S\ref{sec:related:deductive}.


%% HOARE LOGIC
\section{Hoare Logic} \label{sec:background:hoarelogic}

\emph{Hoare logic} \cite{hoare:1969:axiomatic} provides a proof system
for deriving safety properties of sequential programs.  It is the
basis for deductive verification.

The key notion in Hoare logic is the \emph{Hoare triple},
$\{P\}~S~\{Q\}$, where $S$ is a program statement, and $P$ and $Q$ are
logic formulas.  The formula $P$ is called a \emph{pre-condition}, and
$Q$ is called a \emph{post-condition}.  The Hoare triple is
\emph{valid} if the following holds: whenever $S$ is executed from a
state in which $P$ holds, if $S$ terminates then $Q$ will hold in the
resulting state.

Hoare's original paper uses a simple ``while'' programming language
which has assignments, while loops, if and if-else statements, and
sequential composition (\code{;}).  A set of inference rules is
defined for reasoning about Hoare triples over this language.  This
rule set is called the \emph{Hoare calculus}.   Each inference rule
has a set of premises---Hoare triples assumed to hold---and a 
conclusion---the Hoare triple which one can conclude holds given
that the premises do.

Fig.\ \ref{fig:intro:hoare:rules} shows the inference rules in Hoare
calculus.  The rules are presented in the standard syntax where the
premises appear above the horizontal line and the conclusion below the
line.

The \emph{skip} rule shows that a pre-condition will be preserved by a
no-op.

The \emph{assignment} rule has no premise.  It states that the
post-condition $Q$ of the assignment \texttt{$a$ := $e$} can be
ensured, if the pre-condition is obtained by substituting $a$ with
expression $e$ in $Q$.

The \emph{sequence} rule states that if there is a condition that can
serve as the post-condition of a statement $S_0$ as well as the
pre-condition of another statement $S_1$, a Hoare triple for the
composite statement $S_0\texttt{;}S_1$ can be inferred.

The \emph{consequence} rule introduces a \emph{side condition}, which
is a logical formula over the pre- and post-conditions of two Hoare
triples.  The rule states that for a statement $S$, one triple can be
inferred from the other if the side condition is valid.

The \emph{conditional} rule expresses that a Hoare triple of a
conditional statement can be inferred from Hoare triples corresponding
to the two branches.

The \emph{loop} rule states that a Hoare triple of a while-loop can be
inferred with a condition $I$, if $I$ is an invariant of the loop
body.  This is the rule that makes proving programs with Hoare logic
hard.  The problem is that there are infinitely many candidate loop
invariants $I$ (i.e., formulas $I$ for which the premise of the loop
rule is valid), but there is no algorithm for finding just the right
loop invariant to make a proof go through.  Most program verification
systems based on deductive reasoning therefore require the user to
provide the loop invariants, which demands intuition about the
program as well as experience.
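To make the obligation imposed by the loop rule concrete, the
following sketch checks, over a small scope of concrete states, that a
candidate invariant satisfies the premise $\{C \wedge I\}~S~\{I\}$.
The loop, the invariant, and all names are illustrative and not drawn
from the text; a real verifier would discharge this obligation with a
theorem prover rather than by enumeration.

```python
# Bounded, concrete-state check that a candidate loop invariant is
# inductive, i.e., that the premise {C and I} S {I} of the loop rule
# holds on all states in a small scope.  The (illustrative) loop is
#   while i < n do s := s + i; i := i + 1
# and the candidate invariant is s = i*(i-1)/2.

def C(state):                       # loop condition: i < n
    return state["i"] < state["n"]

def I(state):                       # candidate invariant: s = 0+1+...+(i-1)
    return state["s"] == state["i"] * (state["i"] - 1) // 2

def body(state):                    # one execution of the loop body
    s = dict(state)
    s["s"] += s["i"]
    s["i"] += 1
    return s

def invariant_is_inductive(bound):
    """Check {C and I} body {I} for all states with 0 <= i, n < bound."""
    for i in range(bound):
        for n in range(bound):
            state = {"i": i, "s": i * (i - 1) // 2, "n": n}
            if C(state) and I(state) and not I(body(state)):
                return False
    return True

print(invariant_is_inductive(20))   # True: the invariant is preserved
```

Such a bounded check can refute a bad invariant quickly, but only a
proof over all states (as the loop rule demands) establishes validity.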

\newcommand{\ds}{\displaystyle}
\newcommand{\iskip}{1.5em}

\begin{figure}
  \[
    \begin{array}{rll}      
      \ds\frac{}{ \{P\}~\texttt{skip}~\{P\} }
      & \text{(skip)} &\\[\iskip]      
      \ds\frac{}{ \{Q[e / a]\}~a\texttt{ := }e~\{Q\} }
      & \text{(assignment)} &\\[\iskip]
      \ds\frac{\{P\}~S_0~\{R\},\,\{R\}~S_1~\{Q\}}%
      {\{P\}~S_0\texttt{;}S_1~\{Q\}}
      & \text{(sequence)} & \\[\iskip]
      \ds\frac{\{P'\}~S~\{Q'\}}{\{P\}~S~\{Q\}}
      & \text{(consequence)}
                            &
                              \text{if $P \Rightarrow P' 
                              \wedge{} Q' \Rightarrow{} Q$}\\[\iskip]
      \ds\frac{ \{P \wedge{} C\}~S_0~\{Q\},\, \{P \wedge{}
      \neg{}C\}~S_1~\{Q\}}%
      {\{P\}~\texttt{if } C \texttt{ then }S_0\texttt{ else }S_1~\{Q\}}
      & \text{(cond)} & \\[\iskip]
      \ds\frac{
        \{C\wedge{}I\}~S~\{I\}}%
      {\{I\}~\texttt{while }C\texttt{ do}~S~\{\neg{}C \wedge{} I\}} & \text{(loop)}
    \end{array}
  \]
  \caption{Inference rules of Hoare calculus.}
  \label{fig:intro:hoare:rules}
\end{figure}

%%% verification condition generation
Given an appropriate invariant for each loop, the proof process of
applying Hoare calculus to Hoare triples can be automated with the
\emph{verification condition generation} (VCG)
\cite{barnett:2006:boogie,gordon:1989:mechanizing,leino:1999:checkingjava}
technique.  Given a Hoare triple, the VCG algorithm automatically
computes a set of formulas in the underlying logic with the property
that the validity of the formulas implies the existence of a Hoare
logic derivation of the triple.  The key step in the VCG algorithm is
the computation of the \emph{weakest pre-condition} (WP) of the
statement with respect to the post-condition. Most of the deductive
verification tools for sequential programs are based on VCG.
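As an illustration of the key VCG step, the following sketch computes
weakest pre-conditions for a tiny language, representing predicates as
Python functions over program states and statements as nested tuples.
This is a semantic toy under assumed encodings, not the algorithm of
Boogie or any other tool mentioned here.

```python
# A minimal weakest-pre-condition calculator.  Predicates are Python
# functions from states (dictionaries) to booleans; statements are
# nested tuples.  Each case mirrors the corresponding Hoare rule.

def wp(stmt, Q):
    """Return the weakest pre-condition of stmt w.r.t. post-condition Q."""
    kind = stmt[0]
    if kind == "skip":                    # wp(skip, Q) = Q
        return Q
    if kind == "assign":                  # wp(a := e, Q) = Q[e/a]
        _, a, e = stmt
        return lambda s: Q({**s, a: e(s)})
    if kind == "seq":                     # wp(S0;S1, Q) = wp(S0, wp(S1, Q))
        _, s0, s1 = stmt
        return wp(s0, wp(s1, Q))
    if kind == "if":                      # (C => wp(S0,Q)) and (not C => wp(S1,Q))
        _, c, s0, s1 = stmt
        w0, w1 = wp(s0, Q), wp(s1, Q)
        return lambda s: w0(s) if c(s) else w1(s)
    raise ValueError(kind)

# The running example: if x >= 0 then y := x else y := -x,
# with post-condition y = |x|.
prog = ("if", lambda s: s["x"] >= 0,
        ("assign", "y", lambda s: s["x"]),
        ("assign", "y", lambda s: -s["x"]))
post = lambda s: s["y"] == abs(s["x"])
pre = wp(prog, post)

# The resulting pre-condition holds in every sampled state,
# matching the verification condition of the example being true.
print(all(pre({"x": x, "y": y}) for x in range(-5, 6) for y in range(-5, 6)))
```

Encoding predicates as executable functions sidesteps symbolic
substitution; a real VCG manipulates formulas and hands the result to
a theorem prover.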

\begin{exmp}
  Figure \ref{fig:intro:hoare:derive} shows a derivation for a Hoare
  triple of a \emph{while} program using Hoare calculus.  In this
  figure, auxiliary variables $X$ and $Y$ are used to represent the
  values of \texttt{x} and \texttt{y} at the pre-state, respectively.
  Auxiliary variables are not part of the program; hence they retain
  their values in the post-state.  The VCG computation for this
  example is given in Fig.\ \ref{fig:intro:hoare:wp}.  Note that the
  term $abs(\texttt{x})$ in the code stands for the absolute value of
  \texttt{x}.  By VCG, the validity of the Hoare triple is represented
  by the following condition:
  \[
  \begin{array}{ll}
    & (\texttt{x} = X \wedge{} \texttt{y} = Y) \Rightarrow{}
    \mymath{\textup{WP}}(\texttt{if x >= 0 then y = x
      else y = -x},~~\texttt{y} = abs(\texttt{x})) \\[\iskip]
    \text{i.e.,} &(\texttt{x} = X \wedge{} \texttt{y} = Y) \Rightarrow{} true
  \end{array}
\]
This verification condition is obviously $\mymath{true}$; hence the
Hoare triple is valid.  $\qed$
\end{exmp}

\begin{figure}[t]
  \centering
\begin{small}
    \fbox{
    \parbox{.96\textwidth}{
\[\begin{array}{lll}
\hspace{-.2cm}    & \{\texttt{x} = X \wedge{} \texttt{y} = Y\}~~  
    \texttt{if x>=0 then y = x else y = -x} 
    ~~\{\texttt{y} = abs(\texttt{x})\} & \hspace{-.2cm}\text{(cond 1, 2)}\\[\myskip]
    
\hspace{-.2cm}    1 & \{\texttt{x} = X \wedge{} \texttt{y} = Y \wedge{} X \geq 0\}~~
    \texttt{y = x} ~~\{\texttt{y} = abs(\texttt{x})\} & \hspace{-.2cm}\text{(conseq 1.1)}\\[\myskip]

    %%% 1.1
\hspace{-.2cm}    1.1 & \{\texttt{x} = \mymath{abs(}\texttt{x})\}~~ \texttt{y = x} ~~\{\texttt{y} =
                        abs(\texttt{x})\} &
                                            \hspace{-.2cm}\text{(assign)}\\[\myskip]

\hspace{-.2cm}    2 & \{\texttt{x} = X \wedge{} \texttt{y} = Y \wedge{} X < 0\}~~
    \texttt{y = -x} ~~\{\texttt{y} = abs(\texttt{x})\} & \hspace{-.2cm}\text{(conseq 1.2)}\\[\myskip]

    %%% 1.2
\hspace{-.2cm}    1.2 & \{-\texttt{x} = \mymath{abs(}\texttt{x})\}~~
                        \texttt{y = -x} ~~\{\texttt{y} =
                        abs(\texttt{x})\} &
                                            \hspace{-.2cm}\text{(assign)}
  \end{array}\]}}
\end{small}
\caption{The derivation of a Hoare triple.  $X$ and $Y$ are auxiliary
  variables holding the values of \texttt{x} and \texttt{y},
  respectively, at pre-state.  The symbol $\mymath{abs}$ represents
  the absolute value function in the underlying first order logic.  }
          \label{fig:intro:hoare:derive}
\end{figure}

\begin{figure}[t]
  \centering
    \fbox{
    \parbox{.96\textwidth}{
  \[
  \begin{array}{lrll}
  \multicolumn{3}{l}{\mymath{\textup{WP}}(\texttt{if x>=0 then y = x
    else y = -x},~~\texttt{y} = abs(\texttt{x}))} & (\text{cond})\\[\myskip]    

 \hspace{1cm} & = & \texttt{x} \geq{} 0 \Rightarrow{} \mymath{\textup{WP}}(\texttt{y = x},~~\texttt{y} =
   abs(\texttt{x}))~~\wedge{}&  \text{(assign)}\\

 \hspace{1cm}  & & \texttt{x} < 0 \Rightarrow{} \mymath{\textup{WP}}(\texttt{y = -x},~~\texttt{y} =
    abs(\texttt{x})) &  \text{(assign)}\\[\myskip]

 \hspace{1cm}  & = & (\texttt{x} \geq{} 0 \Rightarrow{} \texttt{x} =
    abs(\texttt{x}))~~\wedge{}~~(\texttt{x} < 0 \Rightarrow{} -\texttt{x} =
    abs(\texttt{x})) \\ [\myskip]
 \hspace{1cm}  & = & true & 
 \hspace{1cm}  & = & true & 
  \end{array}
  \]
}}
      \caption{The computation of the weakest pre-condition,
        $\textup{WP}(S, Q)$, for a given statement $S$ and desired
        post-condition $Q$. }
          \label{fig:intro:hoare:wp}
\end{figure}


\emph{Design by Contract}$^{\small{\texttt{TM}}}$
\cite{Meyer:1992:ADC} (DbC) is a methodology developed by Bertrand
Meyer for software correctness\footnote{The term was trademarked and
  the trademark is now owned by Eiffel Software.}.  A central idea of
DbC is to use pre- and post-conditions to document, as well as check
the correctness of, program procedures.  A pair consisting of a pre-
and a post-condition for a procedure is called a \emph{procedure
  contract}.  Procedure contracts are built-in constructs in the
Eiffel \cite{Meyer:1988:eiffel} programming language; they can be
checked as assertions at run time.

Procedure contracts have also been adopted by many programming
languages as a form of specification.  Why3
\cite{filliatre:2013:why3} and Boogie \cite{barnett:2006:boogie} are
verification programming languages with rich built-in specification
constructs, including procedure contracts.  The two names also refer
to the deductive verifiers that prove the functional correctness of
Why3 and Boogie programs, respectively, against procedure contracts
and other kinds of specifications.  Other programming languages adopt
the idea of procedure contracts as well.  For example, ACSL, JML
\cite{Gosling:2014:JLS:2636997} and Spec\#
\cite{barnett:2005:specsharp} are specification languages for C, Java
and C\#, respectively.  Spark \cite{Barnes:2012:SPA} is a dialect of
Ada for describing specifications.  Dafny \cite{leino:2010:dafny} is a
programming language with built-in primitives for specification.

A number of tools have been developed for verifying programs with
respect to procedure contracts and other kinds of specifications.
These include the static verifiers powered by Boogie, such as HAVOC
\cite{lahiri2012security, ball:2010:towards}, the Spec\# verifier and
the Dafny verifier.  Spark-2014 \cite{spark} and Frama-C
\cite{cuoq2012framac} with its plug-in WP \cite{cuoq:2011:wp} are
verification tools built on Why3.


\section{Formal Methods For Concurrency}
\label{sec:background:formal4concurrency} Hoare's original logic is
not suitable for specifying and proving concurrent programs.  A Hoare
triple describes a statement in terms of its input and output.
However, in a concurrent program, the behavior of a process executing
a statement depends not only on the input but also on the behaviors of
the other processes that run in parallel.  Moreover, Hoare logic cannot
describe \emph{liveness properties}, which are crucial for concurrent
programs.

Owicki and Gries \cite{Owicki1976}, Lamport \cite{lamport:1980:hoare}
and Takaoka \cite{Takaoka:1994:PPV:326619.326811} introduced
extensions to the original Hoare logic for proving properties of
concurrent programs.

%% Owicki & Gries:
Owicki and Gries introduced a new technique in \cite{Owicki1976} based
on Hoare logic for proving \emph{safety properties} of concurrent
programs.  In their work, concurrency is enabled by two general
statements: (1) \texttt{cobegin $S_0$ // $...$ // $S_n$ coend} means
that each statement $S_i$, $0 \leq i \leq n$, is executed by a
separate process and all these processes run in parallel; (2)
\texttt{await $B$ then $S$} means that the process blocks until $B$ is
true, and then executes $S$ in one atomic step.  Formal semantics for
these two statements are defined with new Hoare-style rules.  Then,
for a concurrent program $S$ built from the extended set of
statements, the validity of $\{P\}~S~\{Q\}$ can be derived using the
rules.

The key idea in this work is the semantics of the \texttt{cobegin ...\
  coend} statement,
\begin{align}
  \frac{\{P_1\}S_1\{Q_1\},  \{P_2\}S_2\{Q_2\},  ...\ \text{ are
  interference-free}}
  {\{P_1\wedge{}P_2\wedge{}...\ \}~ \texttt{cobegin}~S_1 \texttt{ // }
  S_2 \texttt{ // } ...\ \texttt{coend}
  ~\{Q_1\wedge{}Q_2\wedge{}...\ \} }\nonumber
\end{align}
which is straightforward except for the term ``interference-free.''

Consider a \texttt{cobegin} $S_1$ \texttt{ // } $\ldots{}$ \texttt{ // }
$S_{m}$ \texttt{coend} statement, and suppose there are Hoare logic
derivations for $\{P_i\}~S_i~\{Q_i\}$, for $1 \leq i \leq m$.  For
each sub-statement $S'$ of $S_i$, there is an assertion that
immediately precedes $S'$ in the derivation, denoted
$\textsf{pre}(S')$.  A sub-statement $T$ of $S_j$, $1 \leq j \leq m$,
is said not to \emph{interfere with} $\{P_i\}~S_i~\{Q_i\}$ if both of
the following hold:

\begin{enumerate}
\item $T$ preserves $\textsf{pre}(S')$ for every sub-statement $S'$ in $S_i$.
\item $T$ preserves the post-condition $Q_i$ of $S_i$.
\end{enumerate}

To prove the correctness of the \texttt{cobegin} statement, one must
show that no atomic sub-statement of $S_i$ interferes with
$\{P_j\}~S_j~\{Q_j\}$, for $1 \leq i \neq j \leq m$.  Usually, in
order to prevent a triple $\{P\}~S~\{Q\}$ from being interfered with,
the user needs to weaken $P$ so that it accounts for the behaviors of
the other parallel statements.
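The non-interference condition can be made concrete with a bounded,
concrete-state check: a statement $T$ preserves an assertion $A$ if
executing $T$ from any state satisfying both $A$ and
$\textsf{pre}(T)$ leaves $A$ true.  The statements and assertions
below are illustrative, and enumeration over a small scope stands in
for a genuine Hoare-logic proof of the interference-freedom triples.

```python
# Bounded check of the Owicki-Gries non-interference condition
# {A and pre_T} T {A}: T does not falsify the assertion A of a
# parallel process.  States are dictionaries; T maps states to states.

def preserves(A, pre_T, T, states):
    """Check {A and pre_T} T {A} over the given finite set of states."""
    return all(A(T(s)) for s in states if A(s) and pre_T(s))

states = [{"x": x, "y": y} for x in range(-3, 4) for y in range(-3, 4)]
A = lambda s: s["x"] >= 0                 # assertion of the other process

# T1 touches only y, so it cannot interfere with an assertion about x.
T1 = lambda s: {**s, "y": s["y"] + 1}
# T2 decrements x; it interferes with A unless its own pre-condition
# (here: x > 0) is strong enough to rule out the bad states.
T2 = lambda s: {**s, "x": s["x"] - 1}

print(preserves(A, lambda s: True, T1, states))       # True
print(preserves(A, lambda s: True, T2, states))       # False
print(preserves(A, lambda s: s["x"] > 0, T2, states)) # True
```

The third call illustrates how adjusting the assertions attached to
the parallel statements restores interference freedom.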

% Lamport: concurrent hoare logic:
Lamport \cite{lamport:1980:hoare} generalizes Hoare logic by refining
the meaning of the triple $\{P\}~S~\{Q\}$ so that (1) $P$ must be
strong enough to guarantee that $Q$ will hold upon termination even if
the execution starts from any location inside $S$; and (2) $P$ must
remain true as long as the control is in $S$, where $S$ is a
concurrent program composed of sequential statements as well as the
\texttt{cobegin} statement used by Owicki and Gries.
There are three predicates that refer to program locations with
respect to a statement $S$: immediately before $S$, inside $S$ and
immediately after $S$.  For a sub-statement $S$ in a \texttt{cobegin}
structure, a strong enough pre-condition usually takes program
locations of different processes into consideration.

% Directed graph based proof technique:
Takaoka \cite{Takaoka:1994:PPV:326619.326811} proves a triple
$\{P\}~S~\{Q\}$ for a parallel program $S$ by analyzing a program
graph representing $S$, where vertices are sets of locations of
running processes and arrows bridging vertices are labeled by program
statements.  In addition, each vertex has an associated assertion.
Initial locations are attached with pre-conditions and final locations
are attached with post-conditions.  Then for every arrow, the
assertion of the destination vertex can be derived from the assertion
of the source vertex with respect to the statement labeling the arrow.
The rules for the assertion derivation are defined in Hoare style.
The triple $\{P\}~S~\{Q\}$ is valid iff such a graph is successfully
constructed.  In Takaoka's work, parallel programs contain
synchronizing statements.  A process $p$ can ``wait for'' another
process $q$ by executing \texttt{call} until $q$ executes
\texttt{accept}.  A similar program-graph-based analysis was done by
Newton \cite{Newton1975} earlier.  In \cite{Newton1975}, program
locations can be used to express properties in assertions though the
parallel programs have no synchronizing statement.


A more intuitive approach was investigated earlier by Ashcroft
\cite{Ashcroft:1975:PAP:1739967.1740302}.  Many researchers, including
Lamport\footnote{Leslie Lamport comments on his publication number 23:
  ``Ashcroft got it right.  Owicki and Gries and I just messed things
  up.  It took me quite a while to figure this out.''  He comments on
  his publication number 40: ``I think Ashcroft was right; one should
  simply reason about a single global invariant, and not do this kind
  of decomposition based on program structure.'' from
  \url{http://lamport.azurewebsites.net/pubs/pubs.html\#proving}},
believe that Ashcroft's approach is a better way to prove correctness
of concurrent programs than the Hoare-style approaches described
above.  In \cite{Ashcroft:1975:PAP:1739967.1740302},
safety properties of a concurrent program are expressed as a
\emph{global invariant}, i.e., an assertion that is supposed to hold
at any location of the program.  The proof is constructed by showing
that every atomic step in the program preserves the invariant.
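The global-invariant method can be sketched on a toy lock-based
mutual-exclusion protocol (an illustrative example, not taken from
Ashcroft's paper): state one invariant and show that every atomic
transition of every process preserves it, with no decomposition along
program structure.

```python
# Sketch of Ashcroft's global-invariant method on a toy protocol:
# two processes, each either "idle" or in its critical section "cs",
# and a lock holding -1 (free) or the owner's rank.
from itertools import product

def I(s):
    """Global invariant: a process is in its critical section only if
    it holds the lock (which implies mutual exclusion)."""
    return ((s["pc0"] != "cs" or s["lock"] == 0) and
            (s["pc1"] != "cs" or s["lock"] == 1))

def acquire(i):
    def step(s):
        if s[f"pc{i}"] == "idle" and s["lock"] == -1:
            return {**s, f"pc{i}": "cs", "lock": i}
        return None                        # transition not enabled
    return step

def release(i):
    def step(s):
        if s[f"pc{i}"] == "cs":
            return {**s, f"pc{i}": "idle", "lock": -1}
        return None
    return step

transitions = [acquire(0), acquire(1), release(0), release(1)]
states = [{"pc0": p0, "pc1": p1, "lock": l}
          for p0, p1, l in product(["idle", "cs"], ["idle", "cs"], [-1, 0, 1])]

# Every enabled transition taken from a state satisfying I lands in a
# state satisfying I, and I rules out both processes being in "cs".
ok = all(I(t(s)) for s in states if I(s) for t in transitions if t(s))
mutex = all(not (s["pc0"] == "cs" and s["pc1"] == "cs")
            for s in states if I(s))
print(ok and mutex)                        # True
```

The whole proof obligation is one quantification over atomic steps,
which is exactly the uniformity Lamport's comments praise.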

A liveness property claims that a desired event must eventually occur
during an infinite execution of a program.  Liveness properties
involve the concept ``eventually''.  \emph{Temporal logic}, which
extends the ordinary logic with temporal operators such as
``henceforth'' and ``eventually'', is widely used to express liveness
properties.  There are different techniques for proving liveness
properties.  The \emph{well-founded set} approach was originally
applied by Floyd \cite{Floyd1993} and later adapted by Manna and
Pnueli in \cite{Manna:1984:APP:2731.2734}.  Other examples include the
\emph{proof lattice} approach \cite{Owicki:1982:PLP:357172.357178} and
the \emph{rely-guarantee} approach
\cite{Stark:1985:PTR:646823.706907}.

Modern tools for verifying concurrent programs in an axiomatic way
include VCC \cite{cohen-etal:2009:vcc}, COMPLEX
\cite{amani:2017:complex}, KeY
\cite{beckert:2007:KeY}, Chalice \cite{leino:2009:verification}
and VeriFast \cite{Jacobs:2010:QTV:1947873.1947902}.  VCC is a static
verifier for concurrent C programs.  The user of VCC needs to annotate
invariants and other specifications for programs.  COMPLEX is a
verification framework that targets general imperative programming
languages.  It is based on the aforementioned proof system proposed by
Owicki and Gries.  KeY verifies Java programs against their
specifications written in JML \cite{Gosling:2014:JLS:2636997}.  It
generalizes Hoare logic for Java to the \emph{first order Java Dynamic
  Logic}.  Chalice uses a permission based approach to reason about
multi-threaded concurrent programs.  VeriFast utilizes separation
logic \cite{reynolds2002separation} for verifying multi-threaded C and
Java programs.

Most of the extensions to Hoare logic for coping with concurrency
target shared-memory concurrency.  To the best of the author's
knowledge, very few such efforts address message-passing concurrency.

%% MODEL CHECKING
\section{Model Checking} \label{sec:background:modelchecking} Unlike
the deductive verification techniques described in the previous two
sections, which can verify infinite state systems, model checking
\cite{clarke:2000:modelchecking} is a technique for verifying finite
state systems.  Although restricted to finite state systems, model
checking has the benefit of being fully automatic.  It is widely used
for hardware verification because many hardware systems have a finite
number of states.  Furthermore, one can convert an infinite state
system to a finite state system through a combination of abstraction
and restriction.  For example, to apply model checking to a
message-passing concurrent system, one can (1) abstract the actual
complex communication protocols to simple message channels
\cite{Siegel:2005:MWM:1065944.1065957, 10.1007/978-3-540-30579-8_27,
  siegel2007verifying}, and (2) bound the number of processes or the
size of the message channels.  In many cases, errors in systems can be
found by model checkers with small bounds on program parameters.


Transforming a system to a \emph{finite state model} is the first step
of using model checking.  The second step is to give a specification,
which is commonly in the form of temporal logic assertions.

A model checker verifies that a finite state model satisfies a
specification by searching all reachable states in the finite state
space of the model.  A state space is usually structured as a directed
graph, where a node is a state and an edge is a transition that alters
a state in one atomic step.

Model checking is effective for verifying properties of concurrent
systems.  A concurrent system can be abstracted to an
\emph{interleaving model}. In an interleaving model, the real-world
asynchronous executions are represented by nondeterministic choices
on atomic transitions. For example, the parallel execution of two
atomic commands $a$ and $b$ can be modeled as a nondeterministic
choice over two transition sequences: $a$ followed by $b$ or $b$
followed by $a$.
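The example above can be sketched directly: atomic commands are
functions on states, and the interleaving model is the set of all
orderings of the commands of the two processes.  The concrete commands
chosen here are illustrative.

```python
# Sketch of the interleaving model: parallel execution of two
# processes is represented by the nondeterministic choice among all
# interleavings of their atomic commands.

def interleavings(seq0, seq1):
    """Yield every interleaving of two sequences of atomic commands."""
    if not seq0:
        yield list(seq1)
    elif not seq1:
        yield list(seq0)
    else:
        for rest in interleavings(seq0[1:], seq1):
            yield [seq0[0]] + rest
        for rest in interleavings(seq0, seq1[1:]):
            yield [seq1[0]] + rest

# Process 0 runs a: x := x + 1; process 1 runs b: x := 2 * x.
a = lambda s: {**s, "x": s["x"] + 1}
b = lambda s: {**s, "x": 2 * s["x"]}

finals = set()
for order in interleavings([a], [b]):
    s = {"x": 1}
    for cmd in order:
        s = cmd(s)
    finals.add(s["x"])

print(sorted(finals))        # [3, 4]: a;b yields 4, b;a yields 3
```

With longer command sequences the number of interleavings grows
combinatorially, which previews the state-explosion problem discussed
below.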

A challenge in model checking is the state explosion
problem. Typically, the number of states grows exponentially when the
number of components in a concurrent system increases.  Hence, many
model checkers can only scale to programs whose inputs are bounded
with small concrete values. But model checking is still widely used to
solve real problems for two reasons.  First, there is a belief in the
\emph{Small Scope Hypothesis} \cite{Jackson:2006:SAL}, which states
that most bugs can be found by exploring all possible program
executions with inputs, the number of processes/threads, and other
program parameters bounded to small scopes.  Second, many
sophisticated \emph{reduction} techniques allow model checkers to
search only a sub-space of the full state space without loss of
soundness.  For example, using the \emph{partial-order reduction}
(POR) technique \cite{godefroid:1993:refining,
  10.1007/BFb0013032,Valmari1992}, a model checker often needs to
explore only one of several equivalent orderings of commutative
transitions.
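The observation underlying POR can be illustrated with a small check:
two transitions that access disjoint variables commute, so exploring
one of their two orders suffices for the final state.  The transitions
are illustrative; real POR implementations compute independence from
static or dynamic analysis rather than by enumeration.

```python
# Two transitions commute if executing them in either order yields
# the same state; a POR-aware model checker may then explore just
# one of the two orders without losing soundness.

def commute(t0, t1, states):
    """Check that t0;t1 and t1;t0 reach the same state from every state."""
    return all(t1(t0(s)) == t0(t1(s)) for s in states)

inc_x = lambda s: {**s, "x": s["x"] + 1}   # touches only x
inc_y = lambda s: {**s, "y": s["y"] + 1}   # touches only y
dbl_x = lambda s: {**s, "x": 2 * s["x"]}   # touches x: dependent with inc_x

states = [{"x": x, "y": y} for x in range(3) for y in range(3)]

print(commute(inc_x, inc_y, states))   # True: one order suffices
print(commute(inc_x, dbl_x, states))   # False: both orders must be explored
```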

%% SYMBOLIC EXECUTION
\section{Symbolic Execution} \label{sec:background:symexe}
Symbolic execution is an approach for
testing and analyzing programs \cite{king:1976:symbolic}. It replaces
the concrete program inputs with symbols, each of which represents a
set of concrete values.  During symbolic execution, a boolean
formula, the \emph{path condition} (PC), is maintained.  The PC
denotes the condition under which an execution path is feasible.
Symbols are
also known as \emph{symbolic constants}.  Expressions over constants
and symbolic constants are called \emph{symbolic expressions}.  In
symbolic execution, PC and variable values are all symbolic
expressions.


Symbolic execution is able to explore all possible paths of a program
that a concrete execution may take, unless there are infinitely many
of them.  Different paths arise from the different choices that a
concrete execution can make at branches.  During the exploration of
each path, the PC is initially $\mymath{true}$ and is updated at each
branch: the new value is the conjunction of the old PC and the branch
condition corresponding to the choice made on the path.  Once the PC
becomes unsatisfiable, the path being explored is infeasible and is
pruned immediately.  Automated theorem provers \cite{Z3Prover,
  barrett-etal:2011:cvc4, alt-ergo} are used to reason about symbolic
expressions during the exploration.

\begin{exmp}

Consider the following C code:

\begin{center}
  \begin{programNoNum}
\ccode{if}~(x~>=~0)\\
~~y~=~x;\\
\ccode{else}\\
~~y~=~-x;\\
\end{programNoNum}
\end{center}

Suppose the variable \texttt{x} holds a symbolic expression $X$.  A
symbolic execution engine will explore two paths for this code
snippet: one path leads to a result where the variable \texttt{y} has
value $X$ and its PC becomes $X \ge 0$; the other path leads to a
result where the variable \texttt{y} has value $-X$ and its PC becomes
$X < 0$. $\qed$
\label{exmp:intro:symexe}
\end{exmp}
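The exploration in the example can be reproduced by a minimal symbolic
executor for a tiny branching language.  In this sketch, symbolic
values are strings, the PC is a list of string constraints, and
feasibility is approximated by sampling concrete values instead of
calling a theorem prover; the encoding and all names are assumptions
made for illustration.

```python
# Minimal symbolic executor run on the C snippet above.  Statements
# are tuples; symbolic expressions are strings with {var} holes.

def execute(stmt, env, pc):
    """Yield (env, pc) pairs, one per feasible path through stmt."""
    kind = stmt[0]
    if kind == "assign":                  # x := e, e a symbolic string
        _, var, expr = stmt
        yield {**env, var: expr.format(**env)}, pc
    elif kind == "seq":
        for env0, pc0 in execute(stmt[1], env, pc):
            yield from execute(stmt[2], env0, pc0)
    elif kind == "if":                    # fork and explore both branches
        _, cond, then_s, else_s = stmt
        c = cond.format(**env)
        if feasible(pc + [c]):
            yield from execute(then_s, env, pc + [c])
        if feasible(pc + [f"not ({c})"]):
            yield from execute(else_s, env, pc + [f"not ({c})"])

def feasible(pc):
    """Approximate PC satisfiability by sampling concrete values for X
    (a real engine would query an automated theorem prover)."""
    return any(all(eval(c, {"X": v}) for c in pc) for v in range(-10, 11))

prog = ("if", "({x}) >= 0",
        ("assign", "y", "{x}"),
        ("assign", "y", "-({x})"))

for env, pc in execute(prog, {"x": "X", "y": "Y"}, []):
    print(pc, "->", "y =", env["y"])
# ['(X) >= 0'] -> y = X
# ['not ((X) >= 0)'] -> y = -(X)
```

The two yielded pairs correspond exactly to the two paths of
Example~\ref{exmp:intro:symexe}.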

%% MPI
\section{MPI} \label{sec:background:mpi} The Message-Passing Interface
(MPI) standard specifies a library for writing message-passing
parallel programs in C/C++ and Fortran.  This library includes a large
number of functions, constants, and data types.  For an MPI program,
every process has its private storage but shares nothing with others.
Communication among processes is carried out by transferring data
using different MPI functions.  In the most common usage, an MPI
program is written, compiled, and linked to generate an executable
file.  The program is then run by instantiating a number of processes,
each of which runs a copy of the executable.

In MPI programs, a \emph{communicator} is an abstraction of a
``communication universe'' for processes.  A communicator comprises an
ordered group of $n$ ($1 \leq{} n$) processes.  Processes are
identified by $0, 1, ..., n-1$.  The numerical identifier of a process
in a communicator is called the \emph{rank} of the process.  A process
may belong to multiple communicators and may be assigned a different
rank in each of them.  There is a default communicator,
named \texttt{MPI\U{}COMM\U{}WORLD},
which contains all processes that are created when a program is
launched.  The rank of a process in the default communicator can be
used as a unique ID.  By branching on process ranks, different
processes may behave differently even though they run the same
program.

Communication operations are performed through communicators.
Conceptually, a communicator includes a set of message channels.
Every message channel can be identified by an ordered pair $(i, j)$
($0 \leq i, j < n$).  In general, a channel behaves like a
First-In-First-Out (FIFO) queue buffering the messages that were sent
by process $i$ but have not yet been received by process $j$.  When a
process receives a message with a specific \emph{message tag}, it
attempts to pull out the first message in the corresponding channel
that matches the tag.  If the wildcard message tag
(\texttt{MPI\U{}ANY\U{}TAG}) is used for the receive operation, the
message channel behaves in exact FIFO order, i.e., the first message
in the channel will be dequeued.

In MPI, communication can be nondeterministic.  A process can perform
a receive operation without specifying an exact sender by using the
pre-defined constant \texttt{MPI\U{}ANY\U{}SOURCE} as the source
argument.  Such a receive operation is called a \emph{wildcard
  receive}.  Conceptually, a wildcard receive performed by process $i$
nondeterministically chooses an integer $j$ ($0 \leq j < n$) for which
channel $(j, i)$ contains a matching message.  It then dequeues the
first such message from the channel.
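The channel semantics described in the last two paragraphs can be
sketched as a small model: one FIFO queue per ordered pair of ranks,
tag-based matching, and wildcard arguments.  This is a model for
exposition, not an MPI implementation, and the constant values chosen
for the wildcards are assumptions of the sketch.

```python
# Model of MPI-style message channels: one FIFO queue per ordered
# pair (source, destination) of ranks.
from collections import deque

ANY_SOURCE, ANY_TAG = -1, -2   # stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG

class Channels:
    def __init__(self, nprocs):
        self.n = nprocs
        self.q = {(i, j): deque() for i in range(nprocs)
                  for j in range(nprocs)}

    def send(self, src, dst, tag, data):
        self.q[(src, dst)].append((tag, data))

    def recv(self, dst, src, tag):
        """Return the data of the first matching message, or None.
        A wildcard source scans channels in rank order here; a real
        wildcard receive chooses among matching channels
        nondeterministically."""
        sources = range(self.n) if src == ANY_SOURCE else [src]
        for s in sources:
            chan = self.q[(s, dst)]
            for k, (t, data) in enumerate(chan):
                if tag in (ANY_TAG, t):
                    del chan[k]          # deque supports deletion by index
                    return data
        return None

ch = Channels(3)
ch.send(0, 1, tag=7, data="a")
ch.send(2, 1, tag=0, data="b")
print(ch.recv(1, src=ANY_SOURCE, tag=0))  # b  (only channel (2,1) has tag 0)
print(ch.recv(1, src=0, tag=ANY_TAG))     # a  (head of channel (0,1))
```

Making the wildcard choice deterministic here is exactly the kind of
modeling decision that a verification tool must instead treat as a
nondeterministic branch.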

\begin{exmp}
  Figure \ref{fig:mpi:simple} is a simple C/MPI program.  Constructs
  defined in the MPI library all start with an ``\texttt{MPI\U{}}''
  prefix.  \texttt{MPI\U{}COMM\U{}WORLD} is the constant that refers
  to the default communicator.  The \texttt{MPI\U{}Init} function
  initiates the message-passing environment.  Every process can obtain
  its rank in a communicator by calling \texttt{MPI\U{}Comm\U{}rank}.
  The \texttt{MPI\U{}Finalize} function terminates the environment.

  In this example, the process of rank 0 sends a message to the
  process of rank 1.  The point-to-point (P2P) communication is
  carried out by a pair of standard P2P send and receive functions:

\begin{figure}[t]
  \begin{program}
    \ccode{\#include}~<mpi.h>\\
    \ccode{int}~main(\ccode{int}~argc,~\ccode{char}~*argv[])~\lb~\\
    ~~\ccode{int}~rank,~dat~=~0;~\\
    ~~MPI\civlU{}Init(\&argc,~\&argv);~\\
    ~~MPI\civlU{}Comm\civlU{}rank(MPI\civlU{}COMM\civlU{}WORLD,~\&rank);
\mycomment{//gets the rank of this
  process}\\
~~\ccode{if}~(rank~==~0)~\\
~~~~MPI\civlU{}Send(\&dat,~1,~MPI\civlU{}INT,~1,~0,~MPI\civlU{}COMM\civlU{}WORLD);\\
~~\ccode{else~if}~(rank~==~1)\\
~~~~MPI\civlU{}Recv(\&dat,~1,~MPI\civlU{}INT,~0,~0,~MPI\civlU{}COMM\civlU{}WORLD,
MPI\civlU{}STATUS\civlU{}IGNORE);\\
~~MPI\civlU{}Finalize();\\
\rb\\
\end{program}
\caption{A simple C/MPI program. Process 0 sends and process 1
  receives a message. The remaining processes perform no operation.}
\label{fig:mpi:simple}
\end{figure}
\end{exmp}

\begin{enumerate}
  \item the \texttt{MPI\U{}Send} function (at line 7) sends the data,
  which is stored in \texttt{dat} and has the size of one
  \texttt{MPI\U{}INT}, to the process of rank 1 in the default
  communicator, with tag 0.
  
\item the \texttt{MPI\U{}Recv} function (at line 9) receives the
  message with tag 0 from the process of rank 0 in the default
  communicator; the received data, which is expected to be no larger
  than one \texttt{MPI\U{}INT}, is saved in \texttt{dat}.
\end{enumerate}

No communication action is performed by any other process. $\qed$

\section{Model Checkers for MPI} \label{sec:related:modelcheckers} 
%%%% Model checkers CIVL, ISP, DMAPI, ...
Many automated formal verification tools for MPI are based on model
checking \cite{clarke:2000:modelchecking}.  ISP
\cite{vakkalanka2008dynamic} and DAMPI
\cite{Vo:2010:SDD:1884643.1884681} are dynamic MPI model checkers.
They actually execute the MPI program under verification, but
exhaustively rearrange the schedules so that eventually all possible
schedules are covered.  These schedules differ in the matches of MPI
send and receive operations, as well as in the choices made at
nondeterministic points (e.g., a wildcard receive or
\texttt{MPI\U{}Test}).
ISP and DAMPI support a wide range of MPI
primitives, including blocking and non-blocking point-to-point
communications as well as collective communications.  They are able to
check deadlock freedom and other safety properties for MPI programs.

The advantage of such dynamic model checking approaches is that each
schedule can be explored almost as fast as running the parallel
program in a sequential way.  Nevertheless, these model checkers can
only verify a program for a single concrete setting at a time, i.e.,
both the number of processes and the program inputs have to be
concrete.


The model checking technique can be combined with symbolic execution
\cite{king:1976:symbolic}.  With symbolic execution, a model checker
can verify an MPI program for arbitrary program inputs.
MPI-SV \cite{Yu:2018:CSE:3183440.3190336} uses a symbolic execution
engine to explore paths of an MPI program by letting processes execute
in a \emph{round-robin} manner.  These paths cover all possible
choices that processes can make at branches while containing only a
small number of process interleavings.  Once a violation-free path is
explored, a \emph{Communicating Sequential Processes} (CSP)
\cite{Hoare:1985:CSP:3921} model is generated from the path.  The
model is then fed to a CSP model checker, which explores all
alternative interleavings and nondeterministic choices that could be
made along the path.  MPI-SV verifies MPI programs for deadlock
freedom and user-customized properties expressed in temporal logic.

MOPPER \cite{Forejt:2017:PPA:3133234.3095075} and HERMES
\cite{khanna2018dynamic} are similar to MPI-SV.  They generate SAT
formulas from an execution of an MPI program; the formulas encode the
deadlock-freedom problem.  If the solver shows that no deadlocking
scenario satisfies the formulas, the corresponding path is free of
deadlock regardless of the interleavings or MPI communication matches
that can happen along that path.
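The gist of such an encoding can be illustrated with a brute-force
stand-in for the SAT-solving step (the function name and the toy
tag-based compatibility relation below are invented; the real encodings
are far more detailed):

```python
from itertools import permutations

# Brute-force stand-in for a MOPPER/HERMES-style deadlock-freedom check:
# the path is free of deadlock along its matches iff some complete
# matching pairs every receive with a distinct, compatible send.

def path_can_complete(recvs, sends, compatible):
    """Return True iff every receive can be matched with a distinct,
    compatible send (standing in for solving the generated SAT formula)."""
    if len(recvs) != len(sends):
        return False              # an unmatched operation blocks forever
    return any(all(compatible(r, s) for r, s in zip(recvs, perm))
               for perm in permutations(sends))

# Toy compatibility: a receive "r<tag>" may only match a send "s<tag>".
same_tag = lambda r, s: r[1:] == s[1:]
ok = path_can_complete(["r0", "r1"], ["s0", "s1"], same_tag)
stuck = path_can_complete(["r0", "r2"], ["s0", "s1"], same_tag)
```

In the second call the receive tagged 2 can never be matched, so the
path can deadlock; a real tool would report this by exhibiting a
satisfying assignment of its SAT formula.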

MOPPER generates SAT formulas from only one execution, so it is not
guaranteed to cover all possible paths; consequently, MOPPER cannot
deal with multi-path programs.  A multi-path MPI program contains
branches that depend on program inputs or wildcard receives, so two
different runs of such a program may take different paths.

HERMES generates SAT formulas for all the paths explored by a symbolic
execution engine, and hence it is sound for multi-path programs.

MPI-SV, MOPPER and HERMES leverage the symbolic execution technique to
cover all possible choices that processes can make at different
branches.  But none of them utilizes symbolic execution for verifying
functional correctness of MPI programs.

Other tools such as MPI-Spin~\cite{siegel2007verifying}, TASS
\cite{siegel2011tass}, and CIVL \cite{Luo:2017:VMP:3127024.3127032}
combine model checking and symbolic execution by having symbolic
expressions as values in states.  In addition, a path condition, a
Boolean symbolic expression, is associated with each state.  The
model checker ignores a state if its path condition is unsatisfiable.
Automated theorem provers are used to reason about the symbolic
expressions.  MPI-Spin provides libraries in the \emph{Promela}
language for MPI primitives and symbolic operations.  To use MPI-Spin,
the user needs to write a model for an MPI program in Promela with the
provided libraries.  Properties of the program can be customized in
temporal logic.  The model is eventually checked by the Spin model
checker \cite{holzmann1997spin}.
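This path-condition pruning can be illustrated with a toy sketch (not
the tools' actual machinery; enumeration over a small domain stands in
for the automated theorem-prover call):

```python
# Toy illustration of path-condition pruning: a state carries a path
# condition over a symbolic input N, and the checker drops any state
# whose condition is unsatisfiable.

def satisfiable(constraints, domain=range(-64, 65)):
    """Decide a conjunction of one-variable predicates by enumeration
    (standing in for the theorem prover a real tool would invoke)."""
    return any(all(c(n) for c in constraints) for n in domain)

# Taking branch `N > 0` keeps the state; additionally taking branch
# `N < 0` on the same path makes the condition contradictory, so that
# state is pruned and the subtree below it is never explored.
explored = satisfiable([lambda n: n > 0])
pruned = not satisfiable([lambda n: n > 0, lambda n: n < 0])
```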

CIVL uses an intermediate verification language, CIVL-C, to model
various concurrency APIs, including MPI.  MPI programs are
automatically translated to CIVL-C programs and then sent to a CIVL-C
verifier.  TASS is the predecessor of CIVL.

The Spin model checker has been highly optimized for the Promela
language, so for an MPI program MPI-Spin is able to explore many more
states than CIVL in the same amount of time.  Because CIVL generalizes
over multiple concurrency APIs, there is less opportunity to optimize
its verifier for any specific one.  Although CIVL cannot run as fast
as MPI-Spin, CIVL outperformed MPI-Spin in some cases
\cite{Luo:2017:VMP:3127024.3127032} thanks to CIVL's effective POR
algorithm.  Moreover, CIVL can deal with programs that use not only
different concurrency APIs but also combinations of them.

Because of the use of symbolic expressions, program inputs can be as
general as unconstrained symbols, and program outputs are consequently
expressions over those symbols.  By asserting properties of the
outputs, the functional correctness of programs can be verified.  One
disadvantage
of such an approach is that operations on symbolic expressions are more
expensive than operations on concrete values.
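A minimal sketch of this style of reasoning follows; the \code{Sym}
class stands in for the tools' symbolic expressions, and the trivial
\code{double} program under verification is invented for illustration:

```python
# Toy stand-in for functional-correctness checking over symbolic inputs
# (real tools hand such equality queries to a theorem prover).

class Sym:
    """A linear symbolic value a*x + b over one symbolic input x."""
    def __init__(self, a=0, b=0):
        self.a, self.b = a, b
    def __add__(self, other):
        return Sym(self.a + other.a, self.b + other.b)
    def __eq__(self, other):
        # Two linear terms are equal for ALL inputs iff their
        # coefficients coincide.
        return (self.a, self.b) == (other.a, other.b)

def double(v):          # the "program" under verification
    return v + v

x = Sym(a=1)            # unconstrained symbolic input
# The assertion "double(x) == 2*x" is discharged for every input at once:
verified = double(x) == Sym(a=2)
```

A single symbolic run thus certifies the output assertion for all
inputs, at the price of manipulating expressions instead of concrete
values.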

A common challenge that model checkers confront is state explosion.
Typically, the number of states in the state space of a program grows
exponentially with the number of processes.  In addition, model
checkers require the state space to be finite.  So the number of
processes as well as some of the program inputs have to be bounded.
Model-checking-based tools for monolithic verification usually cannot
scale to large codebases with realistic configurations.  Therefore, we
propose a composite and modular approach in this dissertation.  One
motivation of our approach is to divide the verification problem of an
MPI program into a number of smaller and independent sub-problems.
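To give a sense of this growth: if $n$ processes each execute $k$
atomic steps, the number of distinct interleavings is
\[
  \frac{(nk)!}{(k!)^{n}},
\]
which already exceeds $10^{21}$ for $n = 4$ processes of $k = 10$
steps each.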



\section{Static Analysis and Runtime Verification for
  MPI} \label{sec:related:staticanalyzer}
\subsection{MPI Static Analyzers}
%%%% Static analysis: MPI-Checker, pCFG, MPI-IXFG, PARCOACH
Compared to model checkers, static analysis tools reason about
programs for arbitrary inputs and usually scale better.  A common
limitation of static analysis approaches is that the results are not
always exact, owing to the lack of dynamic information.  Moreover,
static analysis tool developers often face a tradeoff between
precision and efficiency.

MPI-Checker \cite{Droste:2015:MSA:2833157.2833159} is a static
analyzer for MPI programs built on the Clang Static Analyzer
\cite{clangstaticanalyzer} framework.  It checks for various
MPI-specific errors, such as deadlocks caused by communication
mismatches, invalid argument types in MPI calls, and incorrect
message-buffer referencing, with a close-to-zero false positive rate.
The analysis is performed on the \emph{Abstract Syntax Tree} (AST) of
an MPI program.  MPI-Checker has been applied to MPI applications,
some of which have tens of thousands of lines of code.

In addition to the AST, the \emph{Control-Flow Graph} (CFG) is a
typical abstraction of a program where static analyzers perform
\emph{data-flow analysis}.  In \cite{christakis:2011:detection}, the
authors propose a CFG-based analysis for Erlang
\cite{armstrong:2013:programming} message-passing programs.  The
message-passing model of Erlang is very similar to the conceptual
model of MPI used in this dissertation.  The analysis is performed on
a \emph{communication graph}, where a vertex is the CFG of a function
that may be run by a separate process at runtime, and a directed edge
corresponds to a message channel between the two connected processes.
This analyzer is able to detect message-passing related errors such as
a deadlock caused by a receive without a matching send, or a potential
deadlock caused by a send without a matching receive.  The analysis is
approximate, i.e., neither the detection of all deadlocks nor the
absence of false alarms can be guaranteed.
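The flavor of this check can be conveyed by a rough sketch (the channel
representation below is an assumption made for illustration, not the
analyzer's actual data structures):

```python
# Rough sketch of a communication-graph check: channels are
# (sender, receiver) rank pairs collected statically from the
# per-process CFGs; a receive on a channel that no process may ever
# send on signals a deadlock.

def find_orphan_receives(sends, recvs):
    """sends/recvs: sets of (src, dst) channels that may be used;
    returns the receive channels with no possibly matching send."""
    return sorted(recvs - sends)

# Process 0 receives from 1 (matched) and from 2 (no such send exists):
orphans = find_orphan_receives(sends={(1, 0)}, recvs={(1, 0), (2, 0)})
```

Because the may-send and may-receive sets are over-approximations, a
reported orphan is only a potential defect, matching the approximate
nature of the analysis described above.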

The concept of CFG has been extended for representing concurrent
programs.  The PARCOACH \cite{huchant2019multivalued} static analyzer
relies on Parallel Program Control Flow Graphs to detect
\emph{collective errors} of parallel programs using concurrency APIs
such as MPI.  A collective error in MPI programs is a mismatch of MPI
collective functions.  PARCOACH is efficient in analyzing large HPC
applications but may produce false positives and false negatives.  To
reduce them, studies \cite{Bronevetsky:2009:CSD:1545006.1545049,
  Strout:2006:DAM:1156433.1157634} have aimed at improving the
precision of static analysis for message passing by extending CFGs.

The authors of \cite{ye:2018:detecting} combine static analysis and
symbolic execution for detecting \emph{anomalies}
\cite{board:1993:ieee}.  An anomaly is suspicious code that does not
meet the expected coding style.  Most anomalies are true program
defects; the remaining cases are coding-style violations or inelegant
code.  For MPI programs, anomalies can reveal true MPI defects,
including deadlocks caused by communication call mismatches, message
buffer type mismatches, and simultaneous accesses to the same message
buffer.  This approach first performs static analysis on a whole MPI
program and collects MPI-related control-flow information.  Then,
symbolic execution is performed only on program segments specified by
the user.  The biggest advantage of this approach is that the
\emph{partially} applied symbolic execution greatly increases the
precision of the analysis while mitigating the \emph{path explosion
  problem}.  Another advantage is that the symbolic execution tasks on
the code segments are independent, hence there is a potential to
parallelize them.  One disadvantage is that an anomaly is not
equivalent to a defect, so false alarms will be reported.

In general, static analyzers may report both false positives and false
negatives.  Most of the static analyzers detect a set of pre-defined
errors but cannot check the functional correctness of programs.

Many static analysis approaches are not monolithic.  For example,
\cite{ye:2018:detecting} applies symbolic execution to program
segments instead of a whole program.  Our composite and modular
verification approach also performs model checking and symbolic
execution on program segments.  Unlike \cite{ye:2018:detecting}, our
approach does not include a static analysis pre-phase, hence we have
to assume a weaker condition at the beginning of every symbolic
execution than \cite{ye:2018:detecting} does.  For example, we assume
that a pointer value can be arbitrary, while \cite{ye:2018:detecting}
has a concrete set of candidate pointers.  We will explore the
possibility of combining static analysis with our approach in the
future.

\subsection{MPI Runtime Verification}
%%%% Runtime
Runtime verification tools include Marmot \cite{krammer2004marmot}
and its successor MUST \cite{Hilbrich:2012:MRE:2388996.2389037}.  MUST
intercepts MPI calls during the execution of an MPI program and
collects the state of MPI communication operations.  After the
execution, the tool studies the \emph{wait-for} dependencies among
processes and determines whether MPI-related errors, such as a
deadlock or a message type mismatch, can happen in alternative
executions.  Compared to the actually observed execution, alternative
executions may have different interleavings, different matches for MPI
wildcard receives, or different results returned from
\texttt{MPI\U{}Test}.  Runtime verification tools such as MUST are
generally more scalable than static analysis based tools.  Moreover,
such tools benefit from dynamic information, so they do not report
false alarms.  However, there is no guarantee that the set of
alternative executions derived from the observed runs is complete.
Therefore, a runtime verification tool may miss a defect if the defect
is not revealed in the executions to which it is applied.
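The post-mortem deadlock analysis can be sketched as a cycle search on
the wait-for graph (an illustrative sketch; MUST's actual algorithm is
considerably more refined):

```python
# Sketch of a post-mortem deadlock analysis: build a wait-for graph
# among ranks and report a potential deadlock iff the graph has a cycle.

def has_deadlock(wait_for):
    """wait_for: dict mapping a rank to the set of ranks it blocks on."""
    visited, on_stack = set(), set()

    def dfs(rank):
        visited.add(rank)
        on_stack.add(rank)
        for other in wait_for.get(rank, ()):
            if other in on_stack or (other not in visited and dfs(other)):
                return True          # back edge: a wait-for cycle
        on_stack.discard(rank)
        return False

    return any(dfs(r) for r in wait_for if r not in visited)

# Ranks 0 and 1 each block on a receive whose matching send the other
# rank has not yet reached: a cycle, hence a potential deadlock.
deadlocked = has_deadlock({0: {1}, 1: {0}})
no_cycle = has_deadlock({0: {1}, 1: set()})
```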

Runtime verifiers are efficient in catching a large number of program
defects but may miss corner cases.  Moreover, they cannot verify the
functional correctness of a program.

\section{Deductive Verification for MPI} \label{sec:related:deductive}
The approaches described in the previous sections all have a certain
degree of automation: no human effort is required during the
verification, and little knowledge about verification is needed.  On
the other hand, these approaches face limitations.  First, they have
to balance precision against scalability.  In addition, they verify
programs only against a set of pre-defined properties (e.g., freedom
from deadlocks or runtime errors).  Some tools accept customized
assertions or temporal logic formulas, but in general, verification
against a user-defined specification is not well supported.

%%%% Type-based verification, ParTypes, Future-based, Frama-C
In contrast, deductive verification approaches can formally verify the
conformance of a program to a specification for arbitrary program
inputs.  The user is responsible for providing a specification and
usually is required to interact with the verification tools to guide
the verification.


\partypes{} \cite{Lopez:2015:PVM:2814270.2814302} verifies an MPI
program against a \emph{protocol}.  A protocol consists of vectors of
\emph{term types}.  A subset of MPI operations, such as standard send
and receive functions and collective functions, can be represented by
term types.  \emph{Well-formed} term types are deadlock-free, and
whether a protocol is well-formed is decidable and is proved by SMT
solvers.

VCC \cite{cohen-etal:2009:vcc} is the underlying verifier used
by \partypes{}.  To verify the conformance of an MPI program to a
protocol, \partypes{} translates the protocol to VCC contracts and
then feeds both the program and the contracts to the VCC verifier.  In
addition to a protocol, the user needs to annotate branches in the
program to inform the verifier about the control flow.

\partypes{}'s protocol language cannot express the functional
correctness of a program.  Furthermore, \partypes{} cannot deal with
\texttt{MPI\U{}ANY\U{}SOURCE}.  The subset of MPI supported in
\partypes{} is deterministic, i.e., given the same inputs, the way
that the send and receive operations are matched is always the same.
In this dissertation, we will show that our approach becomes much more
efficient if we assume that an MPI program is free of wildcard
receives.

In \cite{oortwijn2016future}, Oortwijn et al.\ present their work for
specifying and verifying MPI programs modularly.  They use process
algebra \cite{bergstra2001handbook} terms, which are named
\emph{futures}, to model MPI operations.  The user can use futures to
``predict'' (or specify) the communication behavior of each procedure
in an MPI program for arbitrary inputs.  Whether a procedure conforms
to its futures is proved by Hoare-style reasoning.  The proof of
global communication correctness is carried out by a model checker in
the mCRL2 \cite{Cranen:2013:OMT:2450387.2450410} tool set, which
reasons about all the futures for an arbitrary number of processes.
The current
state of this work is preliminary: only a limited set of MPI
primitives are supported, including standard send and receive
operations and a few collective operations.  Receiving from ``any''
source is supported.

Compared to these ``future-based'' contracts, our approach attempts
to minimize the communication details involved in contracts and
encourages the user to pay more attention to expressing functional
correctness properties.

A different preliminary work on deductive verification of
message-passing programs is by Luo et al.\ \cite{luo2018towards}.  The
authors verify a simple message-passing program with contracts and a
global invariant.  The approach used in this work was originally
proposed by Ashcroft \cite{Ashcroft:1975:PAP:1739967.1740302}.  The
verification is complete, i.e., both termination and functional
correctness of the program are proved for arbitrary inputs and numbers
of processes.  The cost of such completeness is a significant amount
of user effort: the user has to provide a global invariant, function
contracts, and loop invariants for the program, and figuring out a
proper global invariant is hard.  The deductive reasoning, which shows
that the global invariant is preserved by every program statement, is
done by Frama-C/WP.  Although the tool is automated, the authors spent
much time manually guiding the proof with intermediate assertions.

Unlike Frama-C/WP, which is based on VCG techniques, our approach uses
model checking and symbolic execution, hence it has a higher degree of
automation: contracts on functions and loops are optional, and code
that has no contract is simply executed.


