%% background
\section{Background} \label{sec:background}
%% HOARE LOGIC
\subsection{Hoare Logic} \emph{Hoare logic} \cite{hoare:1969:axiomatic}
is the basis of deductive verification.  It provides a proof system
for showing safety properties of sequential programs.

The key
notion in Hoare logic is the \emph{Hoare triple} $\{P\}~S~\{Q\}$,
where $S$ is a program statement and $P, Q$ are the \emph{pre-} and
\emph{post-conditions} respectively.  The Hoare triple is valid if,
whenever $S$ is executed from a \emph{pre-state} in which $P$ holds
and the execution eventually terminates, then $Q$ holds in the
resulting \emph{post-state}.

A set of inference rules is defined for reasoning about Hoare
triples.  This rule set is called the \emph{Hoare calculus}.  Each
inference rule also formally describes the semantics of one kind of
statement.  Four rules from the Hoare calculus are presented in
Fig.\ \ref{fig:intro:hoare:rules} as examples.  The \emph{assignment}
rule states that the post-condition $Q$ of the assignment \texttt{$a$
  := $e$} is ensured if the pre-condition is obtained by
substituting $e$ for $a$ in $Q$.  The \emph{sequence} rule states
that if there is a condition that can serve as the post-condition of a
statement $S_0$ as well as the pre-condition of another statement
$S_1$, a Hoare triple for the composite statement $S_0\texttt{;}S_1$
can be inferred.  The \emph{consequence} rule introduces a \emph{side
  condition}, which is a logical formula over the pre- and
post-conditions of the two Hoare triples.  The rule states that, for a
statement $S$, one triple can be inferred from the other if the side
condition is valid.  The \emph{branch} rule expresses that a Hoare
triple of a branch statement can be inferred from the Hoare triples of
its two branches.
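For instance, instantiating the assignment rule for the statement
\texttt{x := x + 1} and the post-condition $\texttt{x} > 0$ yields the
pre-condition $(\texttt{x} > 0)[\texttt{x} + 1 / \texttt{x}]$, i.e.,
$\texttt{x} + 1 > 0$, so the triple
\begin{center}
  $\{\texttt{x} + 1 > 0\}$ \texttt{x := x + 1} $\{\texttt{x} > 0\}$
\end{center}
is derivable without any premise.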

\begin{figure}
  \begin{center}
\begin{tabular}{p{7cm} p{7cm}}
$\frac{}{ \{Q[e / a]\}~a\texttt{ := }e~\{Q\} }$ (assignment) &
$\frac{\{P\}~S_0~\{R\},\,
\{R\}~S_1~\{Q\}}{\{P\}~S_0\texttt{;}S_1~\{Q\}}$ (sequence)\\ \\

$\frac{\{P'\}~S~\{Q'\}}{\{P\}~S~\{Q\}}$ (consequence) &

$\frac{ \{P \wedge{}
C\}~S_0~\{Q\},\, \{P \wedge{} \neg{}C\}~S_1~\{Q\}}{\{P\}~
\texttt{if }C\texttt{ then }S_0\texttt{ else }S_1~\{Q\}}$ (branch) \\
\text{if $(P \Rightarrow P')
\wedge{} (Q' \Rightarrow{} Q)$} &
\end{tabular}
\end{center}
\caption{Four selected inference rules in Hoare calculus.}
\label{fig:intro:hoare:rules}
\end{figure}

%%% verification condition generation
The proof process of applying the Hoare calculus to Hoare triples can be
automated with the \emph{verification condition generation} (VCG)
\cite{barnett:2006:boogie,gordon:1989:mechanizing,leino:1999:checkingjava}
technique.  Given a statement, a pair of pre- and post-conditions and
a set of assumptions, a VCG algorithm automatically computes the
\emph{weakest pre-condition} (WP) of the statement that ensures the
post-condition, according to the rules of the Hoare calculus.  The
validity of the triple is then represented by a generated condition
stating that the given pre-condition implies the WP.  Most
deductive verification tools for sequential programs are built on VCG.
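
For the statement kinds of Fig.\ \ref{fig:intro:hoare:rules}, the WP
is given by the following standard equations, stated here in the
notation $\mymath{WP}(s, q)$ that is also used in Fig.\
\ref{fig:intro:hoare:wp}:
\begin{align}
  \mymath{WP}(a\texttt{ := }e,\, Q) &= Q[e / a]\nonumber\\
  \mymath{WP}(S_0\texttt{;}S_1,\, Q) &= \mymath{WP}(S_0,\, \mymath{WP}(S_1,\, Q))\nonumber\\
  \mymath{WP}(\texttt{if }C\texttt{ then }S_0\texttt{ else }S_1,\, Q) &=
  (C \Rightarrow{} \mymath{WP}(S_0,\, Q)) \wedge{} (\neg{}C \Rightarrow{} \mymath{WP}(S_1,\, Q))\nonumber
\end{align}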

\begin{exmp}
  Figure \ref{fig:intro:hoare:derive} shows a derivation of a Hoare
  triple for a piece of C code using the Hoare calculus.  The
  VCG process for this example is given by Fig.\
  \ref{fig:intro:hoare:wp}.  Note that the term $abs(\texttt{x})$
  denotes the absolute value of \texttt{x}.  By VCG, the
  validity of the Hoare triple is represented by the following
  condition:
    \begin{center}
        $(\texttt{x} = X \wedge{} \texttt{y} = Y) \Rightarrow{}$
        $\mymath{WP}($\texttt{\ccode{if} (x >= 0) y = x;
    \ccode{else} y = -x;}$,\, \{\texttt{y} = abs(\texttt{x})\})$
    \end{center}
        $\qed$
\end{exmp}

\begin{figure}[t]
    \fbox{
    \parbox{\textwidth}{
    \begin{itemize}
    \item $\{\texttt{x} = X \wedge{} \texttt{y} = Y\}$\texttt{\ccode{if}
    (x>=0) y = x; \ccode{else} y = -x;}$\{\texttt{y} =
    abs(\texttt{x})\}$ (branch 1, 2)

    \begin{itemize}
    \item[1.]   $\{\texttt{x} = X \wedge{} \texttt{y} = Y \wedge{} X
    \geq 0\}$ \texttt{y = x} $\{\texttt{y} =
    abs(\texttt{x})\}$ (consequence)
    
    %% FINAL-ASSIGNMENT-1
    \begin{itemize}
    \item[] $\{\texttt{x} = abs(\texttt{x})\}$ \texttt{y = x} $\{\texttt{y} =
    abs(\texttt{x})\}$ (assignment)
    \end{itemize}

    \item[2.]  $\{\texttt{x} = X \wedge{} \texttt{y} = Y \wedge{} X <
    0\}$ \texttt{y = -x} $\{\texttt{y} = abs(\texttt{x})\}$ (consequence)


    %% FINAL-ASSIGNMENT-2
    \begin{itemize}
    \item[] $\{-\texttt{x} = abs(\texttt{x})\}$ \texttt{y = -x} $\{\texttt{y} =
    abs(\texttt{x})\}$ (assignment)
    \end{itemize}
    \end{itemize}
    
  \end{itemize}
  }}
      \caption{The derivation of a Hoare triple.  Note that
        $\mymath{abs(\texttt{x})}$ denotes the absolute value of \texttt{x}.}
          \label{fig:intro:hoare:derive}      
\end{figure}

\begin{figure}[t]
    \fbox{
    \parbox{\textwidth}{
      \begin{itemize}
      \item[Let] $\mymath{WP(s, q)}$ denote the WP of
    the statement $s$ for a post-condition $q$,
    
      \item[] $\mymath{WP}($\texttt{\ccode{if} (x >= 0) y = x;
    \ccode{else} y = -x;}$,\, \{\texttt{y} = abs(\texttt{x})\})$

    \item[$=$] $(\texttt{x} \geq{} 0 \Rightarrow{} \mymath{WP}($\texttt{y = x}$,\,\{\texttt{y} =
    abs(\texttt{x})\})) \wedge{} (\texttt{x} < 0 \Rightarrow{} \mymath{WP}($\texttt{y = -x}$,\,\{\texttt{y} =
    abs(\texttt{x})\}))$

    \item[$=$] $(\texttt{x} \geq{} 0 \Rightarrow{} \texttt{x} =
    abs(\texttt{x})) \wedge{} (\texttt{x} < 0 \Rightarrow{} -\texttt{x} =
    abs(\texttt{x}))$

    \item[$=$] $\mymath{true}$
    \end{itemize}
    }}
      \caption{The computation of the WP for a desired post-condition
    of the given code. }
          \label{fig:intro:hoare:wp}
\end{figure}


\emph{Design by Contract}$^{\small{\texttt{TM}}}$ (DbC) is a methodology
that originated with Hoare logic.  The term was coined by Bertrand Meyer in
\cite{Meyer:1992:ADC} and is now a trademark owned by Eiffel Software.
The idea of DbC is to use pre- and post-conditions to document, as
well as check the correctness of, program procedures.  A pair of
pre- and post-conditions for a procedure is called a \emph{procedure
  contract}.  Procedure contracts are built-in constructs in the
Eiffel \cite{Meyer:1988:eiffel} programming language, where they are
checked as assertions at runtime.  Beyond Eiffel,
procedure contracts are widely used by deductive verification tools.

Why3 \cite{filliatre:2013:why3} and Boogie \cite{barnett:2006:boogie} are
verification programming languages with rich built-in specification
constructs, including procedure contracts.  The two names also refer
to the deductive verifiers that prove the functional correctness of
Why3 and Boogie programs, respectively.  A
number of verification tools for higher-level programming languages
are built on top of Why3 and Boogie, such as HAVOC
\cite{lahiri2012security, ball:2010:towards} and Frama-C
\cite{cuoq2012framac} with its plug-in \emph{WP}
\cite{cuoq:2011:wp} for C, Spark-2014 \cite{spark} for Ada, KeY
\cite{beckert:2007:KeY} for Java, Spec\#
\cite{barnett:2005:specsharp} for C\#, and the verifier for Dafny
\cite{leino:2010:dafny}.
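
As a small illustration, consider a hypothetical C function
\texttt{incr} with the procedure contract ``requires $0 \leq \texttt{n}
\leq 1000$; ensures that the returned value equals $\texttt{n} + 1$''
(the concrete annotation syntax differs from tool to tool, e.g., ACSL
for Frama-C or JML for KeY):

\begin{center}
  \begin{program}
    \ccode{int}~incr(\ccode{int}~n)~\lb~\ccode{return}~n~+~1;~\rb\\
  \end{program}
\end{center}
A deductive verifier checks that every caller establishes the
pre-condition and that the body guarantees the post-condition on
return.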



\subsection{Formal Methods For Concurrency}
Hoare logic is not suitable for specifying and proving concurrent
programs.  A Hoare triple describes a statement in terms of its input
and output.  However, in a concurrent program, the behavior of a
process executing a statement depends not only on the input but also
on the behaviors of the other processes that run in parallel.  Moreover,
Hoare logic cannot describe \emph{liveness properties}, which are crucial
for concurrent programs.

Owicki and Gries \cite{Owicki1976}, Lamport \cite{lamport1980hoare}
and Takaoka \cite{Takaoka:1994:PPV:326619.326811} introduced
techniques in Hoare's style for proving properties of
concurrent programs.

%% Owicki & Gries:
Owicki and Gries introduced in \cite{Owicki1976} a technique based
on Hoare logic for proving \emph{safety properties} of concurrent
programs.  In their work, concurrency is expressed by two general
statements: 1) \texttt{cobegin $S_0 //\ ...\ // S_n$ coend} means that
each statement $S_i$, $0 \leq i \leq n$, is executed by a separate
process and all these processes run in parallel; 2) \texttt{await $B$
  then $S$} means that the process blocks until $B$ is true and, once
$B$ is true, executes $S$ in one atomic step.
The formal semantics of the two statements is defined by inference
rules, in the same way as in Hoare logic.  Then, for a concurrent
program $S$ built from the extended set of statements, the validity of
$\{P\}~S~\{Q\}$ can be derived using the rules.

The most important idea in this work is the semantics of the
\texttt{cobegin ...\ coend} statement,
\begin{align}
  \frac{\{P_1\}S_1\{Q_1\},  \{P_2\}S_2\{Q_2\},  ...\ \text{ are
  interference-free}}
  {\{P_1\wedge{}P_2\wedge{}...\ \} \texttt{cobegin}~S_1 // S_2 // ...\ \texttt{coend}
  \{Q_1\wedge{}Q_2\wedge{}...\ \} }\nonumber
\end{align}
which is straightforward except for the term ``interference-free''.
Let $\mymath{pre}(S)$ denote the pre-condition of a statement $S$.  A
statement $T$ is said not to interfere with $\{P\}~S~\{Q\}$ iff

\begin{enumerate}
\item  $\{Q \wedge{} \mymath{pre}(T)\}~T~\{Q\}$, i.e., $T$ does not break the post-condition of $S$;

\item for any sub-statement $S'$ of $S$,
  $\{\mymath{pre}(S') \wedge{} \mymath{pre}(T)\}~T~\{\mymath{pre}(S')\}$, i.e., $T$ does not
  break the pre-condition of any sub-statement $S'$.
\end{enumerate}
A proof of interference freedom can be mechanized by showing that no
atomic sub-statement $S_i$ in a \texttt{cobegin} statement is
interfered with by another statement $S_j$.  Usually, in order to keep
a triple $\{P\}~S~\{Q\}$ free from interference, the user needs to
weaken $P$ so that it accommodates the behaviors of the other parallel
statements.
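For instance (a classic illustration, not reproduced from
\cite{Owicki1976}), consider proving
$\{x = 0\}~\texttt{cobegin}~x\texttt{ := }x+1~//~x\texttt{ := }x+2~\texttt{coend}~\{x = 3\}$,
where both assignments are atomic.  The local proofs must be weakened to
\begin{center}
  $\{x = 0 \vee{} x = 2\}~x\texttt{ := }x+1~\{x = 1 \vee{} x = 3\}$
  \quad and \quad
  $\{x = 0 \vee{} x = 1\}~x\texttt{ := }x+2~\{x = 2 \vee{} x = 3\}$
\end{center}
so that each triple tolerates the effect of the other assignment; the
two triples are then interference-free, their pre-conditions conjoin
to $x = 0$, and their post-conditions conjoin to $x = 3$.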

% Lamport: concurrent hoare logic:
Lamport \cite{lamport1980hoare} generalizes Hoare logic by refining
the meaning of the triple $\{P\}~S~\{Q\}$: 1) $P$ must be
strong enough to guarantee that $Q$ will hold upon termination even if
the execution starts from any location inside $S$; and 2) $P$ must
remain true as long as control is in $S$.  Here $S$ is a
concurrent program composed of sequential statements as well as the
\texttt{cobegin} statement used by Owicki and Gries.
Three predicates refer to program locations with
respect to a statement $S$: immediately before $S$, inside $S$ and
immediately after $S$.  For a sub-statement $S$ of a \texttt{cobegin}
structure, a sufficiently strong pre-condition usually takes the
program locations of the other processes into consideration.

% Directed graph based proof technique:
Takaoka \cite{Takaoka:1994:PPV:326619.326811} proves a triple
$\{P\}~S~\{Q\}$ for a parallel program $S$ by analyzing a program
graph representing $S$, where vertices are sets of locations of the
running processes and edges between vertices are labeled with program
statements.  In addition, each vertex is annotated with an
assertion: initial locations carry the pre-conditions and
final locations carry the post-conditions.  Then, for every
edge, the assertion of the destination vertex can be derived from the
assertion of the source vertex with respect to the statement labeling
the edge.  The rules for this assertion derivation are defined in
Hoare style.  The triple $\{P\}~S~\{Q\}$ is valid iff such a graph can
be successfully constructed.  In Takaoka's work, parallel programs
contain synchronization statements: a process can wait for another
process by executing \texttt{call} until the awaited process executes
\texttt{accept}.  A similar program-graph-based analysis was done
earlier by Newton \cite{Newton1975}.  In \cite{Newton1975}, program
locations can be used to express properties in assertions, although the
parallel programs have no synchronization statements.

A more intuitive approach was suggested by Ashcroft
\cite{Ashcroft:1975:PAP:1739967.1740302}.  In
\cite{Ashcroft:1975:PAP:1739967.1740302}, safety properties of a
concurrent program are expressed as a set of \emph{global invariants},
i.e., a set of assertions that are supposed to hold at every location of
the program.  The proof is constructed by showing the validity of
those assertions for every program location.
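For example (an assumed illustration, not taken from
\cite{Ashcroft:1975:PAP:1739967.1740302}), mutual exclusion between
the critical sections $cs_1$ and $cs_2$ of two processes can be stated
as the global invariant $\neg{}(\mymath{in}(cs_1) \wedge{}
\mymath{in}(cs_2))$, where $\mymath{in}(cs_i)$ holds when control of
process $i$ is inside $cs_i$; the proof obligation is then to show
that every transition of the program preserves this assertion.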

Liveness properties, such as \emph{fairness} properties, can be
expressed with \emph{temporal logic}.  Techniques for proving liveness
properties include the \emph{well-founded set} approach, which was
originally applied by Floyd \cite{Floyd1993} and later adapted by
Manna and Pnueli in \cite{Manna:1984:APP:2731.2734}, the \emph{proof
  lattice} approach \cite{Owicki:1982:PLP:357172.357178} and the
\emph{rely-guarantee} approach \cite{Stark:1985:PTR:646823.706907}.

Modern tools for verifying concurrent programs in an axiomatic way
include VCC \cite{cohen-etal:2009:vcc}, COMPLX
\cite{amani:2017:complex}, KeY
\cite{beckert:2007:KeY}, Chalice \cite{leino:2009:verification}
and VeriFast \cite{Jacobs:2010:QTV:1947873.1947902}.  VCC is a static
verifier for concurrent C programs.  The user of VCC needs to annotate
programs with invariants and other specifications.  COMPLX is a
verification framework that targets general imperative programming
languages.  It is based on the aforementioned proof system proposed by
Owicki and Gries.  KeY verifies Java programs against their
specifications written in JML \cite{Gosling:2014:JLS:2636997}.  It
generalizes Hoare logic for Java to the \emph{first-order Java Dynamic
  Logic}.  Chalice uses a permission-based approach to reason about
multi-threaded concurrent programs.  VeriFast utilizes separation
logic \cite{reynolds2002separation} for verifying multi-threaded C and
Java programs.


%% MODEL CHECKING
\subsection{Model Checking}
Model checking \cite{clarke:2000:modelchecking} is an algorithmic method
for checking the correctness of a system by searching all reachable states
in a finite state space.  The system is abstracted to a \emph{model}
in which irrelevant details are removed.  A state space represents all
possible behaviors of the model.  It is a directed graph,
where a node is a state and an edge is a transition that alters a
state in an atomic step.

The model checking technique is effective for verifying properties of
concurrent systems.  A concurrent system can be abstracted to an
\emph{interleaving model}, in which the real-world asynchronous
executions are represented by non-deterministic choices among atomic
transitions.  For example, the parallel execution of two atomic
commands $a$ and $b$ can be modeled as two non-deterministic
choices: $a$ followed by $b$, or $b$ followed by $a$.
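To make this concrete (an assumed illustration), let the shared
variable $x$ be initially $0$, let $a$ be the atomic command
$x\texttt{ := }x + 1$ and let $b$ be $x\texttt{ := }2x$.  The
interleaving model then has exactly two executions: $a;b$, ending in a
state where $x = 2$, and $b;a$, ending in a state where $x = 1$.  A
model checker explores both executions and checks the desired property
in every state it reaches.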

A challenge in model checking is the state explosion problem: the
number of states grows exponentially as the number of components in
a concurrent system increases.  Hence, many model checkers can only
scale to programs whose inputs are bounded by small concrete
values.  Model checking is nevertheless widely used to solve real
problems, for two reasons.  First, there is a belief in the \emph{Small Scope
  Hypothesis} \cite{Jackson:2006:SAL}, which states that most bugs can be
found by exploring all possible program executions with inputs drawn
from small scopes.  Second, many sophisticated \emph{reduction}
techniques allow model checkers to search only a sub-space of the
full state space without loss of soundness.  For example, using the
\emph{partial-order reduction} (POR) technique
\cite{godefroid:1993:refining, 10.1007/BFb0013032,Valmari1992}, a
model checker usually needs to explore only one of a set of
commutative executions.
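For instance (an assumed illustration), if process $p$ executes
$x_p\texttt{ := }1$ and process $q$ executes $x_q\texttt{ := }1$ on
disjoint variables, the two orders of these transitions are
commutative and reach the same state $(x_p, x_q) = (1, 1)$; under POR,
a model checker explores only one of the two orders.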

%% SYMBOLIC EXECUTION
\subsection{Symbolic Execution} Symbolic execution is an approach for
testing and analyzing programs \cite{king:1976:symbolic}.  It replaces
the concrete program inputs with symbols, each of which represents a
set of concrete values.  During symbolic execution, a boolean
formula called the \emph{path condition} (PC) is maintained.  The PC
denotes the condition under which an execution path is feasible.  Due
to the presence of symbols, the PC and the values stored in variables
are all \emph{symbolic expressions}.

Symbolic execution is able to explore all possible paths of a program
that a concrete execution may take.  Different paths arise
from the different choices that a concrete execution can make at
branches.  During the exploration of each path, the PC is initially
\emph{true} and gets updated at each branch: the updated
value is the conjunction of the old PC and the branch condition
corresponding to the choice made on the path.  Once the PC becomes
unsatisfiable, the path being explored is infeasible and is discarded
immediately.  Automated theorem provers \cite{Z3Prover,
  barrett-etal:2011:cvc4, alt-ergo} are used to reason about symbolic
expressions during the exploration.

\begin{exmp}

Consider the following C code:

\begin{center}
  \begin{program}
\ccode{if}~(x~>=~0)\\
~~y~=~x;\\
\ccode{else}\\
~~y~=~-x;\\
\end{program}
\end{center}

Suppose the variable \texttt{x} holds a symbol $X$.  A symbolic
execution engine will explore two paths for this code snippet: one
path leads to a result where the variable \texttt{y} has value $X$ and
the PC becomes $X \ge 0$; the other path leads to a result where the
variable \texttt{y} has value $-X$ and the PC becomes $X < 0$. $\qed$
\label{exmp:intro:symexe}
\end{exmp}

%% MPI
\subsection{MPI}
The Message-Passing Interface (MPI) standard specifies a library for
writing message-passing parallel programs in different languages such
as C and Fortran.  This library includes a large number of functions,
constants, and data types.  For an MPI program, every process has its
private storage but shares nothing with others.  Communication among
processes is carried out by transferring data using different MPI
functions.  An MPI program is written, compiled, and linked to
generate an executable file. The program is executed by instantiating
a number of processes, each of which runs a copy of the executable.

In MPI programs, a \emph{communicator} is an abstraction of a
``communication universe'' for processes.  A communicator comprises an
ordered group of $n$ ($1 \leq{} n$) processes, which are
identified by $0, 1, ..., n-1$.  The numerical identifier of a process
in a communicator is called the \emph{rank} of the process.  A process
may belong to multiple communicators and may be assigned a different
rank in each of them.  There is, however, a default communicator,
which contains all the processes that are created when a program is
launched.  The rank of a process in the default communicator can be
used as its unique ID.  By branching on process ranks, different
processes may behave differently even though they run the same
program.

Communication operations are performed through communicators.
Conceptually, a communicator includes a set of message channels.
Every message channel is identified by an ordered pair $(i, j)$
($0 \leq i, j < n$).  It behaves like a First-In-First-Out (FIFO)
queue for buffering messages that were sent by process $i$ but
have not yet been received by process $j$.

In MPI, communication can be non-deterministic.  A process can perform
a receive operation without specifying an exact sender.  Such a
receive operation is called a wildcard receive.  Conceptually, a
wildcard receive performed by process $i$ non-deterministically picks
a message from any channel in the set $\{(j, i) | 0 \leq j < n\}$
that contains an appropriate message.
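In C, a wildcard receive is expressed by passing the constant
\texttt{MPI\U{}ANY\U{}SOURCE} as the source argument of
\texttt{MPI\U{}Recv}, e.g., with a hypothetical \texttt{int} buffer
\texttt{dat}:

\begin{center}
  \begin{program}
    MPI\civlU{}Recv(\&dat,~1,~MPI\civlU{}INT,~MPI\civlU{}ANY\civlU{}SOURCE,~0,\\
    ~~~~~~~~~MPI\civlU{}COMM\civlU{}WORLD,~MPI\civlU{}STATUS\civlU{}IGNORE);\\
  \end{program}
\end{center}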

\begin{exmp}
  Figure \ref{fig:mpi:simple} shows a simple C/MPI program.  Constructs
  defined in the MPI library all start with the prefix ``\texttt{MPI\U{}}''.
  \texttt{MPI\U{}COMM\U{}WORLD} is the constant that refers to the
  default communicator.  The \texttt{MPI\U{}Init} function initiates
  the message-passing environment.  Every process can obtain its rank
  in a communicator by calling \texttt{MPI\U{}Comm\U{}rank}.  The
  \texttt{MPI\U{}Finalize} function terminates the environment.

In this example, the process of rank 0 sends a message to the process
of rank 1.  The point-to-point (P2P) communication is carried out by a
pair of standard P2P send and receive functions:

\begin{figure}[t]
  \begin{program}
    \ccode{\#include}~<mpi.h>\\
    \ccode{int}~main(\ccode{int}~argc,~\ccode{char}~*argv[])~\lb~\\
    ~~\ccode{int}~rank,~dat~=~0;~\\
    ~~MPI\civlU{}Init(\&argc,~\&argv);~\\
    ~~MPI\civlU{}Comm\civlU{}rank(MPI\civlU{}COMM\civlU{}WORLD,~\&rank);
\mycomment{//gets the rank of this
  process}\\
~~\ccode{if}~(rank~==~0)~\\
~~~~MPI\civlU{}Send(\&dat,~1,~MPI\civlU{}INT,~1,~0,~MPI\civlU{}COMM\civlU{}WORLD);\\
~~\ccode{else~if}~(rank~==~1)\\
~~~~MPI\civlU{}Recv(\&dat,~1,~MPI\civlU{}INT,~0,~0,~MPI\civlU{}COMM\civlU{}WORLD,
MPI\civlU{}STATUS\civlU{}IGNORE);\\
~~MPI\civlU{}Finalize();\\
\rb\\
\end{program}
\caption{A simple MPI program. Process 0 sends and process 1 receives
  a message. The remaining processes perform no operation.}
\label{fig:mpi:simple}
\end{figure}

\begin{enumerate}
  \item the \texttt{MPI\U{}Send} function (at line 7) sends the data,
  which is stored in \texttt{dat} and has the size of one
  \texttt{MPI\U{}INT}, to the process of rank 1 in the default
  communicator with the tag 0;

 \item the \texttt{MPI\U{}Recv} function (at line 9) receives the
 message tagged with 0, whose data is expected to be no larger
 than the size of one \texttt{MPI\U{}INT} and is saved in
 \texttt{dat}, from the process of rank 0 in the default communicator.
\end{enumerate}

No communication action is performed by any other process. $\qed$
\end{exmp}

