\part{Language}
\label{part:lang}

\chapter{Overview of CIVL-C}

CIVL-C is an extension of a subset of the C11 dialect of C. It
includes the most commonly-used elements of C, including most of the
syntax, types, expressions, and statements. Missing are some of the
more esoteric type qualifiers and much of the standard library. None
of the C11 language elements dealing with concurrency are included, as
CIVL-C has its own concurrency primitives.

The keywords in CIVL-C not already in C begin with symbol \cckey.
This makes them readily identifiable and also prevents any naming
conflicts with identifiers in C programs.  This means that most
legal C programs will also be legal CIVL-C programs.

One of the most important features of CIVL-C not found in standard C
is the ability to define functions in any scope. (Standard C allows
function definitions only in the file scope.)  This feature is also
found in GNU C, the GNU extension of C.

Another central CIVL-C feature is the ability to \emph{spawn}
functions, i.e., run the function in a new process (thread).




Key concepts: static scope tree, nested functions, dynamic scope tree,
nested functions, processes, spawning, waiting, types, ...


\chapter{Structure of a CIVL-C program}

CIVL-C program may use preprocessor directives as specified in the C
Standard.  A source program is preprocessed, then parsed, resulting
in a translation unit.

A CIVL-C program begins with the line
\begin{verbatim}
#include <civlc.h>
\end{verbatim}
which includes the main CIVL-C header file, which declares all the
types and other CIVL primitives.

A translation unit consists of a sequence of variable declarations,
function prototypes, function definitions, and \emph{assume}
statements.

% lexical scopes and the lexical scope tree

% naming scopes (\$scope)

% translation of source to model

% root function, root scope and the  main function

% formal parameters to main, return value


% write a grammar.  Leave out type qualifiers, etc.  Keep pointers.
% Keep simple types?  Why not keep all the standard types.
% How about "symbolic types"?  Make all the casts explicit.
% Look at CIL?

% Describe as subset of C, but leave out:... and add...

% add Set<proc> and use it.

\chapter{Sequential Elements}

In this chapter we describe the main sequential elements of the
language.  For the most part these are the same as in C.
Primitives dealing with concurrency are introduced in Chapter
\ref{chap:concurrency}.

\section{Types}

\subsection{Standard types inherited from C}

The boolean type is denoted \verb!_Bool!, as in C. Its values are $0$
and $1$, which are also denoted by $\ctrue$ and $\cfalse$,
respectively.

There is one integer type, corresponding to the mathematical integers.
Currently, all of the C integer types \texttt{int}, \texttt{long},
\texttt{unsigned\ int}, \texttt{short}, etc., are mapped to the CIVL
integer type.

There is one real type, corresponding to the mathematical real
numbers. Currently, all of the C real types \texttt{double},
\texttt{float}, etc., are mapped to the CIVL real type.

Array types, \texttt{struct} and \texttt{union} types, \texttt{char},
and pointer types are all exactly as in C.

% \subsection{The heap type $\cheap$ and handles}

% Unlike C, a CIVL-C program does not necessarily have access to a
% single, global heap. Instead, there is a $\cheap$ type, and heaps may
% be declared explicitly wherever they are needed. Hence a CIVL-C
% program may have several heaps, and these may exist in different
% scopes.

% A heap is declared and created as follows:
% \begin{verbatim}
%   $heap h = $heap_create();
% \end{verbatim}
% The function \verb!$heap_create()! creates a new empty heap in the
% current scope and returns a \emph{handle} to that heap. A handle is
% like a pointer: it is a reference to another object. However, a handle
% is much more restricted than a general pointer. In particular, it
% cannot be dereferenced (by the \ct{*} operator). The underlying heap
% object can only be accessed by using a handle to it as an argument to
% a system function.

% Handles can be used in assignments and passed as arguments to functions.
% For example, this declaration could follow the one above:
% \begin{verbatim}
%   $heap h2=h;
% \end{verbatim}
% After executing this code, \ct{h2} and \ct{h} will be aliased, i.e., the two
% handles will refer to the same heap object.

% The heap object exists in the scope in which it is created. In
% particular, it will disappear when that scope disappears, i.e., when
% control reaches the right curly brace that defines the end of the
% scope. At that point, any references into the heap become invalid.

% The following system functions deal with heaps:
% \begin{verbatim}
%   void* $malloc($heap h, int size);
%   void free(void *p)
% \end{verbatim}
% The first function is like C's \texttt{malloc}, except that you
% specify the heap in which the allocation takes place.
% This modifies the specified heap and returns a pointer to the new object.
% The function can only occur in a context in which the type of the object is
% specified, as in:
% \begin{verbatim}
%   $heap h;
%   int n = 10;
%   double *p = (double*)$malloc(h, n*sizeof(double));
% \end{verbatim}
% The function \ct{free} is exactly the same as in C. Note that
% \texttt{free} modifies the heap which was used to allocate \texttt{p}.


\subsection{The bundle type: $\cbundle$}

CIVL-C includes a type named \cbundle. A bundle is basically a
sequence of data, wrapped into an atomic package. A bundle is created
using a function that specifies a region of memory. One can create a
bundle from an array of integers, and another bundle from an array of
reals. Both bundles have the same type, \cbundle. They can therefore
be entered into an array of \cbundle, for example. Hence bundles are
useful for mixing objects of different (even statically unknown) types
into a single data structure. Later, the contents of a bundle can be
extracted with another function that specifies a region of memory into
which to unpack the bundle; if that memory does not have the right
type to receive the contents of the bundle, a runtime error is
generated.

\begin{figure}
\begin{verbatim}
/* Creates a bundle from the memory region specified by ptr and size,
 * copying the data into the new bundle */
$bundle $bundle_pack(void *ptr, int size);

/* Returns the size (number of bytes) of the bundle */
int $bundle_size($bundle b);

/* Copies the data out of the bundle into the region specified */
void $bundle_unpack($bundle bundle, void *ptr);
\end{verbatim}
  \caption{The \emph{bundle} abstract data type}
  \label{fig:bundle}
\end{figure}

The relevant functions for creating and manipulating bundles
are given in Figure \ref{fig:bundle}.

\subsection{The \cscope{} type}
\label{sec:scopetype}

An object of type $\cscope$ is a reference to a dynamic scope.  It may
be thought of as a ``dynamic scope ID, '' but it is not an integer and
cannot be converted to an integer.  Operations defined on scopes are
discussed in Section \ref{sec:scopeexpr}.

\section{Expressions}

\subsection{Expressions inherited from C}

The following C expressions are included in CIVL: 
\begin{itemize}
\item \emph{constant} expressions
\item \emph{identifier} expressions (\texttt{x})
\item parenthetical expressions (\verb!(e)!)
\item numerical \emph{addition} (\verb!a+b!), \emph{subtraction} (\verb!a-b!),
  \emph{multiplication} (\verb!a*b!), \emph{division} (\verb!a/b!),
  \emph{unary plus} (\verb!+a!), \emph{unary minus} (\verb!-a!),
  \emph{integer division} (\verb!a/b!) and \emph{modulus} (\verb!a%b!),
  all with their ideal mathematical interpretations
\item array \emph{index} expressions (\verb!a[e]!) and struct or union
  \emph{navigation} expressions (\verb!x.f!, \verb!p->f!)
\item \emph{address-of} (\verb!&e!), pointer \emph{dereference} (\verb!*p!),
  pointer \emph{addition} (\verb!p+i!) and \emph{subtraction} (\verb!p-q!)
  expressions
\item relational expressions (\verb!a==b!, \verb~a!=b~, \verb!a>=b!,
  \verb!a<=b!, \verb!a<b!, \verb!a>b!)
\item logical \emph{not} (\verb~!p~), \emph{and} (\verb!p&&q!), and
  \emph{or} (\verb!p||q!)
\item \emph{sizeof} a type (\verb!sizeof(t)!) or expression (\verb!sizeof(e)!)
\item \emph{assignment} expressions (\verb!a=b!, \verb!a+=b!, \verb!a-=b!,
  \verb!a*=b!, \verb!a/=b!, \verb!a%=b!)
\item function \emph{calls} \verb!f(e1,...,en)!
\item \emph{conditional} expressions (\verb!b ? e : f!).
\item \emph{cast} expressions (\verb!(t)e!)
\end{itemize}

Bit-wise operations are not yet supported.

\subsection{Scope expressions}
\label{sec:scopeexpr}

As mentioned in Section \ref{sec:scopetype}, CIVL-C provides a type
\cscope.  An object of this type is a reference to a dynamic scope.
Several constants, expressions, and functions dealing with the
\cscope{} type are also provided.

The $\cscope$ type is like any other object type.  It may be used as
the element type of an array, a field in a structure or union, and so
on.  Expressions of type $\cscope$ may occur on the left or right-hand
sides of assignments and as arguments in function calls just like any
other expression.  Two different variables of type $\cscope$ may be
aliased, i.e., they may refer to the same dynamic scope.  When a
dynamic scope leaves the state (because control reached the right
curly brace which marks the end of the scope, or the process exits),
any reference to that scope becomes ``undefined,'' and an attempt to
use that scope will result in a runtime error.

\subsubsection{The constant \chere}

A constant \chere{} exists in every scope.  This constant has
type \cscope{} and refers to the dynamic scope in which it is
contained.  For example,
\begin{verbatim}
  { // scope s
    int *p = (int*)$malloc($here, n*sizeof(int);
  }
\end{verbatim}
allocates an object consisting of $n$ ints in the scope $s$.

\subsubsection{The constant \cscoperoot{}}

There is a global constant \cscoperoot{} of type $\cscope$ which
refers to the root dynamic scope.

\subsubsection{Scope relational operators}

Let $s_1$ and $s_2$ be expressions of type \cscope.  The following are
all CIVL-C expressions of boolean type:
\begin{itemize}
\item $s_1$ \ct{==} $s_2$.  This is \emph{true} iff $s_1$ and $s_2$
  refer to the same dynamic scope.
\item  $s_1$ \ct{!=} $s_2$.  This is \emph{true} iff $s_1$ and $s_2$
  refer to different dynamic scopes.
\item  $s_1$ \ct{<=} $s_2$.  This is \emph{true} iff $s_1$ is equal to
  or a descendant of $s_2$, i.e., $s_1$ is equal to or contained in $s_2$.
\item  $s_1$ \ct{<} $s_2$.  This is \emph{true} iff $s_1$ is a strict 
  descendant of $s_2$, i.e., $s_1$ is contained in $s_2$ and is not
  equal to $s_2$.
\item $s_1$ \ct{>} $s_2$.  This is equivalent to $s_2$ \ct{<} $s_1$.
\item  $s_1$ \ct{>=} $s_2$.  This is equivalent to $s_2$ \ct{<=} $s_1$.
\end{itemize}

\subsubsection{Scope parent function $\cscopeparent$}

The system function
\begin{verbatim}
  $scope $scope_parent($scope s);
\end{verbatim}
returns the parent dynamic scope of the dynamic scope referenced by
\ct{s}.  If \ct{s} is the root dynamic scope, it returns the undefined
value of type $\cscope$.

\subsubsection{Lowest Common Ancestor: \ct{+}}

The expression $s_1$ \ct{+} $s_2$, where $s_1$ and $s_2$ are
expressions of type \cscope, evaluates to the lowest common ancestor
of $s_1$ and $s_2$ in the dynamic scope tree. This is the smallest
dynamic scope containing both $s_1$ and $s_2$.

\subsubsection{The \cscopeof{} expression}

Given any left-hand-side expression \ct{expr}, the expression
\begin{verbatim}
  $scopeof(expr)
\end{verbatim}
evaluates to the dynamic scope containing the object specified by
\ct{expr}.

The following example illustrates the semantics of the \cscopeof{}
operator.  All of the assertions hold:
\begin{verbatim}
{
  $scope s1 = $here;
  int x;
  double a[10];

  {
    $scope s2 = $here;
    int *p = &x;
    double *q = &a[4];

    assert($scopeof(x)==s1);
    assert($scopeof(p)==s2);
    assert($scopeof(*p)==s1);
    assert($scopeof(a)==s1);
    assert($scopeof(a[5])==s1);
    assert($scopeof(q)==s2);
    assert($scopeof(*q)==s1);
  }
}  
\end{verbatim}

\section{Statements}

The usual C statements are supported:
\begin{itemize}
\item \emph{no-op} (\ct{;})
\item expression statements (\ct{e;})
\item labeled statements, including \ct{case} and \ct{default} labels
  (\ct{l: s})
\item \emph{for} (\ct{for (init; cond; inc) s}), \emph{while} 
  (\ct{while (cond) s}) and \emph{do} (\ct{do s while (cond)})
  loops
\item compound statements (\lb \ct{s1;s2;} \ldots \rb)
\item \texttt{if} and \verb!if! \ldots \verb!else!
\item \verb!goto!
\item \verb!switch!
\item \verb!break!
\item \verb!continue!
\item \verb!return!
\end{itemize}

\section{Guards and nondeterminism}

\subsection{Guarded commands: \cwhen}

A guarded command is encoded in CIVL-C using a $\cwhen$ statement:
\begin{verbatim}
  $when (expr) stmt;
\end{verbatim}
All statements have a guard, either implicit or explicit.  For most
statements, the guard is \ctrue.  The \cwhen{} statement allows one to
attach an explicit guard to a statement.

When \texttt{expr} is \emph{true}, the statement is enabled, otherwise
it is disabled.  A disabled statement is \emph{blocked}---it will not
be scheduled for execution.  When it is enabled, it may execute by
moving control to the \texttt{stmt} and executing the first atomic
action in the \texttt{stmt}.

If \texttt{stmt} itself has a non-trivial guard, the guard of the
\cwhen{} statement is effectively the conjunction of the \texttt{expr}
and the guard of \texttt{stmt}.

The evaluation of \texttt{expr} and the first atomic action of
\texttt{stmt} effectively occur as a single atomic action.  There is
no guarantee that execution of \texttt{stmt} will continue atomically
if it contains more than one atomic action, i.e., other processes may
be scheduled.

Examples:
\begin{verbatim}
  $when (s>0) s--;
\end{verbatim}
This will block until \texttt{s} is positive and then decrement
\texttt{s}.  The execution of \texttt{s--} is guaranteed to take place
in an environment in which \texttt{s} is positive.

\begin{verbatim}
  $when (s>0) {s--; t++}
\end{verbatim}
The execution of \texttt{s--} must happen when \texttt{s>0}, but
between \texttt{s--} and \texttt{t++}, other processes may execute.

\begin{verbatim}
  $when (s>0) $when (t>0) x=y*t;
\end{verbatim}
This blocks until both \texttt{x} and \texttt{t} are positive then
executes the assignment in that state.  It is equivalent to
\begin{verbatim}
  $when (s>0 && t>0) x=y*t;
\end{verbatim}

\subsection{Nondeterministic selection statement: \cchoose}

A \cchoose{} statement has the form
\begin{verbatim}
  $choose {
    stmt1;
    stmt2;
    ...
    default: stmt
  }
\end{verbatim}
The \texttt{default} clause is optional.

The guards of the statements are evaluated and among those that are
\emph{true}, one is chosen nondeterministically and executed.  If none
are \emph{true} and the \texttt{default} clause is present, it is
chosen.  The \texttt{default} clause will only be selected if all
guards are \emph{false}.  If no \texttt{default} clause is present and
all guards are \emph{false}, the statement blocks.  Hence the implicit
guard of the \cchoose{} statement without a \texttt{default} clause is
the disjunction of the guards of its sub-statements.  The implicit
guard of the \cchoose{} statement with a default clause is
\emph{true}.

Example: this shows how to encode a ``low-level'' CIVL guarded
transition system:

\begin{verbatim}
  l1: $choose {
    $when (x>0) {x--; goto l2;}
    $when (x==0) {y=1; goto l3;}
    default: {z=1; goto l4;}
  }
  l2: $choose {
    ...
  }
  l3: $choose {
    ...
  }
\end{verbatim}


\subsection{Nondeterministic choice of integer: \cchooseint}

The system function \cchooseint{} has the following signature:
\begin{verbatim}
  int $choose_int(int n);
\end{verbatim}
This function takes as input a positive integer \texttt{n} and
nondeterministicaly returns an integer in the range
$[0,\texttt{n}-1]$.


\chapter{Concurrency}
\label{chap:concurrency}

\section{Process creation and management}

\subsection{The process type: \cproc}

This is a primitive object type and functions like any other primitive
C type (e.g., \texttt{int}). An object of this type refers to a
process. It can be thought of as a process ID, but it is not an
integer and cannot be cast to one.  It is analogous to the $\cscope$
type for dynamic scopes.

Certain expressions take an argument of \cproc{} type and some return
something of \cproc{} type.

\subsection{The \emph{self} process constant: \cself}

This is a constant of type \cproc. It can be used wherever an argument
of type \cproc{} is called for. It refers to the process that is
evaluating the expression containing \cself.

\subsection{Spawning a new process: \cspawn}

A \emph{spawn} expression is an expression with side-effects.  It
spawns a new process and returns a reference to the new process, i.e.,
an object of type \cproc.  The syntax is the same as a procedure
invocation with the keyword \cspawn{} inserted in front:
\begin{verbatim}
  $spawn f(expr1, ..., exprn)
\end{verbatim}
Typically the returned value is assigned to a variable, e.g.,
\begin{verbatim}
  $proc p = $spawn f(i);
\end{verbatim}
If the invoked function \texttt{f} returns a value, that value is
simply ignored.

\subsection{Waiting for another process to terminate: \cwait}

The system function $\cwait$ has signature
\begin{verbatim}
  void $wait($proc p);
\end{verbatim}
When invoked, this function will not return until the process
referenced by \ct{p} has terminated. Note that $p$ can be any
expression of type \cproc{}, not just a variable.

\subsection{Terminating a process immediately: \cexit}

This function takes no arguments.  It causes the
calling process to terminate immediately, regardless of the state of
its call stack:
\begin{verbatim}
  void $exit(void);
\end{verbatim}

\section{Atomicity}

\subsection{Atom blocks: \catom} This defines a number of statements to be executed
as a single atomic transition.  An \catom~block has the following
form:
\begin{verbatim}
  $atom {
    stmt1;
    stmt2;
    ...
  }
\end{verbatim}

The statements inside an \catom\ block are to be executed as one
transition. It is required that the execution of the statements in an
\catom\ block satisfy all of the following properties:
\begin{enumerate}
\item \emph{deterministic}: at each step in the execution of the atom
  block, there must be at most one enabled statement;
\item \emph{nonblocking}: at each step in the execution, there must be
  at least one enabled statement, hence, together with (1), there must
  be exactly one enabled statement;
\item \emph{finite}: the execution of the atom block must terminate
  after a finite number of steps; and
\item \emph{isolated}: there are no jumps from outside the atom block
  to inside the atom block, or from inside the atomc block to outside
  of it.
\end{enumerate}

Violations of the \emph{deterministic}, \emph{nonblocking}, or
\emph{isolated} properties will be reported either statically or
dynamically.  If the \emph{finite} property is violated, the
verification may just run forever.

Once the process enters an \catom\ block is said to be \emph{executing
  atomly}.  The process remains executing atomly until it reaches the
terminating right brace of the block.  Hence \emph{executing atomly}
is a dynamic, not static condition.  For example, the block might
contain a function call which takes the process to a point in code
which is not statically contained in an atom block; that process is
nevertheless still executing atomly and is subject to the rules above.
The process only stops executing atomly when that function call
returns and control finally reaches the right curly brace at the end
of the atom block (assuming the block is not contained in another atom
block).

\emph{Note:} \cwait\ statements are not allowed in \catom\ blocks.
The rationale for this is that there is never a way to know for
certain that another process has terminated (until \cwait\ has
returned) so there is never a way to be certain the \cwait\ statement
will not block.  If one does occur in an \catom\ block, an error will
be reported statically (if it can be detected statically) or
dynamically (otherwise).  Note that it is not always possible to
detect this statically because the \catom\ block may contain a
function call, and the function may contain the \cwait\ statement.

\subsection{Atomic blocks: \catomic} The statements in an \emph{atomic} block
will be executed without other processes interleaving, to
the extent possible.  It has the form:
\begin{verbatim}
  $atomic {
    stmt1;
    stmt2;
    ...
  }
\end{verbatim}
It is essentially a weaker form of \catom.  Unlike \catom, there are
no restrictions on the statements that can go inside an \catomic\
block.  A process executing an \catomic~block will try to execute the
statements without interleaving with other processes, unless it
becomes blocked.  Unlike an \catom, the statements in an atomic block
do not necessarily execute as a single transition; they may be spread
out over multiple transitions.

When no statement is enabled, the execution of the \catomic\ block
will be interrupted.  At this point, other processes are allowed to
execute.  Eventually, if the original process becomes enabled due to
the actions of other processes, it may be scheduled again, in which
case it regains atomicity and continues where it left off.  For
example, after executing the first loop, the process executing the
following code will become blocked at the first \cwait\ statement:
 \begin{verbatim}  
$atomic{
  for(int i = 0; i < 5; i++) p[i] = $spawn foo(i);
  for(int i = 0; i < 5; i++) $wait p[i];
}
\end{verbatim}
Other processes will then execute. Eventually, if the process being
waited on terminates, the original process becomes enabled and may be
scheduled, in which case it regain atomicity, increments \texttt{i}
and proceeds to the next $\cwait$ statement.  This is in fact a common
idiom for spawning and waiting on a set of processes.

A process that enters an $\catomic$ block is said to be
\emph{executing atomically}; it remains executing atomically until it
reaches the closing curly brace.

Both $\catom$ and $\catomic$ blocks can be nested arbitrarily, but
$\catom$ overrides $\catomic$: a process that is executing atomly will
continue executing atomly if it encounters an $\catomic$ statement;
but a process executing atomically that encounters an $\catom$ will
begin executing atomly.

The atomic semantics are defined more precisely as follows: there is a
single global variable called the \emph{atomic lock}. This variable
can either be null (meaning the atomic lock is ``free''), or it can
hold the PID of a process; that process is said to ``hold'' the atomic
lock.  Moreover, each process contains a special integer variable, its
\emph{atomic counter}, which is initially 0.  Every time a process
enters an atomic block, it increments its atomic counter; every time
it exits an atomic block, it decrements its counter.  In order to
increment its counter from $0$ to $1$, it must first wait for the
atomic lock to become free, and then take the lock.  When it
decrements its counter from $1$ to $0$, it releases the atomic lock.
When a process executing atomically becomes blocked, it releases the
lock (without changing the value of its atomic counter).


\section{Message-Passing}

CIVL-C provides a number of additional primitives that can be used to
model message-passing systems.  This part of the language is built in
two layers: the lower layer defines an abstract data type for
representing messages; the higher layer defines an abstract data type
of \emph{communicators} for managing sets of messages being
transferred among some set of processes.

\subsection{Messages: \cmessage}

Messages are similar to bundles, but with some additional meta-data.
The \emph{data} component of the message is the ``contents'' of the
message and is formed and extracted much like a bundle.  The meta-data
consists of an integer identifier for the \emph{source} place of the
message, an integer identifier for the message \emph{destination}
place, and an integer \emph{tag} which can be used by a process to
discriminate among messages for reception.  This is very similar to
MPI.

\begin{figure}
  \begin{small}
\begin{verbatim}
/* creates a new message, copying data from the specified buffer */ 
$message $message_pack(int source, int dest, int tag, void *data, int size);

/* returns the message source */ 
int $message_source($message message);

/* returns the message tag */
int $message_tag($message message);

/* returns the message destination */ 
int $message_dest($message message);

/* returns the message size */ 
int $message_size($message message);

/* transfers message data to buf, throwing exception if message
 * size exceeds specified size */ 
void $message_unpack($message message, void *buf, int size);
\end{verbatim}
  \end{small}
  \caption{The \emph{message} abstract data type}
  \label{fig:message}
\end{figure}

The functions for creating, and extracting information from, messages
are given in Figure \ref{fig:message}.

\subsection{Communicators: \cgcomm{} and \ccomm}
\label{sec:communicators}

CIVL-C defines a \emph{global communicator} type $\cgcomm$ and a
\emph{local communicator} type $\ccomm$. The global communicator is an
abstraction for a ``communication universe'' that stores buffered
messages and perhaps other data.  The local communicator wraps
together a reference to a global communicator and an integer
\emph{place}.  Most of the message-passing commands take a local
communicator as an argument to specify the communication universe used
for that operation and the place from which that operation will be
executed.  The communication universes are isolated from one
another---a message sent on one can never be received using a
different communicator, for example.

The global communicator is the shared object that must be declared in
a scope containing all scopes in which communication in that universe
will take place. It is created by specifying the number of
\emph{places} that will comprise the communicator. A place is an
address to which messages may be sent or where they may be received.
There is not necessarily a one-to-one correspondence between places and
processes: many processes can use the same place.

Local communicators are created (typically in some child scope of the
scope in which the global communicator is declared) by specifying the
gobal communicator to which the local one will be associated and the
place ID. The local communicator will be used in most of the
message-passing functions; it may be thought of as an ordered pair
consisting of a reference to the global communicator and the integer
place.

Both types ($\cgcomm$ and $\ccomm$) are handle types. When declared
with a call to the corresponding creation function, they create an
object in the specified scope and return a handle to that object. The
object can only be accessed through the specified system functions
that take this handle as an argument.

\begin{figure}
  \begin{small}
\begin{verbatim}
/* Creates a new global communicator object and returns a handle to it.
 * The global communicator will have size communication places.  The
 * global communicator defines a communication "universe" and encompasses
 * message buffers and all other components of the state associated to
 * message-passing.  The new object will be allocated in the given scope. */
$gcomm $gcomm_create($scope s, int size);

/* Creates a new local communicator object and returns a handle to it.
 * The new communicator will be affiliated with the specified global
 * communicator.   This local communicator handle will be used as an
 * argument in most message-passing functions.  The place must be in
 * [0,size-1] and specifies the place in the global communication universe
 * that will be occupied by the local communicator.  The local communicator
 * handle may be used by more than one process, but all of those
 * processes will be viewed as occupying the same place.
 * Only one call to $comm_create may occur for each gcomm-place pair.
 * The new object will be allocated in the given scope. */
$comm $comm_create($scope s, $gcomm gcomm, int place);

/* Returns the size (number of places) in the global communicator associated
 * to the given comm. */
int $comm_size($comm comm);

/* Returns the place of the local communicator.  This is the same as the
 * place argument used to create the local communicator. */
int $comm_place($comm comm);

/* Adds the message to the appropriate message queue in the communication
 * universe specified by the comm.  The source of the message must equal
 * the place of the comm. */
void $comm_enqueue($comm comm, $message message);

/* Returns true iff a matching message exists in the communication universe
 * specified by the comm.  A message matches the arguments if the destination
 * of the message is the place of the comm, and the sources and tags match. */
_Bool $comm_probe($comm comm, int source, int tag);

/* Finds the first matching message and returns it without modifying
 * the communication universe.  If no matching message exists, returns a message
 * with source, dest, and tag all negative. */
$message $comm_seek($comm comm, int source, int tag);

/* Finds the first matching message, removes it from the communicator,
 * and returns the message */ 
$message $comm_dequeue($comm comm, int source, int tag);
\end{verbatim}
  \end{small}
  \caption{The \emph{communicator} interface specifies handle 
    types $\cgcomm$ and $\ccomm$ and the functions above}
  \label{fig:comm}
\end{figure}

The communicator interface is given in Figure \ref{fig:comm}.

Certain restrictions are enforced on some relations between the
objects involved in a communication universe.

Fix a \cgcomm{} object.  This object corresponds to a single
communication universe with, say, $n$ places.  At any time, there can
be \emph{at most one} \ccomm{} object associated to a given place.  If
a program attempts to create a \ccomm{} object with the same \cgcomm{}
and place as an earlier created \ccomm{} object, a runtime error will
occur.  In particular, there can be at most $n$ \ccomm{} objects
associated to the \cgcomm.

The relation between processes and \ccomm{} objects is unconstrained.
One process may use any number of \ccomm{} objects.  (Of course, the
process must have access to handles for those \ccomm{} objects.)
Dually, a single \ccomm{} object may be used by any number of
processes; this situation arises naturally when modeling a
multi-threaded MPI program.

\begin{figure}
  \begin{small}
\begin{verbatim}
$gcomm gcomm = $gcomm_create($here, nprocs);
void Process(int rank) {
  $comm comm = $comm_create($here, gcomm, rank);

  void Thread(int tid) {
    ...$comm_enqueue(comm, msg)...
    ...$comm_dequeue(comm, source, tag)...
  }

  for (int i=0; i<nthreads; i++) $spawn Thread(i);
  ...
}
for (int i=0; i<nprocs; i++) $spawn Process(i);
...
\end{verbatim}
  \end{small}
  \caption{Code skeleton for model of multithreaded MPI program
    showing placement of global and local communicator objects}
  \label{fig:mpi-threads-comm}
\end{figure}

There is no special status given to the process which creates the
\ccomm{} object of a given place.  Any process which can access a
handle for that \ccomm{} object can use it to send or receive
messages, regardless of whether that process was the one that created
the \ccomm{} object.  However, users should be aware that verification
is likely to be most efficient when variables are declared as locally
as possible, so it is best to declare the \ccomm{} object in the
innermost scope possible.  Figure \ref{fig:mpi-threads-comm}
illustrates an effective way to do this in the context of modeling a
multithreaded MPI program.  In the code skeleton, each thread can
access the local communicator object of its process, but not that of
any other process.



% nonblocking interface: instead of message_pack, it is a request
% form a new request and Isend it
% can it be re-used?
% buffer_create, 
% request create,
% match, complete, 

\chapter{Specification}

\section{Overview}

Specification is the means by which one expresses what a program is
supposed to do, i.e., what it means for it to be correct.

There are several specification mechanisms in CIVL-C. First, there are
the default properties: these are generic properties which are checked
by default in any program, and require no additional specification
effort. These properties include absence of deadlocks, division by 0,
illegal pointer dereferences, and out of bounds array indexes.

Many more program-specific properties can be specified using
assertions. CIVL-C has a rich assertion language which extends the
language of boolean-valued C expressions. Assumptions are a
specification dual to assertions in that they restrict the set
of executions on which the assertions are checked.

Functional equivalence is a power specification mechanism. In this
approach, two programs are provided, one playing the role of the
specification, the other the role of the implementation. The
implementation is correct if, for all inputs $x$, it produces the same
output as that produced by the specification on input $x$. In other
words, the two programs define the same function; this is sometimes
known as \emph{input-output equivalence}. In order to take this
approach, one must first have a way to specify what the inputs and
outputs of a programs are; CIVL-C provides special keywords for this.

Procedure contracts are another powerful specification mechanisms.
These typically involve specifying preconditions and postconditions
for a function. The function is correct if, whenever it is called in a
state satisfying the precondition, when it returns the state will
satsify the postcondition. A program is correct if all its functions
satsify their contract.

\section{Input-output signature}

\subsection{Input type qualifier: \cinput}

The declaration of a variable in the root scope may
include the type qualifier \cinput, e.g.,
\begin{verbatim}
  $input int n;
\end{verbatim}
This declares the variable to be an input variable, i.e., one which is
considered to be an input to the program.  Such a variable is
initialized with an arbitrary (unconstrained) value of its type.  When
using symbolic execution to verify a program, such a variable will be
assigned a unique symbolic constant of its type.

In contrast, variables in the root scope which are not input variables
will instead be initialized with the ``undefined'' value.  If an
undefined value is used in some way (such as in an argument to an
operator), an error occurs.

In addition, input variables may only be read, never written to.

Alternatively, it is also possible to specify a particular concrete
initial value for an input variable.  This is done using a command
line argument when verifying or running the program.

Input (and output) variables also play a key role when determining
whether two programs are functionally equivalent.  Two programs are
considered functionally equivalent if, whenever they are given the
same inputs (i.e., corresponding \cinput{} variables are initialized
with the same values) they will produce the same outputs (i.e.,
corresponding \coutput{} variables will end up with the same values at
termination).

\subsection{Output type qualifier: \coutput}

A variable in the root scope may be declared with this type qualifier
to declare it to be an output variable.  Output variables are ``dual''
to input variables.  They may only be written to, never read.  They
are used primarily in functional equivalence checking.

\section{Assertions and assumptions}

\subsection{Assertions: \cassert}

The system function \cassert{} takes an argument of boolean type:
\begin{verbatim}
  void $assert (_Bool expr);
\end{verbatim}
During verification, the assertion is checked.  If it cannot be proved
that it must hold, a violation is reported.

Note that CIVL-C boolean expressions have a richer syntax than C
expressions, and may include universal or existential quantifiers
(see below), and the boolean values  \ctrue{} and \cfalse{}.

The assertion function may take additional optional arguments used to
print a specific message if the assertion is violated.  These
additional arguments are similar in form to those used in C's
\texttt{printf} statement: a format string, followed by some number of
arguments which are evaluated and substituted for successive codes in
the format string.  For example,
\begin{verbatim}
  $assert(x<=B, "x-coordinate %f exceeds bound %f", x, B); 
\end{verbatim}

The C function \texttt{assert}, included in the standard library
\texttt{assert.h}, is identical to \cassert.  Programmers are
free to use either one.


\subsection{Assume statements: \cassume}

As \emph{assume statement} has the form
\begin{verbatim}
  $assume expr;
\end{verbatim}
During verification, the assumed expression is assumed to hold.  If
this leads to a contradiction on some execution, that execution is
simply ignored.  It never reports a violation, it only restricts the
set of possible executions that will be explored by the verification
algorithm.

Like as assertion statement, as assume statement can be used any place
a statement is expected.  In addition, as assume statement can be used
in file scope to place restrictions on the global variables of the
programs.  For example,
\begin{verbatim}
$input int B;
$input int N;
$assume 0<=N && N<=B;
\end{verbatim}
declares \texttt{N} and \texttt{B} to be integer inputs and restricts
consideration to inputs satisfying $0\leq\texttt{N}\leq\texttt{B}$.


\section{Formulas}

A formula is a boolean expression that can be used in an assert
statement, assume statement, procedure contract (below), or invariant.
Any ordinary C boolean expression is a formula. CIVL-C provides some
additional kinds of formulas, described below.

\subsection{Implication: \cimplies}

The binary operation \cimplies{} represents logical implication.
The expression \verb!p=>q! is equivalent to \verb~(!p)||q~.

\subsection{Universal quantifier: \cforall}

The universally qunatified formula has the form
\begin{verbatim}
  $forall { type identifier | restriction} expr
\end{verbatim}
where \verb!type! is a type name (e.g., \texttt{int} or
\texttt{double}), \verb!identifier! is the name of the bound variable,
\verb!restriction! is a boolean expression which expresses some
restriction on the values that the bound variable can take, and
\verb!expr! is a formula.  The universally quantified formula
holds iff for all values assignable to the bound variable
for which the restriction holds, the formula \ct{expr} holds.

A variation on the construct above can be used in the special case
where the bound variable is to range over a finite interval
of integers.  In this case the quantified formula may be written:
\begin{verbatim}
  $forall { type identifier=lower .. upper } expr
\end{verbatim}
where \ct{lower} and \ct{upper} are integer expressions.

\subsection{Existential quantifier: \cexists}

The syntax for existentially quantified expressions is exactly the
same as for universally quantified expressions, with \cexists{} in
place of \cforall{}.

\section{Contracts}

\subsection{Procedure contracts: \crequires{} and \censures{}}
The \crequires{} and \censures{} primitives are used to encode
procedure contracts.  There are optional
elements that may occur in a procedure declaration or definition,
as follows.  For a function prototype:
\begin{verbatim}
  T f(...)
    $requires expr;
    $ensures expr;
  ;
\end{verbatim}
For a function definition:
\begin{verbatim}
  T f(...)
    $requires expr;
    $ensures expr;
  {
    ...
  }
\end{verbatim}
The value \cresult{} may be used in post-conditions to refer
to the result returned by a procedure.

\emph{Status}: parsed, but nothing is currently done with this
information.

\subsection{Loop invariants: \cinvariant}

This indicates a loop invariant.  Each C loop
construct has an optional invariant clause as follows:
\begin{verbatim}
  while (expr) $invariant (expr) stmt
  for (e1; e2; e3) $invariant (expr) stmt
  do stmt while (expr) $invariant (expr) ;
\end{verbatim}
The invariant encodes the claim that if \texttt{expr} holds upon
entering the loop and the loop condition holds, then it will hold
after completion of execution of the loop body.  The invariant is used
by certain verification techniques.

\emph{Status:} parsed, but nothing is currently done with this
information.

\section{Concurrency specification}

\subsection{Remote expressions: \texttt{e@x}}.

These have the form \verb!expr@x! and refer to a variable in another
process, e.g., \verb!procs[i]@x!. This special kind of expression is
used in collective expressions, which are used to formulate collective
assertions and invariants.

The expression \verb!expr! must have \cproc{} type.  The variable
\texttt{x} must be a statically visible variable in the context in
which it is occurs.  When this expression is evaluated, the evaluation
context will be shifted to the process referred to by \texttt{expr}.

\emph{Status}: not implemented.

\subsection{Collective expressions: \ccollective}.  These have the form
\begin{verbatim}
  $collective(proc_expr, int_expr) expr 
\end{verbatim}
This is a collective expression over a set of processes.  The
expression \texttt{proc{\U}expr} yields a pointer to the first element
of an array of \cproc.  The expression \texttt{int{\U}expr} gives the
length of that array, i.e., the number of processes.  Expression
\texttt{expr} is a boolean-valued expression; it may use remote
expressions to refer to variables in the processes specified in the
array.  Example:
\begin{verbatim}
  $proc procs[N];
  ...
  $assert $collective(procs, N) i==procs[(pid+1)%N]@i ;
\end{verbatim}

\emph{Status}: not implemented.

\chapter{Pointers and Heaps}
\label{chap:pointers}

CIVL-C supports pointers, using the same operators with the same
meanings as C (\texttt{\&}, \texttt{*}, pointer arithmetic).  There is
also a heap in every scope, and system functions to allocate and
deallocate objects in the specified scope.

\section{Memory functions: \texttt{memcpy}}

The function \texttt{memcpy} is defined in the standard C library
\texttt{string.h} and works exactly the same in CIVL-C: it copies
data from the region pointed to by \ct{q} to that pointed to by
\ct{p}.  The signature is

\begin{verbatim}
  void memcpy(void *p, void *q, size_t size);
\end{verbatim}

\section{Heaps, \cmalloc{} and \texttt{free}}

As mentioned above, each dynamic scope has an implicit heap on which
objects can be allocated and deallocated dynamically.  To allocate an
object, one first needs a reference to the dynamic scope to be used.
The system function $\cmalloc$ is like C's \texttt{malloc}, but takes
this extra scope argument:
\begin{verbatim}
  void * $malloc($scope scope, int size);
\end{verbatim}
The standard C function
\begin{verbatim}
  void * malloc(int size);
\end{verbatim}
is equivalent to \verb!$malloc($root, size)!.

The system function \ct{free} is used to deallocate a heap object;
it is just like C's \texttt{free}:
\begin{verbatim}
  void free(void *p);
\end{verbatim}

% \section{Pointer types}

% Given any object type $T$ and a static scope $s$ in a CIVL-C program,
% there is a type \emph{pointer-to-$T$-in-$s$}.  The type is used to
% represent a pointer to a memory location of type $T$ in scope $s$ or a
% descendant of $s$ (i.e., some scope contained in $s$).

% If scope $s_1$ is a descendant of $s_2$ (i.e., $s_1$ is lexically
% contained in $s_2$), the type \emph{pointer-to-$T$-in-$s_1$} is a
% subtype of \emph{pointer-to-$T$-in-$s_2$}.  This means that any
% expression of the first type can be used wherever an object of the
% second type is expected.  In particular, any expression $e$ of the
% subtype can be assigned to a left-hand-side expression of the
% supertype without explicit casts; also $e$ can be used as an argument
% to a function for which the corresponding parameter has the supertype.

% The syntax for denoting this type adheres to the usual C syntax for
% denoting the type \emph{pointer-to-$T$} with the addition of a scope
% parameter within angular brackets immediately following the \texttt{*}
% token.  For example, to declare a variable \texttt{p} of type
% \emph{pointer-to-$T$-in-$s$}, one writes
% \begin{verbatim}
%   int *<s> p;
% \end{verbatim}
% If the scope modifier \texttt{<...>} is absent, the scope is taken to
% be the root scope $s_0$.  The object has type
% \emph{pointer-to-$T$-in-$s_0$}, which is abreviated as
% \emph{pointer-to-$T$}.  In this way, stanard C programs can be
% interpreted as CIVL-C programs.

% \section{Address-of operator}

% The address-of operator \texttt{\&} returns a pointer of the
% appropriate subtype using the innermost scope in which its left-hand-side
% argument is declared.  For example

% \begin{verbatim}
%   {
%     $scope s1 = $here();
%     int x;
%     double a[N];
%     int *<s1> p = &x;
%     double *<s1> q = &a[2];
%   }
% \end{verbatim}
% is correct (in particular, it is type-correct) because \texttt{\&x}
% has type \emph{pointer-to-\texttt{int}-in-\texttt{s1}}, since
% \texttt{s1} is the scope in which \texttt{x} is declared.

% Another pointer example:
% \begin{small}
% \begin{verbatim}
% { $scope s0 = $here();
%   { $scope s1 = $here();
%     double x;
%     { $scope s2 = $here();
%       double y;
%       double *<s1> p;
%       /* p can only point to something in s1 or descendant, for example, s2 */
%       p = &x; // fine
%       p = &y; // fine
%       p = (double*)$malloc(s0, 10*sizeof(double)); // static type error
%     }
%   }
% }
% \end{verbatim}
% \end{small}

% \section{Pointer addition and subtractions}

% If \texttt{e} is an expression of type \emph{pointer-to-$T$-in-$s$}
% and \texttt{i} is an expression of integer type then \texttt{e+i} also
% has type \emph{pointer-to-$T$-in-$s$}.  In other words, pointer
% addition cannot leave the scope of the original pointer.  This
% reflects the fact that every object is contained in one scope, and
% pointer addition cannot leave the object.

% Pointer subtraction is defined on two pointers of the same type, where
% ``same'' includes the scope.  That is checked statically.  As in C, it
% is only defined if the two pointers point to the same object.  In
% CIVL-C, a runtime error will be thrown if they do not point to the
% same object.

% \section{Semantics of scopes and pointer types}

% A variable of type \cscope{} is treated like any other variable.
% It becomes part of the state when the scope in which it is declared
% is instantiated to form a dynamic scope.  The variable is 
% initialized  at that time and its value cannot change.

% Each time a dynamic scope is instantiated, it is assigned a unique ID
% number.  The exactly value of the ID number is not relevant, it just
% has to be distince from any other scope ID number that currently
% exists in the state.  This is the value that is assigned to the scope
% variable.  Therefore, if a static scope contains a scope variable, and
% that scope is instantiated twice to form two distinct dynamic scopes,
% the values assigned to the two variables will be distinct.

% A pointer value is an ordered pair $\langle \delta,r \rangle$, where
% $\delta$ is a dynamic scope ID and $r$ is a reference to a memory
% location in the static scope associated to $\delta$.  (We will define
% the exact form of a reference later.)

% When a dynamic scope is instantiated, each new variable created is
% assigned a \emph{dynamic type}.  This is a refinement of the static
% type associated to the static variable.   Every dynamic type
% is an instance of exactly one static type.  The dynamic
% type of the newly instantiated variable is an instance of the
% static type of the static variable.

% The dynamic pointer types have the form
% \emph{pointer-to-$t$-in-$\delta$}, where $t$ is a dynamic type and
% $\delta$ is a dynamic scope ID.  For a program to be dynamically type
% safe, such a variable should hold only values of the form $\langle
% \delta, r\rangle$.  In particular, the variable should never be
% assigned a value where the dynamic scope component is a different
% instance of the static scope $s$ associated to $\delta$.

% \section{Pointer casts}

% If scope $s_1$ is contained in scope $s_2$, an expression of type
% \emph{pointer-to-$T$-in-$s_1$} can always be cast to
% \emph{pointer-to-$T$-in-$s_2$},
%  because the first is a subtype of the second.  (As described above,
% the cast is unnecessary.)  

% The cast in the other direction is also allowed, but the dynamic type
% safety of that cast will only be checked at runtime.  In particular, a
% runtime error will result if the cast attempts to cast the pointer
% value to a dynamic scope which does not contain (is an ancestor of)
% the dynamic scope component of the pointer value.

% A type \emph{pointer-to-$T_1$-in-$s$} can be cast to a type
% \emph{pointer-to-$T_2$-in-$s$} according to the usual rules of C.  In
% other words, usual casting rules apply as long as you don't change the
% scope.

% \section{Scope-Parameterized Functions}

% Coming soon.  (Parsed, type checked, not currently used otherwise.)

% \section{Scope-Parameterized Type Definitions}

% Coming soon. (Ditto.)

\chapter{Libraries}

Each of the following libraries is at least partially implemented and can
be included in a CIVL-C program:
\begin{itemize}
\item \ct{stdlib}
\item \ct{stdbool}
\item \ct{stdio}
\item \ct{assert}
\end{itemize}
