IR – CIVL

Context Navigation

Version 108 (modified by zmanchun, 10 years ago) ( diff )
--

The CIVL-IR language. A program in this language is also known as a "CIVL model".

Properties of the language:

the language is not intended to be written by humans; it is an intermediate form constructed by CIVL. However it should be readable to help debug things
a CIVL-IR program represents a guarded-transition system explicitly
as in CIVL-C, there are functions, scopes, and functions can be defined in any scope
all blocks (including a function definition) consist of the following elements, in this order:
- a sequence of contract clauses (for function definitions only)
- a sequence of type definitions
- a sequence of variable declarations (with no initializers)
- a sequence of function definitions
- a body, which encodes a program graph.
an array is declared without any length expression. When it is initialized it can specify length.
curly braces are used only to indicate scopes, as in { // new scope ... }
parentheses are used to indicate function invocations, as in add(x,y)
angular brackets are used to delimit tuples or sequences
square brackets are used to delimit parameters in types
unlike C, there is no "array-pointer pun". If an array a needs to be converted to a pointer, you must use addr(asub(a, 0)).
there are no automatic conversions. All conversions must be by explicit casts or other functions. Operations such as numeric addition (add) require that both operands have the exact same type.
in general, symbols can be used before they are defined, as long as they are in scope. For example, a function f can call g, even if g is defined after f in the same scope. There is no need for a "prototype". Similarly, a type definition can refer to a type defined later, or can refer to itself in its own definition.

Grammar

This is not the complete grammar, but the high-level overview.

program: typedef* vardecl* fundef+ ;
block: '{' typedef* vardecl* fundef* statement* '}' ;
statement: block | basicStmt ;
basicStmt:  (ID ':')? (simpleStmt | chooseStmt) ;
simpleStmt: (guardedStmt | primitiveStmt) ('goto' ID)? ;
guardedStmt: 'when' expr 'do' primitiveStmt ;
chooseStmt: 'begin choose' simpleStmt+ 'end choose' ;
typedef: 'type' ID '=' typeName ';' ;
vardecl: 'var' varopts? ID ':' typeName ';' ;
varopts: '[' varopt+ ']' ;
varopt: 'input' | 'output' ;
fundef:  'fun' funopts? ID '(' paramlist ')' ':' typeName conclause* block ;
funopts: '[' funopt+ ']' ;
conclause : 'assigns' expr ';' 
          | 'requires' expr ';'
          | 'ensures' expr ';'
          ...
          ;

Notes:

If goto is missing, default is the next location.
If guard missing, default is true.

Example:

fun f(u:Integer, a:Array[Real]): Integer {
  var x: Real;
  var y: Real;
  var z: Float[16,23];
 
L1 :
  begin choose
    when g1 do stmt1; goto L2;
    when g2 do stmt2; goto L3;
  end choose

  { // begin new scope
    x: Real;
L2 :
    stmt3; goto L4;
    ...
  } // end new scope
...
}

Example: the C code

{
  int x=3*y;
  int a[x+1];
}

could be represented as

{
  var x: Integer;
  var a_t : Dytype[Array[Integer]];
  var a: Array[Integer];

  ASSIGN "x", mul(3,"y");
  ASSIGN "a_t", dytype(Array[Integer, add("x",1)]);
  ASSIGN "a", new("a_t");
}

Types

The types (and their type names) are:

Bool : boolean type, values are true and false
Proc : process type
Scope : scope type
Char : character type
Bundle : type representing some un-typed chunk of data
Heap : heap type
Range : ordered set of integers
Domain : ordered set of tuples of integers
Domain[n], n is an integer at least 1; subtype of Domain in which all tuples have arity n.
Enum types.
- different from integers or like C?
The numeric types
- The integral types
  - Integer : the mathematical integers
  - Int[lo,hi,wrap]
    - lo, hi are concrete (?) integers, wrap is boolean
    - finite interval of integers [lo,hi]. If wrap is true then all operations "wrap", otherwise, any operation resulting in a value outside of the interval results in an exception being thrown.
    - Do we want to allow lo and hi to be any values of type Integer, which means they are dynamic types, like complete array types?
- The real types
  - Real : the mathematical real numbers
  - Float[e,f], e, f are concrete integers, each at least 1.
    - IEEE754 floating point numbers. This type includes NaNs, negative 0s, infinities, etc.
- Herbrand[T], for a non-Herbrand numeric type T
  - the values of this type are (unsimplified) symbolic expressions. Each numeric operator is treated as an uninterpreted function. The leaf nodes in an expression are values of type T. Two values are equal iff the two symbolic expressions are identical (using equality in T on the leaf nodes). A value of Herbrand type may therefore be thought of as representing an expression in T but where all the numerical operations have been "delayed".
Tuple[<T0, T1, ...>]: a tuple type, the Cartesian product of T0, T1, ...
- What about bit-widths?
Union[<T0, T1, ...>]: union type, the disjoint union of T0, T1, ...
Array[T] : arrays of any length whose elements belong to T
- Note: there is also a dynamic type Array[T,n], signifying arrays of length n. That is a subtype of the dynamic type of Array[T].
Function[<T0,T1,...>,T] : functions consuming T0,T1,... and returning T. T can be void to indicate nothing is returned.
Mem : type representing a memory set. May be thought of as a set of pointers.
Pointer : all pointers, a subtype of Mem
Pointer[T] : pointer-to-T, subtype of Pointer
Dytype : the set of all dynamic types
Dytype[T]: dynamic types refining T, for a static type T.
- Values of this type represent dynamic types that refine T. For example dytype(Array[Integer,24]) has static type Dytype[Array[Integer]]

Type facts:

Static types are the types assigned to variables in a program statically. A static type contains no values anywhere in the type tree. That is, there is no array length expression in the type. These are the types that are used in declarations. Each variable is declared to have some static type.

Value types (aka dynamic types) are the types associated to values. They include all the static types plus possible array lengths. A value type refines a static type if when you delete the values from the value type you get the static type.

A type name is a syntactic element that names a (static or value) type. Examples of type names include Array[Integer] and Array[Integer,24].

The expression new(t) takes a Dytype and returns the initial value for an object of that type. The initial value of Integer and other primitive (non-compound) types is "undefined". The initial value of Array[Integer] is an array of length 0 of Integer. The initial value of Pointer[Real] is the undefined pointer to Real. The initial value of Array[Real, 10] is the array of length 10 in which each element is undefined. In general, the initial value of an array of length n is the sequence of length n in which every element is the initial value of the element type of the array. The initial value of a tuple type is the tuple in which each component is assigned the initial value for its type.

Example: the C code

int n = 10;
struct S { int a[n]; };
struct S x1;
n=20;
struct S x2;

may be translated as

type S=Tuple[<Array[Integer]>];

var n: Integer;
var S_d: Dytype[S];
var x1: S;
var x2: S;

ASSIGN "n", 10;
ASSIGN "S_d", dytype(Tuple[<Array[Integer, "n"]]>);
ASSIGN "x1", new("S_d");
ASSIGN "n", 20;
ASSIGN "x2", new("S_d");

Example of self-referential data definition, a linked list:

type Node=Tuple[<Integer,Pointer[Node]>];

Expressions

In the following list of expressions, e, e0, e1, etc., are expressions. T is a type name. t is an expression of type Dytype.

Logical

true, false : literal values of type Bool
not(e) : logical not
and(e1,e2), or(e1,e2): logical and/or operation.
- These operators are short-circuiting, which matters because of exception side-effects.
implies(e1,e2): logical implication. Short-circuiting.
eq(e1,e2), neq(e1,e2): equality/inequality test
forall(<i1:T1,i2:T2,...>,e) : universal quantification. For all i1 in type T1, i2 in type T2, ..., e2 holds.
exists(<i1:T1,i2:T2,...>,e): existential quantification. There is some i1 in type T1, i2 in type T2, ..., such that e holds.

Numeric

123, -123, 3.1415, etc. : values of type Integer, Int, Real, Float. NEED TO BE MORE SPECIFIC
add(e1,e2) : numeric addition.
- e1 and e2 have the same numeric type. Note that there are no "automatic conversions" as there are in C. If the original expressions have different types, explicit conversions must be inserted.
sub(e1,e2) : subtraction
mul(e1,e2) : multiplication
div(e1,e2) : division
- If both are integral types, the result is integer division. Otherwise it is real division. Need to define what happens for negative integers.
mod(e1,e2) : integer modulus. Arguments may be Integer or any Int type. They must have the same type, which is also the type of the result.
neg(e) : negative. Argument may have any numeric type; result has same type.
lt(e1,e2), lte(e1,e2): less than/less than or equal to. Arguments must have same numeric type.
round(e,t): returns value in numerical type t that is "closest" to the given value e
- add argument for rounding mode?
- exception if out of range?
floor(e) : given a real or floating number, returns the greatest Integer less than or equal to it.
ceil(e) : given a real or floating number, returns the least Integer greater than or equal to it.
abs(e): given any numeric expression e, returns the absolute value; result is same type as e.
pow(e,n): power operator
- given any numeric expression e and expression n of any integral type, returns e to the n-th power.
- n must evaluate to a nonnegative integer.
herbrand(e): "Herbrandize" a numeric value.
- given a numeric value of type T, returns the value as type Herbrand[T], a trivial symbolic expression consisting of a single node.
eval(e) : evaluate a Herbrand expression
- given a value of type Herbrand[T], returns the value of type T obtained by evaluating all the delayed operations
- exceptions may be thrown if evaluating any of the delayed operations results in an exception

Characters and Strings

'a', 'b', ... : Char values. UNICODE?
string("abc") : string literals
- value of type Array[Char, n+1], where n is the length of the string (the last element is the character \0)

Ranges and Domains

range(e1,e2,e3) : value of type Range.
- If e3 is positive, the integers e1, e1+e3, e1+2*e3, ... that are less than or equal to e2. If e3 is negative, the integers e1, e1+e3, e1+2*e3, ... that are greater than or equal to e2. Exception if e3 is 0.
domain(<r1,...,rn>) : value of type Domain[n], the Cartesian product of the n ranges, with dictionary order
hasnext(dom, <i,j,…>): an expression of type Bool, testing if the domain dom contains any element after <i,j,...>. If i or j or ... is undefined, then it evaluates to false if dom is empty, and true if dom is not empty.

Arrays (which are also Sequences)

array(T,<e0,...,en-1>): value of type Array[T, n], a literal array
array(T,n,e): value of type Array[T,n] in which each of the n elements is e
asub(e1,e2) : array subscript expression.
- Note that e1 must have array type, not pointer type. (This is different from C.) If e1 has pointer type, use deref(padd(e1, e2)) instead.
seq_add(a,e) : array obtained by adding element e to the end of the array. Original array a is not modified.
seq_append(a1,a2) : array obtained by concatenating two arrays. Original array not modified.
seq_remove(a,i) : array obtained by removing element at position i from a. Original array a not modified.
seq_insert(a,i,x) : array obtained by inserting an element x at position i in a.
seq_length(a) : returns length of array a (an Integer)
bit_and(e1, e2), bit_or(e1, e2), bit_xor(e1, e2), bit_comp(e1) : bit-wise operations
- arguments are arrays of booleans of the same length
need conversions between bit vectors and integers

Tuples

tuple(S,<e0,e1,...>) : value of tuple type S (tuple literal)
tsel(e1,i) : tuple selection of component i of e1. i must be a literal natural number.

Unions

union_inj(U,i,x) : x is in Ti, result is in the union type U
union_sel(U,i,y) : y is in U, result is in Ti (or exception if y is not in image of injection from Ti)
union_test(U,i,y) : y is in U; determines if y is in the image under injection of Ti. Returns a Bool.

Pointers and Memory

NULL : value of type void*
deref(e) : pointer dereference
addr(e) : address-of operator
padd(e1,e2): pointer addition.
- e1 has pointer type and e2 has an integer type or range type.
- If e2 has integer type the result has pointer type. Otherwise, the result has Mem type. (BETTER TO USE A DIFFERENT FUNCTION?)
psub(e1,e2): pointer subtraction
- Both args must have same type Pointer[T] for some T.
- Result has type Integer
mem_reach(ptr), where ptr is an expression with a pointer type.
- This represents the set of all memory units reachable from ptr, including the memory unit pointed to by ptr itself.
mem_union(mem1,mem2), where mem1 and mem2 are expressions of type Mem.
- This is the union of the two memory sets.
mem_isect(mem1,mem1) : set intersection
mem_comp(mem1) : set complement (everything not in mem1)
mem_slice(a,dom)
- where a is an expression of array type and dom is an expression of Domain type.
- The dimension of the array must match the dimension of the domain. This represents all memory units which are the cells in the array indexed by a tuple in dom.

Scopes and Processes

root, here : values of type Scope
self, proc_null : values of type Proc
scopeof(e): scope of left-hand-side expression e
scope_eq(e1,e2) : equality of two dyscopes
scope_lt(e1,e2) : is first scope a proper subscope of second?
scope_lte(e1,e2): is first scope a subscope or equal to second?

Types

new(t) : new (default) value of Dytype t. If t has type Dytype[T], then this expression has static type T.
typeof(e) : returns the value of type Dytype representing the value type of e.
- If e has static type T then this expression has static type Dytype[T].
subtype(t1,t2): is t1 a subtype of t2? Both args have type Dytype. Returns a Bool
sizeof_type(t) : the size of the dynamic type t; return an Integer
cast(e,t) : casts e to a value of dytype t.
- If t has type Dytype[T], this expression has static type T and the value will have dynamic type t.
- A cast can only take place when there is no change to the underlying value in the concrete semantics. It is only for changing the type.
- One can cast from any integral type to another (as the values in all cases are integers), but there might be an exception if the value is not in the range of the targeted type.
- One can cast any Float value to a Real value, but not the other way. (To convert from a Real to a Float, use, e.g., round.)
- One cannot cast from one Float type to another Float type (unless the float parameters are exactly the same). To do this, one could instead use round.
- If t1 is a subtype of t2, one can always cast from t1 to t2 (though this is rarely necessary, since an expression of type t1 can be used wherever one of type t2 is expected). One can also cast from t2 to t1, but a runtime exception will be thrown if the value being cast does not already belong to t1. For example, one might cast from Domain to Domain[n], or from Array[Real] to Array[Real, n].
- One cannot cast between a Herbrand type and any other type. Instead, use functions eval and herbrand.

Other expressions

"x" : x is an identifier, naming either a type, variable, or function
sizeof_expr(e) : the size of the value of expression e; return an Integer
defined(e) : is e defined? Bool type
ite(e1,e2,e3): if-then-else (conditional) expression, like e1?e2:e3 in C.
- e1 must have Bool type. e2 and e3 must have the same static types, which is the static type of the result.
call(e0,<e1,...,en>) : function call
- a function invocation where e0 must evaluate to either an abstract or pure system function
choose_int(e) : nondeterministic choice of integer

The Primitive Statements

ASSERT e, "msg", ...; : assertion with message
ASSUME e;
ASSIGN e1,e2;
CALL f, <e1,...,en>; and CALL e, f, <e1,...,en>;
- call to a function which is not abstract and is not a pure system function. The first form has no left-hand-side. The second form assigns the result returned by the call to e.
SPAWN f, <e1,...,en>; and SPAWN e, f, <e1,...,en>;
WAIT e;
WAITALL e, n; : wait on all procs in a list
- e has type Pointer[Proc] and n has an integral type, the number of procs in the list
ALLOCATE e, h, t, e0;
- e has static type Pointer[T], for some static type T
- h has type Heap
- t has type Dytype[T], for the same static type T
- e0 has an integral type.
- Allocates e0 objects of type t on heap h, returning pointer to first element in e.
- WOULD IT BE BETTER TO SPECIFY THE SCOPE?
- To translate the C malloc you first need to figure out the type of the elements being malloced. If the argument to malloc is n, then you first need to insert an assertion eq(mod(n, sizeof_type(t)), 0), and then ALLOCATE e, h, t, div(n, sizeof_type(t)).
FREE p;
EVAL e; : evaluates an expression e
- useful only because e might contain exceptions (e.g., array index out of bound, division by zero)
NOOP;
- Is there a need to add annotations for "true" or "false" branch, etc.? If so, we can just make these parameters to the Noop.
RETURN; and RETURN e;
ATOMIC_ENTER;
ATOMIC_EXIT;
PARSPAWN p, d, f; : spawn a sequence of procs
- p has type Pointer[Proc], d has Domain type and f has Function type.
NEXT dom, <i,j,…>; : move to next element in domain
- updates i,j,... to be the value of the next tuple in dom after <i,j,...>
- if i, or j, ... is undefined, then i, j, ... are updated to be the value of the first tuple in dom
- The CIVL-C code $for(int i,j,...: dom) S; is translated into the following:
```
var i: Integer;
var j: Integer;
...
ASSIGN "i", new(Integer);
ASSIGN "j", new(Integer);
...
begin choose
  when(hasnext(dom, <i,j,...>)) NEXT(dom, <i,j,...>); goto L1;
  when(not(hasnext(dom, <i,j,...>))) ; goto L2;
end choose
L1:
  S; goto L2;
L2:
  ...
```

Function Definitions

The general form of a function definition:

fun[options] foo(x:Integer, ...) : Real
  // contract clauses (optional):
  requires expr;
  ensures expr ;
  mpi_requires expr ;
  mpi_ensures expr ;
  assigns expr ;
  reads expr ;
  depends actions ;
{
  // all of the following are missing for a system function...

  // types defns:
  type T=tuple[<blah, blay>];
  type U=blah;

  // variable decls:
  var x : T;
  var y : U;

  // body:
  ASSIGN x, 0;
  ...
}

Example of declaration of a system function defined in library Concurrency:

fun[lib="edu.udel.cis.vsl.civl.lib.Concurrency"] f(x: Real, b: Bool): Float[32,33] {}

Function modifiers that may be placed in the brackets after fun:

pure : the function has no side effects, but may be nondeterministic
abstract: function is a pure, mathematical function: deterministic function of inputs
atomic: function definition is atomic, and it never blocks
- ISN'T THIS TRUE OF EVERY SYSTEM FUNCTION? IN WHICH CASE, IS THIS ONLY FOR NON-SYSTEM FUNCTIONS?
lib="...": function is a system function defined in specified library
guard="...": the name of the guard function
main : this is the program entry point (global scope only)

System functions:

A function declaration which contains an option lib=... is a system function
A system function will have no body. It may have any number (including 0) of contract clauses.
If the system function is called anywhere in the program, it must be defined by providing Java code in an Enabler and Executor. Failure to do so will result in an exception.
A system function may modify only memory it can reach. This includes allocating new data on heaps it can reach. This modifiable memory can be further limit by the contract.
A system function may have a guard, which is another function taking the same types of inputs, but returning Bool.

Example of a declaration of a system function with guard.

fun g(x:Real):Bool { ... }
fun[lib="Blah",guard="g"] f(x:Real):Integer {}

Function Contracts

THIS NEEDS TO BE UPDATED

event set expressions:

EventSetExpression
 : read(MemorySetExpression)
 | write(MemorySetExpression)
 | access(MemorySetExpression)
 | calls(FunctionCallExpression)
 | \nothing
 | \everything
 | ‘(’ EventSetExpression ‘)’
 | EventSetExpression + EventSetExpression
 | EventSetExpression - EventSetExpression
 | EventSetExpression & EventSetExpression

depends clause: depends [condition] { event1, event2, ...}

Example:

depends {
  access(n) - (calls(inc(MemorySetExpression)) + calls(dec(MemorySetExpression)))
}

absence of depends clause:

assigns-or-reads clause
- assigns clause: assigns [condition] {memory-list}
- reads clause: reads [condition] {memory-list}
- reads {nothing} implies assigns {nothing}
- reads {\nothing} is equivalent to: reads {\nothing} assigns {\nothing}
- assigns {X} where X != \nothing, implies reads {X}
- assigns {X} is equivalent to: assigns{X} reads{X}
- absence:
  - absence of reads clause: there is no assumption about the read access of the function, i.e., the function could read anything
  - absence of assigns clause: similar to the absence of reads clause
- reads/assigns {\nothing} doesn’t necessarily means that the function never reads or assigns any variable. The function could still reads/assigns its “local” variables, including function parameters and any variable declared inside the function body.

For an independent function which has depends {\nothing}, usually we also need to specify reads{nothing}, for the purpose of reachability analysis.

e.g.,

/* Returns the size of the given bundle b. */
bundle_size(Bundle b; Int)
  depends {\nothing}
  reads {\nothing}
  ;

Example of a function declaration with contracts:

atomic_f sendRecv(Int cmd, Pointer buf; Int)
  depends [eq(cmd, SEND)] {write(buf)}
  depends [eq(cmd, RECV)] {access(deref(buf))}
  assigns [eq(cmd, SEND)] {\nothing}
  assigns [eq(cmd, RECV)] {deref(buf)}
  reads {deref(buf)}
{
L0:
  when (eq(cmd, SEND))
     send(deref(buf), ...); goto L1;
  when (eq(cmd, RECV))
    deref(buf):=recv(...); goto L1;
  when (and(neq(cmd, SEND), neq(cmd, RECV)))
    ; goto L1;
L1:
}

Program

A program consists of a sequence of type definitions, followed by a sequence of variable declarations, followed by a sequence of function definitions. This is called the "global scope."

The variable declarations in the global scope may use the options

input
output

There must be exactly one global function definition with option main. This function must consume no inputs and return an Integer (check this).

Semantics

Semantics issues

define every possible cast
define every possible +, etc.
define every kind of pointer value and casts between pointer types
casts between pointer and integer types?

Values

Transitions

Libraries

Which libraries require system functions? pointer, ... Should these be called part of the language, or not?

Questions

fix notation for contracts
quotes around variables in declarations? Could be awkward but might simplify parsing.
quotes around function names in definitions, prototypes? Could be awkward.
is there ever a need for a function prototype? Could variable decl notation be used instead? NO, variable decls are different, those are state variables.
should we leave our parameter names in abstract and system functions? They are not needed for anything. THEY ARE NEEDED FOR CONTRACTS, sometimes, and maybe for documentation. Consider making them optional.

Note: See TracWiki for help on using the wiki.

Download in other formats:

Plain Text