The CIVL-IR language. A program in this language is also known as a "CIVL model". Properties of the language: * the language is not intended to be written by humans; it is an intermediate form constructed by CIVL. However it should be readable to help debug things * a CIVL-IR program represents a guarded-transition system explicitly * as in CIVL-C, there are functions, scopes, and functions can be defined in any scope * all blocks (including a function definition) consist of the following elements, in this order: * a sequence of contract clauses (for function definitions only) * a sequence of type definitions * a sequence of variable declarations (with no initializers) * a sequence of function definitions * a body, which encodes a program graph. * an array is declared without any length expression. When it is initialized it can specify length. * curly braces are used only to indicate scopes, as in `{ // new scope ... }` * parentheses are used to indicate function invocations, as in add(x,y) * angular brackets are used to delimit tuples or sequences * square brackets are used to delimit parameters in types * unlike C, there is no "array-pointer pun". If an array `a` needs to be converted to a pointer, you must use `addr(asub(a, 0))`. * there are no automatic conversions. All conversions must be by explicit casts or other functions. Operations such as numeric addition (`add`) require that both operands have the exact same type. * in general, symbols can be used before they are defined, as long as they are in scope. For example, a function `f` can call `g`, even if `g` is defined after `f` in the same scope. There is no need for a "prototype". Similarly, a type definition can refer to a type defined later, or can refer to itself in its own definition. == Grammar == This is not the complete grammar, but the high-level overview. {{{ program: typedef* vardecl* fundef+ ; block: '{' typedef* vardecl* fundef* statement* '}' ; statement: block | basicStmt ; basicStmt: (ID ':')? (simpleStmt | chooseStmt | gotoStmt) ; gotoStmt: 'goto' ID ; simpleStmt: (guardedStmt | primitiveStmt) gotoStmt? ; guardedStmt: 'when' expr 'do' primitiveStmt ; chooseStmt: 'begin choose' simpleStmt+ 'end choose' ; typedef: 'type' ID '=' typeName ';' ; vardecl: 'var' varopts? ID ':' typeName ';' ; varopts: '[' varopt+ ']' ; varopt: 'input' | 'output' ; fundef: 'fun' funopts? ID '(' paramlist ')' (':' typeName)? conclause* block ; funopts: '[' funopt+ ']' ; conclause : 'assigns' expr ';' | 'requires' expr ';' | 'ensures' expr ';' ... ; }}} Notes: * If goto is missing, default is the next location. * If guard missing, default is `true`. Example: {{{ fun f(u:Integer, a:Array[Real]): Integer { var x: Real; var y: Real; var z: Float[16,23]; L1 : begin choose when g1 do stmt1; goto L2; when g2 do stmt2; goto L3; end choose { // begin new scope x: Real; L2 : stmt3; goto L4; ... } // end new scope ... } }}} Example: the C code {{{ { int x=3*y; int a[x+1]; } }}} could be represented as {{{ { var x: Integer; var a: Array[Integer]; ASSIGN "x", mul(3,"y"); ASSIGN "a", new(dytype(Array[Integer, add("x",1)])); } }}} == Types == The types (and their type names) are: * `Bool` : boolean type, values are `true` and `false` * `Proc` : process type * `Scope` : scope type * `Char` : character type * `Bundle` : type representing some un-typed chunk of data * `Heap` : heap type * `Range` : ordered set of integers * `Domain` : ordered set of tuples of integers * `Domain[n]`, n is an integer at least 1; subtype of `Domain` in which all tuples have arity n. * `Enum` types. * different from integers or like C * The **numeric types** * The **integral types** * `Integer` : the mathematical integers * `Int[lo,hi,wrap]` * `lo`, `hi` are concrete integers, `wrap` is boolean * finite interval of integers `[lo,hi]`. If `wrap` is true then all operations "wrap", otherwise, any operation resulting in a value outside of the interval results in an exception being thrown. * `lo` and `hi` can be any values of type `Integer`, so they are dynamic types, like complete array types * The real types * `Real` : the mathematical real numbers * `Float[e,f]`, `e`, `f` are concrete integers, each at least 1. * IEEE754 floating point numbers. This type includes `NaNs`, negative 0s, infinities, etc. * Herbrand[T], for a non-Herbrand numeric type `T` * the values of this type are (unsimplified) symbolic expressions. Each numeric operator is treated as an uninterpreted function. The leaf nodes in an expression are values of type `T`. Two values are equal iff the two symbolic expressions are identical (using equality in `T` on the leaf nodes). A value of Herbrand type may therefore be thought of as representing an expression in `T` but where all the numerical operations have been "delayed". * `Tuple[]`: a tuple type, the Cartesian product of `T0`, `T1`, ... * **What about bit-widths?** * `Union[]`: union type, the disjoint union of `T0`, `T1`, ... * `Array[T]` : arrays of any length whose elements belong to T * Note: there is also a dynamic type `Array[T,n]`, signifying arrays of length n. That is a subtype of the dynamic type of `Array[T]`. * `Function[,T]` : functions consuming `T0`,`T1`,... and returning `T`. * `T` can be absent. `Function[]` means the function returns nothing. * `Mem` : type representing a memory set. May be thought of as a set of pointers. * `Pointer` : all pointers, a subtype of `Mem` * `Pointer[T]` : pointer-to-`T`, subtype of `Pointer` * `Dytype` : the set of all dynamic types * `Dytype[T]`: dynamic types refining `T`, for a static type `T`. * Values of this type represent dynamic types that refine `T`. For example `dytype(Array[Integer,24])` has static type `Dytype[Array[Integer]]` Type facts: **Static types** are the types assigned to variables in a program statically. A static type contains no values anywhere in the type tree. That is, there is no array length expression in the type. These are the types that are used in declarations. Each variable is declared to have some static type. **Value types** (aka **dynamic types**) are the types associated to values. They include all the static types plus possible array lengths. A value type refines a static type if when you delete the values from the value type you get the static type. A **type name** is a syntactic element that names a (static or value) type. Examples of type names include `Array[Integer]` and `Array[Integer,24]`. The expression `new(t)` takes a Dytype and returns an unconstrained value of that type. Example: the C code {{{ int n = 10; struct S { int a[n]; }; struct S x1; n=20; struct S x2; }}} may be translated as {{{ type S=Tuple[]; var n: Integer; var S_d: Dytype[S]; var x1: S; var x2: S; ASSIGN "n", 10; ASSIGN "S_d", dytype(Tuple[); ASSIGN "x1", new("S_d"); ASSIGN "n", 20; ASSIGN "x2", new("S_d"); }}} Example of self-referential data definition, a linked list: {{{ type Node=Tuple[]; }}} == Expressions == In the following list of expressions, `e`, `e0`, `e1`, etc., are expressions. `T` is a type name. `t` is an expression of type `Dytype`. Logical * `true`, `false` : literal values of type `Bool` * `not(e)` : logical not * `and(e1,e2)`, `or(e1,e2)`: logical and/or operation. * These operators are short-circuiting, which matters because of exception side-effects. * `implies(e1,e2)`: logical implication. Short-circuiting. * `eq(e1,e2)`, `neq(e1,e2)`: equality/inequality test * `forall(,e)` : universal quantification. For all i1 in type T1, i2 in type T2, ..., e2 holds. * `exists(,e)`: existential quantification. There is some i1 in type T1, i2 in type T2, ..., such that e holds. Numeric * `-? digit+`: values of types `Integer` and `Int`: `[+|-]? digit+`, e.g., 123, -123 * `-? digit+ . digit+`: values of types `Real` and `Float`, e.g., 3.1415, -0.9875 * `add(e1,e2)` : numeric addition. * `e1` and `e2` have the same numeric type. Note that there are no "automatic conversions" as there are in C. If the original expressions have different types, explicit conversions must be inserted. * `sub(e1,e2)` : subtraction * `mul(e1,e2)` : multiplication * `div(e1,e2)` : division * If both are integral types, the result is integer division. Otherwise it is real division. Need to define what happens for negative integers. * `mod(e1,e2)` : integer modulus. Arguments may be `Integer` or any `Int` type. They must have the same type, which is also the type of the result. * `neg(e)` : negative. Argument may have any numeric type; result has same type. * `lt(e1,e2)`, `lte(e1,e2)`: less than/less than or equal to. Arguments must have same numeric type. * `round(e,t)`: returns value in numerical type `t` that is "closest" to the given value `e` * add argument for rounding mode? * exception if out of range? * `floor(e)` : given a real or floating number, returns the greatest `Integer` less than or equal to it. * `ceil(e)` : given a real or floating number, returns the least `Integer` greater than or equal to it. * `abs(e)`: given any numeric expression e, returns the absolute value; result is same type as `e`. * `pow(e,n)`: power operator * given any numeric expression e and expression `n` of any integral type, returns e to the n-th power. * `n` must evaluate to a nonnegative integer. * `herbrand(e)`: "Herbrandize" a numeric value. * given a numeric value of type `T`, returns the value as type `Herbrand[T]`, a trivial symbolic expression consisting of a single node. * `eval(e)` : evaluate a Herbrand expression * given a value of type `Herbrand[T]`, returns the value of type `T` obtained by evaluating all the delayed operations * exceptions may be thrown if evaluating any of the delayed operations results in an exception Characters and Strings * 'a', 'b', ... : Char values. * `\u00ea`, `\u00f1`, ...: Unicodes * `string("abc")` : string literals * value of type `Array[Char, n+1]`, where `n` is the length of the string (the last element is the character `\0`) Ranges and Domains * `range(e1,e2,e3)` : value of type `Range`. * If `e3` is positive, the integers `e1`, `e1`+`e3`, `e1`+2*`e3`, ... that are less than or equal to `e2`. If `e3` is negative, the integers `e1`, `e1`+`e3`, `e1`+2*`e3`, ... that are greater than or equal to `e2`. Exception if `e3` is 0. * `domain()` : value of type `Domain[n]`, the Cartesian product of the n ranges, with dictionary order Arrays (which are also Sequences) * `array(T,)`: value of type `Array[T, n]`, a literal array * `array(T,n,e)`: value of type `Array[T,n]` in which each of the n elements is `e` * `asub(e1,e2)` : array subscript expression. * Note that `e1` must have array type, not pointer type. (This is different from C.) If `e1` has pointer type, use `deref(padd(e1, e2))` instead. * `seq_add(a,e)` : array obtained by adding element e to the end of the array. Original array a is not modified. * `seq_append(a1,a2)` : array obtained by concatenating two arrays. Original array not modified. * `seq_remove(a,i)` : array obtained by removing element at position i from a. Original array a not modified. * `seq_insert(a,i,x)` : array obtained by inserting an element x at position i in a. * `seq_length(a)` : returns length of array a (an `Integer`) * `bitv_and(e1, e2)`, `bitv_or(e1, e2)`, `bitv_xor(e1, e2)`, `bitv_comp(e1)` : bit-wise operations * arguments are arrays of booleans of the same length * `bitv_shiftl(a,n)` : left shift of a bit vector * `a` is a bit vector, i.e., array of boolean * `n` has integral type and is nonnegative * exception if `n` is greater than the length of `a` * new spaces created are filled in with `false` * `bitv_shiftr(a,n)` : right shift of a bit vector * `int_to_bitv(len, val)` : returns a bit-vector (array of boolean) of length `len` from `val` * `val` must be a non-negative integer value from an integral type * `len` must be large enough to contain the base-2 expansion of `val` * `bitv_to_int(a,t)` : converts a bit vector to an integer * `a` is a bit vector, i.e., an array of boolean * `t` has type `Dytype` and evaluates to an integral type large enough to hold the result of converting `a` to a non-negative integer Tuples * `tuple(S,)` : value of tuple type `S` (tuple literal) * `tsel(e1,i)` : tuple selection of component i of e1. i must be a literal natural number. Unions * `union_inj(U,i,x)` : `x` is in `Ti`, result is in the union type `U` * `union_sel(U,i,y)` : `y` is in `U`, result is in `Ti` (or exception if `y` is not in image of injection from `Ti`) * `union_test(U,i,y)` : `y` is in `U`; determines if `y` is in the image under injection of `Ti`. Returns a `Bool`. Pointers and Memory * `null` : value of type `Pointer` * `deref(e)` : pointer dereference * `addr(e)` : address-of operator * `padd(e1,e2)`: pointer addition. * `e1` has pointer type and `e2` has an integer type or range type. * If `e2` has integer type the result has pointer type. Otherwise, the result has `Mem` type. (**BETTER TO USE A DIFFERENT FUNCTION?**) * `psub(e1,e2)`: pointer subtraction * Both args must have same type `Pointer[T]` for some `T`. * Result has type `Integer` * `mem_reach(ptr)`, where `ptr` is an expression with a pointer type. * This represents the set of all memory units reachable from `ptr`, including the memory unit pointed to by `ptr` itself. * `mem_union(mem1,mem2)`, where `mem1` and `mem2` are expressions of type `Mem`. * This is the union of the two memory sets. * `mem_isect(mem1,mem1)` : set intersection * `mem_comp(mem1)` : set complement (everything not in `mem1`) * `mem_slice(a,dom)` * where `a` is an expression of array type and `dom` is an expression of `Domain` type. * The dimension of the array must match the dimension of the domain. This represents all memory units which are the cells in the array indexed by a tuple in `dom`. Scopes and Processes * `root`, `here`, `scope_null` : values of type `Scope` * `self`, `proc_null` : values of type `Proc` * `scopeof(e)`: scope of left-hand-side expression `e` * `scope_eq(e1,e2)` : equality of two dyscopes * `scope_lt(e1,e2)` : is first scope a proper subscope of second? * `scope_lte(e1,e2)`: is first scope a subscope or equal to second? Types * `new(t)` : new unconstrained value of `Dytype` t. If `t` has type `Dytype[T]`, then this expression has static type `T`. * `typeof(e)` : returns the value of type `Dytype` representing the value type of `e`. * If `e` has static type `T` then this expression has static type `Dytype[T]`. * `subtype(t1,t2)`: is `t1` a subtype of `t2`? Both args have type `Dytype`. Returns a `Bool` * `sizeof_type(t)` : the size of the dynamic type `t`; return an `Integer` * `cast(e,t)` : casts `e` to a value of dytype `t`. * If `t` has type `Dytype[T]`, this expression has static type `T` and the value will have dynamic type `t`. * A cast can only take place when there is no change to the underlying value in the concrete semantics. It is only for changing the type. * One can cast from any integral type to another (as the values in all cases are integers), but there might be an exception if the value is not in the range of the targeted type. * One can cast any `Float` value to a `Real` value, but not the other way. (To convert from a `Real` to a `Float`, use, e.g., `round`.) * One cannot cast from one `Float` type to another `Float` type (unless the float parameters are exactly the same). To do this, one could instead use `round`. * If `t1` is a subtype of `t2`, one can always cast from `t1` to `t2` (though this is rarely necessary, since an expression of type `t1` can be used wherever one of type `t2` is expected). One can also cast from `t2` to `t1`, but a runtime exception will be thrown if the value being cast does not already belong to `t1`. For example, one might cast from `Domain` to `Domain[n]`, or from `Array[Real]` to `Array[Real, n]`. * One cannot cast between a Herbrand type and any other type. Instead, use functions `eval` and `herbrand`. Other expressions * `"x"` : x is an identifier, naming either a type, variable, or function * `sizeof_expr(e)` : the size of the value of expression `e`; return an `Integer` * `ite(e1,e2,e3)`: if-then-else (conditional) expression, like `e1?e2:e3` in C. * `e1` must have `Bool` type. `e2` and `e3` must have the same static types, which is the static type of the result. * `invoke(e0,)` : function call * a function invocation where `e0` must evaluate to either an abstract or pure system function == The Primitive Statements == * `ASSERT e, "msg", ...;` : assertion with message * `ASSUME e;` * `ASSIGN e1,e2;` * `CALL f, ;` and `CALL e, f, ;` * call to a function which is not abstract and is not a pure system function. The first form has no left-hand-side. The second form assigns the result returned by the call to `e`. * `SPAWN f, ;` and `SPAWN e, f, ;` * `WAIT e;` * `WAITALL e, n;` : wait on all procs in a list * `e` has type `Pointer[Proc]` and `n` has an integral type, the number of procs in the list * `ALLOCATE e, s, t, e0;` * `e` has static type `Pointer[T]`, for some static type `T` * `s` has type `Scope` * `t` has type `Dytype[T]`, for the same static type `T` * `e0` has an integral type. * Allocates `e0` objects of type `t` on the heap of `s`, returning pointer to first element in `e`. * To translate the C `malloc` you first need to figure out the type of the elements being malloced. If the argument to malloc is `n`, then you first need to insert an assertion `eq(mod(n, sizeof_type(t)), 0)`, and then `ALLOCATE e, h, t, div(n, sizeof_type(t))`. * `FREE p;` * `EVAL e;` : evaluates an expression `e` * useful only because `e` might contain exceptions (e.g., array index out of bound, division by zero) * `NOOP;` or `NOOP LABEL;` * a no-op can take an optional label, which is one of the following: * TRUE_BRANCH_IF: to enter the true branch of a CIVL-C if-else statement; * FALSE_BRANCH_IF: to enter the false branch of a CIVL-C if-else statement; * LOOP_BODY_ENTER: to enter the body of a loop; * LOOP_BODY_EXIT: to exit from the body of a loop. * `RETURN;` and `RETURN e;` * `ATOMIC_ENTER;` * `ATOMIC_EXIT;` * `PARSPAWN p, d, f;` : spawn a sequence of procs, one for each element in a domain * `p` has type `Pointer[Proc]`, `d` has `Domain` type and `f` has `Function` type. * the new procs are entered into `p[0]`, `p[1]`, ..., `p[d-1]`, where `d` is the size of the domain. Hence `p` must point into an array of `Proc` big enough to hold `d` elements. * the function `f` must have the following signature: it consumes `n` `Integer`s, where `n` is the dimension of the `Domain` `d`. It returns nothing. * `NEXT , dom, flag;` : move to next element in domain * `i`,`j`,... have integral types * `dom` has type `Domain[n]`, where `n` is the number of variables in the sequence `i`,`j`,... * `flag` has type `Bool` * if `flag` is `false` in the pre-state, this indicates you want the first element of the domain * if `flag` is `true` in pre-state, this indicates you want the next element after current `` * `flag`=`false` in post-state indicates there is no next element * `flag`=`true` in post-state indicates the next element is in `` Notes on translation: The CIVL-C code `$parfor(int i,j: dom) S;` is translated into the following: {{{ var p:Array[Proc]; fun f(i: Integer, j: Integer) { S; } PARSPAWN addr(asub(p,0)), dom, f; WAITALL addr(asub(p,0)), domsize(dom); }}} The CIVL-C code `$for(int i,j: dom) S;` is translated into the following: {{{ { var i:Integer; var j:Integer; var flag:Bool; flag=false; L0: NEXT , dom, flag; when not(flag) do NOOP; goto L1; S; goto L0; } L1: }}} == Function Definitions == The general form of a function definition: {{{ fun[options] foo(x:Integer, ...) : Real // contract clauses (optional): Requires expr; Ensures expr ; MPI_requires expr ; MPI_ensures expr ; Behavior name: Assigns expr ; Reads expr ; Depends actions ; { // all of the following are missing for a system function... // types defns: type T=tuple[]; type U=blah; // variable decls: var x : T; var y : U; // body: ASSIGN x, 0; ... } }}} To specify a function returning nothing, simply leave out the return type part. For example: `fun foo(x:Intger, ...) {...}`. Example of declaration of a system function defined in library Concurrency: {{{ fun[package="edu.udel.cis.vsl.civl.lib", lib="Concurrency"] f(x: Real, b: Bool): Float[32,33] {} }}} Function modifiers that may be placed in the brackets after `fun`: * `state_f` : the value returned by the function is a deterministic function of the inputs (so cannot depend on the state in any way), and furthermore the function has no side-effects. So this is just like a mathematical function. (This is intended to be used for system functions, but can it also be used for non-system functions?) * `abstract`: an uninterpreted pure function. The result of a function call on such a function is a symbolic expression "`f(arg1, ...)`". There is no definition for such a function, not even a system definition. * `atomic`: function definition is atomic, and it never blocks. A function call to such a function occurs in a single atomic transition. It is slightly different than a function in which the entire body is atomic, because in the latter case the call and return happen as two separate transitions. * Since every system function is atomic, this option is only useful for non-system functions. * `package="..."`: the package where a system function is implemented * `lib="..."`: function is a system function defined in specified library * `guard="...":` the name of the guard function * `main` : this is the program entry point (global scope only) System functions: * A function declaration which contains an option `lib=...` is a system function * A system function will have no body. It may have any number (including 0) of contract clauses. * If the system function is called anywhere in the program, it must be defined by providing Java code in an Enabler and Executor. Failure to do so will result in an exception. * A system function may modify only memory it can reach. This includes allocating new data on heaps it can reach. This modifiable memory can be further limited by the contract. * A system function may have a guard, which is another function taking the same types of inputs, but returning `Bool`. Example of a declaration of a system function with guard. {{{ fun g(x:Real):Bool { ... } fun[lib="Blah",guard="g"] f(x:Real):Integer {} }}} Function Contracts: * [wiki:Function_Contracts] == Program == A program consists of a sequence of type definitions, followed by a sequence of variable declarations, followed by a sequence of function definitions. This is called the "global scope." The variable declarations in the global scope may use the options * `input` * `output` There must be exactly one global function definition with option `main`. This function must consume no inputs and return an `Integer` (check this). == Semantics == Semantics issues * define every possible cast * define every possible +, etc. * define every kind of pointer value and casts between pointer types * casts between pointer and integer types? === Values === === Transitions === == Libraries == [wiki:Library_IR] Which libraries require system functions? pointer, ... Should these be called part of the language, or not? == Questions == * fix notation for contracts * quotes around variables in declarations? Could be awkward but might simplify parsing. * quotes around function names in definitions, prototypes? Could be awkward. * is there ever a need for a function prototype? Could variable decl notation be used instead? NO, variable decls are different, those are state variables. * should we leave our parameter names in abstract and system functions? They are not needed for anything. THEY ARE NEEDED FOR CONTRACTS, sometimes, and maybe for documentation. Consider making them optional. * need a way to have contracts, including POR contracts, as part of the language. * change the syntax so it is a subset of CIVL-C