The CIVL-IR language.  A program in this language is also known as a "CIVL model".

Properties of the language:
 * the language is not intended to be written by humans; it is an intermediate form constructed by CIVL.  However, it should be readable, to help with debugging
 * a CIVL-IR program represents a guarded-transition system explicitly
 * as in CIVL-C, there are functions and scopes, and functions can be defined in any scope
 * all blocks (including a function body) consist of the following elements, in this order:
   * a sequence of type definitions
   * a sequence of variable declarations with no initializers
   * a sequence of function definitions
   * a sequence of labeled statements.  Each clause in a labeled statement is a `\when` statement with some guard and a primitive statement, followed by a `\goto` statement
 * an array is declared without any length expression.  When it is initialized, it can specify a length.
 * curly braces are used only to indicate scopes, as in `{ // new scope ... }`
 * parentheses are used to indicate function invocations, as in `\add(x,y)`
 * angular brackets are used to delimit tuples or sequences
 * square brackets are used to delimit parameters in types

Example:
{{{
f(u:Integer, a:Array[Real]): Integer {
  x: Real;
  y: Real;
  z: Float[16,23];
L1 :
  when (g1) stmt1; goto L2;
  when (g2) stmt2; goto L3;
  { // begin new scope
    x: Real;
  L2 :
    when (g3) stmt3; goto L4;
    ...
  } // end new scope
  ...
} // etc.
}}}

Example:
{{{
{
  int x=3*y;
  int a[x+1];
}
}}}
translates to:
{{{
{
  x: Integer;
  a_t : Dytype[Array[Integer]];
  a: Array[Integer];
L1: when (\true) assign x, \mul(3,y); goto L2;
L2: when (\true) assign a_t, \dytype(Array[Integer, \add(x,1)]); goto L3;
L3: when (\true) assign a, \new(a_t); goto L4;
L4:
}
}}}

== Types ==

The types are:
 * `Bool` : boolean type, values are `\true` and `\false`
 * `Proc` : process type
 * `Scope` : scope type
 * `Char` : character type
 * `Bundle` : type representing some un-typed chunk of data
 * `Heap` : heap type
 * `Range` : ordered set of integers
 * `Domain` : ordered set of tuples of integers
   * `Domain[n]`, where n is an integer at least 1: subtype of `Domain` in which all tuples have arity n.
 * `Enum` types.
   * **different from integers or like C?**
 * `Integer` : the mathematical integers
 * `Int[lo,hi,wrap]`
   * `lo`, `hi` are integers, `wrap` is boolean
   * finite interval of integers `[lo,hi]`.  If `wrap` is true then all operations "wrap"; otherwise, any operation whose result lies outside the interval causes an exception to be thrown.
   * **Do we want to allow `lo` and `hi` to be any values of type `Integer`, which means they are dynamic types, like complete array types?**
 * `HerbrandInt` : Herbrand integers.  Values are unsimplified symbolic expressions.
 * `Real` : the mathematical real numbers
 * `Float[e,f]`, where e and f are integers, each at least 1.  **Same question for e and f as for lo and hi.**
   * IEEE754 floating point numbers
 * `HerbrandReal` : Herbrand real numbers.  Values are unsimplified symbolic expressions.
 * `Tuple[<T0,T1,...>]`: a tuple type, the Cartesian product of T0, T1, ...
   * **What about bit-widths?**
 * `Union[<T0,T1,...>]`: union type, the disjoint union of T0, T1, ...
 * `Array[T]` : arrays of any length whose elements belong to T
 * `Function[<T0,T1,...>,T]` : functions consuming T0,T1,... and returning T.  T can be `void` to indicate nothing is returned.
 * `Mem` : type representing a memory set.  May be thought of as a set of pointers.
 * `Pointer` : all pointers, a subtype of `Mem`
   * `Pointer[T]` : pointer-to-T, subtype of `Pointer`
 * `Dytype` : the set of all dynamic types
   * `Dytype[T]`: dynamic types refining T.  Values of this type represent dynamic types that refine T.  For example `\dytype(Array[Integer,24])` has type `Dytype[Array[Integer]]`

Type facts:

**Static types** are the types assigned to variables in a program statically.  A static type contains no values anywhere in the type tree.  That is, there is no array length expression in the type.  These are the types that are used in declarations.  Each variable is declared to have some static type.

**Value types** are the types associated to values.  They include all the static types plus possible length expressions.  A value type refines a static type if when you delete the values from the value type you get the static type.

A **type name** is a syntactic element that names a (static or value) type.  Examples of type names include `Array[Integer]` and `Array[Integer,24]`.

The expression `\new(t)` takes a Dytype and returns the initial value for an object of that type.  The initial value of `Integer` and other primitive (non-compound) types is "undefined".  The initial value of `Array[Integer]` is an array of length 0 of `Integer`.  The initial value of `Pointer[Real]` is the undefined pointer to `Real`.  The initial value of `Array[Real, 10]` is the array of length 10 in which each element is undefined.  In general, the initial value of an array of length n is the sequence of length n in which every element is the initial value of the element type of the array.  The initial value of a tuple type is the tuple in which each component is assigned the initial value for its type.
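The rules for `\new(t)` above are recursive, so a small executable model may help when checking them. The following is only a sketch in Python, not part of CIVL: the class names `Prim`, `PointerT`, `ArrayT`, `TupleT` and the `UNDEFINED` marker are hypothetical, and the undefined pointer is simplified to the same marker used for undefined primitive values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


class Undefined:
    """Hypothetical stand-in for the "undefined" initial value."""

    def __repr__(self) -> str:
        return "undefined"


UNDEFINED = Undefined()


@dataclass(frozen=True)
class Prim:
    """A primitive (non-compound) type such as Integer or Real."""
    name: str


@dataclass(frozen=True)
class PointerT:
    """Models Pointer[T]."""
    target: "Type"


@dataclass(frozen=True)
class ArrayT:
    """Models Array[T] when length is None, and Array[T, n] otherwise."""
    elem: "Type"
    length: Optional[int] = None


@dataclass(frozen=True)
class TupleT:
    """Models a tuple type with the given component types."""
    components: Tuple["Type", ...]


Type = Union[Prim, PointerT, ArrayT, TupleT]


def new(t: Type):
    """Return the initial value for an object of type t, following the text above."""
    if isinstance(t, (Prim, PointerT)):
        # Primitive types and pointers start out undefined (simplification:
        # the undefined pointer is not distinguished from other undefined values).
        return UNDEFINED
    if isinstance(t, ArrayT):
        # An array type without a length yields an array of length 0.
        n = 0 if t.length is None else t.length
        return [new(t.elem) for _ in range(n)]
    if isinstance(t, TupleT):
        # Each component gets the initial value for its own type.
        return tuple(new(c) for c in t.components)
    raise TypeError(f"unsupported type: {t!r}")


print(new(ArrayT(Prim("Integer"))))    # prints []
print(new(ArrayT(Prim("Real"), 3)))    # prints [undefined, undefined, undefined]
```

The recursion in `new` mirrors the general rule: an array of length n is built from n copies of the element type's initial value, and a tuple from the initial values of its components.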
Example: the C code
{{{
int n = 10;
struct S { int a[n]; };
struct S x1;
n=20;
struct S x2;
}}}
may be translated as
{{{
typedef S=Tuple[];
n: Integer;
S_d: Dytype[S];
x1: S;
x2: S;
L0: when (\true) assign n, 10; goto L1;
L1: when (\true) assign S_d, \dytype(Tuple[); goto L2;
L2: when (\true) assign x1, \new(S_d); goto L3;
L3: when (\true) assign n, 20; goto L4;
L4: when (\true) assign x2, \new(S_d); goto L5;
L5:
}}}

== Expressions ==

In the following list of expressions, `e`, `e0`, `e1`, etc., are expressions.  `T` is a type name.  `t` is an expression of type `Dytype`.

 * Logical
   * `\true`, `\false` : literal values of type `Bool`
   * `\not(e)` : logical not
   * `\and(e1, e2)`, `\or(e1, e2)`: logical and/or operation
   * `\eq(e1, e2)`, `\neq(e1, e2)`: equality/inequality test
   * `\forall , e1, e2` : universal quantification.  For all i1 in type T1, i2 in type T2, ...: if e1 holds, then e2 holds.  The only reason for having two expressions e1 and e2 is possible side-effects (exceptions) in e2 if e1 does not hold.  For example, e1 can be x!=0, and e2 can then safely divide by x.
   * `\exists , e1, e2`: existential quantification.  There is some i1 in type T1, i2 in type T2, ..., such that e1 holds and e2 holds.
 * Numeric
   * 123, -123, 3.1415, etc. : values of type `Integer`, `Int`, `Real`, `Float`.  **NEED TO BE MORE SPECIFIC**
   * `\add(e1, e2)` : numeric addition.
     * `e1` and `e2` have the same numeric type.  Note that there are no "automatic conversions" as there are in C.  If the original expressions have different types, explicit casts must be inserted.
   * `\sub(e1, e2)` : subtraction
   * `\mul(e1, e2)` : multiplication
   * `\div(e1, e2)` : division
     * If both are integer types, the result is integer division.  Otherwise it is real division.  Need to define what happens for negative integers.
   * `\mod(e1, e2)` : integer modulus
   * `\neg(e)` : numeric negation
   * `\lt(e1, e2)`, `\lte(e1, e2)`: less than/less than or equal to
 * Characters and Strings
   * 'a', 'b', ... : Char values.
     **UNICODE?**
   * `"abc"` : string literals: value of type `Array[Char, n+1]`, where n is the length of the string (the last element is the character `\0`)
 * Ranges and Domains
   * `\range(e1,e2,e3)` : value of type `Range` consisting of the integers e1, e1+e3, e1+2*e3, ... that are less than or equal to e2.
   * `\domain()` : value of type `Domain[n]`, the ordered Cartesian product of the n ranges (dictionary order)
   * `\hasnext(dom, )`: an expression of boolean type, testing if the domain `dom` contains any element after ``
 * Arrays
   * `\array(T, )`: value of type `Array[T, n]`, a literal array
   * `\array(T, n, e)`: value of type `Array[T,n]` in which each of the n elements is `e`
   * `\asub(e1, e2)` : array subscript expression.  Note that `e1` must have array type, not pointer type.  (This is different from C.)  If `e1` has pointer type, use `\deref(\padd(e1, e2))` instead.
   * `\aslice(a, dom)`, where `a` is an expression of array type and `dom` is an expression of `Domain` type.  The dimension of the array must match the dimension of the domain.  This represents all memory units which are the cells in the array indexed by a tuple in `dom`.
   * `\bit_and(e1, e2)`, `\bit_or(e1, e2)`, `\bit_xor(e1, e2)`, `\bit_comp(e1)` : bit-wise operations: arguments are arrays of booleans
 * Tuples
   * `\tuple(S, )` : value of tuple type `S` (tuple literal)
   * `\tsel(e1, i)` : tuple selection of component i of e1.  i must be a literal natural number.
 * Pointers and Memory
   * `NULL` : value of type `void*`
   * `\deref(e)` : pointer dereference
   * `\addr(e)` : address-of operator
   * `\padd(e1, e2)`: pointer addition.  `e1` has pointer type and `e2` has an integer type or range type.  If `e2` has integer type the result has pointer type.  Otherwise, the result has `Mem` type.
   * `\psub(e1,e2)`: pointer subtraction
   * `\region(ptr)`, where `ptr` is an expression with a pointer type.  This represents the set of all memory units reachable from `ptr`, including the memory unit pointed to by `ptr` itself.
   * `\mem_union(mem1, mem2)`, where `mem1` and `mem2` are expressions of type `Mem`.  This is the union of the two memory sets.
   * `\mem_intersect(mem1, mem2)` : set intersection
   * `\mem_comp(mem1)` : set complement (everything not in `mem1`)
 * Scopes and Processes
   * `\root`, `\here` : values of type `Scope`
   * `\self`, `\proc_null` : values of type `Proc`
 * Other
   * variables
   * `\sizeof_type(t)` : the size of the dynamic type t; `Integer` type
   * `\sizeof_expr(e)` : the size of the value of expression `e`; `Integer` type
   * `\new(t)` : new (default) value of `Dytype` t
   * `\defined(e)` : is `e` defined?  `Bool` type
   * `\cast(e, T)` : casts `e` to a value of the named type
     * need to list all of the legal casts and what they mean exactly
     * cast of integer to array-of-boolean, and vice-versa?
     * **Instead of casts would it be better to have explicit functions for each legal kind of cast?**
   * `\ite(e1, e2, e3)`: if-then-else (conditional) expression, equivalent to `e1?e2:e3` in C.
   * `e0(e1,...,en)` : a function invocation where `e0` must evaluate to either an abstract or pure system function

Notes
 * unlike C, there is no "array-pointer pun".  If an array `a` needs to be converted to a pointer, you must use `\addr(\asub(a, 0))`.

== The Primitive Statements ==

 * Assign: `assign e1,e2;`
 * Call: `call f, ;` and `\call e, f, ;`
   * call to a function which is not abstract and is not a pure system function
 * Spawn: `\spawn f, (e1,...,en);` and `\spawn e, f, (e1,...,en);`
 * Wait: `\wait e;`
 * Waitall: `\waitall e, n` where `e` is a pointer to a process reference and `n` is the number of processes to be waited for
 * Allocation
   * `\allocate e, h, t, e0;`, where
     * `e` has type `Pointer`
     * `h` has type `Heap`
     * `t` has type `Dytype`
     * `e0` has integer type.
   * Allocates `e0` objects of type `t` on heap `h`
   * To translate the C `malloc` you first need to figure out the type of the elements being malloced.
     If the argument to malloc is `n`, then you first need to insert an assertion `\eq(\mod(n, \sizeof_type(t)), 0)`, and then `\allocate e, h, t, \div(n, \sizeof_type(t))`.
 * Free: `\free p;`
 * Expression statement: `e;`, where `e` is side-effect free except that its evaluation might raise an error/exception (e.g., array index out of bounds, division by zero)
 * Noop: `;`
   * **Is there a need to add annotations for "true" or "false" branch, etc.?**  If so, we can just make these parameters to the Noop.
 * Return: `\return;` and `\return e;`
 * Atomic_enter: `\atomic_enter;`
 * Atomic_exit: `\atomic_exit;`
 * Parfor_spawn: `\parspawn p, d, f;` where `p` is a pointer to a process reference, `d` has `Domain` type and `f` has `Function` type.
 * Domain iterator: `\next dom, ` updates `i`, `j`, ... to be the value of the integer tuple in `dom` after ``
 * For_dom_enter (for domains): `\for_enter dom;`

== Declarations and Function Definitions ==

Function prototypes are considered to be declarations similar to variable declarations.

Example of declaration of a function:
{{{
f(x: Real, b: Bool): Float[32,33];
}}}

Additional modifiers that may be placed on any of the above:
 * `\pure` : the function has no side effects, but may be nondeterministic
 * `\abstract`: function is a pure, mathematical function: deterministic function of inputs
 * `\atomic_f`: function definition is atomic, and it never blocks

System functions:
 * A function declaration which is not abstract and for which no definition is provided is a system function.
 * If the system function is called anywhere in the program, it must be defined by providing Java code in an Enabler and Executor.  Failure to do so will result in an exception.
 * A system function may modify any memory it can reach.  This includes allocating new data on heaps it can reach.
 * A system function may have a guard.

Example of a declaration of a system function with guard.
{{{
g(Real x, Bool y; Bool) {
  ...
}
f(Real x, Bool y; Integer) \guard {g};
}}}

== Function Contracts ==

 * event set expressions:
{{{
EventSetExpression :
    \read(MemorySetExpression)
  | \write(MemorySetExpression)
  | \access(MemorySetExpression)
  | \calls(FunctionCallExpression)
  | \nothing
  | \everything
  | ‘(’ EventSetExpression ‘)’
  | EventSetExpression + EventSetExpression
  | EventSetExpression - EventSetExpression
  | EventSetExpression & EventSetExpression
}}}
 * depends clause: `\depends [condition] { event1, event2, ...}`
   * Example:
{{{
\depends { \access(n) - (\calls(inc(MemorySetExpression)) + \calls(dec(MemorySetExpression))) }
}}}
   * **absence of \depends clause**:
 * assigns-or-reads clause
   * assigns clause: `\assigns [condition] {memory-list}`
   * reads clause: `\reads [condition] {memory-list}`
   * `\reads {\nothing}` implies `\assigns {\nothing}`
     * `\reads {\nothing}` is equivalent to: `\reads {\nothing} \assigns {\nothing}`
   * `\assigns {X}` where `X != \nothing`, implies `\reads {X}`
     * `\assigns {X}` is equivalent to: `\assigns{X} \reads{X}`
   * absence:
     * absence of `\reads` clause: there is no assumption about the read access of the function, i.e., the function could read anything
     * absence of `\assigns` clause: similar to the absence of `\reads` clause
   * `\reads/\assigns {\nothing}` does not necessarily mean that the function never reads or assigns any variable.  The function could still read or assign its “local” variables, including function parameters and any variable declared inside the function body.
 * For an independent function which has `\depends {\nothing}`, usually we also need to specify `\reads {\nothing}`, for the purpose of reachability analysis, e.g.,
{{{
/* Returns the size of the given bundle b.
 */
\bundle_size(Bundle b; Int) \depends {\nothing} \reads {\nothing} ;
}}}
 * Example of a function declaration with contracts:
{{{
\atomic_f sendRecv(Int cmd, Pointer buf; Int)
  \depends [\eq(cmd, SEND)] {\write(buf)}
  \depends [\eq(cmd, RECV)] {\access(\deref(buf))}
  \assigns [\eq(cmd, SEND)] {\nothing}
  \assigns [\eq(cmd, RECV)] {\deref(buf)}
  \reads {\deref(buf)}
{
L0:
  when (\eq(cmd, SEND)) send(\deref(buf), ...); goto L1;
  when (\eq(cmd, RECV)) \deref(buf):=recv(...); goto L1;
  when (\and(\neq(cmd, SEND), \neq(cmd, RECV))) ; goto L1;
L1:
}
}}}

== Program ==

A program consists of a sequence of global variable declarations, which may include declarations annotated by `$input` and `$output`, followed by a sequence of function declarations and definitions.

== Semantics ==

Semantics issues
 * define every possible cast
 * define every possible +, etc.
 * define every kind of pointer value and casts between pointer types
 * casts between pointer and integer types?

=== Values ===

=== Transitions ===

== Libraries ==