Version 87 (modified by siegel, 5 years ago)

CIVL IR v2.0

Language principles

CIVL-IR is a subset of CIVL-C. A CIVL-IR program is a CIVL-C program, and has the same semantics.
CIVL-IR should be a small language. Whenever there is a way to express a new construct using other existing constructs, the preferred approach is to not add the new construct to the language.
CIVL-IR is to be as close as possible to a language representation of a CIVL "Model". This is the lowest-level representation of a CIVL program, one appropriate for efficient (symbolic) execution and/or model checking.
The basic representation of a function is as a "program graph"---a directed graph in which the nodes are locations and edges are atomic statements.
Expressions are side-effect free.
There are no automatic conversions. All conversions must be by explicit casts or other functions. Operations such as numeric addition (add) require that both operands have the exact same type.
In general, symbols can be used before they are defined, as long as they are in scope. For example, a function f can call g, even if g is defined after f in the same scope. There is no need for a "prototype". Similarly, a type definition can refer to a type defined later, or can refer to itself in its own definition.
Many of the core operations are implemented as functions in the standard library. This is similar to C. The header files for the standard library are an integral part of the language.
Evaluation of expressions and execution of statements should always result in a value or state. There is no concept of exception or error. Hence anything that should be checked should be checked by explicit assertions.
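
As an illustration of the last principle, the sketch below guards an integer division with an explicit assertion instead of relying on any runtime error. It follows the grammar given later on this page. The names n, d, and q are hypothetical, and the inequality is written as !(d == 0) because that grammar lists no != operator.

```civl
$input $int n;
$input $int d;
$output $int q;

void main() {
  /* nothing traps on division by zero, so the check is stated explicitly */
  $assert (!(d == 0)), "divisor must be nonzero";
  q = n / d;
  return;
}
```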

Questions

Do variables have default initial values?
- No, a declared variable must be initialized before it is used. Otherwise, the behavior is undefined.
How do you initialize a variable?
- By assigning a value to it. For example n=$new($int); will assign n an arbitrary integer, while n=0; will assign the integer 0 to n.
How is an array allocated?
- An array variable a is declared with a decl such as T a[];, and then a statement such as a=$new_array(n, T); will assign to a a new array value for an array of length n of elements of type T in which all entries are undefined. (Alternatively, a=$new(T[n]) will create an array of arbitrary defined values.) For heap-allocation, a pointer is declared with a decl such as T * p;, and a heap variable is also declared somewhere with a decl such as $heap heap; and then a statement such as p = $alloc(&heap, n, T); will add a new object to the heap and return a pointer to the first element. An $alloc-ed object can be deallocated with $free(p);.
Is there an "array-pointer pun", as in C?
- No, if you want a pointer to element 0 of an array, you have to explicitly write something like &a[0].
How do you translate between sequences and arrays?
- There are functions in the standard library to do that.
Can you make types values? (reification)
- For now, no. There are several statements and expressions which do take a type name as an argument, but there is no way for one to define new expressions or functions that do that.
How do you monitor reads and writes?
- There are functions in the standard library (mem.h) for this.
Is there a type for $state ?
- Coming soon.
How do you model malloc when the element type is not known?
- For now, you can't. [Note to self: consider creating a type like $byte for this purpose?]
How do you iterate over a domain?
- Coming soon.
Bitvectors?
- Coming soon.
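
The answers on initialization, array allocation, and the missing array-pointer pun can be put together in one small sketch. It uses only forms described on this page. The variable names are hypothetical, and it assumes a $heap variable is usable as declared, since no separate initialization step is described.

```civl
void main() {
  $int n;
  $int a[];
  $int b[];
  $heap heap;   /* assumption: usable as declared */
  $int *p;
  $int *q;

  n = 3;
  a = $new_array(n, $int);      /* length 3, every entry undefined */
  b = $new($int[n]);            /* length 3, arbitrary defined entries */
  a[0] = b[0];
  q = &a[0];                    /* explicit address of element 0; a itself is not a pointer */
  p = $alloc(&heap, n, $int);   /* pointer to the first of 3 heap elements */
  *p = *q;
  $assert (*p == b[0]);
  $free(p);
  return;
}
```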

Grammar

program: typedef* decl* function-definition+ ;
typedef:
  type-param-list?
  ( struct ID '{' decl* '}'
  | union ID '{' decl* '}'
  | 'typedef' type-specifier declarator
  )
  ';'
  ;
decl: type-param-list? qualifier* type-specifier declarator contract-clause* ';' ;
function-definition: type-param-list? qualifier* type-specifier declarator contract-clause* block ;
qualifier:
    '$input'  /* input variable; only for global decls */
  | '$output'  /* output variable; only for global decls */
  | '$abstract_f'  /* abstract function; only for function decls */
  | '$pure_f'  /* function is a mathematical function of its parameters */
  | '$state_f'  /* function is a mathematical function of the state */
  | '$atomic_f'  /* function invocations take place atomically */
  | '$system_f' '<' STRING ',' STRING '>'  /* function is defined in system code elsewhere */
  ;
block: '{' typedef* decl* function-definition* statement* '}' ;
statement: block | simpleStmt | chooseStmt ;
simpleStmt: label? dependency? guard? primitiveStmt gotoStmt? ;
chooseStmt: label? '$choose' '{' simpleStmt* '}' ;
label: ID ':' ;
guard: '$when' '(' expr ')' ;
dependency: '$depends_on' '(' expression-list? ')' ;
expression-list: expr (',' expr)* ;
gotoStmt: 'goto' ID ;
type-param-list: '<' ID (',' ID)* '>' ;
INT: ... /* integer constant */
ID: ... /* identifier */
STRING: ... /* string literal in double quotes */

Notes

A typedef can be used to define parameterized types. The type parameters are listed between angle brackets preceding the typedef. When the identifier is later used, it must be used with angle brackets and actual type names to replace the type parameters.
A declaration declares a variable to have either an object type or a function type. (An object type is any type that is not a function type.)
The optional type-param-list in a declaration may be applied only to a declaration of function type. It declares a "generic" function. When the function is called (or spawned), the call must include the angle brackets with a list of type names that replace the type parameters used in the declaration or definition of the function.
A variable of function type represents either an abstract or a system function. The declaration of such a variable must use either the qualifier $abstract_f or $system_f. Moreover, those two qualifiers can only be used in a declaration of function type.
An abstract function represents a new uninterpreted pure function.
A system function has no definition in the program, but is instead defined elsewhere (for example, in C or Java code). Such a function will always be executed atomically. The first string specifies a path (e.g., Java package) to the library containing the function; the second is the name of the library. These two strings should be enough to tell CIVL where to find the system definition of the function.
$atomic_f can only be used in a function definition. It indicates that a defined function is to behave atomically, i.e., every call to such a function will be executed as if in an atomic region. (An abstract or system function must necessarily be atomic.)
$pure_f can be applied to a system function or defined function only. (An abstract function is necessarily pure.) The use of $pure_f indicates that the function has no side-effects, and the value returned is a mathematical function of the arguments. If the function is not actually pure, the behavior is undefined.
$state_f can be applied to a system function or defined function only. It declares that the function has no side-effects, and the value returned is a mathematical function of the state in which the call occurred; i.e., the value returned may depend on the arguments in the call, the values of global variables, or any other component of the state. But, if called twice from the same state, it will always return the same value. If the function does not actually have this property, the behavior is undefined. Any $pure_f function is necessarily a $state_f, so at most one of these two qualifiers can occur in a declaration or definition.
Contract clauses can occur only with system functions and defined functions (not with abstract functions, not with variables).
A program must contain a function definition for a function named main.
$input and $output can be used only on global variable declarations (not on function definitions, not in block scope).
The expr in a guard must have type $bool.
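
A minimal sketch of a complete program in "program graph" form may help tie the grammar together: labels name locations, each simple statement is one edge, and a $choose with complementary guards encodes a loop test. The names are hypothetical. The sketch follows the grammar as written, so no semicolon follows a goto target, and it assumes that a statement without a goto continues to the next statement in the block, which this page does not state explicitly.

```civl
$input $int n;
$output $int sum;

void main() {
  $int i;
  $int s;

  i = 0;
  s = 0;
  LOOP: $choose {
    $when (i < n) s = s + i; goto BODY
    $when (!(i < n)) sum = s; goto DONE
  }
  BODY: i = i + 1; goto LOOP
  DONE: return;
}
```

Exactly one of the two guards holds in any state, so the $choose is deterministic here. With overlapping guards it would express a nondeterministic choice between the enabled edges.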

Contracts

contract-clause:
    '$assigns' '(' expression-list? ')'   /* frame condition */
  | '$requires' '(' expr ')'  /* precondition */
  | '$ensures' '(' expr ')'  /* postcondition */
  | '$depends_on' '(' expression-list? ')'  /* dependency specification */
  | '$when' '(' ID ')'  /* guard clause */
  ;

Notes

A guard clause can occur only in the declaration of a system function. The identifier is the name of the function that will be used as the guard in all calls to the system function. The guard function must have the same input signature as the system function, but must return $bool. Any call to the system function will block unless the guard function returns $true on the arguments used in the call. The execution of the system function and the return of "true" from the guard function will occur atomically, i.e., no state change will occur between the return of true and the call of the system function. The guard function must be a $state_f function; in particular, it must be side-effect free. The guard function may be a system function or a defined function; it cannot be abstract. If omitted, the guard is understood to be "true", i.e., a call to the system function will never block.
The expressions in a dependency clause must have pointer type. This declares that the following transition depends only on the objects pointed to.
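
The guard clause can be sketched as follows. Everything named here is hypothetical: the system function take, its guard has_item, the global count, and the two strings locating the library. The sketch only shows how the pieces described above fit the grammar. The guard has the same parameter list as the system function, returns $bool, and is declared $state_f. The comparison is written 0 < count because the expression grammar lists no > operator.

```civl
$int count;

/* blocks until has_item(slot) returns $true; the guard check and the call happen atomically */
$system_f<"org.example.queue", "queue"> void take($int *slot)
  $requires($valid(slot))
  $when(has_item);

$state_f $bool has_item($int *slot) {
  return 0 < count;
}

void main() {
  $int x;

  count = 1;
  take(&x);   /* enabled, since the guard holds in this state */
  return;
}
```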

Types

type-specifier:
    ID type-args?  /* type parameter or (possibly parameterized) typedef use */
  | '$int'  /* mathematical integers */
  | '$bool'  /* boolean type ($true and $false, unrelated to integers) */
  | '$char'  /* character type (Unicode characters, unrelated to integers) */
  | '$real'   /* mathematical reals */
  | '$float32'  /* IEEE 32-bit floating-point numbers */
  | '$float64'  /* IEEE 64-bit floating-point numbers */
  | '$herbrand' '<' type-name '>'  /* Herbrand type of non-Herbrand numeric type T */
  | '$proc'  /* process type */
  | '$bundle'  /* bundle type for sequence of any type (same as seq<T>?) */
  | '$heap'  /* heap type, for dynamic allocation */
  | '$mem'  /* set of memory locations */
  | 'struct' ID  ('<' type-list '>')?  /* structure type */
  | 'union' ID  ('<' type-list '>')?  /* union type */
  | 'void'   /* use as pointer element-type and return type of a function */
  | '$seq' '<' type-name '>'  /* sequence of T */
  | '$set' '<' type-name '>'  /* set of T */
  | '$map' '<' type-name ',' type-name '>'  /* partial function from T1 to T2 */
  | '$rel' type-args  /* relation: set of n-tuples with specified component types */
  ;
declarator: '*'* direct-declarator ;
direct-declarator:
    ID  /* variable being declared */
  | direct-declarator '[' expr? ']'  /* array of ... */
  | direct-declarator  '(' parameter-type-list? ')'  /* function consuming ... and returning ... */
  | '(' declarator ')'
  ;
type-name: ... /* same as declarator but without the ID */
type-args: '<' type-name (',' type-name)* '>';
parameter-type-list: type-specifier declarator (',' type-specifier declarator)* ;

Notes

Sequences, sets, maps, relations, and $mem objects are immutable. An assignment using objects of these types creates a new copy of the object, just as with primitive types like $int.
Arrays are similar to sequences. The main differences are as follows:
- An object (i.e., a variable or component of an object) of array type is initialized once, then will never be assigned to again. Hence there cannot be a statement of the form a=... where a has array type.
- After initialization, an object of array type can appear only as the first (left) argument to the subscript operator [] or as the argument to the $length operator.
- Arrays are mutable. As in C, the left-hand side of an assignment may have the form a[i], where a is an object of array type. Sequences are immutable.
- Elements of an array are addressable, i.e., one can form a pointer such as &a[i]. This is not possible with sequences, sets, maps, or relations---there is no way to have a pointer to any component of such a type.
- A function may neither consume nor return an object of array type. There is no such restriction for sequences.
The difference between the function type and map type: a function is really a procedure in the language, so it can modify the state as well as return a value. This is like the C notion of "function". A map is a logical partial function: it is defined on some subset of the domain type, it will always "return" the same value on a given input, and reading it cannot modify the state.
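
Parameterized struct types and generic functions can be sketched together. The names Pair and first_of are hypothetical. The struct is used with explicit type arguments, and the generic function is called with its type argument in angle brackets, as the grammar notes require.

```civl
<T> struct Pair { T first; T second; };

<T> T first_of(struct Pair<T> p) {
  return p.first;
}

void main() {
  struct Pair<$int> p;
  $int x;

  p.first = 1;
  p.second = 2;
  x = first_of<$int>(p);   /* the type argument must be written explicitly */
  $assert (x == 1);
  return;
}
```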

Statements

An lvalue is an expression that represents an object, as in C. It is either a variable, an array element expression, a struct or union field expression, or a dereference expression.

primitiveStmt:
    ';'  /* noop */
  | lvalue '=' expr ';'  /* assignment */
  | (lvalue '=')? expr type-args? '(' expression-list? ')' ';'  /* function call */
  | (lvalue '=')? '$spawn' expr type-args? '(' expression-list? ')' ';'  /* process creation */
  | '$wait' expr ';'  /* wait until p terminates */
  | 'return' expr? ';'  /* return from function call */
  | (lvalue '=')? '$alloc' '(' expr ',' expr ',' type-name ')' ';'  /* heap allocation */
  | '$free' expr ';'  /* frees something that was $alloc-ed */
  | (lvalue '=')? '$new_array' '(' expr ',' type-name ')' ';'  /* allocation of array of undefined values*/
  | '$atomic_enter' ';'  /* enter atomic region */
  | '$atomic_exit' ';'  /* exit atomic region */
  | '$yield' ';'  /* release atomic lock */
  | '$assert' expr (',' expr)* ';'  /* assertion with optional error message */
  | '$assume' expr ';'  /* assumption */
  ;

Notes

For function calls and spawns, the first expression shall have a functional type. That function must be a system or defined function (not an abstract function).
The first expression following $alloc has type $heap*. It is a pointer to the heap that will be modified by allocating the new memory. The second expression has type $int and is the number of elements being allocated. This is followed by the element type. The function returns a pointer to the first element of an array, similar to C's malloc. It is deallocated using $free.
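
The process-related statements can be sketched with two workers updating a shared counter. The names worker and counter are hypothetical. The read and the write in worker are separate atomic statements, so they are bracketed by $atomic_enter and $atomic_exit to keep another process from interleaving between them.

```civl
$int counter;

void worker($int k) {
  $int t;

  $atomic_enter;
  t = counter;
  counter = t + k;
  $atomic_exit;
  return;
}

void main() {
  $proc p1;
  $proc p2;

  counter = 0;
  p1 = $spawn worker(1);
  p2 = $spawn worker(2);
  $wait p1;
  $wait p2;
  $assert (counter == 3), "both workers have added their increments";
  return;
}
```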

Expressions

expr:
    ID  /* variable or function identifier */
  | '$true'
  | '$false'
  | INT
  | REAL
  | FLOAT
  | CHAR
  | STRING
  | 'NULL'
  | '$proc_null'
  | '(' type-name ')' expr  /* cast */
  | '(' type-name ')' '{' expression-list? '}'  /* concrete array, struct, $seq, $set, $mem */
  | '(' type-name ')' '{' expr-pair-list? '}'  /* concrete map */
  | '(' type-name ')' '$lambda' '(' '$int' identifier-list ')' expr  /* array literal, aka array lambda */
  | expr '+' expr  /* numeric or pointer addition */
  | expr '-' expr  /* numeric or pointer subtraction */
  | expr '/' expr  /* division */
  | expr '%' expr  /* modulo */
  | expr '&&' expr  /* logical and */
  | expr '||' expr  /* logical or */
  | expr '==>' expr  /* logical implication */
  | expr '==' expr  /* equality */
  | '!' expr  /* logical not */
  | expr '<' expr  /* less than */
  | expr '<=' expr  /* less than or equal to */
  | expr '[' expr ']'  /* array element access */
  | expr '.' ID  /* field access */
  | '(' expr ')'
  | '-' expr  /* negative */
  | '*' expr  /* pointer dereference */
  | '&' lvalue  /* address-of */
  | expr type-args? '(' expression-list? ')'  /* application of abstract function (and perhaps other functions?) */
  | '$new' '(' type-name ')'  /* returns a new arbitrary value of the given type */
  | '$forall' '(' decl expr? ')' expr  /* universal quantification */
  | '$exists' '(' decl expr? ')' expr  /* existential quantification */
  | expr '?' expr ':' expr  /* if-then-else expression */
  | '$length' '(' expr ')'  /* length of array */
  | '$defined' '(' expr ')'  /* does evaluation of expr result in a well-defined value? */
  | '$valid' '(' expr ')'  /* does a pointer point to a memory location capable of holding a value of its base type? */
  ;
expr-pair-list: expr-pair (',' expr-pair)* ;
expr-pair: '{' expr ',' expr '}' ;

Notes

For float types, which rounding mode is used? Round-to-nearest?
Comparisons like == and < return a $bool (not an int, as in C).
Explanation of $defined: a bit is associated with each memory location. Memory locations are created by declaring variables and by allocation. Every variable initially holds an undefined value. Memory locations become defined by assigning to them defined values.
- Constants are defined values. This includes integer and real constants, and NULL.
- An operation returns a defined value if all inputs are defined and the inputs are acceptable for the specific operation.
- It is debatable whether division by 0, out-of-bounds array indexes, etc., should yield undefined values. They could be defined.
- Once an array is allocated with $new_array, the array itself is still undefined and all elements are undefined. However $defined($length(a)) will return $true.
- Once an object is heap allocated with p=$alloc(...), p becomes defined, but *p, *(p+1), ..., are all undefined as they hold uninitialized data.
- A pointer that points to the location one past the last element of an array or object is defined, but not valid. The same holds for NULL.
- If a pointer points to an object on the heap and that object is freed, the pointer becomes undefined. Similarly, if it points into an object that goes out of scope, the pointer becomes undefined.
- If an object is heap allocated, then a pointer p into that object will be defined and $valid(p) will return $true, indicating it is safe to write to that location, but *p will be undefined (as the memory location pointed to by p holds uninitialized data).
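
The rules for $defined and $valid listed above can be restated as assertions. This is only a sketch of the intended behavior as described in these notes, with hypothetical variable names. It assumes that $defined may be applied to an expression whose value is undefined, which is its stated purpose.

```civl
void main() {
  $int x;
  $int a[];
  $heap heap;
  $int *p;

  $assert (!$defined(x));                /* declared but not yet assigned */
  x = 0;
  $assert ($defined(x));
  a = $new_array(2, $int);
  $assert ($defined($length(a)));
  $assert (!$defined(a[0]));             /* elements start out undefined */
  p = $alloc(&heap, 2, $int);
  $assert ($defined(p) && $valid(p));    /* safe to write through p */
  $assert (!$defined(*p));               /* but the data is uninitialized */
  $assert ($defined(p + 2) && (!$valid(p + 2)));   /* one past the last element */
  $free(p);
  $assert (!$defined(p));                /* pointer into a freed object */
  return;
}
```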

Allowed Casts

$int -> $real
$float32 -> $real
$float64 -> $real
T1* -> T2*. How to check a cast is valid? What happens if it is not? If there are no exceptions, something must be returned. Perhaps undefined value.
- this is OK if one is a pointer to element 0 of an array, and the other is a pointer to the array
- if one is a pointer to the first member of a struct or union, and the other is a pointer to the struct or union
- these rules can be applied recursively. This needs to be fleshed out.
others?

Standard CIVL Library

`civlc.cvh`

Work in progress. Refer to existing library.

$system void $exit(void);
$system $bool $terminated($proc p);
$system int $choose_int($int n); // how is this different than $new($int)?

`bundle.cvh`

$int $bundle_size($bundle b); // number of elements in bundle or size in bytes?
$bundle $bundle_pack(void *ptr, int size);
void $bundle_unpack($bundle bundle, void *ptr);
void $bundle_unpack_apply($bundle data, void *buf, $operation op, int count, void *result);

`math.cvh`

Work in progress...

Rounding modes:

0 = to nearest
1 = upward
2 = downward
3 = toward zero.

$int $round($real x); // round to nearest integer
$int $floor($real x); // greatest integer less than or equal to
$int $ceil($real x); // least integer greater than or equal to
$float32 $round32($real x, $int mode);  // round to float32 (see modes above)
$float64 $round64($real x, $int mode);  // round to float64 (see modes above)
<T> T $abs(T x);  // absolute value (T must be a numeric type)
<T> T $pow(T x, $int n);  // x to the n-th power.  T must be a numeric type; n>=0.
<T> T $eval($herbrand<T> x);  // returns value of type T obtained by evaluating all the delayed operations
<T> $herbrand<T> $herbrandize(T x);  // trivial symbolic expression consisting of a single node

Is it OK to have parameterized functions that only work for certain T? Would it be better to expand for each numeric type? E.g., absi, abs, abs32, abs64, ...
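
A short sketch of the rounding functions, assuming the declarations above are in scope (how a header is brought into scope is not described on this page). It brackets an arbitrary real in [0, 1] between its downward and upward float32 roundings. Every conversion is an explicit cast, since there are no automatic conversions. The variable names are hypothetical.

```civl
void main() {
  $real x;
  $float32 lo;
  $float32 hi;

  x = $new($real);
  $assume (((($real) 0) <= x) && (x <= (($real) 1)));
  lo = $round32(x, 2);   /* mode 2 = downward */
  hi = $round32(x, 1);   /* mode 1 = upward */
  $assert (((($real) lo) <= x) && (x <= (($real) hi)));
  return;
}
```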

`mem.cvh`

Work in progress...

mem_reach(ptr), where ptr is an expression with a pointer type.
- This represents the set of all memory units reachable from ptr, including the memory unit pointed to by ptr itself.
mem_union(mem1,mem2), where mem1 and mem2 are expressions of type $mem.
- This is the union of the two memory sets.
mem_isect(mem1,mem2) : set intersection
mem_comp(mem1) : set complement (everything not in mem1)
mem_slice(a,dom)
- where a is an expression of array type and dom is an expression of $domain type.
- The dimension of the array must match the dimension of the domain. This represents all memory units which are the cells in the array indexed by a tuple in dom.

`seq.cvh`

<T> $int $seq_length( $seq<T> a ); // length of a
<T> T $seq_get( $seq<T> a, $int i ); // get element i of a
<T> $seq<T> $seq_set( $seq<T> a, $int i, T x ); // seq obtained by replacing element i of a with x
<T> $seq<T> $seq_subseq( $seq<T> a, $int start, $int stop ); // subsequence from start to stop-1
<T> $seq<T> $seq_add( $seq<T> a, T x ); // sequence obtained by adding element x to the end of a
<T> $seq<T> $seq_append( $seq<T> a1, $seq<T> a2 ); //  sequence obtained by concatenating a1 and a2
<T> $seq<T> $seq_remove( $seq<T> a, $int i ); // sequence obtained by removing element at position i from a
<T> $seq<T> $seq_insert( $seq<T> a, $int i, T x ); // sequence obtained by inserting element x at position i in a
<T> void $seq_write( $int n, T * ptr, $seq<T> a );  // write the first n elements of a to ptr, ptr+1, ..., ptr+n-1
<T> $seq<T> $seq_read( $int n, T * ptr ); // form a sequence by reading ptr, ptr+1, ..., ptr+n-1
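
The functions above might be used as follows to move data between a sequence and an array, which is the translation mentioned in the Questions section. The sketch assumes the seq.cvh declarations are in scope and uses hypothetical variable names. Each call is a statement with an explicit type argument.

```civl
void main() {
  $seq<$int> s;
  $seq<$int> t;
  $int a[];
  $int x;
  $int y;

  s = ($seq<$int>){ 10, 20, 30 };
  a = $new_array(3, $int);
  $seq_write<$int>(3, &a[0], s);   /* a[0], a[1], a[2] now hold 10, 20, 30 */
  a[1] = 99;                       /* arrays are mutable */
  t = $seq_read<$int>(3, &a[0]);   /* new sequence 10, 99, 30 */
  x = $seq_get<$int>(t, 1);
  y = $seq_get<$int>(s, 1);
  $assert (x == 99);
  $assert (y == 20);               /* s is unchanged: sequences are immutable */
  return;
}
```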

`set.cvh`

<T> $int $set_size( $set<T> s ); // number of elements in s
<T> $seq<T> $set_seq( $set<T> s ); // elements of s arranged in a sequence
<T> $bool $set_contains( $set<T> s, T x ); // does s contain x as member?
<T> $bool $set_containsAll( $set<T> s1, $set<T> s2 ); // is s2 a subset of s1?
<T> $set<T> $set_add( $set<T> s, T x ); // result of adding x to s
<T> $set<T> $set_union( $set<T> s1, $set<T> s2 ); // union of s1 and s2
<T> $set<T> $set_intersection( $set<T> s1, $set<T> s2 ); // intersection of s1 and s2
<T> $set<T> $set_difference( $set<T> s1, $set<T> s2 ); // elements of s1 not in s2

`map.cvh`

<K,V> $set<K> $map_domain( $map<K,V> m ); // returns the domain of the map (the set of keys)
<K,V> $set<V> $map_range( $map<K,V> m ); // returns the range of the map (the set of values)

<K,V> $seq<$pair<K,V>> $map_seq( $map<K,V> m ); // key value pairs of m arranged in a sequence

<K,V> $map<K,V> $map_add( $map<K,V> m, K k, V v ); // result of (re)defining k to map to v.
<K,V> $map<K,V> $map_add( $map<K,V> m, $set<$pair<K,V>> kvs); // result of (re)defining k to map to v for every pair (k,v) in kvs

<K,V> $map<K,V> $map_remove( $map<K,V> m, K k ); // result of declaring k undefined
<K,V> $map<K,V> $map_remove( $map<K,V> m, $set<K> ks ); // result of declaring k undefined for every k in ks

<K,V> $bool $map_contains( $map<K,V> m, K k ); // is k defined in the map?
<K,V> V $map_get( $map<K,V> m, K k ); // get the value defined at k. Should check $map_contains first (what do we do when k is not defined?)

<K,V> $bool $map_injective( $map<K,V> m ); // does m form an injection?
<K,V> $map<V,K> $map_inverse( $map<K,V> m ); // returns the inverse map of m. Should check $map_injective first (what do we do when m is not injective?)

<K,I,V> $map<K,V> $map_compose( $map<I,V> m2, $map<K,I> m1 ); // returns the composition m2.m1
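
A sketch of typical map usage, assuming the map.cvh declarations are in scope and using hypothetical variable names. Because the behavior of $map_get on an undefined key and of $map_inverse on a non-injective map is still open, the sketch checks $map_contains and $map_injective first.

```civl
void main() {
  $map<$int,$int> m;
  $map<$int,$int> inv;
  $bool has;
  $bool inj;
  $int v;

  m = ($map<$int,$int>){ {1, 10}, {2, 20} };
  m = $map_add<$int,$int>(m, 3, 30);      /* a new map; the old value is not modified */
  has = $map_contains<$int,$int>(m, 3);
  $assert has;
  v = $map_get<$int,$int>(m, 3);
  $assert (v == 30);
  inj = $map_injective<$int,$int>(m);
  $assert inj;
  inv = $map_inverse<$int,$int>(m);
  v = $map_get<$int,$int>(inv, 30);
  $assert (v == 3);
  return;
}
```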

`rel.cvh`

`concurrency.cvh`

void $parspawn($proc * proc_array, $domain d, void (*f)($int, ...));  /* parallel spawn returns immediately*/
void $waitall( $int nprocs, $proc * procs );  /* wait for all procs in list to terminate */

In $parspawn, *f must have a function type that consumes n $ints, where n is the dimension of the domain. The function is spawned once for each element of the domain. References to the new processes are stored in the process array. The call to $parspawn returns immediately.
