wiki:IR

Version 55 (modified by zmanchun, 10 years ago) ( diff )

--

The CIVL-IR language. A program in this language is also known as a "CIVL model".

Properties of the language:

  • the language is not intended to be written by humans; it is an intermediate form constructed by CIVL. However it should be readable to help debug things
  • a CIVL-IR program represents a guarded-transition system explicitly
  • as in CIVL-C, there are functions, scopes, and functions can be defined in any scope
  • all blocks (including a function body) consist of the following elements, in this order:
    • a sequence of type definitions
    • a sequence of variable declarations with no initializers
    • a sequence of function definitions
    • a sequence of labeled statements. Each clause in the labeled statement is a \when statement with some guard and a primitive statement, followed by a \goto statement
  • an array is declared without any length expression. When it is initialized it can specify length.

Example:

f(): Integer {  // something like "f(x:Integer): Integer {" ?
  x: Real;
  y: Real;
  z: Float(16,23);
 
L1 :
  {
    when (g1) stmt1; goto L2;
    when (g2) stmt2; goto L3;
  }

  { // begin new scope
    Real x;

L2 :
    {
      when (g3) stmt3; goto L4;
      ...
    }
  } // end new scope
...
}
// etc.

Example:

{
  int x=3*y;
  int a[x+1];
}

translates to:

{
  x: Integer;
  a_t : Dytype[Array[Integer]]; // I thought Dytype could be a parameterized type, where the parameter is the static type the dytype refines
  a: Array[Integer];
L1:
  when (\true) x:=3*y; goto L2;
L2:
  when(\true) a_t:=\dytype(Array[Integer, x+1]); goto L3; // Array[Integer] is refined here using x+1
L3:
  when (\true) a:=\new(a_t); goto L4;
L4:
}

Types

The types are:

  • Bool : boolean type, values are \true and \false
  • Proc : process type
  • Scope : scope type
  • Char : character type. Alternatively, get rid of this and just use an integer type.
  • Bundle : type representing some un-typed chunk of data
  • Heap : heap type
  • Range : ordered set of integers
  • Domain : ordered set of tuples of integers
  • Domain[n], n is an integer at least 1; subtype of Domain in which all tuples have arity n.
  • Enum types.
    • different from integers or like C?
  • Integer : the mathematical integers
  • Int[lo,hi,wrap]
    • lo, hi are integers, wrap is boolean
    • finite interval of integers [lo,hi]. If wrap is true then all operations "wrap", otherwise, any operation resulting in a value outside of the interval results in an exception being thrown.
    • Do we want to allow lo and hi to be any values of type Integer, which means they are dynamic types, like complete array types?
  • HerbrandInt : Herbrand integers. Values are unsimplified symbolic expressions.
  • Real : the mathematical real numbers
  • Float[e,f], e, f are integers, each at least 1. Same question for e and f as for lo and hi.
    • IEEE754 floating point numbers
  • HerbrandReal : Herbrand real numbers. Values are unsimplified symbolic expressions.
  • Struct types remove struct?
    • Struct tag { decl1; ...; decln }; or
    • Struct tag;
    • structure type with named fields. Names may not seem necessary but if you want a subset of CIVL-C...
    • What about bit-widths?
  • Tuple[type0, type1, ...]: a tuple type which contains a sequence of sub-types
  • Union: similar to structs
  • Array[T] : array-of-T
  • Function[S1,...,Sn;T]
    • function consuming S1,...,Sn and returning T. T can be void. The actual notation is the horrible C notation.
  • Mem : type representing a memory set. May be thought of as a set of pointers.
  • Pointer : all pointers, a subtype of Mem
    • Pointer[T] : pointer-to-T, subtype of Pointer
  • Dytype: dynamic type

Type facts:

Pure types contain no values anywhere in the type tree. That is, there is no array length expression in the type. The pure types are the static types of the CIVL-IR. Each variable is declared to have some pure type.

Augmented types include all the pure types plus possible length expressions.

A type name is a syntactic element that names a (pure or augmented) type Examples include int[] and int[n*m]. This is the same as in C.

The expression \new(T) takes a type name and returns the initial value for an object of that type. The initial value of Integer and other primitive (non-compound) types is "undefined". The initial value of Array[Integer] is an array of length 0 of Integer. The initial value of Pointer[Real] is the undefined pointer to Real. The initial value of Array[Real, 10] is the array of length 10 in which each element is undefined. In general, the initial value of an array of length n is the sequence of length n in which every element is the initial value of the element type of the array. The initial value of a structure is the tuple in which each component is assigned the initial value for its type.

Example:

// type definitions
// can we come up with a better way than \type here.  Maybe typedef?  At least you know the original
// program can't use typedef as an identifier...
// also I changed := to = since this is not an assignment.  it is a definition.
\type S=Tuple[Array[Integer]];

// variable decls
n: Integer;
S_d: Dytype[S];
x1: S;
x2: S;

// statements (leaving out the chooses and whens for brevity)
n:=10;
S_d:=\dytype(Tuple[Array[Integer, n]]);
x1:=\new(S_d);
n:=20;
x2:=\new(S_d);

is semantically equivalent to the C code

int n = 10;
struct S { int a[n]; };
struct S x1;
n=20;
struct S x2;

Expressions

In the following list of expressions, e, e0, e1, etc., are expressions. T is a type name. t is an expression of type Dytype.

  • literals
    • \true, \false : values of type Bool
    • 123, -123, 3.1415, etc. : values of type Integer, Int, Real, Float
      • what particular notations for floating values?
    • 'a', 'b', ... UNICODE?
    • \cast(Array[T], {e0, e1, ...}) : values of type Array[T] THIS IS NOT A CAST. Need better notation
    • \cast(S, {e0, ...}) : values of type S (struct literal) THIS IS NOT A CAST. Need better notation
    • e1..e2, e1..e2#e3 : values of type \Range
    • \cast(Domain, {r1,...,rn}) : value of type Domain[n] NOT A CAST
    • "abc" : string literals: value of type Array[Char]
    • \root, \here : values of type Scope
    • \self, \proc_null : values of type Proc
    • NULL : value of type void*
  • variables
  • \sizeof_type(t) : the size of the dynamic type t
  • \sizeof(e) : the size of the value of expression e Should this be \sizeof_expr since the other one is \sizeof_type ?
  • \new(t) : new (default) value of specified dynamic type
  • \defined(e) : is e defined? Type is Bool
  • \has_next(dom, i, j, k, …): an expression of boolean type, testing if the domain dom contains any element after (i, j, k, ...)
  • \add(e1, e2) : numeric addition.
    • e1 and e2 have the same numeric type. Note that there are no "automatic conversions" as there are in C. If the original expressions have different types, explicit casts must be inserted.
  • \padd(e1, e2): pointer addition.
    • e1 has pointer type and e2 has an integer type.
  • \sub(e1, e2) : subtraction
  • \mul(e1, e2) : multiplication
  • \div(e1, e2) : division
    • If both are integer types, the result is integer division. Otherwise it is real division. Need to define what happens for negative integers.
  • \mod(e1, e2) : modulus
  • \array_subscript(e1, e2) : array subscript expression. Note that e1 must have array type, not pointer type. (This is different from C.) If e1 has pointer type, use \deref(\padd(e1, e2)) instead.
    • maybe shorten to \sub?
  • \deref(e) : pointer dereference
  • \address_of(e) : address-of. Or: \addressof ? After all, we have \sizeof.
  • \not(e) : logical not
  • \neg(e) : negative
  • \cast(T, e) : casts e to a value of the named type
    • need to list all of the legal casts and what they mean exactly
    • cast of integer to array-of-boolean, and vice-versa?
    • Instead of casts would it be better to have explicit functions for each legal kind of cast?
  • \eq(e1, e2), \neq(e1, e2): equality/inequality test
  • \and(e1, e2), \or(e1, e2): logical and/or operation
  • \cond(e1, e2, e3): conditional expression, equivalent to e1?e2:e3 in C.
  • \lt(e1, e2), \lte(e1, e2): less than/less than or equal to
  • e0(e1,...,en) : a function call where e must evaluate to an abstract or pure system function
  • \forall, \exists : \forall {Integer i | e0} e1. \exits is similar. Is this the best notation?
  • \tuple_read(e1, i), some natural number i (tuple read)
  • \bit_and(e1, e2), \bit_or(e1, e2), \bit_xor(e1, e2), \bit_comp(e1) : bit-wise operations: arguments are arrays of booleans
  • Memory set expressions: are these literal values of type Mem?
    • a left-hand-side expression, when used in certain contexts, is automatically converted to Mem. The contexts are: arguments to the built-in functions \access, \read, and \write described below, or occurrence in the list of an \assigns clause
    • \array_chunk(a, dom), where a is an expression of array type and dom is an expression of Domain type. The dimension of the array must match the dimension of the domain. This represents all memory units which are the cells in the array indexed by a tuple in dom.
    • \region(ptr), where ptr is an expression with a pointer type. This represents the set of all memory units reachable from ptr, including the memory unit pointed to by ptr itself.
      • \mem_reach ?
    • \mem_union(mem1, mem2), where mem1 and mem2 are expressions of type Memory. This is the union of the two sets.
    • could we use Frama-C notation p+(e1..e2) for example?

Pointers: unlike C, there is no "array-pointer pun". If an array a needs to be converted to a pointer, you must use \address_of(\address_of(\array_subscript(a, 0))).

The Primitive Statements

  • Assign: e1:=e2;
  • Call: \call e0, (e1,...,en); and \call e, e0, (e1,...,en);
    • regular function (one with flow graph)
    • distinguish this kind of call from an expression involving a pure system or abstract call???
  • Spawn: \spawn e0, (e1,...,en); and \spawn e, e0, (e1,...,en);
  • Wait: \wait e;
  • Wailtall: \waitall e where e is an array of process references (Proc) to be waited for;
  • Allocation
    • \allocate e, h,t,e0;, where
      • h as type Heap
      • t has type Dytype
      • e has integer type.
    • Allocates e objects of type t on heap h
    • To translate the C malloc you first need to figure out the type of the elements being malloced. If the argument to malloc is n, then you first need to insert an assertion \eq(\mod(n, \sizeof_type(t)), 0), and then \allocate(h,t,\div(n, \sizeof_t(t))).
  • Free: \free(p);
  • Expression statement: e;, where e is side effect free except that it might contain error/exception (e.g., array index out of bound, division by zero);
  • Noop: ;
    • Is there a need to add annotations for "true" or "false" branch, etc.? If so, we can just make these parameters to the Noop.
  • Return: \return; and \return e;
  • Atomic_enter: \atomic_enter;
  • Atomic_exit: \atomic_exit;
  • Parfor_spawn: \parfor_spawn(int i,j,..: dom) f(i,j,...); Maybe \parspawn ?
  • Domain iterator: \next(dom,i,j,…) updates i, j, ... to be the value of the inter tuple in dom after (i, j, ...)
  • For_dom_enter (for domains): \for_enter(i,j,k..: dom);

Declarations and Function Definitions

Declarations follow the C notation. Function prototypes are considered to be declarations similar to variable declarations.

Example of declaration of a function:

f(Real x, Bool y; Integer);

Additional modifiers that may be placed on any of above:

  • \pure : the function has no side effects, but may be nondeterministic
  • \abstract: function is a pure, mathematical function: deterministic function of inputs
  • \atomic_f: function definition is atomic, and it never blocks

System functions:

  • A function declaration which is not abstract and for which no definition is provided is a system function.
  • If the system function is called anywhere in the program, it must be defined by providing Java code in an Enabler and Executor. Failure to do so will result in an exception.
  • A system function may modify any memory it can reach. This includes allocating new data on heaps it can reach.
  • A system function may have a guard.

Example of a declaration of a system function with guard.

g(Real x, Bool y; Bool) { ... }
f(Real x, Bool y; Integer) \guard {g};

Function Contracts

  • event set expressions:
    EventSetExpression
     : \read(MemorySetExpression)
     | \write(MemorySetExpression)
     | \access(MemorySetExpression)
     | \calls(FunctionCallExpression)
     | \nothing
     | \everything
     | ‘(’ EventSetExpression ‘)’
     | EventSetExpression + EventSetExpression
     | EventSetExpression - EventSetExpression
     | EventSetExpression & EventSetExpression
    
  • depends clause: \depends [condition] { event1, event2, ...}
    • Example:
      \depends {
        \access(n) - (\calls(inc(MemorySetExpression)) + \calls(dec(MemorySetExpression)))
      }
      
    • absence of \depends clause:
  • assigns-or-reads clause
    • assigns clause: \assigns [condition] {memory-list}
    • reads clause: \reads [condition] {memory-list}
    • \reads {\nothing} implies \assigns {\nothing}
    • \reads {\nothing} is equivalent to: \reads {\nothing} \assigns {\nothing}
    • \assigns {X} where X != \nothing, implies \reads {X}
    • \assigns {X} is equivalent to: \assigns{X} \reads{X}
    • absence:
      • absence of \reads clause: there is no assumption about the read access of the function, i.e., the function could read anything
      • absence of \assigns clause: similar to the absence of \reads clause
    • \reads/\assigns {\nothing} doesn’t necessarily means that the function never reads or assigns any variable. The function could still reads/assigns its “local” variables, including function parameters and any variable declared inside the function body.
  • For an independent function which has \depends {\nothing}, usually we also need to specify \reads{nothing}, for the purpose of reachability analysis.

e.g.,

/* Returns the size of the given bundle b. */
\bundle_size(Bundle b; Int)
  \depends {\nothing}
  \reads {\nothing}
  ;
  • Example of a function declaration with contracts:
    \atomic_f sendRecv(Int cmd, Pointer buf; Int)
      \depends [\eq(cmd, SEND)] {\write(buf)}
      \depends [\eq(cmd, RECV)] {\access(\deref(buf))}
      \assigns [\eq(cmd, SEND)] {\nothing}
      \assigns [\eq(cmd, RECV)] {\deref(buf)}
      \reads {\deref(buf)}
    {
      if(\eq(cmd, SEND)){
         send(\deref(buf), ...);
      }else if(\eq(cmd, RECV)){
        \deref(buf):=recv(...);
      }
    }
    

Program

A program consists of a sequence of global variable declarations, which may include declarations annotated by $input and $output, followed by a sequence of function declarations and definitions.

Semantics

Semantics issues

  • define every possible cast
  • define every possible +, etc.
  • define every kind of pointer value and casts between pointer types
  • casts between pointer and integer types?

Values

Transitions

Libraries

Note: See TracWiki for help on using the wiki.