= PIL: Parallel Intermediate Language == Basic Properties * PIL programs are target of translators from C, Fortran, MPI/OpenMP/etc. They can also be written by humans. * PIL should be a high-level language like CIVL-C. * PIL programs can be easily translated to PFG. * PIL must also have a totally well-defined semantics and syntax. * PIL programs can have nested funtion definitions. * PIL programs can use preprocessor directives like in C * There are no automatic conversions. All must be done by explicit casts. * `$input`/`$output` variables in global scope (like CIVL-C) * Identifiers are like in C, e.g., `x`, `f10`, ... * Keywords, functions, etc. not in C start with `$` (like CIVL-C) * Libraries: similar to C, `#include `, but these name PIL libraries and there are additional PIL libraries * PIL programs can be divided into multiple files (translation units) * One TU can refer to a variable, function, type, in another TU. * the variable just needs to be declared somewhere in the TU * the function just needs a prototype somewhere in the TU * the use of "static" in a variable decl or function def makes it private to that TU (so two can have same name) * no need for "extern" * at most one of the TUs declaring the variable can have an initializer * at most one of the TUs can have a definition for a function * Program order: * a program is a sequence of variable, function, and type definitions. * the order is totally irrelevant. * a variable can be used anywhere it is in scope * a function can be called anywhere it is in scope == Types - type names look like `int[]`, `int*[](double)`, etc. - basic types: same as C: `_Bool`, `char`, `short`, `int`, `long`, `long long`, `unsigned`, `float`, `double`, ... - what are the limits of these types? to be decided - `$int`, `$real` (mathematical types) - `struct`: as in C - `union`: as in C - `T[]`: sequence of `T`. Note: no "T[n]". sequences can be assigned, returned, passed as arguments, etc. - `$set` : finite set of `T`, ops to add, contains, remove, size, ... - `$map` : finite map from `T1` to `T2`, ops to put, get, containsKey, ... - `$fun` : "logic functions": deterministic, total, side-effect free functions from `T1` to `T2` - `$tuple` : tuples of specified type (similar to struct) - `$rel` : relations of specified type (set of tuples) - `T*`: pointer to `T` - what can be pointed to: var, array element, struct element, union member, ... - Type definitions: `typedef typename ID;` == Functions There are two kinds of functions in PIL: 1. Imperative functions = "procedures". A procedure has a type of the form `R(T1,...,Tn)` where `R` is the return type and the Ti are the input types. 1. Logic functions. One of these has a type of the form `$fun`. === Procedures A procedure is similar to a C function. It consumes some values of a specified type and possibly returns a value of a specified type. Procedure definitions look like C function definitions: {{{ R f(T1 x1, ..., Tn xn) { stmts } }}} defines a procedure named `f` which consumes inputs of types `T1`, ..., `Tn` and returns a value of type `R`. `R` can be `void` if the procedure does not return a value. The definition above defines a '''constant''' `f` of type '''`R(T1, ..., Tn)`'''. Procedures are first-class values. One may declare a variable of type `R(T1, ..., Tn)`, a procedure may return a value of that type, a procedure may consume a value of that type, a value of that type may be assigned to a variable, etc. Hence the procedure type is just like any other type, and procedure definitions define new constants of that type, just as `1` is a constant of type `$int`. Note this is different from C in that C uses function pointers; PIL dispenses with the need for function pointers. A '''procedure call expression''' has the usual form `g(e1, ..., en)`. This is an expression that can be used anywhere an expression with side-effects is allowed. Here, `g` is an expression of functional type, say `R(T1, ..., Tn)`, and `ei` is an expression of type `Ti` (for i=1, ..., n). The procedure call expression has type `R`. Procedure calls can have side-effects, be nondeterministic, and the behavior can depend on non-local state; they may access any variable in scope, the statements may dereference pointers, etc. {{{ int f(int x) { return x+1; } // f is a constant of type int(int) int callon1( (int)int g ) { return g(1); } ... int y = callon1(f); // y is now 2 }}} Procedure definitions can be nested. It is an error to call a procedure `f` when `f` is not in scope. (This is similar to Gnu C.) In other words, if the call takes place in dyscope d, then the definition of `f` must be in d's static scope, or in the parent of d's static scope, or its parent, etc. A procedure definition can be templated: {{{ }}} This defines one procedure for each assignment of types to the Ti. There is a second way to specify a procedure, using a lambda expression, which is described below. === Logic functions Logic functions are a certain class of functions that have no side-effects, and are deterministic total functions of their arguments and the current state. A logic function has a type of the form `$fun`, logical functions which consume inputs of type `T1`, ..., `Tn` and return a value of type `R`. Logic functions are also first-class objects in PIL. An application of a logic function `f(x1,... ,xn)` is a side-effect-free expression that can be used anywhere an expression is allowed. A logic function is not necessarily pure, i.e., the value of an application may depend on any part of the state, not just the arguments. Despite the apparent similarity with procedures, logic functions and procedures are clearly distinguished and one cannot be converted to another. A logic function can be defined as follows: {{{ $logic R f(T1 x1, ..., xn) = expr; }}} where `expr` is a side-effect-free expression of type `R` and can refer to any variables in scope. === Lambda expressions Lambda expressions can be used to define functions that are anonymous and that are '''closures''', i.e., have an associated environment that persists for the life of the function. A lambda expression that specifies a procedure closure has the form: {{{ $lambda [U1 v1=init1; ... Um vm=initm;] R (T1 x1, ..., Tn xn) { S1; ... } }}} where * the `Ti` and `Uj` are types * the `xi` and `vj` are variables * R is a type (the return type), which may be void * {S1; ...} is a block (same as in a procedure definition) * if R is not void, the block must return a value of type R * the only variables that can occur free in the block are the `xi` and `vj`. The type of this expression is `R(T1, ..., Tn)`. The resulting value of this is type is a procedure which can be called or assigned to a variable, etc., just like any other procedure value. Note that the definition can only use the specified variables. Evaluating this expression yields a closure, which is a pair consisting of a dyscope and the body of the procedure. The dyscope has variables `vj`, which are initialized by evaluating the `initj` when the lambda expression is evaluated. The body of the procedure may read and write to the `vj`. That dyscope will live as long as the procedure is around. Hence a function may return a closure and that closure may still be called at any time, anywhere in the program, regardless of whether the original lambda expression is in scope. A lambda expression that specifies a logic function has the form {{{ $lambda [U1 v1=init1; ... Um vm=initm;] R (T1 x1, ..., Tn xn) expr }}} where * the `Ti` and `Uj` are types * the `xi` and `vj` are variables * R is a type (the return type), which cannot be void * `expr` is a side-effect-free expression of type R * the only variables that can occur free in `expr` are the `xi` and `vj`. The type of this expression is `$fun`. As with procedural lambdas, this yields a logic function with a dynamic scope that persists, so can be called anywhere, even after the lambda expression goes out of scope. In both cases, the initializer expressions `initj` are expressions using any variables in scope. == Tuples Non-mutating expressions: - `t1 == t2` - `t.i` - `($tuple){ x1, ... }` - `$logic $tuple $tuple_write($tuple t, $int i, Ti x);` Mutating expressions: - `t.i = x;` == Sets Non-mutating expressions: - `s1 == s2` - `($set)$empty` // empty set of type T - `$logic _Bool $set_in(T x, $set s);` // is x an element of s? - `$logic $set $set_with($set s, T x);` // s U {x} - `$logic $set $set_without($set s, T x);` // s - {x} - `$logic $set $set_union($set s1, $set s2);` // s1 U s2 - `$logic $set $set_difference($set s1, $set s2);` // s1-s2 - `$logic $set_intersection($set s1, $set s2);` // s1 \cap s2 - `$logic T[] $set_elements($set s);` - `$logic _Bool $set_isSubsetOf($set s1, $set s2);` - `$logic $set $set_map($set s, $fun f);` Mutating procedures: - ` _Bool $set_add($set * this, T x);` - ` _Bool $set_remove($set * this, T x);` - `void $set_addAll($set * this, $set that);` - `void $set_removeAll($set * this, $set that);` - `void $set_keepOnly($set * this, $set that);` == Sequences (arrays) Non-mutating expressions: - `a1 == a2` - `a[i]` - `(T[]){ x1, ... }` Logic functions: - `$logic T[] $seq_fun($int len, $fun<$int,T> f);` - `$logic T[] $seq_uniform($int n, T val);` - `$logic $int $length(T[] a);` // length of a - `$logic T[] $seq_write(T[] a, int i, T x);` // a[i:=x] - `$logic T[] $seq_subseq(T[] a, int i, int n);` // a[i..i+n-1] - `$logic T[] $seq_without(T[] a, int i);` // a with position i removed - `$logic T[] $seq_with(T[] a, int i, T x);` - `$logic T[] $seq_concat(T[] a1, T[] a2);` - `$logic U[] $seq_map(T[] a, $fun f);` - `$logic T[] $seq_filter(T[] a, $fun f);` - `$logic U $seq_foldl(T[] a, $fun<$tuple,U> f, U init);` - `$logic U $seq_foldr(T[] a, $fun<$tuple,U> f, U init); ` Mutating expressions: - `a[i]=x;` Mutating procedures: - `T $seq_remove(T[] * this, int i);` - `void $seq_insert(T[] * this, int i, T x);` - `void $seq_append(T[] * this, T[] that);` == Maps Non-mutating expressions: - `m1 == m2` - `($map)$empty` - `$logic V $map_get($map K key);` - `$logic _Bool $map_containsKey($map map, K key);` - `$logic _Bool $map_containsValue($map map, V val);` - `$logic $map $map_with($map map, K key, V val);` - `$logic $map $map_without($map map, K key);` - `$logic $set $map_keys($map map);` - `$logic $set<$tuple> $map_entries($map map);` Mutating procedures: - `V $map_put($map * this, K key, V val);` - `V $map_remove($map * this, K key);` - `void $map_removeAll($map * this, $set keys);` - `void $map_addAll($map * this, $map that);` == Heaps There is a single heap for dynamic allocation. The following built-in procedures are provided: - `T* $new()`: creates an object `A` of type `T` in heap and returns `&A`. - `T* $alloc(int n)`: creates an array `A` of length `n` of `T` in heap and returns `&A[0]`. - `void $free(T* p)`: frees the object referred to by `p`, the pointer returned by an earlier call to `$new` or `$alloc`. == Questions * how to deal with "undefined" values? * can there be nondeterministic expressions, i.e., expressions that evaluate to a set of values instead of one value? * how to implement C's malloc? Ideas on malloc: Union of all types occurring in program: {{{ typedef union { T1 t1; T2 t2; ... } BigUnion; }}} A call to malloc returns a memory block: - size (in bytes) (int) - num_elements (int) - sequence of tuples: - offset (int) - size (int) - value (type `BigUnion`)