source: CIVL/docs/dev/pil.md@ 6b26bbb

1.23 2.0 main
Last change on this file since 6b26bbb was 135e8cf, checked in by Youngjun Lee <youngjunlee7@…>, 3 weeks ago

Initial Markdown documents

  • Property mode set to 100644
File size: 20.5 KB
Line 
1# PIL: Parallel Intermediate Language
2
3## Basic Properties
4
51. PIL programs are target of translators from C, Fortran, MPI/OpenMP/etc. They can also be written by humans.
61. PIL should be a high-level language like CIVL-C.
71. PIL programs can be easily translated to PFG.
81. PIL must also have a totally well-defined semantics and syntax.
91. Every operation should have a well-defined outcome, even division by zero or an illegal pointer dereference. However, when analyzing a PIL program, a user can specify exactly which of these operations should be considered erroneous and reported. A reasonable default might be to make all of them erroneous.
101. PIL programs can have nested funtion definitions.
111. PIL programs can use preprocessor directives like in C
121. There are no automatic conversions. There is no "array-pointer pun". All conversions must be done by explicit casts or other expressions. To convert an array `a` to a pointer to element 0 of `a`, write `&a[0]`.
131. PIL supports `$input`/`$output` variables in global scope (like CIVL-C)
141. PIL identifiers are the same as C's, e.g., `x`, `f10`, ...
151. Keywords, built-in or library functions, constants, and types not in C start with `$` (like CIVL-C)
161. PIL supports libraries: similar to C, `#include <stdlib.h>`, but these name PIL libraries and there are additional standard PIL libraries not in the C standard library
171. Program order:
18 1. A program is a set of variable, function, and type definitions.
19 1. The order is totally irrelevant. [NOTE: except for initializers?]
20 1. A variable can be used anywhere it is in scope.
21 1. A function can be called anywhere it is in scope.
221. PIL programs can be divided into multiple files (translation units)
23 1. One TU can refer to a variable, function, or type, in another TU.
24 1. The variable just needs to be declared somewhere in the TU.
25 1. The function just needs a prototype somewhere in the TU.
26 1. The use of `static` in a variable declaration or function definition makes it private to that TU (so two can have same name), as in C.
27 1. There is no need for `extern` so this is not in PIL.
28 1. At most one of the TUs declaring the variable can have an initializer.
29 1. At most one of the TUs can have a definition for a function.
30
31## Types
32
33### Basic properties of types
34
35**type names** are used for all declarations. There are no C declarators. Examples:
36
37* `$int[] a`: declares `a` to be an array of integer
38* `$int* p`: pointer to integer
39* `$int*[] b`: array of pointer to integer
40* `$int[]* q`: pointer to array of integer
41* `$int*[]($real) f`: function from Real to array of pointer to integer
42
43Type definitions have the form: `typedef typename ID;`.
44
45Every type has a default value. The result of evaluating an erroneous expression is the default value of that type. (But as explained above, an analyzer may check for all errors.)
46
47### The types
48
49* basic types:
50 * `$bool` : the set consisting of `$true` and `$false`. Default value: `$false`.
51 * `$int`: the set of integers. Default value: 0.
52 * `$real`: the set of Reals. Default value: 0.0.
53 * `$int<N>`: set of integers [-N, N-1]. `N` is a constant positive integer expression so is statically known. Underflows or overflows are erroneous. Default value: 0.
54 * `$uint<N>`: set of integers [0,N-1]. `N` is a constant positive integer expression so is statically known. Arithmetic is performed modulo `N` (like C's unsigned integer types). Default value: 0.
55 * `$float<p,emax>`: the set of IEEE binary floating point numbers with precision `p` and emax `emax`. Both `p` and `emax` are constant integer expressions. Default value: 0.0.
56* `T*`: pointer to `T`. Default value: `NULL`.
57* `struct TAG { T1 f1; ... Tn fn; }`: a C struct. Default value: the struct value in which each field has its default value.
58* `union`: a C union. Default value: the union value in which the first field has its default value.
59* `T[]`: sequence of `T`. Note: there is no "T[n]". Sequences are first-class values: they can be assigned, returned, passed as arguments, etc. Default value: the sequence of length 0.
60* `R(T1, ..., Tn)`: the type of a procedure which consumes values in `T1`, ..., `Tn` and returns a value in `R`. This is basically C's function type. Default value: the procedure which does nothing (has no side-effects) and returns the default value of `R` for any inputs.
61* `$fun<T1,T2>` : "logic functions": deterministic, total, side-effect free functions from `T1` to `T2`. Note however the function may depend on the state (i.e., the state should be considered a hidden input). A `$pure` function is a function that does not depend on the state. Default value: the function that returns the default value of `T2` on any input.
62* `$set<T>` : finite set of `T`. Default value: the empty set.
63* `$map<T1,T2>` : finite map from `T1` to `T2`. A map is a set of ordered pairs `(x,y)` with the property that if `(x,y)` and `(x,z)` are in the map, then `y`=`z`. Default value: the empty map.
64* `$tuple<T1,...>` : tuples of specified type. This is similar to `struct`, but there is no tag and the fields do not have names. Default value: the tuple in which each component has its default value.
65* `$obj` : the object type. This is a type for representing a sequence of bytes holding arbitrary values at different intervals. It is used to model C objects, such as the object created by a call to C's `malloc`.
66
67### Sized types
68
69The types above are the *unsized types* A *sized type* comprises an unsized type and a positive integer, which represents the size of the type in bytes. Syntactically, a sized type is specified
70
71```
72 $size(expr) typename
73```
74
75where expr is an integer constant expression and typename is the name of an unsized type. For example,
76
77```
78typedef $size(1) $int char;
79typedef $size(4) $int int;
80typedef $size(8) $int long;
81typedef $size(4) $real float;
82typedef $size(8) $real double;
83```
84
85defines 5 distinct types. Certain operations require a sized type.
86
87Restrictions on sized types:
88
89* an array type cannot be sized
90* the type `$obj` cannot be sized
91* if a struct type is sized and all of its fields have sized types then the size of the struct
92must be the sum of the sizes of the fields
93* if a union type is sized then the size of the union must be greater than or equal to the
94size of any sized type field
95
96Note on translation from C: C structs can have "padding" after each field. This can be modeled
97in PIL by making the padding explicit, e.g.,
98```
99struct Pair {
100 int i;
101 char[2] _padding1;
102 double x;
103 char[4] _padding2;
104```
105
106Except... you can do things like `char[2]` in PIL, hmmm, have to think about this.
107
108
109## Functions
110
111There are two kinds of functions in PIL:
112
1131. Procedural functions = "procedures". A procedure has a type of the form `R(T1,...,Tn)` where `R` is the return type and the `Ti` are the input types.
1141. Logic functions. One of these has a type of the form `$fun<R,T1,...,Tn>`.
115
116### Procedures
117
118A procedure is similar to a C function. It consumes some values of a specified type and possibly returns a value of a specified type. Procedure definitions look like C function definitions:
119```
120 R f(T1 x1, ..., Tn xn) { stmts }
121```
122defines a procedure named `f` which consumes inputs of types `T1`, ..., `Tn` and returns a value of type `R`. `R` can be `void` if the procedure does not return a value.
123
124The definition above defines a **constant** `f` of type **`R(T1, ..., Tn)`**. Procedures are first-class values. One may declare a variable of type `R(T1, ..., Tn)`, a procedure may return a value of that type, a procedure may consume a value of that type, a value of that type may be assigned to a variable, etc. Hence the procedure type is just like any other type, and procedure definitions define new constants of that type, just as `1` is a constant of type `$int`. Note this is different from C in that C uses function pointers; PIL dispenses with the need for function pointers and just uses functions.
125
126A **procedure call expression** has the usual form `g(e1, ..., en)`. This is an expression that can be used anywhere an expression with side-effects is allowed. Here, `g` is an expression of functional type, say `R(T1, ..., Tn)`, and `ei` is an expression of type `Ti` (for i=1, ..., n). The procedure call expression has type `R`.
127
128Procedure calls can have side-effects, be nondeterministic, and the behavior can depend on non-local state; they may access any variable in scope, the statements may dereference pointers, etc.
129
130```
131 int f(int x) { return x+1; } // f is a constant of type int(int)
132 int callon1( (int)int g ) {
133 return g(1);
134 }
135 ...
136 int y = callon1(f); // y is now 2
137```
138
139Procedure definitions can be nested. It is an error to call a procedure `f` when `f` is not in scope. (This is similar to GNU C.) In other words, if the call takes place in dyscope d, then the definition of `f` must be in d's static scope, or in the parent of that scope, or its parent, etc.
140
141There is a second way to specify a procedure, using a lambda expression, which is described below.
142
143### Logic functions
144
145Logic functions are a certain class of functions that have no side-effects, and are deterministic total functions of their arguments and the current state. A logic function has a type of the form `$fun<R,T1,...,Tn>`, which signifies the set of logical functions which consume inputs of type `T1`, ..., `Tn` and return a value of type `R`.
146
147Logic functions are also first-class objects in PIL. An application of a logic function `f(x1,... ,xn)` is a side-effect-free expression that can be used anywhere an expression is allowed. A logic function is not necessarily pure, i.e., the value of an application may depend on any part of the state, not just the arguments.
148
149Despite the apparent similarity with procedures, logic functions and procedures are clearly distinguished and one cannot be converted to another.
150
151A logic function can be defined as follows:
152```
153 $logic R f(T1 x1, ..., Tn xn) = expr;
154```
155where `expr` is a side-effect-free expression of type `R` and can refer to any variables in scope.
156
157### Misc.
158
159Both procedure and logic function definitions can be templated, e.g.,
160```
161 <T1,T2> int f($map<T1,T2> f, T1 x) { ... }
162 <T1,T2> $logic int g($map<T1,T2> f, T1 x) { ... }
163```
164This defines one procedure or logic function for each assignment of types to the `Ti`.
165
166Both kinds of functions can be declared without providing definitions, indicating that the definition can be found in a different translation unit:
167```
168 int f(int x);
169 $logic int g(int x);
170```
171
172### Lambda expressions
173
174Lambda expressions can be used to define functions that are anonymous and that are **closures**, i.e., have an associated environment that persists for the life of the function.
175
176A lambda expression that specifies a procedure closure has the form:
177```
178 $lambda [U1 v1=init1; ... Um vm=initm;] R (T1 x1, ..., Tn xn) { stmts }
179```
180where
181
182* the `Ti` and `Uj` are types
183* the `xi` and `vj` are variables
184* the `initj` are expressions that can refer to any variables in scope
185* R is a type (the return type), which may be void
186* {S1; ...} is a block (same as in a procedure definition)
187* if R is not void, the block must return a value of type R
188* the only variables that can occur free in the block are the `xi` and `vj`.
189
190The type of this expression is `R(T1, ..., Tn)`. The resulting value is a procedure which can be called or assigned to a variable, etc., just like any other procedure value.
191
192Note that the definition can only use the specified variables. Evaluating this expression yields a closure, which is a pair consisting of a dyscope and the body of the procedure. The dyscope has variables `vj`, which are initialized by evaluating the `initj` when the lambda expression is evaluated. The body of the procedure may read and write to the `vj`. That dyscope has no parent and will live as long as the procedure is around. Hence a function may return a closure and that closure may still be called at any time, anywhere in the program, regardless of whether the original lambda expression is in scope.
193
194When a procedure closure is called, a new dyscope is created whose parent dyscope is the dyscope of the closure. In the new dyscope, the formal parameters are assigned the actual values and procedure is executed in that scope. When it returns, the new dyscope is removed.
195
196A lambda expression that specifies a logic function has the form
197```
198 $lambda [U1 v1=init1; ... Um vm=initm;] R (T1 x1, ..., Tn xn) expr
199```
200where
201
202* the `Ti` and `Uj` are types
203* the `xi` and `vj` are variables
204* the `initj` are expressions that can refer to any variables in scope
205* R is a type (the return type), which cannot be void
206* `expr` is a side-effect-free expression of type R
207* the only variables that can occur free in `expr` are the `xi` and `vj`.
208
209The type of this expression is `$fun<R, T1, ..., Tn>`. As with procedural lambdas, this yields a logic function with a dynamic scope that persists, so can be called anywhere, even after the lambda expression goes out of scope.
210
211## Tuples
212
213Expressions: these are all side-effect-free, assuming `expr1`, ... are side-effect-free.
214
215* `t1 == t2`
216 * the two tuples have the same type and corresponding components are equal
217* `t.(i)`
218 * the value of component `i` (an integer constant) of tuple `t`. Syntax error if `i` is out of range for the tuple type.
219* `($tuple<T1,...>){ expr1, ... }`
220 * the literal tuple with the given list of components.
221* `t[.(i) := expr1]`
222 * the tuple which is the same as `t`, except for component `i` (an integer constant), which has value `expr1`. Syntax error if `i` is out of range.
223
224Mutating expressions:
225
226* `t.(i) = expr` : sets component `i` of tuple `t` to `expr`. Here `i` is an integer constant; it is a syntax error if `i` is out of range.
227
228## Sets
229
230Side-effect free expressions:
231
232* `s1 == s2`
233 * the two sets have the same type and contain the same elements
234* `($set<T>)$empty`
235 * empty set of type T
236
237Logic functions:
238
239* `$pure $logic $bool $set_contains($set<T> s, T x);`
240 * is x an element of s?
241* `$pure $logic $set<T> $set_with($set<T> s, T x);`
242 * s U {x}
243* `$pure $logic $set<T> $set_without($set<T> s, T x);`
244 * s - {x}
245* `$pure $logic $set<T> $set_union($set<T> s1, $set<T> s2);`
246 * s1 U s2
247* `$pure $logic $set<T> $set_difference($set<T> s1, $set<T> s2);`
248 * s1-s2
249* `$pure $logic $set_intersection($set<T> s1, $set<T> s2);`
250 * s1 \cap s2
251* `$pure $logic T[] $set_elements($set<T> s);`
252 * returns the set of elements of s as an array in some deterministic order
253* `$pure $logic $bool $set_subset($set<T> s1, $set<T> s2);`
254 * is s1 a subset of s2?
255* `$pure $logic $set<U> $set_map($set<T> s, $fun<T,U> f);`
256 * { f(x) | x in s}
257* `$pure logic $set<T> $set_comprehension($set<T>, $fun<T,$bool> p);`
258 * { x in s | p(x) }
259
260Mutating procedures:
261
262* `$bool $set_add($set<T> * this, T x);`
263 * adds `x` to `this`. Returns `$true` iff `x` was not in `this`, i.e., if the operation results in a change to the set.
264* `$bool $set_remove($set<T> * this, T x);`
265 * removes `x` from `this`. Returns `$true` iff `x` was in `this`, i.e., if the operation results in a change to the set.
266* `void $set_addAll($set<T> * this, $set<T> that);`
267 * adds all of the elements of `that` to `this`
268* `void $set_removeAll($set<T> * this, $set<T> that);`
269 * removes all of the elements of `that` from `this`
270* `void $set_keepOnly($set<T> * this, $set<T> that);`
271 * removes any elements of `this` which is not in `that` from `this`
272
273## Sequences (arrays)
274
275Expressions:
276
277* `a1 == a2`
278 * the two sequences have the same type and length and corresponding elements are equal
279* `a[i]`
280 * the `i`-th element of `a`. Here `a` is an expression of type `T[]` and `i` is an expression of type `$int`. Requires `i` to be in range, i.e., `i` is in [0,n-1], where n is the length of `a`. (Otherwise the expression is erroneous.)
281* `a[i:=x]`
282 * the sequence which is the same as `a`, except in position `i`, where it has value `x`. Here `a` is an expression of type `T[]`, `i` is an expression of type `$int`, and `x` is an expression of type `T`. Requires `i` to be in range for `a`.
283* `(T[]){ expr1, ... }`
284 * the sequence obtained by evaluating the given expressions, each of which has type `T`
285
286Logic functions:
287
288* `$pure $logic T[] $seq_fun($int len, $fun<$int,T> f);`
289 * the sequence of length `len` whose value at position `i` is `f(i)`
290* `$pure $logic T[] $seq_uniform($int n, T val);`
291 * the sequence of length `n` in which every component is `val`
292* `$pure $logic $int $length(T[] a);`
293 * the length of `a`
294* `$pure $logic T[] $seq_subseq(T[] a, int i, int n);`
295 * the sequence of length n, a[i..i+n-1]. Requires that i..i+n-1 are all in range for `a`
296* $pure $logic T[] $seq_with(T[] a, int i, T x);`
297 * the sequence obtained from `a` by inserting `x` at position `i` and shifting the subsequent elements up. Requires `i` to be in [0,n], where `n` is the length of `a`.
298* `$pure $logic T[] $seq_without(T[] a, int i);`. Requires `i` to be in [0,n-1], where `n` is the length of `a`.
299 * the sequence obtained from `a` by deleting the element in position `i` and shifting subsequent elements down
300* `$pure $logic T[] $seq_concat(T[] a1, T[] a2);`
301 * the sequence obtained by concatenating the two given ones
302* `$pure $logic U[] $seq_map(T[] a, $fun<T,U> f);`
303 * the sequence obtained by applying `f` to every element of `a`
304* `$pure $logic T[] $seq_filter(T[] a, $fun<T,$bool> f);`
305 * the sequence obtained from `a` by removing every element for which `f(a[i])` is `$false`
306* `$pure $logic U $seq_foldl(T[] a, $fun<$tuple<T,U>,U> f, U init);`
307 * the sequence b which has the same length as `a`, and where `b[0]`=`f(init, a[0])`, `b1`=`f(b[0], a[1])`, `b2=f(b[1], a[2])`, etc.
308* `$pure $logic U $seq_foldr(T[] a, $fun<$tuple<T,U>,U> f, U init);`
309 * the sequence b which has the same length as `a`, and where `b[n-1]`=`f(a[n-1], init)`, `b[n-2]`=`f(a[n-2], b[n-1])`, etc.
310
311Mutating expressions:
312
313* `a[i]=x;`
314 * assigns value of `x` to `a[i]` and returns value of `x`. Here `a` has array type, say `T[]`, `x` is an expression of type `T`, and `i` is an expression of type `$int`
315
316Mutating procedures:
317
318* `T $seq_remove(T[] * this, int i);`
319* `void $seq_insert(T[] * this, int i, T x);`
320* `void $seq_append(T[] * this, T[] that);`
321
322## Maps
323
324Side-effect-free expressions:
325
326* `m1 == m2`
327 * the two maps have the same type and the same entries
328* `($map<K,V>)$empty`
329 * the empty map from `K` to `V`
330
331Logic functions:
332
333* `$pure $logic V $map_get($map<K,V> K key);`
334* `$pure $logic $bool $map_containsKey($map<K,V> map, K key);`
335* `$pure $logic $bool $map_containsValue($map<K,V> map, V val);`
336* `$pure $logic $map<K,V> $map_with($map<K,V> map, K key, V val);`
337* `$pure $logic $map<K,V> $map_without($map<K,V> map, K key);`
338* `$pure $logic $set<K> $map_keys($map<K,V> map);`
339* `$pure $logic $set<$tuple<K,V>> $map_entries($map<K,V> map);`
340
341Mutating procedures:
342
343* `V $map_put($map<K,V> * this, K key, V val);`
344* `V $map_remove($map<K,V> * this, K key);`
345* `void $map_removeAll($map<K,V> * this, $set<K> keys);`
346* `void $map_addAll($map<K,V> * this, $map<K,V> that);`
347
348## Heaps
349
350There is a single heap for dynamic allocation. The following built-in procedures are provided:
351
352* `T* $new<T>()`
353 * creates an object `A` of type `T` in heap and returns `&A`.
354* `T* $alloc<T>(int n)`
355 * creates an array `A` of length `n` of `T` in heap and returns `&A[0]`.
356* `void $free<T>(T* p)`
357 * frees the object referred to by `p`, the pointer returned by an earlier call to `$new` or `$alloc`.
358
359## Objects
360
361* `$obj $empty_obj($int size);`
362 * this function returns the empty object spanning `size` bytes. Only values of a sized type can be written to an object.
363* `void* $obj_base($obj* optr);`
364 * returns a pointer to byte 0 of the object. That pointer can be cast to different pointer types in order to access the object.
365
366An object is modified through a pointer into the object, for example:
367```
368typedef $size(4) $int int;
369typedef $size(8) $real double;
370$obj o1 = $empty_obj(1000);
371double* dp = (double*)$obj_base(&o1); // points to byte 0
372*dp = 3.14;
373*(dp+7) = 2.718;
374int* ip = (int*)dp;
375*(ip+2) = 17;
376```
377The operations above place the double value 3.14 in `o1` at positions 0..7, the double value 2.718 at positions 28..35, and the integer value 17 at positions 8..11.
378
379If a write to an object overlaps the interval of an existing value, the old value is removed.
380
381The values in an object are also read using pointers.
382
383
384
385## Questions
386
387* how to deal with "undefined" values?
388* can there be nondeterministic expressions, i.e., expressions that evaluate to a set of values instead of one value?
389* can there be nondeterministic statements, i.e., statements which result in multiple transitions
390* how to implement C's malloc?
391
Note: See TracBrowser for help on using the repository browser.