| 3 | | Questions: |
| 4 | | * How to handle pre-processor macros? Include in AST? |
| 5 | | * How abstract should the AST be? |
| 6 | | * Should it contain semantic information, e.g., types, and variables? |
| 7 | | * How will it handle things like (foo)*bar: this could be either a cast of *bar to type foo, or it could be the product of foo and bar; you need to know whether foo defines a type, which is some semantic information |
| 8 | | * cf. http://www.computing.surrey.ac.uk/research/dsrg/fog/CxxGrammar.y |
| 9 | | * approach: just choose one way, then change in later pass when analyzing |
| 10 | | |
| 11 | | == Elements of a TASS AST == |
| 12 | | |
| 13 | | See description of CIL AST: |
| 14 | | * http://www.cs.berkeley.edu/~necula/cil/api/ |
| 15 | | * http://www.cs.berkeley.edu/~necula/cil/api/Cil.html |
| 16 | | |
| 17 | | |
| 18 | | Preprocessing: before an AST is created, the source file(s) are preprocessed to create a stream of tokens with complete source information. This stream is fed to the parser which creates the AST. |
| 19 | | |
| 20 | | All AST nodes have source information. |
| 21 | | |
| 22 | | Types of AST Node |
| 23 | | * identifiers |
| 24 | | * name: string |
| 25 | | * type node |
| 26 | | * void |
| 27 | | * integer |
| 28 | | * integer sub-types, specified by parameters |
| 29 | | * real |
| 30 | | * real sub-stypes, specified by parameters (e.g., IEEE754 floating point numbers) |
| 31 | | * boolean |
| 32 | | * char |
| 33 | | * array of t |
| 34 | | * and possible extent? |
| 35 | | * pointer to t |
| 36 | | * record ("struct" in C) |
| 37 | | * name |
| 38 | | * sequence of fields, each with type and name |
| 39 | | * function from T1 to T2 |
| 40 | | * enumeration type |
| 41 | | * name |
| 42 | | * sequence of elements |
| 43 | | * function declaration node (no body) |
| 44 | | * function definition node (body) |
| 45 | | * type definition node (typedef...) |
| 46 | | * statements (may have label) |
| 47 | | * assign (translate x+=a, x*=a, ...) |
| 48 | | * assert |
| 49 | | * assume |
| 50 | | * pragma (any kind of pragma, represented as just a string) |
| 51 | | * string |
| 52 | | * assert, assume, invariant, input, output, ... |
| 53 | | * case statement (select) |
| 54 | | * if-then, if-then-else |
| 55 | | * while |
| 56 | | * for |
| 57 | | * until |
| 58 | | * break |
| 59 | | * continue |
| 60 | | * goto |
| 61 | | * return |
| 62 | | * no-op |
| 63 | | * block ({...}) |
| 64 | | * variable declaration section |
| 65 | | * sequence of statements |
| 66 | | * variable declaration |
| 67 | | * with possible initialization expression |
| 68 | | * possible array extents information and other information modifying type? |
| 69 | | * storage class: automatic, static, extern, ...? |
| 70 | | * expressions (side-effect-free) |
| 71 | | * literals (including named literals): integers, reals, strings, chars |
| 72 | | * variable |
| 73 | | * operators |
| 74 | | * +,-,*,/,%,<,<=,>,>=,==,!=,!,&&, | |, (x?y:z), bitand, bitor, bitxor, lshift, rshift, & (address-of), * (dereference) |
| 75 | | * cast-to-t |
| 76 | | * a[i] (array index) |
| 77 | | * x.a (record navigation) |
| 78 | | * quantifiers: \forall, \exists, \uniform, \sum |
| 79 | | * initializers (special kind of expression used to initialize variables in their decl) |
| 80 | | * sizeof (type, expression, or string literal) |
| 81 | | * startOf(a): &a[0] -- that is the Cil way of converting an array to pointer to first element |
| 82 | | * function invocation f(x) when f is abstract (pure) function |
| 83 | | * expressions with side-effects |
| 84 | | * assignments: x=expr, x++, x--, ++x, --x |
| 85 | | * function invocation f(x) when f is concrete and return type of f is non-null |
| 86 | | |
| 87 | | |
| 88 | | |
| 89 | | == Processing Stages == |
| 90 | | |
| 91 | | The AST can be used to represent the program at different stages of translation. |
| 92 | | |
| 93 | | Stages |
| 94 | | 1. The input source file(s) are pre-processed, creating a stream of tokens with source information |
| 95 | | 1. The token stream is parsed to produce an AST in which all variables, types, etc., are represented simply as identifiers |
| 96 | | 1. Symbol table information is associated to the identifiers in the AST |
| 97 | | 1. (Static) types are created and associated to every expression (use the types in the model package?) |
| 98 | | 1. Variable objects are created and inserted into AST |
| 99 | | 1. Side effects are removed by introducing temp variables |
| 100 | | 1. --Local variables are all lifted to outermost scope?-- Scopes will be added to model layer |
| 101 | | 1. Model is produced |
| 102 | | |
| 103 | | |
| 104 | | Modules |
| 105 | | * CPreProcessor: an ANTLR-generated parser that takes source file and produces token stream, applies pre-processing rules, produces new token stream |
| 106 | | * CParser: an ANTRL-generated parser that takes stream of sourced tokens from pre-processor and produces AST stage 0 |
| 107 | | * AST, ASTNode |