Changes between Initial Version and Version 1 of AST


Ignore:
Timestamp:
04/18/11 11:54:05 (15 years ago)
Author:
Stephen Siegel
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • AST

    v1 v1  
     1= Abstract Syntax Tree =
     2
     3Requirements:
     4 * Language neutral: to the extent possible one common AST should be used for C, C++, Fortran(s)
     5 * Should be easy to generate model from AST
     6 * Should be easy to generate AST from parse tree
     7 * AST may go through many processing stages; at each stage it is still an AST
     8
     9Questions:
     10 * How to handle pre-processor macros?   Include in AST?
     11 * How abstract should the AST be?
     12 * Should it contain semantic information, e.g., types, and variables?
     13 * How will it handle things like (foo)*bar: this could be either a cast of *bar to type foo, or it could be the product of foo and bar; you need to know whether foo defines a type, which is some semantic information
     14   * cf. http://www.computing.surrey.ac.uk/research/dsrg/fog/CxxGrammar.y
     15   * approach: just choose one way, then change in later pass when analyzing
     16
     17== Elements of a TASS AST ==
     18
     19See description of CIL AST:
     20 * http://www.cs.berkeley.edu/~necula/cil/api/
     21 * http://www.cs.berkeley.edu/~necula/cil/api/Cil.html
     22
     23Preprocessing: before an AST is created, the source file(s) are preprocessed to create a stream of tokens with complete source information.  This stream is fed to the parser which creates the AST.
     24
     25All AST nodes have source information.
     26
     27Types of AST Node
     28 * identifiers
     29    * name: string
     30 * type node
     31    * void
     32    * integer
     33       * integer sub-types, specified by parameters
     34    * real
     35       * real sub-stypes, specified by parameters (e.g., IEEE754 floating point numbers)
     36    * boolean
     37    * char
     38    * array of t
     39       * and possible extent?
     40    * pointer to t
     41    * record ("struct" in C)
     42       * name
     43       * sequence of fields, each with type and name
     44    * function from T1 to T2
     45    * enumeration type
     46       * name
     47       * sequence of elements
     48 * function declaration node (no body)
     49 * function definition node (body)
     50 * type definition node (typedef...)
     51 * statements (may have label)
     52   * assign (translate x+=a, x*=a, ...)
     53   * assert
     54   * assume
     55   * pragma (any kind of pragma, represented as just a string)
     56      * string
     57      * assert, assume, invariant, input, output, ...
     58   * case statement (select)
     59   * if-then, if-then-else
     60   * while
     61   * for
     62   * until
     63   * break
     64   * continue
     65   * goto
     66   * return
     67   * no-op
     68   * block ({...})
     69      * variable declaration section
     70      * sequence of statements
     71   * variable declaration
     72      * with possible initialization expression
     73      * possible array extents information and other information modifying type?
     74      * storage class: automatic, static, extern, ...?
     75   * expressions (side-effect-free)
     76      * literals (including named literals): integers, reals, strings, chars
     77      * variable
     78      * operators
     79         * +,-,*,/,%,<,<=,>,>=,==,!=,!,&&, | |, (x?y:z), bitand, bitor, bitxor, lshift, rshift, & (address-of), * (dereference)
     80       * cast-to-t
     81       * a[i] (array index)
     82       * x.a (record navigation)
     83      * quantifiers: \forall, \exists, \uniform, \sum
     84      * initializers (special kind of expression used to initialize variables in their decl)
     85      * sizeof (type, expression, or string literal)
     86      * startOf(a): &a[0] -- that is the Cil way of converting an array to pointer to first element
     87      * function invocation f(x) when f is abstract (pure) function
     88   * expressions with side-effects
     89      * assignments: x=expr, x++, x--, ++x, --x
     90      * function invocation f(x) when f is concrete and return type of f is non-null
     91
     92     
     93
     94== Processing Stages ==
     95
     96The AST can be used to represent the program at different stages of translation.
     97
     98Stages
     99 1. The input source file(s) are pre-processed, creating a stream of tokens with source information
     100 1. The token stream is parsed to produce an AST in which all variables, types, etc., are represented simply as identifiers
     101 1. The pragma strings are parsed and processed, and those AST nodes are filled out
     102 1. Symbol table information is associated to the identifiers in the AST
     103 1. (Static) types are created and associated to every expression (use the types in the model package?)
     104 1. Variable objects are created and inserted into AST
     105 1. Side effects are removed by introducing temp variables
     106 1. Model is produced
     107
     108
     109== Modules ==
     110 * CPreProcessor: an ANTLR-generated parser that takes source file and produces token stream, applies pre-processing rules, produces new token stream
     111 * CParser: an ANTRL-generated parser that takes stream of sourced tokens from pre-processor and produces AST stage 0
     112 * AST, ASTNode