Context Navigation

← Previous Change
Wiki History
Next Change →

Changes between Initial Version and Version 1 of AST

Timestamp:: 04/18/11 11:54:05 (15 years ago)
Author:: Stephen Siegel
Comment:: --

Legend:

: Unmodified
: Added
: Removed
: Modified

AST

               v1
+= Abstract Syntax Tree =
+Requirements:
+ * Language neutral: to the extent possible one common AST should be used for C, C++, Fortran(s)
+ * Should be easy to generate model from AST
+ * Should be easy to generate AST from parse tree
+ * AST may go through many processing stages; at each stage it is still an AST
+Questions:
+ * How to handle pre-processor macros?   Include in AST?
+ * How abstract should the AST be?
+ * Should it contain semantic information, e.g., types, and variables?
+ * How will it handle things like (foo)*bar: this could be either a cast of *bar to type foo, or it could be the product of foo and bar; you need to know whether foo defines a type, which is some semantic information
+   * cf. http://www.computing.surrey.ac.uk/research/dsrg/fog/CxxGrammar.y
+   * approach: just choose one way, then change in later pass when analyzing
+== Elements of a TASS AST ==
+See description of CIL AST:
+ * http://www.cs.berkeley.edu/~necula/cil/api/
+ * http://www.cs.berkeley.edu/~necula/cil/api/Cil.html
+Preprocessing: before an AST is created, the source file(s) are preprocessed to create a stream of tokens with complete source information.  This stream is fed to the parser which creates the AST.
+All AST nodes have source information.
+Types of AST Node
+ * identifiers
+    * name: string
+ * type node
+    * void
+    * integer
+       * integer sub-types, specified by parameters
+    * real
+       * real sub-stypes, specified by parameters (e.g., IEEE754 floating point numbers)
+    * boolean
+    * char
+    * array of t
+       * and possible extent?
+    * pointer to t
+    * record ("struct" in C)
+       * name
+       * sequence of fields, each with type and name
+    * function from T1 to T2
+    * enumeration type
+       * name
+       * sequence of elements
+ * function declaration node (no body)
+ * function definition node (body)
+ * type definition node (typedef...)
+ * statements (may have label)
+   * assign (translate x+=a, x*=a, ...)
+   * assert
+   * assume
+   * pragma (any kind of pragma, represented as just a string)
+      * string
+      * assert, assume, invariant, input, output, ...
+   * case statement (select)
+   * if-then, if-then-else
+   * while
+   * for
+   * until
+   * break
+   * continue
+   * goto
+   * return
+   * no-op
+   * block ({...})
+      * variable declaration section
+      * sequence of statements
+   * variable declaration
+      * with possible initialization expression
+      * possible array extents information and other information modifying type?
+      * storage class: automatic, static, extern, ...?
+   * expressions (side-effect-free)
+      * literals (including named literals): integers, reals, strings, chars
+      * variable
+      * operators
+         * +,-,*,/,%,<,<=,>,>=,==,!=,!,&&, | |, (x?y:z), bitand, bitor, bitxor, lshift, rshift, & (address-of), * (dereference)
+       * cast-to-t
+       * a[i] (array index)
+       * x.a (record navigation)
+      * quantifiers: \forall, \exists, \uniform, \sum
+      * initializers (special kind of expression used to initialize variables in their decl)
+      * sizeof (type, expression, or string literal)
+      * startOf(a): &a[0] -- that is the Cil way of converting an array to pointer to first element
+      * function invocation f(x) when f is abstract (pure) function
+   * expressions with side-effects
+      * assignments: x=expr, x++, x--, ++x, --x
+      * function invocation f(x) when f is concrete and return type of f is non-null
+== Processing Stages ==
+The AST can be used to represent the program at different stages of translation.
+Stages
+. The input source file(s) are pre-processed, creating a stream of tokens with source information
+. The token stream is parsed to produce an AST in which all variables, types, etc., are represented simply as identifiers
+. The pragma strings are parsed and processed, and those AST nodes are filled out
+. Symbol table information is associated to the identifiers in the AST
+. (Static) types are created and associated to every expression (use the types in the model package?)
+. Variable objects are created and inserted into AST
+. Side effects are removed by introducing temp variables
+. Model is produced
+== Modules ==
+ * CPreProcessor: an ANTLR-generated parser that takes source file and produces token stream, applies pre-processing rules, produces new token stream
+ * CParser: an ANTRL-generated parser that takes stream of sourced tokens from pre-processor and produces AST stage 0
+ * AST, ASTNode