| Version 4 (modified by , 15 years ago) ( diff ) |
|---|
Abstract Syntax Tree
Requirements:
- Language neutral: to the extent possible one common AST should be used for C, C++, Fortran(s)
- Should be easy to generate model from AST
- Should be easy to generate AST from parse tree
- AST may go through many processing stages; at each stage it is still an AST
Questions:
- How to handle pre-processor macros? Include in AST?
- How abstract should the AST be?
- Should it contain semantic information, e.g., types, and variables?
- How will it handle things like (foo)*bar: this could be either a cast of *bar to type foo, or it could be the product of foo and bar; you need to know whether foo defines a type, which is some semantic information
- cf. http://www.computing.surrey.ac.uk/research/dsrg/fog/CxxGrammar.y
- approach: just choose one way, then change in later pass when analyzing
Elements of a TASS AST
See description of CIL AST:
Preprocessing: before an AST is created, the source file(s) are preprocessed to create a stream of tokens with complete source information. This stream is fed to the parser which creates the AST.
All AST nodes have source information.
Types of AST Node
- identifiers
- name: string
- type node
- identifier
- this is a type that is just a name, e.g., resulting from a C typedef. This will node will eventually be replaced by a more specific type tree once type definitions are processed.
- void
- integer
- integer sub-types, specified by parameters
- real
- real sub-stypes, specified by parameters (e.g., IEEE754 floating point numbers)
- boolean
- char
- array of t
- and possible extent?
- pointer to t
- record ("struct" in C)
- name
- sequence of fields, each with type and name
- function from T1 to T2
- enumeration type
- name
- sequence of elements
- identifier
- variable definition
- scope
- name (together with scope, this uniquely identifies the variable)
- type
- variable declaration: each declaration has an associated definition. There can be many decls associated to one def. This is a linking thing: a single variable that is to be used in several files will be defined in one file but be declared in all
- scope
- name
- type
- function definition: a function with a body. As with variables, one def. can have many decls, but each decl is associated to at most one definition.
- name
- list of formal names and types
- return type
- statement (body)
- function declaration: no body, just "prototype" in C
- type definition node
- name: string
- theType: type (type being assigned a name)
- statements (may have label)
- assign: lhs:=rhs; Note statements like x+=a, x*=a will be immediately translated to x:=x+1, x:=x*a, etc.
- lhs: LeftHandSideExpression
- rhs: expression
- assert
- expr: Expression
- assume
- expr: Expression
- pragma (any kind of pragma, represented as just a string)
- string: this is an initial form, where the text of the pragma is uninterpreted. After a certain processing stage, TASS pragmas will be parsed and replaced with more specific nodes. Non-tass pragmas will just remain as strings and will be ignored by TASS.
- assert, assume, invariant, input, output, ...
- switch statement: this can be used to model C's switch
- expression: expression that is evaluated
- list of cases
- value
- statement (or something like that)
- if-then, if-then-else
- while
- for
- until
- break
- continue
- goto
- return
- no-op
- block ({...})
- variable declaration section
- sequence of statements
- variable declaration
- with possible initialization expression
- possible array extents information and other information modifying type?
- storage class: automatic, static, extern, ...?
- expressions: these are considered a kind of statement (as in C)
- pure expressions (side-effect-free)
- literals (including named literals): integers, reals, strings, chars
- variable
- operators
- +,-,*,/,%,<,<=,>,>=,==,!=,!,&&, | |, (x?y:z), bitand, bitor, bitxor, lshift, rshift, & (address-of), * (dereference)
- cast-to-t
- a[i] (array index)
- x.a (record navigation)
- quantifiers: \forall, \exists, \uniform, \sum
- initializers (special kind of expression used to initialize variables in their decl)
- sizeof (type, expression, or string literal)
- startOf(a): &a[0] -- that is the Cil way of converting an array to pointer to first element
- function invocation f(x) when f is abstract (pure) function
- expressions with side-effects
- assignments: x=expr, x++, x--, ++x, --x
- function invocation f(x) when f is concrete and return type of f is non-null
- pure expressions (side-effect-free)
- assign: lhs:=rhs; Note statements like x+=a, x*=a will be immediately translated to x:=x+1, x:=x*a, etc.
Processing Stages
The AST can be used to represent the program at different stages of translation.
Stages
- The input source file(s) are pre-processed, creating a stream of tokens with source information
- #include
- #define X 100 /* object-like macro */
- #define min(X,Y) (X<=Y ? X : Y) /* function-like macro */
- #ifdef...#endif
- The token stream is parsed to produce an AST in which all variables, types, etc., are represented simply as identifiers
- The pragma strings are parsed and processed, and those AST nodes are filled out
- Symbol table information is associated to the identifiers in the AST
- (Static) types are created and associated to every expression (use the types in the model package?)
- Variable objects are created and inserted into AST
- Side effects are removed by introducing temp variables
- Model is produced
Modules
- CPreProcessor: an ANTLR-generated parser that takes source file and produces token stream, applies pre-processing rules, produces new token stream
- CParser: an ANTRL-generated parser that takes stream of sourced tokens from pre-processor and produces AST stage 0
- AST, ASTNode
Note:
See TracWiki
for help on using the wiki.
