COMP3131/9102: Programming Languages and Compilers

Jingling Xue
School of Computer Science and Engineering
The University of New South Wales
Sydney, NSW 2052, Australia
http://www.cse.unsw.edu.au/~cs3131
http://www.cse.unsw.edu.au/~cs9102
Copyright @2018, Jingling Xue
COMP3131/9102 Page 257 March 26, 2018

Feedback for Assignment 2
• Not required to handle errors. The output is specified precisely in the spec ("successful" or "unsuccessful").
• The empty program (i.e., ε) is legal (as in C, C++ and Java).
• The following declaration is (syntactically) legal: void v;
• To build a recursive-descent parser, we may need to transform the grammar to eliminate common prefixes and left recursion. But the transformed grammar must generate exactly the same language: L(the VC grammar) = L(your transformed grammar).

Assignment 3
• Modify your recogniser to build a parser (one that constructs the AST for the program being compiled).
• It is important to complete your Assignment 2 on time.
• The spec will be available on the coming Monday.

Lecture 5: Top-Down Parsing: Table-Driven
1. Compare and contrast top-down and bottom-up parsing
2. LL(1) table-driven parsing
3. Parser generators
4. Recursive-descent parsing revisited
5. Error recovery

Roadmap:
Grammar G
  → Eliminating left recursion & common prefixes
  → The transformed grammar G′
  → Constructing First, Follow and Select sets for G′
  → A recursive-descent parser
    or
  → The LL(1) parsing table → The LL(1) table-driven parser
• The red parts (up to the recursive-descent parser) were done last week.
• The blue parts (the LL(1) parsing table and the table-driven parser) are covered today.

The micro-English Grammar Revisited
1 〈sentence〉 → 〈subject〉 〈predicate〉
2 〈subject〉 → NOUN
3           | ARTICLE NOUN
4 〈predicate〉 → VERB 〈object〉
5 〈object〉 → NOUN
6          | ARTICLE NOUN
The English sentence: PETER PASSED THE TEST

The micro-English Grammar Revisited (Cont'd)
• The leftmost derivation:
  〈sentence〉 ⇒lm 〈subject〉 〈predicate〉        by P1
             ⇒lm NOUN 〈predicate〉              by P2
             ⇒lm NOUN VERB 〈object〉            by P4
             ⇒lm NOUN VERB ARTICLE NOUN        by P6
• The rightmost derivation:
  〈sentence〉 ⇒rm 〈subject〉 〈predicate〉        by P1
             ⇒rm 〈subject〉 VERB 〈object〉       by P4
             ⇒rm 〈subject〉 VERB ARTICLE NOUN   by P6
             ⇒rm NOUN VERB ARTICLE NOUN        by P2

The Role of the Parser
PETER PASSED THE TEST
  → Scanner → NOUN1 VERB ARTICLE NOUN2
  → Parser → the parse tree:
    〈sentence〉
      〈subject〉
        NOUN (PETER)
      〈predicate〉
        VERB (PASSED)
        〈object〉
          ARTICLE (THE)
          NOUN (TEST)

Two General Parsing Methods
1. Top-down parsing: build the parse tree top-down.
   • The productions used represent a leftmost derivation.
   • The best-known and most widely used methods:
     – Recursive descent
     – Table-driven
     – LL(k) (Left-to-right scan of input, Leftmost derivation, k tokens of lookahead).
     – Almost all programming languages can be specified by LL(1) grammars, but such grammars may not reflect the structure of a language.
     – In practice, LL(k) for small k is used.
   • Easier to implement by hand.
   • Used in parser generators such as JavaCC.
2. Bottom-up parsing: build the parse tree bottom-up.
   • The productions used represent a rightmost derivation in reverse.
   • The best-known and most widely used method: LR(1) (Left-to-right scan of input, Rightmost derivation in reverse, 1 token of lookahead).
   • More powerful: every LL(1) grammar is LR(1), but the converse is false.
   • Used by parser generators such as Yacc and JavaCUP.

Lookahead Token(s)
• Lookahead token(s): the currently scanned token(s) in the input.
• In Recogniser.java, currentToken represents the lookahead token.
• For most programming languages, one token of lookahead suffices.
• Initially, the lookahead token is the leftmost token in the input.
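To make the leftmost and rightmost derivations of NOUN VERB ARTICLE NOUN concrete, here is a small Java sketch (not part of the course code). It replays a list of production numbers and, at each step, rewrites either the leftmost or the rightmost nonterminal. The encoding is an assumption made for illustration: nonterminals are strings in angle brackets, and productions are arrays indexed by P1–P6.

```java
import java.util.ArrayList;
import java.util.List;

class Derivations {
    // P[i] = {LHS, RHS symbols...} for production Pi of the micro-English grammar.
    static final String[][] P = {
        null,                                        // there is no P0
        {"<sentence>", "<subject>", "<predicate>"},  // P1
        {"<subject>", "NOUN"},                       // P2
        {"<subject>", "ARTICLE", "NOUN"},            // P3
        {"<predicate>", "VERB", "<object>"},         // P4
        {"<object>", "NOUN"},                        // P5
        {"<object>", "ARTICLE", "NOUN"},             // P6
    };

    static boolean isNonterminal(String s) { return s.startsWith("<"); }

    // Applies the productions in the given order, always rewriting the leftmost
    // (or rightmost) nonterminal, and returns every sentential form produced.
    static List<String> derive(int[] order, boolean leftmost) {
        List<String> form = new ArrayList<>(List.of("<sentence>"));
        List<String> steps = new ArrayList<>();
        steps.add(String.join(" ", form));
        for (int p : order) {
            int pos = -1;
            for (int i = 0; i < form.size(); i++) {
                if (isNonterminal(form.get(i))) {
                    pos = i;
                    if (leftmost) break;   // leftmost: stop at the first nonterminal
                }
            }
            if (pos < 0 || !form.get(pos).equals(P[p][0]))
                throw new IllegalArgumentException("P" + p + " cannot rewrite the chosen nonterminal");
            form.remove(pos);                           // replace the nonterminal ...
            for (int k = P[p].length - 1; k >= 1; k--)  // ... by the right-hand side
                form.add(pos, P[p][k]);
            steps.add(String.join(" ", form));
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(String.join("\n  =>lm ", derive(new int[]{1, 2, 4, 6}, true)));
        System.out.println(String.join("\n  =>rm ", derive(new int[]{1, 4, 6, 2}, false)));
    }
}
```

Replaying P1, P2, P4, P6 leftmost and P1, P4, P6, P2 rightmost reproduces the two derivations of the slide. Both end in NOUN VERB ARTICLE NOUN. A production applied in the wrong order is rejected, because it no longer matches the chosen nonterminal.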

Top-Down Parse of NOUN VERB ARTICLE NOUN
Notation:
• ↑ on the tree marks the nonterminal being expanded or the terminal being recognised.
• ↑ on the input points to the lookahead token:
  – all tokens to the left of ↑ have been read;
  – all tokens to the right of ↑ have NOT been processed.
Steps:
1. Tree: 〈sentence〉↑. Lookahead: NOUN.
2. Expand 〈sentence〉 → 〈subject〉 〈predicate〉; ↑ on 〈subject〉. Lookahead: NOUN.
3. Expand 〈subject〉 → NOUN; ↑ on NOUN. Lookahead: NOUN.
4. NOUN matched; ↑ on 〈predicate〉. Lookahead: VERB.
5. Expand 〈predicate〉 → VERB 〈object〉; ↑ on VERB. Lookahead: VERB.
6. VERB matched; ↑ on 〈object〉. Lookahead: ARTICLE.
7. Expand 〈object〉 → ARTICLE NOUN; ↑ on ARTICLE. Lookahead: ARTICLE.
8. ARTICLE matched; ↑ on NOUN. Lookahead: NOUN (the last token), which is then matched.

Top-Down Parsing
• Build the parse tree starting from the start symbol (i.e., the root) towards the sentence being analysed (i.e., the leaves).
• Use one token of lookahead, in general.
• Discover the leftmost derivation, i.e., the productions used in expanding the parse tree represent a leftmost derivation.

Predictive (Non-Backtracking) Top-Down Parsing
• To expand a nonterminal, the parser always predicts (chooses) the right alternative for the nonterminal by looking at the lookahead symbol only.
• Flow-of-control constructs, with their distinguishing keywords, are detectable this way, e.g., in the VC grammar:
  〈stmt〉 → 〈compound-stmt〉
         | if "(" 〈expr〉 ")" 〈stmt〉 (else 〈stmt〉)?
         | break ";"
         | continue ";"
         | · · ·
• Prediction happens before the actual match begins.
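The predictive top-down strategy can be sketched as a recursive-descent recogniser for the micro-English grammar. This is a minimal sketch, not the course's Recogniser.java, and it rests on a few assumptions:
• Tokens arrive as strings naming their kinds (NOUN, VERB, ARTICLE).
• "$" is an assumed end-of-input marker.
• The field currentToken plays the role of the lookahead token.

Each method records the production it predicts before matching anything. The recorded list is therefore the leftmost derivation.

```java
import java.util.ArrayList;
import java.util.List;

class MicroEnglishParser {
    private final List<String> tokens;             // token kinds produced by the scanner
    private int pos = 0;
    private String currentToken;                    // the lookahead token
    final List<Integer> used = new ArrayList<>();   // productions predicted, in order

    MicroEnglishParser(List<String> tokens) {
        this.tokens = tokens;
        currentToken = tokens.isEmpty() ? "$" : tokens.get(0);
    }

    // Consume the lookahead if it is the expected token, else report an error.
    private void match(String expected) {
        if (!currentToken.equals(expected))
            throw new IllegalStateException("expected " + expected + " but found " + currentToken);
        pos++;
        currentToken = pos < tokens.size() ? tokens.get(pos) : "$";
    }

    // <sentence> -> <subject> <predicate>
    void parseSentence() {
        used.add(1);
        parseSubject();
        parsePredicate();
        match("$");   // the whole input must be consumed
    }

    // <subject> -> NOUN | ARTICLE NOUN: the lookahead alone selects the alternative
    void parseSubject() {
        if (currentToken.equals("NOUN")) { used.add(2); match("NOUN"); }
        else if (currentToken.equals("ARTICLE")) { used.add(3); match("ARTICLE"); match("NOUN"); }
        else throw new IllegalStateException("subject expected but found " + currentToken);
    }

    // <predicate> -> VERB <object>
    void parsePredicate() { used.add(4); match("VERB"); parseObject(); }

    // <object> -> NOUN | ARTICLE NOUN
    void parseObject() {
        if (currentToken.equals("NOUN")) { used.add(5); match("NOUN"); }
        else if (currentToken.equals("ARTICLE")) { used.add(6); match("ARTICLE"); match("NOUN"); }
        else throw new IllegalStateException("object expected but found " + currentToken);
    }

    static List<Integer> parse(List<String> tokens) {
        MicroEnglishParser p = new MicroEnglishParser(tokens);
        p.parseSentence();
        return p.used;
    }

    public static void main(String[] args) {
        // PETER PASSED THE TEST after scanning
        System.out.println(parse(List.of("NOUN", "VERB", "ARTICLE", "NOUN"))); // [1, 2, 4, 6]
    }
}
```

On NOUN VERB ARTICLE NOUN the recorded productions are P1, P2, P4, P6, the leftmost derivation shown earlier.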

Bottom-Up Parse of NOUN VERB ARTICLE NOUN
1. Tree: empty. Input read: NOUN↑.
2. Reduce NOUN to 〈subject〉 (P2). Input read: NOUN↑.
3. Read VERB. Tree: 〈subject〉(NOUN), VERB. Input read: NOUN VERB↑.
4. Read ARTICLE. Input read: NOUN VERB ARTICLE↑.
5. Read NOUN. Input read: NOUN VERB ARTICLE NOUN↑.
6. Reduce ARTICLE NOUN to 〈object〉 (P6). Tree: 〈subject〉(NOUN), VERB, 〈object〉(ARTICLE NOUN).
   Note: what if the parser had chosen 〈subject〉 → ARTICLE NOUN instead of 〈object〉 → ARTICLE NOUN? In that case, the parser could make no further progress. Having read a 〈subject〉 and a VERB, the parser has reached a state in which it should not parse another 〈subject〉.
7. Reduce VERB 〈object〉 to 〈predicate〉 (P4).
8. Reduce 〈subject〉 〈predicate〉 to 〈sentence〉 (P1). The whole input has been read.

Bottom-Up Parsing
• Build the parse tree starting from the sentence being analysed (i.e., the leaves) towards the start symbol (i.e., the root).
• Use one token of lookahead, in general.
• The basic (smallest) language constructs are recognised first; they are then used to discover more complex constructs.
• Discover the rightmost derivation in reverse: the productions used in building the parse tree represent a rightmost derivation in reverse order.

Lecture 5: Top-Down Parsing: Table-Driven
1. Compare and contrast top-down and bottom-up parsing √
2. LL(1) table-driven parsing
3. Parser generators
4. Recursive-descent parsing revisited

Predictive Non-Recursive Top-Down Parsers
• Recursion = Iteration + Stack
• The recursive calls in a recursive-descent parser can be implemented using
  – an explicit stack, and
  – a parsing table
• Understanding one helps you understand the other
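The bottom-up parse of NOUN VERB ARTICLE NOUN can be imitated with a hand-coded shift-reduce loop. This is a hypothetical sketch, not an LR(1) parser, and it makes two simplifications:
• Instead of LR states, it inspects the symbol beneath a candidate handle. For this tiny grammar that is enough to decide whether ARTICLE NOUN becomes a 〈subject〉 (at the start of the sentence) or an 〈object〉 (after VERB). This is exactly the choice discussed in the note.
• It needs no lookahead token, because for this grammar the stack alone determines each reduction.

```java
import java.util.ArrayList;
import java.util.List;

class MicroEnglishShiftReduce {
    // Returns the production numbers in the order the reductions are made.
    static List<Integer> parse(List<String> input) {
        List<String> stack = new ArrayList<>();
        List<Integer> reductions = new ArrayList<>();
        int next = 0;
        while (true) {
            if (reduceOnce(stack, reductions)) continue;                   // reduce when a handle is on top
            if (next < input.size()) { stack.add(input.get(next++)); continue; } // otherwise read a token
            break;                                                          // nothing left to do
        }
        if (!stack.equals(List.of("<sentence>")))
            throw new IllegalStateException("parse failed, stack = " + stack);
        return reductions;
    }

    // Tries one reduction. The symbol below a candidate handle stands in for the
    // parser state and decides between <subject> and <object>.
    private static boolean reduceOnce(List<String> st, List<Integer> out) {
        int n = st.size();
        String top = n >= 1 ? st.get(n - 1) : null;
        String below1 = n >= 2 ? st.get(n - 2) : null;
        String below2 = n >= 3 ? st.get(n - 3) : null;
        if ("NOUN".equals(top) && "ARTICLE".equals(below1)) {              // handle ARTICLE NOUN
            if (n == 2) return replace(st, 2, "<subject>", 3, out);          // at the start: P3
            if ("VERB".equals(below2)) return replace(st, 2, "<object>", 6, out); // after VERB: P6
            return false;
        }
        if ("NOUN".equals(top)) {                                           // handle NOUN
            if (n == 1) return replace(st, 1, "<subject>", 2, out);          // P2
            if ("VERB".equals(below1)) return replace(st, 1, "<object>", 5, out); // P5
            return false;
        }
        if ("<object>".equals(top) && "VERB".equals(below1))
            return replace(st, 2, "<predicate>", 4, out);                    // P4
        if (n == 2 && "<predicate>".equals(top) && "<subject>".equals(below1))
            return replace(st, 2, "<sentence>", 1, out);                     // P1
        return false;
    }

    // Pops the handle (len symbols) and pushes the left-hand side of production prod.
    private static boolean replace(List<String> st, int len, String lhs, int prod, List<Integer> out) {
        for (int i = 0; i < len; i++) st.remove(st.size() - 1);
        st.add(lhs);
        out.add(prod);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("NOUN", "VERB", "ARTICLE", "NOUN"))); // [2, 6, 4, 1]
    }
}
```

The reductions come out as P2, P6, P4, P1. This is the rightmost derivation P1, P4, P6, P2 in reverse.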

The Structure of a Table-Driven LL(1) Parser
• Input parsed from left to right
• Leftmost derivation
• 1 token of lookahead
[Figure: the scanner reads the source code and returns a token to the parser on each "get next token" request. The parser uses a stack and a parsing table, which an LL(1) table generator produces from the grammar, and outputs a tree.]
• LR(1) parsers (almost always table-driven) are also built this way

The Model of an LL(1) Table-Driven Parser
[Figure: the LL(1) parsing program reads the INPUT (e.g., a + b $), consults the STACK (e.g., X Y Z $, with X on top and $ at the bottom) and the LL(1) parsing table, and produces output (e.g., S → T, T → X Y Z, · · ·).]
Output:
• The productions used (representing the leftmost derivation), or
• An AST (Lecture 6)

The LL(1) Parsing Program
Push $ onto the stack
Push the start symbol onto the stack
WHILE (stack not empty) DO
BEGIN
  Let X be the top stack symbol and a be the lookahead symbol in the input
  IF X is a terminal THEN
    IF X = a THEN pop X and get the next token   /* match */
    ELSE error
  ELSE                                            /* X is a nonterminal */
    IF Table[X, a] nonblank THEN
      Pop X
      Push Table[X, a] onto the stack in reverse order
    ELSE error
END
The parsing is successful when the stack is empty and no errors have been reported.

Building a Table-Driven Parser from a Grammar
Grammar G
  → Eliminating left recursion & common prefixes (Slide 255)
  → The transformed grammar G′
  → Constructing First, Follow and Select sets for G′
  → Constructing the LL(1) parsing table
  → The LL(1) table-driven parser
• Follow Slide 255 to eliminate left recursion and obtain a BNF grammar.
• A table-driven parser can be built for EBNF as well (not considered here).

The Expression Grammar
• The grammar with left recursion:
  Grammar 1: E → E + T | E − T | T
             T → T ∗ F | T / F | F
             F → INT | ( E )
• The transformed grammar without left recursion:
  Grammar 2: E → T Q
             Q → + T Q | − T Q | ε
             T → F R
             R → ∗ F R | / F R | ε
             F → INT | ( E )
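The LL(1) parsing program translates almost line by line into Java. This sketch illustrates the algorithm under stated assumptions; it is not course code:
• Every grammar symbol is a single character, and i stands for INT.
• ε-productions are stored as empty strings.
• The table entries are those of the LL(1) parsing table for Grammar 2, which is derived on the following slides.

The code pops X before deciding what to do. This is equivalent to the pseudocode, because both successful branches pop X and the other branches report an error.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class LL1Driver {
    // LL(1) table for Grammar 2: key "X a" (nonterminal, lookahead) -> right-hand side.
    // i stands for INT; "" is an epsilon-production; missing keys are blank (error) entries.
    static final Map<String, String> TABLE = Map.ofEntries(
        Map.entry("E i", "TQ"), Map.entry("E (", "TQ"),
        Map.entry("Q +", "+TQ"), Map.entry("Q -", "-TQ"), Map.entry("Q )", ""), Map.entry("Q $", ""),
        Map.entry("T i", "FR"), Map.entry("T (", "FR"),
        Map.entry("R +", ""), Map.entry("R -", ""), Map.entry("R *", "*FR"), Map.entry("R /", "/FR"),
        Map.entry("R )", ""), Map.entry("R $", ""),
        Map.entry("F i", "i"), Map.entry("F (", "(E)"));

    static boolean isNonterminal(char c) { return "EQTRF".indexOf(c) >= 0; }

    // Returns the productions used (a leftmost derivation); throws on a syntax error.
    static List<String> parse(String input) {
        String in = input + "$";
        int pos = 0;
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$');   // push $ ...
        stack.push('E');   // ... then the start symbol
        List<String> used = new ArrayList<>();
        while (!stack.isEmpty()) {
            char x = stack.pop();
            char a = in.charAt(pos);           // the lookahead symbol
            if (!isNonterminal(x)) {           // X is a terminal (or $)
                if (x != a) throw new IllegalStateException("expected " + x + " but found " + a);
                pos++;                         // match: get the next token
            } else {
                String rhs = TABLE.get(x + " " + a);
                if (rhs == null)
                    throw new IllegalStateException("no table entry for [" + x + ", " + a + "]");
                used.add(x + "->" + (rhs.isEmpty() ? "e" : rhs));
                for (int k = rhs.length() - 1; k >= 0; k--)
                    stack.push(rhs.charAt(k)); // push the right-hand side in reverse order
            }
        }
        if (pos != in.length()) throw new IllegalStateException("extra input after $");
        return used;
    }

    public static void main(String[] args) {
        System.out.println(parse("i+i"));
        try { parse("()"); } catch (IllegalStateException e) { System.out.println(e.getMessage()); }
    }
}
```

On i+i the driver predicts E→TQ, T→FR, F→i, R→ε, Q→+TQ, T→FR, F→i, R→ε, Q→ε (ε is printed as e). On () it stops with "no table entry for [E, )]". Both results match the hand traces on the following slides.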

First and Follow Sets for Grammar 2
• First sets:
  First(TQ) = First(FR) = {(, INT}
  First(Q) = {+, −, ε}
  First(R) = {∗, /, ε}
  First(+TQ) = {+}     First(−TQ) = {−}
  First(∗FR) = {∗}     First(/FR) = {/}
  First((E)) = {(}     First(INT) = {INT}
• Follow sets:
  Follow(E) = {$, )}
  Follow(Q) = {$, )}
  Follow(T) = {+, −, $, )}
  Follow(R) = {+, −, $, )}
  Follow(F) = {+, −, ∗, /, $, )}

Select Sets for Grammar 2
  Select(E → TQ)   = First(TQ)   = {(, INT}
  Select(Q → +TQ)  = First(+TQ)  = {+}
  Select(Q → −TQ)  = First(−TQ)  = {−}
  Select(Q → ε)    = (First(ε) − {ε}) ∪ Follow(Q) = {), $}
  Select(T → FR)   = First(FR)   = {(, INT}
  Select(R → ∗FR)  = First(∗FR)  = {∗}
  Select(R → /FR)  = First(/FR)  = {/}
  Select(R → ε)    = (First(ε) − {ε}) ∪ Follow(R) = {+, −, ), $}
  Select(F → INT)  = First(INT)  = {INT}
  Select(F → (E))  = First((E))  = {(}

The Rules for Constructing an LL(1) Parsing Table
For every production A → α in the grammar, do:
  for all a in Select(A → α), set Table[A, a] = α

LL(1) Parsing Table for Grammar 2
     INT    +      −      ∗      /      (      )      $
  E  TQ                                 TQ
  Q         +TQ    −TQ                         ε      ε
  T  FR                                 FR
  R         ε      ε      ∗FR    /FR           ε      ε
  F  INT                                (E)
The blanks are errors.

An LL(1) Parse on Input i+i (i stands for INT)
STACK    INPUT   PRODUCTION                DERIVATION
$E       i+i$    E → TQ                    E ⇒lm TQ
$QT      i+i$    T → FR                      ⇒lm FRQ
$QRF     i+i$    F → i                       ⇒lm iRQ
$QRi     i+i$    pop and get next token
$QR      +i$     R → ε                       ⇒lm iQ
$Q       +i$     Q → +TQ                     ⇒lm i+TQ
$QT+     +i$     pop and get next token
$QT      i$      T → FR                      ⇒lm i+FRQ
$QRF     i$      F → i                       ⇒lm i+iRQ
$QRi     i$      pop and get next token
$QR      $       R → ε                       ⇒lm i+iQ
$Q       $       Q → ε                       ⇒lm i+i
$        $       match $: stack empty, success
(The stack is written with its top on the right.)

PARSE TREE
E
  T
    F
      i
    R
      ε
  Q
    +
    T
      F
        i
      R
        ε
    Q
      ε
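The First, Follow and Select sets, and the table itself, can also be computed mechanically. The sketch below proceeds in three steps:
1. It iterates the usual set equations for Grammar 2 until nothing changes.
2. It fills Table[A, a] = α for each a in Select(A → α).
3. It reports a conflict if any entry would receive a second production, which is the LL(1) test given on the slides.

This is a hypothetical illustration, not the course's table generator. Single-character symbols, i for INT and e for ε are assumptions of this encoding; e is safe here because no terminal of Grammar 2 is spelled e.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SelectSets {
    static final String NT = "EQTRF";   // nonterminals of Grammar 2
    static final char EPS = 'e';        // stands for epsilon
    // Grammar 2 with i for INT; an empty right-hand side is an epsilon-production.
    static final String[][] PRODS = {
        {"E", "TQ"}, {"Q", "+TQ"}, {"Q", "-TQ"}, {"Q", ""},
        {"T", "FR"}, {"R", "*FR"}, {"R", "/FR"}, {"R", ""},
        {"F", "i"}, {"F", "(E)"}};
    static final Map<Character, Set<Character>> FIRST = new HashMap<>();
    static final Map<Character, Set<Character>> FOLLOW = new HashMap<>();

    // First of a string of symbols, using the current First sets of the nonterminals.
    static Set<Character> first(String alpha) {
        Set<Character> out = new TreeSet<>();
        for (char x : alpha.toCharArray()) {
            if (NT.indexOf(x) < 0) { out.add(x); return out; }   // a terminal ends the scan
            Set<Character> fx = FIRST.get(x);
            for (char c : fx) if (c != EPS) out.add(c);
            if (!fx.contains(EPS)) return out;                    // x cannot vanish
        }
        out.add(EPS);                                             // every symbol can vanish
        return out;
    }

    // Grows First and Follow until a fixed point is reached.
    static void compute() {
        for (char n : NT.toCharArray()) { FIRST.put(n, new TreeSet<>()); FOLLOW.put(n, new TreeSet<>()); }
        FOLLOW.get('E').add('$');                                 // E is the start symbol
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : PRODS) {
                char a = p[0].charAt(0);
                String rhs = p[1];
                changed |= FIRST.get(a).addAll(first(rhs));
                for (int k = 0; k < rhs.length(); k++) {          // A -> alpha B beta
                    char b = rhs.charAt(k);
                    if (NT.indexOf(b) < 0) continue;
                    Set<Character> rest = first(rhs.substring(k + 1));
                    for (char c : rest) if (c != EPS) changed |= FOLLOW.get(b).add(c);
                    if (rest.contains(EPS)) changed |= FOLLOW.get(b).addAll(new TreeSet<>(FOLLOW.get(a)));
                }
            }
        }
    }

    // Select(A -> alpha) = (First(alpha) - {eps}) plus Follow(A) if alpha can vanish.
    static Set<Character> select(String[] p) {
        Set<Character> s = first(p[1]);
        if (s.remove(EPS)) s.addAll(FOLLOW.get(p[0].charAt(0)));
        return s;
    }

    // Table[A, a] = alpha for every a in Select(A -> alpha); a second entry means "not LL(1)".
    static Map<String, String> buildTable() {
        Map<String, String> table = new TreeMap<>();
        for (String[] p : PRODS)
            for (char a : select(p)) {
                String old = table.put(p[0] + " " + a, p[1]);
                if (old != null)
                    throw new IllegalStateException("not LL(1): two entries for [" + p[0] + ", " + a + "]");
            }
        return table;
    }

    public static void main(String[] args) {
        compute();
        System.out.println("First  = " + FIRST);
        System.out.println("Follow = " + FOLLOW);
        for (String[] p : PRODS) System.out.println("Select(" + p[0] + "->" + p[1] + ") = " + select(p));
        System.out.println(buildTable());
    }
}
```

Running it reproduces the First, Follow and Select sets listed above. It also produces a table with 16 nonblank entries and no conflicts, which confirms that Grammar 2 is LL(1).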

An LL(1) Parse on an Erroneous Input "()"
STACK    INPUT   PRODUCTION                DERIVATION
$E       ()$     E → TQ                    E ⇒lm TQ
$QT      ()$     T → FR                      ⇒lm FRQ
$QRF     ()$     F → (E)                     ⇒lm (E)RQ
$QR)E(   ()$     pop and get next token
$QR)E    )$      ∗∗∗ Error: no table entry for [E, )]
A better error message: expression missing inside ( )

LL(1) Grammars and Table-Driven LL(1) Parsers
• Like recursive-descent parsers, table-driven LL(1) parsers can only parse LL(1) grammars. Conversely, every LL(1) grammar can be parsed by a table-driven LL(1) parser.
• Definition of LL(1) grammar: given in Slide 237.
• Definition of LL(1) grammar using the parsing table: a grammar is LL(1) if every table entry contains at most one production.

Why Table-Driven LL(1) Parsers Cannot Handle Left Recursion?
• A grammar with left recursion:
  〈expr〉 → 〈expr〉 + id | id
• Select sets:
  Select(〈expr〉 → 〈expr〉 + id) = {id}
  Select(〈expr〉 → id) = {id}
• The parsing table (columns id and $):
  Table[〈expr〉, id] = 〈expr〉 + id and id
  Table[〈expr〉, $] is blank
  Table[〈expr〉, id] contains two entries!
• Any grammar with left recursion is not LL(1).

Why Table-Driven LL(1) Parsers Cannot Handle Left Recursion? (Cont'd)
• Eliminating the left recursion yields an LL(1) grammar:
  〈expr〉 → id 〈expr-tail〉
  〈expr-tail〉 → ε | + id 〈expr-tail〉
• Select sets:
  Select(〈expr〉 → id 〈expr-tail〉) = {id}
  Select(〈expr-tail〉 → ε) = {$}
  Select(〈expr-tail〉 → + id 〈expr-tail〉) = {+}
• The parsing table for the transformed grammar (columns id, + and $; all other entries blank):
  Table[〈expr〉, id] = id 〈expr-tail〉
  Table[〈expr-tail〉, +] = + id 〈expr-tail〉
  Table[〈expr-tail〉, $] = ε

Why LL(1) Table-Driven Parsers Cannot Handle Common Prefixes?
• A grammar with a common prefix:
  S → if (E) S | if (E) S else S | s
  E → e
• Select sets:
  Select(S → if (E) S) = {if}
  Select(S → if (E) S else S) = {if}
• Any grammar with common prefixes is not LL(1).
• Eliminating the common prefix does not yield an LL(1) grammar:
  S → if (E) S Q | s
  Q → else S | ε
  E → e

Why LL(1) Table-Driven Parsers Cannot Handle Common Prefixes? (Cont'd)
• Select sets:
  Select(S → if (E) S Q) = {if}
  Select(S → s) = {s}
  Select(Q → else S) = {else}
  Select(Q → ε) = Follow(Q) = {else, $}
  Select(E → e) = {e}
• The parsing table:
     if           (    e    )    s    else          $
  S  if (E) S Q                  s
  E                    e
  Q                                   else S, ε     ε
  Table[Q, else] contains two entries.
• This modified grammar, although it has no common prefixes, is still ambiguous (see the Week 6 Tutorial). To resolve the ambiguity, we adopt the convention of selecting else S as the entry for Table[Q, else]. This effectively implements the rule: match each else with the most recent unmatched if.

Recognise Nested Parentheses Easily
• Grammar: S → ( S ) | ε
• Parsing table:
     (        )    $
  S  ( S )    ε    ε
• Try to parse the following three inputs:
  a. (())
  b. (()
  c. ())
• No DFA/NFA can be designed to recognise the language L(S) = { (ⁿ )ⁿ | n ≥ 0 }.

Lecture 5: Top-Down Parsing: Table-Driven
1. Compare and contrast top-down and bottom-up parsing √
2. LL(1) table-driven parsing √
3. Parser generators
4. Recursive-descent parsing revisited

The Expression Grammar
• The grammar with left recursion:
  Grammar 1: E → E + T | E − T | T
             T → T ∗ F | T / F | F
             F → INT | ( E )
• Eliminating left recursion using the Kleene closure:
  Grammar 3: E → T ( "+" T | "-" T )∗
             T → F ( "*" F | "/" F )∗
             F → INT | "(" E ")"
  All tokens are enclosed in double quotes to distinguish them from the regular-expression operators (, ) and ∗.
• Compare with Grammar 2 (the BNF version without left recursion).
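Grammar 3's Kleene closures map directly onto while loops in a hand-written recursive-descent parser. The JavaCC spec that follows has the same shape. The sketch below is hypothetical in two respects:
• It scans single characters itself: runs of digits form an INT, and blanks are skipped.
• It evaluates the expression while parsing, purely to show that the loops keep +, −, ∗ and / left-associative.

```java
class Grammar3Parser {
    private final String src;   // the expression text, e.g. "2*(3+4)"
    private int pos = 0;

    Grammar3Parser(String src) { this.src = src; }

    // The lookahead character after skipping blanks, or '$' at the end of the input.
    private char peek() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
        return pos < src.length() ? src.charAt(pos) : '$';
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalStateException("expected " + c + " at position " + pos);
        pos++;
    }

    // E -> T ("+" T | "-" T)*  : the Kleene closure becomes a while loop
    int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = peek();
            pos++;
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;   // left-associative
        }
        return value;
    }

    // T -> F ("*" F | "/" F)*
    int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = peek();
            pos++;
            int rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // F -> INT | "(" E ")"
    int factor() {
        char c = peek();
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return Integer.parseInt(src.substring(start, pos));
        }
        if (c == '(') { pos++; int v = expr(); expect(')'); return v; }
        throw new IllegalStateException("INT or ( expected at position " + pos);
    }

    static int evaluate(String s) {
        Grammar3Parser p = new Grammar3Parser(s);
        int v = p.expr();
        if (p.peek() != '$' || p.pos != s.length())   // the whole input must be consumed
            throw new IllegalStateException("unexpected input at position " + p.pos);
        return v;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("8 - 2 - 1"));   // 5, i.e. (8 - 2) - 1
        System.out.println(evaluate("2*(3+4)"));     // 14
    }
}
```

The loop in expr() folds each new term into the running value, so 8 - 2 - 1 is grouped as (8 − 2) − 1. Grammar 2's right-recursive Q → − T Q hides this grouping in its parse tree.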

Parser Generators (Generating Top-Down Parsers)
Grammar G → Parser Generator → recursive-descent parser or LL(k) parsing tables

Tool     Grammar Accepted    Parsers and Their Implementation Languages
JavaCC   EBNF                Recursive-descent LL(1) (with some LL(k) portions) in Java
COCO/R   EBNF                Recursive-descent LL(1) in Pascal, C, C++, Java, etc.
ANTLR    Predicated LL(k)    Recursive-descent LL(k) in C, C++, Java

• These and other tools can be found on the internet.
• Predicated: a conditional evaluated by the parser at run time to determine which of two conflicting productions to use, e.g.,
  Q → (lookahead is "else") else S | ε
  where the guard (lookahead is "else") resolves the dangling-else problem.

Parser Generators (Generating Bottom-Up Parsers)
Grammar G → Parser Generator → LALR(1) parsing tables

Tool      Grammar Accepted   Parsers and Their Implementation Languages
Yacc      BNF                LALR(1) table-driven in C
JavaCUP   BNF                LALR(1) table-driven in Java

• These and other tools can be found on the internet.
• We will not deal with LR parsing in this course.

The JavaCC Spec for Grammar 3
/*
 * Parser.jj
 *
 * The scanner and parser for Grammar 3
 *
 * Install JavaCC from https://javacc.dev.java.net/
 *
 * 1. javacc Parser.jj
 * 2. javac Parser.java
 * 3. java Parser
 */
options {
  LOOKAHEAD=1;
}

PARSER_BEGIN(Parser)

public class Parser {
  public static void main(String args[]) throws ParseException {
    Parser parser = new Parser(System.in);
    while (true) {
      System.out.print("Enter Expression: ");
      System.out.flush();
      try {
        switch (parser.one_line()) {
          case -1:
            System.exit(0);
          default:
            System.out.println("Compilation was successful.");
            break;
        }
      } catch (ParseException x) {
        System.out.println("Exiting.");
        throw x;
      }
    }
  }
}

PARSER_END(Parser)

SKIP :
{
  " "
| "\r"
| "\t"
}

TOKEN :
{
  < EOL: "\n" >
}

TOKEN : /* OPERATORS */
{
  < PLUS: "+" >
| < MINUS: "-" >
| < MULTIPLY: "*" >
| < DIVIDE: "/" >
}

TOKEN :
{
  < CONSTANT: (<DIGIT>)+ >
| < #DIGIT: ["0" - "9"] >
}

int one_line() :
{}
{
  Expr() <EOL>  { return 1; }
| <EOL>         { return 0; }
| <EOF>         { return -1; }
}

void Expr() :
{ }
{
  Term() ( ( <PLUS> | <MINUS> ) Term() )*
}

void Term() :
{ }
{
  Factor() ( ( <MULTIPLY> | <DIVIDE> ) Factor() )*
}

void Factor() :
{}
{
  <CONSTANT>
| "(" Expr() ")"
}

Lecture 5: Top-Down Parsing: Table-Driven
1. Compare and contrast top-down and bottom-up parsing √
2. LL(1) table-driven parsing √
3. Parser generators √
4. Recursive-descent parsing revisited

Reading
• Table-driven parsing: pages 186–192 of the Red Dragon or §4.4.4 of the Purple Dragon
• JavaCC, ANTLR and COCO/R: available on the Web
• JavaCUP is also available on the Web: http://www.cs.princeton.edu/~appel/modern/java/CUP/
• Error recovery for LL parsers:
  – Using acceptance sets:
    ∗ Pages 192–195 of the Red Dragon / §4.4.5 of the Purple Dragon
    ∗ http://teaching.idallen.com/cst8152/98w/panic_mode.html
  – Using continuations: pages 136–142 of Grune et al.'s 2000 compiler book (ISBN: 0-471-97697-0)

Next Class: Abstract Syntax Trees and Assignment 3