Solved-Project 4 -Solution

$35.00 $24.00

Abstract The goal of this project is to give you some hands-on experience with implementing a small compiler. You will write a compiler for a simple language. You will not be generating assembly code. Instead, you will generate an intermediate representation (a data structure that represents the program). The execution of the program will be…

You’ll get a: . zip file solution

 

 
Categorys:
Tags:

Description

5/5 – (2 votes)

Abstract

The goal of this project is to give you some hands-on experience with implementing a small compiler. You will write a compiler for a simple language. You will not be generating assembly code. Instead, you will generate an intermediate representation (a data structure that represents the program). The execution of the program will be done after compilation by interpreting the generated intermediate representation.

  • Introduction

You will write a small compiler that will read an input program and represent it in an internal data structure. The data structure consists of two parts: (1) a representation of instructions to be executed and (2) a representation of the memory of the program (locations for variables). Instructions are represented by a data structure that includes the operand(s) of the instruction (if any) and that specify the next instruction to be executed. After the data structures are generated by your compiler, your compiler will execute the generated instructions representation by interpreting it. This means that the program will traverse the data structure and at every node it visits, it will execute the node by changing the content of memory locations corresponding to operands and deciding what is the next instruction to execute (program counter). The output of your compiler is the output that the input program should produce. These steps are illustrated in the following gure

1

The remainder of this document is organized as follows:

  1. Grammar De nes the programming language syntax including grammar.

  1. Execution Semantics Describe statement semantics for if, while, switch and print state-ments.

  1. How to generate the intermediate representation Explains step by step how to generate the intermediate representation (data structure). You should read this sequentially and not skip around.

  1. Executing the intermediate representation Basically, you have two options. If you are using C/C++, you should use the code we provide to execute the intermediate representation. If you are using Java, it describes the strict rules to follow in executing the intermediate representation. Those rules will be enforced.

  1. Requirements Lists the programming languages you are allowed to use in your solution (C/C++ or Java) and other requirements.

  1. Grading Describes the grading scheme.

  1. Bonus Project Describes the requirements for the bonus project.

  • Grammar

The grammar for this project is a simpli ed form of the grammar from the previous project, but there are a couple extensions.

program

!

var

section

body

var

section

!

id

list SEMICOLON

id

list

!

ID COMMA id

list j ID

body

!

LBRACE stmt

list RBRACE

stmt

list

!

stmt stmt

list

j stmt

stmt

!

assign

stmt

j

print

stmt j while

stmt j if

stmt j switch

stmt

assign

stmt

!

ID

EQUAL

primary SEMICOLON

assign

stmt

!

ID

EQUAL

expr SEMICOLON

expr

!

primary op

primary

primary

!

ID

j NUM

op

!

PLUS j MINUS j MULT j DIV

print

stmt

!

print ID SEMICOLON

while

stmt

!

WHILE condition body

if

stmt

!

IF

condition

body

condition

!

primary relop primary

relop

! GREATER j LESS j NOTEQUAL

switch

stmt

!

SWITCH ID LBRACE case

list RBRACE

switch

stmt

!

SWITCH ID

LBRACE case

list default

case RBRACE

2

for

stmt

!

FOR LPAREN

assign stmt SEMICOLON condition SEMICOLON RPAREN body

case

list

!

case case

list

j case

case

!

CASE NUM

COLON body

default

case

!

DEFAULT COLON body

Some highlights of the grammar:

  1. Expressions are greatly simpli ed and are not recursive.

  1. There is no type declaration section.

  1. Division is integer division and the result of the division of two integers is an integer.

  1. if statement is introduced. Note that if stmt does not have else.

  1. for statement is introduced. Note that for has a very general syntax similar to that of the C language for loop.

  1. A print statement is introduced. Note that the print keyword is in lower case, but other keywords are all upper-case letters.

  1. There is no variable declaration list. There is only one id list in the global scope and that contains all the variables.

  1. There is no type speci ed for variables. All variables are int by default.

  • Execution Semantics

All statements in a statement list are executed sequentially according to the order in which they appear. Exception is made for body of if stmt, while stmt and switch stmt as explained below. In what follows, I will assume that all values of variables as well as constants are stored in locations. This assumption is used by the execution procedure that we provide. This is not a restrictive assumption. For variables, you will have locations associated with them. For constants, you can reserve a location in which you store the constant (this is like having an unnamed immutable variable).

3.1 Assignment Statement

To execute an assignment statement, the expression on the lefthand side of the equal sign is evalu-ated and the result is stored in the location associated with the righthand side of the expression.

3.2 Expression

To evaluate an expression, the values in the locations associated with the two operands are obtained and the expression operator is applied to these values resulting in a value for the expression.

3

3.3 Boolean Condition

A boolean condition takes two operands as parameters and returns a boolean value. It is used to control the execution of while and if statements. To evaluate a condition, the values in the locations associated with the operands are obtained and the relational operator is applied to these values resulting in a true or false value. For example, if the values of the two operands a and b are 3 and 4 respectively, a < b evaluates to true.

3.4 If statement

if stmt has the standard semantics:

  1. The condition is evaluated.

  1. If the condition evaluates to true, the body of the if stmt is executed, then the next statement following the if in the stmt list is executed.

  1. If the condition evaluates to false, the statement following the if in the stmt list is executed These semantics apply recursively to nested if stmt.

3.5 While statement

while stmt has the standard semantics.

  1. The condition is evaluated.

  1. If the condition evaluates to true, the body of the while stmt is executed. The next statement to execute is the while stmt itself.

  1. If the condition evaluates to false, the body of the while stmt is not executed. The next statement to execute is the next statement following the while stmt in the stmt list.

These semantics apply recursively to nested while stmt. The code block:

WHILE condition

f

stmt list

g

is equivalent to:

label: IF condition

f

stmt list

goto label

g

4

Note that goto statements do not appear in the input program, but our intermediate represen-tation includes GotoStatement which is used in conjunction with IfStatement to represent while and switch statements.

3.6 For statement

The for stmt is very similar to the for statement in the C language. The semantics are de ned by giving an equivalent construct.

FOR ( assign stmt 1 ; condition ; assign stmt 2 )

f

stmt list

g

is equivalent to:

assign stmt 1

WHILE condition

f

stmt list

assign stmt 2

g

For example, the following snippet of code:

FOR ( a = 0; a < 10; a = a + 1; )

{

print a;

}

is equivalent to:

a = 0;

WHILE a < 10

{

print a;

a = a + 1;

}

3.7 Switch statement

switch stmt has the following semantics1:

  • The switch statement in the C language has di erent syntax and semantics. It is also more dangerous!

5

  1. The value of the switch variable is checked against each case number in order.

  1. If the value matches the number, the body of the case is executed, then the statement following the switch stmt in the stmt list is executed.

  1. If the value does not match the number, next case is evaluated.

  1. If a default case is provided and the value does not match any of the case numbers, then the body of the default case is executed and then the statement following the switch stmt in the stmt list is executed.

  1. If there is no default case and the value does not match any of the case numbers, then the statement following the switch stmt in the stmt list is executed.

These semantics apply recursively to nested switch stmt. The code block:

SWITCH var f

CASE n1 : f stmt list 1 g

CASE nk : f stmt list k g

g

is equivalent to:

IF var == n1 f

stmt list 1

goto label

g

IF var == nk f

stmt list k

goto label

g

label:

And for switch statements with default case, the code block:

SWITCH var f

CASE n1 : f stmt list 1 g

CASE nk : f stmt list k g

DEFAULT : f stmt list default g

g

is equivalent to:

6

IF var == n1 f

stmt list 1

goto label

g

IF var == nk f

stmt list k

goto label

g

stmt list default

label:

3.8 Print statement

The statement

print a;

prints the value of variable a at the time of the execution of the print statement.

  • How to generate the code

The intermediate code will be a data structure (a graph) that is easy to interpret and execute. I will start by describing how this graph looks for simple assignments then I will explain how to deal with while statements.

Note that in the explanation below I start with incomplete data structures then I ex-

plain what is missing and make them more complete. You should read the whole explanation.

4.1 Handling simple assignments

A simple assignment is fully determined by: the operator (if any), the id on the left-hand side, and the operand(s). A simple assignment can be represented as a node:

struct AssignmentStatement {

struct ValueNode* left_hand_side;

struct ValueNode* operand1;

struct ValueNode* operand2;

ArithmeticOperatorType op; // operator

}

For assignment without an expression on the right-hand side, the operator is set to OPERATOR NONE and there is only one operand. To execute an assignment, you need the values of the operand(s), apply the operator, if any, to the operands and assign the resulting value of the right-hand side to the left hand side. For literals (NUM), the value is the value of the number. For variables, the value is the last value stored in the location associated with the variable. Initially, all variables are initialized to 0. In this representation, the locations associated with variables as well as the

7

locations in which constants are stored are value nodes ( struct valueNode), a node that contains a value. The assignment statement contains pointers to the value nodes of the left-hand side and the operand(s).

Multiple assignments are executed one after another. So, we need to allow multiple assignment nodes to be linked to each other. This can be achieved as follows:

struct AssignmentStatement {

struct ValueNode* left_hand_side;

struct ValueNode* operand1;

struct ValueNode* operand2;

ArithmeticOperatorType op; // operator

struct AssignmentStatement* next;

}

This structure only accepts ValueNode as operands. To handle literal constants (NUM), you need to create ValueNode for them and initialize the value stored in these value nodes to the constant value. This initialization, as well as the initialization of values in locations associated with variable is done while parsing.

This will now allow us to execute a sequence of assignment statements represented in a linked-list: we start with the head of the list, then we execute every assignment in the list one after the other.

Begin Note It is important to distinguish between compile-time initialization and runtime execution. For example, consider the program

  • b;

{

a = 3;

b = 5;

}

The intermediate representation for this program will have have two assignment statements: one to copy the value in the location that contains the value 3 to the location associated with a and one to copy the value in the location that contains the value 5 to the location associated with b. The values 3 and 5 will not be copied to the locations of a and b at compile-time. End Note

This is simple enough, but does not help with executing other kinds of statements. We consider them one at a time.

4.2 Handling print statements

The print statement is straightforward. It can be represented as

struct PrintStatement

{

struct ValueNode* id;

}

Now, we ask: how can we execute a sequence of statements that are either assignment or print statement (or other types of statements)? We need to put both kinds of statements in a list and not

8

just the assignment statements as we did above. So, we introduce a new kind of node: a statement node. The statement node has a eld that indicates which type of statement it is. It also has elds to accommodate the remaining types of statements. It looks like this:

struct StatementNode {

StatementType type; // NOOP_STMT, GOTO_STMT, ASSIGN_STMT, IF_STMT, PRINT_STMT

union {

struct AssignmentStatement* assign_stmt;

struct PrintStatement* print_stmt;

struct IfStatement* if_stmt;

struct GotoStatement* goto_stmt;

};

struct StatementNode* next;

}

This way we can go through a list of statements and execute one after the other. To execute a particular node, we check its type. If it is PRINT STMT, we execute the print stmt eld, if it is ASSIGN STMT, we execute the assign stmt eld and so on. With this modi cation, we do not need a next eld in the AssignmentStatement structure (as we explained above), instead, we put the next eld in the statement node.

This is all ne, but we do not yet know how to generate the list to execute later. The idea is to have the functions that parses non-terminals return the code for the non-terminals. For example for a statement list, we have the following pseudecode (missing many checks):

struct StatementNode* parse_stmt_list()

{

struct StatementNode* st; // statement

struct StatementNode* stl; // statement list

st = parse_stmt();

if (nextToken == start of a statement list)

{

stl = parse_stmt_list();

append stl to st; // this is pseudecode

return st;

}

else

{

ungetToken();

return st;

}

}

9

And to parse body we have the following pseudecode:

struct StatementNode* parse_body()

{

struct StatementNode* stl;

match LBRACE

stl = parse_stmt_list();

match RBRACE

return stl;

}

4.3 Handling if and while statements

More complications occur with if and while statements. The structure for an if statement can be as follows:

struct IfStatement {

ConditionalOperatorType condition_op;

struct ValueNode* condition_operand1;

struct ValueNode* condition_operand2;

struct StatementNode* true_branch;

struct StatementNode* false_branch;

}

The condition op, condition operand1 and condition operand2 elds are the operator and operands of the condition of the if statement. To generate the node for an if statement, we need to put together the condition, and stmt list that are generated in the parsing of the if statement.

The true branch and false branch elds are crucial to the execution of the if statement. If the condition evaluates to true then the statement speci ed in true branch is executed otherwise the one speci ed in false branch is executed. We need one more type of node to allow loop back for while statements. This is a GotoStatement.

struct GotoStatement {

struct StatementNode* target;

}

To generate code for the while and if statements, we need to put a few things together. The outline given above for stmt list needs to be modi ed as follows (this is missing details and shows only the main steps).

10

struct StatementNode* parse_stmt()

{

create statement node st

if next token is IF

{

st->type = IF_STMT;

create if-node; // note that if-node is pseudecode and is not

// a valid identifier in C, C++ or Java

st->if_stmt = if-node;

parse the condition and set if-node->condition_op, if-node->condition_operand1 and if-node->condition_operand2

if-node->true_branch = parse_body();

// parse_body returns a

pointer to a list of statements

create no-op node

// this is a node that does not result

// in any action being taken

append no-op node to the body of the if

// this requires a loop

to get to the end of

// if-node->true_branch

by following the next field

// you know you reached

the end when next is NULL

// it is very important

that you always appropriately

// initialize fields of any data structures

// do not use uninitialized pointers

set if-node->false_branch to point to no-op node

set st->next to point to no-op node

} else …

}

The following diagram shows the desired structure for the if statement:

11

The stmt list code should be modi ed because of the extra no-op node:

struct StatementNode* parse_stmt_list()

{

struct StatementNode* st; // statement

struct StatementNode* stl; // statement list

st = parse_stmt();

if (nextToken == start of a statement list)

{

stl = parse_stmt_list();

if st->type == IF_STMT

{

append stl to the no-op node that follows st

  • st

  • |

  • V

  • no-op

  • |

  • V

  • stl

}

else

{

append stl to st;

  • st

  • |

  • V

  • stl

}

return st;

}

else

{

ungetToken();

return st;

}

}

Handling while statement is similar. Here is the outline for parsing a while statement and creating the data structure for it:

create statement node st

if next token is WHILE

{

st->type = IF_STMT; // handling WHILE using if and goto nodes

create if-node // if-node is not a valid identifier see

// corresponding comment above

st->if_stmt = if-node

parse the condition and set if-node->condition_op, if-node->condition_operand1 and if-node->condition_operand2

if-node->true_branch = parse_body();

create a new statement node gt

gt->type = GOTO_STMT;

create goto-node

gt->goto_stmt = goto-node;

goto-node->target = st;

  • This is of type StatementNode

  • This is of type GotoStatement

  • to jump to the if statement after

  • executing the body

12

append gt to the body of the while // append gt to the body of the while

// this requires a loop. check the comment

// for the if above.

create no-op node

set if-node->false_branch to point to no-op node

set st->next to point to no-op node

}

The following diagram shows the desired structure for the while statement:

4.4 Handling switch and for statements

You can handle the switch and for statements similarly, but you should gure that yourself. Use a combination of IfStatement and GotoStatement to support the semantics of the switch and for statements. See sections 3.7 and 3.6 for the semantics of the switch and for statements.

  • Executing the intermediate representation

After the graph data structure is built, it needs to be executed. Execution starts with the rst node in the list. Depending on the type of the node, the next node to execute is determined. The general form for execution is illustrated in the following pseudo-code.

13

pc = first node

while (pc != NULL)

{

switch (pc->type)

{

case ASSIGN_STMT: // code to execute pc->assign_stmt …

pc = pc->next

case IF_STMT: // code to evaluate condition …

  • depending on the result

  • pc = pc->if_stmt->true_branch

  • or

  • pc = pc->if_stmt->false_branch

case NOOP_STMT: pc = pc->next

case GOTO_STMT: pc = pc->goto_stmt->target

case PRINT_STMT: // code to execute pc->print_stmt …

pc = pc->next

}

}

C/C++ implementations If you are developing in C/C++, we have provided you with the data structures and the code to execute the graph and you must use it. There are two les compiler.h and compiler.cc, you need to write your code in separate le(s) and #include compiler.h. The entry point of your code is a function declared in compiler.h:

struct StatementNode* parse_generate_intermediate_representation();

You need to implement this function.

The main() function is provided in compiler.cc:

int main()

{

struct StatementNode * program;

program = parse_generate_intermediate_representation();

execute_program(program);

return 0;

}

It calls the function that you will implement which is supposed to parse the program and generate the intermediate representation, then it calls the execute program function to execute the program. You should not modify any of the given code. In fact if you write your program in C/C++, you should only submit the le(s) that contain your own code and we will add the given part and compile the code before testing. If you write your program in Java, you should strictly follow the guidelines for executing the intermediate representation.

JAVA Implementation: If you write your solution in Java, executing the graph should be done non-recursively and without any function calls. Even helper functions are not

14

allowed for the execution of the graph. This is a requirement that will be checked by inspecting your code. Little credit will be assigned if this requirement is not met.

  • Requirements

    1. Write a compiler that generates intermediate representation for the code and write an inter-preter to execute the intermediate representation. The interpreter is provided for C/C++ implementations.

    1. Language: You can use Java, C, or C++ for this assignment.

    1. Any language other than Java, C or C++ is not allowed for this project.

    1. If you use C/C++ for this project, you should use the provided code and only implement the required functions.

    1. If you use Java, you will need to write everything yourself but the requirements on how to execute the intermediate representation will be checked manually when grading.

    1. Platform: As previous projects, the reference platform is CentOS 6.7

    1. You can assume that there are no syntax or semantic errors in the input program.

  • Grading

The test cases provided with the assignment, do not contain any test case for switch and for statements. However, test cases with switch and for statements will be added for grading the project. Make sure you test your code extensively with input programs that contain switch and for statements.

The test cases (there will be multiple test cases in each category, each with equal weight) will be broken down in the following way (out of 100 points):

Assignment statements: 20

If statements: 25

While statements: 25

Switch and For statements: 20

All statements: 10

Note that all test cases depend on successful implementation of print statements (otherwise your program cannot generate correct output). Also, assignment statements are needed for if, while, switch, statement test cases all depend on assignment statements.

15

  • Bonus Project

8.1 Bonus Project Options

You have two options for the bonus project:

  1. Resubmit project 2. For this option you are given another chance to submit project 2. The grade you obtain will be reduced by 30% and replaces the grade you already obtained on project 2. So, if your grade for project 2 was 20 and your grade for the replacement is 90, the grade for project 2 will be changed from 20 to 63 (63 = 90 reduced by 30%)

  1. Do a new project that replaces the lowest grade between project 2 and project 3. The grade you obtain under this option will replace the lower of the two grades that you obtained on project 2 or 3 (if the grade you obtain on the bonus is lower than what you already got on projects 2 and 3, no replacement is made).

If you make multiple submissions, only the last submission will count.

8.2 New Bonus Project (option 2)

Support the following grammar:

program

!

var

section

body

var

section

!

VAR

int

var

decl

array

var

decl

int

var

decl

!

id

list

SEMICOLON

array

var

decl

!

id

list

COLON ARRAY

LBRAC NUM

RBRAC

SEMICOLON

id

list

!

ID COMMA id

list j ID

body

!

LBRACE

stmt

list

RBRACE

stmt

list

!

stmt

stmt

list

j stmt

stmt

!

assign

stmt

j

print

stmt

j while

stmt j

if

stmt j

switch

stmt

assign

stmt

!

var

access

EQUAL

expr

SEMICOLON

var

access

!

ID j ID LBRAC expr RBRAC

expr

!

term PLUS expr

expr

!

term

term

!

factor MULT term

term

!

factor

factor

!

LPAREN

expr

RPAREN

factor

!

NUM

factor

!

var

access

print

stmt

!

print

var

access

SEMICOLON

while

stmt

!

WHILE condition

body

if

stmt

!

IF condition body

condition

!

expr

relop

expr

relop

! GREATER j LESS j NOTEQUAL

switch

stmt

!

SWITCH

var

access

LBRACE

case

list

RBRACE

switch

stmt

!

SWITCH

var

access

LBRACE

case

list

default

case RBRACE

16

case

list

!

case case

list

j case

case

!

CASE NUM

COLON body

default

case

!

DEFAULT COLON body

Note that LBRAC is “[” and LBRACE is “f”. The former is used for arrays and the latter is used for body. Assume that all arrays are integer arrays and are indexed from 0 to size – 1, where size is the size of the array speci ed in the var section after the ARRAY keyword and between “[” and “]”.

The data structures and code that we have provided for the regular assignment will not be enough for the bonus, you will need to modify those to support arrays. Submit all code les for the bonus project (including the modi ed compiler.h and compiler.cc).

All restrictions imposed on the execution of the intermediate representation for the regular project apply to the bonus project as well. You are not allowed to call any functions while executing the intermediate representation. You are not allowed to execute the program recursively.

17