EECS665

Language Specification

Lexical | Syntactic | Types | IR Semantics | Output Semantics

Lexical Details

This section defines lexical details of the Levi language.

Reserved Sequences (Keywords)
Identifiers
Integer Literals
String Literals
Symbol Operators
Comments: Text starting with # up to the end of the line is a comment (except of course if those characters are inside a string literal). For example:
# this is a comment # and so is # this # and so is # this %$!#

The scanner should recognize and ignore comments (there is no COMMENT token).
Whitespace: Spaces, tabs, and newline characters are whitespace. Whitespace separates tokens and changes the character counter, but should otherwise be ignored (except inside a string literal).
Illegal Characters: Any character that is not whitespace and is not part of a token or comment is illegal.
Length Limits: No limit may be assumed on the lengths of identifiers, string literals, integer literals, comments, etc. other than those limits imposed by the underlying implementation of the compiler's host language.

Which Token to Produce

For the most part, the token to produce should be self-explanatory. For example, the + symbol should produce the CROSS token, the - symbol should produce the DASH token, etc. The set of tokens can be found in frontend.hh or in the switch in tokens.cpp. The LCURLY token refers to a left curly brace, {. the RCURLY refers to a right curly brace, }.

The lexical structure of Levi is fairly straightforward, and largely similar to C. There are a few notable details, some of which are departures from C:

The sequence \\(-o-)// produces the THRASH token.
The sequence ... produces the SINK token.
The sequence >> and << produce INPUT and OUTPUT, respectively.
The sequences thrash, and sink, are keywords of the language. They produce the THRASH, and SINK, tokens, respectively.
The sequence file produces the FILE token
The string && and || are NOT in the language . Instead "logical and" is represented by the string and and "logical or" is represented by the string or.

Program Behavior

Additional details (like what the behavior and syntax of the tokens unique to Levi will be specified as future projects approach.

Syntactic Details

This section described the syntax of the Levi language.

Basics

The basic syntax of Levi is designed to evoke a simplified variant C. Levi is a block-structured language, with most blocks delimited by curly braces. Variables and functions may be declared in the global scope, most statements and declarations are delimited by semicolons.

Notable Differences from C

While the canonical reference for Levi syntax is its context-free grammar, there are a couple of "standout" points which deserve special attention for their deviation from C:

Function declarations look like


myfn : (a:int, b:int, c:int) bool { }

instead of

bool myFunction(int a, int b, int c) { }

Statements besides declarations are not allowed in the global scope (i.e. outside of a function body). Thus

i:int;
i = 4;

is not a legal program, but

i:int;
fn : () void {
  i = 4;
}

is legal.

File is a primitive type in Levi, so a variable can be declared of type file.

Levi uses similar stream operators to C++, so a statement like

     ofile << "hello";

or

     ifile >> var;

is legal.

Context-Free Grammar for Levi

/*********************************************************************
 Grammar for programs in the Levi language
 ********************************************************************/
program         ::= globals

globals         ::= globals decl
                | /* epsilon */

decl		::= varDecl SEMICOL
       		 |  fnDecl

varDecl         ::= name COLON type
       		 |  name COLON type ASSIGN initializer

type            ::= datatype
                 |  IMMUTABLE datatype

datatype        ::= primtype LBRACKET INTLITERAL RBRACKET
                 |  primtype

primType        ::= INT
                 |  BOOL
                 |  FILE
                 |  VOID

fnDecl          ::= name COLON LPAREN maybeformals RPAREN type LCURLY stmtList RCURLY

maybeformals   ::= formalsList
                 |  /* epsilon */

formalsList     ::= formalDecl
                 |  formalsList COMMA formalDecl

formalDecl	::= name COLON type

stmtList        ::= stmtList stmt SEMICOL
                |   stmtList blockStmt
                |   /* epsilon */

blockStmt	::= WHILE LPAREN exp RPAREN LCURLY stmtList RCURLY
		| IF LPAREN exp RPAREN LCURLY stmtList RCURLY
		| IF LPAREN exp RPAREN LCURLY stmtList RCURLY ELSE LCURLY stmtList RCURLY

stmt            ::= varDecl
                | loc ASSIGN exp
                | callExp
                | loc POSTDEC
                | loc POSTINC
                | loc OUTPUT exp
                | loc INPUT loc
                | SINK name
                | RETURN exp
                | RETURN

exp             ::= exp DASH exp
                | exp CROSS exp
                | exp STAR exp
                | exp SLASH exp
                | exp AND exp
                | exp OR exp
                | exp EQUALS exp
                | exp NOTEQUALS exp
                | exp GREATER exp
                | exp GREATEREQ exp
                | exp LESS exp
                | exp LESSEQ exp
                | NOT exp
                | DASH term
                | term

callExp         ::=  loc LPAREN RPAREN   // fn call with no args
                | loc LPAREN actualsList RPAREN  // with args

actualsList     ::= exp
                | actualsList COMMA exp
                
term            ::= loc
                | literal
                | THRASH
                | LPAREN exp RPAREN
                | callExp

literal 	: INTLITERAL
                | STRINGLITERAL
                | TRUE
                | FALSE

litList		: literal
		| literal COMMA litList

initializer	: literal
		| LBRACKET litList RBRACKET

loc		::= name
		| loc LBRACKET exp RBRACKET

name           ::= ID

Type System Details

This section defines details of the Levi type system.

Type Promotions

Any type is promotable to itself.

Operands

The following shows the atomic operands and the types that they are assigned:

numeric literals (e.g. 7, 3) are of type int
bool literals (i.e. true, false) are of type bool
string literals are of string type
identifiers have the type of their declaration, which is determined during name analysis

Operators

The operators in the language are divided into the following categories:

logical: not, and, or
arithmetic: plus, minus, times, divide, negation, postincrement, postdecrement
equality: equals, not equals
relational: less than (<), greater than (>), less then or equals (<=), greater than or equals (>=)
indexing: index ([ ])
assignment: assign (=)

The type rules of the language are as follows:

logical operators and conditions are legal if and only if:
- All operands are bool
The result type is bool in legal cases, ERROR otherwise.
arithmetic operations are legal if and only if:
- Operands are both int - the result type is int
In all illegal cases, the result type is ERROR.
relational operations are legal if and only if:
- Both operands are int
The result type is bool in legal cases, ERROR otherwise.
equality operations are legal if and only if:
- Both operands are of the same primitive type
- Neither operand is a function type
- Neither operand is an array
The result type is bool in legal cases, ERROR otherwise.
assignment operations are legal if and only if:
- Both types are the same and the LHS is an lvalue - the result type is that of the LHS
It is LEGAL to assign arrays of the same base type but different lengths regardless if the LHS or RHS is bigger

The result type is that of the LHS operand in legal cases, ERROR otherwise.
in >> operations are legal if and only if:
- The operand is of int or bool .
out << operations are legal if and only if:
- The operand is of type int
- The operand is of type bool
- The operand is of type string
indexing operations are legal if and only if:
- The operand is int
The result type is the base type of the array in legal cases, and ERROR otherwise.
function calls are legal if and only if:
- The identifier is of function type (i.e., the callee is actually a function).
- The number of actuals must match the number of formals.
- The type of each actual must match the type of the corresponding formal.
If the identifier is not a function name, the result type is ERROR. Otherwise, it is the return type of the function even if the arguments are ill-typed.
function returns: The follow rules hold for function returns:
- A return statement in a void function may not have a value.
- A return statement in a non-void function must have a value.
- The return expression must match the function's declared type.
It is LEGAL for a non-void function to skip a return statement. For example, this code is ok:
f : () int { //I should return an int, but I don't }

Compiler Construction

Reserved Sequences (Keywords)

Identifiers

Integer Literals

String Literals

Symbol Operators

Comments

Whitespace

Illegal Characters

Length Limits

Which Token to Produce

Program Behavior

Basics

Notable Differences from C

Context-Free Grammar for Levi