Language Specification
Lexical | Syntactic | Types | IR Semantics | Output Semantics

Lexical Details

This section defines lexical details of the Levi language.

Reserved Sequences (Keywords)

The following character sequences are considered reserved, and should be yield a distinct token (rather than be counted as an identifier).

and
bool
else


false
file
if
immutable


int
or
return
sink


thrash
true
void
while

Identifiers

Any sequence of one or more letters and/or digits, and/or underscores, starting with a letter or underscore, should be treated as an identifier token. Identifiers must not be reserved sequences, but may include a proper substring that is a reserved word, e.g. while1 is an identifier but while is not.

Integer Literals

Any sequence of of one or more digits yields an integer literal token as long as it is not part of an identifer or string.

String Literals

Any string literal (a sequence of zero or more string characters surrounded by double quotes) should yield a string literal token. A string character is either
  • an escaped character: a backslash followed by any one of the following characters:
    1. n
    2. t
    3. a double quote
    4. another backslash
    or
  • a single character other than newline or double quote or backslash.

Examples of legal string literals:

""
"&!88"
"use \n to denote a newline character"
"use \" to  for a quote and \\ for a backslash"
Examples of things that are not legal string literals:
"unterminated
"also unterminated \"
"backslash followed by space: \ is not allowed"
"bad escaped character: \a AND not terminated

Symbol Operators

Any of the following character symbol sequences constitute a distinct token:

=
:
,
+
-


==
>>
>
>=
[
<<


\\(-o-)//
{
<
<=
(
!


!=
--
++
]
}
)


;
...
/
*

Comments

Text starting with # up to the end of the line is a comment (except of course if those characters are inside a string literal). For example:
 
# this is a comment
# and so is # this
# and so is # this %$!#

The scanner should recognize and ignore comments (there is no COMMENT token).

Whitespace

Spaces, tabs, and newline characters are whitespace. Whitespace separates tokens and changes the character counter, but should otherwise be ignored (except inside a string literal).

Illegal Characters

Any character that is not whitespace and is not part of a token or comment is illegal.

Length Limits

No limit may be assumed on the lengths of identifiers, string literals, integer literals, comments, etc. other than those limits imposed by the underlying implementation of the compiler's host language.

Which Token to Produce

For the most part, the token to produce should be self-explanatory. For example, the + symbol should produce the CROSS token, the -  symbol should produce the DASH token, etc. The set of tokens can be found in frontend.hh or in the switch in tokens.cpp. The LCURLY token refers to a left curly brace, {. the RCURLY refers to a right curly brace, }.

The lexical structure of Levi is fairly straightforward, and largely similar to C. There are a few notable details, some of which are departures from C:

  • The sequence \\(-o-)// produces the THRASH token.
  • The sequence ... produces the SINK token.
  • The sequence >> and << produce INPUT and OUTPUT, respectively.
  • The sequences thrash, and sink, are keywords of the language. They produce the THRASH, and SINK, tokens, respectively.
  • The sequence file produces the FILE token
  • The string && and || are NOT in the language . Instead "logical and" is represented by the string and and "logical or" is represented by the string or.

Program Behavior

Additional details (like what the behavior and syntax of the tokens unique to Levi will be specified as future projects approach.

Syntactic Details

This section described the syntax of the Levi language.

Basics

The basic syntax of Levi is designed to evoke a simplified variant C. Levi is a block-structured language, with most blocks delimited by curly braces. Variables and functions may be declared in the global scope, most statements and declarations are delimited by semicolons.

Notable Differences from C

While the canonical reference for Levi syntax is its context-free grammar, there are a couple of "standout" points which deserve special attention for their deviation from C:

  • Function declarations look like
    myfn : (a:int, b:int, c:int) bool { }
    instead of
    bool myFunction(int a, int b, int c) { }
  • Statements besides declarations are not allowed in the global scope (i.e. outside of a function body). Thus
    i:int;
    i = 4;
    

    is not a legal program, but

    i:int;
    fn : () void {
      i = 4;
    }
    

    is legal.

  • File is a primitive type in Levi, so a variable can be declared of type file.
  • Levi uses similar stream operators to C++, so a statement like
         ofile << "hello";
    
    or
         ifile >> var;
    
    is legal.


  • Context-Free Grammar for Levi

    /*********************************************************************
     Grammar for programs in the Levi language
     ********************************************************************/
    program         ::= globals
    
    globals         ::= globals decl
                    | /* epsilon */
    
    decl		::= varDecl SEMICOL
           		 |  fnDecl
    
    varDecl         ::= name COLON type
           		 |  name COLON type ASSIGN initializer
    
    type            ::= datatype
                     |  IMMUTABLE datatype
    
    datatype        ::= primtype LBRACKET INTLITERAL RBRACKET
                     |  primtype
    
    primType        ::= INT
                     |  BOOL
                     |  FILE
                     |  VOID
    
    fnDecl          ::= name COLON LPAREN maybeformals RPAREN type LCURLY stmtList RCURLY
    
    maybeformals   ::= formalsList
                     |  /* epsilon */
    
    formalsList     ::= formalDecl
                     |  formalsList COMMA formalDecl
    
    formalDecl	::= name COLON type
    
    stmtList        ::= stmtList stmt SEMICOL
                    |   stmtList blockStmt
                    |   /* epsilon */
    
    blockStmt	::= WHILE LPAREN exp RPAREN LCURLY stmtList RCURLY
    		| IF LPAREN exp RPAREN LCURLY stmtList RCURLY
    		| IF LPAREN exp RPAREN LCURLY stmtList RCURLY ELSE LCURLY stmtList RCURLY
    
    stmt            ::= varDecl
                    | loc ASSIGN exp
                    | callExp
                    | loc POSTDEC
                    | loc POSTINC
                    | loc OUTPUT exp
                    | loc INPUT loc
                    | SINK name
                    | RETURN exp
                    | RETURN
    
    exp             ::= exp DASH exp
                    | exp CROSS exp
                    | exp STAR exp
                    | exp SLASH exp
                    | exp AND exp
                    | exp OR exp
                    | exp EQUALS exp
                    | exp NOTEQUALS exp
                    | exp GREATER exp
                    | exp GREATEREQ exp
                    | exp LESS exp
                    | exp LESSEQ exp
                    | NOT exp
                    | DASH term
                    | term
    
    callExp         ::=  loc LPAREN RPAREN   // fn call with no args
                    | loc LPAREN actualsList RPAREN  // with args
    
    actualsList     ::= exp
                    | actualsList COMMA exp
                    
    term            ::= loc
                    | literal
                    | THRASH
                    | LPAREN exp RPAREN
                    | callExp
    
    literal 	: INTLITERAL
                    | STRINGLITERAL
                    | TRUE
                    | FALSE
    
    litList		: literal
    		| literal COMMA litList
    
    initializer	: literal
    		| LBRACKET litList RBRACKET
    
    loc		::= name
    		| loc LBRACKET exp RBRACKET
    
    name           ::= ID