Milk Programming Language

Over the past year I have been delving deeper into programming languages. So far I have created a few languages based on Lox and Monkey, and I am now at a point where I want to create my first “serious” toy language: one that lets me learn more concepts and try out different approaches.

The language has the working name “Milk.” I am writing it in C, a language I have limited knowledge of but have always wanted to learn.

The high-level architecture looks something like this.

[Figure: Milk overview]

The Lexer is responsible for taking a stream of characters as input and producing a flat sequence of tokens. What is a token? Think of tokens as the words of our language. Some tokens are a single character, like = or !; others are several characters long, like keywords (if, return, …), numbers (619), identifiers (my_identifier), and string literals ("Hello, World!").
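To make the idea of tokens concrete, here is a minimal sketch of what such a lexer could look like in C. All names in it (TokenKind, Token, Lexer, lexer_init, lexer_next) and the keyword list are assumptions made for illustration, not Milk's actual code, and it only covers the token categories mentioned above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the real Milk lexer may use different names. */
typedef enum {
    TOK_EQUAL, TOK_BANG, TOK_NUMBER, TOK_IDENT, TOK_KEYWORD,
    TOK_STRING, TOK_ERROR, TOK_EOF
} TokenKind;

/* A token is a kind plus a slice of the source text (not NUL-terminated). */
typedef struct {
    TokenKind kind;
    const char *start;
    size_t length;
} Token;

typedef struct {
    const char *current; /* next unread character */
} Lexer;

void lexer_init(Lexer *lx, const char *source) {
    assert(source != NULL);
    lx->current = source;
}

/* Assumed keyword list, taken from the examples in the text. */
static int is_keyword(const char *s, size_t n) {
    static const char *keywords[] = { "if", "return", "var" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strlen(keywords[i]) == n && memcmp(keywords[i], s, n) == 0) return 1;
    }
    return 0;
}

static Token make_token(TokenKind kind, const char *start, const char *end) {
    Token t = { kind, start, (size_t)(end - start) };
    return t;
}

/* Skips whitespace, then returns the next token in the input. */
Token lexer_next(Lexer *lx) {
    while (isspace((unsigned char)*lx->current)) lx->current++;
    const char *start = lx->current;
    unsigned char c = (unsigned char)*start;

    if (c == '\0') return make_token(TOK_EOF, start, start);
    lx->current++;
    if (c == '=') return make_token(TOK_EQUAL, start, lx->current);
    if (c == '!') return make_token(TOK_BANG, start, lx->current);
    if (isdigit(c)) {
        while (isdigit((unsigned char)*lx->current)) lx->current++;
        return make_token(TOK_NUMBER, start, lx->current);
    }
    if (isalpha(c) || c == '_') {
        while (isalnum((unsigned char)*lx->current) || *lx->current == '_') lx->current++;
        size_t n = (size_t)(lx->current - start);
        return make_token(is_keyword(start, n) ? TOK_KEYWORD : TOK_IDENT, start, lx->current);
    }
    if (c == '"') {
        while (*lx->current != '"' && *lx->current != '\0') lx->current++;
        if (*lx->current == '\0') return make_token(TOK_ERROR, start, lx->current); /* unterminated */
        lx->current++; /* consume the closing quote */
        return make_token(TOK_STRING, start, lx->current);
    }
    return make_token(TOK_ERROR, start, lx->current); /* unknown character */
}
```

A caller would run lexer_init once on the source text and then call lexer_next in a loop until it returns a TOK_EOF token. Storing a pointer and a length instead of copying the text keeps the lexer free of allocations, which is one common choice among several.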

Next up is parsing. Now it is time to convert our flat sequence of tokens into a tree, usually called an Abstract Syntax Tree (AST). This is where we compose larger expressions and statements out of the tokens. You could say we give our language its grammar.

There are many ways to write a parser. I am going to go with the approach that is most familiar to me: Recursive Descent Parsing. I have learned that if you are going to write your own parser this approach is one of the simpler ways to go about it. And even though it is “simple”, this way of parsing is still fast and robust. I also want to avoid using a generator (like YACC or ANTLR).
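As an illustration of the technique, here is a small recursive descent parser in C for arithmetic expressions. It is a sketch under assumptions: the grammar, the Node layout, and every function name are made up for this example, and to stay short it reads characters directly instead of consuming tokens from a lexer. Each grammar rule gets its own function, and operator precedence falls out of which function calls which.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node: a number literal (op == 0) or a binary operation. */
typedef struct Node {
    char op;                 /* '+', '-', '*', '/', or 0 for a literal */
    long value;              /* only meaningful when op == 0 */
    struct Node *lhs, *rhs;  /* only meaningful when op != 0 */
} Node;

typedef struct { const char *pos; } Parser; /* pos: next unread character */

void free_ast(Node *n) {
    if (n == NULL) return;
    free_ast(n->lhs);
    free_ast(n->rhs);
    free(n);
}

static Node *new_node(char op, long value, Node *lhs, Node *rhs) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL); /* sketch only: real code should report the failure */
    n->op = op; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

/* Skips whitespace and returns the next character without consuming it. */
static char peek(Parser *p) {
    while (isspace((unsigned char)*p->pos)) p->pos++;
    return *p->pos;
}

Node *parse_expression(Parser *p);

/* primary := NUMBER | "(" expression ")" */
static Node *parse_primary(Parser *p) {
    char c = peek(p);
    if (c == '(') {
        p->pos++;
        Node *inner = parse_expression(p);
        if (inner != NULL && peek(p) == ')') { p->pos++; return inner; }
        free_ast(inner);
        return NULL; /* missing operand or closing parenthesis */
    }
    if (!isdigit((unsigned char)c)) return NULL;
    char *end;
    long value = strtol(p->pos, &end, 10);
    p->pos = end;
    return new_node(0, value, NULL, NULL);
}

/* Shared helper for a left-associative chain: operand (op operand)* */
static Node *parse_binary(Parser *p, const char *ops, Node *(*operand)(Parser *)) {
    Node *left = operand(p);
    while (left != NULL && peek(p) != '\0' && strchr(ops, peek(p)) != NULL) {
        char op = *p->pos++;
        Node *right = operand(p);
        if (right == NULL) { free_ast(left); return NULL; }
        left = new_node(op, 0, left, right);
    }
    return left;
}

/* term := primary (("*" | "/") primary)* */
static Node *parse_term(Parser *p) { return parse_binary(p, "*/", parse_primary); }

/* expression := term (("+" | "-") term)* */
Node *parse_expression(Parser *p) { return parse_binary(p, "+-", parse_term); }

/* Parses a whole string; returns NULL on a syntax error or trailing input. */
Node *parse(const char *source) {
    Parser p = { source };
    Node *root = parse_expression(&p);
    if (root != NULL && peek(&p) != '\0') { free_ast(root); root = NULL; }
    return root;
}

/* Evaluates the tree; only here to make its shape easy to check.
   There is no division-by-zero check in this sketch. */
long eval(const Node *n) {
    switch (n->op) {
    case '+': return eval(n->lhs) + eval(n->rhs);
    case '-': return eval(n->lhs) - eval(n->rhs);
    case '*': return eval(n->lhs) * eval(n->rhs);
    case '/': return eval(n->lhs) / eval(n->rhs);
    default:  return n->value;
    }
}
```

Calling parse("1 + 2 * 3") yields a tree whose root is the + node with the * node as its right child, because parse_expression asks parse_term for its operands. The caller owns the returned tree and releases it with free_ast.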

I have only created dynamically typed languages before, so I want to try to make Milk statically typed. After the AST has been created I want to perform static analysis, specifically type checking. Dynamically typed languages check types too, but they usually do so later, at runtime. I want to know the types of the variables at compile time. I will achieve this by annotating each variable with its type. For example, it may look something like this: var my_variable: string = "my value!";. Here we specify that my_variable is of the type string. Type inference is an approach that lets the compiler work out the type of a variable without an annotation. This is not something I will explore for this project.
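A minimal sketch of what checking such a declaration could involve, written in C. The type set, the Expr and VarDecl structs, the fixed-size SymbolTable, and the function names are all assumptions for illustration; Milk's real AST and type rules are not settled here. The checker computes the type of the initializer and compares it with the annotated type before recording the variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical built-in types; TYPE_ERROR marks "could not be typed". */
typedef enum { TYPE_INT, TYPE_STRING, TYPE_BOOL, TYPE_ERROR } Type;

typedef enum {
    EXPR_INT_LITERAL, EXPR_STRING_LITERAL, EXPR_BOOL_LITERAL, EXPR_VARIABLE
} ExprKind;

typedef struct {
    ExprKind kind;
    const char *name; /* only used for EXPR_VARIABLE */
} Expr;

/* Models: var <name>: <declared> = <init>; */
typedef struct {
    const char *name;
    Type declared;
    Expr init;
} VarDecl;

#define MAX_SYMBOLS 64

/* Maps variable names to their declared types (single flat scope). */
typedef struct {
    const char *names[MAX_SYMBOLS];
    Type types[MAX_SYMBOLS];
    size_t count;
} SymbolTable;

/* Computes the static type of an expression, or TYPE_ERROR if unknown. */
Type type_of(const SymbolTable *symbols, const Expr *e) {
    switch (e->kind) {
    case EXPR_INT_LITERAL:    return TYPE_INT;
    case EXPR_STRING_LITERAL: return TYPE_STRING;
    case EXPR_BOOL_LITERAL:   return TYPE_BOOL;
    case EXPR_VARIABLE:
        /* Search newest first so a later declaration shadows an earlier one. */
        for (size_t i = symbols->count; i-- > 0;) {
            if (strcmp(symbols->names[i], e->name) == 0) return symbols->types[i];
        }
        return TYPE_ERROR; /* use of an undeclared variable */
    }
    return TYPE_ERROR;
}

/* Checks one declaration and, if it is well typed, records the variable. */
bool check_var_decl(SymbolTable *symbols, const VarDecl *decl) {
    assert(symbols != NULL && decl != NULL);
    Type actual = type_of(symbols, &decl->init);
    if (actual == TYPE_ERROR || actual != decl->declared) return false;
    if (symbols->count == MAX_SYMBOLS) return false; /* table full */
    symbols->names[symbols->count] = decl->name;
    symbols->types[symbols->count] = decl->declared;
    symbols->count++;
    return true;
}
```

With an empty table, a declaration equivalent to var my_variable: string = "my value!"; passes, while one that annotates a string literal as an integer is rejected before the program ever runs. A real checker would also report where and why the check failed instead of returning a bare boolean.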

Now for this project’s behemoth, the big unknown. I want to use LLVM to output the program as an executable for whatever target architecture one might be on. There are still many question marks regarding how this will be implemented, but I will try for sure.

Anyway, this is the plan. It is not set in stone and will most definitely change during the course of the implementation.
