SQL2EOS tutorial

SQL2EOS is built and developed on a custom made Meta_II meta-compiler, so before explaining SQL2EOS I need to explain Meta_II. If you don’t care about the whole stack and just want to skip to the SQL2EOS, feel free to do so.

I was always fascinated by computer languages and learned many over the years. I also read some thick books about compilers and language design. Not light read. Meta_II was a refreshing surprise as it is very light weight compared to the compiler books I’ve read. It is based on a really short scientific paper:

META II: A Syntax-Oriented Compiler Writing Language.
Schorre, D. V.
In Proceedings of the 1964 19th ACM National Conference,
ACM Press, New York, NY, 41.301-41.3011, 1964.
available as http://doi.acm.org/10.1145/800257.808896

Before reading that paper, I actually found this page which explains Meta_II in depth.

I would like to write a detailed explanation of Meta_II but I want to start from the bottom and work my way up from there. Also, I will assume you know javascript but unlike the other web page or the original paper which assume knowledge of assembly language, I will explain the pseudo-assembler and how to easily understand it.

The whole SQL2EOS code is written in javascript but that is tricky to understand since it is a generic interpreter for a custom made assembly-flavor language. The way it works, is that there is an assembly language program in a string called code and a string with an SQL command called input. These 2 strings are the parameters for the interpreter which is a javascript function. The function runs the interpreter, which deciphers the code and extracts the info from the input. The function returns a string called output which contains the EOSIO smart contract in plain CPP source code based on the /smart-contract-guides/data-persistence from the developers.eos.io web site.

So, without further ado, let’s dive in to the assembler. This assembler is not “real” it is not intended to be compiled to machine code and will never run on a CPU. It is a very simple language that helps glue all the different parts of the Meta_II system together.

1st rule in the assembly language is that all commands appear on a single line preceded with a tab character.

All commands are evaluated sequentially, line by line. Some commands have a parameter, others don’t.

There are some hidden variables in assembler, they are considered global and are few and frequently used.

There are special commands that make the program jump to other lines instead of going to the next line in the file. These jump or goto commands have a string parameter called label which tells the interpreter to look for the label in a line which doesn’t start with a tab character and the line contains nothing else except the label.

Oh, I almost forgot, there is also a function call construct which jumps to a label but uses a stack, a simple javascript array which gets some info pushed to it and when the end of the function is reached – marked by a return command, the stack is popped with info of where to return to and what to do with the result.

Since this is a made up assembler, it is not dealing with the primitive data types of a typical CPU or RAM but our basic building blocks of interest, literal strings, numbers and special identifiers.

Here is a short list of commands in our custom assembler,

CommandA little explainer
TST ‘abc’this command checks if the next part of the input is exactly equal the string parameter, in this example: abc
IDskip white space and if the next part of the input is a legal identifier, save it in a global variable for later use and set the global flag to true
B L12branch to label L12, in other words, jump to the next command after the label L12
BT L13similar to the above, but only if the global flag (called flag obviously) is True
BF L14Branch if flag is False
CL ‘xyz’There is a global variable to assist in editing the next line of output, Copy Literal to that output buffer
CICopy the Identifier from the global variable holding the latest scanned identifier to the output buffer
CLL FUNCMake a function call, save the return location and some additional variables and goto the next command after the label
RReturn from the function call

Now if you want to read some assembly code, go to the https://amiheines.com/meta_ii-playground-page/ and click the line with “ca05 added STORE AND LOAD VARIABLES”

That page is organised from left to right with 3 large text areas,
input – code – output.

You will see the Meta_II compiler which is hard coded on that page.

If you write a suitable input, the interpreter will run automatically 3 seconds after the last key stroke in the input of the code text areas.

Instead of typing an input, you can click on the line above the text area “ia15. meta_ii with loops, use with ca05” and the interpreter will immediately run. No need to wait 3 seconds, since you’re not manually editing but clicking a pre-made input.

The first command is ADR which is like a function call but a special one, it is like the main() function in C or CPP which is the starting point of the program.

The matching special command is END which is the end of the program.

Other commands I neglected to tell you about are:

BEBranch if the error global variable is true to a special error reporting function and exit
OUToutput the global variable output buffer to the actual output string of the program

Now you can read most of the program and make sense of it! Right?

	ADR METAII
METAII
	TST '.SYNTAX'
	BF L1
	ID
	BE
	CL 'ADR '
	CI
	OUT
L2
	CLL ST
	BT L2
	SET

Let me describe the first few lines in the snippet above. The first line is the beginning of the program. The name of the program is METAII and it jumps to the label on the next line with the same name.

There is nothing to do on a label line, so we move to the following line which says to test for a string in the input file, “.SYNTAX”

Well, the input file does start with .SYNTAS so this is working! The input pointer is moved past the string and the flag is set to true.

The next line has a Branch if flag is false, but since the flag is actually true, we ignore this command and move to the next line.

The next line is ID, so we check if there is a “legal” identifier after skipping any white spaces. A legal identifier is like a variable in many computer languages, starting with a letter in the range a-z and followed by optional additional letters or digits. Well, we see a legal identifier in the input:

METAIIWITHLOOPS

We store it for later use in a global variable which is a part of the interpreter for the assembler and move on to the next command.

BE, or Branch if Error is also ignored, there was no error in the previous step, moving on…

CL ‘ADR ‘ will copy the literal “ADR ” to the output buffer.

CI will append the latest identifier “METAIIWITHLOOPS” to the output buffer.

OUT will take the output buffer and copy it to the output string and clear the output buffer, effectively finalizing the output of a line.

L2 is just a label, so ignored,

CLL ST is a function call. The program jumps to the ST label and will return once that call is completed with an R command.

BT L2 means jump if the global flag is true to L2 which we recently passed!

This is the assembler equivalent of a do-while loop:

do {
  st();
} while (flag);

As long as the variable named flag is True, go back and re run the body of the loop, which simply contains the call to the st() function.

Eventually, the flag will be false and the loop will end.

The next command is SET which simply sets the global flag to true.

This is in preparation for the next commands which will be surprised if the flag is not true which is the “normal” state. After all, the loop ended and nothing unusual happened, so SET makes sure the rest of the code agrees.

As you can see, assembly is very verbose and tedious. Programs tend to be very long and not so human friendly.

We had enough of assembly. Now I’m contemplating on where to move next, there are two possibilities. One is to move up in the abstraction stack and discuss the high-level-language which Meta_II definition as well as the SQL2EOS compiler are written in.

The other is to move deeper in the stack to the lowest level – the interpreter which makes the assembler work.

Lets start with the latter. The whole interpreter is written in a single javascript file, https://github.com/amih/META_II/blob/master/index.js and is not too difficult to follow. It has 180 lines and contains a single object called vm.

If you look at the bottom of the file, you will see the META_II key which is the main function taking two variables, an input string and a code string. This function initializes many variables and then enters the main loop which processes the commands, one after another here:

vm.InterpretOp();

The loop stops if the global variable for the exitLevel is true and then the function returns the string in outbuf:

if(vm.exitlevel){ return vm.outbuf; }

Most of the javascript code is broken down to the handling of each of the assembly commands. I hope the code is easy enough to decypher.

The special property of the Meta_II compiler, is that it is actually a whole class of compilers which share the peculiar property that you can produce them as output based on a high-level definition.

If you use the ia15… input and the ca05… code, you will see that the output is very similar to the code. Now the surprising property of Meta_II is that if you replace the code with the new code from the output text area, the interpreter will trigger another run and will produce new output. The new output is identical to the code!!!

If you think of the code in the center as a function transforming the input text to an output text, then this is a stable point where no matter how many times you copy the output to the code area, the result won’t change anymore.

This is the main technique for extending and manipulating the Meta_II code without manually dealing with the assembly implementation. The main focus should be on the high definition of the language which is made of syntax equations mixed with output commands in curly brackets.

I want to very briefly explain this high level language, soon ew reach the SQL2EOS compiler, I promise.

Each Meta_II high-level definition starts with a line with the exact string:
“.SYNTAX” followed by the language name and ends with a line which contains just “.END”, see for example:

.SYNTAX METAIIWITHLOOPS

... some syntax equations here...
... some syntax equations here...
... some syntax equations here...

.END

Between the start and end there are BNF syntax equations, which define the rules of the language we are defining a compiler for.

The equations start with the rule name, followed by an equal sign, followed by the rule definition and finished with a semicolon.

Each one can use other equations in the program. This is not a sequential language but a declarative language. If a rule fails, no harm done, other rules are checked. Once a rule succeeds, there are output commands interspersed which tell the compiler to output either literal strings or use recently identified parts of the source.

Consider the followinf syntax equation:

METAIIWITHLOOPS = '.SYNTAX' .ID {'ADR ' *} $ ST '.END' {'END'};

This line defines a syntax rule, called “METAIIWITHLOOPS”

we will ignore the output commands in {} for now. This rule works successfully if the input has the literal string “.SYNTAX” followed by a legal identifier followed by zero or more expressions matching the ST rule, followed by the string “.END”.

Now, if this rule passes all these tests, it will output a new line with the string: “ADR ” followed by the content of the identifier, and after the zero or more ST rules are satisfied and the “.END” is found, it will output the string “END” in a separate line.

The high level language has a very succinct syntax. It uses $ to signify “zero-or-more” and a / to signify either-or sub-rules.

I hope you understand the connection between the high level language and the low level assembler.

So, finally we get to the SQL2EOS compiler.

This should be easier now that we have all the above under our belt.

If you follow the first four steps in the https://amiheines.com/meta_ii-playground-page/ page, you have created the SQL2EOS compiler. On the left side is the high-level definition of the SQL2EOS compiler, in the center text area is the assembly code of the Meta_II compiler used to make the output which is on the right side.

The Meta_II used is an extended version with 2 additional features.

  1. Saving identifiers to dynamically named variables or to dynamically named arrays
  2. Looping over an array where the body of the loop has an index variable to be able to refer to items in the array

These new features are required to parse and use the table name in different parts of the output and recall the fields and their types in several different places so they can be correctly used in the output.

This is how a simple:

CREATE TABLE posts (postid uint64, title string, createdat uint64)

Is turned into a full blown EOSIO compatible smart contract with a multi_index table, and insert, update and del actions.

The compiler is not production ready. The primary index is hard coded as “user” and there are no secondary indexes.

In the next iteration, the SQL will have multiple TABLE definitions and the Meta_II can have cleaner syntax.

I think this is too long as it is, so I will say, good night!

2,413 Words

Toy compiler in javascript

I’ve always wanted to build my own compiler. It seemed a hard problem to tackle and a growing opportunity for me as a developer. I needed a good reason to actually get my hands dirty and stop reading books and ebooks on how to do it.

The best starting point I wanted to use is META_II. There is something magical about the short program in a completely bizarre language which compiles to itself and can be extended and preserve the property of being able to compile itself.

The first complete compiler I created yesterday is still very limited in its ability and lacks important language features. It is very ugly in its rules and syntax but it works for the use case I need it and it is all mine 🙂

It is not a polished product for general public consumption but very useful and a good starting point for additional extensions and improvements.

Now that I have the basic features I wanted and they actually work, I can put it to use and create a compiler that reads SQL flavor table definitions and converts that to eosio smart contract C++ which can then be further developed or uploaded as is to an EOS compatible blockchain.

The idea of SQL2EOS is that you write simplified SQL CREATE TABLE and it will have all the info it needs to output a valid C++ smart contract code with a multi-index table.

The SQL syntax is cleaner and doesn’t have repeating parts where as the C++ code does have longer, more verbose and confusing links between different parts of the code to reach the same result – a working table for user data.

Anyway, the META_II I extended and melded to my liking is still a work in progress and has some ugly parts. It is fairly easy to rewrite and I will do it soon and explain how, so you can follow along and use it for something else if you want.

The problem with META_II is that you need to be able to think in at least 4 different languages at the same time, this is hard. There is the “substrate language” which in my case is javascript and the usage is pretty straight forward, so not much brain power invested there. The second is the pseudo-assembler which is written on top of the substrate language, I chose to implement it as an interpreter. The javascript runs on the code file which is a text file which superficially looks like assembler but has special op-codes or commands which are processed in a large switch case in a loop in the javascript. Next up in the abstraction chain is the META_II high level language which is a mix of BNF equations that define the input language, intermixed with output commands that will create the output file in the target language.

Oh my! After bolding the different languages and explaining the relationships between them, I think there are actually 5 languages…

Not only that, but you need to be able to reasonably do “dry runs” in the different languages and debug all of them, sometimes with the help of caveman debugging to level the playing field of the multiple languages sitation.

Since it takes time to grok a programming language and feel fluent in it, and since META_II introduces at least 2 languages – the pseudo-assembler and the META_II high level language, it takes time and effort to overcome the initial learning curve.

The worst part is that the 5 languages are not static. Meaning, once you understand them and know what you can do, you realize they need to be extended and you keep changing them to suite your needs. This is not like adding a function, but adding a language concept altogether. Do you need a missing control structure? Are all variables global and you need the mechanism of scoping them to functions? Hint: META_II is brutally minimalistic and you need to add loops, variables, operators and more…

I added 2 things I needed for my task, one is a way to store dynamic arrays of values and the other is a looping construct for the output.

I also separated the stages of collecting info from the source SQL language and outputting the needed C++ smart contract since it wasn’t an easy one-to-one transformation from the SQL to the EOSIO C++ smart contract.

Want to play with the META_II and the demo SQL2EOSIO compiler? Go to this page.

751 Words

Blockchain for IT

Blockchain is a new technology. Bitcoin is the most familiar example of a blockchain, it is the oldest most used one today.

The concept of a blockchain can be described in a simplified way as a collection of “blocks” of data, which are added to a list of such “blocks” over time and recording all the details about data manipulation for the blockchain. The blocks are linked in a special way which makes it virtually impossible to change the data in the blocks after the fact.

Blockchain technology has a reputation of being slow, expensive, complicated to manage and out in the open for all to see.

Most of these first impressions are not necessarily true. EOSIO is a tool which allows installing and managing a blockchain on a private server, fully controlled and owned by one entity, and it can replace the database and the server side code.

Why would you want to replace your database and server side code with new unfamiliar technology, you ask?

Well, there are some interesting unique benefits you can gain by adopting this technology.

181 Words

Hello world!

Welcome to WordPress. This is your first post. Edit or delete it, then start writing!

15 Words