
In this post I want to elaborate on the files, merging, scoping, name resolution and synchronisation aspects of Merg-E, a domain-specific language (DSL) that I'm currently working on for Web 3.0 bots and L2 nodes. This is not a complete description of the current draft design, but a focus on these five specific aspects of the language. In future blog posts I hope to expand on this post with other important aspects of the language.
Files
A Merg-E program consists of one "mrg" program file plus one or more "mrm" module files. No extended options with directories, just files in a single directory. No packaging system. Merg-E is meant to be a simple DSL for Web 3.0 bots, so we aren't going to tie it to the kind of fancy packaging system or application structure a full general-purpose language would have.
- myapp/
  - myapp.mrg
  - utils.mrm
  - hive.mrm
  - jsonrpc.mrm
Reserved keywords
Reserved keywords are a pain. When a programming language grows, the people behind it tend to be very conservative about adding new keywords because, well, a user might have named things in their code with exactly that name. In the previous draft Merg-E had a little over a handful of reserved words. We are bringing that back to just one: scope. The scope keyword refers to a different object in every scope. While the single reserved word might be a bit cumbersome at first, it creates a level of flexibility that is needed for a DSL that is not the primary focus of my to-do list. The DSL is there for the projects that embed it, not as a thing on its own.
Coding style
This may be a hard sell, especially given the criticism Python tends to get for making indentation semantically meaningful, but in Merg-E proper indentation is required even though it is not semantically meaningful. The idea is that unclear code is unsafe code, so we mandate a form of Ratliff style indentation for Merg-E with an indent size of two spaces.
An example:
app myApp lang ambient {
  lang.assert lang.version.major == 0 && lang.version.minor == 0;
  resolve_order lang.flow, lang.type, lang.share, lang, imported, .;
  use ambient {
    io.cout as cout;
    io.cerr as cerr;
    io.endl as endl;
    };
  mutable main ()::{cout: cout, cerr: cerr, endl: endl}{
    int max_prime = 1000;
    mutable int counter = 0;
    shared mutable int ok_count = 0;
    mutable awaitable all_primes;
    reentrant mutable merge utils.is_prime as is_prime(x int)::{ok_count: ok_count}{max_prime: max_prime}@[range_error];
    while counter < 100 {
      counter += 1;
      all_primes += is_prime(counter);
      }
    await_all all_primes;
    cout "counted " ok_count " prime numbers" endl;
    }!!{
    switch scope.exceptions[-1] {
      range_error : {
        cerr "Something went wrong" endl;
        };
      };
    };
  };
Don't try to grasp everything yet, just note the Ratliff style.
Source-file scope
The outer scope of a Merg-E program, the scope in which the file starts, is called the source-file scope. The scope object is a tree with a number of primary branches that are defined in source-file scope depending on the type of file (app or module).
| branch | meaning | mrg | mrm |
|---|---|---|---|
| app | define application scope and delegate keywords to it | YES | NO |
| ns | define module scope and delegate keywords to it | NO | YES |
| lang | the non-keyword core of the language as a tree, carrying no authority | inactive | inactive |
| ambient | all of the process's ambient authority | inactive | NO |
There is not much to do in the source-file scope. We can pass down lang and ambient to the application scope, or we can pass down lang to module scope. That is pretty much it.
Note that we can't decompose or attenuate at this level yet, because lang and ambient are not yet activated.
App file scope
scope.app myApp scope.lang as lang, scope.ambient as ambient {
...
}
Module file scope
scope.ns utils scope.lang as lang {
...
}
Name resolution
Let's look at our previous example again.
scope.app myApp scope.lang as lang, scope.ambient as ambient {
...
}
We say: here comes the application scope, the App is called myApp and please make lang and ambient available in the inner (App) scope and activate them.
But they won't exist at the same level in the inner scope. Let's investigate:
| source-file scope | app scope |
|---|---|
| scope.lang | scope.imported.lang |
| scope.ambient | scope.imported.ambient |
While it is safest to always use the fully specified names, name resolution will look in the right place if we don't, and the original names are used when no alias is specified. So we can rewrite this as:
app myApp lang, ambient {
...
}
Once inside the application scope we can set the name resolution order. This is only possible at the application and module scope level, because resolve_order only exists in the scope object at that level:
app myApp lang, ambient {
scope.resolve_order scope.imported.lang.flow, scope.imported.lang.type, scope.imported.lang.share, scope.imported, scope;
...
}
If no resolve_order is set, the application scope will act as if the following was defined:
app myApp lang, ambient {
scope.resolve_order scope.imported, scope;
...
}
This also means we can write the first as:
app myApp lang, ambient {
resolve_order lang.flow, lang.type, lang.share, imported, .;
...
}
But we need to remember why we have a language with only one reserved keyword. Imagine a new version of Merg-E were to extend lang.share with lang.share.ambient. That is not likely to happen, but it is possible, and with the resolve order above ambient would then resolve to lang.share.ambient instead of imported.ambient, breaking old code. To avoid this, put imported first:
app myApp lang, ambient {
resolve_order imported, lang.flow, lang.type, lang.share, .;
...
}
An alternative is to use assert to check that the language version hasn't changed:
app myApp lang, ambient {
lang.assert lang.version.major == 0 && lang.version.minor == 0;
resolve_order lang.flow, lang.type, lang.share, imported, .;
...
}
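Because Merg-E has no public runtime yet, the lookup rule described above can only be sketched in another language. The following is a toy Python model, not the Merg-E implementation: it assumes a scope is a nested dictionary, that resolve_order entries are dotted paths relative to the scope object (written out in full, as in the scope.imported.lang.flow form), and that the first branch defining a name wins. The helper names (`lookup_path`, `resolve`, `make_scope`) and the `future_lang` flag, which simulates a hypothetical lang.share.ambient, are illustrative only.

```python
from typing import Any


def lookup_path(scope: dict, path: str) -> Any:
    """Follow a dotted path such as 'imported.lang.share'; '.' is the scope root."""
    if path == ".":
        return scope
    node: Any = scope
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(path)
        node = node[part]
    return node


def resolve(scope: dict, resolve_order: list[str], name: str) -> Any:
    """Return the binding of `name` from the first branch in resolve_order that defines it."""
    for branch in resolve_order:
        try:
            node = lookup_path(scope, branch)
        except KeyError:
            continue  # this branch does not exist in the scope
        if isinstance(node, dict) and name in node:
            return node[name]
    raise NameError(name)


def make_scope(future_lang: bool = False) -> dict:
    """Toy app scope; future_lang simulates a hypothetical lang.share.ambient."""
    share = {"shared": "lang.share.shared"}
    if future_lang:
        share["ambient"] = "lang.share.ambient"
    lang = {"flow": {"while": "lang.flow.while"}, "type": {"int": "lang.type.int"}, "share": share}
    return {"imported": {"lang": lang, "ambient": "imported.ambient"}}


LANG_FIRST = ["imported.lang.flow", "imported.lang.type", "imported.lang.share", "imported", "."]
IMPORTED_FIRST = ["imported", "imported.lang.flow", "imported.lang.type", "imported.lang.share", "."]

print(resolve(make_scope(), LANG_FIRST, "ambient"))                       # imported.ambient
print(resolve(make_scope(future_lang=True), LANG_FIRST, "ambient"))       # lang.share.ambient
print(resolve(make_scope(future_lang=True), IMPORTED_FIRST, "ambient"))   # imported.ambient
```

With the lang branches searched first, the simulated language update silently redirects ambient; with imported first, the old binding survives while lang names like while still resolve.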
Application scope
In application scope lang and ambient are both unlocked and you could write program code here, but there are two caveats.
The first is that everything in application scope is guaranteed to run synchronously, so anything that the runtime considers asynchronous or parallel will break your program, which is not a good idea if you want your program to work on multiple (current or future) runtimes.
The second caveat is that you are working with full ambient authority here.
So instead of programming directly in application scope, we do something else. At the end of the application scope body, Merg-E will implicitly invoke any main that is defined.
So the intended use for application scope is populating scope.imported with some ambient authority to give main, and to define main in such a way that it will receive this authority through selective capturing.
Please note that lang will always be implicitly captured because it is both deep frozen and const (more on that in a future post).
app myApp lang ambient {
lang.assert lang.version.major == 0 && lang.version.minor == 0;
resolve_order lang.flow, lang.type, lang.share, lang, imported, .;
use ambient {
io.cout as cout;
io.cerr as cerr;
io.endl as endl;
};
mutable main ()::{cout: cout, cerr: cerr, endl: endl}{
};
};
The use function (lang.use) takes some small chunks from the authority-carrying ambient and aliases them under scope.imported.
To understand what happens when main gets invoked, we need to zoom into scope.imported. At this point scope.imported contains the following:
| name | const | deep-frozen | source |
|---|---|---|---|
| lang | YES | YES | app designation |
| ambient | NO | NO | app designation |
| cout | NO | NO | use |
| cerr | NO | NO | use |
| endl | NO | NO | use |
When main gets invoked, by default every entry of the outer scope's scope.imported that is marked as either const or deep frozen is implicitly captured into the inner scope, in this case the main scope.
Note that main itself is not in scope.imported. It is important to realize that main doesn't live inside the application scope or namespace; it lives in its own scope-ending scope, where it will be invoked. This means that main can NOT be captured in any closure. This is different for normal functions, as we will see later.
Note the structure of the main declaration. The ()::{}{}; that declares main has three parts. The empty () at the beginning defines the function arguments. main has none, but for notational consistency the () is still there. The last pair of braces is the executable child context. In between we see the ::{}, in this case written as ::{cout: cout, cerr: cerr, endl: endl}; this part explicitly captures additional outer-scope entries that are neither const nor deep-frozen.
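A toy Python model of how main's inner scope could be assembled: `use` aliases leaves of ambient into scope.imported, then capturing keeps every const or deep-frozen entry and adds the explicit ::{...} captures. The `Entry` flags follow the table above; the function names, the placeholder values, and the assumption that the left-hand name in ::{inner: outer} is the name inside the inner scope are illustrative assumptions, not part of the language definition.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Entry:
    """One name in scope.imported, with the const / deep-frozen flags from the table."""
    value: Any
    const: bool = False
    deep_frozen: bool = False


def use(imported: dict, source: dict, aliases: dict) -> None:
    """Model of `use`: alias selected leaves of `source` (e.g. ambient) into imported."""
    for path, alias in aliases.items():
        node = source
        for part in path.split("."):
            node = node[part]
        imported[alias] = Entry(node)  # aliases from ambient are neither const nor deep-frozen


def capture(imported: dict, explicit: dict) -> dict:
    """Inner exec scope = implicit const/deep-frozen entries + explicit ::{inner: outer} captures."""
    inner = {name: e for name, e in imported.items() if e.const or e.deep_frozen}
    for inner_name, outer_name in explicit.items():
        if outer_name not in imported:
            raise NameError(outer_name)
        inner[inner_name] = imported[outer_name]
    return inner


ambient = {"io": {"cout": "<stdout>", "cerr": "<stderr>", "endl": "\n"}}
imported = {
    "lang": Entry("<lang tree>", const=True, deep_frozen=True),
    "ambient": Entry(ambient),
}
use(imported, ambient, {"io.cout": "cout", "io.cerr": "cerr", "io.endl": "endl"})
main_scope = capture(imported, {"cout": "cout", "cerr": "cerr", "endl": "endl"})
print(sorted(main_scope))  # ['cerr', 'cout', 'endl', 'lang']
```

Note that ambient itself never reaches main: it is neither const nor deep-frozen and is not listed in the explicit capture.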
It's not just about least authority. Sometimes we just want to use more convenient aliases so we don't need to use the long names with all the dots everywhere.
app myApp lang ambient {
lang.assert lang.version.major == 0 && lang.version.minor == 0;
resolve_order lang.flow, lang.type, lang.share, lang, imported, .;
use ambient {
io.cout as cout;
io.cerr as cerr;
io.endl as endl;
};
use lang {
data_state.mutable as mutable;
type.uint64_t as int;
await.all as await_all;
type.exception.range_error as range_error;
};
mutable main ()::{cout: cout, cerr: cerr, endl: endl}{
};
};
In this example we expand the application scope with extra aliases that will get captured by main because they are const and deep frozen.
Exec scope
The main scope is an example of an exec scope. Merg-E has a number of commands that yield an exec scope:
- main
- function
- def
- lock
and in a way
- merge
- actor
Though merge is more of a wrapper for def, and actor is more of a wrapper around def or function; more on that in later posts.
Basically an exec scope isn't much different from the application scope, unless the scope captures from two files, as is the case with most def scopes. An exec scope is defined by Merg-E's "no assumptions about parallelism" mantra: write and use it assuming it might run asynchronously or synchronously. It's up to the runtime. But most of this isn't relevant for scoping.
It is relevant for access though, because anything mutable that isn't explicitly marked as shared is implicitly moved into the inner scope. Unless documented otherwise, assume that leaf nodes from ambient aren't shared, so once you explicitly let them be captured by an inner scope, any attempt to access them from the outer scope will result in an exception.
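The move rule can be sketched as a small Python model. It assumes a hypothetical `Cell` wrapper per variable and a hypothetical `MovedError` standing in for whatever exception the runtime raises; capturing a shared mutable aliases it, while capturing any other mutable moves it and cuts off the outer scope.

```python
class MovedError(Exception):
    """Hypothetical stand-in for the exception raised on access after a move."""


class Cell:
    """A mutable variable; `shared` mirrors the `shared mutable` qualifier."""

    def __init__(self, value: object, shared: bool = False) -> None:
        self._value = value
        self._moved = False
        self.shared = shared

    def _check(self) -> None:
        if self._moved:
            raise MovedError("value was moved into an inner scope")

    def get(self) -> object:
        self._check()
        return self._value

    def set(self, value: object) -> None:
        self._check()
        self._value = value


def capture(cell: Cell) -> Cell:
    """Shared mutables are aliased into the inner scope; all other mutables are moved."""
    if cell.shared:
        return cell
    inner = Cell(cell.get())
    cell._moved = True  # the outer scope loses access
    return inner


counter = Cell(0)
ok_count = Cell(0, shared=True)

inner_counter = capture(counter)
inner_ok = capture(ok_count)
inner_ok.set(inner_ok.get() + 1)

print(ok_count.get())  # 1: both scopes see the shared value
try:
    counter.get()
except MovedError as err:
    print("outer access failed:", err)
```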
Given that we are going to do some real coding in an exec scope, one thing we likely will be doing is defining consts and variables.
int max_prime = 100;
mutable int counter = 0;
shared mutable int ok_count = 0;
mutable awaitable all_primes;
The first is a constant: we define max_prime as 100. Because it is a constant, it will be implicitly captured by most inner scopes we can define. There is one exception though: an inner scope that has been merged in. Multi-file merging is contract based, so implicit capturing only applies to inner scopes in the same file; a merged-in scope only receives constants passed through the explicit const capture syntax.
int max_prime = 100;
mutable int counter = 0;
shared mutable int ok_count = 0;
mutable awaitable all_primes;
mutable merge utils.is_prime as is_prime(x int)::{ok_count: ok_count}{max_prime: max_prime};
The merge function allows us to import a def from another file with a matching contract, in this case one that looks like this:
ns utils lang {
lang.use lang {
type.uint64_t as int;
};
mutable def is_prime = (x int)::{ok_count: int}{max_prime: int} {
...
}
}
As you can imagine, with contract-based merging, implicitly capturing all const and frozen entries of the outer scope would be rather fragile, so implicit capturing is restricted to a single file.
Multi file scope merging
As you may have noticed in the previous section, int had to be aliased a second time for is_prime, inside the utils module file. This is meant to be the default, though it is possible to do it differently.
We could do:
mutable merge utils.is_prime as is_prime(x int)::{ok_count: ok_count}{int: int, max_prime: max_prime};
and
ns utils {
mutable def is_prime = (x int)::{ok_count: int}{int: type_of_type, max_prime: int} {
...
}
}
This is absolutely not recommended though.
What is important to know though is that we consider trust to be bidirectional. The language itself is meant to be considered more trustworthy than either the outer or the inner scope. So while the contract trumps the source-file scope, name collisions between the contract and the source-file scope of a module are not permitted. No precedence rules, just a hard compile or load error, depending on the runtime.
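The two merge rules, an exact contract match and no name collisions with the module's source-file scope, can be sketched as a load-time check. This Python model assumes both sides of the contract can be reduced to (name, type) pairs plus the @[...] exception list; `Contract`, `check_merge` and `ContractError` are hypothetical names, and the set of module names passed in is illustrative.

```python
from dataclasses import dataclass


class ContractError(Exception):
    """Hypothetical stand-in for the compile or load error a runtime would report."""


@dataclass(frozen=True)
class Contract:
    """A merge/def contract reduced to (name, type) pairs plus the @[...] exception types."""
    args: tuple = ()
    mutable_captures: tuple = ()
    const_captures: tuple = ()
    raises: frozenset = frozenset()

    def names(self) -> set:
        """Every name the contract brings into the module's def."""
        return {name for name, _ in self.args + self.mutable_captures + self.const_captures}


def check_merge(merge_side: Contract, def_side: Contract, module_names: set) -> None:
    """Exact match required; contract names may not collide with the module's source-file scope."""
    if merge_side != def_side:
        raise ContractError("merge does not match the def exactly")
    clash = merge_side.names() & module_names
    if clash:
        raise ContractError(f"name collision with module source-file scope: {sorted(clash)}")


# The is_prime contract used in this post, written from the def's side.
is_prime_def = Contract(
    args=(("x", "int"),),
    mutable_captures=(("ok_count", "int"),),
    const_captures=(("max_prime", "int"),),
    raises=frozenset({"range_error"}),
)

check_merge(is_prime_def, is_prime_def, module_names={"ns", "lang"})  # passes silently
```

There is deliberately no precedence rule in the sketch: any mismatch or clash simply fails.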
Syncing scopes
An execution scope does not return. At least, it does not return a value, nor anything that can be assigned to some type of future. It does return something, but that something can only be collected.
So far we have seen the shared mutable defined, but let's see it in action to see what it actually allows for.
int max_prime = 1000;
mutable int counter = 0;
shared mutable int ok_count = 0;
mutable awaitable all_primes;
mutable merge utils.is_prime as is_prime(x int)::{ok_count: ok_count}{max_prime: max_prime};
while counter < 100 {
counter += 1;
all_primes += is_prime(counter);
}
await_all all_primes;
cout "counted " ok_count " prime numbers" endl;
The result returned from each exec scope of is_prime (which likely runs asynchronously) gets collected in the awaitable for all 100 invocations.
The await_all (aliased from lang.await.all) blocks until all 100 execution contexts have completed. This way, in a parallel execution environment, the 100 invocations can complete in parallel, and the number printed will still be the correct one.
The combination of the shared mutable with the await function allows scopes to sync.
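As a rough analogue in a language that exists today, the shared counter plus await_all pattern maps onto Python's asyncio: one task per invocation, an asyncio.Lock around the increment, and asyncio.gather in the role of await_all. This is a sketch of the synchronisation pattern, not of Merg-E semantics; since asyncio runs on a single thread, the lock only mirrors the lock(ok_count) used in the utils module later in this post rather than being strictly required here.

```python
import asyncio


def is_prime(x: int) -> bool:
    """Plain trial division."""
    if x < 2:
        return False
    candidate = 2
    while candidate * candidate <= x:
        if x % candidate == 0:
            return False
        candidate += 1
    return True


async def count_primes(limit: int) -> int:
    ok_count = 0              # plays the role of `shared mutable int ok_count`
    ok_lock = asyncio.Lock()  # plays the role of lock(ok_count)

    async def check(x: int) -> None:
        nonlocal ok_count
        if is_prime(x):
            async with ok_lock:
                ok_count += 1

    # all_primes += is_prime(counter), once per counter value 1..limit
    all_primes = [asyncio.create_task(check(n)) for n in range(1, limit + 1)]
    await asyncio.gather(*all_primes)  # await_all all_primes
    return ok_count                    # only read after every task has finished


print("counted", asyncio.run(count_primes(100)), "prime numbers")  # counted 25 prime numbers
```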
If no parallelism is needed, we can use borrowed instead of shared.
int max_prime = 1000;
mutable int counter = 0;
borrowed mutable int ok_count = 0;
mutable awaitable all_primes;
mutable merge utils.is_prime as is_prime(x int, c int)::{}{max_prime: max_prime};
while counter < 100 {
counter += 1;
all_primes += is_prime(counter, ok_count);
lang.await.reclaim.all all_primes;
}
cout "counted " ok_count " prime numbers" endl;
This looks mostly the same, but it changes things quite a bit. With borrowed, ok_count does get moved after all, so accessing it while it is lent out should throw an exception, and it will.
Note that we no longer let is_prime capture ok_count. Instead we provide it as a second argument, allowing borrowing to do its magic.
The difference from a plain move is that any awaitable tied to the execution context inherits the mutable once the execution context terminates, and any new invocation after that results in a new borrow.
The lang.await.reclaim.all here is there to sync things up. It is similar to lang.await.all, but it also reclaims any mutable borrowed by other scopes. So we have a runtime borrowing system that relies on synchronisation to work, which means that by choosing this option we give up on being massively parallel.
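The borrow-then-reclaim cycle can be modelled in Python as well. This sketch assumes a `Borrowed` owner that hands out a `Loan` per invocation and refuses access while the loan is outstanding, with `reclaim` standing in for lang.await.reclaim.all on a single loan; all class names and the `BorrowedError` exception are hypothetical.

```python
from typing import Optional


class BorrowedError(Exception):
    """Hypothetical stand-in for the exception raised when a lent value is touched."""


class Loan:
    """Temporary ownership of a borrowed value by an inner exec scope."""

    def __init__(self, value: int) -> None:
        self.value = value


class Borrowed:
    """Model of `borrowed mutable int`."""

    def __init__(self, value: int) -> None:
        self._value = value
        self._loan: Optional[Loan] = None

    def get(self) -> int:
        if self._loan is not None:
            raise BorrowedError("value is lent to another scope")
        return self._value

    def lend(self) -> Loan:
        if self._loan is not None:
            raise BorrowedError("value is already lent")
        self._loan = Loan(self._value)
        return self._loan

    def reclaim(self) -> None:
        """Model of lang.await.reclaim.all for one loan: the owner takes the value back."""
        if self._loan is not None:
            self._value = self._loan.value
            self._loan = None


def is_prime(x: int, c: Loan) -> None:
    """Inner scope: increments the borrowed counter when x is prime."""
    if x >= 2 and all(x % d for d in range(2, int(x ** 0.5) + 1)):
        c.value += 1


ok_count = Borrowed(0)
for counter in range(1, 101):
    loan = ok_count.lend()    # the call borrows ok_count
    is_prime(counter, loan)   # ok_count.get() here would raise BorrowedError
    ok_count.reclaim()        # sync point: take the value back before the next borrow

print("counted", ok_count.get(), "prime numbers")  # counted 25 prime numbers
```

The forced reclaim after every call is exactly what serialises the loop, which is the loss of parallelism the paragraph above describes.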
Now let's say we wanted to know for only one number if it was a prime.
int max_prime = 1000;
borrowed mutable int ok_count = 0;
mutable awaitable all_primes;
mutable merge utils.is_prime as is_prime(x int, c int)::{}{max_prime: max_prime};
all_primes += is_prime(1395679, ok_count);
lang.await.reclaim.all all_primes;
cout "counted " ok_count " prime numbers" endl;
In this case there is a smarter way where move will do the trick:
mutable function report (count int)::{cout: cout, endl: endl}{
cout "counted " count " prime numbers" endl;
};
int max_prime = 1000;
mutable int ok_count = 0;
mutable awaitable all_primes;
mutable merge utils.is_prime as is_prime(x int, c int)::{report: report}{max_prime: max_prime};
all_primes += is_prime(1395679, ok_count);
The great thing here: we don't need to await anything. The drawback: report eats cout and endl, and is_prime eats report, so we can't access these in any other way. But something else annoying happens too that may not be obvious: it is highly likely that the runtime will decide to run is_prime synchronously. Merg-E tells us to make no assumptions about parallelism, and it means it. Part of ambient is going to consist of reentrant code; part of it, however, won't be reentrant and will be marked as such. Non-reentrancy is transitive between mutable functions, unless a function declares itself to be reentrant AND proves it by wrapping access to a non-reentrant resource in a lock.
reentrant mutable function report (count int)::{cout: cout, endl: endl}{
lock(cout, endl) {
cout "counted " count " prime numbers" endl;
};
};
This breaks the transitivity of non-reentrancy (if cout or endl were non-reentrant to begin with), and lets us skip synchronisation where we don't think we need it. It can allow more parallelism for the runtime, so use of the reentrant prefix is highly encouraged. Please note that lock itself should be expected to be asynchronous, just like all exec context commands. If you need it to be synchronous, make it so yourself by adding the return value of lock to an awaitable and awaiting it.
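The transitivity rule can be expressed as a small checker. This Python model is a sketch under several assumptions: `Func`, its flags and `is_reentrant` are hypothetical, functions without the reentrant prefix are assumed to simply inherit reentrancy from what they use, a reentrant function that uses non-reentrant code outside a lock is treated as an error, and call cycles are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Func:
    """A mutable function or resource as seen by a hypothetical reentrancy checker."""
    name: str
    marked_non_reentrant: bool = False  # e.g. a non-reentrant part of ambient
    declared_reentrant: bool = False    # the `reentrant` prefix
    calls: list = field(default_factory=list)         # used outside any lock
    locked_calls: list = field(default_factory=list)  # used inside lock(...) { }


def is_reentrant(f: Func) -> bool:
    """Non-reentrancy propagates to callers unless a `reentrant` function puts it under a lock."""
    if f.marked_non_reentrant:
        return False
    unlocked_ok = all(is_reentrant(g) for g in f.calls)
    if f.declared_reentrant:
        if not unlocked_ok:
            raise ValueError(f"{f.name} is declared reentrant but uses non-reentrant code outside a lock")
        return True  # the lock breaks the transitivity
    return unlocked_ok and all(is_reentrant(g) for g in f.locked_calls)


cout = Func("cout", marked_non_reentrant=True)
plain_report = Func("report", calls=[cout])
locked_report = Func("report", declared_reentrant=True, locked_calls=[cout])

print(is_reentrant(plain_report))                             # False
print(is_reentrant(locked_report))                            # True
print(is_reentrant(Func("is_prime", calls=[locked_report])))  # True
```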
Error scope
Let us combine everything we've discussed so far into a sample program.
Let's start with our module.
ns utils lang def {
resolve_order lang.flow, lang.type, lang.const, lang.share, lang.data_state, lang, imported, .;
lang.use lang {
type.uint64_t as int;
await.all as await_all;
type.exception.range_error as range_error;
};
reentrant mutable def is_prime = (x int)::{ok_count: int}{max_prime: int}@[range_error] {
if x < 1 || x > max_prime {
raise range_error;
}!!{};
mutable int candidate=2;
mutable bool isprime = true;
if x ==1 {
isprime = false;
};
while candidate * candidate <= x {
if x % candidate == 0 {
isprime = false;
};
candidate += 1;
};
if isprime {
mutable blocker lck;
lck += lock(ok_count) {
ok_count +=1;
};
await lck;
};
};
};
The module has a simple def implemented to test if an integer is prime. It is defined as reentrant and it can back this up by its use of lock, but the most important part here is that it can throw an exception. The exception type is also part of the contract.
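For readers who want to trace the def, here is a line-by-line Python port of utils.is_prime. It is a sketch under stated assumptions: `RangeError` stands in for range_error, `SharedInt` models the shared mutable with a threading.Lock standing in for lock(ok_count), and because Python's `with` block completes before execution continues, it also plays the part of `await lck`.

```python
import threading


class RangeError(Exception):
    """Stand-in for lang.type.exception.range_error: a bare type, no message."""


class SharedInt:
    """Model of `shared mutable int` plus the lock used around it."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self.lock = threading.Lock()


def is_prime(x: int, ok_count: SharedInt, max_prime: int) -> None:
    """Line-by-line port of the utils.is_prime def."""
    if x < 1 or x > max_prime:
        raise RangeError
    candidate = 2
    isprime = True
    if x == 1:
        isprime = False
    while candidate * candidate <= x:
        if x % candidate == 0:
            isprime = False
        candidate += 1
    if isprime:
        with ok_count.lock:  # lock(ok_count) { ok_count += 1; } followed by await lck
            ok_count.value += 1


ok_count = SharedInt()
for n in range(1, 101):
    is_prime(n, ok_count, max_prime=1000)
print(ok_count.value)  # 25
```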
Given that the exception can get thrown, let's look at our actual program:
app myApp lang ambient {
lang.assert lang.version.major == 0 && lang.version.minor == 0;
resolve_order lang.flow, lang.type, lang.share, lang, imported, .;
use ambient {
io.cout as cout;
io.cerr as cerr;
io.endl as endl;
};
mutable main ()::{cout: cout, cerr: cerr, endl: endl}{
int max_prime = 1000;
mutable int counter = 0;
shared mutable int ok_count = 0;
mutable awaitable all_primes;
reentrant mutable merge utils.is_prime as is_prime(x int)::{ok_count: ok_count}{max_prime: max_prime}@[range_error];
while counter < 100 {
counter += 1;
all_primes += is_prime(counter);
}
await_all all_primes;
cout "counted " ok_count " prime numbers" endl;
}!!{
switch scope.exceptions[-1] {
range_error : {
cerr "Something went wrong" endl;
};
};
};
};
We see everything come together here: the complete contract between the merge in the app and the def in the module, which must be an exact match, but most importantly the error scope.
The error scope denoted as !!{} inherits the complete scope of the exec scope that it links to with an extra addition to scope, the exceptions list.
Exceptions should be handled locally when possible. We are trying to adhere to POLA, so what bubbles up when an exception isn't caught is minimal. Exceptions bubble up stepwise, and only on synchronisation. Without synchronisation, that is, if there is no awaitable the exception is bound to, or if the awaitable lived in a scope that has just ended and that scope has no awaitable tied to it, an exception goes straight to the main error context.
The scope.exceptions is a list of exceptions where an exception is just a type from the lang.type.exception hierarchy. No messages added, no custom sub types. This is very limited, especially when exceptions aren't handled locally, but we are trying to adhere to POLA, so there is a price for that.
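The bubbling rule can be modelled as a routing function. This Python sketch reflects one reading of the rule, namely that an exception moves to the scope holding the awaitable it is bound to, skips over holders that have already ended, and otherwise lands in main's scope.exceptions; `ExecScope`, `route` and `error_scope` are hypothetical names, and exceptions are bare classes with no message, matching the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


class RangeError(Exception):
    """Exceptions are bare types from lang.type.exception: no message, no custom subtypes."""


@dataclass
class ExecScope:
    name: str
    bound_to: Optional["ExecScope"] = None  # scope holding the awaitable this scope's result went into
    alive: bool = True
    exceptions: list = field(default_factory=list)  # models scope.exceptions


def route(exc_type: type, origin: ExecScope, main: ExecScope) -> ExecScope:
    """Deliver an unhandled exception one synchronisation step up, or straight to main."""
    holder = origin.bound_to
    while holder is not None and not holder.alive:
        holder = holder.bound_to  # holder already ended: fall back to its own awaitable, if any
    target = holder if holder is not None else main
    target.exceptions.append(exc_type)
    return target


def error_scope(scope: ExecScope) -> str:
    """Model of the !!{ switch scope.exceptions[-1] { ... } } block in main."""
    if scope.exceptions[-1] is RangeError:
        return "Something went wrong"
    return "unhandled"


main = ExecScope("main")
worker = ExecScope("is_prime", bound_to=main)
route(RangeError, worker, main)
print(error_scope(main))  # Something went wrong
```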
Coming up
In this post I've addressed the current design of an important part of the Merg-E draft language design, namely the files, merging, scoping, name resolution and synchronisation. I'll need a few more posts to talk about things like freezing, attenuation and decomposition, parallelism models, actors, capability patterns, and a few more. Things are still very fluid on a number of these, and the descriptions in this post already differ from my first post on the subject a week ago.
I'm not going to update that post, as it contains my earliest ideas, but this post is meant as documentation until I get to version 1.0 of the language spec, so I'm going to update this series as I go, based on any input I get on this post and the upcoming ones.