If you know about my slowly ongoing CoinZdense project, you might know I've recently switched from Rust back to C++ for the native implementation of the core library, because of my struggles with language bindings in Rust. On my long-term roadmap were Monte and Elixir, two really interesting languages that I've worked with far too little, so adding them to the list of programming languages to support is going to be a challenge.

I'm a big fan of Object-Capability languages, of Functional programming languages, and of the Actor Model Of Computation. Because CoinZdense is going to include a least-authority sub-key management API, what better languages to (eventually) write Web 3.0 bots and Web 3.0 Layer-2 nodes in than an ocap language or a functional actor model language?

For years I've been fantasising about what my ideal language would look like, with bits of E, Haskell, Erlang and C++. One of the main features I pondered was a curious little deprecated C++ smart pointer, the auto_ptr, a subject that I wrote a blog post about way back in 2011. Another thing I have been pondering is that, while I love the concept of closures, their pervasive use leads to huge monolithic source files.

Mix in principles for safe defaults, and I had quite a stack of features that I felt my ideal language would have to have. While I don't have the time to create a complete general purpose language like Monte or Elixir, the thought came to me that that shouldn't be my goal. My goal is to have CoinZdense, and access to any chain that in the future may want to integrate CoinZdense, available in a language suitable for writing Web 3.0 bots. A Domain Specific Language or DSL. A minimal language with strict design principles and usage patterns that we aren't going to deviate from, because deviation would blow up the specs and make a CoinZdense centered DSL more work and effort than trying to make CoinZdense available for Monte and Elixir.

For now it's a path I'm exploring. Maybe this mini language, this DSL, won't ever exist and I'll go back to the old plan, but for now it's an idea that I want to explore deeply enough to figure out if it's the right choice.

The description below is incomplete. It outlines some of the core principles that I hope are going to be the backbone of the DSL.

Merg-E

Merg-E (pronounced as "merge") aims to be a simple embeddable language that tries to combine a subset of functional programming with a subset of capability based languages.

Merg-E has no classes or objects, and the name is a play on the word "merge" as merging multiple source files through scoped imports into a closure centric application is an important aspect of the simple language, and the E language is an inspiration for the OCap/POLA design of the language, though Merg-E tries neither to be a full on functional language nor a complete ocap language.

implicit constness and transitive mutability

Everything in Merg-E is implicitly const unless it is explicitly marked as mutable. The only exception to that rule is the ambient keyword that we will look at soon. A function is transitively mutable whenever it is given explicit access to mutable state in the outer scope of the closure where it lives; otherwise it is const. For safety purposes, the user needs to make the declared mutability status of a function match the mutability that it gets from this transitive rule. Failing to do so will result in an exception (interpreter) or a compilation error (compiler). Note that mutable function arguments don't make the function itself mutable, only the outer scope mask of the function fingerprint does. A mutable and a const can't be assigned to each other.
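
To make the transitive rule a bit more concrete, here is a rough Python model of the check (Python being the language of the first proof of concept). This is not Merg-E and not the planned implementation; FunctionDecl, check_mutability and the dict layout are names invented purely for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionDecl:
    """Hypothetical model of a Merg-E function declaration."""
    name: str
    declared_mutable: bool
    # Closure fingerprint: outer-scope name -> is that binding mutable?
    fingerprint: dict = field(default_factory=dict)
    # Argument name -> is that argument mutable? (ignored by the rule)
    args: dict = field(default_factory=dict)


def check_mutability(fn: FunctionDecl) -> None:
    """Raise if the declared mutability differs from the transitive one."""
    # Only the closure fingerprint counts; mutable arguments do not.
    transitive = any(fn.fingerprint.values())
    if transitive != fn.declared_mutable:
        raise TypeError(
            f"{fn.name}: declared mutable={fn.declared_mutable}, "
            f"but the fingerprint implies mutable={transitive}"
        )
```

In this model a function with only mutable arguments still has to be declared const, while a function whose fingerprint exposes a single mutable outer-scope binding has to be declared mutable.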

main and ambient

A Merg-E program's root is a single nameless function (main is a language keyword, not a function name). Please note that Merg-E only has 6 top level keywords:

app
ambient
lang
merge
ns
state

Every program starts with the first three of these six keywords.

All other keywords are scoped within the immutable lang.

app ambient lang merge state {
   lang.mutable lang.main ()::{args: ambient.args, entropy: ambient.entropy[1024], workdir: ambient.os.filesystem.home[".MyApp/var"], confdir: lang.attenuate.read_only(ambient.os.filesystem.home[".MyApp/etc"])} {
      ...
   };
}

The above example may look a bit complicated, but once you understand the basics it becomes easy to read.
Let's walk through what it means. The app keyword on the first line tells us this is the start of the application scope. The ambient modifier is a scary MF: it basically represents all ambient authority that your program had at startup, as granted to it by the operating system. For a least-authority app this is way too much authority; we'll get to that later. The lang modifier adds the actual Merg-E language to the scope. Without ambient, we can't touch anything outside of our process and our actions will be limited to pure computation and gobbling up memory, which isn't very useful. Without lang we won't be able to express anything in the Merg-E language.

The second line looks complicated. It is prefixed with lang.mutable to indicate that main is to be considered mutable, because it gets access to mutable state from ambient. Then follows the main definition. It starts off with lang.main; as we said before, all keywords except for the six top level ones are defined in lang, the Merg-E language, and main is one of these keywords. It tells us the nameless function that follows is the main of the program.
The () is the argument list of the nameless function, which for main is not allowed to contain any arguments. After it, the ::{} adds a mapping for main's accessible scoped authority. Because this is main, there are no outer-scope variables that can be exposed except for ambient and lang.

The merge and state keywords are there so that these keywords are made available in the app. If you don't use actors, you can omit the state keyword. If you aren't using modules from your code, you can omit the merge keyword.

By default lang is carried into the inner scope, not because lang is a special keyword, but because lang is so-called deep-frozen. By default ambient is NOT carried into the inner scope, because ambient is unfrozen mutable state and/or authority. So in order to carry in some needed authority from ambient, but not too much, the mapping defines how a number of chunks from the ambient tree carry into the inner scope.

It is important to distinguish between read-only, immutable/frozen and deep-frozen. Something that is read-only is read-only to our scope, other scopes might have the ability to write it. Something that is immutable might still carry some authority, and is treated as such. Something that is deep frozen is assumed to carry no authority down into deeper scopes.

Let's look at the four mappings.

args: ambient.args : This makes the commandline arguments available as args. ambient.args is considered to be frozen or read-only.
entropy: ambient.entropy[1024] : This gives the App access to 1024 high entropy chunks of (256 bits of) data.
workdir: ambient.os.filesystem.home[".MyApp/var"] : This gives the inner scope access to the user's work directory for this application. Note that it is full read/write access without any attenuation, but nothing outside of the directory is available. No .. is available.
confdir: lang.attenuate.read_only(ambient.os.filesystem.home[".MyApp/etc"]) : Like the last one, this gives the inner scope access to a directory, but in this case access to the directory is attenuated to read only access.
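
As an illustration of what attenuation means for the workdir and confdir mappings, the sketch below models a directory capability and a read-only attenuation of it in Python. It is only a model: DirCap, ReadOnlyDirCap and read_only are hypothetical names, the "directory" is an in-memory dict, and Python (unlike a real ocap language) does not actually prevent code from poking at private attributes.

```python
class DirCap:
    """Hypothetical directory capability backed by an in-memory dict."""

    def __init__(self, files=None):
        self._files = files if files is not None else {}

    @staticmethod
    def _check(name):
        # Nothing outside the directory is reachable: no "..", no absolute paths.
        if name.startswith("/") or ".." in name.split("/"):
            raise PermissionError(f"path escapes the directory: {name!r}")

    def read(self, name):
        self._check(name)
        return self._files[name]

    def write(self, name, data):
        self._check(name)
        self._files[name] = data


class ReadOnlyDirCap:
    """Attenuated view: forwards reads and exposes no write method at all."""

    def __init__(self, inner):
        self._read = inner.read  # keep only the read facet of the capability

    def read(self, name):
        return self._read(name)


def read_only(cap):
    """Model of an attenuation step: full capability in, read-only view out."""
    return ReadOnlyDirCap(cap)
```

Note that the view is read-only, not immutable: whoever still holds the original DirCap can write to it, and those writes are visible through the attenuated view.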

No returns, no guarantee of order.

Before we look further into functions and closures, it is important to realize that Merg-E functions don't return anything. Often they simply can't, because they run in another execution context at the discretion of the execution environment. The easiest way to think about this is to consider each function a (usually) short-lived actor. It is possible to invoke a function as a long-lived actor; more about that later. Instead of returning a value, Merg-E allows you to give the inner scope access to other named functions or actors defined in the outer scope. We will look into that later too.

Move as default

A big part of Least Authority programming is about minimizing shared mutable state. If you are a long-time C++ user, you may remember the auto pointer (auto_ptr) that is now deprecated because of a property that C++ developers found non-intuitive: an auto pointer moved ownership whenever it was assigned. In C++ this was called reference stealing. But in Merg-E this isn't stealing. It is the default on invocation, unless the mutable state is first explicitly marked as shared.

This isn't the whole story. Function calls can reference shared-mutable-state in two distinct ways. The state can be referenced as a function call argument, or it can be referenced from the closure fingerprint, and there is more than sharing and moving; there is also copying. Note that in Merg-E, a copy is always deep and can thus be expensive.

It is important to note that not all combinations are possible. If a variable is marked to be copied or shared, it can no longer move. For this reason, if either the closure fingerprint or the function argument is marked separately as shared, the other one is implied to need copying.

If we define a variable like this:

   lang.mutable lang.type.uint64 foo = 17;

this variable will get moved on any usage.

If instead we make it const:

   lang.type.uint64 foo = 17;

this implicitly const integer will be shared regardless of how it is used.

If we want to not share a constant with inner scopes, we should either use a merge for each of these scopes, or we should mark the constant as anchored to this scope and unavailable in any other scope.

   lang.anchored lang.type.uint64 foo = 17;

Mutable state can be marked for copying or sharing or a combination of the two.

   lang.mutable lang.shared lang.type.uint64 foo = 17;

this will make any usage shared.

This, in contrast

   lang.mutable lang.copied lang.type.uint64 foo = 17;

will make any usage copied.

The following will make usage through function arguments shared, while implicitly making any usage through the closure fingerprint a (deep) copy.

   lang.mutable lang.arg_shared lang.type.uint64 foo = 17;

And this one achieves the opposite

   lang.mutable lang.closure_shared lang.type.uint64 foo = 17;

A more complex but convenient alternative to move is a move that snaps back to the previous holder at the end of the execution context lifetime.

   lang.mutable lang.borrow lang.type.uint64 foo = 17;

And in the same way as we define shared, we can distinguish between arg and closure based move scenarios:

   lang.mutable lang.arg_borrow lang.type.uint64 foo = 17;

   lang.mutable lang.closure_borrow lang.type.uint64 foo = 17;
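
To get a feel for the difference between move, shared, copied and borrow, here is a small Python model of these usage modes. Everything in it (Holder, hand_over, lend, MovedError) is invented for the sketch, and it glosses over the arg/closure distinction; it only tries to show what each mode does to the holder in the outer scope.

```python
import copy
from contextlib import contextmanager


class MovedError(RuntimeError):
    """Raised when a holder is used after its value moved away."""


class Holder:
    """Hypothetical mutable variable with a usage mode: move, shared or copied."""

    def __init__(self, value, mode="move"):
        self.mode = mode
        self._value = value
        self._live = True

    def get(self):
        if not self._live:
            raise MovedError("value has moved to another scope")
        return self._value

    def set(self, value):
        self.get()  # raises MovedError if the value is gone
        self._value = value

    def hand_over(self):
        """Return the holder that the invoked function gets to see."""
        self.get()
        if self.mode == "shared":
            return self  # both scopes see the same state
        if self.mode == "copied":
            return Holder(copy.deepcopy(self._value), self.mode)
        # Default: move. The outer holder becomes unusable.
        moved = Holder(self._value, self.mode)
        self._value = None
        self._live = False
        return moved

    @contextmanager
    def lend(self):
        """Borrow: move the value out, snap it back when the context ends."""
        self.get()
        inner = Holder(self._value, "move")
        self._live = False
        try:
            yield inner
        finally:
            # Simplification: assumes the borrower did not move the value on.
            self._value = inner._value
            self._live = True
```

With the default mode, the outer scope loses access after hand_over; with lend, the outer scope is locked out only for the duration of the with block and sees the borrower's changes afterwards.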

So how about at deeper levels? We will dive deeper into this later in this document, but to get an idea, we need to declare it in our function line, or in this example our main:

app ambient lang merge state {
   lang.mutable lang.main ()::{args: ambient.args,
                  entropy: ambient.entropy[1024],
                  workdir: lang.shared[ambient.os.filesystem.home[".MyApp/var"]],
                  confdir: lang.attenuate.read_only(ambient.os.filesystem.home[".MyApp/etc"])
                 } {
      ...
   };
}

In this case workdir becomes shared and owned by main.

Named functions

If we want to create a named function/actor, it is similar to our main definition.

   lang.mutable lang.copied lang.types.int64 z = 42;
   lang.mutable lang.shared lang.function foo = (x lang.types.int64, y lang.types.int64)::{z: z} {
     ...
   }
   lang.mutable lang.function bar = (x lang.types.int64)::{res: foo} {
     res.invoke(x, x+7);
   }

Or if we want to make the function into a long lived actor:

   lang.mutable lang.type.map actor_state = {z: 42};
   lang.mutable lang.function _bar = (x lang.types.int64)::{res: foo} {
      res.invoke(x, x+7);
      state.z +=1;
   };
   lang.mutable lang.actor bar = _bar.spawn(actor_state, 4);

merge

When a Merg-E program gets bigger, it is desirable not to have to use one huge source file with a huge number of nestings. Next to this, some form of DRY is desired.
To accomplish this, the merge keyword allows us to do scoped and fingerprinted imports.

   lang.mutable lang.type.map actor_state = {z: 42};
   lang.mutable merge utils.bar as _bar(x lang.types.int64)::{res: foo}{};
   lang.mutable lang.actor bar = _bar.spawn(actor_state, 4);

The merge line replaces the function definition with a scoped fingerprint import. The utils package is looked up in the file system according to rules we will discuss later, and bar is imported from it if it exists with the right fingerprint matching that of the merge line.
Please note the extra {} at the end that we haven't seen before. While in a single file const and deep frozen outer scope variables are implicitly available, in the context of a merge, they are part of the fingerprint and need to be specified.

In the utils package, the function definition looks something like this:

ns utils lang state {
   lang.mutable lang.function_def bar = (x lang.types.int64)::{res: lang.types.int64}{} {
      res.invoke(x, x+7);
      state.z +=1;
   };

}

Note the difference in the fingerprint dict for outer scope vars: instead of a reference to the outer scope variable, a type is used that should match the type referenced in the merge line.

Assume nothing about parallelism

The Merg-E language is meant to eventually have multiple implementations: from a scripted version running on top of a single threaded, single event loop Python app (the first Proof Of Concept), to a BEAM-bytecode compiled version that leverages the parallelism of the BEAM VM, to an LLVM IR version. In later versions the LLVM IR implementation might even end up translating the simpler Merg-E functions and actors into NVPTX LLVM IR or into SYCL C++ code that is then linked with the LLVM IR from the main Merg-E code. There is a lot there to figure out, but the important takeaway is that when writing Merg-E code, you shouldn't assume anything about parallelism. Your code may run blocking in a single task on a single thread, or it may run in a massively parallel environment. There are some suggestions about parallelism you may give, but these are just hints and should be treated as such: the interpreter or compiler might decide otherwise.

When you invoke a function, there is no return value, but there is something returned nonetheless. The thing that is returned has a language private type, so you can't directly assign it to anything. You can accumulate these things though, which for many purposes is the same thing.

    lang.mutable lang.function.controller c;
    c += bar(17, 88);
    c += bar(42, 18);
    lang.await.any.abandon c;

This code creates a function controller, then adds the first invocation of bar to the controller for later actions. That invocation may be running asynchronously, or it may have blocked and already completed; at this point we can't know. The code then calls bar again with other arguments, again adding the invocation that might or might not still be running to the controller. Then lang.await.any.abandon gets called on c. This is a weird little thing. Language features under lang.await are the only features that are guaranteed to be blocking. This specific one blocks until at least one of the functions has completed and then tries to abandon the other functions that haven't completed yet. We say try because, as we will see later, a function may be declared atomic, and atomic functions, once started, cannot be abandoned.

Please note that this single await line is only guaranteed to block until one function has completed. It won't wait until all atomic functions have completed. If we want that, we can add one more line:

    lang.await.all c;

This line will wait till all functions are either completed or abandoned.
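
For readers who think in terms of existing async runtimes, the controller behaviour described above can be approximated with Python's asyncio. This is only a model under a few assumptions: Controller, await_any_abandon, await_all and the atomic flag are names made up for the sketch, and "abandon" is modelled as task cancellation.

```python
import asyncio


class Controller:
    """Hypothetical stand-in for a lang.function.controller."""

    def __init__(self):
        self._tasks = []  # (task, atomic) pairs

    def add(self, coro, atomic=False):
        # Must be called while an event loop is running.
        self._tasks.append((asyncio.create_task(coro), atomic))
        return self

    async def await_any_abandon(self):
        """Block until one invocation completes, then try to abandon the rest."""
        tasks = [task for task, _ in self._tasks]
        await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task, atomic in self._tasks:
            if not task.done() and not atomic:
                task.cancel()  # atomic invocations are left running

    async def await_all(self):
        """Block until every invocation is either completed or abandoned."""
        await asyncio.wait([task for task, _ in self._tasks])


async def bar(log, name, delay):
    await asyncio.sleep(delay)
    log.append(name)


async def demo():
    log = []
    c = Controller()
    c.add(bar(log, "fast", 0.01))
    c.add(bar(log, "slow", 0.2))
    c.add(bar(log, "atomic", 0.05), atomic=True)
    await c.await_any_abandon()
    after_any = list(log)  # only the first finisher has logged so far
    await c.await_all()    # the atomic one still gets to finish
    return after_any, log
```

Running asyncio.run(demo()) shows the two-step behaviour: after the first await only the fastest invocation has completed and the non-atomic straggler has been abandoned, while the second await lets the atomic invocation finish.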

Actor spawning and parallelism

Remember this line?

    lang.mutable lang.actor bar = _bar.spawn(actor_state, 4);

We are spawning an actor, but what is the number 4 doing there? This number 4 is a parallelism hint. It tells the compiler or interpreter that it may make sense to make the actor consist of four workers sharing the same actor state between them if parallelism is supported.
Alternatively we might have written:

    lang.mutable lang.actor_pool bar = _bar.spawn_pool(actor_state, 4);

In that case not a single actor consisting of four workers with shared actor state, but four actors each with its own version of the actor state will be spawned.
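
The difference between the two spawn variants comes down to who shares the actor state. A minimal Python model, assuming that "its own version" means a deep copy and that the hint is taken literally (a real runtime would be free to ignore it); Worker, spawn and spawn_pool are hypothetical names for this sketch:

```python
import copy


class Worker:
    """Hypothetical worker that mutates the actor state it was given."""

    def __init__(self, state):
        self.state = state

    def handle(self, x):
        self.state["z"] += x


def spawn(state, hint):
    # One actor: all workers operate on the very same state object.
    return [Worker(state) for _ in range(hint)]


def spawn_pool(state, hint):
    # A pool of actors: each one starts from its own deep copy of the state.
    return [Worker(copy.deepcopy(state)) for _ in range(hint)]
```

With spawn, an update made by one worker is visible to the other three; with spawn_pool, the four states drift apart independently.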

Exceptions

Each function, including main, has the option to define an error body for error handling.

   lang.mutable lang.function _bar = (x lang.types.int64)::{res: foo} {
      res.invoke(x, x+7);
      state.z +=1;
   }{
       lang.switch state.exception[-1] {
           lang.type.exception.range_error : {
              ...
           };
           lang.type.exception: {
              ...
           }
       }
   }

Note that the error body follows the happy flow body. There is no try catch at arbitrary nesting. This is a design decision that promotes small low authority units of code being their own function.
It is important to note that in the current design, if a function has no error body and is never awaited, the exception will be fatal unless main has a third, catch-all error body for uncaught exceptions.
If an exception is thrown, it is added to state.exception, which is defined as an array. That is why the -1 index is used above: it gets the last exception thrown, which matters when an exception is thrown from within an error body. If the type of an exception thrown in an error body is not yet part of the state.exception array, the new exception will also get processed by the function's error body. But if the exception type is already part of the array, then to prevent endless loops the new exception is handled as if there was no error body, which means it will either be handled by the calling function if an await is used, or end up in the catch-all body of main.

app ambient lang merge state {
   lang.mutable lang.main ()::{args: ambient.args,
                  entropy: ambient.entropy[1024],
                  workdir: lang.shared[ambient.os.filesystem.home[".MyApp/var"]],
                  confdir: lang.attenuate.read_only(ambient.os.filesystem.home[".MyApp/etc"])
                 } {
      ...
   }{
      ...
   }{
       lang.switch state.exception[-1] {
           lang.type.exception: {
              ...
           }
       }
   }
}
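
The re-entry rule for error bodies (an exception type is processed by the error body at most once, after which it escalates) can be modelled in a few lines of Python. This is a sketch under assumptions: run_with_error_body is a made-up name, the seen list stands in for state.exception, and "already part of the array" is interpreted as an exact type match.

```python
def run_with_error_body(body, error_body):
    """Run body; route exceptions through error_body at most once per type.

    Returns the list of exceptions seen (a stand-in for state.exception).
    Re-raises when an exception has to escalate to the caller.
    """
    seen = []
    try:
        body()
        return seen
    except Exception as exc:
        current = exc
    while True:
        if any(type(e) is type(current) for e in seen):
            # Type already in the array: act as if there were no error body,
            # so the exception goes to an awaiting caller or main's catch-all.
            raise current
        seen.append(current)
        try:
            error_body(seen)  # the error body inspects seen[-1]
            return seen
        except Exception as exc:
            current = exc
```

An error body that fails with a new exception type is given one more chance to handle that new type, but as soon as a type repeats, the loop is broken and the exception escalates.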

expanding the root namespace

While Merg-E only has 6 keywords to keep the App namespace clean and to allow clean upgrade paths, it is possible to expand the App namespace and slightly reduce verbosity by pulling lang and ambient entries down into the App namespace. This can be done in two ways:

In the app line of the program or ns line of the module.
Using lang.use

Often it is useful to use a combination of the two. Before we look at this we should first discuss ambient and lang.

ambient

Ambient is an authority tree containing all the ambient authority the OS granted your process. Note that the tree might contain some extra branches or leaf nodes depending on your OS and environment. This could be basically any type of authority, from obvious things like the authority to make a TCP connection to an arbitrary IP address and TCP port, or the authority to write files to your home directory, to less obvious ones like access to the system time and your system's entropy source, or even the ability to know your own process ID.

Given that everything in the ambient tree is marked as a resource, including read-only entities like your process ID, nothing of the tree trickles down implicitly.

lang

Like ambient, lang is a tree, though not a tree of resources but a tree of Merg-E concepts. Everything that is a type or keyword in other languages is a node in the tree. Strictly speaking lang is not a tree but a DAG with links. Again, there is no ".." in Merg-E, so links only point down from the direct parent node of the link.

As an example, lang.if is a link to lang.flow.if, and lang.int is a link to lang.types.int, which in turn is a link to lang.types.int_types.int64_t.
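
The link mechanism can be pictured as a nested dictionary where some entries are links that are resolved relative to their own parent node. The Python sketch below uses the example names from the text; the Link class, the resolve function and the placeholder leaf values are assumptions made for illustration.

```python
class Link:
    """A link whose target path is relative to the link's direct parent node."""

    def __init__(self, target):
        self.target = target


# Tiny model of a part of the lang DAG, using the examples from the text.
LANG = {
    "if": Link("flow.if"),
    "int": Link("types.int"),
    "flow": {"if": "<keyword if>"},
    "types": {
        "int": Link("int_types.int64_t"),
        "int_types": {"int64_t": "<type int64_t>"},
    },
}


def resolve(node, path):
    """Walk a dotted path downwards from node, following links on the way."""
    for part in path.split("."):
        if part == "..":
            raise KeyError("there is no '..': paths and links only point down")
        parent = node
        node = parent[part]
        if isinstance(node, Link):
            # Links are resolved relative to the parent that contains them.
            node = resolve(parent, node.target)
    return node
```

Here resolve(LANG, "int") follows two links in a row and ends up at the same leaf as resolve(LANG, "types.int_types.int64_t").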

high level namespace collapsing and filtering

In the app or ns line, next to any of the optional keywords, we can choose not to keep entries encapsulated under ambient and lang but to pull them into the app namespace. We do this by listing a collapse filter. Note that a collapse filter only collapses one level down.

app ambient[entropy,os] lang[mutable,main,function,flow,attenuate,share,type] merge state {
   mutable main ()::{ args: ambient.args,
                      entropy: entropy[1024],
                      workdir: share.shared[os.filesystem.home[".MyApp/var"]],
                      confdir: attenuate.read_only(os.filesystem.home[".MyApp/etc"])
                    } {
      ...
   }{
      ...
   }{
       flow.switch state.exception[-1] {
           type.exception: {
              ...
           }
       }
   }
}

Note this makes the body slightly less verbose, but only slightly. Please note that a collapse filter for the language applies to one file only.

use

Instead of, or next to, collapse filters, we have lang.use as a way to collapse deeper nodes into the app or function namespace.

app ambient[entropy,os] lang[flow,attenuate,share,type,use,mutable,main] merge state {
   use os.filesystem.home as home;
   use share.shared as shared;
   use attenuate.read_only as read_only;
   use type.exception as exception;
   use flow.switch as switch;
   mutable main ()::{ args: ambient.args,
                      entropy: entropy[1024],
                      workdir: shared[home[".MyApp/var"]],
                      confdir: read_only(home[".MyApp/etc"])
                 } {
      ...
   }{
      ...
   }{
       switch state.exception[-1] {
           exception: {
              ...
           }
       }
   }
}

Or we can use just "use" without collapse filters:

app ambient lang merge state {
   lang.use lang.use as use;
   use ambient.entropy as entropy;
   use ambient.os.filesystem.home as home;
   use lang.mutable as mutable;
   use lang.main as main;
   use lang.share.shared as shared;
   use lang.attenuate.read_only as read_only;
   use lang.type.exception as exception;
   use lang.flow.switch as switch;
   mutable main ()::{ args: ambient.args,
                      entropy: entropy[1024],
                      workdir: shared[home[".MyApp/var"]],
                      confdir: read_only(home[".MyApp/etc"])
                 } {
      ...
   }{
      ...
   }{
       switch state.exception[-1] {
           exception: {
              ...
           }
       }
   }
}

Which one you use is a question of personal taste.

Scopes and more on the reserved keywords subject

While we said earlier that the language has the following six reserved words, this isn't actually fully accurate.

app
ambient
lang
merge
ns
state

To make things clear, we need to understand scope and filename conventions a bit more. Let's start with filename conventions.

The file extension for an application source file is mrg. The file extension for a module is mrm. Each file has what is called a source-file scope that is different for an application than for a module.

In source-file scope the following keywords are always defined and are reserved within this scope:

ambient
lang
merge

It is important to note that in source-file scope these keywords are dormant: they are there and they can be designated, but their values or invocation are not available.

A source-file scope also always has one activated reserved keyword, which in the case of an application file is app and in the case of a module file is ns.

We could in theory create an application file that looks like this, and while useless, it would be syntactically correct:

app {

}

or optionally with a name:

app myApp {

}

The same goes for a module file, where the name is not optional:

ns mymodule {

}

The curly braces in these three examples delimit what is called the top-scope. In these examples the top-scope inherits nothing in terms of reserved keywords from the source-file scope, but it has its own non-transitive reserved keyword named state. Think of state like you would of this or self in many Object Oriented languages.

Now to make the module or app in any way useful, we need to populate our top-scope with keywords or first-level subtree names.

app myApp lang {

}

Now lang is available in the top-scope, and because lang holds no resources, it is implicitly carried down into child scopes. We have absolutely no authority though to work with here, and we can't use any modules using the merge keyword.

app myApp lang ambient merge {

}

At this point we've maximized the number of reserved keywords we get without collapse filters that we discussed before.

About state and error handling

This is still a subject of concern. For now exceptions are defined as type only, but this makes for cumbersome error handling. It's not a problem if the errors can be handled in the direct error body where state is still available. It should not be a problem to have some kind of thrower state available in the catch-all body of main, because main is pretty much the tiny God class of any Merg-E App, and I'm considering this. But the POLA problem we run into is when unhandled or loop-escaped exceptions bubble up through awaits to intermediate scope levels. On the one hand we want the await propagated exceptions to have enough info for the caller to be able to do something useful with them, but on the other hand we don't want error handling to violate POLA in any conceivable way. So far I've been thinking about state trimming implementations, but the more I think about these, the more it seems that no useful state would ever remain under strict POLA rules unless I implement extremely complicated delegation and taint tracing, and even then, in most practical situations, most state would get trimmed. Thoughts about this are highly welcome as input.

Feedback

I know everything here is still mostly vaporware and rough ideas, but I feel things are starting to fit together. So for Web 3.0 devs on here: Would you use a language like this for your Web3.0 bots and your Web 3.0 Layer 2 nodes? Do you feel the parallel computing agnostic use is usable? Do you think the least authority design principles are sanely outlined in the language outline? I'm not looking for feedback telling me to make it into a full fledged language where there are more ways to do things, but if things are impossible to do right, they need fixing. Any input is greatly appreciated.

Ideas for Merg-E: A least-authority language for Web 3.0