August 2025 -- I'm actively working (30 hours per week) on a first public alpha release. I have elaborate documentation and prototypes for various aspects of the tooling, enough for myself to have confidence that I can make it work. However, currently I feel that only the text on this page is refined enough to be shared publicly.
I'm writing a programming language, called Lake, for non-expert, non-professional programmers: lay programmers. A language to serve as an ergonomic tool for the real work that a regular person might want to use programming for.
With Lake I want to offer lay programmers a language with the polish of a professional language, but tailored to them specifically. This means less powerful but powerful enough, with fewer things to learn, deal with and worry about.
I start with the motivation for and scope of Lake, but mostly this body of text is a discussion of the interesting aspects of the design of Lake. It's a narrative that is supposed to be read start to finish. The target audience is other language designers. By having this discussion out in the open, I have a reference I can share, and timestamped evidence in case I want to claim to have come up with something first. Perhaps it invites others to share thoughts (e.g. via GitHub). I've come to realize the latter is tough, as Lake, by design, directly goes against what many programmers think a language should be or offer.
To software developers. Based on my experience sharing the idea of Lake, chances are you will accidentally fall back to evaluating Lake by the standards and expectations of a professional programmer. If that does happen, I ask you to reevaluate the situation through the lens of a 200 line program being written by a single person, for that single person, and that once good enough will usually never be touched again. Given Lake's sharply defined scope, trade-offs change significantly.
The work one would do with Lake is writing small, personal projects of 10 to 500 LOC, in extremely rare cases 3000 LOC. To run a Lake program you will always only need a single file; there are no modules. Instead, there is an extensive standard library. If people often reuse the same code that is not in the standard library, they can start a discussion on the official forum and the standard library might be extended. There is no event loop, async or concurrency. There is blocking (e.g. 'sleep', or waiting for the user to do something) and the user can pause or stop the program at any time to inspect or debug. The only interop with other systems is via reading and writing text files.
Integrated into the runtime and standard library are a window for rendering and clicking on, keyboard support, a console for printing and prompting, and spreadsheets. Other examples of common use-cases supported by the standard library are working with 2d coordinates (pixels, spreadsheet cells, boardgame locations) and searching in and manipulating text.
The language Clojure shows the power of a well-designed, extensive standard library. Given first-class support for common I/O (including parsing common file formats), and other interop being out of scope, the desire for modules (for code that wouldn't be accepted as part of the standard library) would come from domain-specific applications, such as tax schemes or boardgame rules. I consider that the immensely reduced friction from not having modules at all far outweighs the downside of users having to write this specific logic themselves (which is often actually easier than trying to meld third-party code with your own code, especially third-party code written by other lay programmers), or share this code via copy-pasting.
An assumption is that the target audience has learned the basics of programming before starting with Lake. Lake will be released simultaneously with another project of mine, designed to teach programming to a general audience. I am primarily a software engineer but also have a degree in educational science. Before Lake, I started on this educational project, but came to realize that it would be relatively useless. Even if I manage to teach programming to a general audience, the vast majority of them will not have a fitting language to use afterwards. Mature general languages are either too simplistic (often designed for children, e.g. Scratch) or, for a lay programmer, unnecessarily powerful and excessively confusing and frustrating (e.g. Python, JavaScript). After searching for such a language I failed to find one. Grudgingly I concluded that I would have to create such a language myself (making a practical language is an order of magnitude more involved than making an educational language), and Lake was born.
An important goal for Lake is to make the ability to program useful similar to how the ability to read and write natural language is useful. A major factor for this usefulness is the ability to perform your skill anywhere you go, even if you didn't bring your own material ("Can I borrow a pen for a second?"). To have a single stone that kills that bird and the bird of friction from installing tooling, the primary target for writing and running Lake programs is the browser ("Can I go to this website for a minute?").
A Lake program is run in a Web Worker, so despite the program being single-threaded, it doesn't ruin the responsiveness of your browser. Having a blocking runtime in the browser makes it quite slow, but given the intended use, still fast enough. Native runtimes are on the roadmap, but won't be part of the first release.
Another aspect of natural language that I hope to emulate with Lake is a pleasant collaboration between people at different levels of experience. I can speak with many people using the same language, adapting the advancedness of my speech to the situation.
Natural language is not strictly a tool; it is a wide palette for expressing, for being creative and sharing ideas. You can vary in style and tone as you feel like. Although a programming language requires significantly more rigor and consistency than a natural language, I carefully try to capture some of this spirit in Lake.
Lake caters to the idea that 'lay' primarily says something about wants and needs, and experience is orthogonal to that. The opposite of a layperson is an expert (who may or may not be paid), where the difference lies in the scope they try to master. Think regular person versus racecar driver: different wants and needs for an automobile, but both can be a beginner or experienced.
Lake grows with you: as you gain experience, you can choose to use more advanced constructs. Critically, these more advanced constructs are just denser ways to model the same core concepts, like saying "consequently" instead of "because the other thing happened". Always, an advanced construct can be explained and demonstrated in terms of more basic constructs. Equally critically, constructs of all levels can be combined. This enables a gradual increase for the user with minimal friction.
Advanced constructs are always a choice, never a necessity. In the documentation, all use-case examples show implementations for various levels of experience. To a limited extent, tooling can convert advanced constructs to more basic constructs. Within 'the community' I will make a strong effort to foster a culture of non-elitism. The highly expressive advanced constructs will have a gravitational pull on the advanced users who will want to help beginners. To mitigate this, in the beginner section of the forum it will be mandatory for helpers to first provide examples that only use less-advanced constructs.
One of the driving forces behind first my educational project and later Lake is that I love to program socially, with friends and family that aren't familiar with programming. However, using extremely limited languages such as Scratch is unpleasant to me, while using something like JavaScript is too much for them. With Lake, the lesser-experienced person can happily write simple constructs; the more-experienced person can sometimes interject and say: "Hey, look at this cool thing you could also do". Or the lesser-experienced person asks "Whoa, what is that?" and the more-experienced person responds "Ah, let me show you by writing it differently".
As you will see later the advanced constructs do get quite advanced. But since they keep concerning the same familiar concepts, and can be combined with and emulated with more basic ones, there is never a disconnect between levels of experience. As you gain experience, instead of moving away in a straight line, you move along an arm's-reach circle, getting a new perspective on what's in the center.
Lake's syntax and semantics are designed with syntax highlighting strongly in mind. More than is typical for programming languages, information is conveyed to the reader not only via characters and the color of characters, but also via italicization and background color. Besides specifications for syntax and semantics there is also a spec for highlighting.
The latter provides clear guidelines for how to achieve the intended full experience. It will be adequately underspecified: it won't mandate specific colors, or even that highlighting must be done in color. E.g. a screen reader can implement the spec using sounds or words.
The use of highlighting is, ultimately, a convenience. A program can be fully understood without highlighting, albeit less easily. The heavy leaning into highlighting is strongly supported by the official tooling, and outside of the official tooling somewhat mitigated by the usually small and simple nature of a Lake program.
There are two parsers: the computer and the human. For both I strive for immediate obviousness. One manifestation of this is zero-lookahead deterministic parsing (the only exception being binary infix operators). For the computer there is not only the grammar but also the parsing process. For example, I can have both a statement and an expression start with the if keyword. This is still zero-lookahead deterministic: the parser will choose one based on what it expects right now. To the less-experienced human the distinction is made immediately obvious via highlighting.
In the 'function capture' discussion later on you'll see that a function call can be turned into a capture by having a symbol nested arbitrarily deeply in the call arguments. That the call is actually a capture is made immediately obvious to the human by italicizing the function that is being called. This 'spooky action at a distance' is made not so spooky by the nature of Lake programs, and in the worst case, tooling is available to easily see what symbol made the capture a capture, or which call the symbol turned into a capture.
Lake assumes that typically, one thinks in terms of doing things, where mentally 'doing a thing' (e.g. making a move on a chessboard) creates a new, modified thought and doesn't erase any previous thought. People typically have an imperative thinking process, but the information in their mind is immutable. As such, a design challenge for Lake was to support an imperative style of programming with immutable data. For this I created a novel 'access chain' system; more on that later. Access chains, together with heavily leaning into contextual binding (implicitly done, but explicitly communicated via highlighting), allow for ergonomic imperative use of immutable data.
An example of both offering gradually more advanced tools and working imperatively with immutable data are Lake's special constructs that offer the core building blocks of functional programming (map, filter, reduce, piping) in a beginner-friendly, imperative style. More on that later.
In addition to the earlier mentioned aspects of Lake's design, the design tries to find a balance between four main pillars.
Static checking can have three verdicts on a part of the program: it certainly is correct, it certainly is incorrect, or its correctness is uncertain. In case of uncertainty, the static checker will allow it, but signal this uncertainty to the user and introduce a runtime check to guarantee the integrity of the running program.
The user can toggle the visibility of these warnings, per type of warning. In the extreme case, you choose to hide all warnings and static checking will only tell you what will definitely go wrong. To get rid of warnings, the user could also choose to spend effort convincing the typechecker. Importantly, this is a choice, not a necessity.
In case of a runtime error, the program just crashes: completely, immediately and loudly. When a program crashes, details are provided and the user can inspect the state of the program just as when debugging, except that stepping forward is no longer possible. By default, operations (operators, functions) either work or crash. The programmer can choose to instead explicitly use a safe version of these operations, in which case a value is always returned: either a flag for success paired with the sought-after value, or a flag for failure optionally paired with details (see Maybe later on). The programmer can then choose how to proceed by processing this value.
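A sketch of the two flavors (parse_number is an illustrative standard library function; the _maybe naming convention is discussed later):

bind n : parse_number("12")              ; 12
bind m : parse_number("abc")             ; crash! not a number
bind maybe_m : parse_number_maybe("abc") ; a failure flag with details; the program continues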
Besides performing flow and scope analysis, static checking is concerned with types. Types in Lake say something about values and structure only. There is no first-class construct to capture behavior, e.g. interfaces, protocols or traits. You could emulate these by requiring, as a structural type, a record with a (reference to a) function at a specific field. For Lake's intended use it is sufficient to dynamically dispatch based on structure. Pattern matching and especially the Either type make this ergonomic.
Below is an overview of all types in Lake and their hierarchy. Importantly, this is not a subtype hierarchy, only a widening and narrowing hierarchy, which is made possible by all values being immutable.
-- : valueless (bottom type)
#  : reference type
+  : compound (non-atomic)
() : will never be shown to the user
*  : a more specific case of a generic type, e.g. String is equivalent to List<Character> is equivalent to Seq<*Character>

Any
├-- Nothing
├-- Never
├─# Function
├─# Box
├─# Regex
├── Boolean
├── Number
├── Character
├── Work in progress: Date types
├── Enum
├─+ Either
│   └─* Maybe
├─+ Record
└─+ (Collection)
    ├── (Seq)
    │   └─* List
    │       ├─* Tuple
    │       └─* String
    └── Set
        └── Map
The notion of 'bottom type' is an implementation detail and will never be communicated to the user, although they will see Nothing and Never, e.g. as return types of functions. The types Collection and Seq are also implementation details and will never be communicated to the user. Instead, these types will always be communicated as one or more narrower types. E.g. the user will never see "Seq", but instead "List, Tuple or String".
Operations (operators, functions) are defined for exactly one type in this hierarchy; there is no overloading. Narrower types may also be supplied, and are implicitly widened; there is never implicit coercion 'across' the tree structure, e.g. from Number to String (although you can still explicitly coerce). This balances two sets of scales: the power of overloading and coercion against remaining clear and predictable, and the ability to provide strong (meaning narrow) static inference against polymorphic reuse.
Lake does not have undiscriminated unions (typically | in other languages, e.g. T | K), because these go against the spirit of the hierarchy tree. In particular, with undiscriminated unions widening rules are not straightforward: instead of widening, you could also get an undiscriminated union. Without undiscriminated unions, it's obvious when widening happens: whenever for two types one doesn't widen into the other, widen to the earliest common ancestor. Instead of undiscriminated unions, use Enum (see later) for groups of known specific values and Either (see later) for groups of known values of a certain type.
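For example, sketched with list literals:

bind numbers : [1, 2]  ; List<Number>
bind mixed : [1, true] ; not Number | Boolean; widened to List<Any>, the earliest common ancestor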
For convenience, an alias for a type can be created with e.g. type Multiset<T> : Map<T, Number>. In this case Multiset<T> and Map<T, Number> will always be considered equivalent: if a Multiset<T> is expected, a Map<T, Number> may be provided, and vice versa.
However, consider the case of Multiset. A multiset is an unordered collection, where elements may appear multiple times. The native type that covers these semantics most closely would be a Map<T, Number>, but not all semantics are statically safeguarded. E.g., the values should be at least 1, and an integer. It is impossible to express this first-class in the typesystem. A solution is to have special functions, their names prefixed with multiset\ (as per Lake's convention), that perform the operations on a Map<T, Number> such that the narrower semantics of a multiset are safeguarded.
Despite carefully creating these functions, it would still be easy to accidentally provide any Map<T, Number> where a Map<T, Number> with the narrower semantics of a Multiset was expected. Instead of the semantics of Multiset being guaranteed by the typesystem, they are 'guaranteed' only by user discipline.
Lake has a first-class solution to this problem, akin to type branding or nominal typing. I will discuss this solution in light of Multiset at the end of the document, after having shown all elements that will be used in the Multiset examples.
An Enum in Lake is a very narrow type with a fixed number of named values. Note: @[] is a tuple literal, .<identifier> is an enum literal.
type Direction : Enum {
N : @[ 0, -1]
E : @[ 1, 0]
S : @[ 0, 1]
W : @[-1, 0]
}
bind current_direction : .N
Under the hood, a specific Enum is an undiscriminated union of its values. Here Direction would widen into Tuple<Number, Number>. However, this is never communicated to the user; the user will only see e.g. Direction.
Despite Enums being undiscriminated unions as an implementation detail, Enums aren't made ad hoc by the static checker when widening, and only literal values and other Enum values (which appear as literals) will statically, with certainty, match an Enum type. This is intuitive: when a Direction is expected, you would expect to be able to provide the literal @[1, 0]. So the logic for undiscriminated unions never bubbles up beyond the statically defined Enums, which are typically rare and small, and for Enums and literals this logic is intuitive and clear.
Enum values are available without needing to mention the type (e.g. .N), although you may (.N#Direction). In case of clashing this becomes mandatory, e.g.
type A : Enum {foo : 1}
type B : Enum {foo : 2}
bind my_a : .foo#A
bind my_b : .foo#B
Enum values can also be defined without an actual value, in which case the values become nominal values. E.g. type Stage : Enum {pending, transit, done}. Nominal enum values are unique, e.g. (where = is the equality-check operator and ; starts a comment)
type A : Enum { foo }
type B : Enum { foo }
.foo#A = .foo#B ; false (and a static warning that this could never be true)
Something that will have to be documented well is that a specific enum value is printed and appears in tooling as its underlying value (.N appears as @[0, -1]), while a nominal value appears as e.g. some_nominal_value#Some_Enum.
There is tension between the ergonomics of a user not having to #-qualify their enum literals and having an extensive standard library. I want to have e.g. a standard library enum for common colors, including .red, but I also want the user to be able to write a type Light : Enum { red, green, blue } and use this userspace .red without having to write .red#Light. To solve this, Enum types can be explicitly specified to not make their values directly available in the 'global namespace'. All standard library enums would be defined like that, e.g. type Color : Enum# { red : @[255, 0, 0] }, where the # signals that its literal values must always be qualified. Now the user can happily use the userspace .red, because it's unambiguous to which enum it belongs, and when needed use .red#Color.
Where the Enum type is used for groups of fixed, named specific values, the type Either is used for groups of fixed, named (immutable) containers for values of a certain type. It is the first-class way of doing tagged/discriminated unions, e.g. type Result<T, K> : Either |some[T] |none[K]. A tag can also contain no subtype, e.g. type Option<T> : Either |some[T] |none. An Either value is a tag with optionally a nested value, e.g. |some[123]. These values can be compared for equality (with the = operator), e.g. |some[123] = |some[123] resulting in true, and |some[123] = |some[456] resulting in false. Like Enum values, Either tags are directly available but require mentioning of the type to prevent clashing, when needed, e.g. |some#Option[123].
As Either allows 'empty' tags, you might be inclined to use an Either instead of a nominal-value Enum such as type Stage : Enum {pending, transit, done}, e.g. as type Stage : Either |pending |transit |done. To enforce that this is always done via Enum, an Either with only empty tags is not allowed. Note that such an Either also would not match the intended use of an Either; in essence it's a compound type, a group of named containers.
Despite the examples, Result and Option are not out-of-the-box types in Lake (although you could create them in userspace). I decided against this to prevent choice paralysis for the target audience. Instead of having to choose between the common Result and Option types, I wanted to provide a single clear choice: type Maybe<T, K> : Either |some[T] |none[K] |none. Notice that a tag is allowed to appear twice, once subtyped and once non-subtyped. This can also be written in shorthand (and to prevent duplication errors): type Maybe<T, K> : Either |some[T] |none[K?]. This way, missing values can always and only be modeled using Maybe. The rest of the language supports optional subtypes for tags first-class, e.g. through pattern matching you can distinguish between |none, equivalent to |none[], and |none[_]; or if you don't care about how many values there are in the |none tag, you can write |none[...].
Now a user can just write Maybe, but still has to make a choice for the 'none' subtype. To eliminate the need for this choice, generic types allow default subtypes, e.g. type Maybe<T, K default Nothing> : Either |some[T] |none[K?]. Now the user can just write Maybe<T> to emulate an Option (in case of not found, don't care about details). The static checker can even provide helpful suggestions based on a subtype being Nothing; e.g. for a Maybe<T, Nothing>, the pattern |none[_] is superfluous because it will never match.
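A sketch of both flavors in use (find_index_maybe is an illustrative standard library function):

bind i : find_index_maybe([11, 22, 33], 22) ; |some[2], a Maybe<Number>
bind j : find_index_maybe([11, 22, 33], 44) ; |none, no details needed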
Another source of choice paralysis and inconsistency is optional fields in records. Will you go for an optional field containing a T, or a non-optional field containing a Maybe<T>? To remove this dilemma, Lake does not allow optional fields. The consequences are largely mitigated by the useful constructs concerning manipulating records.
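A 'possibly absent' field is thus always a present field holding a Maybe; a minimal sketch (the User type is just illustrative):

type User : Record<name : String, age : Maybe<Number>>
bind user : @.{name : "Ann", age : |none}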
Like Enum types, Either types can be defined with a # to not have tags be available without #-qualifying. Because of its ubiquity, Maybe is the only standard library Either which does make its tags directly available.
The type Nothing is one of two types that have no values, the other being Never. Besides the previous Maybe example, Nothing is used to specify that a function does return, but does not return a value. The type Never is used to specify that a function does not return at all (crash, infinite loop); this is useful for flow analysis.
Note that despite having a Nothing type, a user is still forced to model potentially missing values with Maybe, because Nothing has no values. You could create a type Option<T> : Either |value[T?], but there is so much first-class support for Maybe (discussed later) that you will likely always use Maybe.
The type Seq is used under the hood to model all sequential types. It is both generic and variadic. It takes zero or more positional types, finally and optionally followed by * and a type for a range of zero or more values. E.g. List<T> is equivalent to Seq<*T> (any number of values of type T), and Tuple<T, K> is equivalent to Seq<T, K> (first position is a T, second position is a K, no further elements).
Seq would allow for powerful types like AtLeast<n, T>, where n is some literal integer; e.g. AtLeast<2, T> would be equivalent to Seq<T, T, *T>. Operations on Seq like the add-last binary operator <+ would then allow for powerful static deductions, e.g. List<T> <+ T resulting in AtLeast<1, T>.
However, after consideration, I decided to keep these powerful deductions strictly under the hood. For the target audience they would very rarely be truly useful, yet would appear often. Instead, Lake communicates more simply using only List, Tuple and String, where any Seq with a * is communicated as a List, any Seq without a * as a Tuple, and List<T> <+ T remains List<T>. Importantly, the static checking that can be done with Seq is limited to what can be communicated to the user without the abstraction leaking through.
All first-class operations on lists you might expect from a language are in Lake defined for Seq instead. This makes them work automatically for Tuple and String as well, including mixing of List, Tuple and String, with appropriate widening rules. E.g. Tuple<T, U> <+ V results in Tuple<T, U, V>, and "foo" <+ 1 results in List<Any>. For the concatenate operator <>: List<T> <> Tuple<T, T> results in List<T>; List<T> <> Tuple<T, U> results in List<V>, where V is the earliest common ancestor of T and U; String <> String results in String.
Because widening only happens when necessary, if you e.g. pass a Tuple to the function map to change all elements by applying the same function to each element, then the static checker knows this call to map results in a Tuple of the same length as the Tuple you passed in. If you pass in a List, the result is a List.
Tuples (Seqs with only fixed-position subtypes) widen into lists, but not into tuples of a different size. You might expect e.g. my_2d_function(@[1, 2, 3]) to be fine, because if a tuple of length 2 is expected, a tuple of length 3 is at least what was asked for. The problem is that if you were to then ask for e.g. the size of the tuple, it might be different from what you think is possible, which would be confusing. Instead, you could use e.g. the 'remove-last' postfix operator <- to get a tuple of the right size, e.g. my_2d_function(my_3d_tuple<-).
Some might say that being able to perform an operation on a type and get a new type is bad (e.g. Tuple<Number, Number, Number> <- resulting in Tuple<Number, Number>) because of type explosions (causing unreadable hints and errors) and 'loose' types (causing unexpected behavior). However, given the intended use and audience of Lake, the upsides (e.g. being able to write my_2d_function(my_3d_tuple<-)) far outweigh the downsides. In almost all Lake programs there won't be long chains of operations, so no explosions. Lake values are immutable, so the type of a value never changes, and for all places you can keep a value it is known which type is allowed. More on that right now.
Mutability in Lake is handled explicitly, almost always using a binding; in rare, advanced cases using a Box (more on that later). You create a lexical binding (bind a value to a name), which follows common lexical scoping rules, with bind name : value. To change the value bound to the name, you rebind name : new_value. However, bindings are immutable by default, and can only be a valid rebind target if they are explicitly marked as mutable.
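For example:

bind steps : 1
rebind steps : steps + 1 ; static error: steps is not mutable
bind mutable count : 1
rebind count : count + 1 ; fine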
To allow for convenient 'updates', rebind provides an implicit binding to its RHS expression: it. it is bound to the 'current value', so you can do e.g. rebind count : it + 1. Note that it is just an expression, and can be placed anywhere, e.g. rebind x : 10 - it, rebind foo : some_function(it). To make clear what this implicit binding means, it is always given a background color, and what it is bound to is given the same background color, e.g.
rebind foo : some_function(it)
       xxx                 xx

Here foo and it are given the same background color.
When you try to rebind and encounter an error because your rebind target is not mutable, if after consideration you decide that you certainly do want this binding to be mutable, you can press a hotkey to let the IDE write mutable for you in the right place, while you keep your eyes where they were.
With immutable-by-default bindings, static checking is generally more powerful, and with the hotkey, when you do need a mutable binding the friction is quite small.
A primary goal of Lake is to also facilitate an empirical style. A common pattern in any style is that the (initial) value of a binding or variable is conditional. Lake offers a ternary expression for this (if ... then ... else ...) and more advanced constructs, but to enable a common empirical pattern, you can do the equivalent of separating declaration and initialization. You can declare name or declare mutable name and later bind initial name : value. Static checking provides appropriate errors and warnings for this mechanism.
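For example (formal is just an illustrative binding):

declare greeting
if formal {
    bind initial greeting : "Good day"
} else {
    bind initial greeting : "Hi"
}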
The type of values a binding can hold (or rather, that a name can be bound to) is fixed upon the initial binding (usually bind, otherwise bind initial), and is always inferred from its RHS (right-hand side) expression. In case of multiple bind initials, the binding's type will be an appropriate widening.
Given Lake's simple type system, inference should not feel like confusing magic in most cases. If the user is bamboozled, type hinting will always explain how a type was inferred.
What is notoriously annoying in statically typed languages with inference is having to explicitly specify the type of empty lists and maps. To make this minimally annoying in Lake, there is syntax for empty list and map literals that directly specifies their type. Instead of [], which, if there is no context to infer anything from, defaults to a List<Any>, you can write e.g. [empty T]. The empty is there to make it unambiguous that a type should follow, both to the user and the parser, and T can be clearly highlighted as not a value but a type. Similarly for maps, instead of @{} you can write @{empty K : V}, for sets @set{empty T} and for Either values e.g. |none[empty T]. Note that these specifically-typed empty literals can be used anywhere, not only as the RHS of a bind.
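For example:

bind names : [empty String]            ; List<String>
bind scores : @{empty String : Number} ; Map<String : Number>
bind ids : @set{empty Number}          ; Set<Number>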
Another option is to let the subtype of a collection be inferred. This is possible if the static checker is certain a value is added to the collection, e.g.
bind mutable numbers : []
if foo { rebind numbers : it <+ 1 }
If you want your binding to hold a specific, wider type than the type of the RHS value, then you can use the widen annotation on the RHS expression, e.g. bind my_list : widen List<Any> : [1, 2, 3]. It is statically checked whether this widen is possible; e.g. widen Number : true will result in a static error.
The widen annotation can be used on any expression, not only the RHS of a bind. Similarly, there is narrow. Recall that the static checker always checks for matching types: if neither widening nor narrowing is possible, static error; if widening is possible, implicit widening; if narrowing is possible, warning and runtime check. So the static checker already knows everything it needs to know; no narrow is strictly required. narrow, then, is an explicit acknowledgement by the programmer, and removes the warning. Of course, despite the explicit narrow, it is still statically checked whether the narrow is possible, and a runtime check is performed to assert integrity.
Sometimes you'd like to explicitly specify the type to narrow to, as an assurance, but often you just want to give the acknowledgement without more fuss. In that case, instead of a type, you can provide the keyword fit, e.g. some_function(narrow fit my_any). This is saying "I know I'm narrowing here, just narrow to fit whatever is required". Importantly, there will still be static and runtime checking!
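A sketch (the surrounding arithmetic is what fit narrows to here):

bind anything : widen Any : 123
bind doubled : 2 * narrow fit anything ; acknowledged narrowing from Any to Number; still checked at runtime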
Similar to Seq abstracting over List and Tuple, I used to have another under-the-hood, generic, variadic type: Lookup, widening into Set, abstracting over Map and Record. Although powerful, I ended up removing it. I'll briefly discuss it nonetheless; perhaps somebody else can make use of the idea, and I can illustrate the tradeoffs I consider for Lake.
Lookup takes multiple key-value type pairs: Lookup<K1 : V1, ..., Kn : Vn>. Important: the Ki in a Lookup must not overlap. Map<K : V> is simply Lookup<K : V>. Record<foo : T, bar : K> is Lookup<"foo" : T, "bar" : K>. Note that a value acts as a (very narrow) type here.
A Lookup can be widened to a Set: Lookup<K1 : V1, ..., Kn : Vn> widens to Set<Tuple<K1, V1>, ..., Tuple<Kn, Vn>>. Any operation defined for a Set thus allows you to supply a Lookup, because of implicit widening, but such an operation will produce a Set, because it only cares about Set semantics and doesn't try to maintain any semantics of narrower types.
Now, why I've removed Lookup. There are two reasons, in no particular order. The first is that Set being able to take multiple arbitrary subtypes doesn't rhyme well with not having undiscriminated unions. You wouldn't be able to create such a set directly (e.g. I want the literal @set{1, true} to widen to Set<Any>). The multiple subtypes for a set would only be there to accommodate the multiple key types of a Lookup representing a Record. This feels inconsistent. Instead of removing Lookup, I could have made Lookup not widen into Set, but I do think Map widening into Set is useful and intuitive.
The second reason is that if Record were to widen into Set, the fields, being Lookup keys, would have to be actual values. I considered two options. I could make them strings, but I definitely want the primary syntax for fields to be keywords. This introduces an inconsistency: you write a keyword, but at some point get a string. Perhaps minor, but it's friction. I could also create a type just for fields, e.g. Field. These would be nominal values. If you then were to do something with fields as values, e.g. compare them, you could use the field keyword to write a literal, e.g. field my_field.
This felt too bespoke for a rare use-case, and besides, if you really want to do dynamic stuff with fields, then not having undiscriminated unions is a shortcoming. And finally, it is just clearer for the user if the idea of Records is more static. This made me move Record to widen directly into Any, removing the need for Lookup. Set is now defined simply as Set<T> (no longer variadic), and Map<K : V> widens into Set<Tuple<K, V>>.
The syntax for a Record literal is @.{field1 : value, field2 : value}. Like with operations on a Tuple, it is possible for operations on a Record to result in a new Record type (see later). This enables row polymorphism, which is a notorious source of complication. However, like with type explosions and loose types for operations on Tuple, programs written in Lake won't make row polymorphism an issue in almost all cases. In Lake there are no twelve-step Record-building pipelines that you might see in an enterprise login system. I decided the potential for extremely rare unintelligible complexity is worth the utility.
Let's look at something especially beginner-friendly. A primary goal of Lake is to fuse an imperative style with immutable data. To this end I came up with a couple of constructs.
The of-loop is an expression that results in a list. In its body you can use the collect statement to add an element to the list that the of-loop will result in. So instead of
bind mutable numbers : [empty Number]
for n in [1,2,3,4,5] {
    if n > 2 {
        rebind numbers : it <+ n + 3
    }
}
you can do

bind numbers : of n in [1,2,3,4,5] {
    if n > 2 {
        collect n + 3
    }
}
Effectively, an of-loop is a beginner-friendly map/filter from functional programming. Instead of reduce, there is the accumulate loop:
bind sum_at_most_10 : accumulate sum from 0 using n in numbers {
    bind new_sum : sum + n
    if new_sum > 10 {
        break sum
    } else {
        continue new_sum
    }
}

; custom accumulator name can be omitted, then `it` can be used instead
bind sum : accumulate from 0 using n in numbers { continue it + n }
Besides map, filter and reduce I also wanted to offer a beginner-friendly alternative to piping. The analogy is the assembly line, where a single value travels along different stations to be changed. Here too a custom name is optional.
bind result : assemble x from 0 {
    x + 1,
    adapt(x),
    3 - x
}

bind result : assemble from 0 {
    it + 1,
    adapt(it),
    3 - it
}

do assemble from 0 {
    it + 1,
    adapt(it),
    3 - it,
    print(it)
}
Now a major power of Lake: access chains. Access chains are used to retrieve and change nested values: elements in a Seq, values in a Map or Record, and tagged values in an Either. A chain is a 'root' expression, followed by one or more 'links', optionally followed by a 'cap'.
As I go over the elements of access chains, the entire system might increasingly appear daunting and not a good fit for Lake. However, afterwards I summarize and highlight how there are but a few orthogonal concepts and how there is consistency between elements. This makes the cognitive complexity of access chains not multiplicative but additive, and makes them a great addition to Lake's catalog of granular constructs.
The key difference between Lake's access chains and the common postfix accessor operators you find in other languages (e.g. [index] and .field) is that those postfix operators are evaluated individually, one by one, whereas an access chain is evaluated as a whole, retaining information about the root and each individual link. This enables several mechanisms.
The most basic case of retrieving a nested value looks and behaves practically the same as it would using another language's accessor operators. Note that sequences in Lake are 1-indexed.
bind list : [11, 22, 33]
list[1] ; 11
bind tuples : [@[1, 2, 3], @[11, 22, 33]]
tuples[2][3] ; 33
bind user : @.{age : 10, scores : [11, 22, 33]}
user.age ; 10
user.scores[2] ; 22
bind number_to_name : @{3 : "Foo", 7 : "Bar"}
number_to_name@{7} ; "Bar"
bind number_maybe : |some[123]
number_maybe|some ; 123
number_maybe! ; 123
For each chain, you see an expression as the root, then one or two links, and no cap. If there is no cap, the value at the end of the chain, the leaf, is retrieved. If any link fails, the program crashes. You'll see solutions to this shortly.
A link for a Seq is [<expression>], for a Record is .<keyword>, for a Map is @{<expression>} and for an Either is |<keyword>, where ! is the preferred alias for |some, the 'value found' tag of Maybe.
The most basic cap is ::, which requires a RHS expression. The value at the end of the chain is replaced by the result of the RHS, and the result of the chain is an updated copy of the root, making chain updates lens-like.
bind user : @.{age : 10}
user.age::11 ; @.{age : 11}
user.age::user.age + 1 ; @.{age : 11}
Like rebind, :: provides an it to the RHS expression, allowing for
bind user : @.{age : 10}
user.age::it + 1 ; @.{age : 11}
Given that both rebind and :: provide an it, you can now ergonomically do imperative-style updates of immutable nested data:
bind mutable user : @.{age : 10}
rebind user : it.age::it + 1
Recall that it is always given a background color. This clearly visually distinguishes multiple its:

rebind user : it.age::it + 1
       xxxx   xx ooo  oo
More constructs than rebind and :: provide an it-binding. However, the intuition for it is always the same: the value that's currently the star of the show, the one we want to talk about most.
Like a chain without a cap, a ::-capped chain where any link fails crashes the program.
The cap ::remove removes the leaf. Note that for Tuples and Records, this changes the type.
[11,22,33][3]::remove ; [11, 22]
@[11,22,33][3]::remove ; @[11, 22]
The cap ::? is 'update if possible'. It results in the root if any link fails; if all links succeed, ::? behaves exactly like ::.
bind increment_at_index : fn(list, index) {
    return list[index]::?it + 1
}

increment_at_index([1,2,3], 4) ; no crash!
Note that without a cap, ! acts exactly like you would expect if it were a mere postfix operator: get the some-value, or if it's not there, crash. However, since it's a link, you can e.g. update a some-value only when it exists: my_maybe!::?it + 1.
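For example:

|some[5]!::?it + 1 ; |some[6]
|none!::?it + 1    ; |none, the root unchanged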
To retrieve a nested value without crashes, you use the ::maybe cap. This results in a Maybe, with as none-value (helpful details about the failure to retrieve) a tuple of the values at the links that did succeed. Note that the none-value can be a tuple (which is more useful than e.g. a List<Any>, to which the tuple can widen anyway) because links are always statically defined in the code.
[11, 22, 33][3]::maybe ; |some[33]
[11, 22, 33][4]::maybe ; |none[@[]]
[[11, 22], [33, 44]][2][3]::maybe ; |none[@[[33, 44]]]
The final cap involves using a Maybe, so I'll first discuss more constructs for Maybe.
The postfix operator ? returns a boolean and can be used to check if a Maybe value is |some[_]; this makes ? for this common case an ergonomic alternative to the matches binary operator, which takes a LHS expression and a RHS pattern-matching pattern. E.g. x? is equivalent to x matches |some[_]. This enables the idiom if user.age::maybe? { ... }.
Often when you check for the existence of a some-value, you want to use it. This is made ergonomic with the given statement: given <expression> { ... } [else {...}], where <expression> must result in a Maybe, and the body is only evaluated if it's a |some. Inside the body, it is bound to the some-value.
given user.age::maybe { print(it) }
xxxxx                         xx

given parse_number_maybe(some_string) {
xxxxx
    bind n : it
             xx
    rebind total : it + n
           ooooo   oo
}
Note that here given is given the same color as it, because the expression after given doesn't represent the value that will be bound to it.
Note that by convention, all functions ending in _maybe return a Maybe. Such functions are the 'safe', no-crash versions of their brethren without the _maybe suffix.
Where before the confusion from multiple constructs introducing it was solved by background color, here we see a structural problem: we had to bind given's it so we could use it in the rebind. This can be solved using a construct that was originally designed as a solution to a different problem.
In almost all languages, the name of a variable (or similar concept, such as binding) can be used as an expression, meaning "the value currently assigned to this variable". In Lake this is called a bound-expression: "the value (currently, in case of a mutable binding) bound to this name".
A bound-expression has an optional prefix keyword upper. This allows you to 'skip over' one binding in the lexical hierarchy. Note the background colors in the revised example:
given parse_number_maybe(some_string) {
xxxxx
    rebind total : it + upper it
           ooooo   oo         xx
}
upper was originally designed to prevent namespace pollution in case of 'adaptive shadowing', where you want a new binding x, but base its value on an existing x. Because bindings allow for named recursion, you can't write bind x : adapt(x) (the x in the RHS would already refer to the new binding). You're then forced to create a binding to store the existing x:
bind x_existing : x
bind x : adapt(x_existing)
But this x_existing is not relevant at all to its scope (only to the RHS of bind x), so it pollutes. You can prevent this with upper, e.g. bind x : adapt(upper x).
Another construct to prevent scope pollution is the 'do-expression', which in other languages is often called a block expression, or emulated with an 'immediately invoked function expression' (IIFE). It's an expression that evaluates a block of code, like any other body, and results in some value provided in the body. In Lake this is done explicitly with a yield statement, e.g.
bind my_value : do {
    bind temporary1 : 123
    bind temporary2 : 456
    yield temporary1 + temporary2
}
The last two constructs for Maybe values before we go back to the final chain cap are the binary operators catch and catch!. Both take an expression on either side. catch results in the LHS if it's a some-value, otherwise the RHS. catch! does so too, but unwraps the LHS first. E.g. |some[123] catch |some[456] results in |some[123]; |none catch |some[456] results in |some[456]; |some[123] catch! 456 results in 123; |none catch! 456 results in 456.
Statements that redirect the flow of control (return, break, continue, etc.) can be placed in a position where an expression (and thus a value, after evaluation) would be expected. This is fine: the expected value is never provided, but this doesn't matter, as the flow of control has moved elsewhere. This allows for useful idioms such as bind name : user.name::maybe catch! continue.
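For example, skipping over elements without a retrievable value (users and print are illustrative):

for user in users {
    bind name : user.name::maybe catch! continue
    print(name)
}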
The final cap ::+, like ::?, deals with the possibility of failing links. However, where the RHS of ::? is evaluated conditionally, the RHS of ::+ is always evaluated (unless a crash happens because of an issue in a link). The it provided by ::+ is a Maybe, identical to the value resulting from ::maybe, so a some-value only if all links succeed. A ::+-capped chain will always result in an updated root; any missing links are autovivified.
Note that ::+ can change a type by replacing a leaf with a value of a different type, but its autovivification is not allowed to change types. What ::+'s autovivification can do is create a new index in a list (although not always, as will be discussed shortly), a new key in a map, and put a value in an empty Either value, if that value is allowed to (besides being empty) contain a value (e.g. |none[K?]). It can also create lists, maps and Either values that are allowed to hold a value (so |some[T] and |none[K?], but not |foo).
bind l : [[[1, 2]], [[3, 4]]]
; create index [1][1][3]
l[1][1][3]::+5 ; [[[1, 2, 5]], [[3, 4]]]
; create index [1][2], also create a list
l[1][2][1]::+5 ; [[[1, 2], [5]], [[3, 4]]]

bind maybes : [|some[123], |none]
; create index [3], also create a some-value
maybes[3]!::+456 ; [|some[123], |none, |some[456]]

bind m : @{1 : "one"}
m@{2}::"two"  ; crash! `2` is not a key; nothing to bind to `it`
m@{2}::+"two" ; `it` bound to `|none`
m@{1}::+it! <> "!" ; @{1 : "one!"}
m@{2}::+it! <> "!" ; crash!
m@{3}::+(it catch! "n") <> "!" ; @{1 : "one", 3 : "n!"}
m@{1}::+given it then it <> "!" else "n" ; @{1 : "one!"}
m@{3}::+given it then it <> "!" else "n" ; @{1 : "one", 3 : "n"}
       xooooo xx      oo
Note how here given is used as an expression, with slightly different syntax (then, no curly braces). Some concepts have both a statement and an expression version. They start with the same keyword, but this keyword is colored differently for the statement version than for the expression version, making the distinction clear. This is an example of how I let highlighting influence syntax (or, how highlighting is part of the syntax). Other concepts that have both a statement and an expression version are if, match and do.
Note that when parsing, no lookahead is required to distinguish between the statement and expression version, given a keyword. If a statement is expected, an expression is always also allowed, but the statement takes precedence. If an expression is expected, then it's the expression version.
An issue then becomes: what if you want to use the expression version of if, match or given where you could also place a statement? A legitimate reason for this would be to then pipe this expression into a function with |>. That's what the do statement is for. If do is found where a statement is allowed, it starts the do statement, which immediately expects only an expression. So in a body (or at the top level), if starts the if-statement, but do if starts the if-expression, clearly distinguished by color.
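A sketch (x and print are illustrative):

do if x > 0 then "positive" else "other" |> print ; the if-expression's result is piped into print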
But what if you'd want to place a do-expression where a statement is also allowed? That'd be do do, which is weird. In that case you can remove both dos and write a plain block.
do do {
    ...
    yield foo
} |> f ...

{
    ...
    foo |> f ...
}
Back to autovivification. Consider [11][3]::+33. Autovivification could place 33 at the third index just fine, but what should be placed at the second? I absolutely do not want default zero values for all types; what would be the default for Any? There is no generic 'null' value; you should use Maybe. That's the solution: if a list has a Maybe subtype, then autovivification can just insert |none values in these 'holes', e.g.
[|some[11]][3]::+|some[33] ; [|some[11], |none, |some[33]]
If a list does not have a Maybe subtype, then ::+ on such a list is allowed, with a warning (it might work), but if a hole appears at runtime the program crashes. If you don't want a potential crash, instead use <+ or a standard library function like put_at_index(list, element, filler : 0).
For the same 'hole' reason, where autovivification can create lists, maps and value-holding Either values, it can't create Tuples and Records, unless their type specifies exactly one value (which in practice would be rare).
There used to be another cap, a supercharged version of ::+. I removed it for being too sundering. Unlike all other caps, it did not have any type restrictions and would forcefully place values and autovivify, changing the result's type to match whatever abomination was wrought. I created this cap so you could use chains to create records with a new field, and thus of a new type. This is impossible with ::+ because the static checking on links would cause an error if a link describes a field that is not already in the record. I really want to keep this static checking on links (to provide guardrails for a relatively sensitive operation), so the supercharged ::+ was out.
Adding a field to a record now happens similarly to adding a new value to a tuple: not via chains, but via an operator. The rarity of this operation does not warrant a symbolic operator, but a keyword operator instead: my_record add_field new_field : value.
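For example:

bind user : @.{name : "Ann"}
bind user_with_age : user add_field age : 10 ; @.{name : "Ann", age : 10}, a new Record type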
A powerful extra layer to the access chain system is 'multilinks'. Where a regular link selects a single element of a collection, a multilink selects multiple. Where a chain of regular links describes a single path leading to a single leaf, at a multilink the path splits into multiple branches, leading to multiple leaves.
The simplest case is [*] ('all') for a sequence.
bind l : [[11, 22], [33, 44]]
l[*]    ; [[11, 22], [33, 44]]
l[*][1] ; [11, 33]
l[1][*] ; [11, 22]
l[*][*] ; [11, 22, 33, 44]
[][*]   ; []
Note how l[*][*] results in a flattened list. The point of an access chain is that you're interested in a leaf, or in the case of multilinks, multiple leaves. So what you're getting is a list of leaves.
Instead of all indices, you can also make a specific selection.
bind l : [11, 22, 33, 44, 55]
l[* in [2, 4]]    ; [22, 33, 44]
l[* in [2, 6]]    ; crash! link failed
l[* from 3]       ; [33, 44, 55]
l[* to 3]         ; [11, 22, 33]
l[* of [1, 3, 5]] ; [11, 33, 55]
l[* of [1, 2, 7]] ; crash! link failed
Note how all supplied indices must succeed. Supplying multiple indices with * can be seen as individually supplying each index, following the same rules as a regular link. A chain with one or more multilinks can be seen as creating multiple 'actual' chains; each actual chain can cause a crash!
Inside these in/from/to/of links, it is bound to the list. it is given the same background color as the link brackets, indicating the list as a whole.

list[* of find_relevant_indices(it)]
    x                           xx x
The final multilink for sequences is if. Its it is bound to the element itself, rather than the list, and is given the same background color as the *.
bind l : [11, 22, 33, 44, 55]
l[* if it > 25] ; [33, 44, 55]
  x    xx
In a simple (non-multilink) chain without a cap, you get the leaf. In a multilink chain, so multiple actual chains from root to leaf, you get a list of all leaves (possibly empty). With the ::maybe cap on a simple chain, you allow the chain to fail, and you get a Maybe, maybe containing the leaf. With the ::maybe cap on a multilink chain, the intuition is that the cap is applied to each individual actual chain: the result is a list of Maybes.
For the updating caps (::, ::remove, ::?, ::+), on a simple chain they result in an updated root (or a crash). On a multilink chain, the intuition is again that the cap is applied to each individual actual chain, but the root for each actual chain is the updated root that resulted from applying the cap to the previous actual chain. The result of this process is the final updated root, effectively having the cap applied for each leaf sequentially. Note that the it on the RHS of a cap is always a single leaf.
bind lists : [[11, 22], [33, 44, 55]]
lists[*][1] ; [11, 33]
lists[*][3] ; crash! works for one actual chain, but not for all
lists[*][1]::maybe ; [|some[11], |some[33]]
lists[*][3]::maybe ; [|none[@[[11, 22]]], |some[55]]
lists[*][1]::it + 1 ; [[12, 22], [34, 44, 55]]
lists[*][3]::it + 1 ; crash
lists[*][1]::remove ; [[22], [44, 55]]
lists[*][3]::remove ; crash
lists[*][1]::?it + 1 ; [[12, 22], [34, 44, 55]]
lists[*][3]::?it + 1 ; [[11, 22], [33, 44, 56]]
lists[*][1]::+it! + 1 ; [[12, 22], [34, 44, 55]]
lists[*][3]::+given it then it + 1 else 0 ; [[11, 22, 0], [33, 44, 56]]
             xooooo xx      oo
lists[*][4]::+0 ; crash, would create a hole
While links are strongly statically checked, and so aren't able to change a type with ::+, caps are allowed to provide a value that has a different type than the leaf bound to it. Note that the door to caps changing types was already opened by ::remove (which for a Record changes the type). If you change the type of a leaf, you also change the type of the updated root. However, because this change of the updated root's type can only happen in one place (the cap), error messaging can be made quite clear. Allowing a cap to change type enables a stepping stone between of-collect and map:
[1,2,3][*]::to_string(it) ; ["1", "2", "3"]
bind mutable user : @.{scores : [1, 2, 3]}
rebind user : it.scores[*]::to_string(it) ; static error: scores should be List<Number>, now is List<String>!
The final layer to access chains is the ability to apply a cap to multiple leaves as a collection. This is only possible if the very last link of a chain is a multilink. You signal this 'going one dimension up from a single leaf' by writing ::: instead of ::. The links and caps still follow the same rules for success, failure, and result. The primary difference is that now it is not a single leaf, but a collection of what the leaves would have been if :: was used instead of :::; in case of an update, the entire collection is updated. For :::?, the update will only happen if all paths succeed. For :::+, it will be a collection of Maybe values.
bind l : [11, 22, 33, 44, 55]
l[* in [2, 4]]:::reverse(it)  ; [11, 44, 33, 22, 55]
l[* in [2, 4]]:::[66]         ; [11, 66, 55]
l[* in [2, 4]]:::[]           ; [11, 55]
l[* in [2, 4]]::remove        ; [11, 55]
l[* in [4, 6]]:::reverse(it)  ; crash
l[* in [4, 6]]:::?reverse(it) ; [11, 22, 33, 44, 55]
l[2]:::[3] ; static error: ::: is only allowed if the last link is a multilink
Note that :::remove would be superfluous, as you can always use ::remove instead.
Of a list, a contiguous subselection (* in/from/to) can always be replaced by another list of arbitrary length. For non-contiguous selections (* of/if) it is less trivial. If you supply fewer replacements than there are leaves, the leaves without a replacement can be dropped, like they would be if a contiguous sublist were replaced by a shorter one. However, where for a contiguous sublist any extra replacements can intuitively and usefully be given a place, for non-contiguous sublists it's not obvious where any extra replacements should go. For this reason I considered not allowing ::: for * of/if. However, I do see utility, so instead I've settled on a runtime error if for such a non-contiguous sublist any extra replacements are supplied. If during beta testing this appears to be a source of confusion, I can always apply the restriction again.
[11,22,33][* if it > 15]:::it[*]::15 ; [11, 15, 15]

; note the order!
[11,22,33][* of [3, 1]]:::[55]         ; [22, 55]
[11,22,33][* of [3, 1]]:::[55, 66]     ; [66, 22, 55]
[11,22,33][* of [3, 1]]:::[55, 66, 77] ; crash
Maps have equivalent but fewer multilinks: only @{*}, @{* of expr} and @{* if expr}. For consistency, even if for a capless chain the final link is a map-multilink, you still get a list of leaves. It would be useful to use access chains to extract a submap; to that end there is :::extract.
bind m : @{1 : "one", 2 : "two", 3 : "three"}
m@{* of [1, 3]} ; ["one", "three"]
m@{* of [1, 3]}:::extract ; @{1 : "one", 3 : "three"}
Where the RHS of ::: for lists should supply a list, for maps it should supply a map. For a map, ::: binds it to a map of key-leaf pairs.
bind m : @{1 : "one", 2 : "two", 3 : "three"}
m@{* if #it = 3}:::it@{*}::to_upper_case(it) ; @{1 : "ONE", 2 : "TWO", 3 : "three"}
Some mixed examples:
bind m : @{
    1 : [11, 111],
    2 : [22, 222],
    3 : [33, 333],
    4 : [44, 444]
}
m@{* of [2, 4]} ; [[22, 222], [44, 444]]
m@{*}[1] ; [11, 22, 33, 44]
bind maps : [@{1 : 11, 2 : 22}, @{3 : 33, 4 : 44}]
maps[*]@{* if 20 < it < 35}:::extract ; [@{2 : 22}, @{3 : 33}]
Records also have multilinks, @.{*} and @.{* field1, field2}. :::extract also works if the final link is a record-multilink. Note that a + on the extracting cap's leaf end is not relevant; by extracting you're already explicitly saying you're not interested in the root. To allow for autovivification along the way, however, you might still want :::+extract.
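For example (a sketch; I'm assuming here that, like for maps, a capless chain ending in a record-multilink gives a list of leaves):
bind user : @.{name : "Ada", age : 36, score : 100}
user@.{* name, age} ; ["Ada", 36]
user@.{* name, age}:::extract ; @.{name : "Ada", age : 36}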
Let's wrap up with an overview.
links:
    seq[index]
    map@{key}
    either|tag
    record.field
multilinks:
    [*]
    [* in range]          it : sequence
    [* from start]        it : sequence
    [* to end]            it : sequence
    [* of indices]        it : sequence
    [* if condition]      it : element
    @{*}
    @{* of keys}          it : map
    @{* if condition}     it : value
    @.{*}
    @.{* field1, field2, ..., fieldn}    it : record
caps:
    <no cap>      (get leaf/leaves)
    ::maybe       Maybe of leaf, |none if any link failed
    ::remove
    ::expr        it : leaf             replace
    ::?expr       it : leaf             replace if exists
    ::+expr       it : Maybe of leaf    autovivify, replace
caps, only if the last link is a multilink:
    :::extract
    :::expr       it : collection       (list, map or record)
    :::?expr      it : collection       only if all links succeed
    :::+expr      it : collection       elements/values are Maybe
A chain describes one or more 'actual' chains. Any *
means you might have multiple actual chains; no *
means there's certainly only one.
An actual chain leads to a leaf. No matter the kinds of links or cap used, the reasoning is always in terms of the one leaf of a single actual chain.
Sometimes you just want the leaf (no multilinks, no cap). Sometimes you want multiple leaves and use a multilink (and get a list of all leaves). Sometimes you allow a link to fail, so you accept a Maybe instead of the direct leaf (so you get a single Maybe, or for multilinks a list of Maybes).
Sometimes you want to update the root by replacing the leaf. You want to enforce this always happens (::
), or only if the chain succeeds (::?
). Sometimes you don't mind if the chain doesn't succeed right now, and it's fine if missing values can be inserted to make it work (::+
). In case of multiple actual chains, these updates are performed sequentially, in a deterministic order: the same order as the leaves appear in the list you would get by omitting the cap.
Caps starting with :: are always concerned with a single leaf, and are applied to each actual chain. Caps starting with ::: can only be used immediately after a multilink. For these caps the same actual chains are processed in the same way, resulting in the same leaves. The only difference is that for a :::, it is not a single leaf but the leaves that belong directly to the final multilink, put in a collection.
Access chains aren't a 'complexity cliff'. You can gradually add different kinds of elements to your chains. For teaching, multilinks could be 'unrolled' to operations on multiple explicit, non-multilink chains. And importantly, by the nature of Lake programs, nesting will never be that deep anyway. One or two levels is common; four or more would already be quite rare.
I deliberately kept the elaborateness of access chains at the degree it is now. At some point I allowed * for Either links; because a specific tag can hold only a single value, it practically became a way to safely short-circuit a chain. I thought this was useful, so I worked on a design for something similar to multilinks, e.g. [? 3] and @{? "A"}, where the ? would be an 'exists or not': if not, then stop the path here, kind of like how * can stop a path by not selecting it. However, this pushed the access chain system over a threshold. Too many options, too many interactions; just too much. The straightforwardness was too far gone. I scrapped ? completely, and * for Either links is no longer allowed.
The model now is limited to: either you provide the next specific place (index, key, field, tag) the chain should always go to (and if not possible, fail), or you do not specify a specific place but instead any place depending on what's at that place (* if). This clear either-or as a limitation keeps the complexity of access chains at an appropriate level. With this as a rule, it follows that fields and tags aren't allowed a multilink (because such links by definition already describe a specific place), and that ? is forbidden (it breaks 'chain must go here or fail'). The links now dictate "this is the path that must be followed!"; dealing with failure modes is strictly the concern of caps.
You could still do contrived things to emulate 'only this place if it exists', e.g. with [* of if it[3]::maybe? then [3] else []], but I will deliberately keep something like this non-first-class and high friction.
I explored the potential depth of the access chain system and found another dimension. This extended system would be a terrible fit for Lake. Perhaps a version of it fits in your language. You could make chains first-class objects. Just to think of a symbol, @
would be the 'unit' chain, of type Chain()
, @.foo
would be of type Chain(Record<foo>)
, @([1,2,3])
of type Chain[List<Number>)
, @::it + 1
of type Chain(Function<Number : Number>]
. Note the use of ()
and []
in the Chain types to denote the openness of either end. A fully closed Chain would be disallowed, because a 'closed chain' would be the value you get from applying the links and cap to the root. A 'weld' operator, e.g. <~>
combines two chains. You can store partial chains to be reused later. A chain containing only links (an 'all-open' chain) is just a path, and the caps describe any 'action'. This would make access chains more orthogonal than typical optics libraries, where path and action are often combined. E.g. a 'reverse lens' or 'constructor' would be emulated with just a path and the ::+ cap, with a special root, e.g. stem (as in stem cell), that can be autovivified into whatever base structure is needed.
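Continuing this speculative notation, welding might look something like this (name_path, shout and user are purely illustrative; note that the :: cap still means 'replace the leaf', so the fully welded chain evaluates to the updated root):
bind user : @.{name : "Ada"}
bind name_path : @.name ; type Chain(Record<name>)
bind shout : @::to_upper_case(it) ; type Chain(Function<String : String>]
bind shout_name : name_path <~> shout ; a stored partial chain, root still open
@(user) <~> shout_name ; fully closed, so simply a value: @.{name : "ADA"}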
There is a final access link, which ties in to a more advanced, but not encouraged, way to deal with mutable state (rather than with bindings). A 'box' is a referenced object, like a function, that always contains a single value. The notion of something like a box is explicitly taught in the assumed pre-Lake education.
A box's value can be retrieved and changed using the access chain system. The syntax for a new box (creating a box object and resulting in a reference value) is @box[value]. The access link enabling retrieval and updating is @content, leading to e.g. print(box@content) and box@content::it + 1. Note that copies are shallow; the whole point of using an indirection like a box is to not copy the value but instead keep the same reference. It is possible to create a giant mess with boxes in access chains, such as using a function in a multilink to change the content of the box that's accessed in the same chain as the multilink. Static checking can give some warnings for this, but ultimately a box is an advanced construct and a deviation from the immutable paradigm, for which the user consciously takes responsibility.
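A small sketch of intended, careful use (assuming zero-parameter function literals):
bind counter : @box[0]
bind increment : fn() counter@content::it + 1
increment()
increment()
print(counter@content) ; prints 2: both calls updated the same box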
Note that boxes can be modeled with closures, e.g.
type Box_command<T> : Either |get |put[Function<T : T>]
bind create_box : fn(value) {
    bind mutable content : value
    return fn(command) {
        match command {
            |get { return content }
            |put[f] {
                rebind content : f(it)
                return content
            }
        }
    }
}
bind box : create_box(123)
print(box(|get))
box(|put[fn(content) { return content + 1 }])
Note that Box types will only widen into Any, so e.g. Box<Tuple<Number, Number>>
won't widen into Box<List<Number>>
.
After simple imperative constructs like for-loops and rebinding, stepping stones like of-collect, and the more advanced access chains, let's now take a look at the final level of construct advancedness in Lake: first-class functions (and other reference semantics).
The most common function literal is fn(parameter1, parameter2) { ... return value }. A shorthand for simple functions such as fn(x, y) { return x + y } is fn(x, y) x + y. Functions are expressions and can be placed anywhere. They can be closures.
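For example, a closure built with the shorthand form (create_adder is illustrative):
bind create_adder : fn(n) fn(x) x + n
bind add_five : create_adder(5)
add_five(3) ; 8, the inner function closed over n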
Function types widen with typical contravariance for arguments and covariance for the return type. This concept is taught explicitly in the assumed pre-Lake education.
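A sketch of what that variance means in practice (apply_to_one is illustrative, and I assume to_string accepts Any):
bind apply_to_one : fn(f) f(1) ; suppose f is expected to be Function<Number : String>
apply_to_one(fn(x) to_string(x)) ; fine: the argument is Function<Any : String>;
; Any is wider than Number (contravariant parameter), String matches (covariant return)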
A very common pattern is that a function has a single parameter, and that single parameter is the first thing in an expression, e.g. map(users, fn(user) user.name). For this common pattern there is &, which starts a function and also acts as that function's only parameter as an expression, e.g. map(users, &.name). While designing Lake, I would find myself veering towards postfix syntax for better synergy with &.
Another very common pattern is that you want a function, and the only thing it does is call (and perhaps return the result of that call), e.g. map(users, fn(user) increment_age(user, 1)). For this Lake has function templating, or 'captures', as they are called in Gleam. This enables map(users, increment_age($, 1)).
Part of Lake's highlighting specification is that parentheses (and the like, including commas and colons) should be colored according to depth. Also part of this spec is that a $ must have the same color as the parentheses of the call it turns into a capture, and that these parentheses and the expression that supplies the function to be called are distinctly marked, e.g. italicized. This makes it immediately obvious that a call is not a call but a capture, so a function literal in disguise. In case of nested captures, the coloring makes it immediately obvious which $ belongs to which capture.
Note the 'i' for italicized and numbers for different depth colors:
csv_lines |> map($, zipmap(["date", "category", "amount"], $))
             iii111 iiiiii23      3           3         32 221
Multiple parameters can be given to a capture, and the order of the parameters can be controlled, by adding a 1, 2, ...
after the $
. E.g. foo(1, $, $)
is equivalent to fn(x) foo(1, x, x)
; foo(1, $2, $1)
is equivalent to fn(x1, x2) foo(1, x2, x1)
.
What drove me to clearly mark functions that are called as captures instead, is that I absolutely do not want variadic functions in Lake. Besides being more difficult to understand, variadic functions mess with expectations. "Wait, did this function take a list, or is it variadic?" Instead in Lake, where you might want a variadic function, you create two versions: the basic one, usually taking two arguments, and an _all version which instead takes a list.
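So a hypothetical standard library pair would look like (append/append_all are illustrative names):
append(list, element) ; the basic version: exactly one extra element
append_all(list, elements) ; the _all version: takes a list instead of being variadic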
But to make $ work nicely with _all functions, I had to allow $ to be nested arbitrarily deeply. This necessitated clear communication about what is a capture. So a $ will turn the first call it encounters going upwards into a capture. If despite the italicizing and coloring it is not obvious at a glance what call a $ turns into a capture, you can click on the $ to see for sure.
Despite Lake definitely not having variadics, I wanted to enable, for the highly advanced user, the concept of decorator functions. To facilitate this, I created a special function literal, and limited decoration so that the wrapping function can only call the decorated function with exactly the arguments the wrapping function itself was called with. This resulted in the interceptor function literal, and the forward expression:
bind create_report_called : fn(f, message) {
    return interceptor {
        print(message)
        return forward f
    }
}
bind report_called : create_report_called(&, "foo")
report_called(1) ; prints "foo", returns 1
report_called(2) ; prints "foo", returns 2
Although an interceptor can't change the arguments it forwards, it can observe them. You gain access to these arguments by giving the interceptor a parameter: interceptor(args) { print(args) return forward f }.
Note that decorators can be lowered in level by instead writing multiple plain versions of the interceptor, each with a signature that maps directly to the signature of the wrapped function. The less experienced programmer can see why this is useful, and when starting to play with interceptors can inspect the arguments inside the interceptor, and then inside the forwarded-to function, to see what's going on.
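A sketch of such a lowered version, with one hand-written wrapper per signature (names illustrative):
bind create_report_called_for_one : fn(f, message) fn(x) {
    print(message)
    return f(x)
}
bind create_report_called_for_two : fn(f, message) fn(x, y) {
    print(message)
    return f(x, y)
}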
Altogether, this allows for very high-level stuff:
bind juxt : fn(fs) interceptor map(fs, fn(f) forward f)
@[&, & + 1] |> juxt |> map([1, 2, 3], $) ; [@[1, 2], @[2, 3], @[3, 4]]
Although juxt is used in the example above as a (to some) familiar name, in Lake it would break the convention of all functions having a verb name. An idiomatic refactor would be
bind create_juxtapose : fn(fs) interceptor map(fs, fn(f) forward f)
@[&, & + 1] |> create_juxtapose |> map([1, 2, 3], $)
Notice the double verb: create_juxtapose creates a function which is (implicitly) called juxtapose. Giving functions verb names adheres to a more subtle principle in Lake's design, that of 'mechanical transparency': what a thing is mechanically is typically not hidden by its presentation. Although sometimes perhaps more verbose, it does keep things clearer. Other manifestations of this principle are rebind, upper, narrow, widen; it's clear how these program constructs tie in to the core mechanisms.
At this point I've shown enough elements to give comprehensive examples for the Multiset problem. To solve the Multiset problem, where we want to statically safeguard more semantics than is first-class expressible with types, you can define a custom type to be not just an alias for another type, but a narrower version of that other type. This is akin to first-class type branding, or nominal typing. E.g. type narrow Multiset<T> : Map<T, Number>
. Now Multiset<T>
is considered narrower than Map<T, Number>
, so a Multiset<T>
may be provided where a Map<T, Number>
is expected, but not the other way around. Specialized functions, e.g. multiset\add
can now take a Multiset, operate on it using all first-class tools in the language for maps, and return a result.
type narrow Multiset<T> : Map<T, Number>
bind multiset\add : fn(multiset, element) {
    return multiset@{element}::+(it catch! 0) + 1
}
bind multiset\create_from_list : fn(elements) accumulate from @{}
    using element in elements {
        continue multiset\add(it, element)
    }
However, the first-class tools for maps naturally all result in a Map; in this case, a Map<T, Number> is returned. But this is not the purpose of multiset\add: it should tell the static checker that what comes out of it is narrower, a Multiset<T>.
A direction you might want to go in is annotating the return type in the function signature. This is possible, but for the sake of the example let's take another direction.
Instead of 'casting', in Lake you explicitly narrow
or widen
. This is a static annotation you put on an expression. The static checker will check if it's indeed possible to widen or narrow the expression to the provided type. E.g. widening a tuple to a list is certainly correct (widen List<Number> : @[1, 2]
) and widening a list to a set is certainly incorrect (widen Set : [1, 2, 3]
); the correctness of widening is statically never uncertain. Narrowing, however, can be statically uncertain, e.g. narrow List<Number> : my_list_any
, in which case, as for all static uncertainties, a warning is given and a runtime check is performed.
In case of Multiset, the static checker doesn't know how to check if a value is a Multiset at runtime, so here we're forced to explicitly make the narrow
unchecked, with the unchecked
keyword:
type narrow Multiset<T> : Map<T, Number>
bind multiset\add : fn(multiset, element) {
    return narrow unchecked Multiset : multiset@{element}::+(it catch! 0) + 1
}
bind multiset\create_from_list : fn(elements) accumulate from narrow unchecked Multiset : @{}
    using element in elements {
        continue multiset\add(it, element)
    }
According to the static checker, multiset\add now returns a Multiset.
You can also provide a type checking function, which takes a single parameter of the type narrowed by the custom type definition, and must return a boolean. It will be used for type validation at runtime. Here the single parameter of the type check function is a Map<T, Number>.
type narrow Multiset<T> : Map<T, Number>
type check Multiset : fn(m) {
    for n in get_values(m) {
        unless n >= 1 and check_is_integer(n) { return false }
    }
    return true
}
bind multiset\add : fn(multiset, element) {
    return narrow Multiset : multiset@{element}::+(it catch! 0) + 1
}
bind multiset\create_from_list : fn(elements) accumulate from narrow Multiset : @{}
    using element in elements {
        continue multiset\add(it, element)
    }
Now narrow Multiset
is allowed without unchecked
. Note that even with a type check
present, you might still want to perform a narrow unchecked
to prevent expensive type check function calls.
Even if (for performance) all narrows in the functions for your custom narrower type are unchecked, you might still want a type check function, because it also enables the is operator for your custom narrower type. is takes the name of an immutable binding and a type, and returns a boolean. It can be used as a condition. With flow analysis, the static checker can then know that a binding holds your custom narrower type, e.g.
bind a_map : get_a_map()
bind mutable b_map : a_map
if a_map is Multiset<String> {
    rebind b_map : multiset\add(it, "foo")
}
is
can also be used with all first-class types.
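E.g. (read_anything is a hypothetical function returning Any):
bind value : read_anything()
if value is List<Number> {
    print(value[1]) ; flow analysis: value holds a List<Number> here
}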
You can also create a custom narrower type of another custom narrower type, e.g. type narrow MultisetAtLeastTwo<T> : Multiset<T>. Note that any widening, implicit or explicit, to your custom narrow type can't be statically checked, so it will always be done at runtime, which requires a type check function for the wider type. Or, you can explicitly widen unchecked.
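A sketch, mirroring the Multiset type check above:
type narrow MultisetAtLeastTwo<T> : Multiset<T>
type check MultisetAtLeastTwo : fn(m) {
    for n in get_values(m) {
        unless n >= 2 { return false }
    }
    return true
}
bind pairs : narrow MultisetAtLeastTwo : @{"a" : 2, "b" : 4} ; presumably both checks run here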
Regarding teaching custom narrow types: they can't be expressed using more basic constructs. Instead, the more basic version is the practice of programmer discipline. When teaching this advanced construct, it will be a matter of "you yourself no longer have to be careful everywhere", then showing how a familiar construct, the function, does the checking within the familiar widening system.
Note that there is a slight mixing of worlds with type check
. Type annotations "don't exist at runtime", yet type check
requires an expression to provide a function. This expression must be evaluated at runtime. It is evaluated exactly once, the first time the runtime would encounter it.
This leads to another construct I ultimately removed from Lake: the once expression. It takes an RHS, which is evaluated only the first time the once is evaluated. Subsequent evaluations of the once result in the cached result of the RHS. This can be used not only for performance optimizations, which are of lesser importance in Lake, but primarily to provide code locality.
If you want multiple iterations of an inner scope to share the same state, you'd have to create a stateful object like a function or a box. But this object shouldn't be constructed each time the inner scope is evaluated, so you'd have to create and bind the object in an upper scope, reducing locality. once would solve this. However, once creates a bespoke coupling between syntactical location and evaluation timing. It solves an actual issue so rarely that it doesn't warrant its paradigm-altering nature.
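For the record, the removed construct would have looked something like this sketch:
for i in [1, 2, 3] {
    bind counter : once @box[0] ; the RHS would be evaluated on the first iteration only
    counter@content::it + 1 ; all three iterations would share the same box
}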
This body of text so far covers all aspects of Lake that I find interesting to discuss. I've left the more typical aspects of Lake, such as pattern matching and destructuring, out of this document.
My current focus is getting the tooling to a point where the per-keystroke highlighting covers the arbitrary-depth highlighting I want, i.e. rainbow punctuation and italicized captures. I've written a decent batch parser for semantic checking, but I'm not satisfied with what the Lezer parser generator can do for the syntax highlighting that I want for Lake. I'm making progress on a hand-written incremental parser!