Writing your own Elixir pipe operator
TL;DR: Elixir - a language that positively celebrates the use of macros
I’ve been picking up some Elixir so I can try my hand at a new project I want to build using the Erlang/OTP ecosystem.
Elixir uses lots and lots of compile-time macros. The language’s pervasive use of macros is unprecedented in my experience. Most (all?) of the core features of Elixir (its special forms) are (non-overridable) macros (e.g. the case statement).
Everybody likes and wants to use a language that has macros. But you are not supposed to use them, right? ☺ ☺ ☺
Possibly the single comment that has most endeared me to Elixir and its community is a passage from Chris McCord’s great book Metaprogramming Elixir:
Some of the greatest insights I’ve had with macros have been with irresponsible code that I would never ship to production. There’s no substitute for learning by experimentation.
Write irresponsible code, experiment, and have fun. Use the insight you gain to drive design decisions of things you would ship to a production system.
In the same vein as Chris’s just-give-it-a-go-and-learn-some-lessons meme, I recently watched Zach Tellman’s talk Some Things that Macros Do at last year’s Curry On! where he made the point that macro usage in Clojure has tended to be either at the low end, the standard fare of boilerplate removal, or up at the high end where the doyen of Clojure macros, core.async, towers. Zach reckoned the gap between the two extremes offered plenty of space for people to experiment with macros.
So, with that thought, this post is my first irresponsible experiment in designing, writing and using Elixir macros. Caveat Emptor.
Elixir’s built-in pipe |> operator?
Of course anybody who has written even the merest snippet of Elixir knows it already has a statement threading operator: the famous pipe operator |>
The docs say
it simply takes the output from the expression on its left side and passes it as the first argument to the function call on its right side.
The |> is really syntactic sugar that enables this code that creates the string HelloWorld
Enum.join(Stream.map(String.split("hello world"), &String.capitalize/1))
… to be written like this:
"hello world" |> String.split |> Stream.map(&String.capitalize/1) |> Enum.join
… or even like this where each |> breaks onto a new line.
"hello world"
|> String.split
|> Stream.map(&String.capitalize/1)
|> Enum.join
No difference at all to the final, generated, code (and result) but again a welcome boost to visual clarity.
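A quick way to convince yourself of the equivalence is to evaluate both forms and compare the results. This is only a small sketch; the variable names nested and piped are mine, not part of any API.

```elixir
# Nested form: has to be read inside-out.
nested = Enum.join(Stream.map(String.split("hello world"), &String.capitalize/1))

# Piped form: reads top-to-bottom, one step per line.
piped =
  "hello world"
  |> String.split()
  |> Stream.map(&String.capitalize/1)
  |> Enum.join()

IO.inspect(nested == piped)  # prints true
IO.puts(piped)               # prints HelloWorld
```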
Elixir Metaprogramming
To explain how the |> macro works needs a bit of an explanation of how Elixir does code generation. Many people have already covered the structure of an AST, notably Chris’s book but also many fine blogs out there (see for example Sasa Juric’s series on macros). So I’m not going to delve too deep.
To set the scene, when you (run-time) metaprogram in Ruby you build the string representation of the new code and call e.g. class_eval to compile it.
Clojure, being a Lisp, is homoiconic so you’d write a (compile-time) macro to construct s-expressions with the new code.
Elixir’s lingua franca of code generation is a quoted abstract syntax tree (AST). Elixir ASTs are homoiconic and fractal-like.
Elixir macros take an AST, transform it and return the transformed AST which is then compiled by the compiler.
What’s in an AST?
An Elixir AST is a tuple with three elements:
{function-to-call, function-metadata, function-arguments}
- The function-to-call is often an atom naming the function to call e.g. :hd (to take the head of a list), or it may be another AST, for example one representing a call to a function in another module.
- The function-metadata is “red tape” which I won’t dwell on here.
- The function-arguments is a list containing the arguments to the function. The arguments may be literals e.g. 42 or, again, ASTs.
ASTs are regular, recursive structures built from tuples, lists, atoms and “literals” such as strings, booleans (i.e. true or false) and numbers (1, 2, etc).
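The three-element shape can be checked directly by pattern matching on what quote returns. The sketch below uses hd and tl purely as convenient examples and ignores the metadata, whose exact contents vary between Elixir versions.

```elixir
# A call to a local function: the first element is the function name as an atom,
# the third is the list of arguments.
{name, _metadata, arguments} = quote(do: hd([1, 2, 3]))
IO.inspect(name)       # prints :hd
IO.inspect(arguments)  # prints [[1, 2, 3]]

# Literals such as numbers and strings are their own AST.
IO.inspect(quote(do: 42))       # prints 42
IO.inspect(quote(do: "hello"))  # prints "hello"

# Arguments can themselves be ASTs: here the argument to hd is the 3-tuple for tl(...).
{:hd, _, [{:tl, _, [[1, 2, 3]]}]} = quote(do: hd(tl([1, 2, 3])))
```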
Show me an AST!
The AST for any code can be created using the quote macro.
Try this in Elixir’s interactive repl iex to see the AST for the first step of the example above, splitting the “hello world” string
iex(1)> quote do: String.split("hello world")
and you’ll see a simple AST showing the Kernel special form . (dot) being used to call the split function in module String with the argument “hello world”:
{{:., [], [{:__aliases__, [alias: false], [:String]}, :split]}, [], ["hello world"]}
Now have a look at the AST generated when using the pipe operator, i.e.
iex(2)> quote do: "hello world" |> String.split
and you should see something like this:
{:|>,
 [context: Elixir, import: Kernel],
 ["hello world",
  {{:., [], [{:__aliases__, [alias: false], [:String]}, :split]}, [], []}]}
This AST is quite hard to parse by eye, but if you stare at it for a while, you should be able to discern it’s the |> operator being called with two arguments:
- The first argument is the “hello world” string; and
- The second argument is essentially the same AST as above when Kernel . (dot) was used to call split in String but without the “hello world” string as the first argument i.e.
{{:., [], [{:__aliases__, [alias: false], [:String]}, :split]}, [], []}
To labour this point: the AST for the |> has two arguments, one of which is another AST.
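The same point can be made by destructuring the quoted pipe rather than reading it by eye. This is a sketch; I have written String.split() with explicit parentheses so the printed form is the same across Elixir versions.

```elixir
# The pipe's AST is {:|>, metadata, [left, right]}.
{:|>, _metadata, [left, right]} = quote(do: "hello world" |> String.split())

IO.inspect(left)                 # prints "hello world"
IO.puts(Macro.to_string(right))  # prints String.split()
```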
The |> is just an Elixir macro
The |> operator is just a macro, and a very brief one at that, with only two lines using Macro.unpipe and Macro.pipe.
defmacro left |> right do
  [{h, _}|t] = Macro.unpipe({:|>, [], [left, right]})
  :lists.foldl fn {x, pos}, acc -> Macro.pipe(acc, x, pos) end, h, t
end
How does so little code do so much?
The Macro.unpipe call
Let’s generate the AST for the complete HelloWorld pipeline
# generate the ast for the complete HelloWorld pipeline
iex(n)> pipeline_ast = quote do: "hello world" |> String.split |> Stream.map(&String.capitalize/1) |> Enum.join
you’ll see
{:|>, [context: Elixir, import: Kernel],
[{:|>, [context: Elixir, import: Kernel],
[{:|>, [context: Elixir, import: Kernel],
["hello world",
{{:., [], [{:__aliases__, [alias: false], [:String]}, :split]}, [], []}]},
{{:., [], [{:__aliases__, [alias: false], [:Stream]}, :map]}, [],
[{:&, [],
[{:/, [context: Elixir, import: Kernel],
[{{:., [], [{:__aliases__, [alias: false], [:String]}, :capitalize]},
[], []}, 1]}]}]}]},
{{:., [], [{:__aliases__, [alias: false], [:Enum]}, :join]}, [], []}]}
Now run Macro.unpipe on the pipeline_ast
# run unpipe on the pipeline ast
pipeline_tuples = Macro.unpipe(pipeline_ast)
you’ll see a list - pipeline_tuples - with four items which I’ve laid out and annotated below to try to highlight the structure and contents.
[
# The "source" - a tuple with two elements, the first being the initial string
{"hello world", 0},
# This is the String.split step
# Note it is also a 2tuple, the first being the ast for the call to
# String.split *without* any args (i.e. the "hello world").
# BTW the second element of the 2tuple is 0
{{{:., [], [{:__aliases__, [alias: false], [:String]}, :split]}, [], []}, 0},
# This is the Stream.map step
# As before it's a 2tuple where the first element is the call to
# Stream.map with *only* one argument: the call to String.capitalize.
# So it is missing the first argument: the result of the String.split
{{{:., [], [{:__aliases__, [alias: false], [:Stream]}, :map]}, [],
[{:&, [],
[{:/, [context: Elixir, import: Kernel],
[{{:., [], [{:__aliases__, [alias: false], [:String]}, :capitalize]}, [],
[]}, 1]}]}]}, 0},
# The final step: the call to Enum.join
# As before, a 2tuple and missing its first (and only) argument: the result of the Stream.map
{{{:., [], [{:__aliases__, [alias: false], [:Enum]}, :join]}, [], []}, 0}
]
To sum up:
- the list has four items;
- all items are tuples of size 2 (2tuples);
- the first element of each 2tuple is an AST;
- for the 2nd and subsequent tuples, the AST is missing its first argument;
- the second element of each 2tuple is 0; this is the position at which the previous AST will be inserted into the arguments of the current AST i.e. its missing first argument.
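The summary above can be checked on a shorter pipeline by printing each step alongside its position. A minimal sketch; the three-step pipeline and the variable name steps are chosen only for illustration.

```elixir
pipeline_ast = quote(do: "hello world" |> String.split() |> Enum.join())

# Macro.unpipe flattens the nested |> calls into a list of {ast, position} tuples.
steps = Macro.unpipe(pipeline_ast)

for {ast, position} <- steps do
  IO.puts("#{Macro.to_string(ast)} at position #{position}")
end
```

Each printed step after the first is a call with its first argument still missing, and every position is 0.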
The Macro.pipe call
The Macro.unpipe call has done most of the heavy lifting.
If you take another peek at the unpipe statement in the |> macro you’ll see it destructures the unpipe result - pipeline_tuples in my example code above - to pick out the first 2tuple, and drop its second element i.e. 0 (the index):
iex(n)> [{first_ast, _index} | rest_tuples] = pipeline_tuples
The call to Macro.pipe has the relatively simpler job of creating an AST that looks like the code when |> has not been used i.e., from the above,
Enum.join(Stream.map(String.split("hello world"), &String.capitalize/1))
The |> macro drives Macro.pipe with a call to Erlang’s foldl, which may not be familiar to those who only know Elixir. So I’m going to rewrite this step using the more familiar Enum.reduce:
# thread the asts together, the first being the inner most
iex(n)> final_ast = rest_tuples |> Enum.reduce(first_ast,
fn {rest_ast, rest_index}, this_ast -> Macro.pipe(this_ast, rest_ast, rest_index) end)
Macro.pipe inserts this_ast (e.g. “hello world”) as the argument at position rest_index (e.g. the 0th) of rest_ast (e.g. the AST for the call to String.split).
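Taken on its own, a single Macro.pipe call is easy to try out. The sketch below first pipes into position 0, as |> does, and then into position 1 to show what the index controls; foo is a made-up function name that is only ever quoted, never called.

```elixir
# Insert "hello world" as the 0th argument of the (argument-less) call to String.split.
split_call = quote(do: String.split())
piped = Macro.pipe("hello world", split_call, 0)
IO.puts(Macro.to_string(piped))   # prints String.split("hello world")

# A position other than 0 inserts the expression further along the argument list.
middle = Macro.pipe(:x, quote(do: foo(1, 2)), 1)
IO.puts(Macro.to_string(middle))  # prints foo(1, :x, 2)
```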
To see what the code for final_ast looks like, Macro.to_string will “convert” the AST back to Elixir:
iex(n)> Macro.to_string(final_ast)
and you should see:
"Enum.join(Stream.map(String.split(\"hello world\"), &String.capitalize/1))"
Voilà!
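For anyone who wants to run the whole round trip outside iex, here is the same sequence as one self-contained script. As an extra step that is my addition rather than something the |> macro does, it evaluates the rebuilt AST with Code.eval_quoted to confirm it still produces the string.

```elixir
pipeline_ast =
  quote do
    "hello world" |> String.split() |> Stream.map(&String.capitalize/1) |> Enum.join()
  end

# Split the pipeline into its source and steps, dropping the source's index.
[{first_ast, _index} | rest_tuples] = Macro.unpipe(pipeline_ast)

# Thread the ASTs back together, innermost first.
final_ast =
  Enum.reduce(rest_tuples, first_ast, fn {rest_ast, rest_index}, this_ast ->
    Macro.pipe(this_ast, rest_ast, rest_index)
  end)

# Evaluate the rebuilt (pipe-free) AST.
{result, _binding} = Code.eval_quoted(final_ast)
IO.puts(result)  # prints HelloWorld
```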
Too many pipes!
Yeah well that’s all fine and awesome but, like, you know, too many pipes dude, too many pipes!
You may feel having to type the |> each time gets a bit old, after all you’ve already signalled your intention of creating a thread-first pipeline. What I’d really like to write is something like this:
|> "hello world"
String.split
Stream.map(&String.capitalize/1)
Enum.join
Of course this will give the parser kittens because each |> takes two arguments: one on the left and one on the right! Oh well. Lost cause? Maybe not?
How about a “Thread First” macro?
Consider the same code but written like this using a familiar Elixir do block passed to a macro called thread_first (like a def with no arguments and just a do block):
thread_first do
  "hello world"
  String.split
  Stream.map(&String.capitalize/1)
  Enum.join
end
What would a thread_first macro need to do to make this work?
You may have wondered why I took so much time explaining how |> works.
Well, it’s because its implementation holds the answer to my question - and it really is quite simple.
We’ve already seen how to thread ASTs together using Macro.pipe. How can we produce a list of ASTs from the do block to feed pipe?
As you may or may not know a do block is just the value of the :do key in the Keyword list passed as the last argument to the macro.
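To see what a macro actually receives, the sketch below defines a throwaway macro that hands its argument back as plain data. The module and macro names (DoBlockDemo, show_args) are hypothetical and exist only for this illustration.

```elixir
defmodule DoBlockDemo do
  # Hypothetical helper: return the macro's own argument (a keyword list of ASTs) as a value.
  defmacro show_args(args), do: Macro.escape(args)
end

defmodule DoBlockDemo.Caller do
  require DoBlockDemo

  # A multi-statement do block arrives wrapped in a :__block__ tuple.
  def multi do
    DoBlockDemo.show_args do
      1
      2
    end
  end

  # A single-statement do block arrives as the bare statement.
  def single do
    DoBlockDemo.show_args do
      1
    end
  end
end

IO.inspect(DoBlockDemo.Caller.multi())
IO.inspect(DoBlockDemo.Caller.single())  # prints [do: 1]
```

The first inspect shows a keyword list whose :do value is a :__block__ tuple with the arguments [1, 2]; those two shapes are the cases any macro taking a do block has to handle.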
Let’s have a stab at writing thread_first:
defmacro thread_first(args) do
  # get the statements' asts from the do block
  [first_ast | rest_asts] = args
  # fetch the :do key
  |> Keyword.fetch!(:do)
  |> case do
    # if the do block has multiple statements it will be a __block__
    # and the statements' asts will be its args
    {:__block__, _, args} -> args
    # if only one statement in the do block the value of the :do
    # key will be the statement's ast
    ast -> [ast]
  end
  # now use Macro.pipe to thread all the statements' asts together
  # and return the threaded ast
  rest_asts
  |> Enum.reduce(first_ast,
    fn rest_ast, this_ast ->
      # insert this_ast as the 0th argument of rest_ast
      Macro.pipe(this_ast, rest_ast, 0)
    end)
end
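A defmacro has to live inside a module, and the module has to be compiled before the macro is used. The sketch below shows one way to arrange that in a single script; the module names ThreadFirstSketch and ThreadFirstSketch.Demo are my own placeholders, and the macro body is a lightly restructured version of the one above rather than a different technique.

```elixir
defmodule ThreadFirstSketch do
  # Hypothetical wrapper module for the thread_first macro.
  defmacro thread_first(args) do
    # Normalise the do block into a list of statement ASTs.
    statements =
      case Keyword.fetch!(args, :do) do
        {:__block__, _metadata, asts} -> asts
        ast -> [ast]
      end

    [first_ast | rest_asts] = statements

    # Insert each accumulated AST as the 0th argument of the next statement.
    Enum.reduce(rest_asts, first_ast, fn rest_ast, this_ast ->
      Macro.pipe(this_ast, rest_ast, 0)
    end)
  end
end

defmodule ThreadFirstSketch.Demo do
  import ThreadFirstSketch

  def hello_world do
    thread_first do
      "hello world"
      String.split()
      Stream.map(&String.capitalize/1)
      Enum.join()
    end
  end
end

IO.puts(ThreadFirstSketch.Demo.hello_world())  # prints HelloWorld
```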
Proof of the Pudding!
Let’s do some quick tests. First the original example creating the HelloWorld string.
test "thread1: HelloWorld" do
  result = thread_first do
    "hello world"
    String.split
    Stream.map(&String.capitalize/1)
    Enum.join
  end
  assert result == "HelloWorld"
end
It doesn’t make any sense really to have a single line do block but thread_first handles it:
test "thread2: single-line do block" do
  result = thread_first do
    "hello world"
  end
  assert result == "hello world"
end
A daft example: a calculator
test "thread3: a calculator" do
  result = thread_first do
    42
    + 5 # 47
    - 17 # 30
    rem 26 # remainder is 4
  end
  assert 4 == result
end
Final Words
A bit of fun but it demonstrates how easy it is to roll-your-own code using Elixir’s macros.
My thread_first macro has only about 10 lines of actual code. That’s not only a tribute to the power of Elixir’s macros but also the standard library that includes utility modules such as Macro and Code that do so much of the heavy lifting for you.
Bit of a fess-up. Those of you who know any Clojure will have spotted fairly quickly that thread_first is equivalent to Clojure’s own thread-first macro -> which I’ve written about here.
Final Final Words
People are generally “warned off” writing and using macros. Don’t be. Use your own judgement when they are necessary and when not. And when they are, use them effectively just like any other feature of Elixir. And remember: Sometimes only a macro will do.
Code on Github
There is not much to the code but it’s on GitHub if anybody wants it:
cd /tmp
git clone git@github.com:ianrumford/elixir-thread-first.git
cd elixir-thread-first
mix test