Universal Makefile for Erlang Projects That Use Rebar

This post sponsored by ErlangCamp 2013 in Nashville and Amsterdam

At this point in the game nearly every Erlang project uses Rebar. The problem with that is that Rebar’s approach to the command line and command dependency chaining is leaves a lot to be desired. You tend to end up typing the same command with the same options list over and over again during the course of your work. Because of the poor dependency chaining you often must retype the same sequence of commands as well. Finally, there are certain things (like Dialyzer support) that Rebar does not support.

In our Erlware projects, we want a consistent and recognizable entry point into the build process. For that reason we tend to treat Rebar as a low level tool and drive it and the other build tools I mentioned with a Makefile. That makes it far easier for us, as developers, to chain rules as needed and create additional rules that add features to the build system. This allows us to integrate other tools seamlessly into the build experience. At Erlware, we have developed a pretty standard Makefile that can be used with little or no changes from project to project. You can find the whole of that Makefile here. However, I will work my way through a few parts of it explaining so you understand what is going on and can make changes relevant to your project.

The main targets this Makefile supports are as follows:

  • deps: Pull the project dependencies (called automatically as needed)
  • update-reps: Update the dependencies (never called automatically)
  • compile: Compiles the project
  • doc: Builds the edoc documentation
  • test: Compiles the code and runs the tests (designed to be called by a human)
  • dialyzer: Build the dependency PLT and run dialyzer on the project
  • typer: Run Typer on the project
  • shell: Bring up an Erlang shell with all the dependencies already loaded and unit tests compiled and available.
  • pdf: Turn your README.md into a pdf using pandoc (pretty useful at times, but completely optional)
  • clean: Delete the build output files
  • distclean: Remove the build output files as well as the project PLT file and all the dependencies
  • rebuild: Do a dist clean, rebuild everything from scratch and run both the tests and dialyzer

Now that we have an idea of the targets available lets work through the major points of the Makefile.

Defining Variables

ERLFLAGS= -pa $(CURDIR)/.eunit -pa $(CURDIR)/ebin -pa $(CURDIR)/deps/*/ebin

DEPS_PLT=$(CURDIR)/.deps_plt
DEPS=erts kernel stdlib

At the top of the make file a few variables that are set. For the most part you don’t ever have to touch any of these with the exception of DEPS. The DEPS variable provides a list of dependent applications that are used by Dialyzer to build the dependency PLT file. The others are ERLFLAGS, which is used by the shell command to correctly make your code available in the shell, and DEPS_PLT, which points to the location where the project PLT file will be located.

PLT Files and Dialyzer

$(DEPS_PLT):
	@echo Building local plt at $(DEPS_PLT)
	@echo
	dialyzer --output_plt $(DEPS_PLT) --build_plt \
	   --apps $(DEPS) -r deps

dialyzer: $(DEPS_PLT)
	dialyzer --fullpath --plt $(DEPS_PLT) -Wrace_conditions -r ./ebin

This is how the Dialyzer command is run. The main things to notice here are that a PLT file specific to the project is built using the dependencies that you described at the top of the file in the DEPS variable. Building a per project PLT solves a raft of potential problems but has the downside that the first run of Dialyzer or the first run after a rebuild can take several minutes as it analyzes all of the dependencies to build the PLT file.

Rebuilding

Rebuilding is basically a completely clean rebuild and test of the system. You should run this code before you submit a PR or share code with your peers. It basically tries to ensure that you have not forgotten or left off anything that is needed.

Conclusion

You can, quite literally, drop this makefile into your project and use it today with only some very minor modification to the DEPS variable. If you are not already using something like this in your project I encourage you to add this Makefile now. It will save you a lot of tedious typing and make your build process much clearer to your users.

Alternatives

There are a few alternatives to this approach out there. These are quite good if somewhat more complex.

Getting Flymake and Rebar to Play Nice

TLDR;
Copy and paste the following into your elisp erlang-mode configuration to get flymake working with Rebar projects.

 (defun ebm-find-rebar-top-recr (dirname)
      (let* ((project-dir (locate-dominating-file dirname "rebar.config")))
        (if project-dir
            (let* ((parent-dir (file-name-directory (directory-file-name project-dir)))
                   (top-project-dir (if (and parent-dir (not (string= parent-dir "/")))
                                       (ebm-find-rebar-top-recr parent-dir)
                                      nil)))
              (if top-project-dir
                  top-project-dir
                project-dir))
              project-dir)))

    (defun ebm-find-rebar-top ()
      (interactive)
      (let* ((dirname (file-name-directory (buffer-file-name)))
             (project-dir (ebm-find-rebar-top-recr dirname)))
        (if project-dir
            project-dir
          (erlang-flymake-get-app-dir))))

     (defun ebm-directory-dirs (dir name)
        "Find all directories in DIR."
        (unless (file-directory-p dir)
          (error "Not a directory `%s'" dir))
        (let ((dir (directory-file-name dir))
              (dirs '())
              (files (directory-files dir nil nil t)))
            (dolist (file files)
              (unless (member file '("." ".."))
                (let ((absolute-path (expand-file-name (concat dir "/" file))))
                  (when (file-directory-p absolute-path)
                    (if (string= file name)
                        (setq dirs (append (cons absolute-path
                                                 (ebm-directory-dirs absolute-path name))
                                           dirs))
                        (setq dirs (append
                                    (ebm-directory-dirs absolute-path name)
                                    dirs)))))))
              dirs))

    (defun ebm-get-deps-code-path-dirs ()
        (ebm-directory-dirs (ebm-find-rebar-top) "ebin"))

    (defun ebm-get-deps-include-dirs ()
       (ebm-directory-dirs (ebm-find-rebar-top) "include"))

    (fset 'erlang-flymake-get-code-path-dirs 'ebm-get-deps-code-path-dirs)
    (fset 'erlang-flymake-get-include-dirs-function 'ebm-get-deps-include-dirs)

Intro

Its probably no great surprise to anyone that I dislike Rebar a lot. That said there are times when I have no choice but to use it. This is always either because a company I am contracting for uses it, or an open source project I am contributing to uses it. When I am forced to use it there are a few things I don’t want to give up. Most important among these is Flymake for Erlang. The default setup for Flymake doesn’t work for Rebar projects because Flymake does not know where the code and include paths for dependencies are. Fortunately, we can fix this with a few lines of elisp.

Flymake For Erlang

First make sure you have Flymake for Erlang installed. It is easiest just to follow the directions available on the Erlang Website.

The Elisp Additions for Erlang Flymake

There are two defvars that point to functions that are used to search for the correct code paths and include paths respectively. We are going to replace those functions with our own functions. Both these functions search upwards from the directory that contains the file pointed to by the current buffer, looking for the top most ‘rebar.config’ in the directory path. It then uses that for a base and searches down the directory structure looking for either ‘ebin’ files or ‘include’ files.

There are two things to note here. The first is that you must have already run `get-deps` for rebar for this to work and the second is that if your project is truly huge or you have way more dependencies then you probably need this search could take a second or two. That is a second or two too long in an interactive compiler like Flymake. That said, the likelihood that you will run into this second problem is quite low.

Getting Started

The very thing you want to do is ensure that you have required the erlang-flymake module. Most of what we do below depends on this.

(require 'erlang-flymake)

Finding the Top rebar.config

The second thing we want to do is look for the top rebar.config in the project. If a rebar project contains more then one OTP application its quite likely that it will contain more then one rebar.config. The very topmost `rebar`config` is the right one to serve as root of our search. So we introduce a set of recursive functions to look for that top level dir.

    (defun ebm-find-rebar-top-recr (dirname)
      (let* ((project-dir (locate-dominating-file dirname "rebar.config")))
        (if project-dir
            (let* ((parent-dir (file-name-directory (directory-file-name project-dir)))
                   (top-project-dir (if (and parent-dir (not (string= parent-dir "/")))
                                       (ebm-find-rebar-top-recr parent-dir)
                                      nil)))
              (if top-project-dir
                  top-project-dir
                project-dir))
              project-dir)))

ebm-find-rebar-top-recr will return either the top most directory or nil. Our next function takes that result and does something useful with.

    (defun ebm-find-rebar-top ()
      (interactive)
      (let* ((dirname (file-name-directory (buffer-file-name)))
             (project-dir (ebm-find-rebar-top-recr dirname)))
        (if project-dir
            project-dir
          (erlang-flymake-get-app-dir))))

In this function, we get the directory containing the file pointed at by the current buffer. We then call our recr function. If it returns a directory we return that, if it returns nil however, we call the original erlang-flymake-get-app-dir function.

At this point we should have our project root. Now its a simple matter of recursively searching down the directory tree looking for files of a certain name. So we create a function that does just that, given a directory and a name will return a list of absolute paths for each subdirectory that matches the specified name.

(defun ebm-directory-dirs (dir name)
        "Find all directories in DIR."
        (unless (file-directory-p dir)
          (error "Not a directory `%s'" dir))
        (let ((dir (directory-file-name dir))
              (dirs '())
              (files (directory-files dir nil nil t)))
            (dolist (file files)
              (unless (member file '("." ".."))
                (let ((absolute-path (expand-file-name (concat dir "/" file))))
                  (when (file-directory-p absolute-path)
                    (if (string= file name)
                        (setq dirs (append (cons absolute-path
                                                 (ebm-directory-dirs absolute-path name))
                                           dirs))
                        (setq dirs (append
                                    (ebm-directory-dirs absolute-path name)
                                    dirs)))))))
              dirs))

Now we write a couple of functions to replace the corresponding functions in `erlang-flymake`. The first looks for all `ebin` directories while the second looks for all `include` directories.

    (defun ebm-get-deps-code-path-dirs ()
        (ebm-directory-dirs (ebm-find-rebar-top) "ebin"))

    (defun ebm-get-deps-include-dirs ()
       (ebm-directory-dirs (ebm-find-rebar-top) "include"))

Finally we replace the `erlang-flymake` versions of those functions with our implementations.

(fset 'erlang-flymake-get-code-path-dirs 'ebm-get-deps-code-path-dirs)
(fset 'erlang-flymake-get-include-dirs-function 'ebm-get-deps-include-dirs)

Conclusion

This approach is a bit of a hack, we basically use some heuristics to find a root and then just grab everything under that that looks remotely like a code or include directory. While its a bit hacky it has the valuable upside that its flexible and robust.

Erlang Common Test Continuous Integration

Common Test is a well thought out integration testing framework for Erlang. If you
are not using it you probably should be. However, it has one fault. It
does not return non-negative exit status’ to the caller when the tests
fail. This is a major oversight, and it makes it difficult to use as
part of a continuous integration scheme or in a make based build
system.

The long term fix is for the OTP folks to resolve the issue in the
ct_run command. To that end I have filed a bug report with the
Erlang folks. In the short term, though, we need this behaving
correctly. After much twiddling around with different solutions and
conversions on the erlang-questions list. This solution finally popped
out of a conversation with Lukas Larsson. Basically, we use the old
unix standby of awk.


    ct_run -dir tests  ... | awk "/FAILED/{exit 1;}/failed/{exit 1;}/SKIPPED/{exit 1;}"

Where ... is replaced with your additional options. Its not the best
solution on the planet, but it is the simplest one that I found that
works consistently.

Announcing ErlangDC: An Epic One-Day Erlang Conference in the Washington, DC Area

We are happy to announce ErlangDC: An Epic One-Day Erlang Conference in the Washington, DC area.

New to Erlang? Learn the basics — and find out why Erlang should be in your programmer’s toolkit — during the morning bootcamp. Meet fellow DC-area Erlang enthusiasts at lunch. Learn advanced Erlang techniques in the afternoon tech talks. Swap Erlang war stories and make lifelong friends over pints at the post- conference Happy Hour.

It will be reliably awesome. Just like Erlang.

ErlangDC is organized by the local DC Erlang Meetup Group, with help from Erlang Solutions and Erlang Factory. The event will be hosted at the AOL Headquarters in Dulles, VA.

Sign up when you get a chance. The early-bird tickets have been sold out fast! Tickets are only $40.

Property based testing for unit testers with PropEr – Part 1

This tutorial is brought to you by ErlangCamp 2011 (click here) – Boston, August 12th and 13th – It’s gonna be totally sweet!

Main contributors: Torben Hoffmann, Raghav Karol, Eric Merritt

The purpose of the short document is to help people who are familiar
with unit testing understand how property based testing (PBT) differs,
but also where the thinking is the same.

This document focusses on the PBT tool
PropEr for Erlang since that is
what I am familiar with, but the general principles applies to all PBT
tools regardless of which language they are written in.

The approach taken here is that we hear from people who are used to
working with unit testing regarding how they think when designing
their tests and how a concrete test might look.

These descriptions are then “converted” into the way it works with
PBT, with a clear focus on what stays the same and what is different.

Testing philosophies

A quote from Martin Logan (@martinjlogan):

For me unit testing is about contracts. I think about the same things
I think about when I write statements like {ok, Resp} =
Mod:Func(Args). Unit testing and writing specs are very close for me.
Hypothetically speaking lets say a function should return return {ok,
string()} | {error, term()} for all given input parameters then my
unit tests should be able to show that for a representative set of
input parameters that those contracts are honored. The art comes in
thinking about what that set is.

The trap in writing all your own tests can often be that we think
about the set in terms of what we coded for and not what may indeed be
asked of our function. As the code is tried in further exploratory
testing and in production new input parameter sets for which the given
function does not meet the stated contract are discovered and added to
the test case once a fix has been put into place.

This is a very good description of what the ground rules for unit
testing are:

  • Checking that contracts are obeyed.
  • Creating a representative set of input parameters.

The former is very much part of PBT – each property you write will
check a contract, so that thinking is the same.

xUnit vs PBT

Unit testing has become popular for software testing with the advent
of xUnit tools like jUnit for Java. xUnit like tools typically
provide a testing framework with the following functionality

  • test fixture setup
  • test case execution
  • test fixture teardown
  • test suite management
  • test status reporting and management

While xUnit tools provide a lot of functionality to execute and manage
test cases and suites, reporting results there is no focus on test
case execution step, while this is the main focus area of
property-based testing (PBT).

Consider the following function specification

sort(list::integer()) ---> list::integer() | error

A verbal specification of this function is,

For all input lists of integers, the sort function returns a sorted
list of integers.

For any other kind of argument the function returns the atom error.

The specification above may be a requirement of how the function
should behave or even how the function does behave. This distinction
is important; the former is the requirement for the function, the
latter is the actual API. Both should be the same and that is what our
testing should confirm. Test cases for this function might look like

assertEqual(sort([5,4,3,2,1]), [1,2,3,4,5])
assertEqual(sort([1,2,3,4,5]), [1,2,3,4,5])
assertEqual(sort([]         ), []         )
assertEqual(sort([-1,0, 1]  ), [-1, 0, 1] )

How many tests cases should we write to be convinced that the actual
behaviour of the function is the same as its specification? Clearly,
it is impossible to write tests cases for all possible input values,
here all lists of integers, the art of testing is finding individual
input values that are representative of a large part of the input
space. We hope that the test cases are exhaustive to cover the
specification. xUnit tools offer no support for this and this is where
PBT and PBT Tools like PropEr and QuickCheck come in.

PBT introduces testing with a large set of random input values and
verifying that the specification holds for each input value
selected. Functions used to generate input values, generators, are
specified using rules and can be simply composed together to construct
complicated values. So, a property based test for the function above
may look like:

FOREACH({I, J, InputList},  {nat(), nat(), integer_list()},
    SUCHTHAT(I < J andalso J < length(InputList),
    SortedList = sort(InputList)
    length(SortedList) == length(InputList)
    andalso
    lists:get(SortedList, I) =< lists:get(SortedList, J))

The property above works as follows

  • Generate a random list of integers InputList and two natural numbers
    I, J, such that I < J < size of InputList
  • Check that size of sorted and input lists is the same.
  • Check that element with smaller index I is less than or equal to
    element with larger index J in SortedList.

Notice in the property above, we specify property. Verification of
the property based on random input values will be done by the property
based tool, therefore we can generated a large number of tests cases
with random input values and have a higher level of confidence that
the function when using unit tests alone.

But it does not stop at generation of input parameters. If you have
more complex tests where you have to generate a series of events and
keep track of some state then your PBT tool will generate random
sequences of events which corresponds to legal sequences of events and
test that your system behaves correctly for all sequences.

So when you have written a property with associated generators you
have in fact created something that can create numerous test cases -
you just have to tell your PBT tool how many test cases you want to
check the property on.

Shrinking the bar

At this point you might still have the feeling that introducing the
notion of some sort of generators to your unit testing tool of choice
would bring you on par with PBT tools, but wait there is more to
come.

When a PBT tool creates a test case that fails there is real chance
that it has created a long test case or some big input parameters -
trying to debug that is very much like receiving a humongous log from
a system in the field and try to figure out what cause the system to
fail.

Enter shrinking…

When a test case fails the PBT tool will try to shrink the failing
test case down to the essentials by stripping out input elements or
events that does not cause the failure. In most cases this results in
a very short counterexample that clearly states which events and
inputs are required to break a property.

As we go through some concrete examples later the effects of shrinking
will be shown.

Shrinking makes it a lot easier to debug problems and is as key to the
strength of PBT as the generators.

Converting a unit test

We will now take a look at one possible way of translating a unit
test into a PBT setting.

The example comes from Eric Merritt and is about the add/2 function in
the ec_dictionary instance ec_gb_trees.

The add function has the following spec:

-spec add(ec_dictionary:key(), ec_dictionary:value(), Object::dictionary()) ->
          dictionary().

and it is supposed to do the obvious: add the key and value pair to
the dictionary and return a new dictionary.

Eric states his basic expectations as follows:

  1. I can put arbitrary terms into the dictionary as keys
  2. I can put arbitrary terms into the dictionary as values
  3. When I put a value in the dictionary by a key, I can retrieve that same value
  4. When I put a different value in the dictionary by key it does not change other key value pairs.
  5. When I update a value the new value in available by the new key
  6. When a value does not exist a not found exception is created

The first two expectations regarding being able to use arbritrary
terms as keys and values is a job for generators.

The latter four are prime candidates for properties and we will create
one for each of them.

Generators

key() -> any().

value() -> any().

For PropEr this approach has the drawback that creation and shrinking
becomes rather time consuming, so it might be better to narrow to
something like this:

key() -> union([integer(),atom()]).

value() -> union([integer(),atom(),binary(),boolean(),string()]).

What is best depends on the situation and intended usage.

Now, being able to generate keys and values is not enough. You also
have to tell PropEr how to create a dictionary and in this case we
will use a symbolic generator (detail to be explained later).

sym_dict() ->
    ?SIZED(N,sym_dict(N)).

sym_dict(0) ->
    {'$call',ec_dictionary,new,[ec_gb_trees]};
sym_dict(N) ->
    ?LAZY(
       frequency([
                  {1, {'$call',ec_dictionary,remove,[key(),sym_dict(N-1)]}},
                  {2, {'$call',ec_dictionary,add,[value(),value(),sym_dict(N-1)]}}
                 ])).

sym_dict/0 uses the ?SIZED macro to control the size of the
generated dictionary. PropEr will start out with small numbers and
gradually raise it.

sym_dict/1 is building a dictionary by randomly adding key/value
pairs and removing keys. Eventually the base case is reached which
will create an empty dictionary.

The ?LAZY macro is used to defer the calculation of the
sym_dict(N-1) until they are needed and frequency/1 is used
to ensure that twice as many adds compared to removes are done. This
should give rather more interesting dictionaries in the long run, if
not one can alter the frequencies accondingly.

But does it really work?

That is a good question and one that should always be asked when
looking at genetors. Fortunately there is a way to see what a
generator produces provided that the generator functions are exported.

Hint: in most cases it will not hurt to throw in a
-compile(export_all). in the module used to specify the
properties. And here we actually have a sub-hint: specify the
properties in a separate file to avoid peeking inside the
implementation! Base the test on the published API as this is what the
users of the code will be restricted to.

When the test module has been loaded you can test the generators by
starting up an Erlang shell (this example uses the erlware_commons
code so get yourself a clone to play with):

$ erl -pz ebin -pz test
1> proper_gen:pick(ec_dictionary_proper:key()).
{ok,4}
2> proper_gen:pick(ec_dictionary_proper:key()).
{ok,35}
3> proper_gen:pick(ec_dictionary_proper:key()).
{ok,-5}
4> proper_gen:pick(ec_dictionary_proper:key()).
{ok,48}
5> proper_gen:pick(ec_dictionary_proper:key()).
{ok,'36\207_là ´?\nc'}
6> proper_gen:pick(ec_dictionary_proper:value()).
{ok,2}
7> proper_gen:pick(ec_dictionary_proper:value()).
{ok,-14}
8> proper_gen:pick(ec_dictionary_proper:value()).
{ok,-3}
9> proper_gen:pick(ec_dictionary_proper:value()).
{ok,27}
10> proper_gen:pick(ec_dictionary_proper:value()).
{ok,-8}
11> proper_gen:pick(ec_dictionary_proper:value()).
{ok,[472765,17121]}
12> proper_gen:pick(ec_dictionary_proper:value()).
{ok,true}
13> proper_gen:pick(ec_dictionary_proper:value()).
{ok,<<>>}
14> proper_gen:pick(ec_dictionary_proper:value()).
{ok,<<89,69,18,148,32,42,238,101>>}
15> proper_gen:pick(ec_dictionary_proper:sym_dict()).
{ok,{'$call',ec_dictionary,add,
        [[114776,1053475],
         'fª20\227\215',
         {'$call',ec_dictionary,add,
             ['',true,
              {'$call',ec_dictionary,add,
                  ['2^Ø¡',
                   [900408,886056],
                   {'$call',ec_dictionary,add,[[48618|...],<<...>>|...]}]}]}]}}
16> proper_gen:pick(ec_dictionary_proper:sym_dict()).
{ok,{'$call',ec_dictionary,add,
        [10,'a¯\21431fõC',
         {'$call',ec_dictionary,add,
             [false,-1,
              {'$call',ec_dictionary,remove,
                  ['d·ÉV÷[',
                   {'$call',ec_dictionary,remove,[12,{'$call',...}]}]}]}]}}

That does not look too bad, so we will continue with that for now.

Properties of add/2

The first expectation Eric had about how the dictionary works was that
if a key had been stored it could be retrieved.

One way of expressing this could be with this property:

prop_get_after_add_returns_correct_value() ->
    ?FORALL({Dict,K,V}, {sym_dict(),key(),value()},
         begin
             try ec_dictionary:get(K,ec_dictionary:add(K,V,Dict)) of
                    V ->
                        true;
                    _ ->
                        false
             catch
                   _:_ ->
                       false
             end
          end).

This property reads that for all dictionaries get/2 using a key
from a key/value pair just inserted using the add/3 function
will return that value. If that is not the case the property will
evaluate to false.

Running the property is done using proper:quickcheck/1:

proper:quickcheck(ec_dictionary_proper:prop_get_after_add_returns_correct_value()).
....................................................................................................
OK: Passed 100 test(s).
true

This was as expected, but at this point we will take a little detour
and introduce a mistake in the ec_gb_trees implementation and see
how that works.

Signatures – Reusable, Toolable, Testable Types

This tutorial is brought to you by ErlangCamp 2011 – Boston, August 12th and 13th – It’s gonna be totally sweet!

It often occurs in coding that we need a library, a set of functionality. Often there are several algorithms that could provide
this functionality. However, the code that uses it, either doesn’t care about the individual algorithm or wishes to delegate choosing that algorithm to some higher level. Lets take the concrete example of dictionaries. A dictionary provides the ability to access a value via a key (other things as well but primarily this). There are may ways to implement a dictionary. Just a few are:

Each of these approaches has their own performance characteristics,  memory footprints etc. For example, a table of size n with open addressing has no collisions and holds up to n elements, with a single comparison for successful lookup, and a table of size n with chaining and k keys has the minimum max(0, k-n) collisions and O(1 + k/n) comparisons for lookup.  For skip lists the performance characteristics are about as good as that of randomly built binary search trees – namely (O log n). So the choice of which to select depends very much on memory available, insert/read characteristics, etc. So delegating the choice to a single point in your code is a very good idea. Unfortunately, in Erlang that’s not so easy to do at the moment.

Other languages, have built-in support for this functionality. Java has InterfacesSML has Signatures. Erlang, though, doesn’t currently support this model, at least not directly. There are a few ways you can approximate it. One way is to pass the Module name to the calling functions along with the data that it is going to be called on.

add(ModuleToUse, Key, Value, DictData) ->
    ModuleToUse:add(Key, Value, DictData).

This works, and you can vary how you want to pass the data. For example, you could easily use a tuple to contain the data. That is, you could pass in {ModuleToUse, DictData} and that would make it a bit cleaner.

add(Key, Value, {ModuleToUse, DictData}) ->
    ModuleToUse:add(Key, Value, DictData).

Either way, there are a few problems with this approach. One of the biggest is that you lose code locality, by looking at this bit of code you don’t know what ModuleToUse is at all. You would need to follow the call chain up to figure out what it is. Also it may not be obvious what is actually happening. The fact that ModuleToUse is a variable name obscures the code making it harder to understand. The other big problem is that the tools provided with Erlang can’t help find mistakes that you might have made. Tools like Xref and Dialyzer have just as hard a time figuring out the what ModuleToUse is pointing to as you do. So they can’t give you warnings about potential problems. In fact someone could inadvertently pass an unexpected function name as ModuleToUse and you would never get any warnings, just an exception at run time.

Fortunately, Erlang is a pretty flexible language so we can use a similar approach with a few adjustments to give us the best of both worlds. Both the flexibility of ignoring a specific implementation and keeping all the nice locality we get by using an explicit module name.

So what we actually want to do is something mole like this:

add(Key, Value, DictData) ->
    dictionary:add(Key, Value, DictData).

Doing this we retain the locality. We can easily look up the dictionary Module. We immediately have a good idea what a dictionary actually is and we know what functions we are calling. Also, all the tools know what a dictionary is as well and how to check that your code is calling it correctly. For all of these reasons, this is a much better approach to the problem. This is what Signatures are all about.

Signatures

How do we actually do this in Erlang when Erlang is missing what Java, SML and friends has built-in?

The first thing we need to do is to define a Behaviour for our functionality. To continue our example we will define a Behaviour for dictionaries. That Behaviour looks like this:

-module(ec_dictionary).

-export([behaviour_info/1]).

behaviour_info(callbacks) ->
    [{new, 0},
     {has_key, 2},
     {get, 2},
     {add, 3},
     {remove, 2},
     {has_value, 2},
     {size, 1},
     {to_list, 1},
     {from_list, 1},
     {keys, 1}];
behaviour_info(_) ->
    undefined.

So we have our Behaviour now. Unfortunately, this doesn’t give us much yet. It will make sure that any dictionaries we write will have all the functions they need, but it wont help us actually use those dictionaries in an abstract way in our code. To do that we need to add a bit of functionality. We do that by actually implementing our own behaviour, starting with new/1.

%% @doc create a new dictionary object from the specified module. The
%% module should implement the dictionary behaviour.
%%
%% @param ModuleName The module name.
-spec new(module()) -> dictionary(_K, _V).
new(ModuleName) when is_atom(ModuleName) ->
    #dict_t{callback = ModuleName, data = ModuleName:new()}.

This code creates a new dictionary for us. Or to be more specific it actually creates a new dictionary Signature record, that will be used subsequently in other calls. This might look a bit familiar from our previous less optimal approach. We have both the module name and the data. here in the record. We call the module name named in ModuleName to create the initial data. We then construct the record and return that record to the caller and we have a new dictionary. What about the other functions; the ones that don’t create a dictionary but make use of it. Let’s take a look at the implementations of two kinds of functions, one that updates the dictionary and another that just retrieves data.

The first we will look at is the one that updates the dictionary by adding a value.

%% @doc add a new value to the existing dictionary. Return a new
%% dictionary containing the value.
%%
%% @param Dict the dictionary object to add too
%% @param Key the key to add
%% @param Value the value to add
-spec add(key(K), value(V), dictionary(K, V)) -> dictionary(K, V).
add(Key, Value, #dict_t{callback = Mod, data = Data} = Dict) ->
    Dict#dict_t{data = Mod:add(Key, Value, Data)}.

There are two key things here.

  1. The dictionary is deconstructed so we can get access to the data
    and the callback module.
  2. We modify the dictionary record with the new data and return that
    modified record.

This is the same approach that you will use for any Signature that updates data. As a side note, notice that we are calling the concrete implementation to do the work itself.

Now let’s do a data retrieval function. In this case, the get function of the dictionary Signature.

%% @doc given a key return that key from the dictionary. If the key is
%% not found throw a 'not_found' exception.
%%
%% @param Dict The dictionary object to return the value from
%% @param Key The key requested
%% @throws not_found when the key does not exist
-spec get(key(K), dictionary(K, V)) -> value(V).
get(Key, #dict_t{callback = Mod, data = Data}) ->
    Mod:get(Key, Data).

In this case, you can see a very similar approach to deconstructing the dict record. We still need to pull out the callback module and the data itself and call the concrete implementation of the algorithm. In this case, we return the data returned from the call, not the record itself.

That is really all you need to define a Signature. There is a complete implementation in erlware_commons/ec_dictionary.

Using Signatures

It’s a good idea to work through an example so we have a bit better idea of how to use these Signatures. If you are like me, you probably have some questions about what kind of performance burden this places on the code. At the very least we have an additional function call along with the record deconstruction. This must add some overhead. So lets write a little timing test, so we can get a good idea of how much this is all costing us.

In general, there are two kinds of concrete implementations for Signatures. The first is a native implementations, the second is a
wrapper.

Native Signature Implementations

A Native Signature Implementation is just that, a module that implements the Behaviour defined by a Signature directly. For most user defined Signatures this is going to be the norm. In our current example, the erlware_commons/ec_rbdict module is the best example of a Native Signature Implementation. It implements the ec_dictionary module directly.

Signature Wrappers

A Signature Wrapper is a module that wraps another module. Its purpose is to help a pre-existing module implement the Behaviour defined by a Signature. A good example of this in our current example is the erlware_commons/ec_dict module. It implements the ec_dictionary Behaviour, but all the functionality is provided by the stdlib/dict module itself. Lets take a look at one example to see how this is done.

We will take a look at one of the functions we have already seen. The get function an ec_dictionary get doesn’t have quite the same semantics as any of the functions in the dict module. So a bit of translation needs to be done. We do that in the ec_dict module get function.

-spec get(ec_dictionary:key(K), Object::dictionary(K, V)) ->
           ec_dictionary:value(V).
get(Key, Data) ->
    case dict:find(Key, Data) of
    {ok, Value} ->
        Value;
     error ->
        throw(not_found)
    end.

So the ec_dict module’s purpose for existence is to help the preexisting dict module implement the Behaviour defined by the Signature.

Why do we bring this up here? Because we are going to be looking at timings, and Signature Wrappers add an extra level of indirection to the mix and that adds a bit of additional overhead.

Creating the Timing Module

We are going to create timings for both Native Signature Implementations and Signature Wrappers.

Lets get started by looking at some helper functions. We want dictionaries to have a bit of data in them. So to that end we will create a couple of functions that create dictionaries for each type we want to test. The first we want to time is the Signature Wrapper, so dict vs ec_dict called as a Signature.

create_dict() ->
lists:foldl(fun(El, Dict) ->
        dict:store(El, El, Dict)
    end, dict:new(),
    lists:seq(1,100)).

The only thing we do here is create a sequence of numbers 1 to 100, and then add each of those to the dict as an entry. We aren’t too worried about replicating real data in the dictionary. We care about timing the function call overhead of Signatures, not the performance of the dictionaries themselves.

We need to create a similar function for our Signature based dictionary ec_dict.

create_dictionary(Type) ->
lists:foldl(fun(El, Dict) ->
        ec_dictionary:add(El, El, Dict)
    end,
    ec_dictionary:new(Type),
    lists:seq(1,100)).

Here we actually create everything using the Signature. So we don’t need one function for each type. We can have one function that can create anything that implements the Signature. That is the magic of Signatures. Otherwise, this does the exact same thing as the dict create_dict/1.

We are going to use two function calls in our timing. One that updates data and one that returns data, just to get good coverage. For our dictionaries we are going to use the size function as well as the add function.

time_direct_vs_signature_dict() ->
    io:format("Timing dict~n"),
    Dict = create_dict(),
    test_avg(fun() ->
             dict:size(dict:store(some_key, some_value, Dict))
         end,
         1000000),
    io:format("Timing ec_dict implementation of ec_dictionary~n"),
    time_dict_type(ec_dict).

The test_avg function runs the provided function the number of times specified in the second argument and collects timing information. We are going to run these one million times to get a good average (its fast so it doesn’t take long). You can see that in the anonymous function that we directly call dict:size/1 and dict:store/3 to perform the test. However, because we are in the wonderful world of Signatures we don’t have to hard code the calls for the Signature implementations. Lets take a look at the time_dict_type function.

time_dict_type(Type) ->
    io:format("Testing ~p~n", [Type]),
    Dict = create_dictionary(Type),
    test_avg(fun() ->
         ec_dictionary:size(ec_dictionary:add(some_key, some_value, Dict))
         end,
         1000000).

As you can see we take the type as an argument (we need it for dict creation) and call our create function. Then we run the same timings that we did for ec dict. In this case though, the type of dictionary is never specified, we only ever call ec_dictionary, so this test will work for anything that implements that Signature.

dict vs ec_dict Results

So we have our tests, what was the result. Well on my laptop this is what it looked like.

Erlang R14B01 (erts-5.8.2) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.2  (abort with ^G)

1> ec_timing:time_direct_vs_signature_dict().
Timing dict
Range: 2 - 5621 mics
Median: 3 mics
Average: 3 mics
Timing ec_dict implementation of ec_dictionary
Testing ec_dict
Range: 3 - 6097 mics
Median: 3 mics
Average: 4 mics
2>

So for the direct dict call, we average about 3 mics per call, while for the Signature Wrapper we average around 4. Thats a 25% cost for Signature Wrappers in this example, for a very small number of calls. Depending on what you are doing that is going to be greater or lesser. In any case, we can see that there is some cost associated with the Signature Wrapper Implementations.

What about native Signatures though? Lets take a look at ec_rbdict. The ec_rbdict also implements the ec_dictionary Signature, but it is not a Signature Wrapper. It is a native implementation of the Signature. To use ec_rbdict directly we have to create a creation helper just like we did for dict.

create_rbdict() ->
lists:foldl(fun(El, Dict) ->
        ec_rbdict:add(El, El, Dict)
    end, ec_rbdict:new(),
    lists:seq(1,100)).

This is exactly the same as create_dict with the exception that dict is replaced by ec_rbdict.

The timing function itself looks very similar as well. Again notice that we have to hard code the concrete name for the concrete
implementation, but we don’t for the ec_dictionary test.

time_direct_vs_signature_rbdict() ->
    io:format("Timing rbdict~n"),
    Dict = create_rbdict(),
    test_avg(fun() ->
         ec_rbdict:size(ec_rbdict:add(some_key, some_value, Dict))
         end,
         1000000),
   io:format("Timing ec_dict implementation of ec_dictionary~n"),
   time_dict_type(ec_rbdict).

And there we have our test. What do the results look like?

ec_dict vs ec_rbdict as an ec_dictionary Results

The main thing we are timing here is the additional cost of the dictionary Signature itself. Keep that in mind as we look at the
results.

Erlang R14B01 (erts-5.8.2) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.2  (abort with ^G)

1> ec_timing:time_direct_vs_signature_rbdict().
Timing rbdict
Range: 6 - 15070 mics
Median: 7 mics
Average: 7 mics
Timing ec_dict implementation of ec_dictionary
Testing ec_rbdict
Range: 6 - 6013 mics
Median: 7 mics
Average: 7 mics
2>

So no difference it time. Well the reality is that there is a difference in timing, there must be, but we don’t have enough resolution in the timing system to be able to figure out what that difference is. Essentially that means it’s really, really small – or small enough not to worry about at the very least.

Conclusion

Signatures are a viable, useful approach to the problem of interfaces in Erlang. The have little or no over head depending on the type of implementation, and greatly increase the flexibility of the a library while retaining testability and locality.

Terminology

Behaviour
: A normal Erlang Behaviour that defines a contract

Signature
: A combination of an Behaviour and functionality to make the functions callable in a concrete way

Native Signature Implementation
: A module that implements a signature directly

Signature Wrapper
: A module that does translation between a preexisting module and a Signature, allowing the preexisting module to be used as a Signature Implementation.

Code Referenced