21 June 2011

Signatures - Reusable, Toolable, Testable Types

This tutorial is brought to you by ErlangCamp 2011 - Boston, August 12th and 13th - It’s gonna be totally sweet!
It often occurs in coding that we need a library, a set of functionality. Often there are several algorithms that could provide
this functionality. However, the code that uses it, either doesn’t care about the individual algorithm or wishes to delegate choosing that algorithm to some higher level. Lets take the concrete example of dictionaries. A dictionary provides the ability to access a value via a key (other things as well but primarily this). There are may ways to implement a dictionary. Just a few are:

Associative Arrays
Binary Trees
Hash Tables
Skip Lists
Many, many more ….
Each of these approaches has their own performance characteristics, memory footprints etc. For example, a table of size n with open addressing has no collisions and holds up to n elements, with a single comparison for successful lookup, and a table of size n with chaining and k keys has the minimum max(0, k-n) collisions and O(1 + k/n) comparisons for lookup. For skip lists the performance characteristics are about as good as that of randomly built binary search trees - namely (O log n). So the choice of which to select depends very much on memory available, insert/read characteristics, etc. So delegating the choice to a single point in your code is a very good idea. Unfortunately, in Erlang that’s not so easy to do at the moment.

Other languages, have built-in support for this functionality. Java has Interfaces, SML has Signatures. Erlang, though, doesn’t currently support this model, at least not directly. There are a few ways you can approximate it. One way is to pass the Module name to the calling functions along with the data that it is going to be called on.

Fortunately, Erlang is a pretty flexible language so we can use a similar approach with a few adjustments to give us the best of both worlds. Both the flexibility of ignoring a specific implementation and keeping all the nice locality we get by using an explicit module name.

So what we actually want to do is something mole like this:

Signatures

How do we actually do this in Erlang when Erlang is missing what Java, SML and friends has built-in?

The first thing we need to do is to define a Behaviour for our functionality. To continue our example we will define a Behaviour for dictionaries. That Behaviour looks like this:

-export([behaviour_info/1]).

behaviour_info(callbacks) ->
[{new, 0},
{has_key, 2},
{get, 2},
{add, 3},
{remove, 2},
{has_value, 2},
{size, 1},
{to_list, 1},
{from_list, 1},
{keys, 1}];
behaviour_info(_) ->
undefined.

The first we will look at is the one that updates the dictionary by adding a value.

The dictionary is deconstructed so we can get access to the data
and the callback module.
We modify the dictionary record with the new data and return that
modified record.
This is the same approach that you will use for any Signature that updates data. As a side note, notice that we are calling the concrete implementation to do the work itself.

Now let’s do a data retrieval function. In this case, the get function of the dictionary Signature.

That is really all you need to define a Signature. There is a complete implementation in erlware_commons/ec_dictionary.

Using Signatures

It’s a good idea to work through an example so we have a bit better idea of how to use these Signatures. If you are like me, you probably have some questions about what kind of performance burden this places on the code. At the very least we have an additional function call along with the record deconstruction. This must add some overhead. So lets write a little timing test, so we can get a good idea of how much this is all costing us.

In general, there are two kinds of concrete implementations for Signatures. The first is a native implementations, the second is a
wrapper.

Native Signature Implementations

A Native Signature Implementation is just that, a module that implements the Behaviour defined by a Signature directly. For most user defined Signatures this is going to be the norm. In our current example, the erlware_commons/ec_rbdict module is the best example of a Native Signature Implementation. It implements the ec_dictionary module directly.

Signature Wrappers

A Signature Wrapper is a module that wraps another module. Its purpose is to help a pre-existing module implement the Behaviour defined by a Signature. A good example of this in our current example is the erlware_commons/ec_dict module. It implements the ec_dictionary Behaviour, but all the functionality is provided by the stdlib/dict module itself. Lets take a look at one example to see how this is done.

We will take a look at one of the functions we have already seen. The get function an ec_dictionary get doesn’t have quite the same semantics as any of the functions in the dict module. So a bit of translation needs to be done. We do that in the ec_dict module get function.

Why do we bring this up here? Because we are going to be looking at timings, and Signature Wrappers add an extra level of indirection to the mix and that adds a bit of additional overhead.

Creating the Timing Module

We are going to create timings for both Native Signature Implementations and Signature Wrappers.

Lets get started by looking at some helper functions. We want dictionaries to have a bit of data in them. So to that end we will create a couple of functions that create dictionaries for each type we want to test. The first we want to time is the Signature Wrapper, so dict vs ec_dict called as a Signature.

We need to create a similar function for our Signature based dictionary ec_dict.

We are going to use two function calls in our timing. One that updates data and one that returns data, just to get good coverage. For our dictionaries we are going to use the size function as well as the add function.

`dict` vs `ec_dict` Results

So we have our tests, what was the result. Well on my laptop this is what it looked like.

Eshell V5.8.2 (abort with ^G)

1> ec_timing:time_direct_vs_signature_dict().
Timing dict
Range: 2 - 5621 mics
Median: 3 mics
Average: 3 mics
Timing ec_dict implementation of ec_dictionary
Testing ec_dict
Range: 3 - 6097 mics
Median: 3 mics
Average: 4 mics
2>

What about native Signatures though? Lets take a look at ec_rbdict. The ec_rbdict also implements the ec_dictionary Signature, but it is not a Signature Wrapper. It is a native implementation of the Signature. To use ec_rbdict directly we have to create a creation helper just like we did for dict.

The timing function itself looks very similar as well. Again notice that we have to hard code the concrete name for the concrete
implementation, but we don’t for the ec_dictionary test.

`ec_dict` vs `ec_rbdict` as an `ec_dictionary` Results

The main thing we are timing here is the additional cost of the dictionary Signature itself. Keep that in mind as we look at the
results.

Eshell V5.8.2 (abort with ^G)

1> ec_timing:time_direct_vs_signature_rbdict().
Timing rbdict
Range: 6 - 15070 mics
Median: 7 mics
Average: 7 mics
Timing ec_dict implementation of ec_dictionary
Testing ec_rbdict
Range: 6 - 6013 mics
Median: 7 mics
Average: 7 mics
2>

Conclusion

Signatures are a viable, useful approach to the problem of interfaces in Erlang. The have little or no over head depending on the type of implementation, and greatly increase the flexibility of the a library while retaining testability and locality.

Terminology

Behaviour: A normal Erlang Behaviour that defines a contract
Signature: A combination of an Behaviour and functionality to make the functions callable in a concrete way
Native Signature Implementation: A module that implements a signature directly
Signature Wrapper: A module that does translation between a preexisting module and a Signature, allowing the preexisting module to be used as a Signature Implementation.

Signatures - Reusable, Toolable, Testable Types

Signatures

Using Signatures

Native Signature Implementations

Signature Wrappers

Creating the Timing Module

`dict` vs `ec_dict` Results

`ec_dict` vs `ec_rbdict` as an `ec_dictionary` Results

Conclusion

Terminology

Code Referenced

Property based testing for unit testers with PropEr - Part 1

Erlang PubNub Client and Chat

Signatures

Using Signatures

Native Signature Implementations

Signature Wrappers

Creating the Timing Module

dict vs ec_dict Results

ec_dict vs ec_rbdict as an ec_dictionary Results

Conclusion

Terminology

Code Referenced

`dict` vs `ec_dict` Results

`ec_dict` vs `ec_rbdict` as an `ec_dictionary` Results