
OpaDo Data Storage

OpaDo (a port of the TodoMVC app to Opa) now persists todo items to the Opa database. The new version is up on dotcloud, http://opado-tristan.sloughter.dotcloud.com/

I’ve added a todo_item type which stores the item’s value, plus two other attributes we won’t use until the next post, when users get accounts with their own todo_item stores.

type todo_item = { user_id : string
                 ; value : string
                 ; created_at : string
                 }

To tell Opa where to store the records we’ll create, we provide a path to the Opa db function and set its type. For our todo items we use a stringmap, since currently the ids are randomly generated strings (I know, I know, but it’s just an example!). We can then reference a record in the database with the path /todo_items[some_id_string].

db /todo_items : stringmap(todo_item)

Now we can insert a todo_item at this db path like so:

/todo_items[id] <- { value=x user_id="" created_at="" }

For now user_id and created_at are empty, but I’ll be updating that when I add user accounts.

Since we are storing each item, we need to populate the list on page load with what’s already stored:

add_todos() =
  items = /todo_items
  StringMap.iter((x, y -> add_todo_to_page(x, y.value)), items)

The first line of the function sets the variable items to all the todo_item records in the database. We then use StringMap.iter to take each todo_item and add it to the page. The first argument to the anonymous function is the id the item is stored under in the database (the id we will use in the HTML as well) and the second is the actual todo_item, so we take its value field and pass it to the add_todo_to_page function along with the id.

To have the add_todos function run when the list element is ready, we add an onready attribute that calls add_todos:

<ul id=#todo_list onready={_ -> add_todos() } ></ul>

Lastly, we want to be able to delete a todo_item from the database:

remove_item(id: string) =
  do Dom.remove(Dom.select_parent_one(#{id}))
  do Db.remove(@/todo_items[id])
  update_counts()

remove_all_done() =
  Dom.iter(x -> remove_item(Dom.get_id(x)), Dom.select_class("done"))

The main piece to notice here is @/todo_items[id] in Db.remove(). The @ is saying that we are passing the path itself to remove() and not the value at that path.

Nice and easy! No database to set up or deploy, just Opa. Next time we’ll add user accounts, so we don’t all have to share the same todo list.

TodoMVC in Opa

Edit: I just learned that dotcloud supports Opa! So I’ve pushed OpaDo and you can see a demo here http://opado-tristan.sloughter.dotcloud.com/

I wanted something quick and simple to do in Opa to give it a try so I decided to implement the TodoMVC example that has been redone in almost all Javascript frameworks, https://github.com/addyosmani/todomvc.

The code can be found on GitHub here: https://github.com/tsloughter/OpaDo

Opa is unique in that it is not only a new language but also a new web server and database. While Opa’s page pushes the idea that it’s for the cloud and its easy distribution, I found the nicest parts to be the static typing and the absence of any need for Javascript.

The functions below handle interactions with the Todo items. It somewhat reminds me of Lift but taken even farther.

/**
 * {1 User interface}
 */
update_counts() =
  num_done = Dom.length(Dom.select_class("done"))
  total = Dom.length(Dom.select_class("todo"))
  do Dom.set_text(#number_done, Int.to_string(num_done))
  Dom.set_text(#number_left, Int.to_string(total - num_done))

make_done(id: string) =
  do if Dom.is_checked(Dom.select_inside(#{id}, Dom.select_raw("input"))) then
       Dom.add_class(#{id}, "done")
     else
       Dom.remove_class(#{id}, "done")
  update_counts()

remove_item(id: string) =
  do Dom.remove(#{id})
  update_counts()

remove_all_done() =
  do Dom.remove(Dom.select_parent_one(Dom.select_class("done")))
  update_counts()

add_todo(x: string) =
  id = Random.string(8)
  li_id = Random.string(8)
  line = <li id={ li_id }>
           <div class="todo" id={ id }>
             <div class="display">
               <input class="check" type="checkbox" onclick={_ -> make_done(id) } />
               <div class="todo_content">{ x }</div>
               <span class="todo_destroy" onclick={_ -> remove_item(li_id) }></span>
             </div>
             <div class="edit">
               <input class="todo-input" type="text" value="" />
             </div>
           </div>
         </li>
  do Dom.transform([#todo_list +<- line ])
  do Dom.scroll_to_bottom(#todo_list)
  do Dom.set_value(#new_todo, "")
  update_counts()

It is unique in combining the HTML into the language itself. Some have argued against this, but when it works well it makes perfect sense. I don’t want to have to convert a designer’s HTML into some other representation! And being able to have type-checked dynamic functionality within the HTML is a boon. Even with just this simple program, I found the usefulness of the type checker outstanding.

Next we have the main outline of the page and the entry part for the program.

start() =
  <div id="todoapp">
    <div class="title"> <h1>Todos</h1> </div>
    <div class="content">
      <div id=#create_todo>
        <input id=#new_todo placeholder="What needs to be done?" type="text" onnewline={_ -> add_todo(Dom.get_value(#new_todo)) } />
      </div>
      <div id=#todos> <ul id=#todo_list></ul> </div>

      <div id="todo_stats">
        <span class="todo_count">
          <span id=#number_left class="number">0</span> <span class="word">items</span> left.
        </span>
        <span class="todo_clear">
          <a href="#" onclick={_ -> remove_all_done() }>
            Clear <span id=#number_done class="number-done">0</span> completed <span class="word-done">items</span>
          </a>
        </span>
      </div>
    </div>
  </div>

/**
 * {1 Application}
 */

/**
 * Main entry point.
 */
server = Server.one_page_bundle("Todo",
       [@static_resource_directory("resources")],
       ["resources/todos.css"], start)

It won’t be able to replace my use of Erlang for the backend and Coffeescript for the frontend, but it looks very promising.

I’ll be extending this example to include persistence, sessions and users and will add posts as I complete those.

Centos6 : Chef Node Creation

I thought I’d share the scripts I use to take a fresh Centos6 install and have it configured to work with a Chef server. Maybe it’s not as easy as when running in a virtualized environment, but it saves plenty of time.

On the new node I run the setup_client.sh script, which at the end calls client_gen.sh on the Chef server, once everything is installed on the node. I left the version numbers for RubyGems and Chef in the script so you know what versions I’ve tested this with.

setup_client.sh

#!/bin/bash
CHEF_IP=XXX.XXX.XXX.XXX
CHEF=http://$CHEF_IP:4000
CHEF_USER=XXXXX
NODE=XXXXXXXX
RUBY_VSN=1.3.7
CHEF_VSN=0.9.16

sudo rpm -Uvh http://download.fedora.redhat.com/pub/epel/6/x86_64/epel-release-6-5.noarch.rpm

sudo yum update

sudo yum install ruby ruby-shadow ruby-ri ruby-rdoc gcc gcc-c++ ruby-devel ruby-static

cd /tmp
wget http://production.cf.rubygems.org/rubygems/rubygems-$RUBY_VSN.tgz
tar zxf rubygems-$RUBY_VSN.tgz
cd rubygems-$RUBY_VSN
sudo ruby setup.rb --no-format-executable

sudo gem install chef -v $CHEF_VSN 

mkdir ~/.chef

cat > ~/.chef/knife.rb <<EOF
log_level               :info
log_location            STDOUT
node_name               '$NODE'
client_key              '/home/$USER/.chef/$NODE.pem'
validation_client_name  'chef-validator'
validation_key          '/etc/chef/validation.pem'
chef_server_url         '$CHEF'
cache_type              'BasicFile'
cache_options( :path => '/home/$USER/.chef/checksums' )
EOF

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub $CHEF_IP

ssh $CHEF_IP "yes | knife client delete $NODE"
ssh $CHEF_IP "yes | /home/$CHEF_USER/client_gen.sh $NODE"
scp $CHEF_IP:/tmp/$NODE ~/.chef/$NODE.pem

client_gen.sh

#!/bin/bash
knife client create $1 -n -a -f /tmp/$1
knife node create $1 --no-editor

Mixed Erlang and Scala with Scalang

This is a summary of a talk by Cliff Moon (@moonpolysoft) given at Strangeloop about building mixed Erlang and Scala systems with Scalang. Boundary does network analytics as a service. Their architecture uses a mixture of Erlang and Scala. Erlang is very, very good at things like no-downtime deploys: the public-facing parts of the system can have very low downtime and don’t even have to go down for deploys. On the data processing side, one of the things Erlang is very bad at is dealing with numbers, and generally anything where mutability has a high value.

Making the Scala side talk to the Erlang side was required by the language choices for the system. It turns out that Erlang ships with Jinterface, which is just the thing – or so it seemed. Unfortunately it ended up being really, really cumbersome. Jinterface is at the wrong level of abstraction: Erlang is all about actors, and Jinterface only exposes mailboxes. All the rich interface you get in Erlang with actors goes away when you are stuck with only mailboxes. The other problem is that it is not performant. Primitives end up getting wrapped twice, first by Jinterface and then by a case class in Scala, which is just too heavyweight when trying to process millions of pieces of data.

They decided to take a step back and build something that would be easier to use. They were looking for more correctness in behavior – things that behave like Erlang actors. They wanted performance, and then simplicity: not having to deal with custom serializers and other such cruft. The internal architecture is built on NIO sockets and Netty. There are also a bunch of codecs to do encoding and decoding between Erlang and Scala, and a delivery system which deals with registration and actors, which run in Jetlang – an actor framework for the JVM.

The main interface into the system on the JVM side is something called a Node – this should be very familiar to Erlangers. It takes a node name and a magic cookie. So, pretty much exactly what you would expect.

Once you have a node you want to make a process. Processes are spawned and messages are sent with the ! operator, just like in Erlang. You can send messages to a Pid, you can send messages to a local registered name, and you can send messages to a remote registered name by supplying a {name, node} tuple. So basically just like Erlang.

Cliff Moon talking about processes in scalang
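From the Erlang side nothing special is needed to talk to a Scalang node, since it shows up as an ordinary distributed Erlang node. A minimal sketch, where the node name 'scala@host' and the registered name echo_server are made up for illustration:

%% Erlang-side sketch: message a (hypothetical) Scalang node
%% 'scala@host' that has a process registered as echo_server.
ping_scalang() ->
    {echo_server, 'scala@host'} ! {self(), <<"hello">>},
    receive
        Reply -> Reply
    after 5000 ->
        timeout
    end.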

Error Handling in Scalang

Scalang fires a link-breakage exit signal any time a Scalang process throws an uncaught exception. This works between Erlang and the JVM. The one problem is that this is not preemptive on the JVM side, as lightweight preemptive actors on the JVM seem hard to do.

Erlang to Scala Type Mappings

Most things are a one-to-one mapping for primitives. Anything that does not fit, like numbers, will be turned into something reasonable on the Scala side. If that is not quite good enough for you and you want rich type mappings, you can use a rich type mapping plugin to turn rich types into records and vice-versa.

Scalang Services

One of the big things about Erlang is OTP. You typically use gen_servers, behaviors. These behaviors give you messaging primitives for sync and async and lots of other good stuff. Scalang wants to be able to interact with gen_servers transparently on the other side. So three functions are implemented for Scalang processes:

handleCall
handleCast
handleInfo

These will look very familiar for most Erlangers (if you are coding in OTP like you should be). Scalang also supports anonymous processes. You can just spawn processes with funs for those times when you don’t want a gen_server.
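And because Scalang services speak the gen_server protocol, the Erlang side should be able to treat one as a remote gen_server. A sketch, again with a made-up node and registered name:

%% Erlang-side sketch: call a (hypothetical) Scalang service
%% registered as stats_service on node 'scala@host'; the request
%% would be handled by the service's handleCall on the JVM side.
get_stats() ->
    gen_server:call({stats_service, 'scala@host'}, get_stats).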

Runtime Metrics

This is what Boundary does, so they wanted to bake metrics into all of their JVM stuff. Scalang has a full suite of runtime metrics: meters showing how many messages have come across the wire for each process, histograms for process performance, time spent in serialization, message queue sizes, and quite a number of other metrics. The idea was to make it similar to pulling up a remote shell into an Erlang instance and being able to query to see where the bottlenecks are.

Scalang JVM Performance Tuning

We are all about running fast here. Scalang aims to make things easily tunable. It turns out one of the best ways to performance tune is to screw around with the thread pools. The ThreadPoolFactory lets you swap in different implementations. There are four kinds:

Boss Pool – initial connection and accept handling
Worker Pool – non blocking reads and writes
Actor Pool – process callbacks
Batch executor – per process execution logic

Editorial: The system really looks to be quite powerful. It allows for the features I describe above as well as easy remote shell invocation of JVM nodes. It actually interacts nicely with EPMD for native-feeling messaging. The system seems to be abstracted more appropriately than any Erlang-to-X interop library I have run across. I look forward to hearing about experiences using it.

Martin Logan (@martinjlogan). Also, if you are into distributed systems and metrics you should check out Camp DevOps Conf in Chicago this October.

Here is where you can find the code and example usage information: https://github.com/boundary/scalang

Batman.js vs Knockout.js

The following is NOT a tutorial for either Batman.js or Knockout.js. It is instead a side-by-side comparison of the two for creating a user creation form that POSTs the new user’s data as JSON to the backend.

The method of web development I’ve come to find best is based on a heavy Javascript frontend (though written in Coffeescript) communicating with a backend via a RESTful interface. This is appealing because you are not cluttering the application logic with view-related code. It allows me to use Erlang, my language of choice, which is great at implementing RESTful interfaces with Webmachine, but not so great for building a site the way you would with Rails.

I recently began using Knockout.js and found it to be a great fit for my development paradigm. When Batman.js came out, I saw that it could provide me with what Knockout.js does but also already take care of pieces I would develop myself on the frontend, like RESTful persistence.

First, we have the Knockout.js logic, written in Coffeescript, for setting up a User class and object that observes the input fields.

class @User
  constructor : ->
    @firstname = ko.observable ""
    @lastname = ko.observable ""
    @email = ko.observable ""
    @username = ko.observable ""
    @password = ko.observable ""

  save : ->
    if $("form").validate().form()
      $.post('/user', ko.toJSON(this), (data) ->
        window.location = "/login.html"
        false
      ).error(->
        alert("error")
        false)

user = new User
ko.applyBindings user

Now for the HTML for displaying the fields, configuring the submit handler, binding the inputs to Knockout.js observables and setting validators.

<form data-bind="submit: save">
    <div class="new_user_box">
    <label>First Name: </label>
    <input type=text name=firstname data-bind="value: firstname" minlength=2 maxlength=25 class="required" />

    <label>Last Name: </label>
    <input type=text name=lastname data-bind="value: lastname"  minlength=2 maxlength=25 class="required" />

    <label>Username: </label>
    <input type=text name=username data-bind="value: username" remote="/user/check"  minlength=6 maxlength=25 class="required" />

    <label>Password: </label>
    <input type=password id="password1" data-bind="value: password, uniqueName: true" minlength=8 class="required password" />

    <label>Retype Password: </label>
    <input type=password data-bind="uniqueName: true" id="password2" equalto="#password1" class="required" />

    <label>Email Address: </label>
    <input type=email name=email data-bind="value: email"  remote="/user/email_check"  class="required email" />

    <input type=submit value="Save" id="saveSubmit" />
    </div>
</form>

With Batman.js our validation is configured in the user model. Here we only need to set @persist to Batman.RestStorage, and when a model is saved with the save method it will POST the encoded fields as JSON to /users.

class CT extends Batman.App
    @global yes
    @root 'users#index'

class CT.User extends Batman.Model
    @global yes
    @persist Batman.RestStorage
    @encode 'firstname', 'lastname', 'username', 'email', 'password'
    @validate 'firstname', presence: yes, maxLength: 255
    @validate 'lastname', presence: yes, maxLength: 255
    @validate 'username', presence: yes, lengthWithin: [6,255]
    @validate 'email', presence: yes
    @validate 'password', 'passwordConfirmation', presence: yes, lengthWithin: [6,255]

class CT.UsersController extends Batman.Controller
    user: null

    index: ->
        @set 'user', new User
        return false

    create: =>
        @user.save()
        return false

CT.run()

We see below that the Batman.js HTML is a bit cleaner than that in the Knockout.js example above.

        <form data-formfor-user="controllers.users.user" data-event-submit="controllers.users.create">
          <div class="new_user_box">
          <label>First Name: </label>
          <input type=text name=firstname data-bind="user.firstname" />

          <label>Last Name: </label>
          <input type=text name=lastname data-bind="user.lastname" />

          <label>Username: </label>
          <input type=text name=username data-bind="user.username" remote="/user/check" />

          <label>Password: </label>
          <input type=password data-bind="user.password" />

          <label>Retype Password: </label>
          <input type=password data-bind="user.passwordConfirmation" />

          <label>Email Address: </label>
          <input type=email data-bind="user.email" remote="/user/email_check" />

          <input type=submit value="Save" id="saveSubmit" />
          </div>

        </form>

Property based testing for unit testers with PropEr – Part 1

This tutorial is brought to you by ErlangCamp 2011 (click here) – Boston, August 12th and 13th – It’s gonna be totally sweet!

Main contributors: Torben Hoffmann, Raghav Karol, Eric Merritt

The purpose of this short document is to help people who are familiar
with unit testing understand how property based testing (PBT) differs,
but also where the thinking is the same.

This document focuses on the PBT tool
PropEr for Erlang, since that is
what I am familiar with, but the general principles apply to all PBT
tools regardless of which language they are written in.

The approach taken here is that we hear from people who are used to
working with unit testing regarding how they think when designing
their tests and how a concrete test might look.

These descriptions are then “converted” into the way it works with
PBT, with a clear focus on what stays the same and what is different.

Testing philosophies

A quote from Martin Logan (@martinjlogan):

For me unit testing is about contracts. I think about the same things
I think about when I write statements like {ok, Resp} =
Mod:Func(Args). Unit testing and writing specs are very close for me.
Hypothetically speaking, let’s say a function should return {ok,
string()} | {error, term()} for all given input parameters; then my
unit tests should be able to show that for a representative set of
input parameters those contracts are honored. The art comes in
thinking about what that set is.

The trap in writing all your own tests can often be that we think
about the set in terms of what we coded for and not what may indeed be
asked of our function. As the code is tried in further exploratory
testing and in production new input parameter sets for which the given
function does not meet the stated contract are discovered and added to
the test case once a fix has been put into place.

This is a very good description of what the ground rules for unit
testing are:

  • Checking that contracts are obeyed.
  • Creating a representative set of input parameters.

The former is very much part of PBT – each property you write will
check a contract, so that thinking is the same.

xUnit vs PBT

Unit testing has become popular for software testing with the advent
of xUnit tools like jUnit for Java. xUnit-like tools typically
provide a testing framework with the following functionality:

  • test fixture setup
  • test case execution
  • test fixture teardown
  • test suite management
  • test status reporting and management

While xUnit tools provide a lot of functionality to execute and manage
test cases and suites and to report results, they offer no support for
creating the test cases themselves, and that is the main focus area of
property-based testing (PBT).

Consider the following function specification

sort(list::integer()) ---> list::integer() | error

A verbal specification of this function is,

For all input lists of integers, the sort function returns a sorted
list of integers.

For any other kind of argument the function returns the atom error.

The specification above may be a requirement of how the function
should behave or even how the function does behave. This distinction
is important; the former is the requirement for the function, the
latter is the actual API. Both should be the same and that is what our
testing should confirm. Test cases for this function might look like

assertEqual(sort([5,4,3,2,1]), [1,2,3,4,5])
assertEqual(sort([1,2,3,4,5]), [1,2,3,4,5])
assertEqual(sort([]         ), []         )
assertEqual(sort([-1,0, 1]  ), [-1, 0, 1] )

How many test cases should we write to be convinced that the actual
behaviour of the function is the same as its specification? Clearly,
it is impossible to write test cases for all possible input values,
here all lists of integers. The art of testing is finding individual
input values that are representative of a large part of the input
space, in the hope that the chosen test cases cover the
specification. xUnit tools offer no support for this, and this is where
PBT and PBT tools like PropEr and QuickCheck come in.

PBT introduces testing with a large set of random input values and
verifying that the specification holds for each input value
selected. Functions used to generate input values, generators, are
specified using rules and can be simply composed together to construct
complicated values. So, a property based test for the function above
may look like:

prop_sorted() ->
    ?FORALL({I, J, InputList},
            ?SUCHTHAT({I1, J1, L}, {pos_integer(), pos_integer(), list(integer())},
                      I1 < J1 andalso J1 < length(L)),
            begin
                SortedList = sort(InputList),
                length(SortedList) == length(InputList)
                    andalso lists:nth(I, SortedList) =< lists:nth(J, SortedList)
            end).

The property above works as follows

  • Generate a random list of integers InputList and two natural numbers
    I, J, such that I < J < size of InputList
  • Check that size of sorted and input lists is the same.
  • Check that element with smaller index I is less than or equal to
    element with larger index J in SortedList.

Notice that in the property above we only specify the property
itself. Verification of the property based on random input values is
done by the PBT tool, so we can generate a large number of test cases
with random input values and have a higher level of confidence in
the function than when using unit tests alone.

But it does not stop at generation of input parameters. If you have
more complex tests where you have to generate a series of events and
keep track of some state, then your PBT tool will generate random
sequences of events which correspond to legal sequences of events, and
test that your system behaves correctly for all such sequences.

So when you have written a property with associated generators you
have in fact created something that can create numerous test cases -
you just have to tell your PBT tool how many test cases you want to
check the property on.
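With PropEr that is just an argument to quickcheck; for example, running the sort property from above 1000 times instead of the default 100:

%% Check the property 1000 times rather than the default 100.
proper:quickcheck(prop_sorted(), 1000).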

Shrinking the bar

At this point you might still have the feeling that introducing the
notion of generators to your unit testing tool of choice would bring
you on par with PBT tools, but wait, there is more to come.

When a PBT tool creates a test case that fails, there is a real chance
that it has created a long test case or some big input parameters.
Trying to debug that is very much like receiving a humongous log from
a system in the field and trying to figure out what caused the system
to fail.

Enter shrinking…

When a test case fails, the PBT tool will try to shrink the failing
test case down to the essentials by stripping out input elements or
events that do not cause the failure. In most cases this results in
a very short counterexample that clearly states which events and
inputs are required to break a property.

As we go through some concrete examples later the effects of shrinking
will be shown.

Shrinking makes it a lot easier to debug problems and is as key to the
strength of PBT as the generators.

Converting a unit test

We will now take a look at one possible way of translating a unit
test into a PBT setting.

The example comes from Eric Merritt and is about the add/3 function in
the ec_dictionary instance ec_gb_trees.

The add function has the following spec:

-spec add(ec_dictionary:key(), ec_dictionary:value(), Object::dictionary()) ->
          dictionary().

and it is supposed to do the obvious: add the key and value pair to
the dictionary and return a new dictionary.

Eric states his basic expectations as follows:

  1. I can put arbitrary terms into the dictionary as keys
  2. I can put arbitrary terms into the dictionary as values
  3. When I put a value in the dictionary by a key, I can retrieve that same value
  4. When I put a different value in the dictionary by key it does not change other key value pairs.
  5. When I update a value the new value is available by that key
  6. When a value does not exist a not found exception is created

The first two expectations, regarding being able to use arbitrary
terms as keys and values, are a job for generators.

The latter four are prime candidates for properties and we will create
one for each of them.

Generators

key() -> any().

value() -> any().

For PropEr this approach has the drawback that creation and shrinking
become rather time consuming, so it might be better to narrow it to
something like this:

key() -> union([integer(),atom()]).

value() -> union([integer(),atom(),binary(),boolean(),string()]).

What is best depends on the situation and intended usage.

Now, being able to generate keys and values is not enough. You also
have to tell PropEr how to create a dictionary, and in this case we
will use a symbolic generator (details to be explained later).

sym_dict() ->
    ?SIZED(N,sym_dict(N)).

sym_dict(0) ->
    {'$call',ec_dictionary,new,[ec_gb_trees]};
sym_dict(N) ->
    ?LAZY(
       frequency([
                  {1, {'$call',ec_dictionary,remove,[key(),sym_dict(N-1)]}},
                  {2, {'$call',ec_dictionary,add,[key(),value(),sym_dict(N-1)]}}
                 ])).

sym_dict/0 uses the ?SIZED macro to control the size of the
generated dictionary. PropEr will start out with small numbers and
gradually raise it.

sym_dict/1 is building a dictionary by randomly adding key/value
pairs and removing keys. Eventually the base case is reached which
will create an empty dictionary.

The ?LAZY macro is used to defer the calculation of
sym_dict(N-1) until it is needed, and frequency/1 is used
to ensure that twice as many adds as removes are done. This
should give rather more interesting dictionaries in the long run; if
not, one can alter the frequencies accordingly.

But does it really work?

That is a good question and one that should always be asked when
looking at generators. Fortunately there is a way to see what a
generator produces, provided that the generator functions are exported.

Hint: in most cases it will not hurt to throw in a
-compile(export_all). in the module used to specify the
properties. And here we actually have a sub-hint: specify the
properties in a separate file to avoid peeking inside the
implementation! Base the test on the published API as this is what the
users of the code will be restricted to.

When the test module has been loaded you can test the generators by
starting up an Erlang shell (this example uses the erlware_commons
code so get yourself a clone to play with):

$ erl -pz ebin -pz test
1> proper_gen:pick(ec_dictionary_proper:key()).
{ok,4}
2> proper_gen:pick(ec_dictionary_proper:key()).
{ok,35}
3> proper_gen:pick(ec_dictionary_proper:key()).
{ok,-5}
4> proper_gen:pick(ec_dictionary_proper:key()).
{ok,48}
5> proper_gen:pick(ec_dictionary_proper:key()).
{ok,'36\207_là ´?\nc'}
6> proper_gen:pick(ec_dictionary_proper:value()).
{ok,2}
7> proper_gen:pick(ec_dictionary_proper:value()).
{ok,-14}
8> proper_gen:pick(ec_dictionary_proper:value()).
{ok,-3}
9> proper_gen:pick(ec_dictionary_proper:value()).
{ok,27}
10> proper_gen:pick(ec_dictionary_proper:value()).
{ok,-8}
11> proper_gen:pick(ec_dictionary_proper:value()).
{ok,[472765,17121]}
12> proper_gen:pick(ec_dictionary_proper:value()).
{ok,true}
13> proper_gen:pick(ec_dictionary_proper:value()).
{ok,<<>>}
14> proper_gen:pick(ec_dictionary_proper:value()).
{ok,<<89,69,18,148,32,42,238,101>>}
15> proper_gen:pick(ec_dictionary_proper:sym_dict()).
{ok,{'$call',ec_dictionary,add,
        [[114776,1053475],
         'fª20\227\215',
         {'$call',ec_dictionary,add,
             ['',true,
              {'$call',ec_dictionary,add,
                  ['2^Ø¡',
                   [900408,886056],
                   {'$call',ec_dictionary,add,[[48618|...],<<...>>|...]}]}]}]}}
16> proper_gen:pick(ec_dictionary_proper:sym_dict()).
{ok,{'$call',ec_dictionary,add,
        [10,'a¯\21431fõC',
         {'$call',ec_dictionary,add,
             [false,-1,
              {'$call',ec_dictionary,remove,
                  ['d·ÉV÷[',
                   {'$call',ec_dictionary,remove,[12,{'$call',...}]}]}]}]}}

That does not look too bad, so we will continue with that for now.

Properties of add/3

The first expectation Eric had about how the dictionary works was that
if a key had been stored it could be retrieved.

One way of expressing this could be with this property:

prop_get_after_add_returns_correct_value() ->
    ?FORALL({Dict,K,V}, {sym_dict(),key(),value()},
         begin
             try ec_dictionary:get(K,ec_dictionary:add(K,V,Dict)) of
                    V ->
                        true;
                    _ ->
                        false
             catch
                   _:_ ->
                       false
             end
          end).

This property reads that for all dictionaries, get/2 using a key
from a key/value pair just inserted with the add/3 function
will return that value. If that is not the case the property will
evaluate to false.

Running the property is done using proper:quickcheck/1:

proper:quickcheck(ec_dictionary_proper:prop_get_after_add_returns_correct_value()).
....................................................................................................
OK: Passed 100 test(s).
true

This was as expected, but at this point we will take a little detour
and introduce a mistake in the ec_gb_trees implementation and see
how that works.

Signatures – Reusable, Toolable, Testable Types

This tutorial is brought to you by ErlangCamp 2011 – Boston, August 12th and 13th – It’s gonna be totally sweet!

It often occurs in coding that we need a library, a set of functionality. Often there are several algorithms that could provide
this functionality. However, the code that uses it either doesn’t care about the individual algorithm or wishes to delegate choosing that algorithm to some higher level. Let’s take the concrete example of dictionaries. A dictionary provides the ability to access a value via a key (other things as well, but primarily this). There are many ways to implement a dictionary. Just a few are: hash tables (with open addressing or with chaining), skip lists, and binary search trees.

Each of these approaches has its own performance characteristics, memory footprint, etc. For example, a table of size n with open addressing has no collisions and holds up to n elements, with a single comparison for successful lookup, while a table of size n with chaining and k keys has the minimum max(0, k-n) collisions and O(1 + k/n) comparisons for lookup. For skip lists the performance characteristics are about as good as those of randomly built binary search trees – namely O(log n). So the choice of which to select depends very much on memory available, insert/read characteristics, etc. Delegating the choice to a single point in your code is therefore a very good idea. Unfortunately, in Erlang that’s not so easy to do at the moment.

Other languages have built-in support for this functionality: Java has Interfaces, SML has Signatures. Erlang, though, doesn’t currently support this model, at least not directly. There are a few ways you can approximate it. One way is to pass the Module name to the calling functions along with the data that it is going to be called on.

add(ModuleToUse, Key, Value, DictData) ->
    ModuleToUse:add(Key, Value, DictData).

This works, and you can vary how you want to pass the data. For example, you could easily use a tuple to contain the data. That is, you could pass in {ModuleToUse, DictData} and that would make it a bit cleaner.

add(Key, Value, {ModuleToUse, DictData}) ->
    ModuleToUse:add(Key, Value, DictData).

Either way, there are a few problems with this approach. One of the biggest is that you lose code locality; looking at this bit of code you don’t know what ModuleToUse is at all. You would need to follow the call chain up to figure out what it is. Also it may not be obvious what is actually happening: the fact that ModuleToUse is a variable name obscures the code, making it harder to understand. The other big problem is that the tools provided with Erlang can’t help find mistakes that you might have made. Tools like Xref and Dialyzer have just as hard a time figuring out what ModuleToUse is pointing to as you do, so they can’t give you warnings about potential problems. In fact someone could inadvertently pass an unexpected module name as ModuleToUse and you would never get any warnings, just an exception at run time.

Fortunately, Erlang is a pretty flexible language so we can use a similar approach with a few adjustments to give us the best of both worlds. Both the flexibility of ignoring a specific implementation and keeping all the nice locality we get by using an explicit module name.

So what we actually want to do is something more like this:

add(Key, Value, DictData) ->
    dictionary:add(Key, Value, DictData).

Doing this we retain the locality. We can easily look up the dictionary Module. We immediately have a good idea what a dictionary actually is and we know what functions we are calling. Also, all the tools know what a dictionary is as well and how to check that your code is calling it correctly. For all of these reasons, this is a much better approach to the problem. This is what Signatures are all about.

Signatures

How do we actually do this in Erlang, when Erlang is missing what Java, SML and friends have built-in?

The first thing we need to do is to define a Behaviour for our functionality. To continue our example we will define a Behaviour for dictionaries. That Behaviour looks like this:

-module(ec_dictionary).

-export([behaviour_info/1]).

behaviour_info(callbacks) ->
    [{new, 0},
     {has_key, 2},
     {get, 2},
     {add, 3},
     {remove, 2},
     {has_value, 2},
     {size, 1},
     {to_list, 1},
     {from_list, 1},
     {keys, 1}];
behaviour_info(_) ->
    undefined.

So we have our Behaviour now. Unfortunately, this doesn’t give us much yet. It will make sure that any dictionaries we write have all the functions they need, but it won’t help us actually use those dictionaries in an abstract way in our code. To do that we need to add a bit of functionality. We do that by actually implementing our own behaviour, starting with new/1.

%% @doc create a new dictionary object from the specified module. The
%% module should implement the dictionary behaviour.
%%
%% @param ModuleName The module name.
-spec new(module()) -> dictionary(_K, _V).
new(ModuleName) when is_atom(ModuleName) ->
    #dict_t{callback = ModuleName, data = ModuleName:new()}.

This code creates a new dictionary for us. Or, to be more specific, it actually creates a new dictionary Signature record that will be used subsequently in other calls. This might look a bit familiar from our previous, less optimal approach: we have both the module name and the data here in the record. We call the module named in ModuleName to create the initial data. We then construct the record and return it to the caller, and we have a new dictionary. What about the other functions, the ones that don’t create a dictionary but make use of it? Let’s take a look at the implementations of two kinds of functions: one that updates the dictionary and another that just retrieves data.
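For reference, the dict_t record that new/1 builds looks something like the following sketch (the actual definition lives in erlware_commons and may differ slightly):

%% Sketch of the Signature record: the concrete callback module plus
%% the module-specific data it operates on.
-record(dict_t,
        {callback :: module(),
         data :: term()}).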

The first we will look at is the one that updates the dictionary by adding a value.

%% @doc add a new value to the existing dictionary. Return a new
%% dictionary containing the value.
%%
%% @param Dict the dictionary object to add too
%% @param Key the key to add
%% @param Value the value to add
-spec add(key(K), value(V), dictionary(K, V)) -> dictionary(K, V).
add(Key, Value, #dict_t{callback = Mod, data = Data} = Dict) ->
    Dict#dict_t{data = Mod:add(Key, Value, Data)}.

There are two key things here.

  1. The dictionary is deconstructed so we can get access to the data
    and the callback module.
  2. We modify the dictionary record with the new data and return that
    modified record.

This is the same approach that you will use for any Signature that updates data. As a side note, notice that we are calling the concrete implementation to do the work itself.
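For instance, remove/2, which the Behaviour above declares, would follow the same pattern; a sketch (the actual erlware_commons implementation may differ slightly):

%% Same update pattern as add/3: delegate to the callback module and
%% put the resulting data back into the record.
-spec remove(key(K), dictionary(K, V)) -> dictionary(K, V).
remove(Key, #dict_t{callback = Mod, data = Data} = Dict) ->
    Dict#dict_t{data = Mod:remove(Key, Data)}.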

Now let’s do a data retrieval function. In this case, the get function of the dictionary Signature.

%% @doc given a key return that key from the dictionary. If the key is
%% not found throw a 'not_found' exception.
%%
%% @param Dict The dictionary object to return the value from
%% @param Key The key requested
%% @throws not_found when the key does not exist
-spec get(key(K), dictionary(K, V)) -> value(V).
get(Key, #dict_t{callback = Mod, data = Data}) ->
    Mod:get(Key, Data).

In this case, you can see a very similar approach to deconstructing the dict record. We still need to pull out the callback module and the data itself and call the concrete implementation of the algorithm. In this case, we return the data returned from the call, not the record itself.

That is really all you need to define a Signature. There is a complete implementation in erlware_commons/ec_dictionary.
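Putting it together, using a dictionary through the Signature looks something like this; only the new/1 call names the concrete implementation:

%% The concrete module (here ec_gb_trees) is chosen once at creation;
%% every later call goes through the ec_dictionary Signature.
Dict0 = ec_dictionary:new(ec_gb_trees),
Dict1 = ec_dictionary:add(some_key, some_value, Dict0),
some_value = ec_dictionary:get(some_key, Dict1).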

Using Signatures

It’s a good idea to work through an example so we have a bit better idea of how to use these Signatures. If you are like me, you probably have some questions about what kind of performance burden this places on the code. At the very least we have an additional function call along with the record deconstruction; this must add some overhead. So let’s write a little timing test to get a good idea of how much this is all costing us.

In general, there are two kinds of concrete implementations for Signatures: the first is a native implementation, the second is a
wrapper.

Native Signature Implementations

A Native Signature Implementation is just that, a module that implements the Behaviour defined by a Signature directly. For most user defined Signatures this is going to be the norm. In our current example, the erlware_commons/ec_rbdict module is the best example of a Native Signature Implementation. It implements the ec_dictionary Behaviour directly.

Signature Wrappers

A Signature Wrapper is a module that wraps another module. Its purpose is to help a pre-existing module implement the Behaviour defined by a Signature. A good example of this in our current example is the erlware_commons/ec_dict module. It implements the ec_dictionary Behaviour, but all the functionality is provided by the stdlib/dict module itself. Let’s take a look at one example to see how this is done.

We will take a look at one of the functions we have already seen: get. An ec_dictionary get doesn’t have quite the same semantics as any of the functions in the dict module, so a bit of translation needs to be done. We do that in the ec_dict module’s get function.

-spec get(ec_dictionary:key(K), Object::dictionary(K, V)) ->
           ec_dictionary:value(V).
get(Key, Data) ->
    case dict:find(Key, Data) of
        {ok, Value} ->
            Value;
        error ->
            throw(not_found)
    end.

So the ec_dict module’s purpose for existence is to help the preexisting dict module implement the Behaviour defined by the Signature.

Why do we bring this up here? Because we are going to be looking at timings, and Signature Wrappers add an extra level of indirection to the mix and that adds a bit of additional overhead.

Creating the Timing Module

We are going to create timings for both Native Signature Implementations and Signature Wrappers.

Let’s get started by looking at some helper functions. We want dictionaries to have a bit of data in them, so to that end we will create a couple of functions that create dictionaries for each type we want to test. The first thing we want to time is the Signature Wrapper, so dict vs ec_dict called as a Signature.

create_dict() ->
    lists:foldl(fun(El, Dict) ->
                        dict:store(El, El, Dict)
                end, dict:new(),
                lists:seq(1, 100)).

The only thing we do here is create a sequence of numbers 1 to 100, and then add each of those to the dict as an entry. We aren’t too worried about replicating real data in the dictionary. We care about timing the function call overhead of Signatures, not the performance of the dictionaries themselves.

We need to create a similar function for our Signature based dictionary ec_dict.

create_dictionary(Type) ->
    lists:foldl(fun(El, Dict) ->
                        ec_dictionary:add(El, El, Dict)
                end,
                ec_dictionary:new(Type),
                lists:seq(1, 100)).

Here we actually create everything using the Signature, so we don’t need one function for each type. We can have one function that can create anything that implements the Signature. That is the magic of Signatures. Otherwise, this does the exact same thing as create_dict/0.

We are going to use two function calls in our timing. One that updates data and one that returns data, just to get good coverage. For our dictionaries we are going to use the size function as well as the add function.

time_direct_vs_signature_dict() ->
    io:format("Timing dict~n"),
    Dict = create_dict(),
    test_avg(fun() ->
             dict:size(dict:store(some_key, some_value, Dict))
         end,
         1000000),
    io:format("Timing ec_dict implementation of ec_dictionary~n"),
    time_dict_type(ec_dict).

The test_avg function runs the provided function the number of times specified in the second argument and collects timing information. We are going to run these one million times to get a good average (it’s fast, so it doesn’t take long). You can see that in the anonymous function we directly call dict:size/1 and dict:store/3 to perform the test. However, because we are in the wonderful world of Signatures, we don’t have to hard code the calls for the Signature implementations. Let’s take a look at the time_dict_type function.

time_dict_type(Type) ->
    io:format("Testing ~p~n", [Type]),
    Dict = create_dictionary(Type),
    test_avg(fun() ->
         ec_dictionary:size(ec_dictionary:add(some_key, some_value, Dict))
         end,
         1000000).

As you can see we take the type as an argument (we need it for dictionary creation) and call our create function. Then we run the same timings that we did for dict. In this case though, the type of dictionary is never specified; we only ever call ec_dictionary, so this test will work for anything that implements that Signature.
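The test_avg helper itself isn’t shown in the post; a minimal sketch of what such a helper could look like, using timer:tc/1 for microsecond timings:

%% Sketch of a test_avg/2 helper: run Fun N times and print the
%% range, median and average runtime in microseconds.
test_avg(Fun, N) when N > 0 ->
    Times = [element(1, timer:tc(Fun)) || _ <- lists:seq(1, N)],
    Sorted = lists:sort(Times),
    io:format("Range: ~b - ~b mics~n", [hd(Sorted), lists:last(Sorted)]),
    io:format("Median: ~b mics~n", [lists:nth(round(N / 2), Sorted)]),
    io:format("Average: ~b mics~n", [lists:sum(Times) div N]).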

dict vs ec_dict Results

So we have our tests; what was the result? Well, on my laptop this is what it looked like.

Erlang R14B01 (erts-5.8.2) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.2  (abort with ^G)

1> ec_timing:time_direct_vs_signature_dict().
Timing dict
Range: 2 - 5621 mics
Median: 3 mics
Average: 3 mics
Timing ec_dict implementation of ec_dictionary
Testing ec_dict
Range: 3 - 6097 mics
Median: 3 mics
Average: 4 mics
2>

So for the direct dict call we average about 3 mics per call, while for the Signature Wrapper we average around 4. That’s a 25% cost for Signature Wrappers in this example of very cheap calls. Depending on what you are doing, that overhead is going to be greater or lesser. In any case, we can see that there is some cost associated with Signature Wrapper implementations.

What about native Signatures though? Let’s take a look at ec_rbdict. The ec_rbdict module also implements the ec_dictionary Signature, but it is not a Signature Wrapper; it is a native implementation of the Signature. To use ec_rbdict directly we have to create a creation helper just like we did for dict.

create_rbdict() ->
    lists:foldl(fun(El, Dict) ->
                        ec_rbdict:add(El, El, Dict)
                end, ec_rbdict:new(),
                lists:seq(1, 100)).

This is exactly the same as create_dict with the exception that dict is replaced by ec_rbdict.

The timing function itself looks very similar as well. Again notice that we have to hard code the concrete name for the concrete
implementation, but we don’t for the ec_dictionary test.

time_direct_vs_signature_rbdict() ->
    io:format("Timing rbdict~n"),
    Dict = create_rbdict(),
    test_avg(fun() ->
                     ec_rbdict:size(ec_rbdict:add(some_key, some_value, Dict))
             end,
             1000000),
    io:format("Timing ec_dict implementation of ec_dictionary~n"),
    time_dict_type(ec_rbdict).

And there we have our test. What do the results look like?

ec_dict vs ec_rbdict as an ec_dictionary Results

The main thing we are timing here is the additional cost of the dictionary Signature itself. Keep that in mind as we look at the
results.

Erlang R14B01 (erts-5.8.2) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.2  (abort with ^G)

1> ec_timing:time_direct_vs_signature_rbdict().
Timing rbdict
Range: 6 - 15070 mics
Median: 7 mics
Average: 7 mics
Timing ec_dict implementation of ec_dictionary
Testing ec_rbdict
Range: 6 - 6013 mics
Median: 7 mics
Average: 7 mics
2>

So, no difference in time. Well, the reality is that there is a difference in timing, there must be, but we don’t have enough resolution in the timing system to be able to figure out what that difference is. Essentially that means it’s really, really small – or at the very least small enough not to worry about.

Conclusion

Signatures are a viable, useful approach to the problem of interfaces in Erlang. They have little or no overhead depending on the type of implementation, and greatly increase the flexibility of a library while retaining testability and locality.

Terminology

Behaviour
: A normal Erlang Behaviour that defines a contract

Signature
: A combination of a Behaviour and functionality to make the functions callable in a concrete way

Native Signature Implementation
: A module that implements a signature directly

Signature Wrapper
: A module that does translation between a preexisting module and a Signature, allowing the preexisting module to be used as a Signature Implementation.


Erlang PubNub Client and Chat

I was thoroughly impressed with PubNub, a publish/subscribe service, when I first read their articles and played around with it some in Javascript. But obviously I need an Erlang API if I’m going to really use it! So I’ve created ePubNub.

In the ePubNub README you’ll find information on some basic usage of the application. You don’t have to do anything more than use the epubnub.erl module to publish and subscribe (by either providing a PID to send messages to or a function handler to process each message).
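As a rough sketch of that basic usage, assuming (as with the supervised variant used later in this post) that subscribe/3 returns {ok, Pid} and that published messages arrive as {message, Msg} tuples:

%% Sketch of direct epubnub.erl usage with the default demo keys; the
%% channel name is arbitrary, and the exact return values of
%% subscribe/3 and publish/3 are assumptions.
EPN = epubnub:new(),
{ok, _SubPid} = epubnub:subscribe(EPN, "my_channel", self()),
epubnub:publish(EPN, "my_channel", <<"hello">>),
receive
    {message, Msg} -> io:format("got: ~p~n", [Msg])
end.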

Here I’ve built a slightly more complicated app/release called epubnub_chat; the source is also on GitHub.

The first thing we need is the epubnub app as a dependency in the epubnub_chat.app file:

   {applications, [kernel, stdlib, epubnub]},

We’ll use a simple_one_for_one for supervising channel subscribed processes. In epubnub_chat_sup we have 3 API functions for the user to use (start_link is run by the _app.erl module on startup): connect/1, connect/2, disconnect/1:

connect(Channel) ->
    supervisor:start_child(?SERVER, [Channel]).

connect(EPN, Channel) ->
    supervisor:start_child(?SERVER, [EPN, Channel]).

disconnect(PID) ->
    epubnub_chat:stop(PID).

EPN is a record containing the necessary url and keys for talking to the PubNub service and is created with the new functions in epubnub:

-spec new() -> record(epn).
new() ->
    #epn{}.

-spec new(string()) -> record(epn).
new(Origin) ->
    #epn{origin=Origin}.

-spec new(string(), string()) -> record(epn).
new(PubKey, SubKey) ->
    #epn{pubkey=PubKey, subkey=SubKey}.

-spec new(string(), string(), string()) -> record(epn).
new(Origin, PubKey, SubKey) ->
    #epn{origin=Origin, pubkey=PubKey, subkey=SubKey}.

You can pass none to connect and it will be sent along to the chat gen_server, which will then use the defaults of pubsub.pubnub.com, demo and demo for the url, publish key and subscribe key respectively.

Now in the epubnub_chat gen_server we need the following API functions:

start_link(Channel) ->
    start_link(epubnub:new(), Channel).

start_link(EPN, Channel) ->
    gen_server:start_link(?MODULE, [EPN, Channel], []).

message(Server, Msg) ->
    gen_server:cast(Server, {message, Msg}).

stop(Server) ->
    gen_server:cast(Server, stop).

The start_link functions are run when the supervisor spawns a simple_one_for_one child process. This returns {ok, PID}. This PID must be remembered so we can talk to the process we have started that is subscribed to a specific channel. We pass this PID to the message/2 and stop/1 functions, which we’ll see at the end when we use the program.

start_link/1 and /2 call init/1 with the provided arguments:

init([EPN, Channel]) ->
    {ok, PID} = epubnub_sup:subscribe(EPN, Channel, self()),
    {ok, #state{epn=EPN, pid=PID, channel=Channel}}.

Here we use the epubnub_sup subscribe/3 function and not epubnub:subscribe because we want it to be supervised. We store the PID for this process so we can terminate it later.

The epubnub subscribe process was given the PID, returned by self/0, of the current process, which is the gen_server process, and will send messages that are published on the channel to that process. We handle these messages in handle_info/2:

handle_info({message, Message}, State) ->
    io:format("~p~n", [Message]),
    {noreply, State}.

Lastly, we have to handle the messages from message/2 and stop/1:

handle_cast({message, Msg}, State=#state{epn=EPN, channel=Channel}) ->
    epubnub:publish(EPN, Channel, Msg),
    {noreply, State};
%% stop/1 casts 'stop'; stopping normally causes terminate/2 to run
handle_cast(stop, State) ->
    {stop, normal, State}.

terminate(_Reason, #state{pid=PID}) ->
    epubnub:unsubscribe(PID),
    ok.

The handle_cast/2 function publishes your message to the channel this process is subscribed to with epubnub:publish/3, and terminate/2 calls epubnub:unsubscribe/1 before the process ends, which sends a terminate message to the subscribed process.

Now let’s see this program in action:

[tristan@marx ~/Devel/epubnub_chat]
09:55 (master)$ sinan dist
starting: depends
starting: build
starting: release
starting: dist
[tristan@marx ~/Devel/epubnub_chat]
09:55 (master)$ sudo faxien ir
Password:
Do you want to install the release: /Users/tristan/Devel/epubnub_chat/_build/development/tar/epubnub_chat-0.0.1.tar.gz
Enter (y)es, (n)o, or yes to (a)ll? > ? y
Replacing existing executable file at: /usr/local/lib/erlang/bin/epubnub_chat
Replacing existing executable file at: /usr/local/lib/erlang/bin/5.8.2/epubnub_chat
Replacing existing executable file at: /usr/local/lib/erlang/bin/erlware_release_start_helper
Replacing existing executable file at: /usr/local/lib/erlang/bin/5.8.2/erlware_release_start_helper
ok
[tristan@marx ~/Devel/epubnub_chat]
09:56 (master)$ epubnub_chat
....
Eshell V5.8.2  (abort with ^G)
1> {ok, Server} = epubnub_chat_sup:connect("chat").
{ok,}
2>
=PROGRESS REPORT==== 10-Apr-2011::09:57:14 ===
          supervisor: {local,inet_gethost_native_sup}
             started: [{pid,},{mfa,{inet_gethost_native,init,[[]]}}]

=PROGRESS REPORT==== 10-Apr-2011::09:57:14 ===
          supervisor: {local,kernel_safe_sup}
             started: [{pid,},
                       {name,inet_gethost_native_sup},
                       {mfargs,{inet_gethost_native,start_link,[]}},
                       {restart_type,temporary},
                       {shutdown,1000},
                       {child_type,worker}]

2> epubnub_chat:message(Server, <<"hello there!">>).
ok
3> <<"hello there!">>
<<"I'm from the webapp!">>

3> q().
ok

You can go to the PubNub tutorial page to chat with yourself, or get someone else to join!

That’s it! Simple and quick global communication that scales for you!

I’ve really enjoyed playing with PubNub and hope I get to use it for a real project soon.

eCloudEdit Part 2: CouchDB

In my last post I showed the Webmachine backend for James Yu’s CloudEdit app in Backbone.js. What was left out was: where are the documents stored? Here I’ll show how this is done with CouchDB. And you can give the app a try at http://erlware.org:8080

First, a new Erlang app is needed that we’ll call ece_db.

ece_db/
├── doc
├── ebin
│   ├── ece_db.app
│   └── overview.edoc
├── include
└── src
    ├── ece_db.erl
    ├── ece_db_app.erl
    └── ece_db_sup.erl

Three modules are implemented: one that starts the app by calling the supervisor’s start function, the supervisor itself that sets up a simple_one_for_one, and the gen_server that handles the frontend’s requests for creating and modifying documents in CouchDB.

We’ll ignore the ece_db_app.erl module for this post and start with ece_db_sup.erl. Requests from the Webmachine resource module should not wait on one another; they should be handled in parallel. One option is to not create a process for the database backend at all and instead have all of the functions in the database interface run in the process of the Webmachine resource. However, there are two reasons not to do this. First, multiple pieces of data must be read out of a configuration file and used to set up what is needed to talk to CouchDB, and we do not want to be doing this for every request! Furthermore, a supervised gen_server gives us the possibility of retrying requests with no extra code, and of crashing without taking down the resource’s process that is handling the user’s HTTP request. It is a lot nicer and easier to just let things fail when we can!

  {ece_db, [{server, "HOSTNAME"},
            {port, 80},
            {database, "ecloudedit"}]}
-spec init(list()) -> {ok, {SupFlags::any(), [ChildSpec::any()]}} |
                       ignore | {error, Reason::any()}.
init([]) ->
    RestartStrategy = simple_one_for_one,
    MaxRestarts = 0,
    MaxSecondsBetweenRestarts = 1,

    SupFlags = {RestartStrategy, MaxRestarts, MaxSecondsBetweenRestarts},

    Restart = temporary,
    Shutdown = 2000,
    Type = worker,

    {ok, Server} = application:get_env(server),
    {ok, Port} = application:get_env(port),
    {ok, DB} = application:get_env(database),

    AChild = {ece_db, {ece_db, start_link, [Server, Port, DB]},
              Restart, Shutdown, Type, [ece_db]},

    {ok, {SupFlags, [AChild]}}.

First, we have the config entries in config/sys.config that will be read with application:get_env. Here we set the server URL, port number and name of the database to use. CouchDB can easily be run locally, but for my running copy I use a database hosted by the great service Cloudant. Next is the init function for ece_db_sup.erl. A key thing to note for a simple_one_for_one is that NO process is started when the supervisor’s init function returns, unlike with other types of supervisor children. Instead we must explicitly call start_child. This is how we are able to create a supervised gen_server process for every HTTP request. Below is the code in ece_db_sup for starting and stopping the process:

start_child() ->
    supervisor:start_child(?SERVER, []).

terminate_child(PID) ->
    ece_db:terminate_child(PID).

In the last post I showed the functions that start and stop the ece_db process. Here they are again:

init([]) ->
    {ok, PID} = ece_db_sup:start_child(),
    {ok, #ctx{db=PID}}.

finish_request(ReqData, Ctx) ->
    ece_db_sup:terminate_child(Ctx#ctx.db),
    {true, ReqData, Ctx}.

On each request Webmachine calls the init function for the resource that is matched in the dispatch table. In that init function we call start_child in ece_db_sup, which returns {ok, PID}. The PID is the process id we’ll be sending messages to. Now in ece_db we implement the API functions needed to start the process and to interact with it.

start_link(Server, Port, DB) ->
    gen_server:start_link(?MODULE, [Server, Port, DB], []).

all(PID) ->
    gen_server:call(PID, all).

find(PID, ID) ->
    gen_server:call(PID, {find, ID}).

create(PID, Doc) ->
    gen_server:call(PID, {create, Doc}).

update(PID, ID, JsonDoc) ->
    gen_server:call(PID, {update, ID, JsonDoc}).

terminate(PID) ->
    gen_server:call(PID, terminate).

Each function, besides the one that starts the process, takes a PID. This PID is the process to which gen_server:call sends its message. To handle these messages we have the gen_server callbacks.

%%%===================================================================
%%% gen_server callbacks
%%%===================================================================

%% @private
init([Server, Port, DB]) ->
    CouchServer = couchbeam:server_connection(Server, Port, "", []),
    {ok, CouchDB} = couchbeam:open_db(CouchServer, DB),
    {ok, #state{db=CouchDB}}.

%% @private
handle_call(all, _From, #state{db=DB}=State) ->
    Docs = get_docs(DB, [{descending, true}]),
    {reply, mochijson2:encode(Docs), State};
handle_call({find, ID}, _From, #state{db=DB}=State) ->
    [Doc] = get_docs(DB, [{key, list_to_binary(ID)}]),
    {reply, mochijson2:encode(Doc), State};
handle_call({create, Doc}, _From, #state{db=DB}=State) ->
    {ok, Doc1} = couchbeam:save_doc(DB, Doc),
    {NewDoc} = couchbeam_doc:set_value(<<"id">>, couchbeam_doc:get_id(Doc1), Doc1),
    {reply, mochijson2:encode(NewDoc), State};
handle_call({update, ID, NewDoc}, _From, #state{db=DB}=State) ->
    IDBinary = list_to_binary(ID),
    {ok, Doc} = couchbeam:open_doc(DB, IDBinary),
    NewDoc2 = couchbeam_doc:set_value(<<"_id">>, IDBinary, {NewDoc}),
    NewDoc3 = couchbeam_doc:set_value(<<"_rev">>, couchbeam_doc:get_rev(Doc), NewDoc2),
    {ok, {Doc1}} = couchbeam:save_doc(DB, NewDoc3),
    {reply, mochijson2:encode(Doc1), State};
handle_call(terminate, _From, State) ->
    {stop, normal, State}.

init takes the server URL, the port number and the name of the database in which to store the documents, and creates a Couchbeam database record. The handle_call clauses then match on the different tuples sent by gen_server:call for each of the API functions. all uses the internal function get_docs with the option to return the documents in descending order (so the newest is first) and returns them after encoding to JSON. find uses the same function but asks for a single document, which in the case of CouchDB is specified with a key value. create first saves the new document, which returns the document as it is actually stored in CouchDB; CouchDB adds the _id and _rev key/value pairs it generates. Since our frontend expects the id to be the value of a key id and not _id, we use couchbeam_doc:set_value to add a new key id whose value we retrieve from the document with couchbeam_doc:get_id. For update we first open the document corresponding to the id passed in as an argument in order to get its _rev value. _rev needs to be set in the document we send to save_doc so CouchDB knows this is a valid update based on the previous version of the document. Thus we set the new document's _id and _rev values and send it to save_doc.
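To make the id duplication in create concrete, here is roughly what the document looks like before and after the set_value call. The _id and _rev values here are invented for illustration, and the exact key ordering may differ:

%% Hypothetical values, for illustration only.
%% Doc1, as returned by couchbeam:save_doc/2:
%%   {[{<<"_id">>,  <<"6e1295ed6c29495e">>},
%%     {<<"_rev">>, <<"1-7051cbe5c8fae">>},
%%     {<<"title">>, <<"First Post">>}]}
{NewDoc} = couchbeam_doc:set_value(<<"id">>, couchbeam_doc:get_id(Doc1), Doc1),
%% NewDoc is now the bare proplist with the duplicated key:
%%   [{<<"id">>,   <<"6e1295ed6c29495e">>},
%%    {<<"_id">>,  <<"6e1295ed6c29495e">>},
%%    {<<"_rev">>, <<"1-7051cbe5c8fae">>},
%%    {<<"title">>, <<"First Post">>}]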

Lastly, we have the get_docs function. Since the frontend expects the key id to exist and CouchDB stores the id under the key _id, we use a simple view to return the documents with an id key containing the value of each document's _id. Views are defined with map/reduce functions that CouchDB runs across the database's documents to build an index. That index is then queried by key to find documents. Access is therefore very quick: the indexes are built as B-trees, and the map/reduce does not have to be run over the documents on every request. Additionally, when a new document is created the view's map/reduce functions only need to run over the new documents to update the B-tree. A view's index is refreshed either on each use of the view or when explicitly triggered.

function (doc)
{
  emit(doc._id, {id : doc._id, title : doc.title, created_at : doc.created_at, body : doc.body});
}

get_docs(DB, Options) ->
    {ok, AllDocs} = couchbeam:view(DB, {"all", "find"}, Options),
    {ok, Results} = couchbeam_view:fetch(AllDocs),

    {[{<<"total_rows">>, _Total},
      {<<"offset">>, _Offset},
      {<<"rows">>, Rows}]} = Results,

    lists:map(fun({Row}) ->
                      {<<"value">>, {Value}} = lists:keyfind(<<"value">>, 1, Row),
                      Value
              end, Rows).

In get_docs the couchbeam:view function is used to construct a view request for the CouchDB server. couchbeam_view:fetch sends the request and returns the results as Erlang terms. We then extract the rows from the results and use lists:map to pull out just the value of each row to be returned.
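For a concrete picture, the decoded Results that get_docs pattern matches on have roughly this shape. The row contents are invented for illustration:

%% Hypothetical fetched view results, for illustration only.
Results = {[{<<"total_rows">>, 1},
            {<<"offset">>, 0},
            {<<"rows">>,
             [{[{<<"id">>, <<"6e1295ed6c29495e">>},
                {<<"key">>, <<"6e1295ed6c29495e">>},
                {<<"value">>, {[{<<"id">>, <<"6e1295ed6c29495e">>},
                                {<<"title">>, <<"First Post">>}]}}]}]}]},
%% lists:map then strips each row down to just its value proplist:
%%   [[{<<"id">>, <<"6e1295ed6c29495e">>}, {<<"title">>, <<"First Post">>}]]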

That's it! There are places that could use improvement. One is the use of lists:map over every returned row to construct documents the frontend can deal with. Additionally, the need to duplicate _id as id could be removed through modifications on the frontend.

In the next installment I'll be updating the code -- and maybe making some of these performance enhancements -- with James Yu's recent changes in his Part 2.

eCloudEdit: Erlang, Webmachine and Backbone.js

To experiment with using pure client-side rendering talking to an Erlang backend, I’ve taken James Yu’s CloudEdit tutorial, an app written with Backbone.js and Rails, and converted the backend to use Webmachine and CouchDB. You can see eCloudEdit in action here. The Backbone.js code is unchanged, so to understand that please see James’ post; here I’ll describe the Erlang backend.

To begin with we setup two applications, one for handling the web interaction and a second for handling the database interaction. We end up with this directory layout:

  |-eCloudEdit
   |-bin
   |-config
   |-doc
   |-lib
   |---ece_db
   |-----doc
   |-----ebin
   |-----include
   |-----src
   |---ece_web
   |-----doc
   |-----ebin
   |-----include
   |-----priv
   |-------images
   |-------javascripts
   |---------controllers
   |---------models
   |---------views
   |-------stylesheets
   |-----src

Under ece_web/priv is where all the HTML and Javascript from CloudEdit is placed, with nothing changed. To serve up the static content through Webmachine we use this resource, which we won’t worry about in this article. Let’s look at how Webmachine decides where to route requests. This is handled by a set of dispatch rules that are passed to Webmachine on startup. We can see how this is done by looking at ece_web_sup and how the supervisor loads Webmachine:

-spec init(list()) -> {ok, {SupFlags::any(), [ChildSpec::any()]}} |
                          ignore | {error, Reason::any()}.
init([]) ->
    WebChild = {webmachine_mochiweb,
                {webmachine_mochiweb, start, [config()]},
                permanent, 5000, worker, dynamic},

    RestartStrategy = one_for_one,
    MaxRestarts = 3,
    MaxSecondsBetweenRestarts = 10,
    SupFlags = {RestartStrategy, MaxRestarts, MaxSecondsBetweenRestarts},

    {ok, {SupFlags , [WebChild]}}.

config() ->
    {ok, IP} = application:get_env(webmachine_ip),
    {ok, Port} = application:get_env(webmachine_port),
    {ok, App} = application:get_application(),
    LogDir = code:priv_dir(App) ++ "/logs",
    {ok, Dispatch} = file:consult(filename:join([code:priv_dir(App), "dispatch"])),

    [{ip, IP},
     {port, Port},
     {log_dir, LogDir},
     {backlog, 128},
     {dispatch, Dispatch}].

The dispatch terms are loaded from a file dispatch in the application’s priv directory. For this app the dispatch file contains:

{["documents", id], ece_resource_documents, []}.
{["documents"], ece_resource_documents, []}.
{['*'], ece_resource_static, []}.

There are two resources: one for handling requests to create, update and view documents, and one for serving all other requests (static HTML, JS and CSS files). The first rule matches paths like /documents/foo: id is set to foo and the request is sent to the documents resource. If there is nothing after /documents the request is still sent to the documents resource, but there is no id.
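As a quick sketch of how these rules bind (the request paths are invented):

%% Hypothetical requests, for illustration only:
%% GET /documents            -> ece_resource_documents, wrq:path_info(id, ReqData) =:= undefined
%% GET /documents/abc123     -> ece_resource_documents, wrq:path_info(id, ReqData) =:= "abc123"
%% GET /stylesheets/app.css  -> ece_resource_static (matched by the '*' rule)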

Webmachine is essentially a REST toolkit. You build resources by defining functions that handle the different possible HTTP requests. For this webapp we’ll only be dealing with GET, POST and PUT. GET is used if we’d like to retrieve information on documents, POST is for creating a new document and PUT is for updating a document.

-record(ctx, {db}).

init([]) ->
    {ok, PID} = ece_db_sup:start_child(),
    {ok, #ctx{db=PID}}.

allowed_methods(ReqData, Ctx) ->
    {['HEAD', 'GET', 'POST', 'PUT'], ReqData, Ctx}.

content_types_accepted(ReqData, Ctx) ->
    {[{"application/json", from_json}], ReqData, Ctx}.

content_types_provided(ReqData, Ctx) ->
    {[{"application/json", to_json}], ReqData, Ctx}.

finish_request(ReqData, Ctx) ->
    ece_db_sup:terminate_child(Ctx#ctx.db),
    {true, ReqData, Ctx}.

To handle GET requests we implement the to_json function we specified in content_types_provided for requests for JSON data.

to_json(ReqData, Ctx) ->
    case wrq:path_info(id, ReqData) of
        undefined ->
            All = ece_db:all(Ctx#ctx.db),
            {All, ReqData, Ctx};
        ID ->
            JsonDoc = ece_db:find(Ctx#ctx.db, ID),
            {JsonDoc, ReqData, Ctx}
    end.

wrq:path_info is used to find the value of id, which we set in the dispatch file. If it is undefined we know the request is for all documents, while if it has a value we know to find the document with that id and return its contents. We’ll see the implementation of ece_db:all/1 and ece_db:find/2 in the next article; just know they both return JSON data structures.

Now we must support POST for creating documents or we have nothing to return to a GET request.

process_post(ReqData, Ctx) ->
    [{JsonDoc, _}] = mochiweb_util:parse_qs(wrq:req_body(ReqData)),
    {struct, Doc} = mochijson2:decode(JsonDoc),
    NewDoc = ece_db:create(Ctx#ctx.db, {Doc}),
    ReqData2 = wrq:set_resp_body(NewDoc, ReqData),
    {true, ReqData2, Ctx}.

wrq:req_body/1 returns the contents of the body sent in the HTTP request, which here is the contents of the document to store. We decode it to an Erlang data structure and pass it to the ece_db app to insert into the database. After inserting into the database, the create/2 function returns the new document with a new element id (in this case generated by CouchDB). This is required so we know the document’s id, which is used by the Backbone.js frontend. In order to return it from the POST request we must set the response body to the contents of the document with wrq:set_resp_body/2.
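One detail worth noting: since the frontend sends the JSON document as the raw request body, mochiweb_util:parse_qs returns the whole body as a single key with an empty value, which is why the [{JsonDoc, _}] match works. A quick shell check, with invented document contents and assuming the body arrives without URL encoding:

1> mochiweb_util:parse_qs("{\"title\":\"First Post\"}").
[{"{\"title\":\"First Post\"}",[]}]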

Lastly, updating documents requires support for PUT. In content_types_accepted/2 we’ve specified that PUT requests with JSON content are sent to the function from_json/2:

from_json(ReqData, Ctx) ->
    case wrq:path_info(id, ReqData) of
        undefined ->
            {false, ReqData, Ctx};
        ID ->
            JsonDoc = wrq:req_body(ReqData),
            {struct, Doc} = mochijson2:decode(JsonDoc),
            NewDoc = ece_db:update(Ctx#ctx.db, ID, Doc),
            ReqData2 = wrq:set_resp_body(NewDoc, ReqData),
            {true, ReqData2, Ctx}
    end.

If this request was not routed through the first rule in our dispatch file it does not have an id and thus cannot be an update. When this happens we return false so the frontend is aware something has gone wrong. For requests containing an id we pass the contents of the request’s body to ece_db‘s update/3 function.

In the next post I’ll show how ece_db is implemented with Couchbeam for reading and writing the documents to CouchDB on Cloudant.
