Skip to content

eCloudEdit Part 2: CouchDB

by on February 12, 2011

In my last post I showed the Webmachine backend to James Yu’s CloudEdit app in Backbone.js. What was left out was, where are the documents stored? Here I’ll show how this is done with CouchDB. And you can give the app a try at http://erlware.org:8080

First, a new Erlang app is needed that we’ll call ece_db.

ece_db/
├── doc
├── ebin
│   ├── ece_db.app
│   └── overview.edoc
├── include
└── src
    ├── ece_db.erl
    ├── ece_db_app.erl
    └── ece_db_sup.erl

Three modules are implemented, one that starts the app by calling the supervisor’s start function, the supervisor itself that sets up a simple_one_for_one and the gen_server that handles the frontends requests for creating and modifying documents in CouchDB.

We’ll ignore the ece_db_app.erl module for this post and start with ece_db_sup.erl. On requests from the Webmachine resource module we don’t want one to wait on the other, requests should be handled in parrallel. One option is to not create a process for the database backend and instead have all of the functions in the database interface run in the process of the Webmachine resource. However, two reasons to not do this are that there are multiple pieces of data that must be read out of a configuration file and used to setup what is needed to talk to CouchDB. We do not want to be doing this for every request! Furthermore a supervised gen_server allows us the possibility of retrying requests with no extra code and crashing without retrying but not taking down the resource’s process that is handling the user’s HTTP request. It is a lot nicer and easier to just let things fail when we can!

  {ece_db, [{server, "HOSTNAME"},
            {port, 80},
            {database, "ecloudedit"}]}
-spec init(list()) -> {ok, {SupFlags::any(), [ChildSpec::any()]}} |
                       ignore | {error, Reason::any()}.
init([]) ->
    RestartStrategy = simple_one_for_one,
    MaxRestarts = 0,
    MaxSecondsBetweenRestarts = 1,

    SupFlags = {RestartStrategy, MaxRestarts, MaxSecondsBetweenRestarts},

    Restart = temporary,
    Shutdown = 2000,
    Type = worker,

    {ok, Server} = application:get_env(server),
    {ok, Port} = application:get_env(port),
    {ok, DB} = application:get_env(database),

    AChild = {ece_db, {ece_db, start_link, [Server, Port, DB]},
              Restart, Shutdown, Type, [ece_db]},

    {ok, {SupFlags, [AChild]}}.

First, we have the config entries in config/sys.config that will be read in by application:gen_env. Here we set the server URL, port number and name of the database to use. CouchDB can easily be run locally but for my running copy I use a database hosted by the great service Cloudant. Next is the init function for ece_db_sup.erl. A key thing to note for a simple_one_for_one is that NO process is started after the supervisor’s init function returns like with other types of supervisor children. Instead we must explicitly call start_child. This is how we are able to create a supervised gen_server process for every HTTP request. Below is the code in ece_db_sup for starting and stopping the process:

start_child() ->
    supervisor:start_child(?SERVER, []).

terminate_child(PID) ->
    ece_db:terminate_child(PID).

In the last post I showed the functions that start and stop the ece_db process. Here they are again:

init([]) ->
    {ok, PID} = ece_db_sup:start_child(),
    {ok, #ctx{db=PID}}.
finish_request(ReqData, Ctx) ->
    ece_db_sup:terminate_child(Ctx#ctx.db),
    {true, ReqData, Ctx}.

On each request Webmachine calls the init function for the resource that is matched in the dispatch table. In that init function we call start_child in ece_db_sup which returns {ok, PID}. The PID is the process id we’ll be sending messages. Now in ece_db we implement the API functions needed to start the process and to interact with it.

start_link(Server, Port, DB) ->
    gen_server:start_link(?MODULE, [Server, Port, DB], []).

all(PID) ->
    gen_server:call(PID, all).

find(PID, ID) ->
    gen_server:call(PID, {find, ID}).

create(PID, Doc) ->
    gen_server:call(PID, {create, Doc}).

update(PID, ID, JsonDoc) ->
    gen_server:call(PID, {update, ID, JsonDoc}).

terminate(PID) ->
    gen_server:call(PID, terminate).

Each function, besides the one for starting the process, takes a PID. This PID is the process gen_server:call to which it sends its message. To handle these messages we have the gen_server callbacks.

%%%===================================================================
%%% gen_server callbacks
%%%===================================================================

%% @private
init([Server, Port, DB]) ->
    CouchServer = couchbeam:server_connection(Server, Port, "", []),
    {ok, CouchDB} = couchbeam:open_db(CouchServer, DB),
    {ok, #state{db=CouchDB}}.

%% @private
handle_call(all, _From, #state{db=DB}=State) ->
    Docs = get_docs(DB, [{descending, true}]),
    {reply, mochijson2:encode(Docs), State};
handle_call({find, ID}, _From, #state{db=DB}=State) ->
    [Doc] = get_docs(DB, [{key, list_to_binary(ID)}]),
    {reply, mochijson2:encode(Doc), State};
handle_call({create, Doc}, _From, #state{db=DB}=State) ->
    {ok, Doc1} = couchbeam:save_doc(DB, Doc),
    {NewDoc} = couchbeam_doc:set_value(<<"id">>, couchbeam_doc:get_id(Doc1), Doc1),
    {reply, mochijson2:encode(NewDoc), State};
handle_call({update, ID, NewDoc}, _From, #state{db=DB}=State) ->
    IDBinary = list_to_binary(ID),
    {ok, Doc} = couchbeam:open_doc(DB, IDBinary),
    NewDoc2 = couchbeam_doc:set_value(<<"_id">>, IDBinary, {NewDoc}),
    NewDoc3 = couchbeam_doc:set_value(<<"_rev">>, couchbeam_doc:get_rev(Doc), NewDoc2),
    {ok, {Doc1}} = couchbeam:save_doc(DB, NewDoc3),
    {reply, mochijson2:encode(Doc1), State;
handle_call(terminate, _From, State) ->
    {stop, normal, State}.

init takes the server URL, the port number and the name of the database to store the documents as arguments and creates a CouchBeam database record. To handle the other messages we have the handle_call function to match on the different tuples sent to gen_server:call for the different function calls. all uses the internal function get_docs with the option to have the function in descending order (so the newest is first) and returns those after encoding to JSON. find uses the same function but only wants a certain document which in the case of CouchDB is specified with a key value. create first saves the new document which returns the document that is actually stored in CouchDB. CouchDB adds the _id and _rev key/value pairs it generates. Since our frontend expects the id to be the value of a key id and not _id we use couchbeam_doc:set_value to set a value of the new key id with the value of the documents id retrieved from the document with couchbeam_doc:get_id. For update we first open the document corresponding to the id passed in as an argument to get the _rev value. _rev needs to be set in the document we send to save_doc so CouchDB knows this is a valid update based on the previous version of the document. Thus we set the new documents _id and _rev values and send to save_doc.

Lastly, we have the get_docs function. Since the frontend expects the key id to exist and CouchDB stores the id with the key _id we use a simple view to return the documents with id containing the value of the documents _id. Views are defined with map/reduce functions that CouchDB runs across the databases documents and builds an index from. These indexes are then queried based on keys to find documents. Thus access is very quick, indexes are built with BTrees, and a map/reduce does not have to be run on the documents for every request. Additionally, when a new document is created the view's map/reduce functions only need to run over the new documents to update the BTree. Updates to the view's indexes are either done on each use of the view or can be manually told to run.

function (doc)
{
  emit(doc._id, {id : doc._id, title : doc.title, created_at : doc.created_at, body : doc.body});
}
get_docs(DB, Options) ->
    {ok, AllDocs} = couchbeam:view(DB, {"all", "find"}, Options),
    {ok, Results} = couchbeam_view:fetch(AllDocs),

    {[{<<"total_rows">>, _Total},
      {<<"offset">>, _Offset},
      {<<"rows">>, Rows}]} = Results,

    lists:map(fun({Row}) ->
                      {<<"value">>, {Value}} = lists:keyfind(<<"value">>, 1, Row),
                      Value
              end, Rows).

In get_docs the couchbeam:view function is used to construct a view request to be sent to the CouchDB server. couchbeam_view:fetch sends the view request and returns the results as Erlang terms. After this we extract the rows from the results and use lists:map to extract out just the value of each row to be returned.

That's it! There are places that could use improvement. One being the use of lists:map over every returned row to construct documents the frontend can deal with. Additionally, the need to duplicate _id as id for the frontends use could be removed through modifications on the frontend.

In the next installment I'll be updating the code -- and maybe making some of these performance enhancements -- with James Yu's recent changes in his Part 2.

About these ads

From → Tutorials

6 Comments
  1. Hey!
    Thank you for interesting posts!

    In that example I do not clearly understand why did you used “one DB request – one supervisor child call (which is gen_server)”. Isn’t that simpler and faster to start “ece_db” as an app and send requests to ece_db gen_server? We do not need to set up db and “table” everytime in this case:

    CouchServer = couchbeam:server_connection(Server, Port, “”, []),
    {ok, CouchDB} = couchbeam:open_db(CouchServer, DB),
    {ok, #state{db=CouchDB}}.

    It’s done once till crush.

    PS: I’m erlang newbie so I can misunderstand the whole concurrent consept here.)

    • Tristan Sloughter permalink

      I assume you are referring to the use of the simple_one_for_one for the database gen_server?

      If it was done the way you suggest only 1 DB request would be handled at a time. Each time a resource wanted to make a DB call it would pass a message to the ece_db gen_server and wait for a response. This message would only be handled once it is done handling any other resources who sent messages before it.

      The way I do it here is a new ece_db gen_server process is created for every web request to the resource. This way a resource is not blocked for database access by another resource.

      If we weren’t using CouchDB here we’d do something a little different. Say if it was MySQL we’d have a pool of connections that resources would have to share instead of one being created each time. But since its CouchDB the “connection” we create isn’t really a connection like in the MySQL sense. It is simply a record that stores the information we need to send the RESTful HTTP request to the CouchDB server.

  2. Hello would youu mind sharing which blog platform you’re working
    with? I’m planning to start my ownn blog in thee near future but I’m having a tough time deciding between BlogEngine/Wordpress/B2evolution and Drupal.
    The reaspn I ask is because your design seemss different then most blogs and I’m lookming
    for something completely unique. P.S My apologies for being off-topic
    but I hadd to ask!

Trackbacks & Pingbacks

  1. Tweets that mention eCloudEdit Part 2: CouchDB « -- Topsy.com
  2. Erlang Simple One for One Supervisor « (defun side of kungfooguru)
  3. Backbone Example Sites with Tutorials | Hugo Mineiro Portfolio

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: