Replication with Cloudant, Pt. 3

By Max Thayer

In part two of this series, we talked about using the _replicator database to create replications that persist across node restarts. This time, we'll talk about application-level design patterns using replication. How does replication expand what you can build?

Per-User Databases

Because Cloudant provides a per-database security model, it's not uncommon to see applications with one database per user, where each user has permissions to read and write documents in their database. But then, how do you get an aggregate view of all your users' data? Replication!

When a new user signs up, an application using per-user databases needs to take three steps:

  1. Replicate a database containing all the design documents and starter documents the new user will need into a new database for that user.

  2. Add the user, their credentials, and permissions for their database to the _users database.

  3. Continuously replicate some or all of the user's data to a master database, which aggregates user data.
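
The three steps can be sketched as a list of HTTP requests. This is a minimal, hypothetical helper: `signupRequests`, the `<user>-to-master` replication ID, and the sample credentials are my own illustrations, not part of Cloudant's API. It assumes CouchDB-style `_users` documents plus the setup and master databases and the setup/public filter described in this post, and it only builds the requests rather than sending them.

```javascript
// Hypothetical helper: builds (but does not send) the three sign-up requests.
// account, adminUser and adminPass are placeholders for your Cloudant credentials.
function signupRequests(account, adminUser, adminPass, user, userPassword) {
  const base = `https://${adminUser}:${adminPass}@${account}.cloudant.com`;
  const userId = `org.couchdb.user:${user}`;
  return [
    // 1. Copy design docs and starter docs from setup into a new per-user database.
    {
      method: "POST",
      path: "/_replicate",
      body: { source: `${base}/setup`, target: `${base}/${user}`, create_target: true },
    },
    // 2. Store the user's credentials as a CouchDB-style _users document.
    //    (Granting the user access to their own database is not shown here.)
    {
      method: "PUT",
      path: `/_users/${encodeURIComponent(userId)}`,
      body: { _id: userId, name: user, type: "user", roles: [], password: userPassword },
    },
    // 3. Persist a continuous, filtered replication from the user's database to master.
    {
      method: "PUT",
      path: `/_replicator/${encodeURIComponent(user + "-to-master")}`,
      body: {
        source: `${base}/${user}`,
        target: `${base}/master`,
        continuous: true,
        filter: "setup/public",
      },
    },
  ];
}

const plan = signupRequests("myaccount", "admin", "secret", "alice", "alicepass");
for (const { method, path } of plan) {
  console.log(method, path);
}
```

Keeping the steps as plain data makes the ordering explicit: the user's database must exist before anything replicates out of it into master.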

In the examples that follow, I'll use two databases in addition to the per-user ones: setup and master.

N.B.: For more information on the _users database, see Authentication and Authorization.

setup

setup contains design documents with all the indexes, validation functions, and attachments your users will need. When a user initially signs up, you replicate setup into their database, and suddenly they're ready to go. For example, your application might call the _replicate endpoint with a JSON request body like this:

{
  "source": "https://<USERNAME>:<PASSWORD>@<ACCOUNT>.cloudant.com/setup",
  "target": "https://<USERNAME>:<PASSWORD>@<ACCOUNT>.cloudant.com/<USER>",
  "create_target": true
}
Note: <USER> in this context is a unique user who logs into your application, and <USERNAME> represents the credentials used to log in to Cloudant.

That HTTP call returns when the replication is complete, that is, once your user's account has been successfully provisioned. If you want the call to return immediately, or you want later changes to setup to be pushed to your user's database, set continuous: true in the request JSON.
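
To make that call from Node.js (18 or later, which ships a global fetch), something like the following sketch should work. `buildReplicateRequest` and `provisionUser` are hypothetical names. One wrinkle: fetch refuses request URLs that embed credentials, so the sketch authenticates with an `Authorization: Basic` header, while the source and target URLs inside the body keep their credentials for the replicator to use.

```javascript
// Hypothetical sketch: replicate setup into a new per-user database via POST /_replicate.
// Requires Node.js 18+ for the global fetch; account/username/password are placeholders.
function buildReplicateRequest(account, username, password, user, continuous = false) {
  const host = `${account}.cloudant.com`;
  const withCreds = (db) =>
    `https://${encodeURIComponent(username)}:${encodeURIComponent(password)}@${host}/${db}`;
  const body = { source: withCreds("setup"), target: withCreds(user), create_target: true };
  if (continuous) {
    body.continuous = true; // return immediately and keep pushing later changes to setup
  }
  return {
    url: `https://${host}/_replicate`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // fetch rejects URLs that embed credentials, so authenticate with a header instead.
        Authorization: "Basic " + Buffer.from(`${username}:${password}`).toString("base64"),
      },
      body: JSON.stringify(body),
    },
  };
}

async function provisionUser(account, username, password, user) {
  const { url, init } = buildReplicateRequest(account, username, password, user);
  // Without continuous, this resolves only once the one-off replication has finished.
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`replication failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}

// Usage (not run here): provisionUser("myaccount", "admin", "secret", "alice").then(console.log);
```

Splitting request construction from sending keeps the request shape easy to inspect without touching the network.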

master

master contains a portion of every user's information. Let's say we're building a Twitter-like application, where users have a public feed but can also make private posts. In that case, we'd use filtered replication to aggregate only public documents into the master database. Since we want this replication to last a long time, we'll insert a document into the _replicator database to represent it:

{
  "source": "https://<USERNAME>:<PASSWORD>@<ACCOUNT>.cloudant.com/<USER>",
  "target": "https://<USERNAME>:<PASSWORD>@<ACCOUNT>.cloudant.com/master",
  "continuous": true,
  "filter": "setup/public"
}

In continuous replications, the source will ping the target every few seconds to check for new documents. To modify how often those heartbeat requests occur, and otherwise configure replications set up through _replicator, check out our Replication Guide.

What does that filter field refer to? A function under the filters field of the _design/setup document in this user's database: the name setup/public maps to a DESIGN_DOC/FUNCTION_NAME pair in the source database. What would that design doc look like?

{
  "_id": "_design/setup",
  ... // indexes and other design doc fields
  "filters": {
    "public": "function (doc, req) { return doc._deleted === true || doc.public === true; }"
  }
}

The public function returns true if the document's public field is true, or if the document has been deleted; otherwise, false. This function is run for every document the source might replicate; documents for which the function returns false are not replicated.

Why replicate deleted documents? If a user deletes a document, it won't have a public field anymore, but it will have a new _deleted field. Replicating deleted documents deletes them on the target, so that users can reliably delete their information. Otherwise, our public feed might refer to documents that no longer exist in the user's database.
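
To see why the _deleted clause matters, here is a small, purely in-memory simulation. It is not how Cloudant's replicator works internally: the `replicateFiltered` and `resolveFilter` helpers, the `publicOnly` filter, the sample documents, and the simplified list of changes are all hypothetical. It resolves a name like setup/public to `_design/setup` and its `filters.public` function, then replays one user's changes into an empty master twice: once with the public filter and once with a variant that ignores deletions.

```javascript
// In-memory sketch of filtered replication; not Cloudant's actual replicator.
const designDocs = {
  "_design/setup": {
    filters: {
      // Same logic as the public filter above.
      public: (doc) => doc._deleted === true || doc.public === true,
      // Hypothetical variant that forgets about deletions.
      publicOnly: (doc) => doc.public === true,
    },
  },
};

// "setup/public" -> designDocs["_design/setup"].filters.public
function resolveFilter(name) {
  const [ddoc, fn] = name.split("/");
  return designDocs[`_design/${ddoc}`].filters[fn];
}

// Replays a list of changes into target, skipping those the filter rejects.
function replicateFiltered(changes, target, filterName) {
  const filter = resolveFilter(filterName);
  for (const doc of changes) {
    if (!filter(doc)) continue;
    if (doc._deleted) {
      delete target[doc._id];
    } else {
      target[doc._id] = doc;
    }
  }
  return target;
}

// The user posts one public and one private document, then deletes the public one.
const changes = [
  { _id: "post1", public: true, text: "hello world" },
  { _id: "post2", public: false, text: "note to self" },
  { _id: "post1", _deleted: true }, // the tombstone has no public field
];

const withDeletes = replicateFiltered(changes, {}, "setup/public");
const withoutDeletes = replicateFiltered(changes, {}, "setup/publicOnly");
console.log(Object.keys(withDeletes)); // []
console.log(Object.keys(withoutDeletes)); // [ 'post1' ]
```

With the deletion clause, master ends up empty; without it, the deleted post1 lingers in master even though it no longer exists in the user's database.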

Local-First Storage

PouchDB keeps blowing my mind. I use it mostly in the server and client layers of my apps, replicating filtered subsets of the database directly to where I need them. That gives me local access speeds, replicates changes back and forth as they occur, and lets my apps keep on marching even when they lose connectivity.

N.B.: Since Cloudant doesn't yet support CORS, you'll need to have clients replicate through a proxy.

We discussed PouchDB recently, but here's a refresher on how to use it:

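var db = new PouchDB("dbname"),
    remote = "https://<USERNAME>:<PASSWORD>@<ACCOUNT>.cloudant.com/<DATABASE>",
    opts = {
      live: true,  // keep replicating as changes occur (older PouchDB releases called this "continuous")
      retry: true  // reconnect automatically after losing connectivity
    };

// Push local changes up to Cloudant...
db.replicate.to(remote, opts);
// ...and pull remote changes down. (Newer PouchDB also offers db.sync(remote, opts) for both.)
db.replicate.from(remote, opts);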

Now your PouchDB instance is bidirectionally replicating with your Cloudant instance. Any changes made on either end will replicate to the other.

I use this personally to sync images across devices. Quilter replicates changes in my filesystem into a Cloudant database, while an app using PouchDB on my phone via PhoneGap replicates any pictures I take into that same Cloudant database; from there, they replicate down to my filesystem. To avoid clogging my phone, that PhoneGap app only replicates from my phone to Cloudant, not the other way around. If I take a picture when my phone doesn't have service, that's fine; it'll copy the images up when it regains connectivity.

Nifty, eh?

Checkpoints

By default, replication writes checkpoint documents to the source and target databases in order to replicate more efficiently, but doing that requires that you have write access to the source database. If you want to replicate without write permission on the source, set use_checkpoints to false when you initiate the replication.

Setting use_checkpoints to false turns off checkpoint documents, which significantly reduces the replication's efficiency but removes the need for write permissions on the source. You can use this for small replications, such as sharing a small CouchApp. However, because disabling checkpoints hurts efficiency so much, I can't in good faith recommend it for anything more than a few megabytes: if the replication job fails and needs to restart, it will have to start over from the beginning for lack of checkpoints.
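
Here is one way you might encode that advice when building a replication request body. It is only a sketch: the `replicationBody` helper, its `sourceWritable` and `approxBytes` inputs, and the 5 MB cutoff are hypothetical, with the cutoff standing in for the "few megabytes" rule of thumb above.

```javascript
// Hypothetical helper: only disable checkpoints when we must, and refuse large jobs.
const MAX_UNCHECKPOINTED_BYTES = 5 * 1024 * 1024; // stand-in for "a few megabytes"

function replicationBody(source, target, { sourceWritable, approxBytes }) {
  const body = { source, target };
  if (!sourceWritable) {
    if (approxBytes > MAX_UNCHECKPOINTED_BYTES) {
      // Without checkpoints, a failed job restarts from the beginning.
      throw new Error("source is read-only and too large to replicate without checkpoints");
    }
    body.use_checkpoints = false;
  }
  return body;
}

console.log(
  replicationBody(
    "https://<ACCOUNT>.cloudant.com/couchapp",
    "https://<USERNAME>:<PASSWORD>@<ACCOUNT>.cloudant.com/mycopy",
    { sourceWritable: false, approxBytes: 200 * 1024 }
  )
);
```

Leaving checkpoints on whenever the source is writable keeps the efficient default for everything except the small, read-only case.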


Replications blow my mind. Cloudant has plenty of convenient things going for it, like its HTTP API or incremental MapReduce, but the ability for arbitrary nodes -- whether clusters, applications, or browsers and phones -- to keep each other up to date allows me to craft in days applications that would otherwise take teams of engineers months.

As always, if you have any trouble, check our docs, post your question to Stack Overflow, ping us on IRC, or if you'd like to discuss the matter in private, email us at support@cloudant.com.

Happy coding!
