There are lots of convenient things about Cloudant, like its HTTP API or incremental MapReduce, but the thing that really blows my mind is replication, where any number of distributed nodes can masterlessly exchange state, bringing themselves into sync, whether fully or partially. If any nodes lose connection, they can still take writes, and will automatically come back up to speed when they're reconnected.
OK, so what?
Well, database servers can be nodes, so we can create ad-hoc masterless clusters using replication. Datacenters can be nodes, too, so we can give global applications the same speed of access as region-local apps. Browsers and phones can be nodes as well, so we can replicate right into the application, letting the app keep working even while offline.
Cloudant hosts database clusters all around the world, and lets you replicate data between databases. So, how long will it take to replicate a document around the world? Specifically, this image:
We'll insert the image into a database at one end of the world. Then, we'll replicate it to the next, and the next, and the next... until the image has been replicated from one end of the Earth to the other. How long will it take?
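To make that chain concrete, here's a minimal sketch of what one replication document per hop could look like. The `source`, `target`, and `continuous` fields are the standard ones for documents in a `_replicator` database; the cluster URLs, the database name, the `hop-N` ids, and the choice to store each document on the sending cluster are all assumptions for illustration, not necessarily what the experiment's script does.

```javascript
// Sketch: one continuous replication document per hop in the chain.
// URLs, database name, and ids are hypothetical placeholders.
function buildReplicationChain(clusterUrls, dbName) {
  const docs = [];
  for (let i = 0; i < clusterUrls.length - 1; i++) {
    docs.push({
      // Where this document would be saved (assumed: the sending cluster).
      postTo: `${clusterUrls[i]}/_replicator`,
      body: {
        _id: `hop-${i}`,
        source: `${clusterUrls[i]}/${dbName}`,
        target: `${clusterUrls[i + 1]}/${dbName}`,
        continuous: true // keep replicating as new changes arrive
      }
    });
  }
  return docs;
}

const chain = buildReplicationChain(
  ['https://a.example.com', 'https://b.example.com', 'https://c.example.com'],
  'datathatmoves'
);
console.log(JSON.stringify(chain, null, 2));
```

Three clusters give two hops: a to b, then b to c. Once the documents are in place, a write to the first database should ripple down the chain on its own.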
First, get the experiment and its dependencies. Get node.js, then run this:
git clone email@example.com:garbados/datathatmoves.git
cd datathatmoves
npm install
Now, we'll need accounts in every data center:
Get one account per data center
Run cp config.example config.json
Enter each account's credentials into config.json
Run npm start
If you stop the npm start script mid-way, use bin/cleanup.js to reset your databases.
tl;dr: a video of the script
On average, my setup reported the image replicating around the world in under four seconds, or 3738.75 ms. Here's the sample output from one run:
1. create the necessary databases on all clusters...
2. put replication documents on all clusters...
3. upload document to replicate...
4. done!
Lagoon 2 (US West) 1239 ms
Meritage (US West) 1414 ms
Malort (Chicago) 1941 ms
Julep (US East) 2644 ms
Jenever (Amsterdam) 2686 ms
Mead (London) 2985 ms
Sling (Singapore) 3459 ms
In all, took 3459 ms
That's an overestimate, actually, since each number includes both the time it took the document to replicate and the time it took my application to notice that it had replicated.
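Here's a rough sketch of where that extra time can come from. It assumes the application notices a document by checking each cluster at a fixed interval, which is only a guess at how detection works (a changes feed would behave differently), and the numbers are made up for illustration, not measurements from the experiment.

```javascript
// Sketch: if the app checks a cluster every `pollIntervalMs` (an assumption),
// it first sees the document on the first check at or after the moment
// the document truly arrived.
function observedArrival(trueArrivalMs, pollIntervalMs) {
  return Math.ceil(trueArrivalMs / pollIntervalMs) * pollIntervalMs;
}

// Hypothetical numbers, not measurements from the experiment.
const trueArrivalMs = 1200;
const pollIntervalMs = 250;
const observedMs = observedArrival(trueArrivalMs, pollIntervalMs);

console.log(observedMs);                 // 1250
console.log(observedMs - trueArrivalMs); // 50 ms of detection lag
```

Under that assumption the reported time can never be earlier than the true arrival, only later, and the network round trip for each check adds a little more on top.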
Although our test case is small, teams at Samsung, Akamai, Microsoft, and others do this every day with datasets spanning many terabytes in order to get application data as close to the client as possible.
Recent technologies like PouchDB act as replication-ready nodes, bringing data right to the user. For example, Quilter uses PouchDB to sync your filesystem with Cloudant; I use it to sync my images with EggChair. This strategy is sometimes called local-first storage, and I'm personally a big fan of it.
By letting your application make changes locally like that, and letting those changes sync with the cluster over time (that is, about four seconds), the user is never waiting on the server to confirm changes, making their experience quick and seamless. If the app loses connection to the Internet, nothing changes for the users; all their relevant data is local to the app. When they regain connectivity, the app brings itself up to speed automatically.
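If you want a feel for why that works, here's a toy, in-memory sketch of the idea: each node keeps an ordered log of its changes, and replication just sends whatever the other side hasn't seen since the last checkpoint. The `Replica` class and `replicate` function are invented for this illustration. The real protocol tracks revisions and handles conflicts, which this sketch ignores entirely.

```javascript
// A toy node: current documents plus an append-only log of changes.
class Replica {
  constructor(name) {
    this.name = name;
    this.docs = new Map(); // id -> latest value
    this.changes = [];     // ordered list of { id, value }
  }

  put(id, value) {
    this.docs.set(id, value);
    this.changes.push({ id, value });
  }
}

// One-way sync: copy every change after `checkpoint` from source to target,
// then return the new checkpoint to use next time.
function replicate(source, target, checkpoint = 0) {
  for (const { id, value } of source.changes.slice(checkpoint)) {
    target.put(id, value);
  }
  return source.changes.length;
}

const phone = new Replica('phone');
const cloud = new Replica('cloud');
let checkpoint = 0;

phone.put('note-1', 'written online');
checkpoint = replicate(phone, cloud, checkpoint);

// Connection lost: the phone keeps accepting writes locally.
phone.put('note-2', 'written offline');
phone.put('note-1', 'edited offline');

// Connection restored: only the two missed changes are sent.
checkpoint = replicate(phone, cloud, checkpoint);

console.log(cloud.docs.get('note-1')); // edited offline
console.log(checkpoint);               // 3
```

The user's writes always land in the local replica first, so nothing in the app has to wait for the network. Catching up is just a matter of replaying the log from the last checkpoint.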
Best of all? You don't code any of this. PouchDB handles it for you. Cloudant handles it for you. It's handled, so you can code what you love.
A growing list of technologies replicate like Cloudant. CouchDB did it first, but now PouchDB does too, as does Couchbase-Lite, and the upcoming Cloudant Sync. As we and this developer community advance the replication protocol, we'll also develop more and more tools for more and more environments to replicate anything, anywhere.
Replication opens the door to build robust, distributed, masterless systems that operate seamlessly in the face of connectivity issues, hardware failure, and even data conflicts. I am excited beyond words to stand at such a frontier :D
Happy coding!
Create an account and try Cloudant yourself