Join the fun, have fun with JOINs

By Kim Stebel

If you’ve mainly worked with SQL databases in the past and are just coming to grips with Cloudant, you’ve probably run into the question of how to do JOINs. In this tutorial, we will start from a relational view of data and explain how to structure and query data in a Cloudant database.


source: http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins

Create an account and try Cloudant DBaaS yourself

The everything-in-one-document approach

In some cases, approximating JOINs in a document-oriented database like Cloudant is outright trivial. If you want to model a one-to-n relationship, you can put all n-related documents into the document they belong to. For example, if you have employees and their skillsets in your database and each employee has several skills, you can put them into an array in the employee document like so:

{
  "type": "Employee",
  "name": "Anna",
  "skills": [
    { "name": "java", proficiency: 10 },
    { "name": "couchdb",  proficiency: 5 }
  ]
}

Is this a good idea?

It can be, if:

  • The document does not get so big that it impairs performance. This can be mitigated somewhat by using database views or show functions.
  • The information in the inner document only appears there and does not need to be duplicated into other documents, or such duplication is acceptable for your application. For example, if you wanted to rename the "couchdb" skill to "couchdb / cloudant", you would have to update it in all employee documents.
  • The document does not get updated concurrently. If it does, there will likely be unnecessary conflicts that will need to be resolved by the application.

So while this approach is very simple, it is too simple for most purposes.

How to do it better

Instead, you can use an approach that more closely mimics how you solve this problem in a relational database. Each employee has their own document which contains a field for the department he or she belongs to:

{
  "type":"employee",
  "name": "Anna",
  "department": "Marketing"
}

This fixes the issues of the previous solution, but now you need to write a view to get all information about the department — including a list of its employees — in a single query. Here is what the map function looks like:

function(doc) {
  if (doc.type == “employee”) {
    emit([doc.department, 2], null);
  }
  if (doc.type == "department") {
    emit([doc._id, 1], null);
  }
}

You can then perform a range query on the view using start_key=["marketing", 1] and end_key=["marketing", 2] and include_docs=true, yielding the department document as well as all employee documents for the marketing department in one query. As a bonus, you could add a reduce function that lets you count the number of employees by department.

n-to-m relationships

Using a variant of this technique, you can even model n-to-m relationships. In this example, you have a database for employees and projects. Each employee can work on one or more projects and each project can have one or more employees working on it. There is one JSON document for each employee, one for each project, and one for each employee-project relationship.

JSON documents for employees look like this:

{
  “type”: “employee”,
  "_id": "Anna"
}

This is a document describing a project:

{
  “type”: “project”,
  “_id”: “Alpha”
}

The following document represents the fact that Anna works on project Alpha:

{
  "type": "works on",
  "employee_id": "Anna",
  "project_id": "Alpha"
}

Now you create a view that lets you query for all employees working on a given project. The view again uses an array as its key. Each key array has two elements: The ID of the project and the type of the document emitted. Since the documents that model the relationship are not the ones you would like to retrieve in a query, the view has to emit the IDs of the documents that should be included — in this case the IDs of the employee documents. This is done by using an object with an _id field as the emitted value.

function(doc) {
  if (doc.type == “works on”) {
    emit([doc.project_id, 2], {
      "_id": doc.employee_id
    });
    emit([doc.employee_id, 2], {
      "_id": doc.project_id
    });
  }
  if (doc.type == "project") {
    emit([doc._id, 1], null);
  }
  if (doc.type == "employee") {
    emit([doc._id, 1], null);
  }
}

Querying this view with include_docs=true, startkey=["alpha", 1] and endkey=["alpha", 2] will give you information about the alpha project as well as all employee working on it. Querying with include_docs=true, startkey=["anna", 1] and endkey=["anna", 2] will give you the document for Anna as well as all projects she is working on.

Let us know in the comments if you'd like to join the fun, and we'll update this post with an example data set so you can try it for yourself.

Sign Up for Updates!

Recent Posts