SQL, Cloudant, and Going Schemaless

By Diana Thayer

Shortly after I arrived in Boston, I attended a meetup where a local programmer demoed an app he'd built using a NoSQL database. When he said it was schemaless, the room burst into confusion. Several people, in different words, asked "You mean you just work with unstructured data? How is that viable?" The presenter might as well have told them that, well, your body doesn’t really need bones.

But he gave a good response, which I kept close at heart:

"Schemaless doesn't mean you can't have a schema. It means you have the flexibility to define it how you want, when you want."

Simon Metson wrote about one way of giving your databases structure. In this post, I’ll cover another way suited to going schemaless from an SQL setup.

If you're coming from a background in SQL, think of Cloudant documents as SQL rows, but which can each have an arbitrary number of columns. Cloudant databases can store any number of documents regardless of schema, so a single database can host multiple tables. This leads us to a couple of guidelines for moving data from a SQL system to a NoSQL solution:

  • If it was a row, make it a document.

  • If it was a table, give every document therein a unique "type" identifying it as a member of that table. (Ex: every row from "blog" table becomes a document with type: "blog")

N.B.: If you need to model data in the Third Normal Form, you can set document IDs explicitly (ex: "[user]-[role]" or "garbados-admin") to prevent duplication. If you don’t set an ID, Cloudant will generate one.

When all our documents are in the same database, they're much easier to query and manipulate with views. JOIN-like operations are much easier when all your documents are in the same database. Check these StackOverflow threads for examples of handling relational data in CouchDB and Cloudant:

If you can get your SQL data into a CSV, or even a series of them, you can use this Python script to upload them to Cloudant, using requests:

import requests
import csv
import json

# configuration values
config = dict(
    username="your_cloudant_username", 
    password="your_cloudant_password",
    database="database_to_upload_into")

# dict of table names and their csv representations
csv_lookup = {
    # "table_name": "path/to/file.csv"
    "example_table": "data.csv",
}

# dict of request data, which we'll upload to Cloudant
requests_data = {}

for table, filepath in csv_lookup.iteritems():
    request_data = dict(docs=[]) 
    # get our data
    with open(filepath, 'rb') as f:
        reader = csv.DictReader(f)
        # put into request body    
        for row in reader:
            row['type'] = table # add doctype based on table
            request_data['docs'].append(row)
    requests_data[table] = request_data

# authenticate with cloudant via cookie
auth = "name={username}&password={password}".format(**config)
auth_url = "https://{username}.cloudant.com/_session".format(**config)
auth_headers = {"Content-Type": "application/x-www-form-urlencoded"}
r = requests.post(auth_url, 
                 data=auth,
                 headers=auth_headers)
# save auth cookie
cookies = r.cookies

# upload!
upload_url = "https://{username}.cloudant.com/{database}/_bulk_docs".format(**config)
upload_headers = {"Content-Type":"application/json"}
for table, request_data in requests_data.iteritems():
    r = requests.post(upload_url, 
                     data=json.dumps(request_data), 
                     cookies=cookies,
                     headers=upload_headers)
    # if it worked, print the results so we can seeeeee
    if r.status_code in [200, 201, 202]: # on OK, Created or Accepted
        print "Upload success:", table
    # problems?! D:
    else:
        print r.status_code
        print r.text
        break

However, that's only one way of doing it. Schemaless means flexibility, so you may find your data requires a different architecture. If you have any questions about how your current schema might map to Cloudant, let us know on IRC in #cloudant, by email at support@cloudant.com, or write a question on StackOverflow.

Alternatively, our friends at Foxweave provide a tool that can migrate from all kinds of systems into Cloudant, including MySQL, MS SQL, PostgreSQL, and others. Check it out!

Happy coding!

Create an account and try it yourself

Sign Up for Updates!

Recent Posts