Webizing your database with Linked Data in JSON-LD on Cloudant

By admin

The Web, as of course you know, is covered with links. These links connect related pages of information and allow the person (or machine) browsing the Web to follow them to gather more information. Linked Data is that same idea, brought to data.

Data is often stored and distributed in esoteric formats. You made up the data formats for your application, or the developer before you did. I made up the formats for mine, and so on.

Even when the data is available in a parseable format (CSV, XML, JSON, etc.), there is often little provided with the data to explain what's inside. If descriptive metadata is provided, it's often only meant for the next developer to read when implementing yet another parser for said data.

Really, it's all quite abysmal...

Enter JSON-LD! JSON-LD (JSON for Linked Data) is a simple way of providing semantic meaning for the terms and values in a JSON document. Providing that meaning alongside the JSON means that the next developer's application can parse and understand the JSON you give them.


Webized data in a Webized Database

Way back in 1998, Sir Tim Berners-Lee (a.k.a. TBL, inventor of the Web you're using to read this blog post) presented the idea of "Webizing" things:

The essential process in webizing is to take a system which is designed as a closed world, and then ask what happens when it is considered as part of an open world. Practically, this effect on a computer language is to replace the names/tokens/identifiers for URIs. Thus, where before reference could only be made to something in the same document/program/module one can with equal ease make reference to something in a different one somewhere in that abstract space which is the Web.

It is now 2014, and the Webized Database is a reality! In fact, it has been a reality since 2010, when Cloudant went online for the general public.

Cloudant's HTTP API puts it squarely on the right-hand side of TBL's webize table:

x | webize(x)
Hypertext | WWW
Data | Linked data
Top-down structured design | Bottom-up ontology design
Data Hiding | Data Re-use
Goto Considered Harmful | Goto drives the economy
unix file system | ACL'd r/w linked data
Large-scale structure: Hierarchy | Large-scale structure: Scale free
"Tired" | "Wired"

Every Cloudant database has a URL. Every document inside that database has a URL. The results of every MapReduce view or Full-Text Search index have a URL. Every...you get the idea. ;)

Given that URLs are the stock in trade of a Cloudant database, let's look at the additional magic JSON-LD can bring to your data: more meaning, and more connections to concepts and other data on the Web.


First, let's set some @context

You likely already have a load of JSON documents somewhere...or you know of some social network APIs that could get you some.

The key feature JSON-LD provides is the ability to give the keys and values in those JSON documents semantic meaning. It does this by mapping the keys in your JSON documents to URLs that a developer or an application can use to learn what to expect the value to be and how that value should be treated (its type, its schema, its relationship to other objects, etc.).

This globally addressable metadata about a document is provided alongside the existing JSON, inside an @context object. The "alongside" bit is key: it means you can add meaning and value to your data when distributing it (externally, or to others within your organization) without changing your existing JSON format.

JSON-LD libraries often let you provide the raw JSON and the context document separately. The spec, however, also allows you to ship them together.
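As a minimal sketch of the "ship them together" option, the hypothetical helper below (withContext is an illustrative name, not part of any JSON-LD library) returns a copy of a plain JSON document with an embedded @context, leaving the original object untouched. The FOAF IRIs are the same ones used later in this post.

```javascript
// Hypothetical helper: returns a NEW object holding the original document's
// fields plus an embedded "@context". This is a shallow copy, so the input
// object itself is never modified and existing consumers keep working.
function withContext(doc, context) {
  return Object.assign({}, doc, { "@context": context });
}

// A plain JSON document, exactly as some application already stores it.
const person = { first_name: "Benjamin", last_name: "Young" };

// A context mapping those keys to FOAF vocabulary terms.
const foafContext = {
  first_name: "http://xmlns.com/foaf/0.1/givenName",
  last_name: "http://xmlns.com/foaf/0.1/familyName"
};

const linked = withContext(person, foafContext);
console.log(JSON.stringify(linked, null, 2));
console.log("@context" in person); // false: the original is unchanged
```

Because the context only adds a key, any code that ignores "@context" reads the document exactly as before.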

Let's look at how to do that now, with some existing JSON documents, and without modifying anything we've previously written to our database.

Think of it as "append only awesomeness."

_show me some @context!

If you've used Cloudant, you're likely familiar with the idea of a Design Document for storing your MapReduce View functions and Full-Text Search indexing functions.

Design Documents can also contain Show Functions. These functions provide an opportunity to transform or otherwise modify a JSON document on its way out.

NOTE: _show functions aren't the most performance-friendly piece of the Cloudant puzzle. However, they are very cache-friendly. Their responses include an ETag header, which a client can store and send back in an If-None-Match request header; if nothing has changed, the server returns a much faster 304 Not Modified response with an empty body. Browsers do this for you automatically.
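To make that caching behavior concrete, here is a minimal client-side sketch of a conditional GET. The helper names (buildRequestHeaders, handleResponse) and the in-memory Map used as a cache are assumptions for illustration; a browser's HTTP cache does the equivalent work for you.

```javascript
// Hypothetical in-memory cache: url -> { etag, body }
const cache = new Map();

// Build headers for a GET. If we have seen this URL before, send its ETag
// in If-None-Match so the server can answer 304 Not Modified.
function buildRequestHeaders(url) {
  const headers = { Accept: "application/json" };
  const entry = cache.get(url);
  if (entry && entry.etag) {
    headers["If-None-Match"] = entry.etag;
  }
  return headers;
}

// Decide which body to use once the response arrives.
// 304: the response body is empty, so reuse the cached copy.
// 200: remember the new ETag and body for next time.
function handleResponse(url, status, etag, body) {
  if (status === 304) {
    const entry = cache.get(url);
    if (!entry) throw new Error("304 received without a cached copy of " + url);
    return entry.body;
  }
  if (status === 200) {
    if (etag) cache.set(url, { etag: etag, body: body });
    return body;
  }
  throw new Error("Unexpected status " + status + " for " + url);
}
```

In Node.js 18+, you could pass buildRequestHeaders(url) as the headers option to fetch, then hand res.status, res.headers.get("ETag"), and the response text to handleResponse.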

Friend-of-a-Friend at Cloudant

Here's some initial JSON I made up (somewhat) at random for this blog post. It states that I know Simon Metson, Max Thayer, and Mike Miller:

{
  "_id": "BenjaminYoung",
  "first_name": "Benjamin",
  "last_name": "Young",
  "knows": ["SimonMetson", "MaxThayer", "MikeMiller"]
}

You can look at that and make sense of it. However, if I gave that same JSON to an application that expected a name object with given and surname fields, new custom processing code would have to be written. That code would be based either on the developer's understanding of what I meant by first_name, or on some documentation that hopefully explained it more clearly.
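For illustration, the kind of custom processing code described above might look like this sketch. The target shape (a name object with given and surname fields) is a hypothetical example, and every mapping is hard-coded from a human reading of the source keys, which is exactly the guesswork a context is meant to remove.

```javascript
// Hypothetical one-off adapter for an application that expects
// { id, name: { given, surname } }. Each line encodes a human guess
// about what the source keys mean.
function toNameObject(doc) {
  return {
    id: doc._id,
    name: {
      given: doc.first_name,  // assumed: first_name is the given name
      surname: doc.last_name  // assumed: last_name is the family name
    }
  };
}

const adapted = toNameObject({
  _id: "BenjaminYoung",
  first_name: "Benjamin",
  last_name: "Young",
  knows: ["SimonMetson", "MaxThayer", "MikeMiller"]
});
console.log(adapted.name.given, adapted.name.surname); // Benjamin Young
```

Note that knows is silently dropped: the adapter only covers the fields someone thought to map.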

So, let's give that thing some @context!

We'll be using the FOAF (Friend of a Friend) vocabulary, which describes people, who they know, and so on.

{
  "@context": {
    "@base": "http://bigbluehat.cloudant.com/foaf/_design/json-ld/_show/foaf/",
    "_id": "@id",
    "first_name": "http://xmlns.com/foaf/0.1/givenName",
    "last_name": "http://xmlns.com/foaf/0.1/familyName",
    "knows": {
      "@id": "http://xmlns.com/foaf/0.1/knows",
      "@type": "@id"
    }
  }
}

That bit of @context provides valuable metadata to anyone who understands the FOAF vocabulary (or to anyone, or anything, that can follow the URLs to learn what they mean). The person data in my Cloudant database is no longer just some esoteric JSON I made up while writing a blog post about people. Read through the FOAF context, it has well-defined meaning.
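To see what @base and "@type": "@id" buy you, the sketch below resolves the relative values in knows the way a JSON-LD processor treats IRI-typed values: against the context's @base. It uses the standard WHATWG URL class (a global in browsers and Node.js); the helper name resolveKnows is hypothetical, and the context is the one shown above.

```javascript
// Hypothetical helper: turn relative "@id"-typed values into absolute IRIs
// by resolving them against the context's "@base".
function resolveKnows(doc, context) {
  const base = context["@base"];
  return {
    subject: new URL(doc._id, base).href,       // "_id" is aliased to "@id"
    predicate: context.knows["@id"],            // the FOAF "knows" property
    objects: doc.knows.map((name) => new URL(name, base).href)
  };
}

const context = {
  "@base": "http://bigbluehat.cloudant.com/foaf/_design/json-ld/_show/foaf/",
  "_id": "@id",
  "knows": { "@id": "http://xmlns.com/foaf/0.1/knows", "@type": "@id" }
};
const doc = { _id: "BenjaminYoung", knows: ["SimonMetson", "MaxThayer", "MikeMiller"] };

console.log(resolveKnows(doc, context));
```

Each resolved IRI is itself a _show URL, so, assuming those person documents exist in the database, following a knows link would return another contextualized document.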

Here's the _show function to add that:

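function (doc, req) {
  // Show functions receive a null doc when the requested ID doesn't exist.
  if (!doc) {
    return {"code": 404, "json": {"error": "not_found"}};
  }
  var tmp = doc;
  // req.path is an array of URL path segments, e.g.
  // ["foaf", "_design", "json-ld", "_show", "foaf", "BenjaminYoung"]
  var path = req.path.slice(); // copy, so req.path itself isn't modified
  path.pop(); // drop the current doc name
  // Relative "@id" values (like "SimonMetson") resolve against this _show URL.
  var base = "http://" + req.headers.Host + "/" + path.join("/") + "/";
  tmp['@context'] = {
    "@base": base,
    "_id": "@id",
    "first_name": "http://xmlns.com/foaf/0.1/givenName",
    "last_name": "http://xmlns.com/foaf/0.1/familyName",
    "knows": {
      "@id": "http://xmlns.com/foaf/0.1/knows",
      "@type": "@id"
    }
  };
  // Strip fields that aren't part of the person data.
  delete tmp.couchapp;
  delete tmp._revisions;
  return {"json": tmp};
}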

Note: feel free to hack on this further. This sample code is in the Cloudant Labs Spellbook.

Accessing that newly contextualized JSON document looks like this:

http://bigbluehat.cloudant.com/foaf/_design/json-ld/_show/foaf/BenjaminYoung

The response, as you likely guessed, looks like this:

{
  "_id": "BenjaminYoung",
  "_rev": "2-c81a120b45cdb4330673d4ff615cc020",
  "first_name": "Benjamin",
  "last_name": "Young",
  "knows": [
    "SimonMetson",
    "MaxThayer",
    "MikeMiller"
  ],
  "@context": {
    "@base": "http://bigbluehat.cloudant.com/foaf/_design/json-ld/_show/foaf/",
    "_id": "@id",
    "first_name": "http://xmlns.com/foaf/0.1/givenName",
    "last_name": "http://xmlns.com/foaf/0.1/familyName",
    "knows": {
      "@id": "http://xmlns.com/foaf/0.1/knows",
      "@type": "@id"
    }
  }
}

We now have a properly contextualized document with real meaning. Changing or extending that meaning is simply a matter of making other @context objects available.
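For example, the same stored documents could be served with a second context by adding another show function to the design document. The sketch below is hypothetical: the show name schemaorg is an assumption, and it maps the same keys onto schema.org's givenName, familyName, and knows properties instead of FOAF. Only the context differs; the stored JSON is untouched.

```javascript
// Hypothetical second show function (e.g. stored as shows.schemaorg in the
// same design document). Only the function expression goes into the design
// document; the const binding is just for trying it out locally.
const schemaorgShow = function (doc, req) {
  if (!doc) {
    return { code: 404, json: { error: "not_found" } };
  }
  var path = req.path.slice(0, -1); // drop the current doc name
  var base = "http://" + req.headers.Host + "/" + path.join("/") + "/";
  doc["@context"] = {
    "@base": base,
    "_id": "@id",
    "first_name": "http://schema.org/givenName",
    "last_name": "http://schema.org/familyName",
    "knows": { "@id": "http://schema.org/knows", "@type": "@id" }
  };
  return { json: doc };
};
```

Clients that prefer schema.org terms would request .../_show/schemaorg/BenjaminYoung, while FOAF consumers keep using the original URL.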

Once contextualized, the data can be parsed by a JSON-LD library and transformed into RDF triples for graph-based processing. Alternatively, a consumer can simply use the data type information to store it properly in their own data store (assuming that store lacks the flexibility of a schema-free document database like Cloudant).
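As a rough sketch of the RDF step, the function below turns the example document and its context into N-Triples lines. It deliberately handles only the context features used in this post (plain term-to-IRI strings, @id-typed terms, an @id alias, and @base); a real JSON-LD library implements the full expansion and RDF conversion algorithms and is what you'd use in practice. The function name toNTriples is hypothetical.

```javascript
// Hypothetical, deliberately limited converter: supports only the context
// features used in this post. Use a real JSON-LD library for anything else.
function toNTriples(doc, context) {
  const base = context["@base"];
  // Find the key aliased to "@id" (here "_id") to build the subject IRI.
  const idKey = Object.keys(context).find((k) => context[k] === "@id");
  const subject = "<" + new URL(doc[idKey], base).href + ">";
  const lines = [];

  for (const [key, value] of Object.entries(doc)) {
    const def = context[key];
    // Skip keywords, the id alias, and unmapped keys (e.g. "_rev").
    if (key.startsWith("@") || key === idKey || def === undefined) continue;

    const predicate = typeof def === "string" ? def : def["@id"];
    const isIri = typeof def === "object" && def["@type"] === "@id";

    for (const v of Array.isArray(value) ? value : [value]) {
      const object = isIri
        ? "<" + new URL(v, base).href + ">"
        : JSON.stringify(String(v)); // JSON escaping is adequate for simple strings
      lines.push(subject + " <" + predicate + "> " + object + " .");
    }
  }
  return lines;
}

const context = {
  "@base": "http://bigbluehat.cloudant.com/foaf/_design/json-ld/_show/foaf/",
  "_id": "@id",
  "first_name": "http://xmlns.com/foaf/0.1/givenName",
  "last_name": "http://xmlns.com/foaf/0.1/familyName",
  "knows": { "@id": "http://xmlns.com/foaf/0.1/knows", "@type": "@id" }
};
const doc = {
  _id: "BenjaminYoung",
  _rev: "2-c81a120b45cdb4330673d4ff615cc020",
  first_name: "Benjamin",
  last_name: "Young",
  knows: ["SimonMetson", "MaxThayer", "MikeMiller"]
};

console.log(toNTriples(doc, context).join("\n"));
```

For the example document this yields five triples: one each for givenName and familyName, and three for knows. The _rev field has no mapping in the context, so it is left out of the graph.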

Conclusion

Cloudant provides a "webized" database, ready for global data sharing with contextualized meaning that can be used for everything from data interchange to automated form filling and processing. We'll cover more of that greatness in future posts.

Special Thanks

Also, a quick thank you to dlongley and m4nu for their fabulous help in #json-ld on irc.freenode.net. They filled in the gaps in my Linked Data know-how quite swimmingly. Thanks to you both!
