A Technical Look at Cloudant Search

By David Hardtke

As we just announced, Cloudant has added fault-tolerant, federated full-text search to our document database offering.

Cloudant Search builds upon our existing database to offer full-text indexing and search. It makes use of the Apache Lucene query syntax and large portions of the Lucene framework, but uses a new data storage model that lives entirely within the CouchDB database framework. The inverted index (the map of terms to the documents in which they appear) is calculated using our incremental map-reduce framework. This means the database is always available for document insertion and reading even if the indexing task backs up, whereas other search solutions only allow documents to be added as quickly as they can be indexed. Newly indexed documents are immediately searchable; there is no latency while the inverted indices are rebuilt.
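To make the idea of an incrementally maintained inverted index concrete, here is a minimal Python sketch. It is not Cloudant's implementation: the names map_doc and InvertedIndex are hypothetical, and the whitespace tokenizer stands in for a real Lucene analyzer. It only illustrates that adding a document touches that document's terms rather than rebuilding the whole index.

```python
from collections import defaultdict


def map_doc(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term in a document."""
    # Assumption: lowercase + whitespace split stands in for a real analyzer.
    for term in sorted(set(text.lower().split())):
        yield term, doc_id


class InvertedIndex:
    """Hypothetical in-memory index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Incremental update: only the new document's pairs are merged in;
        # postings for other documents are left untouched.
        for term, emitted_id in map_doc(doc_id, text):
            self.postings[term].add(emitted_id)

    def search(self, term):
        # Return matching document ids in a stable order.
        return sorted(self.postings.get(term.lower(), set()))


index = InvertedIndex()
index.add("doc1", "Fault tolerant search")
index.add("doc2", "Federated search on CouchDB")
```

After these two calls, index.search("search") returns both document ids, and a third add() would only touch the postings for the terms in the third document.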

Documents within Cloudant are distributed to multiple nodes using consistent hashing in a ring-like structure. The inverted index data associated with each document is saved on the same node as the document itself. Searches are done at the shard level, and results are combined in a federation layer. If a particular node in the cluster is offline, other copies of the shard are used. Cloudant uses the same underlying distribution and federation technology as the open-source BigCouch project.
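The placement and federation steps described above can be sketched roughly as follows. This is an illustrative toy, not BigCouch's actual code: Ring, pick, and federate are hypothetical names, MD5 is an arbitrary choice of hash, and the sketch assumes the number of copies does not exceed the number of nodes.

```python
import bisect
import hashlib
import heapq


def _hash(key):
    # Assumption: any stable hash works here; MD5 is used only for illustration.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hash ring that stores each document on `copies` nodes."""

    def __init__(self, nodes, copies=2):
        self.copies = copies
        self.points = sorted((_hash(node), node) for node in nodes)
        self.keys = [point for point, _ in self.points]

    def nodes_for(self, doc_id):
        # Walk clockwise from the document's position on the ring.
        start = bisect.bisect(self.keys, _hash(doc_id)) % len(self.points)
        return [self.points[(start + i) % len(self.points)][1]
                for i in range(self.copies)]

    def pick(self, doc_id, offline=()):
        # Fall back to another copy when the preferred node is offline.
        for node in self.nodes_for(doc_id):
            if node not in offline:
                return node
        raise RuntimeError("no live copy of the shard")


def federate(shard_results, limit):
    """Merge per-shard lists of (score, doc_id) hits into one top-`limit` list."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(limit, all_hits)


ring = Ring(["node-a", "node-b", "node-c"], copies=2)
owners = ring.nodes_for("doc1")
```

Because each shard returns its own scored hits, the federation layer only needs a merge by score; it never has to see the full index of any single node.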

Important features of Cloudant Search include:

  • Text analysis can use existing Java-based Lucene analyzers. Java code can be loaded into the database and stored along with the search indices.
  • Support for indexing code written in many languages (Java, JavaScript, Python, Erlang, Ruby, etc.). User indexing code is also stored in the database.
  • Ability to run ad-hoc queries against multiple independent map-reduce views of the data.
  • Full support for all JSON types during indexing (Strings, Numbers, JSONObjects). Pure Lucene search solutions only support String tokens.
  • Support for the following query types: Boolean, Range (text, numerical, date, and compound key ranges), Phrase, Sorted, Prefix, Dates.
  • Ability to query a “stale” snapshot of the database in cases where indexing has not caught up to document insertion.
  • Horizontal scalability through the addition of database nodes.
  • HTTP-based search API consistent with the CouchDB view API.

We see several distinct use cases where Cloudant Search will be superior to existing products. Since Cloudant runs both as a low-cost hosted service and as a dedicated deployment for the largest customers, developers can build products around Cloudant Search and know that they will scale to terabytes of indexed data and high IOPS rates. An ideal use case would be mobile applications that require real-time geographical bounding-box search coupled with text queries. Cloudant Search's native support for numerical index types allows this. Cloudant Search's reliance on CouchDB views also opens up unique possibilities: multiple algorithms can be used simultaneously to search, score, and sort documents.
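As a rough illustration of the bounding-box use case, the sketch below composes a query string in Lucene query syntax that combines a phrase with two numeric ranges. The helper name bounding_box_query and the field names text, lat, and lon are hypothetical; the real field names depend on how your own indexing code names them, and this post does not confirm the exact form Cloudant Search accepts.

```python
def bounding_box_query(text, south, west, north, east):
    """Build a Lucene-syntax query: a phrase plus latitude/longitude ranges."""
    # Escape backslashes and double quotes so the phrase stays well-formed.
    phrase = text.replace("\\", "\\\\").replace('"', '\\"')
    # Assumption: "text", "lat", and "lon" are fields emitted by your indexer.
    return (
        f'text:"{phrase}" '
        f"AND lat:[{south} TO {north}] "
        f"AND lon:[{west} TO {east}]"
    )


query = bounding_box_query("coffee", 37.7, -122.5, 37.8, -122.4)
```

The inclusive [low TO high] form is standard Lucene range syntax; it is only useful for coordinates when the index treats those fields as numbers rather than strings, which is what native numerical index types provide.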

To get started with Cloudant Search, you first need to specify how your database is indexed by uploading a new CouchDB design document (indexing instructions). Then you can use the search API. There is also a full description of the query syntax.
