DEV Community

Jordan Soo Yen Yih


Query in Apache CouchDB: Clouseau

In previous articles, we talked about how to query with CouchDB Views and Mango Query. Both methods work very well and cover a lot of use cases. So why Clouseau?

CouchDB Views and Mango Query are still quite limited when it comes to search. Many searches are complex, which makes View functions and Mango indexes more complex and harder to build, while they still need to perform well. You could build your own search engine from scratch on top of Mango and Views, but it is very tough and takes a lot of resources: text preprocessing, tokenization, ranking algorithms, and so on...😰

Thankfully, Clouseau takes CouchDB search to the next level🥳

Starting with CouchDB 3, CouchDB can build and query full-text search indexes using an external Java service (Clouseau) that embeds Apache Lucene. If you are already familiar with Elasticsearch, you will find CouchDB + Clouseau easy to pick up, as both use the Lucene query syntax.

Installation

To set up Clouseau to work with CouchDB, refer to my tutorial post or the official docs here.

How to use?

Much like with Mango Query, you first define an index, in this case a search index function inside a design document, and then query against that index.

Example Search Index Function:

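function(doc) {
    // The "default" field is searched when a query names no field
    index("default", doc._id);
    if (doc.title) {
        // Stored: the title is returned under "fields" in each search hit
        index("title", doc.title, {"store": true});
    }
    if (doc.status) {
        // Not stored: searchable, but not returned in results
        index("status", doc.status, {"store": false});
    }
}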
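One way to sanity-check an index function before uploading it is to run it locally in Node with a stand-in index() that simply records what it is asked to index. This is a hypothetical test harness (recordIndexCalls is not part of CouchDB or Clouseau), and it does not reproduce how Clouseau actually executes or analyzes the fields; it only shows which fields a given document would produce and whether each is stored. The sample document reuses Post One's ID, title and status from the results later in this post.

```javascript
// Hypothetical local harness: run a search index function with a stubbed
// index() that records its calls instead of sending them to Clouseau.
function recordIndexCalls(indexFnSource, doc) {
  const calls = [];
  const index = (field, value, options = {}) => {
    // "store" defaults to false, matching CouchDB's documented default
    calls.push({ field, value, store: options.store === true });
  };
  // Compile the function source with our stub "index" in scope
  const fn = new Function("index", `return (${indexFnSource});`)(index);
  fn(doc);
  return calls;
}

const source = `function(doc) {
    index("default", doc._id);
    if (doc.title) {
        index("title", doc.title, {"store": true});
    }
    if (doc.status) {
        index("status", doc.status, {"store": false});
    }
}`;

const calls = recordIndexCalls(source, {
  _id: "c2ec3b79-d9ac-45a8-8c68-0f05cb3adfac",
  title: "Post One Title",
  status: "submitted",
});
console.log(calls);
```

A document without a title or status would only produce the "default" entry, since the if-guards skip the other index() calls.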

The full design document:

{
  "_id": "_design/search",
  "_rev": "1-15807c8c7e310b566c0a41997d79b7fd",
  "views": {},
  "language": "javascript",
  "indexes": {
    "posts": {
      "analyzer": "standard",
      "index": "function(doc) {\r\n    index(\"default\", doc._id);\r\n    if (doc.status) {\r\n        index(\"status\", doc.status, { \"store\": false });\r\n    }\r\n    if (doc.title) {\r\n        index(\"title\", doc.title, {\"store\": true});\r\n    }\r\n}"
    }
  }
}

The search index function above lets us search by document ID, title, and status. Because the document ID is indexed into the special "default" field, a query that does not name a field searches by document ID. The "store" boolean passed in the third argument controls whether the field's value is returned in the search results; it defaults to false.
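Hand-escaping the "index" string in the design document above (all those \r\n and \") is error-prone. A minimal sketch, assuming you assemble the design document in JavaScript before uploading it: write the index function as ordinary code and let Function.prototype.toString produce the source string. buildSearchDesignDoc is a hypothetical helper, and "_rev" is left out on the assumption that you are creating the document for the first time (CouchDB assigns the revision).

```javascript
// The index function as normal code. index() is supplied by CouchDB's
// search runtime; it is not defined here, and this function is never
// called locally, only converted to source text.
const postsIndex = function (doc) {
  index("default", doc._id);
  if (doc.status) {
    index("status", doc.status, { "store": false });
  }
  if (doc.title) {
    index("title", doc.title, { "store": true });
  }
};

// Hypothetical helper: wraps a function's source in a search design doc.
function buildSearchDesignDoc(name, indexFn, analyzer = "standard") {
  return {
    _id: "_design/search",
    language: "javascript",
    indexes: {
      [name]: {
        analyzer: analyzer,
        // toString() returns the function's source text
        index: indexFn.toString(),
      },
    },
  };
}

const ddoc = buildSearchDesignDoc("posts", postsIndex);
// JSON.stringify performs the escaping that was done by hand above
console.log(JSON.stringify(ddoc, null, 2));
```

The JSON printed here has the same shape as the design document above, so it can be sent as the body of a PUT to /YOUR_DATABASE_NAME/_design/search.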

GET /YOUR_DATABASE_NAME/_design/search/_search/posts?q=ea885d7d-7af2-4858-b7bf-6fd01bcd4544

Result:

{
  "total_rows": 1,
  "bookmark": "g2wAAAABaANkABFjb3VjaGRiQDEyNy4wLjAuMWwAAAACYQBuBAD_____amgCRj_6gH-AAAAAYQFq",
  "rows": [
    {
      "id": "ea885d7d-7af2-4858-b7bf-6fd01bcd4544",
      "order": [
        1.6563715934753418,
        1
      ],
      "fields": {
        "title": "Post Two Title"
      }
    }
  ]
}
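To consume a response like the one above, you typically only need each hit's ID, its relevance and its stored fields. A small sketch using the example result verbatim; summarizeHits is a hypothetical helper, and reading order[0] as the relevance score is an assumption that holds for the default (relevance) sort, not for custom sorts. Note that "status" never appears under "fields", because it was indexed with "store": false.

```javascript
// The example search response from above, copied as-is.
const response = {
  total_rows: 1,
  bookmark: "g2wAAAABaANkABFjb3VjaGRiQDEyNy4wLjAuMWwAAAACYQBuBAD_____amgCRj_6gH-AAAAAYQFq",
  rows: [
    {
      id: "ea885d7d-7af2-4858-b7bf-6fd01bcd4544",
      order: [1.6563715934753418, 1],
      fields: { title: "Post Two Title" },
    },
  ],
};

// Hypothetical helper: flatten each search row into a simple object.
function summarizeHits(res) {
  return res.rows.map((row) => ({
    id: row.id,
    // Assumption: with the default relevance sort, order[0] is the score
    score: row.order[0],
    // Only fields indexed with {"store": true} are present in row.fields
    title: row.fields ? row.fields.title : undefined,
  }));
}

console.log(summarizeHits(response));
```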

Now let's search by the post's status:

GET /YOUR_DATABASE_NAME/_design/search/_search/posts?q=status:submitted

Result:

{
  "total_rows": 2,
  "bookmark": "g2wAAAABaANkABFjb3VjaGRiQDEyNy4wLjAuMWwAAAACYQBuBAD_____amgCRj_0mliAAAAAYQJq",
  "rows": [
    {
      "id": "c2ec3b79-d9ac-45a8-8c68-0f05cb3adfac",
      "order": [
        1.287682056427002,
        0
      ],
      "fields": {
        "title": "Post One Title"
      }
    },
    {
      "id": "4a2348ca-f27c-427f-a490-e29f2a64fdf2",
      "order": [
        1.287682056427002,
        2
      ],
      "fields": {
        "title": "Post Three Title"
      }
    }
  ]
}
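When you send queries like status:submitted from code rather than typing them into a browser, characters such as ":" and spaces in the Lucene query should be URL-encoded. A minimal sketch using the standard URLSearchParams API; buildSearchUrl and the database name "posts_db" are hypothetical, and the helper only builds the path (it does not contact CouchDB). The optional limit parameter caps the number of rows returned.

```javascript
// Hypothetical helper: build a _search request path with an encoded query string.
function buildSearchUrl(db, ddoc, indexName, params) {
  const path = [db, "_design", ddoc, "_search", indexName]
    .map(encodeURIComponent)
    .join("/");
  // URLSearchParams encodes ":" as %3A and spaces as "+"
  const qs = new URLSearchParams(params).toString();
  return `/${path}?${qs}`;
}

console.log(buildSearchUrl("posts_db", "search", "posts", { q: "status:submitted" }));
// prints "/posts_db/_design/search/_search/posts?q=status%3Asubmitted"

console.log(buildSearchUrl("posts_db", "search", "posts", { q: "status:submitted", limit: 10 }));
// prints "/posts_db/_design/search/_search/posts?q=status%3Asubmitted&limit=10"
```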

Analyzers📈

Analyzers are settings that define how to recognize terms within text. Analyzers can be helpful if you need to index multiple languages.

Search supports six generic analyzers (language-specific analyzers are also available):

  • classic - The standard Lucene analyzer, circa release 3.1.

  • email - Like the standard analyzer, but tries harder to match an email address as a complete token.

  • keyword - Input is not tokenized at all.

  • simple - Divides text at non-letters.

  • standard - The default analyzer. It implements the Word Break rules from the Unicode Text Segmentation algorithm.

  • whitespace - Divides text at white space boundaries.

Pick the analyzer that suits your use case for each search index.
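To get a feel for how the choice of analyzer changes the terms that end up in the index, here is a toy approximation of three of the analyzers listed above. These are not Lucene's implementations, just illustrations of the descriptions: keyword keeps the whole input as one term, whitespace splits at white space and keeps case, and simple splits at non-letters and lowercases (as Lucene's SimpleAnalyzer does). standard, classic and email follow more involved rules and are deliberately left out.

```javascript
// Toy approximations of three analyzers (illustration only, not Lucene code).
const toyAnalyzers = {
  // keyword: the input is not tokenized at all
  keyword: (text) => [text],
  // whitespace: divide at white space boundaries, case preserved
  whitespace: (text) => text.split(/\s+/).filter(Boolean),
  // simple: divide at non-letters (\P{L} = any non-letter), then lowercase
  simple: (text) =>
    text
      .split(/\P{L}+/u)
      .filter(Boolean)
      .map((term) => term.toLowerCase()),
};

const sample = "Post-2 Title by jordan@example.com";
for (const [name, analyze] of Object.entries(toyAnalyzers)) {
  console.log(name, analyze(sample));
}
```

With simple, the e-mail address is broken into "jordan", "example" and "com", so a search for the full address would not match as one term; that is the kind of case the email analyzer is meant for.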

Geographical Searches🗺

Besides that, you can also do geographical searches in CouchDB with Lucene's built-in geospatial capabilities.😍

Example geographical data:

{
    "name":"Aberdeen, Scotland",
    "lat":57.15,
    "lon":-2.15,
    "type":"city"
}

Example search index for the geographic data:

function(doc) {
    // Only index city documents
    if (doc.type === 'city') {
        index('city', doc.name, {'store': true});
        // lat/lon are indexed as numbers so range queries and distance sorting work
        index('lat', doc.lat, {'store': true});
        index('lon', doc.lon, {'store': true});
    }
}

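HTTP request, which matches cities with a latitude between 0 and 90 and sorts them by distance in kilometers from longitude -74.0059, latitude 40.7127 (New York City). In the sort value, "lon" and "lat" name the indexed longitude and latitude fields: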

GET /YOUR_DATABASE_NAME/_design/YOUR_DESIGN_DOC_NAME/_search/SEARCH_INDEX_NAME?q=lat:[0+TO+90]&sort="<distance,lon,lat,-74.0059,40.7127,km>"
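The geo request above packs a range query and a JSON-quoted distance sort into the URL. A sketch of building the same request in code, assuming the index names its fields "lon" and "lat" as in the index function above; distanceSort is a hypothetical helper, and the upper-case path segments are the article's own placeholders.

```javascript
// Hypothetical helper: the sort value is a JSON string of the form
// "<distance,LON_FIELD,LAT_FIELD,ORIGIN_LON,ORIGIN_LAT,UNITS>".
// units defaults to "km", as in the article's example.
function distanceSort(lonField, latField, lon, lat, units = "km") {
  return JSON.stringify(`<distance,${lonField},${latField},${lon},${lat},${units}>`);
}

const params = new URLSearchParams({
  q: "lat:[0 TO 90]", // numeric range query on the "lat" field
  sort: distanceSort("lon", "lat", -74.0059, 40.7127), // origin: New York City
});

const url =
  "/YOUR_DATABASE_NAME/_design/YOUR_DESIGN_DOC_NAME/_search/SEARCH_INDEX_NAME?" +
  params.toString();
console.log(url);
```

URLSearchParams turns the spaces in "lat:[0 TO 90]" into "+", which is why the hand-written request above uses lat:[0+TO+90].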

Abbreviated Result:

{
    "total_rows": 205,
    "bookmark": "g1A...XIU",
    "rows": [
        {
            "id": "city180",
            "order": [
                8.530665755719783,
                18
            ],
            "fields": {
                "city": "New York, N.Y.",
                "lat": 40.78333333333333,
                "lon": -73.96666666666667
            }
        },
        {
            "id": "city177",
            "order": [
                13.756343205985946,
                17
            ],
            "fields": {
                "city": "Newark, N.J.",
                "lat": 40.733333333333334,
                "lon": -74.16666666666667
            }
        },
        {
            "id": "city178",
            "order": [
                113.53603438866077,
                26
            ],
            "fields": {
                "city": "New Haven, Conn.",
                "lat": 41.31666666666667,
                "lon": -72.91666666666667
            }
        }
    ]
}
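In the result above, the first value of each "order" array appears to be the distance in kilometers from the sort origin. One way to sanity-check that reading is to compute the great-circle distance yourself with the haversine formula. This sketch assumes a spherical Earth with a mean radius of 6371 km; Lucene's own formula and radius may differ slightly, so expect agreement to within about a percent rather than an exact match. The coordinates and order values are copied from the result above.

```javascript
// Great-circle distance via the haversine formula (spherical Earth assumed).
function haversineKm(lat1, lon1, lat2, lon2) {
  const R = 6371; // mean Earth radius in km (assumption)
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Sort origin from the request, and the three hits from the result above.
const origin = { lat: 40.7127, lon: -74.0059 };
const hits = [
  { city: "New York, N.Y.", lat: 40.78333333333333, lon: -73.96666666666667, order0: 8.530665755719783 },
  { city: "Newark, N.J.", lat: 40.733333333333334, lon: -74.16666666666667, order0: 13.756343205985946 },
  { city: "New Haven, Conn.", lat: 41.31666666666667, lon: -72.91666666666667, order0: 113.53603438866077 },
];

for (const h of hits) {
  const d = haversineKm(origin.lat, origin.lon, h.lat, h.lon);
  console.log(`${h.city}: haversine ${d.toFixed(2)} km, order[0] ${h.order0.toFixed(2)}`);
}
```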

Thank you for reading.

There is much more you can do with CouchDB search. Do check out the official documentation here, as well as the Lucene query syntax, since CouchDB search queries are written in Lucene syntax.😊

