Abhilash Padmanabhan

Posted on Aug 15, 2020 • Edited on Aug 16, 2020

Microsoft Connectors: enterprise search made easy

#microsoft #search #connectors #bing

Organizations that use Microsoft O365 benefit from maintaining their file assets and documents over collaborative services like Sharepoint. But that's not always the case as smaller teams maintain documents pertaining to their product in their own CMS. Moreover data from legacy systems tend to become obsolete when left unmanaged. With data staying at multiple sources, information becomes localized.

This post is about how organizational content sourced from multiple parties can be collected and accessed in a Unified channel with Microsoft O365. We deal with a feature called Connectors which is a Preview offering from Microsoft at the time of writing.

With the new Connectors program from Microsoft, it's possible to stream your multi-sourced data into one channel and search it effectively with an interface that we already know: Bing Search.

How connectors work

Connector services are based on the concept of search indexing. There are two classes of connectors:

Built-in connectors
Custom connectors

Built-in connectors are specific to O365 like DataLake, SQL & File System Connectors. For other systems, Microsoft supports custom connector development, a flexible developer-friendly way to organize & manage the data flow.

Note: Microsoft Connectors are soon to be generally available and here is a link with instructions to try out the preview.

Creating a Custom connector

Building a custom connector deals a lot with utilizing Search APIs available within Microsoft Graph, the developer platform for integrating business services. Every interaction is based on HTTP RESTful APIs from Microsoft Graph and requires an authenticated session (this deals with Azure Active Directory).

A custom Connector would involve 3 major steps:

Create your connection
Setup a schema
Indexing: Create items

Authentication

Search API can be accessed like any other service in Graph. You'll need to provision an Azure Active Directory app with autonomous settings as instructed here.

Create your connection

Connection defines a named entity for your data collection. Ideally every connection would possess a name indicating your data store.

Setup a schema

To start indexing items, connector needs a structure in which data would be persisted - list of properties just like how we define field names & types for a database table.
In addition, there's a range of customizable parameters that give control over ways to make search more effective.

Both primitive types as well as Collections are supported.

Parameter	What it means
name*	32-char max-length property name
type*	data type (provided by Microsoft Graph)
isSearchable	true if field should support full-text search
isQueryable	true if field should support name:value search `filetype:docx`
isRetrievable	true if field is used for display
isRefinable	true if field should act as a filter
labels	semantic type of property: `title`, `url`, `authors` to name a few
aliases	alternate name for the property

Indexing

Once we have connection with schema ready, it's time to index data. Every data item from the external system needs to be organized as per the schema as a property map:

{
  "name": "Logistics Management",
  "title": "Logistics Management: Procedure",
  "url": "https://<sys>/en/docs/oms/logistics-procedure.docx",
  "authors": [ "Alex Jones", "John Wick" ],
  "createdBy": "Alex Jones",
  "lastModifiedBy": "John Wick",
  "updatedTimestamp": "2020-08-15T06:01:38Z",
  "filename": "logistics-procedure",
  "filetype": "docx"
}

In addition, each item requires a resource ID (cannot exceed 32 characters in length) that'll be used to denote an item in the Index. This ID has to be a path parameter of Items Endpoint in Search API.

What Bing.com offers

To make sure Bing fetch these results and display to users, it needs a Search Layout to be mapped to your connection.

Setup a new vertical that refers to the data source.
Design a Layout for your results. Microsoft Search uses Adaptive card design, to develop a graphical representation of properties, text and images & extensively format your template.

And here we go!

How to manage these connectors

As a Tenant Administrator, high-level info about Connectors is available over Microsoft Admin portal. However, it doesn't show item info. The custom connector is solely responsible for the indexing. The indexing process is time-intensive as there's lot to be carried out under the hood:

Fetch data from systems
Transform data per external schema
Create data item on the index

This process can be optimized in multiple ways:

Limit the amount of data used for searching. Excessive indexing can in turn affect the relevancy ranking. Limitation: An item cannot exceed 4 MB size.
Write data as batches instead of one at a time. However, large batches involve excessive data transfer ultimately causing bottleneck. A judiciously tried batch value can help avoid multiple failure scenarios, alternative check out Batch API. Limitation: Microsoft has limited concurrent operations to 4 for now, expected to increase soon.
As long as data items are independent, handle errors at item-level rather than process-level. A tracking mechanism indicating amount of documents processed at any instant can help control the process efficiently.

Dealing with a system of about 300-350 GiB data, to automate indexing process with frequent updates, I built a connector management tool, with a UI made with React to initiate, schedule & track the progress of indexing, along with reports to understand reasons of failure.

Links to docs:

Create Connection
Setup schema
Data indexing
Property data types
Autonomous authentication
Verticals & Result Types

And THE END! Hope the post is pretty intuitive in getting to know about what connectors are and how to develop Search experience with Bing. Thanks for your time, appreciate a feedback. See you with another post.

Peace! ✌️

DEV Community

Microsoft Connectors: enterprise search made easy

How connectors work

Creating a Custom connector

Authentication

Create your connection

Setup a schema

Indexing

What Bing.com offers

How to manage these connectors

Links to docs:

Top comments (0)

Read next

New AI Method Teaches Language Models to Cite Sources Accurately Without Manual Training

Breakthrough: Smart Networks that Understand Context Show 50% Better Performance in Heterogeneous Systems

AI Creates Groundbreaking 3D-Aware System for High-Quality Image Editing and Synthesis

Refactoring my career