In this demo tutorial, we show how the diff and patch operations can be used to monitor changes in TerminusDB schemas, TerminusDB documents, and JSON schemas, and how they can be used with other document databases such as MongoDB.
A little background on JSON diff and patch
A fundamental tool in Git’s strategy for distributed management of source code is the concept of the diff and the patch. These foundational operations are what make Git possible. Diff is used to construct a patch that can be applied to an object such that the final state makes sense, for some value of "makes sense".
But what about structured data? Do similar situations arise with structured data that require diff and patch operations? Sure they do.
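To make the idea concrete, here is a minimal sketch of a diff and a patch on plain Python dictionaries. It is a toy, not TerminusDB's implementation: `toy_diff` and `toy_patch` are hypothetical names, the sample values are made up, and the sketch assumes both documents have the same keys. It borrows the `SwapValue` / `@before` / `@after` shape that the JavaScript section of this tutorial reads from a patch.

```python
def toy_diff(before, after):
    """Return a patch describing how to turn `before` into `after`.

    Toy version: assumes both dicts have exactly the same keys.
    """
    patch = {}
    for key in before:
        old, new = before[key], after[key]
        if isinstance(old, dict) and isinstance(new, dict):
            # Recurse into nested objects and keep only real changes
            sub_patch = toy_diff(old, new)
            if sub_patch:
                patch[key] = sub_patch
        elif old != new:
            patch[key] = {"@op": "SwapValue", "@before": old, "@after": new}
    return patch


def toy_patch(document, patch):
    """Return a copy of `document` with every swap in `patch` applied."""
    result = dict(document)
    for key, entry in patch.items():
        if entry.get("@op") == "SwapValue":
            result[key] = entry["@after"]
        else:
            # No "@op" means `entry` is a nested patch for a nested object
            result[key] = toy_patch(document[key], entry)
    return result


# Made-up sample documents
jane = {"name": "Jane", "age": 18, "address": {"city": "Dublin"}}
janine = {"name": "Janine", "age": 18, "address": {"city": "Cork"}}

patch = toy_diff(jane, janine)

# Applying the patch to the "before" document reproduces the "after" document
assert toy_patch(jane, patch) == janine
```

The patch only mentions `name` and `address.city`; the unchanged `age` field does not appear in it at all, which is what makes a patch convenient to review.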
In applications where two or more people update the same object, for example in an online store, this sort of curation is often managed with a lock on the object, which means only one person can win. Locks are a massive source of pain, not only because they prevent otherwise perfectly reasonable concurrent operations, but also because you risk getting stale locks and having to figure out when to release them.
When more than one person is working on a dataset, conflicts are common. Without an adequate workflow and conflict measures, someone's change often gets squashed and the data starts to become inaccurate. In the long run, this causes all sorts of issues with reporting, customer service, and business intelligence. This is where diff and patch come in: users can see a before and after state each time they submit their changes to the database. Any conflicts can be flagged, and a human reviewer can oversee the changes to ensure data accuracy in the long run. Better data, better decisions.
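As a rough illustration of how a patch can flag a conflict instead of silently overwriting someone's change, the sketch below applies a patch only when the document still holds the patch's "before" values. `PatchConflict` and `apply_checked` are hypothetical helpers written for this example, they only handle top-level fields, and the patch shape is the `SwapValue` form that the JavaScript section of this tutorial reads.

```python
class PatchConflict(Exception):
    """Raised when a document no longer matches the patch's "before" values."""


def apply_checked(document, patch):
    """Apply top-level SwapValue entries, refusing to overwrite unexpected values."""
    result = dict(document)
    for key, entry in patch.items():
        found = document.get(key)
        if found != entry["@before"]:
            raise PatchConflict(
                f"{key!r}: expected {entry['@before']!r}, found {found!r}"
            )
        result[key] = entry["@after"]
    return result


stored = {"item_name": "Blender", "price": 340}

# Two editors each prepare a patch against the same starting state
patch_a = {"price": {"@op": "SwapValue", "@before": 340, "@after": 450}}
patch_b = {"price": {"@op": "SwapValue", "@before": 340, "@after": 300}}

stored = apply_checked(stored, patch_a)  # the first change goes through

try:
    stored = apply_checked(stored, patch_b)
    conflict = None
except PatchConflict as error:
    # The second change is flagged for review rather than squashing the first
    conflict = str(error)

print(stored)
print(conflict)
```

No lock is needed here: both editors work freely, and the stale patch is simply rejected when it is applied.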
Using Diff and Patch with TerminusDB Python
Prerequisites
You will need to install the TerminusDB Python client; see its installation instructions.
Ensure you have the TerminusDB Docker container running on localhost.
In this script we demonstrate how diff gives you a Patch object back, and how you can then apply patch with that object to modify an object. We show this for TerminusDB schema, TerminusDB documents and JSON schema.
In TerminusDB, documents and schemas are represented in JSON-LD format. With diff and patch, we can easily compare any two documents or schemas to see what has changed.
Let us look at a document as a Python object:
class Person(DocumentTemplate):
    name: str
    age: int

jane = Person(name="Jane", age=18)
janine = Person(name="Janine", age=18)
You can directly apply a diff to get a patch object:
result_patch = client.diff(jane, janine)
pprint(result_patch.content)
With the patch object (result_patch here), you can either review its content or apply it to an object to get the "after" object back.
after_patch = client.patch(jane, result_patch)
pprint(after_patch)
assert after_patch == janine._obj_to_dict()
As you can see, the after_patch object (document) is the same as janine. You can put this document back in the database using replace_document to commit this change.
Diff and patch also work with JSON-LD documents:
jane = { "@id" : "Person/Jane", "@type" : "Person", "name" : "Jane"}
janine = { "@id" : "Person/Jane", "@type" : "Person", "name" : "Janine"}
result_patch = client.diff(jane, janine)
pprint(result_patch.content)
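When reviewing a patch, it can help to turn its content into a short list of changes. The sketch below walks a patch dictionary and reports each swapped field. `describe_changes` is a hypothetical helper, and `sample_patch` is our assumption of what the content for the Jane to Janine example might look like, based on the `SwapValue` entries the JavaScript section of this tutorial handles; the content your server returns may differ.

```python
def describe_changes(patch_content, path=""):
    """Return one 'field: before -> after' line per SwapValue found in a patch."""
    lines = []
    for key, entry in patch_content.items():
        if not isinstance(entry, dict):
            continue  # plain values (for example an "@id") are not changes
        field = f"{path}.{key}" if path else key
        op = entry.get("@op")
        if op == "SwapValue":
            lines.append(f"{field}: {entry['@before']!r} -> {entry['@after']!r}")
        elif op is not None:
            # Other operations are only named here, not interpreted
            lines.append(f"{field}: <{op}>")
        else:
            # No "@op": treat the entry as a nested patch
            lines.extend(describe_changes(entry, field))
    return lines


# Assumed patch content for the Jane to Janine comparison
sample_patch = {
    "@id": "Person/Jane",
    "name": {"@op": "SwapValue", "@before": "Jane", "@after": "Janine"},
}

for line in describe_changes(sample_patch):
    print(line)
```

With the real client you would pass `result_patch.content` instead of `sample_patch`.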
It is not limited to JSON-LD documents; it also works with schemas:
class Company(DocumentTemplate):
    name: str
    director: Person

schema1 = WOQLSchema()
schema1.add_obj("Person", Person)
schema2 = WOQLSchema()
schema2.add_obj("Person", Person)
schema2.add_obj("Company", Company)
result_patch = client.diff(schema1, schema2)
pprint(result_patch.content)
Note that diff and patch will work on most JSON formats.
Another example application is comparing two JSON schemas:
schema1 = {
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "birthday": { "type": "string", "format": "date" },
        "address": { "type": "string" },
    }
}
schema2 = {
    "type": "object",
    "properties": {
        "first_name": { "type": "string" },
        "last_name": { "type": "string" },
        "birthday": { "type": "string", "format": "date" },
        "address": {
            "type": "object",
            "properties": {
                "street_address": { "type": "string" },
                "city": { "type": "string" },
                "state": { "type": "string" },
                "country": { "type" : "string" }
            }
        }
    }
}
result_patch = client.diff(schema1, schema2)
pprint(result_patch.content)
See the full script here
Using Diff and Patch with MongoDB
In this script we demonstrate how diff and patch can be used in your MongoDB workflow. The first part of the script follows the MongoDB tutorial on how to use PyMongo, and the second part demonstrates the extra step of reviewing the changes before applying a patch to your MongoDB collection.
As we discovered in the last section, diff and patch can be applied to most JSON formats. Since MongoDB also uses a JSON format to describe its data, we can use diff and patch in a similar way.
Here we use the PyMongo tutorial as an example:
import datetime as dt
import os
from pymongo import MongoClient

client = MongoClient(os.environ["MONGO_CONNECTION_STRING"])
# Create the database for our example (we will use the same database throughout the tutorial)
connection = client['user_shopping_list']
collection_name = connection["user_1_items"]
item_1 = {
    "_id" : "U1IT00001",
    "item_name" : "Blender",
    "max_discount" : "10%",
    "batch_number" : "RR450020FRG",
    "price" : 340,
    "category" : "kitchen appliance"
}
item_2 = {
    "_id" : "U1IT00002",
    "item_name" : "Egg",
    "category" : "food",
    "quantity" : 12,
    "price" : 36,
    "item_description" : "brown country eggs"
}
collection_name.insert_many([item_1,item_2])
expiry_date = '2021-07-13T00:00:00.000'
expiry = dt.datetime.fromisoformat(expiry_date)
item_3 = {
    "item_name" : "Bread",
    "quantity" : 2,
    "ingredients" : "all-purpose flour",
    "expiry_date" : expiry
}
collection_name.insert_one(item_3)
Imagine we want to change item_1:
new_item_1 = {
    "_id" : "U1IT00001",
    "item_name" : "Blender",
    "max_discount" : "50%",
    "batch_number" : "RR450020FRG",
    "price" : 450,
    "category" : "kitchen appliance"
}
We can compare the old and new item 1 with diff and patch:
tbd_endpoint = WOQLClient("http://localhost:6363/")
# Find the item back from database in case someone already changed it
item_1 = collection_name.find_one({"item_name" : "Blender"})
patch = tbd_endpoint.diff(item_1, new_item_1)
pprint(patch.content)
Again, we can review the patch before making the change in MongoDB:
collection_name.update_one(patch.before, {"$set": patch.update})
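To see why `patch.before` works as a filter and `patch.update` as a `$set` payload, here is a rough sketch of how such a pair could be derived from patch content. It is not the client's actual implementation: `split_patch` is a hypothetical helper, the patch content is what we assume the Blender diff would contain, and nested changes are flattened into MongoDB dot-notation paths, which is a design choice of this sketch.

```python
def split_patch(patch_content, prefix=""):
    """Split SwapValue entries into a MongoDB filter and a $set document."""
    before, update = {}, {}
    for key, entry in patch_content.items():
        if not isinstance(entry, dict):
            continue  # plain values (for example an "@id") carry no change
        path = f"{prefix}{key}"
        if entry.get("@op") == "SwapValue":
            before[path] = entry["@before"]
            update[path] = entry["@after"]
        elif "@op" not in entry:
            # Nested patch: address -> city becomes the dotted path "address.city"
            sub_before, sub_update = split_patch(entry, prefix=f"{path}.")
            before.update(sub_before)
            update.update(sub_update)
    return before, update


# Assumed patch content for the old and new Blender items
patch_content = {
    "max_discount": {"@op": "SwapValue", "@before": "10%", "@after": "50%"},
    "price": {"@op": "SwapValue", "@before": 340, "@after": 450},
}

before, update = split_patch(patch_content)
print(before)  # {'max_discount': '10%', 'price': 340}
print(update)  # {'max_discount': '50%', 'price': 450}
```

Because the filter holds the old values, the update only matches a document that is still in the state the patch was computed against.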
This is another more complicated example:
expiry_date = '2021-07-15T00:00:00.000'
expiry = dt.datetime.fromisoformat(expiry_date)
new_item_3 = {
    "item_name" : "Bread",
    "quantity" : 5,
    "ingredients" : "all-purpose flour",
    "expiry_date" : expiry
}
item_3 = collection_name.find_one({"item_name" : "Bread"})
item_id = item_3.pop('_id') # We want to pop it out; optionally we can add it back later
patch = tbd_endpoint.diff(item_3, new_item_3)
pprint(patch.content)
# Add _id back, though it still works without
before = patch.before
before['_id'] = item_id
collection_name.update_one(before, {"$set": patch.update})
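Since the "before" values act as the filter, the result of `update_one` tells us whether the patch still applied cleanly. The sketch below wraps that check in a small helper. `apply_reviewed_patch` is a hypothetical name, and `FakeCollection` is an in-memory stand-in we wrote only so the example runs without a MongoDB server; with PyMongo you would pass the real collection, whose `update_one` result also exposes `modified_count`.

```python
from types import SimpleNamespace


def apply_reviewed_patch(collection, before, update):
    """Apply `update` only if a document still matches `before`.

    Returns True when exactly one document was modified.
    """
    result = collection.update_one(before, {"$set": update})
    return result.modified_count == 1


class FakeCollection:
    """Hypothetical stand-in for a collection (top-level equality filters only)."""

    def __init__(self, documents):
        self.documents = documents

    def update_one(self, query, update):
        for document in self.documents:
            if all(document.get(k) == v for k, v in query.items()):
                changes = update["$set"]
                changed = any(document.get(k) != v for k, v in changes.items())
                document.update(changes)
                return SimpleNamespace(matched_count=1, modified_count=int(changed))
        return SimpleNamespace(matched_count=0, modified_count=0)


shelf = FakeCollection([{"item_name": "Bread", "quantity": 2}])

# The first patch was computed against quantity == 2, so it applies
first = apply_reviewed_patch(shelf, {"quantity": 2}, {"quantity": 5})

# A second patch computed against the old state no longer matches anything
stale = apply_reviewed_patch(shelf, {"quantity": 2}, {"quantity": 9})

print(first, stale)
```

A `False` result is the signal to fetch the document again, recompute the diff, and review the change once more.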
See the full script here
Using Diff and Patch with MongoDB JavaScript
Just as in the last section, diff and patch can be used to compare documents and schemas to see what has changed, this time using the JavaScript client.
In this script we will demonstrate it.
We created a function called mongoPatch:
// Turn a patch into a MongoDB query ("before" values) and a $set object ("after" values)
const mongoPatch = function(patch){
    let query = {};
    let set = {};
    if('object' === typeof patch){
        for(var key in patch){
            const entry = patch[key];
            if( entry['@op'] == 'SwapValue'){
                query[key] = entry['@before'];
                set[key] = entry['@after'];
            }else if(key === '_id'){
                query[key] = ObjectId(entry);
            }else{
                let [sub_query,sub_set] = mongoPatch(entry);
                query[key] = sub_query;
                // sub_set is null when entry is a plain value rather than a nested patch
                if(sub_set !== null){
                    set[key] = sub_set;
                }
            }
        }
        return [query,set]
    }else{
        return [patch,null]
    }
}
The function returns a query and a set object that we can use to update the data in MongoDB:
let patchPromise = client.getDiff(jane,janine,{});
patchPromise.then( patch => {
    let [q,s] = mongoPatch(patch)
    console.log([q,s]);
    const res = db.inventory.updateOne(q, { $set : s});
    console.log(res);
    if (res.modifiedCount == 1){
        console.log("yay!")
    }else{
        console.log("boo!")
    }
    console.log(patch);
});
See the full script here
We hope you found this tutorial useful. We’ve included some additional links below for further reading:
JSON Diff and Patch documentation.
Read more about JSON diff and patch and what it means for data collaboration.