Get ready for the first section of the MongoDB Developer Certification! This part carries 8% weighting and focuses on document Types and Shapes.
Since this post has become longer than anticipated, it will be divided into two parts.
1.1 Identify the set of value types MongoDB BSON supports.
To keep things focused, I took the liberty of splitting the types into two categories:
Common types
Type | Size | Number | Alias | Notes |
---|---|---|---|---|
ObjectId | 12 bytes | 7 | "objectId" | |
Boolean | 1 byte | 8 | "bool" | true or false |
32-bit integer | 4 bytes | 16 | "int" | between -2^31 and 2^31-1 |
64-bit integer | 8 bytes | 18 | "long" | between -2^63 and 2^63-1 |
Decimal128 | 16 bytes | 19 | "decimal" | up to 34 decimal digits |
Double | 8 bytes | 1 | "double" | 15 to 17 decimal digits |
String | Variable size | 2 | "string" | UTF-8 encoded |
Object | 4 bytes | 3 | "object" | + size of the object |
Array | 4 bytes | 4 | "array" | + size of elements |
Binary data | | 5 | "binData" | |
Date | 8 bytes | 9 | "date" | |
Timestamp | 8 bytes | 17 | "timestamp" | |
Null | 0 bytes | 10 | "null" | |
Less common types
Type | Size | Number | Alias | Notes |
---|---|---|---|---|
Min key | | -1 | "minKey" | |
Max key | | 127 | "maxKey" | |
Regular Expression | | 11 | "regex" | |
JavaScript | | 13 | "javascript" | |
DBPointer | | 12 | "dbPointer" | Deprecated. |
Symbol | | 14 | "symbol" | Deprecated. |
Undefined | | 6 | "undefined" | Deprecated. |
The most common types
Despite the extensive list of types, the MongoDB documentation could provide more detailed information on each one. However, we can focus on the most common types.
ObjectId
An ObjectId is 12 bytes long and is composed of:
- A 4-byte timestamp, measured in seconds since Unix epoch.
- A 5-byte random value generated once per process. This random value is unique to the machine and process.
- A 3-byte incrementing counter, initialized to a random value.
So, it's small, likely unique, fast to generate, and roughly ordered by creation time.
Example of ObjectId value: 66b7ccfcde5c167d5c6c9561
With mongosh, it's possible to retrieve the timestamp from an ObjectId.
> ObjectId('66b7ccfcde5c167d5c6c9561').getTimestamp()
< 2024-08-10T20:26:36.000Z
Important
While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:
- Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
- Are generated by clients, which may have differing system clocks.
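As a rough illustration of the layout described above, the same hex string can be split by hand in plain JavaScript (Node.js, no driver needed). The helper name parseObjectId is made up for this sketch; it assumes only the 4 + 5 + 3 byte layout listed earlier.

```javascript
// Hypothetical helper: split a 24-character ObjectId hex string into the
// three parts described above (4-byte timestamp, 5-byte random, 3-byte counter).
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // seconds since Unix epoch
  return {
    timestamp: new Date(seconds * 1000),
    random: hex.slice(8, 18),             // 5 bytes = 10 hex characters
    counter: parseInt(hex.slice(18), 16), // 3 bytes = 6 hex characters
  };
}

const parts = parseObjectId("66b7ccfcde5c167d5c6c9561");
console.log(parts.timestamp.toISOString()); // 2024-08-10T20:26:36.000Z
console.log(parts.random, parts.counter);
```

The timestamp matches what getTimestamp() returned, which also shows why two ObjectIds created within the same second cannot be ordered by time alone.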
Int32 and Int64
In mongosh, if a number can be converted to a 32-bit integer, it is stored as Int32; otherwise it defaults to Double, not to a 64-bit integer.
Using mongosh, you can explicitly specify which type you want to use.
> db.types.insertOne(
{
"intValue": 2147483647, // fits in 32 bits, stored as Int32
"intValueExplicit": Int32(1),
"longValue": 9223372036854775807, // plain literal: stored as Double and rounded
"longValueExplicit": Long("9223372036854775807"), // stored exactly as Int64
});
In mongosh, when you want to store a value explicitly as a long, pass it to Long() as a string. A numeric literal is parsed as a JavaScript double first and loses precision for large values.
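To see why the string form matters, here is a small sketch in plain JavaScript (Node.js, without the mongosh Int32 and Long helpers). The function fitsInt32 is a hypothetical name used only for this illustration; it shows the range check, not how mongosh is actually implemented.

```javascript
// Plain JavaScript: every number literal is a 64-bit double.
const INT32_MIN = -(2 ** 31);
const INT32_MAX = 2 ** 31 - 1;

// Hypothetical helper: true when a JS number fits in a 32-bit signed integer.
function fitsInt32(n) {
  return Number.isInteger(n) && n >= INT32_MIN && n <= INT32_MAX;
}

console.log(fitsInt32(2147483647)); // true
console.log(fitsInt32(2147483648)); // false

// The largest Int64 cannot be written as a literal without rounding:
const asNumber = 9223372036854775807;           // silently becomes 2^63
const asString = BigInt("9223372036854775807"); // exact, parsed from a string

console.log(Number.isSafeInteger(asNumber)); // false
console.log(BigInt(asNumber) === 2n ** 63n); // true: off by one
console.log(asString === 2n ** 63n - 1n);    // true: exact
```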
Decimal128
Decimal128 values are 128-bit decimal floating-point numbers that emulate decimal rounding with exact precision. They support 34 digits of precision, like this: 9.999999999999999999999999999999999.
This functionality is intended for applications that handle monetary data, such as financial, tax, and scientific computations.
Inserting the value 10,000,000,000,000.123456789 with mongosh
> db.decimal.insertOne({value: new Decimal128("10000000000000.123456789")})
Retrieving the data
> db.decimal.find()
{
_id: ObjectId('66c8c08ada326cac4262e372'),
value: Decimal128('10000000000000.123456789')
}
Double
Double is less precise than Decimal128. If your application doesn't deal with numbers that need to be stored with such precision, you can use double for saving decimal numbers.
Inserting the value 10,000,000,000,000.123456789 with mongosh
> db.double.insertOne({value: 10000000000000.123456789})
Retrieving the data
> db.double.find()
{
_id: ObjectId('66c8bfbeda326cac4262e371'),
value: 10000000000000.123
}
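The truncated result above is not specific to MongoDB. A minimal sketch in plain JavaScript (Node.js), whose numbers are the same 64-bit doubles, shows the same rounding:

```javascript
// A double has a 53-bit significand, so near 10^13 the gap between two
// neighbouring values is 2^-9 (about 0.00195).
const stored = 10000000000000.123456789;

console.log(String(stored));    // 10000000000000.123 (shortest round-trip form)
console.log(stored.toFixed(9)); // 10000000000000.123046875 (value actually held)

// The classic symptom of binary floating point:
console.log(0.1 + 0.2 === 0.3); // false
```

Decimal128 avoids this because it stores decimal digits directly instead of a binary approximation.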
String
BSON strings are stored as UTF-8, making it possible to store most international data.
Important
Since strings use UTF-8 character sets, using sort() on strings will be reasonably correct. However, because sort() internally uses the C++ strcmp API, the sort order may handle some characters incorrectly.
To verify this observation, I asked MongoDB to return just 4 documents in descending order. I think Иванов should come before O'Connor, but I don't know whether И is equivalent to N.
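One way to reason about the expected order is to compare the raw UTF-8 bytes. The sketch below is plain JavaScript (Node.js), not MongoDB itself. It assumes that a sort without a collation behaves like a simple byte-wise comparison, and compareUtf8 is a name invented for this example.

```javascript
// Compare two strings by their raw UTF-8 bytes, roughly what a simple
// binary comparison does (no language-aware collation involved).
function compareUtf8(a, b) {
  return Buffer.compare(Buffer.from(a, "utf8"), Buffer.from(b, "utf8"));
}

// "Иванов" and "O'Connor" come from the text; the other two names are made up.
const names = ["O'Connor", "Anderson", "Иванов", "Müller"];
const descending = [...names].sort((a, b) => compareUtf8(b, a));

console.log(descending.join(", ")); // Иванов, O'Connor, Müller, Anderson
```

Under that assumption, Cyrillic letters sort after all ASCII letters because their first UTF-8 byte (0xD0 for И) is larger than any ASCII byte (0x4F for O), so Иванов comes first in descending order regardless of which Latin letter И resembles.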
Boolean
No mysteries here, boolean types can hold only true or false values.
Date
A BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). This results in a representable date range of about 290 million years into the past and future.
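The "about 290 million years" figure can be checked with a quick back-of-the-envelope calculation. This is a sketch in plain JavaScript (Node.js) that assumes an average year of 365.25 days:

```javascript
// A signed 64-bit millisecond counter covers 2^63 ms on each side of the epoch.
const MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000; // average Julian year
const maxMillis = Number(2n ** 63n);              // approximated as a double

const yearsEachWay = maxMillis / MS_PER_YEAR;
console.log(Math.round(yearsEachWay / 1e6)); // 292 (million years)

// JavaScript's Date uses the same unit: milliseconds since 1970-01-01T00:00:00Z.
console.log(new Date(0).toISOString()); // 1970-01-01T00:00:00.000Z
```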
Given the following code
public class Date
{
public DateTime DateTimeUtc { get; set; } = DateTime.UtcNow;
public DateTime DateTimeLocal { get; set; } = DateTime.Now;
}
DateTime is always saved as UTC.
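The reason is that a BSON Date is just a millisecond count and carries no offset. The following sketch uses plain JavaScript (Node.js) rather than the C# driver, so it only illustrates the idea; the two timestamps are example values.

```javascript
// The same instant written with two different UTC offsets.
const utc = new Date("2024-08-10T20:26:36Z");
const local = new Date("2024-08-10T22:26:36+02:00");

// Both collapse to the same milliseconds-since-epoch value, which is all a
// BSON Date keeps; the original offset is not stored.
console.log(utc.getTime() === local.getTime()); // true
console.log(local.toISOString());               // 2024-08-10T20:26:36.000Z
```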
Timestamp
I had difficulty finding a straightforward way to save timestamps with C#, so I used Mongosh instead.
BSON has a special Timestamp type for internal MongoDB use, and it is not associated with the regular Date type. See the MongoDB documentation on timestamps.
Insert with Mongosh console
> db.date.insertOne({timestamp: new Timestamp()})
Retrieve the data
> db.date.findOne({_id: ObjectId('66d478ac10620d368380a43f')})
{
_id: ObjectId('66d478ac10620d368380a43f'),
timestamp: Timestamp({ t: 1725200556, i: 4 })
}
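The t and i fields in the output are the two halves of a single 64-bit value: the high 32 bits hold seconds since the Unix epoch and the low 32 bits hold an incrementing ordinal for operations within that second. The sketch below packs and unpacks them in plain JavaScript (Node.js); the helper names are invented for this illustration.

```javascript
// Hypothetical helpers: pack and unpack the two 32-bit halves of a BSON
// Timestamp (t = seconds since epoch, i = ordinal within that second).
function packTimestamp(t, i) {
  return (BigInt(t) << 32n) | BigInt(i);
}

function unpackTimestamp(value) {
  return { t: Number(value >> 32n), i: Number(value & 0xffffffffn) };
}

const packed = packTimestamp(1725200556, 4);
const unpacked = unpackTimestamp(packed);

console.log(unpacked.t, unpacked.i);
console.log(new Date(unpacked.t * 1000).toISOString()); // 2024-09-01T14:22:36.000Z
```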
Object and Array
With MongoDB, it's possible to store complex object structures or arrays within a document.
Example of object
{
_id: ObjectId("66c1fec432fc73d4982e5ee9"),
name: "Liam Wilson",
address: {
street: "",
zipcode: "",
city: ""
}
}
Example of array
{
_id: ObjectId("66c1fec432fc73d4982e5ee9"),
name: "Liam Wilson",
address: ["street name 1", "street name 2"]
}
Example of array of objects
{
_id: ObjectId("66c1fec432fc73d4982e5ee9"),
name: "Liam Wilson",
address: [{
street: "",
zipcode: "",
city: ""
}]
}
Null
Being schema-less, MongoDB allows the same field to hold different types in different documents, so a single collection can contain documents with different shapes.
The example below shows that when a field created as a string is updated to null, its type changes to null.
> db.string.insertOne({value: "lorem ipsum"})
Let's check the type
> db.string.aggregate([{$project: {value: 1, nameType: {$type: "$value"}}}])
<
{
_id: ObjectId('66c9d9d875b145385f4c7db6'),
value: 'lorem ipsum',
nameType: 'string'
}
I'll cover the aggregate method in the CRUD post.
Let's update the value field to null
> db.string.updateOne({_id: ObjectId('66c9d9d875b145385f4c7db6')}, {$set: {value: null}})
Checking the type again.
> db.string.aggregate([{$project: {value: 1, nameType: {$type: "$value"}}}])
<
{
_id: ObjectId('66c9d9d875b145385f4c7db6'),
value: null,
nameType: 'null'
}
BinaryData
BSON Binary Values are a fundamental data type in the BSON format, which is used for storing data in MongoDB. They essentially represent raw binary data, such as images, audio files, or other binary-encoded information.
I'll focus only on UUID and describe it, but here's the complete table with possible binary types.
Number | SubType |
---|---|
0 | Generic binary subtype |
1 | Function data |
2 | Binary (old) |
3 | UUID (old) |
4 | UUID |
5 | MD5 |
6 | Encrypted BSON value |
7 | Compressed time series data |
128 | Custom data |
UUID
A Universally Unique Identifier (UUID) is a 128-bit value represented as 32 hexadecimal characters. More about UUID
Let's get into the code and see the differences between UUID (old) and UUID.
Saving data as a Guid in C# gives us binary sub-type 3 - UUID (old).
It took me some time to understand why I was getting UUID (old) instead of sub-type 4 - UUID.
To save the UUID 057f3e75-24a0-468c-8788-5b3bbb7be407 as sub-type 4 - UUID, I have to annotate my property with GuidRepresentation.Standard. Otherwise, it is saved as UUID (old).
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class Binary
{
// Properties are public so the driver's class mapping can serialize them.
public string Uuid_AsString { get; set; } = "057f3e75-24a0-468c-8788-5b3bbb7be407";
public Guid Uuid_SubType3 { get; set; } = Guid.Parse("057f3e75-24a0-468c-8788-5b3bbb7be407");
[BsonGuidRepresentation(GuidRepresentation.Standard)]
public Guid Uuid_SubType4 { get; set; } = Guid.Parse("057f3e75-24a0-468c-8788-5b3bbb7be407");
}
MongoDB Compass shows the sub-type 4 value as UUID('057f3e75-24a0-468c-8788-5b3bbb7be407'), which is easier to read than Binary.createFromBase64('dT5/BaAkjEaHiFs7u3vkBw==', 3) for UUID (old). It helps when you have to find a document by its UUID.
And the resulting document:
{
"Uuid_AsString": "057f3e75-24a0-468c-8788-5b3bbb7be407",
"Uuid_SubType3": {
"$binary": {
"base64": "dT5/BaAkjEaHiFs7u3vkBw==",
"subType": "03"
}
},
"Uuid_SubType4": {
"$binary": {
"base64": "BX8+dSSgRoyHiFs7u3vkBw==",
"subType": "04"
}
}
}
Notice that the Base64 values differ between the two sub-types. This can lead to problems, so be careful. Since sub-type 3 is old, make sure you always use sub-type 4.
Decoding these two values from Base64 to hexadecimal shows that the sub-type 3 value dT5/BaAkjEaHiFs7u3vkBw== becomes 753e7f05-a024-8c46-8788-5b3bbb7be407, while the sub-type 4 value BX8+dSSgRoyHiFs7u3vkBw== becomes 057f3e75-24a0-468c-8788-5b3bbb7be407. In other words, the legacy C# layout stores the first three groups with their bytes reversed.
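The two Base64 values can also be decoded with a few lines of Node.js. This is only a sketch to make the byte-order difference visible: bytesToUuid and swapLegacyGroups are hypothetical helper names, and the swap rule (reverse the first 4, 2, and 2 bytes) is inferred from the two values in the document above.

```javascript
// Format 16 raw bytes as a UUID string, in the order they are stored.
function bytesToUuid(bytes) {
  const hex = Buffer.from(bytes).toString("hex");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
          hex.slice(16, 20), hex.slice(20)].join("-");
}

// Hypothetical helper: reverse the first three groups (4, 2 and 2 bytes),
// which is the difference between the two byte layouts.
function swapLegacyGroups(bytes) {
  const copy = Buffer.from(bytes); // work on a copy, leave the input untouched
  copy.subarray(0, 4).reverse();
  copy.subarray(4, 6).reverse();
  copy.subarray(6, 8).reverse();
  return copy;
}

const subType3 = Buffer.from("dT5/BaAkjEaHiFs7u3vkBw==", "base64");
const subType4 = Buffer.from("BX8+dSSgRoyHiFs7u3vkBw==", "base64");

console.log(bytesToUuid(subType3)); // 753e7f05-a024-8c46-8788-5b3bbb7be407
console.log(bytesToUuid(subType4)); // 057f3e75-24a0-468c-8788-5b3bbb7be407
console.log(swapLegacyGroups(subType3).equals(subType4)); // true
```

A helper like swapLegacyGroups can be handy when you need to look up a sub-type 3 value by the UUID string your application knows.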
I found this amazing tool this week, https://cryptii.com/, so check it out.
The other problem I ran into was retrieving the data. Filtering by Uuid_SubType4 directly wasn't working and nothing was returned. I had to set GuidRepresentationMode = GuidRepresentationMode.V3 to retrieve the data.
The GuidRepresentationMode = GuidRepresentationMode.V3 setting is already obsolete and will be removed in a later release. I haven't yet found another way, and the MongoDB documentation still shows it as a solution. Let me know in the comments if you know another way to solve that.
// This property will be removed in a later release.
BsonDefaults.GuidRepresentationMode = GuidRepresentationMode.V3;
After forcing my application to use GuidRepresentationMode.V3, I had to alter my POCO to this:
public class Binary
{
public string Uuid_AsString { get; set; } = "057f3e75-24a0-468c-8788-5b3bbb7be407";
[BsonGuidRepresentation(GuidRepresentation.CSharpLegacy)]
public Guid Uuid_SubType3 { get; set; } = Guid.Parse("057f3e75-24a0-468c-8788-5b3bbb7be407");
[BsonGuidRepresentation(GuidRepresentation.Standard)]
public Guid Uuid_SubType4 { get; set; } = Guid.Parse("057f3e75-24a0-468c-8788-5b3bbb7be407");
}
After that, I was able to filter my data.
var filterSubType4 = Builders<Binary>.Filter
.Where(p => p.Uuid_SubType4 == Guid.Parse("057f3e75-24a0-468c-8788-5b3bbb7be407"));
var result = binary.Find(filterSubType4).ToList();
I had no idea that the UUID topic would take so long to describe. Anyway, here is my takeaway:
- Set your properties as GuidRepresentation.Standard to ensure binary sub-type 4 is used;
  - It helps when you need to find some data with MongoDB Compass because the UUID value is shown instead of a Base64 value.
- If you have GUID properties without a representation set, mark them as GuidRepresentation.CSharpLegacy;
- Force your application to use BsonDefaults.GuidRepresentationMode = GuidRepresentationMode.V3;
  - This helps when querying your data;
  - Remember this is an obsolete property.
- Binary data is preferred over string. It's faster and smaller.
References
- https://www.mongodb.com/docs/manual/reference/bson-types/
- https://www.mongodb.com/docs/mongodb-shell/reference/data-types/
- https://www.mongodb.com/developer/products/mongodb/bson-data-types-decimal128/
- https://studio3t.com/knowledge-base/articles/mongodb-best-practices-uuid-data/
- https://blog.georgekosmidis.net/mongodb-shell-or-compass-query-with-a-guid.html
- https://www.mongodb.com/docs/drivers/csharp/current/fundamentals/serialization/guid-serialization/
- https://www.mongodb.com/developer/languages/csharp/mongodb-classmaps-optimal-performance/#serializing-guids-in-a-consistent-way
- https://cryptii.com/