Performance: SQL Server vs MongoDB (JSON data)

Anyone reading the title of this post may immediately start to question my sanity and/or competency. After all, SQL is a relational datastore and MongoDB is a document datastore, and wouldn't it just be obvious that a document datastore would be more effective at storing unstructured JSON documents? I have been learning more about NoSQL databases and that's the common wisdom, but I've been around long enough to doubt any hype over any technology. I want to know: how much better? Do the new JSON features in SQL Server 2019 compare well? I wanted to see how both systems perform when the rubber meets the road.

Setup

To do the comparisons, I'm using DICOM files as the data. The protocol details are too deep to get into here, but they're used in health care to transfer information and images between systems. They are distinct documents, with information stored in tags and sequences, and readily translate to a JSON format. Here's a full example. Updates are not really part of the DICOM standard, so we're only looking at insert and select performance.

The comparison will go like this:

  • Stand up basic instances of SQL Server and MongoDB on the same box. No frills, no special configurations. Both instances will have the same basic indexing on key fields.
    • Box: Core i7 CPU, 16 GB RAM, 1 TB 7200 RPM data disk. SQL Server 2019, MongoDB 6.0.1.
  • Build a simple console app that will ingest DICOM files and insert the data into both systems. Insert performance will be tracked and stored. Single thread, single inserts, nothing optimized so that we can do an apples to apples comparison (as much as is possible).
  • Run a bunch of data through the ingest, and then record query performance.

I went around the web and downloaded several public sample sets, for a total of 19,719 DICOM files.

Indexing was done slightly differently between the systems, using the key columns SOP Class UID, SOP Instance UID, Patient ID, Study Instance UID, and Series Instance UID. MongoDB would be storing the DICOM JSON as-is, and is capable of indexing on the unstructured data. SQL Server can emulate something similar using computed columns, but I didn't want the additional computations to impact query performance later. Instead, I extracted the index data into discrete columns, with non-clustered indexes, and stored the JSON data in an unindexed nvarchar(max) column.

Inserts

Running the files through the systems took almost an hour (unoptimized, remember?). Once finished, SQL Server's data and log files totaled 27.7 GB, and MongoDB files totaled 10.4 GB. So, right off the bat we're looking at a significant improvement in data storage by MongoDB. However, several of the DICOM files were over the 16 MB limit for MongoDB, so that's a win for SQL Server. Honestly, I don't understand why the MongoDB devs are so reluctant to increase that limit. It's 2022, 16 MB feels like a real "let them eat cake" attitude, like the apocryphal quote "640K RAM should be enough for anyone". Yes, I know there are workarounds like GridFS, but we're trying to keep a vanilla comparison.

Comparing the insert performance is a matter of querying the Ingest History table/collection in both systems.

SQL

select AVG(ElapsedTimeMilliseconds)
from [dbo].[IngestHistory]

Results: 64 ms

MongoDB

db.ingestHistory.aggregate([{$group: {_id:null, avgValue: {$avg: "$ElapsedTimeMilliseconds"}}}])

Results: 28 ms

Selects

I queried the data in 3 different ways: root level tags, and tags at 1 & 2 levels deep, using queries that are similar to real-world use. Root level tags were evaluated by finding a specific file by SOP Instance UID. Nested tags were evaluated by finding specific files, filtered by patient.

Find item by SOP Instance UID

SQL

select SopInstanceUid 
from dbo.DicomFile 
where SopInstanceUid = '1.3.6.1.4.1.14519.5.2.1.8700.9668.298614890072296783817276711233'

Results: 16 ms

MongoDB

db.dicomFile.find({"00080018.Value": "1.3.6.1.4.1.14519.5.2.1.8700.9668.298614890072296783817276711233"}).explain("executionStats")

Results: 8 ms

Find items by tag (1 level deep)

SQL

select DicomJson 
from dbo.DicomFile
where PatientId = 'Pseudo-PHI-008'
and isjson(DicomJson) > 0
and json_value(DicomJson, '$."00080014".Value[0]') = '1.3.6.1.4.1.14519.5.2.1.8700.9668.156605591388582060627280144362'

Results: ~300 ms

MongoDB

db.dicomFile.find({"00100020.Value":["Pseudo-PHI-008"], "00080014.Value":["1.3.6.1.4.1.14519.5.2.1.8700.9668.156605591388582060627280144362"]}).explain("executionStats")

Results: 2 ms

Find items by tag (2 levels deep)

SQL

select SopInstanceUid 
from dbo.DicomFile 
where PatientId = 'Pseudo-PHI-008'
and isjson(DicomJson) > 0
and json_value(DicomJson, '$."00120064".Value[0]."00080104".Value[0]') = 'Basic Application Confidentiality Profile'

Results: ~300 ms

MongoDB

db.dicomFile.find({"00100020.Value":["Pseudo-PHI-008"], "00120064.Value.00080104.Value":["Basic Application Confidentiality Profile"]}).explain("executionStats")

Results: 2 ms

Conclusion

TL;DR: MongoDB is obviously better at inserts and selects, but SQL Server isn't terrible.

So, the data shows a clear advantage of MongoDB over SQL Server for storing unstructured documents. I assume that this is also true for other NoSQL document datastores, but will probably extend this test to other vendors in the future. Insert performance is twice as fast, and select performance is approximately two orders of magnitude greater. The only times SQL Server came close to query performance is when it was a root level tag that was indexed in a separate column. MongoDB did way better than my expectations at parsing nested and unindexed tags.

But, that's not to say SQL Server did terribly. In fact, the JSON features performed better than I was expecting. In all the results, it's a difference of milliseconds. If we're talking about a small website or business application with a few thousand users, no one is going to notice the difference. In addition, as noted above MongoDB has a file size limitation, where SQL Server can store up to 2 GB for JSON data. For some environments that are locked into SQL Server without an approved NoSQL solution, it's not ideal but still possible to be successful. That said, if you can use MongoDB or another NoSQL solution, definitely use that instead.

Comments

Popular posts from this blog

Adding Font Awesome to .NET MAUI App

Comparing SELECT Performance of Joining Integers vs GUIDs in SQL Server