Performance: Indexing JSON in SQL Server
Continuing the JSON in SQL Server posts found here and here, a co-worker asked the interesting question "can you index the key values in a small SQL Server text column using JSON?" I didn't know the answer to that, so here we go.
Setup
The test box remains the same: Core i7 CPU, 16 GB RAM, 1 TB 7200 RPM data disk. SQL Server 2019.
The DicomFile table had a new column added, populated from the indexed SQL columns, and a new nonclustered index added.
update DicomFile
set IndexedJson = concat('{'
, '"SopClassUid": "', SopClassUid, '",'
, '"SopInstanceUid": "', SopInstanceUid, '",'
, '"PatientId": "', PatientId, '",'
, '"StudyInstanceUid": "', StudyInstanceUid, '",'
, '"SeriesInstanceUid": "', SeriesInstanceUid, '"'
, '}')
from DicomFile;
create nonclustered index IDX_DicomFile_IndexedJson ON DicomFile (IndexedJson);
For comparison, I also created a heap table with only the indexed and full JSON data.
[IndexedJson] [varchar](1000) NULL,
[DicomJson] [nvarchar](max) NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
insert into DicomFileHeap
select IndexedJson, DicomJson
from DicomFileHeap;
create nonclustered index IDX_DicomFileHeap_Json ON DicomFileHeap (IndexedJson);
Queries
SQL index
from dbo.DicomFile
where SopInstanceUid = '1.3.6.1.4.1.14519.5.2.1.6834.5010.753273133725609601126044674777'
Result: 0 ms (no surprise there)
Unindexed JSON query
from dbo.DicomFile
where isjson(DicomJson) > 0
and json_value(DicomJson, '$."00080018".Value[0]') = '1.3.6.1.4.1.14519.5.2.1.6834.5010.753273133725609601126044674777'
Result: too long, canceled query
Indexed JSON query
from DicomFile
where isjson(IndexedJson) > 0
and json_value(IndexedJson, '$."SopInstanceUid"') = '1.3.6.1.4.1.14519.5.2.1.6834.5010.753273133725609601126044674777'
Result: 40 ms
Indexed JSON heap query
from DicomFileHeap
where isjson(IndexedJson) > 0
and json_value(IndexedJson, '$."SopInstanceUid"') = '1.3.6.1.4.1.14519.5.2.1.6834.5010.753273133725609601126044674777'
Result: 40 ms
Inserts
Clustered index
select top 1 SopClassUid, SopInstanceUid, PatientId, StudyInstanceUid, SeriesInstanceUid, DicomJson, IndexedJson
from DicomFile
order by newid()
Results: 180 ms
Heap
select top 1 *
from DicomFileHeap
order by newid()
Results: 30 ms
Conclusion
TL;DR: Indexed JSON is close but not as fast as regular SQL indexes. Heaps provide better insert performance.
It's no surprise that the execution time of a regular SQL index is almost instant. The plan for that is an index seek, versus JSON queries that have to do either an index scan or table scan. Querying the non-indexed full JSON data is not a viable strategy, as the table scan has to parse gigabytes worth of text data.
I think this has been covered in other, more typical SQL Server discussions, but heaps provide better insert performance. This is because they avoid the overhead of the clustered index.
Going back to the original question of these SQL vs NoSQL comparisons, why do this? Well, in general, I think JSON text data should be avoided in a regular SQL Server data model. It misses a strength of a relational database: interoperability and a defined data model. If your application will do better with a document-style data model, a NoSQL datastore like MongoDB will serve you best. Basically, I think a primary use case for using JSON in SQL is if you're in a shop where NoSQL is discouraged or not allowed, and only 1 application will be using the datastore. In that case, maintaining the data model definitions only has to be done in one place, and performance will be good enough for small to medium sized applications.
Comments
Post a Comment