Performance: Indexing JSON in SQL Server

- October 13, 2022

Continuing the JSON in SQL Server posts found here and here, a co-worker asked the interesting question "can you index the key values in a small SQL Server text column using JSON?" I didn't know the answer to that, so here we go.

Setup

The test box remains the same: Core i7 CPU, 16 GB RAM, 1 TB 7200 RPM data disk. SQL Server 2019.

The DicomFile table had a new column added, populated from the indexed SQL columns, and a new nonclustered index added.

alter table DicomFile add IndexedJson varchar(1000);

update DicomFile
set IndexedJson = concat('{'
, '"SopClassUid": "', SopClassUid, '",'
, '"SopInstanceUid": "', SopInstanceUid, '",'
, '"PatientId": "', PatientId, '",'
, '"StudyInstanceUid": "', StudyInstanceUid, '",'
, '"SeriesInstanceUid": "', SeriesInstanceUid, '"'
, '}')
from DicomFile;

create nonclustered index IDX_DicomFile_IndexedJson ON DicomFile (IndexedJson);

For comparison, I also created a heap table with only the indexed and full JSON data.

CREATE TABLE [dbo].[DicomFileHeap](
[IndexedJson] [varchar](1000) NULL,
[DicomJson] [nvarchar](max) NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

insert into DicomFileHeap
select IndexedJson, DicomJson
from DicomFileHeap;

create nonclustered index IDX_DicomFileHeap_Json ON DicomFileHeap (IndexedJson);

Queries

SQL index

select DicomJson
from dbo.DicomFile
where SopInstanceUid = '1.3.6.1.4.1.14519.5.2.1.6834.5010.753273133725609601126044674777'

Result: 0 ms (no surprise there)

Unindexed JSON query

select DicomJson
from dbo.DicomFile
where isjson(DicomJson) > 0
and json_value(DicomJson, '$."00080018".Value[0]') = '1.3.6.1.4.1.14519.5.2.1.6834.5010.753273133725609601126044674777'

Result: too long, canceled query

Indexed JSON query

select DicomJson
from DicomFile
where isjson(IndexedJson) > 0
and json_value(IndexedJson, '$."SopInstanceUid"') = '1.3.6.1.4.1.14519.5.2.1.6834.5010.753273133725609601126044674777'

Result: 40 ms

Indexed JSON heap query

select DicomJson
from DicomFileHeap
where isjson(IndexedJson) > 0
and json_value(IndexedJson, '$."SopInstanceUid"') = '1.3.6.1.4.1.14519.5.2.1.6834.5010.753273133725609601126044674777'

Result: 40 ms

Inserts

Clustered index

insert into DicomFile

(SopClassUid, SopInstanceUid, PatientId, StudyInstanceUid, SeriesInstanceUid, DicomJson, IndexedJson)
select top 1 SopClassUid, SopInstanceUid, PatientId, StudyInstanceUid, SeriesInstanceUid, DicomJson, IndexedJson
from DicomFile
order by newid()

Results: 180 ms

Heap

insert into DicomFileHeap
select top 1 *
from DicomFileHeap
order by newid()

Results: 30 ms

Conclusion

TL;DR: Indexed JSON is close but not as fast as regular SQL indexes. Heaps provide better insert performance.

It's no surprise that the execution time of a regular SQL index is almost instant. The plan for that is an index seek, versus JSON queries that have to do either an index scan or table scan. Querying the non-indexed full JSON data is not a viable strategy, as the table scan has to parse gigabytes worth of text data.

I think this has been covered in other, more typical SQL Server discussions, but heaps provide better insert performance. This is because they avoid the overhead of the clustered index.

Going back to the original question of these SQL vs NoSQL comparisons, why do this? Well, in general, I think JSON text data should be avoided in a regular SQL Server data model. It misses a strength of a relational database: interoperability and a defined data model. If your application will do better with a document-style data model, a NoSQL datastore like MongoDB will serve you best. Basically, I think a primary use case for using JSON in SQL is if you're in a shop where NoSQL is discouraged or not allowed, and only 1 application will be using the datastore. In that case, maintaining the data model definitions only has to be done in one place, and performance will be good enough for small to medium sized applications.

Search This Blog

Desiderata Errata