Performance: SQL Server vs MongoDB (Reporting)

As a continuation of this performance comparison, I wanted to also compare performance in SQL Server's supposed strong point: reporting on very large relational data sets. Does it truly perform better than MongoDB? Again, this will attempt to be a mostly vanilla comparison, without the different optimizations that can be done in both platforms, for a couple of typical reporting queries.

Setup

The test box remains the same: Core i7 CPU, 16 GB RAM, 1 TB 7200 RPM data disk. SQL Server 2019, MongoDB 6.0.1.

Data was randomly generated using a console app and inserted into both platforms. For SQL Server, I created both a relational table structure, and a columnstore indexed table that is more typically seen in reporting databases. The columnstore table was created as a flat copy of the relational tables, and a SELECT FROM INSERT statement populated the data. MongoDB received the same data, with the Customer and Product data embedded in each document to avoid doing any lookups or joins.

Sum of Quantity Sold & Total Price per Customer

SQL Server (clustered index)

select cust.[Name], prod.[Name], sum(fact.QuantitySold), sum(fact.TotalPrice)
from FactTransaction fact
join DimCustomer cust on fact.CustomerId = cust.Id
join DimProduct prod on fact.ProductId = prod.Id
where cust.[Name] = 'The GlowUp'
group by cust.[Name], prod.[Name]

Results: 15 minutes

SQL Server (columnstore index)

select CustomerName, ProductName, sum(fact.QuantitySold), sum(fact.TotalPrice)
from TransactionCCI fact 
where CustomerName = 'The GlowUp'
group by CustomerName, ProductName

Results: 27 seconds

MongoDB

db.transactions.aggregate([{$match: {"Customer.Name": "The GlowUp"}}, {$group: {"_id":null, CustomerName: {"$first": "$Customer.Name"}, ProductName: {"$first": "$Product.Name"}, QuantitySoldSum: {$sum: "$QuantitySold"}, TotalPrice: {$sum: {"$toDouble": "$TotalPrice"}}}}])

Results: 18 minutes

Highest Total Price per Customer, per Product

SQL Server (clustered index)

select max(cust.Name) as CustomerName , max(prod.Name) as ProductName, max(TotalPrice) as TotalPrice
from FactTransaction fact
join DimCustomer cust on fact.CustomerId = cust.Id
join DimProduct prod on fact.ProductId = prod.Id
group by fact.ProductId, fact.CustomerId
order by CustomerName, ProductName

Results: 16.5 minutes

SQL Server (columnstore index)

select CustomerName , ProductName, max(TotalPrice) as TotalPrice
from TransactionCCI fact
group by ProductName, CustomerName
order by CustomerName, ProductName

Results: 32 seconds

MongoDB

db.transactionsTest.aggregate([{$group: {"_id": {"CustomerName": "$Customer.Name", "ProductName": "$Product.Name"}, "TotalPrice": {"$max": {"$toDouble": "$TotalPrice"}}}}]).sort({"CustomerName": 1, "ProductName": 1})

Results: 58 minutes

Conclusion

TL;DR: SQL Server is better at reporting against very large data sets.

"Normal" relational table structures perform better than MongoDB. Depending on the query, performance was either slightly better or much better. However, both structures were blown out of the water by a columnstore index. The performance difference is unquestionable. When dealing with reporting against very large data sets, if it's possible to use columnstore, do it.

A final note, in the future I would like to do a performance comparison of SQL Server's columnstore index versus a NoSQL columnstore platform, such as Cassandra.

Comments

Popular posts from this blog

Adding Font Awesome to .NET MAUI App

Comparing SELECT Performance of Joining Integers vs GUIDs in SQL Server

Performance: SQL Server vs MongoDB (JSON data)