It's just Emanuele showing you what you can do in the Microsoft Data Platform

How to lose hundreds of thousands of dollars by using functions in SQL Server

H

Ahh, functions, the greatest tool at a programmer’s disposal, they make code reusable and easy to read, they’ve been essential since the first function call was made in the last century…

Unfortunately for the developers out there, when working for SQL Server, especially when tuning for performance, you need to get everything you knew about programming and throw it out of the window, because in SQL we work with sets.

This is what you get when developers develop

Let’s suppose we have a Datawarehouse, like all the enterprise companies do, and let’s suppose that they use SQL Server since it’s a cool company.

The business, being the business, has decided that at a Datawarehouse level there needs to be an universal constant whenever a field returns a NULL value; e.g. they want “Undefined” to be displayed as opposed to NULL whenever it comes up in a query

A good developer would encapsulate this constant value into a function and return the value, so the other developers would just call the function as opposed to trying to remember which was the text to be displayed and how it’s spelled.

Such a function would be super simple, something like this:

CREATE OR ALTER FUNCTION dbo.NullConstant()
RETURNS nvarchar(9)
AS
BEGIN

    RETURN N'Undefined'

END

I’m using the ContosoDW sample database, so I don’t have to come up with stuff from thin air, and I want to see how are the sales by product going

SELECT  COALESCE(C.ProductCategoryName,dbo.NullConstant()) ProductCategoryName, 
		SUM(S.SalesQuantity) AS SalesQuantity, 
		SUM(S.SalesAmount) AS SalesAmount
FROM dbo.FactOnlineSales AS S 
INNER JOIN dbo.DimProduct AS P ON S.ProductKey = P.ProductKey 
INNER JOIN dbo.DimProductSubcategory AS SC ON P.ProductSubcategoryKey = SC.ProductSubcategoryKey 
INNER JOIN dbo.DimProductCategory AS C ON SC.ProductCategoryKey = C.ProductCategoryKey 
GROUP BY C.ProductCategoryName

Did you notice the function on the 1st line? Seems legit from a usability side, right?
Let’s see the execution plan:

Nothing too fancy, right? A problem isn’t really shown here, no warnings, finished in 4 seconds which could be acceptable for a reporting query; the problem is hidden.
(suspense)
Let’s check the plan properties (right click, properties on the select operator)

A lot of things going on here, let’s see:

  • The query took 4.2s
  • The query is single threaded (same CPU time as Elapsed Time) plus it’s shown in the plan above
  • There is an estimated DOP of 4 in my hardware, but for some reason it couldn’t generate a valid parallel plan

So it’s not like that the optimizer chose the serial plan since it was the most efficient, but it wasn’t even a matter of choice because the parallel plan can’t exist!
Why? Of course, it’s in the title, the scalar function screwed us!

Going parallel

As implied everywhere until now, the function is a problem, let’s try to remove it:

SELECT  COALESCE(C.ProductCategoryName,'Undefined') ProductCategoryName, 
		SUM(S.SalesQuantity) AS SalesQuantity, 
		SUM(S.SalesAmount) AS SalesAmount
FROM dbo.FactOnlineSales AS S 
INNER JOIN dbo.DimProduct AS P ON S.ProductKey = P.ProductKey 
INNER JOIN dbo.DimProductSubcategory AS SC ON P.ProductSubcategoryKey = SC.ProductSubcategoryKey 
INNER JOIN dbo.DimProductCategory AS C ON SC.ProductCategoryKey = C.ProductCategoryKey 
GROUP BY C.ProductCategoryName

Same query as before, only the functions is gone , let’s see the results on the plan and duration:

That’s a way different and totally parallel plan! but does it makes any sense?

Indeed, in my modest machine and with this modest Data warehouse we managed to cut to 1/3 the total time required by this query to complete, using all of our CPUs fully

Let’s talk money

In a pretty small real-world case, your DW would have at least 8 cores, that means 8 licensed cores. SQL Server license is $13,748 per 2 core pack, when we’re talking about the Enterprise version.

At $7,128 per core in licensing costs ( ignoring +25% for Software Assurance) and 8 total cores in the PROD environment, this means that cores for a total of $49,896‬ have been licensed but are not actually when queries containing a scalar function are used.

For standard edition the total loss goes down to “only” $12,551 , but if you’re using this edition, can you really afford to waste 12K for one server only?

All of the above, not including the costs for hardware and electricity if you’re on premise; on the cloud the cost hemorrhage is even worse! But I’ve already talked about it.

How to fix it the ugly way

If you REALLY want a function to have your constants around, then at least make it a ITVF so that the optimizer can work with it:

CREATE OR ALTER FUNCTION dbo.ApplyNullConstant()
RETURNS TABLE
AS RETURN
SELECT N'Undefined' NullConstant

A refactor of the code is still necessary, as this function doesn’t return a scalar value anymore, but a table, hence we car rewrite the 1st query as one following alternatives:

SELECT  COALESCE(C.ProductCategoryName,(SELECT NullConstant FROM dbo.ApplyNullConstant())) ProductCategoryName, 
		SUM(S.SalesQuantity) AS SalesQuantity, 
		SUM(S.SalesAmount) AS SalesAmount
FROM dbo.FactOnlineSales AS S 
INNER JOIN dbo.DimProduct AS P ON S.ProductKey = P.ProductKey 
INNER JOIN dbo.DimProductSubcategory AS SC ON P.ProductSubcategoryKey = SC.ProductSubcategoryKey 
INNER JOIN dbo.DimProductCategory AS C ON SC.ProductCategoryKey = C.ProductCategoryKey 
GROUP BY C.ProductCategoryName
WITH Base AS (
	SELECT  C.ProductCategoryName , 
			SUM(S.SalesQuantity) AS SalesQuantity, 
			SUM(S.SalesAmount) AS SalesAmount
	FROM dbo.FactOnlineSales AS S 
	INNER JOIN dbo.DimProduct AS P ON S.ProductKey = P.ProductKey 
	INNER JOIN dbo.DimProductSubcategory AS SC ON P.ProductSubcategoryKey = SC.ProductSubcategoryKey 
	INNER JOIN dbo.DimProductCategory AS C ON SC.ProductCategoryKey = C.ProductCategoryKey 
	GROUP BY C.ProductCategoryName
)
SELECT COALESCE(B.ProductCategoryName,NC.NullConstant) AS ProductCategoryName,SalesQuantity, SalesAmount
FROM Base B
CROSS APPLY dbo.ApplyNullConstant() NC

In both case we’re getting the parallel plan and the 1.5s total elapsed time, neat!

[In this case] SQL Server 2019 can do the work for you

Thanks to the genius mind of Karthik Ramachandra , SQL Server 2019 introduces FROID , which is an AI feature that basically does the function inlining for you, if possibile.

This is a pretty basic case, so FROID works particularly well; both of the queries above run with this plan:

Now the plan is parallel for both versions of the queries, and it’s faster too:


That’s because the optimizer it’s using another feature of SQL 2019, batch mode on rowstore:

This comes out “for free” in SQL Server 2019, but don’t expect it to work as flawlessly with more complex scalar functions.

Takeaway

In this case study, even by continuing to throw hardware at the issue, the result wouldn’t have changed, and there is no clear indication that there is a problem at all

Investing on someone that can operate the database in an efficient manner is as valuable as investing in the technology itself.
As my friends in Pirelli would say “Power is nothing without control”!

About the author

Emanuele Meazzo

Add comment

It's just Emanuele showing you what you can do in the Microsoft Data Platform

Emanuele Meazzo

My Social Profiles

If you're interested in following me around in the social world, here are some URLs for you