I work with a vendor application which needs to do all sorts of data crunching in some nightly batches, and, depending on the size of data, this process may take a lot of time.
Some time a go a colleague said: “we need more parallelism, we want to parallelize all the things! What can you think may go wrong?”
This opened a stream of consciousness that could be really handled only by opening an ad-hoc blog about SQL Server performance, issues and maybe simply neat stuff too, so, here we are.
Back to the original question:
When can you have issues with parallelism in SQL Server?
To which I replied:
Which one? Parallelism in the query, or parallelism of queries (i.e. more queries flying around)?
For parallelism in the query,
i.e. for higher (MAX)DOP, you may have if the cardinality estimator (and then the query optimizer) estimates an incorrect cardinality for the query:
- Bad cost estimate for the query, the cost surpasses the Cost Threshold for Parallelism set at the instance level, a parallel plan is evaulated
- Most likely, the parallel plan will be chosen and executed, splitting the load between N+1 threads (where 1 is the orchestrator)
- Each plan will then contain the parallelism operators in the Execution Plan
- If we’re very unluckly, the division of the rows to elaborate between threads will very skewed; we’ll have one thread that has to work more than the others, and we’ll have to wait for him
If all of this happens, you get yourself in a situation of using more CPU time (to create the parallel plan), more threads (N+1), more memory grants and gaining more CXPACKET waitstats as a bonus in your DMVs.
- More CPU time consumed, means less CPU time for other tasks
- More threads means, in an extreme case and high concurrent system, to exaust the available SQL threads, encountering the dreaded THREADPOOL wait (article coming)
- More memory grants means that SQL Server grants memory space for the threads, even if they may not use it (because we had a bad estimate to begin with), which may cause a RESOURCE_SEMAPHORE wait (article coming), blocking all the new queries that aren’t able to get a hold of enough memory to run
What about parallelism of queries?
You’ll have to consider the following:
- If queries access the same objects, you may encounter some waits related to Locks (LCK_*) while the various threads are trying to read/write from/to the same tables, if you’re unlucky you can encounter some deadlocks too!
Tools to consider:
- Setting an appropriate isolation level
- Memory Optimized Objects (even better on SQL Server 2016 SP1+)
- Partitioning (But you need to have the exact conditions, the right queries and don’t forget to set the lock escalation to partition as opposed to table!)
- (For Datawarehouses) If you’re trying to explicitly parallelize inserts to save time, maybe you can let SQL Server do it’s thing and bulk insert with a minimally logged operation
- This is especially true if using Columnstore Indexes, of course; A bulk insert in a compressed rowgroup is blazing fast
- If you’re trying to do more stuff at the same time, maybe you don’t have the RAM for it, maybe all your threads are using different objects which don’t fit in memory at the same time and each thread is stealing memory from the other threads, only to then read from disk again when it’s its turn to run again; As you all know, reading from disk is immensely slower than reading from memory, you can verify if this is the case by verifying Page Life Expectancy (PLE)
- Let’s say that you’re really going big on parallelism, and are launching a lot of parallel queries, these queries have a big cost and SQL Server decides to use a parallel plan, well, you may be generating so many threads that actually exceeds the maximum number of worker threads allowed, at this point, the server becomes unresponsive as it cannot even accept new connections until some executing threads are over (this is the dreaded THREADPOOL)
Other than the above being considered, SQL Server is really though as an environment for concurrent operations, so if you don’t follow in any of the traps described, of course you should parallelize all the things!