SQL Server 2022 – QAT Backups

One of the amazing aspects of my job at Pure Storage is that I get opportunities to work with new and emerging tech, oftentimes before it is available to the general public. Such is the case with SQL Server 2022, where I got to help test QAT backups.

TL;DR

Using QAT compression for your native SQL Server backups will give you better compression with less CPU overhead than “legacy” compression. So I get smaller backup files, faster, with less CPU burn. What’s not to like?

Tell Me More About This Q… A… T… Thing!

So QAT stands for Intel’s QuickAssist Technology, which is a hardware accelerator for compression. It’s actually been around for many years, but most regular folks like myself never got exposed to it, because you need a QAT expansion card in your server to even have access to its powers. And for us SQL Server folks, we had nothing that took advantage of QuickAssist Technology… until now, thanks to SQL Server 2022.

In SQL Server 2022, Microsoft has introduced QAT support for backup compression. Why does that matter? As I demonstrated in this blog post, your backup files are essentially byte-for-byte copies of your data files (when not using compression or encryption). And I don’t know about you and your databases, but in the SQL Server environments I see these days, database sizes continue to grow and grow and grow… so I hope you use compression to save backup time and space!

But I Don’t Have QAT Cards In My SQL Servers

I said earlier that QAT has been around for a number of years, available as expansion cards. But because SQL Server had no hooks to use QAT, I strongly doubt that any of us splurged for QAT cards to be added into our SQL Servers. But there are three things that’ll change all of that…

First, SQL Server 2022 has both QAT hardware support AND QAT software emulation. This means you can leverage QAT goodness WITHOUT a QAT expansion card.

Second, the next generation of Intel server processors will have QAT hardware support built in! So the next time you do a hardware refresh, and you buy the next gen of Intel server CPUs, you’ll have QAT support!

Third, if you cannot get the latest snazzy CPUs in your next hardware refresh, QAT cards are CHEAP. Like, less than $1k, just put it on a corporate charge card cheap.

IMPORTANT – QAT hardware support is an Enterprise Edition feature. But you can use QAT software mode with Standard Edition. And if you stay tuned, you’ll come to find that I’ve become a big fan of QAT software mode.
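For reference, here is a minimal sketch of how QAT offloading is typically switched on at the instance level. It assumes the Intel QAT driver package is already installed on the host, and it follows the sequence in Microsoft's integrated acceleration documentation as I understand it; check the docs for your exact build before running any of this, since both steps call for a service restart.

```sql
-- Step 1: allow hardware offload at the instance level.
-- 'hardware offload enabled' is an advanced option, so expose those first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'hardware offload enabled', 1;
RECONFIGURE;
GO
-- Restart the SQL Server service here before continuing.

-- Step 2: choose ONE of the two modes below, then restart the instance again.

-- Hardware mode (Enterprise Edition, needs a QAT device in the server):
ALTER SERVER CONFIGURATION
SET HARDWARE_OFFLOAD = ON (ACCELERATOR = QAT);
GO

-- Software mode (no QAT device needed). Uncomment to use instead:
-- ALTER SERVER CONFIGURATION
-- SET HARDWARE_OFFLOAD = ON (ACCELERATOR = QAT, MODE = SOFTWARE);
-- GO

-- Step 3: after the restart, check what the instance actually picked up.
SELECT *
FROM sys.dm_server_accelerator_status;
```

The DMV query at the end is the quickest sanity check: it should report whether the accelerator is configured and which mode is in effect.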

How’d You Test This Andy?

In my team’s lab, we have some older hardware lying around that I was able to leverage to test this out. Microsoft sent us an Intel 8970 QAT card, which we installed into one of our bare metal SQL Servers, an older Dell R720 with 2x Xeon E5-2697 CPUs and 512GB of RAM.

The database being backed up is 3.4TB, with the data spread across 9 data files on 9 data volumes. The data volumes were hosted on a FlashArray and the backup target was a FlashBlade.

To test, I used the above database and executed a bunch of BACKUP commands with different combinations of parameters. I leveraged Nic Cain’s BACKUP Test Harness to generate my T-SQL backup code. If you haven’t used it before, it’ll generate a bunch of permutations of BACKUP commands for you, mixing and matching different parameters and variables. I was particularly pleased that it also included baseline commands like a plain old BACKUP, and a DISK=NUL variant. I did have to make some modifications to the test harness to add in COMPRESSION options: NO_COMPRESSION, MS_XPRESS (i.e. legacy COMPRESSION), and QAT_DEFLATE.
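To make the three compression options concrete, here is a minimal sketch of what each variant looks like in T-SQL. This is not output from the test harness: the database name [TestDB] and the D:\Backups paths are hypothetical placeholders, and COPY_ONLY is my own addition so that test runs don't disturb an existing backup chain.

```sql
-- Hypothetical database name and paths; adjust for your environment.

-- 1) No backup compression at all.
BACKUP DATABASE [TestDB]
TO DISK = N'D:\Backups\TestDB_none.bak'
WITH NO_COMPRESSION, COPY_ONLY, INIT, STATS = 10;

-- 2) "Legacy" compression, named explicitly.
BACKUP DATABASE [TestDB]
TO DISK = N'D:\Backups\TestDB_xpress.bak'
WITH COMPRESSION (ALGORITHM = MS_XPRESS), COPY_ONLY, INIT, STATS = 10;

-- 3) QAT compression. Whether this runs in hardware or software mode
--    depends on how the instance is configured, not on the BACKUP syntax.
BACKUP DATABASE [TestDB]
TO DISK = N'D:\Backups\TestDB_qat.bak'
WITH COMPRESSION (ALGORITHM = QAT_DEFLATE), COPY_ONLY, INIT, STATS = 10;
```

Note that the same QAT_DEFLATE statement covers both the hardware and software rows in the results, which is why the mode has to be tracked separately when comparing runs.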

Sample BACKUP command used

Tangent: Backup READER & WRITER Threads

So I’ve always known that if you specify more output backup files, that’ll decrease your backup time tremendously. But I never quite understood why, until I started this exercise and Anthony Nocentino taught me a bit about BACKUP internals.

In a backup operation, there are reader threads that consume and process your data, and there are writer threads that push your data out to your backup target files. If you run a bare bones basic BACKUP command, you get one READER thread and one WRITER thread to do your work. If you add additional DISK = ‘foobar.bak’ parameters, that’ll give you more WRITER threads; 1 per DISK target specified. If you want to get more READER threads, your database has to be split across multiple data VOLUMES (not files or filegroups).
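As a rough illustration of the writer side, the sketch below stripes one backup across four DISK targets, which per the description above should yield four WRITER threads. The database name and file paths are hypothetical, and the MAXTRANSFERSIZE and BUFFERCOUNT values are just examples of the kind of knobs being varied, not a recommendation.

```sql
-- Hypothetical database name and paths; each DISK target gets its own writer.
BACKUP DATABASE [TestDB]
TO  DISK = N'E:\Backups\TestDB_1.bak',
    DISK = N'F:\Backups\TestDB_2.bak',
    DISK = N'G:\Backups\TestDB_3.bak',
    DISK = N'H:\Backups\TestDB_4.bak'
WITH COMPRESSION (ALGORITHM = QAT_DEFLATE),
     MAXTRANSFERSIZE = 4194304,  -- 4 MB, the largest value BACKUP accepts
     BUFFERCOUNT = 100,          -- number of I/O buffers for the operation
     COPY_ONLY, INIT, STATS = 10;
```

Keep in mind that a striped backup is one backup set spread over several files, so every file listed here is needed to restore it.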

If you were paying attention above, you’ll note that my test database consists of 9 data files across 9 data volumes. I set it up this way because I wanted more READER threads available to me, to help drive the BACKUP harder and faster.

Keep in mind, there’s always a trade-off in SQL Server. In this case, the more threads you’re running, the more CPU you’ll burn. And if you’re doing a bunch of database backups in parallel, or trying to run your backups at the same time as something else CPU heavy (other maintenance tasks, nightly processing, etc.) you may crush your CPU.

Tangent: FlashBlade as a Backup Target

FlashBlade is a scale-out storage array, whose super-power amounts to parallel READ and WRITE of your data. Each chassis has multiple blades and you can stripe your backup files across each of the different blades for amazing throughput. When you look at the sample BACKUP command, you’ll see different destination IP addresses. These are multiple Virtual IPs that all go to the same appliance, and using them helps to stripe the backup data across multiple blades in the FlashBlade.

Test Results

Legend

Compression Type | Definition
NO_COMPRESSION | No BACKUP compression used at all.
MS_XPRESS | “Legacy” BACKUP compression used.
QAT_DEFLATE (Software) | QAT BACKUP compression – Software emulation mode used
QAT_DEFLATE | QAT BACKUP compression – Hardware offloading used

Baseline: DISK = NUL

BACKUP summary results – DISK = NUL
All test results are the average of 3 executions per variable permutation.

Remember, when using DISK = NUL, we’re NOT writing any output – all of the backup file data is essentially thrown away. This is used to test our “best case” scenario, from a READ and BACKUP processing perspective.
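If you want to try the same kind of baseline yourself, a minimal sketch looks like this. The database name is a hypothetical placeholder. COPY_ONLY is my own addition and it matters here: without it, a full backup to NUL still becomes the new differential base, even though nothing restorable was written.

```sql
-- Hypothetical database name. Nothing is written; the backup stream is discarded.
BACKUP DATABASE [TestDB]
TO DISK = N'NUL'
WITH COMPRESSION (ALGORITHM = QAT_DEFLATE),
     COPY_ONLY,   -- leave the differential base untouched
     STATS = 10;  -- progress message every 10 percent
```

Swap the compression clause for NO_COMPRESSION or COMPRESSION (ALGORITHM = MS_XPRESS) to get the other baseline variants.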

It’s interesting to see that without WRITE activity, QAT acceleration did help speed up our BACKUP execution vs legacy compression. And QAT does offer slightly better backup file compression vs legacy compression. But what I find the most impactful is that CPU utilization, in both QAT hardware and software modes, is MUCH lower than with legacy compression!

Note the Backup Throughput column. We actually hit a bit of a bottleneck here on the READ side, due to an older Fibre Channel card in my test server and only having 8x PCIe lanes to read data from my FlashArray. The lab hardware I have access to isn’t cutting edge tech for performance testing, rather older hardware meant more for functionality testing. Moral of this story? Sometimes your I/O subsystem “issues” are because of the network OR the underlying server infrastructure, like the PCIe lanes and subsequent bandwidth limitations encountered here.

The Best: Files = 8; MTS = 4MB, BufferCount = 100

BACKUP summary results – Files = 8, MTS = 4MB, BufferCount = 100
All test results are the average of 3 executions per variable permutation.

I’m skipping over all of my various permutations to show the best results, which used 8 backup files, MAXTRANSFERSIZE = 4MB, and BUFFERCOUNT = 100.

Much like the DISK = NUL results, QAT yields superior compressed backup file size and CPU utilization. And in this case, the Elapsed Time ranking is now inverted: NO_COMPRESSION took the most time, whereas in the DISK = NUL results, NO_COMPRESSION took the least time. Why might that be? Well in the DISK = NUL scenarios, we don’t have to send data over the wire to write to a backup target, whereas in this case we did. And using compression of any sort means we will have to send less data out and write less data to our backup target.
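If you run a similar comparison, one way to pull elapsed time, compressed size, and throughput afterwards is from the backup history in msdb. This is a sketch under stated assumptions, not how the numbers in this post were collected: [TestDB] is a hypothetical database name, and the compression_algorithm column is assumed to be present in msdb.dbo.backupset on SQL Server 2022 (drop it from the query on older versions). CPU utilization is not recorded in msdb, so that has to come from elsewhere, such as performance counters.

```sql
-- Average the recorded full backups of one database, grouped by algorithm.
SELECT
    bs.database_name,
    bs.compression_algorithm,  -- assumed SQL Server 2022 column
    COUNT(*) AS runs,
    AVG(DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) * 1.0)
        AS avg_elapsed_sec,
    AVG(bs.compressed_backup_size / 1048576.0)
        AS avg_compressed_mb,
    -- Uncompressed size divided by compressed size; higher is better.
    AVG(bs.backup_size * 1.0 / NULLIF(bs.compressed_backup_size, 0))
        AS avg_compression_ratio,
    -- Uncompressed MB processed per second of elapsed time.
    AVG(bs.backup_size / 1048576.0
        / NULLIF(DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date), 0))
        AS avg_mb_per_sec
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = N'TestDB'   -- hypothetical database name
  AND bs.type = 'D'                  -- full database backups only
GROUP BY bs.database_name, bs.compression_algorithm;
```

The history does not distinguish QAT hardware mode from software mode, so you would need to note which mode the instance was in for each batch of runs.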

Stuck with TDE?

I also enabled TDE on my test database, then re-ran more tests. I found it interesting to see how a TDE database wound up taking more time across the board. I was also intrigued that with TDE + legacy compression, CPU usage was slightly lower but throughput was worse, vs non-TDE + legacy compression.

Parting Thoughts

Of course, the above is just a relatively small set of tests, against a single database. Yet, based on these results and other testing I’ve seen by Glenn Berry, I will admit that I’m VERY excited about SQL Server 2022 bringing QAT to the table to help improve BACKUP performance.

Even if you are stuck with older CPUs and do not have a QAT hardware card to offload to, QAT software mode beats legacy compression across the board.

I do need to test RESTORE next, because your BACKUPs are worthless if they cannot be restored successfully. But alas, that’s for another time and another blog post!

Thanks for reading!
