A closer look at everything related to SQL Server

TempDB Bottlenecks and More

There has been many articles, documents and blog posts written about various problems and resolutions of TempDB. This is my effort at understanding and gathering relevant information about TempDB bottlenecks in one place for myself and any audience of this blog.

INTRODUCTION OF TEMPDB

Tempdb is a SQL Server temporary system database. Everytime SQL Server restarts, new Tempdb database is created. Whatever data was stored in the tempdb is lost. When TempDB gets recreated its data and log files reset to their last configured size. SQL Server records only enough information in the tempdb transaction log to roll back a transaction, but not to redo transactions during database recovery. This feature increases the performance of INSERT statements in tempdb. Tempdb database is always in Simple recovery mode. If you try to change it to Full Recovery mode, you will receive the following error message.

If you try to backup TempDB, you will get another error saying backup and restore is not allowed on TempDB database.

This system database is used by all the other user databases on SQL Server instance for their temporary work. It is the most busy database on almost all production SQL Servers. Here is a short list of few of the features that use tempdb.

  • Temporary tables, table variables, and table-valued functions.
  • Work tables associated with Cursors, Order By, Group by and Hash Query Plans.
  • Version stores used by Read Committed Snapshot Isolation
  • Index creation, Online index creation
  • DBCC CHECKDB
  • User-defined functions

These objects stored in TempDB are categorized as following:

USER OBJECTS – such as temporary local and global tables, table variables, table-valued functions etc.

INTERNAL OBJECTS – such as objects created to support indexing operations and query execution like work tables.

VERSION STORES – such as created when RSCI or SI isolation levels are used by transactions.

Following TSQL code uses DMV sys.dm_db_file_space_usage to get the TempDB space usage, separating out the above 3 categories.

— tempdb data file usage
SELECT GETDATE() AS runtime,
SUM(user_object_reserved_page_count) * 8 AS usr_obj_kb,
SUM(internal_object_reserved_page_count) * 8 AS internal_obj_kb,
SUM(version_store_reserved_page_count) * 8 AS version_store_kb,
SUM(unallocated_extent_page_count) * 8 AS freespace_kb,
SUM(mixed_extent_page_count) * 8 AS mixedextent_kb
FROM sys.dm_db_file_space_usage;

TEMPDB PROBLEMS

Problem 1: TempDB database grows out of disk space and only option left is to restart the SQL Server service.

There could be many ways to approach the resolution. But I believe monitoring the space usage is the best weapon. Here is a great post by William Durkin, who creates an Extended Event session to monitor space usage of data and log files of TempDB database.

— XEvent Session to track tempdb file growths
CREATE EVENT SESSION TempDBFileGrowth
ON SERVER
ADD EVENT sqlserver.database_file_size_change(
SET collect_database_name = ( 0 )
ACTION (sqlserver.client_app_name, sqlserver.client_hostname,
sqlserver.database_id, sqlserver.session_nt_username,
sqlserver.sql_text)
WHERE ([database_id] = ( 2 )))
ADD TARGET package0.event_file
( SET filename = N’TempDBFileGrowth.xel’,
max_file_size = ( 10 ), max_rollover_files = ( 5 ))
WITH (
MAX_MEMORY = 16384KB,
EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY = 1 SECONDS,
MEMORY_PARTITION_MODE = PER_NODE,
TRACK_CAUSALITY = OFF,
STARTUP_STATE = ON
);
GO
ALTER EVENT SESSION TempDBFileGrowth ON SERVER STATE = START;
GO

Create this Extended Event session for monitoring tempdb space usage. But be careful, this is for troubleshooting only. You can leave it running continuously but first change the dispatch_latency to 30 seconds which is the default. Dispatch latency controls how often XEvent data in buffer pool will be flushed and written to the disk. If this frequency is very low, on busy systems, it could affect the performance of SQL Server.

Problem 2: Tempdb PFS and SGAM page contentions.

Imagine a particular SQL Server instance processing concurrent queries. Each one of them creating tiny temp tables, use them and then drop them. These temporary tables are stored in Tempdb database. PFS page in the tempdb data file keeps track of all the page allocations and deallocations. Only one thread at a time can make changes in PFS and as a result you can see it creates a bottleneck. This behavior of Tempdb is known as PFS Page Contention. This issue can be resolved by having multiple Tempdb data files. The guidance is that you should start with 1 data file per CPU core till you reach 8 files. Add more files in chunk of 4 if you are still seeing PFS page contentions. You also need to enable trace flag 1117 to have these multiple files grow at same rate. This is a server wide setting so it changes the behavior of all user databases on the instance and not only TempDB.

Sometimes Tempdb reports contention in SGAM page. This happens when SQL Server searches for a mixed pages to allocate to a new object created and scans SGAM page. This results in severe contention on the SGAM page when multiple mixed-page allocations are underway. Deallocation activities also modify this page and create more contentions. In this situation, trace flag 1118 can be implemented to allow SQL Server to allocate full extents to the new objects. This is a server wide setting so it changes the behavior of all user databases on the instance and not only TempDB. Monitor this behavior using DMV sys.dm_os_waiting_tasks. You will notice wait types along with resource_description such as 2:1:1 and 2:1:3 (PFS and SGAM Pages in TempDB database in File 1).

SELECT
[owt].[session_id], [owt].[exec_context_id],
[owt].[wait_duration_ms], [owt].[wait_type],
[owt].[blocking_session_id], [owt].[resource_description],
CASE [owt].[wait_type]
WHEN N’CXPACKET’ THEN
RIGHT ([owt].[resource_description],
CHARINDEX (N’=’, REVERSE ([owt].[resource_description])) – 1)
ELSE NULL
END AS [Node ID],
[es].[program_name], [est].text,
[er].[database_id], [eqp].[query_plan],
[er].[cpu_time]
FROM sys.dm_os_waiting_tasks [owt]
INNER JOIN sys.dm_exec_sessions [es] ON
[owt].[session_id] = [es].[session_id]
INNER JOIN sys.dm_exec_requests [er] ON
[es].[session_id] = [er].[session_id]
OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est]
OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp]
WHERE
[es].[is_user_process] = 1
ORDER BY
[owt].[session_id], [owt].[exec_context_id];
GO

The above query is from Paul Randal. Another useful blog about Tempdb is from Pamela Lahoud of Microsoft.

You will see later in this blog post that since SQL Server 2016, trace flags 1117 and 1118 are not needed because they are now the default behavior of TempDB database.

Problem 3: Tempdb Metadata contentions

Third type of bottleneck we see in Tempdb is from system tables. This is also known as metadata contention and it is occurring because of hardware improvements which enables creation and deletion of temporary objects at a faster rate in TempDB database. This is a good thing because we want TempDB to be faster but it created another type of contention which has to do with keeping track of all these metadata changes in the system table. SQL Server 2019 has pretty much resolved this issue by providing an option to place the TempDB system tables on In-Memory files. In-memory tables do not face any latches or locks as it is in RAM and not on the disk. Since these system tables only contain tempdb metadata their footprint is not that big on the SQL Server memory.

You can use the following script to view all the sessions that are waiting for page-related wait types and get information about the objects that the pages belong to. sys.fn_PageResCracker is a new function available in SQL Server 2019 that takes the Page_resource value from sys.dm_db_exec_requests dmv and give you the detail of locks and latches at the page level.

USE master
GO

SELECT
er.session_id, er.wait_type, er.wait_resource,
OBJECT_NAME(page_info.[object_id],page_info.database_id) as [object_name],
er.blocking_session_id,er.command,
SUBSTRING(st.text, (er.statement_start_offset/2)+1,
((CASE er.statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE er.statement_end_offset
END – er.statement_start_offset)/2) + 1) AS statement_text,
page_info.database_id,page_info.[file_id], page_info.page_id, page_info.[object_id],
page_info.index_id, page_info.page_type_desc
FROM sys.dm_exec_requests AS er
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) AS st
CROSS APPLY sys.fn_PageResCracker (er.page_resource) AS r
CROSS APPLY sys.dm_db_page_info(r.[db_id], r.[file_id], r.page_id, ‘DETAILED’) AS page_info
WHERE er.wait_type like ‘%page%’
GO

Problem 4: Version Store growth in read-only secondary replica

If you have Availability Group setup for your SQL Server instance, make sure to monitor version store growth in Tempdb database on secondary replica. In addition to following query, you can also monitor this with performance counters like Version Store Size (KB), Version Cleanup rate (KB/s) and Version Generation rate (KB/s). Here is an interesting write up about this problem on AG secondaries.

— Version store usage
SELECT DB_NAME(database_id) AS [database_name], reserved_page_count, reserved_space_kb
FROM sys.dm_tran_version_store_space_usage
ORDER BY reserved_space_kb DESC;

IMPROVEMENTS IN TEMPDB

  1. Optimizing SQL Server Eager Writes

One long standing issue with TempDB in the past was that when you perform SELECT INTO temporary table operation, poor performance on disk I/O occurred in tempdb database. This was resolved in SQL Server 2014 RTM and back ported to SQL Server 2012 SP2 CU1. SQL Server has a concept of Eager Writes that prevent flooding the buffer pool with pages that are newly created from bulk activities and need to be written to disk. SQL Server 2014 enhancements now relaxes the need to flush these pages to the tempdb data files as quickly as the older versions because it recognizes objects created in tempdb would be temporary.

2. Tempdb Installation Defaults

Having multiple tempdb data files to remove PFS and SGAM page contentions has been long standing recommendations from Microsoft. Prior to SQL Server 2016, DBAs would manually add the additional TempDB data files after installing SQL Servers. and enable trace flags 1118 and 1117 to evenly grow these files. SQL Server 2016 installation now includes this step in the setup. It recognizes the number of CPU cores and based on that gives the option of creating multiple files. You can change the no. of files, file location etc. Also there is no need to use the previously mentioned trace flags, as this has now became the default behavior of TempDB database.

3. Tempdb system tables move to In-Memory OLTP

Once again SQL Server 2019 brought a huge improvement to TempDB performance. You can enable “Optimize for Tempdb Metadata” feature and it will put all the TempDB metadata (system tables) in memory. This takes away all type of pagelatch_XX wait types. When temporary objects are created and removed at super fast pace, due to huge improvements in hardware, disk is no longer the bottleneck for these operations. These metadata objects reside in RAM (memory) now.

TEMPDB BEST PRACTICES

  1. Configure multiple tempdb data files. Start with 8 if you have 8 or more logical cores. Then add more in chunks of 4 if you still see PFS contention. Multiple files are beneficial even if all are on the same drive which is dedicated for Tempdb.
  2. Enable trace flag 1118 as a startup trace flag for SQL Server prior to SQL Server 2016. This prevents SGAM contentions.
  3. Enable trace flag 1117 as a startup trace flag for SQL Server prior to SQL Server 2016. This makes multiple tempdb files to grow equally.
  4. Always put the tempdb data and log files on fastest disk. Have it local to the machine rather than in SAN. It keeps your SAN path free for user database traffic
  5. Put Tempdb database on a separate volume then your user databases.
  6. Just pre-allocate the tempdb files as large as possible, with enough room to give you time to react to unforeseen scenarios such as SQL Server encounters long running query.
  7. Do not explicitly drop temp tables at the end of a stored procedure, they will get cleaned up when the session that created them ends.
  8. Do not alter temp tables after they have been created.
  9. Do not truncate temp tables.
  10. Move index creation statements on temp tables to the new inline index creation syntax that was introduced in SQL Server 2014.

SUMMARY

The key to keep your TempDB healthy is proactive and continuous monitoring with DMVs and XEvents. Setup up proper alerting to notify when space usage reaches certain threshold. Document the TempDB configurations into your SOP (Standard Operating Procedure) documentation to be consistent across all your SQL Servers. Keep an eye on version store usage and be careful about their growth on read-only secondary replicas. Consider using Memory Optimized Tempdb Metadata feature of SQL Server 2019.

Comments on: "TempDB Bottlenecks and More" (3)

  1. […] Ameena Lalani shares some good info on the tempdb database: […]

  2. I run the script and it has error:

    you can see that , this ‘ is not exists in the :

    .[resource_description])) – 1)

    it is the – sign does matter now.

    Msg 102, Level 15, State 1, Line 38
    Incorrect syntax near ‘–’.

    Completion time: 2020-10-21T11:49:47.2018162+08:00

    any reason for it?

    and for the script:

    SELECT
    [owt].[session_id], [owt].[exec_context_id],
    [owt].[wait_duration_ms], [owt].[wait_type],
    [owt].[blocking_session_id], [owt].[resource_description],
    CASE [owt].[wait_type]
    WHEN N’CXPACKET’ THEN
    RIGHT ([owt].[resource_description],
    CHARINDEX (N’=’, REVERSE ([owt].[resource_description])) – 1)
    ELSE NULL
    END AS [Node ID],
    [es].[program_name], [est].text,
    [er].[database_id], [eqp].[query_plan],
    [er].[cpu_time]
    FROM sys.dm_os_waiting_tasks [owt]
    INNER JOIN sys.dm_exec_sessions [es] ON
    [owt].[session_id] = [es].[session_id]
    INNER JOIN sys.dm_exec_requests [er] ON
    [es].[session_id] = [er].[session_id]
    OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est]
    OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp]
    WHERE
    [es].[is_user_process] = 1
    ORDER BY
    [owt].[session_id], [owt].[exec_context_id];
    GO

    why need – 1 ?

    • You need minus for following reason. Brushing up on TSQL string functions may help here.
      Declare @name Varchar(30)
      SET @name=’Ameena Lalani’
      SELECT LEFT(@name, CHARINDEX(‘ ‘,@name)-1)
      SELECT RIGHT(@name, LEN(@name)-CHARINDEX(‘ ‘,@name))

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: