A closer look at everything related to SQL Server

I attended PASS Summit 2019 in November last year, 6 months ago. The nice thing was that I took all my notes in OneNote, which I had just recently started using. Then life got in the way and I totally forgot about them. Then COVID-19 happened and I still did not remember until today. So here they are.

This time, I didn’t purchase any pre-con. It is not because there were no good ones but the opposite: there were so many that I worried I would pay the steep price for one and it would not live up to expectations, or that I would later wish I had attended a different one. So I made the cowardly decision and went to none.

But I did enjoy spending time with several friends and met a lot of people from my previous workplaces. No breakfast at PASS this time was a bummer. They gave us a $25 Starbucks gift card in lieu of 3 days of breakfasts. Just to let the organizers know, if they are reading this: a modest breakfast in Seattle was priced around $10 in 2019. Another suggestion: if prices for PASS merchandise were lowered, for example a $60 t-shirt down to $30 (without sacrificing quality), I think more people would buy them and take souvenirs home from PASS Summit, and net profit would come out the same in the end. I will keep the rest of my suggestions to myself.

Disclaimer: These are my notes taken while attending sessions and any inaccuracies are my own.

Session: SQL Server 2019 new tools

Speaker: Vicky Harp (SQL Server tools team in Microsoft)

This was all demos, no slides session.

Since 2016, the tools and engine have been decoupled: SSMS became available as a separate install from the database engine installation.

Azure Data Studio (ADS) is a new tool to communicate with the SQL Server engine and is more powerful than SSMS in some ways. But SSMS is going to stay; it is purely for SQL Server on the Windows platform, and it is a 32-bit application.

ADS supports hybrid environments and platforms. Your job is to secure your data and provide HA and DR for it, not to learn every new technology that comes your way. ADS serves this purpose very well. It is a one-stop-shop kind of tool. You can connect to Windows or Linux, and you can even connect from a non-Windows client such as a MacBook. Besides T-SQL, it natively supports PowerShell, Python, Scala, Spark, Java, etc.

Notebooks in ADS are another great feature. You can run a query typed in a notebook, and the results and comments are saved in the same notebook. You can create a complete solution in it and email the notebook to a colleague, or open it in other compatible tools such as SSMS. You can create SOP documents for your helpdesk team. It is not compatible with OneNote yet, but discussions on that subject are in progress.

There are a lot of extensions available in ADS which further improve its usefulness. Extensions such as SQL Server Agent, Admin Pack for SQL Server, and SQL Server Profiler, to name a few, are available. Each extension has multiple tasks.

SQL Server Edge running on IoT devices can analyze IoT data on the device itself instead of sending it to a server for analysis. Stream Analytics is used for this purpose.

Session: Working with SQL Server Containers

Speaker: Carlos Robles

Demos: C:\Users\amlalani\OneDrive\Training\PASS Summit 2019\SQL Server Containers

Why containers?

  • Containers are a standard unit of software.
  • They provide security and isolation.
  • They are lightweight, portable, and cross-platform.
  • They are consistent and predictable, with no surprises.
  • They enable Continuous Integration (CI) and Continuous Delivery (CD).
  • They eliminate shared environments and resource contention.

SQL Server as Application:

Before: difficult to integrate, operationally complex.

Now: Docker Compose uses a YAML file for deployment.
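As a sketch of that deployment model, here is a minimal, hypothetical docker-compose.yml for a single SQL Server 2019 container; the service name, password, and volume are illustrative, not from the session:

```yaml
# Hypothetical docker-compose.yml for one SQL Server 2019 container.
version: "3.8"
services:
  mssql:
    image: mcr.microsoft.com/mssql/server:2019-latest
    environment:
      ACCEPT_EULA: "Y"
      SA_PASSWORD: "YourStrong!Passw0rd"   # use a secret store in real deployments
    ports:
      - "1433:1433"                        # expose the default SQL Server port
    volumes:
      - sqldata:/var/opt/mssql             # persist databases across restarts
volumes:
  sqldata:
```

Run it with `docker-compose up -d`, then connect with any SQL client to localhost,1433.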

CI – Commit -> Trigger -> Notify -> Test

CD – Deliver -> Deploy

Jenkins is an open-source automation server. Pipelines are sets of jobs.

Session: Intelligent Query and Performance

Speaker: Pedro Lopez and Joe Sack

  1. Intelligent DB vision:
    1. Anxiety free upgrades
    2. Don’t worry about regressions
    3. Query tuning assistance tools
  2. Performance IQP
    1. Resource Governor
    2. Soft NUMA became default in SQL Server 2019
    3. Many new improvements in newer versions are not advertised, so just by upgrading you get benefits out of the box.
  3. Management by default
    1. Accelerated Database Recovery ADR
    2. Online Resumable index creation
    3. Lightweight Profiling
    4. Automatic SQL
    5. Threat Protection
    6. GDPR questions can be answered with classification and vulnerability assessment tools

Memory Grant Feedback (MGF):

Available in compatibility mode 140+

Memory misestimation can result in spills to disk, and overestimation hurts concurrency.

If less than 50% of the needed memory is granted, resulting in spills, then MGF will fix the grant and add memory plus some buffer.

In 2017, Batch mode was introduced and was available for columnstore indexes only.

In 2019, Batch mode on rowstore is available for all index types, row and column.

  1. Does the Query Optimizer have to do a lot of work to first check whether a query is eligible for Batch mode execution?

MGF has to learn again if the plan is kicked out of cache for some reason.
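As a sketch of how this looks in practice (the database and table names are hypothetical, not from the session):

```sql
-- MGF is on by default under compatibility level 140+ (batch mode)
-- and 150 (row mode, SQL Server 2019).
ALTER DATABASE [MyDb] SET COMPATIBILITY_LEVEL = 150;

-- After a query spills or heavily over-allocates, the next execution's
-- actual plan shows the adjusted grant: look at MemoryGrantInfo, e.g.
-- IsMemoryGrantFeedbackAdjusted="YesAdjusting".
SELECT TOP (1000) *
FROM dbo.BigTable
ORDER BY SomeColumn;   -- the sort forces a memory grant

-- Per-database opt-out if regressions appear:
ALTER DATABASE SCOPED CONFIGURATION
    SET ROW_MODE_MEMORY_GRANT_FEEDBACK = OFF;
```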

Batch Mode Columnstore:

Built for analytical workloads and IO-bound queries. Columnstore provides effective compression over traditional rowstore.

Rowstore Batch Mode:

If you cannot add a columnstore index for some reason (maybe the vendor does not support it, or the table has a trigger), then Rowstore Batch mode comes in handy. It adds a new scan operator and works the same for heaps and B-tree tables. Great for CPU-bound queries.
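A quick way to compare the two modes on the same query (the table name is hypothetical) is the documented `DISALLOW_BATCH_MODE` hint:

```sql
-- Under compatibility level 150, the engine's heuristics may pick
-- batch mode on rowstore for this aggregation on its own:
SELECT COUNT(*), AVG(Amount)
FROM dbo.Sales;

-- Force row mode for an apples-to-apples comparison:
SELECT COUNT(*), AVG(Amount)
FROM dbo.Sales
OPTION (USE HINT ('DISALLOW_BATCH_MODE'));
```

Compare the two actual plans: the scan operator's properties show the execution mode.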

Interleaved Execution for MSTVFs:

The query processor treats an MSTVF as a black box because it materializes at runtime only, so the QP cannot make good decisions about the plan. SQL 2016 and prior versions guesstimated 100 rows, which is inefficient if 1,000 or 10,000 rows are actually returned. SQL 2017+ assumes 500 rows for the first run and then adjusts. Downstream operations benefit.
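A minimal sketch of an MSTVF that benefits from interleaved execution (all names are hypothetical):

```sql
-- Multi-statement TVF: the optimizer cannot see inside it.
CREATE OR ALTER FUNCTION dbo.OrdersSince (@since date)
RETURNS @t TABLE (OrderId int, CustomerId int, Amount money)
AS
BEGIN
    INSERT INTO @t
    SELECT OrderId, CustomerId, Amount
    FROM dbo.Orders
    WHERE OrderDate >= @since;
    RETURN;
END;
GO
-- Under compat 140+, the optimizer pauses, materializes the function,
-- and plans the join with the actual row count instead of a fixed guess.
SELECT c.Name, o.Amount
FROM dbo.OrdersSince('2019-01-01') AS o
JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId;
```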

Batch Mode Adaptive Join (AJ):

If the cardinality estimate is wrong, the wrong join operator will be selected in the plan, which will then be cached.

AJ will defer the choice of Hash or Nested Loops join until after the first join input has been scanned.
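A sketch of where to see this (table names hypothetical): in the actual plan, the Adaptive Join operator exposes an "Adaptive Threshold Rows" property; at runtime, fewer rows than the threshold means nested loops, more means hash. There is also a documented hint to opt out:

```sql
-- May produce an Adaptive Join under batch mode:
SELECT s.SaleId, p.Name
FROM dbo.Sales AS s
JOIN dbo.Products AS p ON p.ProductId = s.ProductId
WHERE s.Quantity > 100;

-- Opt out for one query if the deferred choice misbehaves:
SELECT s.SaleId, p.Name
FROM dbo.Sales AS s
JOIN dbo.Products AS p ON p.ProductId = s.ProductId
WHERE s.Quantity > 100
OPTION (USE HINT ('DISABLE_BATCH_MODE_ADAPTIVE_JOINS'));
```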

Resource: aka.ms/IQP

Session: Data Classification

Speaker: Grant Fritchey

Security and compliance requirements are undeniable. They should be first and foremost.

Regulatory Landscape:

GDPR: European. Marriott and British Airways were fined. Half of UK firms are still not GDPR compliant.

FERPA: US Department of Education law that protects adult students’ records, even from parents. Schools don’t share any information, even if the student is a dependent and in the hospital or sick.

HIPAA: Health regulation for the medical industry. If a person has knowingly shared PII data, they could go to jail.

CCPA: California’s flavor of the GDPR regulation, enforced from Jan 1, 2020.

All of them have room for interpretation and none of them is perfect.
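SQL Server 2019 has native support for tagging columns that fall under these regulations. A minimal sketch (table and label names are hypothetical):

```sql
-- Classify a column; classifications surface in audits and SSMS reports.
ADD SENSITIVITY CLASSIFICATION TO dbo.Customers.Email
WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Contact Info');

-- Review what has been classified so far:
SELECT SCHEMA_NAME(o.schema_id) AS [schema],
       o.name  AS [table],
       c.name  AS [column],
       sc.label,
       sc.information_type
FROM sys.sensitivity_classifications AS sc
JOIN sys.objects AS o ON o.object_id = sc.major_id
JOIN sys.columns AS c ON c.object_id = sc.major_id
                     AND c.column_id = sc.minor_id;
```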

Session: Successfully communicate with Customer

Speaker: Dennise McInevney

Results of bad communication:

  • Project fails
  • Career hurt or ends
  • Frustration and stress
  • Time and money wasted

  1. Empathy: Walk in your customer’s shoes. See their pain through their eyes. See how they use your product. See where you can help them.
  2. Us vs. them mentality: Be one team. Make sure the customer understands that you are on their side through your language and attitude.
  3. Talk/Speech: Be concise, clear, and to the point in written and verbal communication.
  4. Writing skills:

Who are your customers? Identify them and identify their customers. Understand the goals of both groups.

Requirements (the how of things) <-> Goals (the what of things)

  1. Understand customer goals first.
  2. Desired outcome second
  3. Understand business context

Don’t -> Do:

  • Don’t jump to a solution -> Do listen carefully.
  • Don’t make assumptions -> Do ask questions.
  • Don’t start designing -> Do be precise.
  • Don’t start diagramming/coding -> Do identify the approver first.

Write everything down in a place that is easily accessible.

Don’t -> Do:

  • Don’t use email -> Do use a wiki or SharePoint.
  • Don’t use chat -> Do use Agile tools like Jira, which keep the history.
  • Don’t rely on hallway conversations -> Do use a shared document.

Get Alignment / Get Sign-off:

  • Ensure clarity.
  • Don’t skip this step.
  • Pick up the phone when necessary.
  • Joint accountability.
  • Record the outcome in writing.

If you are bilingual and can speak the customer’s language, that is a great skill.

What can you do?

  • Practice Active listening
  • Don’t multitask
  • Check for understanding
  • Summarize
  • Review the facts

Change is inevitable

Don’t -> Do:

  • Don’t stress out -> Do take a deep breath.
  • Don’t get mad -> Do ask questions.
  • Don’t cast blame -> Do update the requirements.

Managing Expectation:

After things are signed off, we need to manage expectations: how often we will have meetings, how project status will be reported, and so on.

Reporting Status:

  • Consistent format and time
  • If you want others to follow the reporting tool, make sure you are using it seriously yourself.
  • No surprises
  • Alert early if something is not working as expected, or if dependencies or durations are missed.

Show as you Go:

Sketches, diagrams, demos at regular cadence and be open to feedback.

Deadlines:

Keep calm, communicate progress, discuss what is okay to release, negotiate; trade-offs can happen.

Session: Adaptive QP to Intelligent QP

Speaker: Hugo Kornelis

SQLServerFast.com

http://sqlserverfast.com/blog

@Hugo_Kornelis

Query Processing (QP)

  • Phase 1 – Optimization (cardinality estimates, execution plan)
  • Phase 2 – Execution (actual row counts)

Merge join is better if the data is properly sorted. It puts everything from the first input in memory.

Nested Loops reads the first row and starts finding matching rows in the second source.

AJ is a new join type and it has 3 inputs.

The only problem is that AJ was Batch mode only, which was for columnstore indexes.

Interleaved Execution:

MSTVF – we don’t know how big the result is going to be.

1 or 100 is a hard-coded value and can never be right. In 2017, as soon as the QP detects an MSTVF, it pauses execution and starts another process that materializes the MSTVF first. You get the number of rows, and now the QP can give you the correct plan. But this plan is then cached, and the next time the MSTVF comes with a different parameter, it may not work efficiently. So the problem remains.

QP in SQL Server 2019:

Memory Grant Feedback (MGF):

The grant is determined at compile time based on the cardinality estimate. If it underestimates, the query spills to disk (tempdb). If it overestimates, memory is wasted and concurrency suffers.

The first execution under MGF improves the 2nd execution.

  • Adjusts down if > 50% of the memory was unused the first time.
  • Adjusts up after any spill, plus some buffer.

Feedback is stored with the last cached plan and forgotten after a recompile. If the grant keeps being adjusted past a certain threshold (some x number of times), MGF stops adapting for that plan.

Extended events are available to track this info.
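A minimal sketch of such a session (the session name is hypothetical; the event names exist in SQL Server 2017+):

```sql
-- Watch MGF adjust grants, and see when it gives up on an oscillating plan.
CREATE EVENT SESSION [TrackMGF] ON SERVER
ADD EVENT sqlserver.memory_grant_updated_by_feedback,
ADD EVENT sqlserver.memory_grant_feedback_loop_disabled
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION [TrackMGF] ON SERVER STATE = START;
```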

*** Check this demo out and put it in PTO workshop ***

UDF Inlining:

Froid is a framework that plugs into the optimizer before optimization takes place. It looks at all the statements in the UDF, converts them into relational expressions, and sends those to the QP, which then generates the plan.
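A sketch of what this looks like in SQL Server 2019 (function and table names are hypothetical):

```sql
-- A scalar UDF simple enough for Froid to inline:
CREATE OR ALTER FUNCTION dbo.TotalWithTax (@amount money)
RETURNS money
AS
BEGIN
    RETURN @amount * 1.08;
END;
GO
-- Check whether the optimizer considers it inlineable:
SELECT OBJECT_NAME(object_id) AS fn, is_inlineable
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('dbo.TotalWithTax');
GO
-- Opt a single function out if inlining causes a regression:
ALTER FUNCTION dbo.TotalWithTax (@amount money)
RETURNS money WITH INLINE = OFF
AS
BEGIN
    RETURN @amount * 1.08;
END;
```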

Session: Batch Mode on Rowstore

Speaker: Nikko Neuberger

Speaker Tip #1: Telling a story and showing pictures, cartoons, and bold words on screen worked very well with the audience. He got laughs and attention (5 minutes).

Speaker Tip #2: If a demo fails, keep talking while you work, and maybe refer to the Demo Gods.

Read an execution plan right to left and also left to right.

In an execution plan, the query executes from left to right.

Nikko displayed a drama script of row-mode execution. He got 3 attendees and had piles of pages in different colors.

The Sort operator asks for data pages from the table one at a time.

In batch mode, all data pages are given to the Sort operator at the same time, saving time.

Rt click + Act Ex rows + Est Exec rows -> shows row mode

*** Ask MS to remove estimated execution rows because they are not adding value and clutter the GUI.

Rt click + see Batch execution mode and storage as Rowstore

He made an interesting analogy comparing Batch mode vs. Row mode: fast cars like a Ferrari or Mustang vs. a bicycle. Batch mode needs a little more space to work, as a fast car needs a highway to go fast.

In Demo 1, compatibility level 150 vs. 140, timing was 12 sec and 18 sec respectively. CPU in both executions was high, but in 140 there was more IO.

In Demo 2, Batch mode on rowstore took 11 sec and Batch mode on columnstore took 6.8 sec. Fantastic! Since the table had a columnstore index, batch mode kicks in by default.

Demo 3 was about Batch mode heuristics. You can see a wait type of batch_mode_scan_on_rowstore in an Extended Events session for full batch when you run the following queries.

  • Select count of rows in a table
  • Query table of 1 GB size
  • Query table of 10 GB size
  • Disable Batch mode and run query
  • Run query with TOP(20) and order by

In 2019, you will still want columnstore because:

  • Batch mode without a columnstore index will not have compression.
  • No advanced storage engine features. This makes a difference in logical reads: columnstore does the aggregation in memory instead of reading and bringing pages to the query optimizer.
  • A columnstore index will always result in optimization level = FULL.
  • Hekaton disables Batch mode scan when columns with LOB types are present.

Session: Hash Match, the Operator (2.5 hour session)

Speaker: Hugo Kornelis

Speaker tip: He engaged the audience just like Nikko did in his session. The dramatization in the beginning really catches the audience’s attention. He had roller coaster ride pictures and a story to go with them, and he gave credit for all the pictures he used. A nice way to introduce the topic.

He said he was going to discuss 500-level content: deep-dive internals of operators, some of it undocumented, such as hashing, hash functions, and hash tables.

*** Download demos from sqlserverfast.com

Hashing … like laundry that needs sorting. But in the query optimizer’s world, hashing is not sorting; it is categorizing, or bucketizing, similar things.

  • Input — any
  • Output — integer 0 – 9999
  • Deterministic — the same input gets the same output value
  • Hash collision — different inputs getting the same output value

Hash table — organized by hash value into buckets; it is stored entirely in memory.

  1. There are 7 pointers for each hash value.
  2. Store 6 pointers in each structure
  3. Create and connect new 7-pointer structure

3 phases for Hash Match:

  1. Build — always used; processes the top input and builds the hash table.
  2. Probe — not always used; processes the second input if there is one, finds matches, and detects hash collisions.
  3. Final — returns data from the hash table.

Look at the Hash Match operator properties in the execution plan and you will see:

Probe residual — will appear only if there can be hash collisions. For up to 4 bytes of data, SQL Server guarantees correct results without a probe residual. It is needed when any operation on the column is done, when any non-integer data is present, and when integer data is larger than 4 bytes. The impact of a probe residual on performance is not expensive; it is just one extra test.
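A sketch of a query that should show one (table names hypothetical): joining on a varchar key means hash values can collide, so the operator re-checks the original values; look for the "Probe Residual" property on the Hash Match operator in the actual plan.

```sql
-- Force a hash join on a varchar key to surface a probe residual:
SELECT c.Name, o.OrderId
FROM dbo.Customers AS c
JOIN dbo.Orders    AS o ON o.City = c.City   -- varchar join key
OPTION (HASH JOIN);
```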

Build phase — assuming no spills to disk and the whole hash table fits in memory.

  • Inner Join — the simplest. Computes the hash, goes into the bucket, finds the match, returns a joined row, and moves on to the next row.
  • Rt. Outer Join — if no matching row is found, it returns the row from the outer table.
  • Both join types cause blocking in the build phase. There is no Final phase for them. Row order is mostly retained, but not guaranteed in the real world, where spills may occur.

Probe Phase

  • Lt. Outer Join / Full Outer Join — marks the matched rows and returns the unmatched rows from the hash table. Adds a Final phase for the “left returned” rows. No order is retained for these rows.
  • Lt. Semi Join (EXISTS) — returns all the marked, matched rows from the hash table. No Final phase.
  • Lt. Anti Semi Join (NOT EXISTS) — returns all the unmarked, unmatched rows. Both semi and anti semi incur blocking in the build and probe phases. All output is produced in the Final phase.
  • Rt. Semi Join — returns rows from the probe input; it stops searching as soon as the first match is found.
  • Rt. Anti Semi Join — returns a probe row when no match is found, and no additional search is done. No Final phase.

Hash Match can also have just 1 input.

Build phase (for aggregates):

Instead of simply adding to the bucket, it checks whether the row already exists and, if so, only updates the bucket. If no row is found, it adds a row and updates the aggregation. This row is a placeholder.

Final phase — returns all aggregated rows from the hash table. There is no Probe phase here.

Example:

  • Select distinct(city) from customer — hash match flow of distinct
  • Select city, count(*) from customer group by city — does a hash aggregate

Hash Match Scalar Aggregate:

The official name is Hash Match (Aggregate).

Stream Aggregate is more suitable for “select count(*) from table” type queries and is not supported in Batch mode. There is no Build phase.

Hash Match Partial Aggregate:

Labelled differently in parallel parts of an execution plan. There is no Build phase. In compatibility mode 120 you can see this type of aggregate; it is no longer visible in modern SQL Server versions.

Hash Memory Usage:

Memory is requested at the beginning of execution. The engine reserves at least the requested memory; if the requested memory is not available, the query waits. There is no way to request additional memory during execution, hence when memory is insufficient, operators spill to tempdb on disk.

  • SQL Server 2016: batch mode may get extra memory during execution.
  • SQL Server 2017: batch mode on columnstore may get extra memory during execution.
  • SQL Server 2019: all execution modes may get extra memory during query execution.

Dynamic De-staging:

Happens when hash operator spills to disk.

  1. In-memory HJ — never short of memory; everything fits nicely.
  2. Hybrid HJ — combines in-memory and Grace HJ. When out of memory, it partitions the buckets. If it gets complex, then after the 1st build and probe phases it does the same as Grace HJ.
  3. Grace HJ — used when the amount of data is known to be more than the available memory. Divides the set of buckets into partitions, e.g., buckets = 1 – 20,000, 5,000 buckets at a time (hash partition function 1 – 4); 2 files are created for each partition. SQL Server uses tempdb (they are actually structures in tempdb). It reads both inputs and stores rows in the appropriate file: 1 file for build, 1 file for probe. Then each file pair is processed individually.
  4. Recursive HJ — when Grace HJ mis-estimates the partition size.
  • Dynamic de-staging: HM starts in memory.
  • Bail-out happens when a bucket is larger than the available memory. HM abandons further processing and falls back to Nested Loops or Sort and Merge join. Both are inefficient in this case.
  • Bit vector filtering — is good. Always used when spills happen: an in-memory bitmap for hash values of spilled partitions.
    1. Build: set the bit for the hash result of each row.
    2. Probe: check whether the bit is set for the hash.
      1. If set – write to the file on disk.
      2. If not – handle as unmatched; don’t write to the file.

Memory Grant Computation:

Estimated size (row count × row size) of the build input.

Note: variable-length types such as varchar are assumed to be 50% full.

Estimated size (row count × row size) of the probe input.

An execution plan can contain multiple hash match operators.
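To watch these grants at runtime, one option (a sketch; this DMV exists in all modern versions) is:

```sql
-- Requested vs. granted vs. actually used memory per active query.
-- Spills also surface as hash/sort warnings in the actual plan.
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       max_used_memory_kb
FROM sys.dm_exec_query_memory_grants;
```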

Session: Query Store In-Depth

Speaker: Ola Hallengren

Query Store (QS) has been available since SQL Server 2016. It is like a flight recorder for query execution.

DBCC CLONEDATABASE includes Query Store data.

You can force a last known good plan after changing the compatibility level or cardinality estimation.
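A minimal sketch of that workflow (the database name and the ids passed to the forcing procedure are hypothetical):

```sql
-- Turn Query Store on and make it writable:
ALTER DATABASE [MyDb] SET QUERY_STORE = ON;
ALTER DATABASE [MyDb] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);

-- Browse captured queries, plans, and runtime stats:
SELECT q.query_id, p.plan_id, rs.avg_duration
FROM sys.query_store_query         AS q
JOIN sys.query_store_plan          AS p  ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id;

-- Pin the plan that behaved well before the compat-level change:
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;
```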

