A closer look at everything related to SQL Server

I attended PASS Summit 2019 in November last year, 6 months ago. Nice thing was that I took all my notes in OneNote which I had just recently started using. Then life got into my way and I totally forgot about it. Then COVID-19 happened and I still did not remember till today. So here it is.

This time, I didn’t purchase any PreCon. It is not because there were no good ones but the opposite that there were so many that I worried I will pay the steep price for one and it will not be as expected or I would wish later on that I had attended the other one. So I made a cowardly decision and went to none.

But I did enjoy spending time with several friends and met lot of people from my previous work places. No breakfast this time at PASS was a bummer. They gave us $25 Starbucks gift card in lieu of 3 day of breakfasts. Just to let organizer know, if they are reading this, a modest breakfast in Seattle was priced around $10 in 2019. Another suggestion, if prices for PASS merchandises are lowered, for example a $60 t-shirt down to $30 (not sacrificing the quality), I think more people would buy them and take souvenirs home from PASS summit and net profit in the end will come out to be the same. I will keep the rest of the suggestions to myself.

Disclaimer: These are my notes taken while attending sessions and any inaccuracies are my own.

Session: SQL Server 2019 new tools

Speaker: Vicky Harp (SQL Server tools team in Microsoft)

This was all demos, no slides session.

Since 2016 tools and Engine connection was broken when SSMS became available as a separate install from database engine installation.

Azure Data Studio (ADS) is a new tool to communicate with SQL Server engine and is more powerful than SSMS in some ways. But SSMS is going to stay. It is purely for SQL Server on Windows platform and it is a 32 bit application.

ADS supports hybrid environment and platform. Your job is to secure, HA and DR your data and not learn every new technology that comes your way. ADS serves this purpose very well. It is a one stop shop kind of tool. You can connect to windows, Linux and can even connect from your non windows client such as a Mac Book. It natively supports beside TSQL, PowerShell, Python, Scala, Spark, Java etc.

Notebook in ADS is another great feature. You can run the query typed in notebook and result and comments will be saved in the same notebook. You can create complete solution in it and can email this notebook to your colleague. You can open notebook in any other compatible tools such as SSMS. You can create SOP documents for your helpdesk team. It is not compatible with OneNote yet. But discussions on this subject is in progress.

There are lot of extensions available in ADS which further improves its usefulness. Extensions such as SQK server Agent, Admin Pack for SQL Server and SQL Server Profiler to name few are available. Each extension has multiple tasks.

SQL Server Edge running on IOT devices can analyze IOT data on the device itself instead of sending it to some server to analyze. Stream Analytics is used for this purpose.

Session: Working with SQL Server Containers

Speaker: Carlos Robles

Demos: C:\Users\amlalani\OneDrive\Training\PASS Summit 2019\SQL Server Containers

Why containers?

  • Containers are standard unit of software.
  • They provide security and isolation.
  • They are lightweight, portable and cross platform.
  • They are consistent and predictable and no surprises
  • Continuous Integration (CI) and Continuous Deliver (CD)
  • eliminates shared environments and resource contentions.

SQL Server as Application:

Before difficult to integrate, operational complexity,

Now Docker Compose uses YAML file for deployment

CI – Commit -> Trigger -> Notify -> Test

CD – Deliver -> Deploy

Jenkins is open source Automation Server. Pipelines are set of jobs.

Session: Intelligent Query and Performance

Speaker: Pedro Lopez and Joe Sack

  1. Intelligent DB vision:
    1. Anxiety free upgrades
    2. Don’t worry about regressions
    3. Query tuning assistance tools
  2. Performance IQP
    1. Resource Governor
    2. Soft NUMA became default in SQL Server 2019
    3. Many new improvements in newer version are not advertized so by just upgrading you get benefits out of the box.
  3. Management by default
    1. Accelerated Database Recovery ADR
    2. Online Resumable index creation
    3. Lightweight Profiling
    4. Automatic SQL
    5. Threat Protection
    6. GDPR questions can be answered with classification and vulnerability assessment tools

Memory Grant Feedback MGF:

Available in compatibility mode 140+

Memory misestimation can result in spills to disk and overestimation hurt concurrency.

If less than 50% memory need is granted, resulting in spills, then MGF will fix and add memory plus some buffer.

In 2017, Batch mode was introduced and was available for column-store indexes only.

In 2019, row-store Batch mode is available for all index types; row and column.

  1. Does Query optimizer has to do lot of work to first check if Query is eligible for Batch mode execution?

MFG has to learn again if plan is kicked out of cache for some reason.

Batch Mode Columnstore:

Built for Analytical workload and IO bound queries. Columnstore provides effective compression over traditional rowstore.

Rowstore Batch Mode:

If you cannot add Columnstore for some reason, maybe vendor does not support or table has trigger then Rowstore Batch Mode will come in handy. Adds new scan operator. Same for Heap and B-trees tables. Great for CPU bound queries.

Interleave MSTVFs:

Query processor treats it as a black box because it materializes at runtime only and QP cannot make any decision about the plan. SQL 2016 and prior version guesstimated 100 rows. Inefficient if 1000 or 10,000 rows actually returned. SQL 2017 + assumes 500 rows for 1st run and then adjusts. Downstream operations will benefit.

Batch Mode Adaptive Join (AJ):

If cardinality Estimate is wrong, wrong join operator will be selected in a plan which will be cached.

AJ will defer the choice of Hash or Nested join until after the first join input has been scanned.

Resource: aka.ms/IQP

Session: Data Classification

Speaker: Grant Fritchey

Security and compliance requirements are undeniable. It should be first and foremost.

Regulatory Landscape:

GDPR: European. Marriott and British Airways were fined. Half of UK firms are still not GDPR compliant.

FERPA: Department of Education law in US to protect adult student from parents. They don’t share any information even if the student is dependent and in hospital or sick.

HIPPA: Health regulation for medical industry. If a person has shared PII data knowingly they could go to jail.

CCPA: California flavor of GDPR regulation and will be enforced on Jan 1, 2020.

All of them have room for interpretation and none of them is perfect.

Session: Successfully communicate with Customer

Speaker: Dennise McInevney

Results of Bad communications:

Project Fails

Career hurt or ends

Frustration and stress

Time and Money wasted

  1. Empathy: Walk in your customer shoes. See their pain from their eyes. See how they use your product. See where can you help them.
  2. Us vs them mentality: Be one team. Make sure customer understands that you are on their side by you language and attitude.
  3. Talk/Speech: Be concise, clear and to the point in written and verbal communication
  4. Writing Skills:

Who are your customers? Identify them and identify their customers. Understand goals of both groups.

Requirements (How of things)  <  >   Goals (What of things)

  1. Understand customer goals first.
  2. Desired outcome second
  3. Understand business context

Don’t

Do

Jump to Solution

Listen carefully

Make assumptions

Ask Questions

Start designing

Be precise

Diagramming/coding

Identify approver first

Write down everything in a place easily accessible.

Don’t

Do

Email

Wiki\SharePoint

Chat

Agile Tools like Jira which keeps the history

Hallway conversation

Shared document

Get Alignment

Get Sign off

Ensure clarity

Don’t skip

Pick up phone when necessary

Joint accountability

Record the outcome in writing

If you are bilingual and can speak customer language, that is a great skill.

What you can do?

  • Practice Active listening
  • Don’t multitask
  • Check for understanding
  • Summarize
  • Review the facts

Change is inevitable

Don’t

Do

Stress Out

Take a deep breath

Get Mad

Ask questions

Cast blame

Update requirement

Managing Expectation:

After things are signed off, we need to manage the expectations such as how often we will have meetings, how status of projects will be reported and such.

Reporting Status:

  • Consistent format and time
  • If you want others to follow the reporting tool, make sure you are seriously using it.
  • No surprises
  • Alert if not working as expected, missed dependencies or duration.

Show as you Go:

Sketches, diagrams, demos at regular cadence and be open to feedback.

Deadlines:

Keep calm, communicate progress, what is okay to release , negotiate, trade off can happen.

Session: Adaptive QP to Intelligent QP

Speaker: Hugo Kornelis

SQLServerFast.com

http://sqlserverfast.com/blog

@Hugo_Kornelis

Query Processing (QP)

  • Phase 1 – Optimization (Cardinality Estimates, Execution Plan)
  • Phase 2 – Execution (Actual Estimates)

Merge is better if data is properly sorted. Puts everything in first input in memory.

Nested loop reads 1st row and start finding matching row in second source.

AJ is new join type and it has 3 inputs.

Only problem is AJ for Batch mode which was for columnstore index.

Interleaved Execution:

MSTVF – don’t know how big result is going to be

1 or 100 is hard-coded value and can never be right. In 2017, as soon as QP detect MSTVF, it paused the execution and starts another process that materialize the MSTVF first. You get the number of rows and now QP can give you the correct plan. But this plan is now cached and next time MSTVF comes with different parameter, it may not work efficiently. So the problem remains.

QP in SQL Server 2019:

Memory Grant Feedback (MGF):

Determines during compile time base on cardinality estimate. If underestimate, spills to the disk (tempdb). If overestimates, wasted memory and impacts concurrency.

First execution of MGF improves 2nd execution.

  • Adjusts down if > 50% memory unused the first time
  • Adjusts up after any SPILL + some buffer.

Stores last exec plan, forgets after recompile. After adapting, certain time for example x threshold. If more than that it stops adapting.

Extended events are available to track this info.

*** Check this demo out and put it in PTO workshop ***

UDF Inlining:

FROID is a framework that plugs in optimizer before optimization takes place. It looks at all the statements in UDF and converts into relational table and sends to QP which then generates the plan.

Session: Batch Mode on Rowstore

Speaker: Nikko Neuberger

Speaker Tip #1: Telling story and showing pictures and cartoons and bold words on screen worked very good with audience. He got laughs and attention (5 minutes)

Speaker Tip #2: If demo fails, keep talking while working and may refer to Demo GOD.

Read execution plant Right to left and also left to Right.

Exec plan query executes from Left to Right.

Nikko displayed drama script of row execution. He got 3 attendees and had piles of pages in different color.

Sort operator asks for data page from table one at a time.

In batch mode, all data pages are given at the same time to sort operator saving time.

Rt click + Act Ex rows + Est Exec rows -> shows row mode

*** Ask MS to remove est execution because not adding value and cluttering the GUI.

Rt click + see Bath execution mode and storage as Rowstore

He made an interesting analogy comparing Batch mode vs Row mode. Fast cars like Ferrari or Mustang vs a bicycle. Batch mode need little more space to work as fast car need highway to go fast.

In a Demo1 of 150 vs 140, timing was 12 sec and 18 sec respectively. CPU in both execution was high but in 140 it was more IO.

In a Demo2, Batch mode on Rowstore did 11 sec and Batch mode on columnstore did 6.8 sec. Fantastic! Since table had columnstore index, batch mode kicks in by default.

Demo3 was about Batch Mode Heuristics. You can see waittype of Batch_mode_scan_on_rowstore in an extended event sessions for Full batch when your run following queries.

  • Select count of rows in a table
  • Query table of 1 GB size
  • Query table of 10 GB size
  • Disable Batch mode and run query
  • Run query with TOP(20) and order by

In 2019, you will need columnstore because

  • Batch mode w/o columnstore index will not have compression
  • No advance storage engine features. Makes a difference in Logical Reads. Does the aggregation in memory instead of reading and bringing pages to query optimizer.
  • Columnstore index will always result in Optimization level = FULL
  • Hekaton disables Batch mode scan when columns with LOB types are present.

Session: Hash Match, the Operator (2.5 hour session)

Speaker: Hugo Kornelis

Speaker tip: He engaged audience just like Nikko did in his session. The dramatization in the beginning really catches audience attention. He had roller coaster rides picture and a story to go with it and he gave credits to all the pictures he used. Nice way to introduce the topic.

He said he is going to discuss 500 lever content of deep dive internals of operators and some of them are not documented. Such as hashing, hashing function, hashing table etc.

*** Download demos from fastsqlserver.com

Hashing … Laundry that needs sorting. But in Query optimizer world hashing is not sorting it is categorizing or bucketizing similar things.

  • Input  — Any
  • Output — Integer 0 – 9999
  • Deterministic — same input gets same output values
  • Hash Collision — Different inputs getting same output values.

Hashing function — organized by hash value or heap or bucket. It stores in memory entirely.

  1. There are 7 pointers for each hash value.
  2. Store 6 pointers in each structure
  3. Create and connect new 7-pointer structure

3 phases for hash Table match

  1. Build — Always used, process top input, builds hash table
  2. Probe — not always used, process second input if there is one match, detects has collision.
  3. Final — Returns data from Hash Table

Look at hash match operator properties in execution plan, you will see

Probe residual — will appear only if there are hash collision. Upto 4 byte of data, SQL Server gaurantees to return without Probe residual. It is needed when any operation on the column is done, when any non-integer data is present and when integer data is less than 4 byte. Impact of Probe residual on performance is not expensive. It is just one extra test.

Build Phase — Assuming no spills to the disk and all hash table fits in the memory.

  • Inner Join — Most simplest. Computes the hash and goes into bucket and finds the match, returns a joined row and goes to work with another row.
  • Rt. Outer Join — If no matching row is found, it returns rows from the outer table.
  • Both join type cause blocking in the build phase. There is no Final phase for them. Order of row is mostly retained but not guaranteed in real world where spills may occur.

Probe Phase

  • Lt. outer Join\Full outer Join — Marks the matched rows and returns the unmatched rows from the hash table. Adds a Final Phase for “Left returned” rows. No order for these rows is retained.
  • Lt. Semi Join (Exists) — Returns all the marked and matched rows from the hash table. No Final phase for them.
  • Lt. Anti Semi Join (not Exists) — returns all the unmarked and unmatched rows. Both semi and anti semi incur blocking in build and probe phase. All the output is produced in the Final Phase.
  • Rt. Semi Join — returns rows from Probe. Will not search for more rows when a first match is found.
  • Rt. Anti Semi Join — No match found return and no additional search is done. No final phase.

Hash match can also have 1 input.

Build Phase: (for aggregate)

Instead of adding to bucket, it searches if row already exists and does only update to the bucket. If no row found, adds a row and update the aggregation. This row is a place holder.

Final Phase — returns all aggregated rows from the hash table. There is no probe phase for it.

Example:

  • Select distinct(city) from customer — has match flow of distinct
  • Select city, count(*) from customer group by city — does hash aggregate

Hash Mach Scalar Aggregate:

Official name is Hash Match (Aggregate).

Stream aggregate more suitable for Select count(*) from table type of queries and is not supported in Batch mode. There is no Build Phase.

Hash match partial Aggregate:

Labelled differently in parallel parts of execution plan. There is no Build Phase. In compatibility mode 120 you can see this type of aggregate. No more visible in Modern SQL versions.

Hash Memory Usage:

Memory requested at the beginning of execution. Engine reserves at least requested memory. If requested memory not available, it waits. Tries to reserve up to requested memory before beginning execution. It does not provide a way to request additional memory during execution and hence when memory is insufficient, operators spell to tempdb on disk.

  • SQL Server 2016, batch mode may get extra memory during execution.
  • SQL Server 2017, batch mode on columnstore may get extra memory during execution
  • SQL Server 2019, all execution mode get extra memory during query execution.

Dynamic De-staging:

Happens when hash operator spills to disk.

  1. In-memory HJ — Never short of memory and everything fits nicely.
  2. Hybrid HJ — combines in-memory and Grace HJ. When out of memory, partitions buckets. If it is complex, then after 1st build and probe phases, does same as Grace HJ.
  3. Grace HJ — Algorithm is used when amount of data is more than the known memory. Divides set of buckets into partitions. Eg; buckets = 1 – 20,000. 5000 buckets at a time (hash partition function 1 – 4), 2 files for each partition is created. SQL server uses tempdb. They are actually structures in tempdb. Reads both inputs, stores rows in appropriate file. 1 file for build, 1 file for probe. Process each file individually.
  4. Recursive HJ — When Grace HJ mis-estimates partition size.
  • Dynamic Staging HM starts in memory.
  • Bail out happens when bucket size is larger than memory available. It abandons  further processing. HM then goes to Nested Loops or Sort and Merge join. Both are inefficient in this case.
  • Bit Vector filtering — is good. Always used when spills happen. In-memory bit map for has values of spilled partitions.
    1. Build: set bit for hash result of rows
    2. Probe: Check if bit is set for hash
      1. If set – write to file on disk
      2. If not – handle unmatched, don’t write to file

Memory Grant Computation:

Est size (row count + row size)

Note: For Var(char) data type it assumes to be 50% full

Est size (row count + row size) of probe input

Execution plan with multiple match operator.

Session: Query Store In-Depth

Speaker: Ola Hallengren

Query Store (QS) available since SQL Server 2016. It is like a flight recorder of query execution.

DBCC CLONEDATABASE includes query store Data.

Force a last know good plan after changing the compatibility level or cardinality estimation.

Disclaimer: These are my notes taken while attending sessions and any inaccuracies are my own.

I attended the PASS (Professional Association of SQL Server) Summit in Early November. I attended 2 pre-conferences on Monday and Tuesday.

Monday: SQL Server Security by Denny Cherry

Tuesday: Fixing Query Performance Problems from Estimates, Statistics, Heuristics, and Cardinality by Kimberly Tripp

This time I also participated in Speaker Idol competition.

I attended the regular sessions on Wednesday, Thursday and Friday.

 

Here is the brief summary of what I learned.

Monday 11-05-2018 (pre-conference)

SQL Server Security

Speaker: Denny Cherry

SQL Server security does not begin with SQL Server itself. It begins at your organization’s infrastructure.

Network Design:

This depends on what a company want to use; public IP or private IP addresses. Private IPs became more important when cloud infrastructure came into the picture. In the past, hackers used to exploit the public IP\DNS names and would broadcast messages but this problem is mostly gone. Additional protection has been put on.

Private IPs: 192.168.0.0 typically used for home networks, 172.16.0.0 and, 10.0.0.0  typically used by business networks.

Public IPs: Any IP address not part of the Private IP address is a public IP.

VLANS: Virtual Local Area Networks. This is great for segmenting SQL Server network and saves money on switches. A downside is that things need to be configured correctly the first time. A good strategy is to block all ports on web servers, only 1433 open to talking to SQL Server.

ACL (access control list) works in tandem with firewalls. ACL works on new access, not on the existing one. If there is a lot of access to configure, it is better to use the firewall to block everything and poke holes (swiss cheese) to let the stuff through.

Switches have ports which are connected to email servers, database servers and web servers. On top of the firewall is NAT (network address translation) layer to allow to talk to machine outsized of your network.

There are 3 types of network design. Each has its own benefits and shortcomings.

  1. Everything behind the router
  2. Web servers in front of the router
  3. Web server in DMZ (VLAN)

On each server, configured to block all access but required services. Inbound firewall is blocked by default. For best practice block public IP from outbound traffic. No matter what port number you are using for SQL Server. Web server web config will have that information so hacker scanning will get it. If your web config is encrypted only then it is true protection from a hacker.

Accessing SQL Server via the internet is not recommended. There are lots of VPN options available. Desktop sharing app should be regularly monitored because an unknowing employee can leave a back door open after one of the session. Your first layer of security is VPN.

Physical Server Security:

How are people allowed into your data center located at your business? The cleaning crew has access? Is there dual authentication to enter? E.g.; pin + biometric. Does alarm sound if accessed outside of defined and approved times?

Screensaver on company computers to have the company’s log (enforced by group policy) is a good practice.

Social Engineering is the easiest way to gain access to data. Need to train your people who have access to data.

Cloud security is as secure as you want it to be. Inherently cloud is not more or less secure than on-premises environments. Some companies require firewalls from 2 different companies if one fails other kicks in. Some companies require logins to the cloud only work from the office. You can bypass default Azure routing and can define your own. Microsoft or any other vendor (AWS, Google) cannot see your data in the cloud. Data is encrypted at the storage level. Active Directory account is required to access your data. Microsoft especially is taking data security very seriously. The engineers who designed the system are not allowed in Data Centers where they can identify which servers hold which company’s data.

With increased security comes challenges. For example, California is implementing GDPR kind regulation in 2019. Everyone has right to be forgotten after website visit. It is a double edge sword. Now forgotten customer cannot pay their bills online.

Data Encryption Options:

  • Application Encryption: Could be .net encryption or could be Always Encrypted. AE is the best, data is protected all the way. Introduced in SQL server 2016. Only small change in the application connection string “Column Encryption Setting=enabled”
  • SSL Encryption: SQL Server offers SSL protection for transit data. No client change is needed. It is configured via SQL Server Configuration Manager. Availability group can use different certificates on each replica.
  • Database Encryption: This is commonly known as column level encryption. This type does not require application code change. All encryption workload is centralized on the database server. Great solution for protecting PII data.
  • Storage Encryption: TDE (transparent data encryption) encrypts data at rest (at disk level). Data within the table is still plaintext. Backup files are protected and cannot be restored on another server without the proper encryption key. TDE disables instant file initialization. TDE will not encrypt Filestream data. To protect Filestream use bit locker.

Data Theft is not a matter of if, it’s a matter of when. Check Ihavebeenpawned.com. The biggest concern for the company is the reputation.

Data Classification is based on your business and based on what data is stored.

  • General Data
  • Low Business Value data
  • Medium Business Value data
  • High Business Value data
  • PII

Business users have to identify the data classification. DBA should not be driving it. DBA help business to accomplish it. SSMS 17.5 has data classification available. It only looks at column names and does not comb the data.

How can you protect the data if someone takes the picture or screenshot of PII data? None of the encryption methods we discussed so far will work in this case.

 

Protecting Password:

.NET apps can encrypt the password in config file when the first time app connects to DB.

Domain policies are guaranteed to protect Windows accounts. There is no guarantee domain policies will protect SQL accounts.

SQL CLR runs in a separate memory space from SQL Server binaries. MS supports to run it in SAFE mode external. An unsafe assembly can call any managed code. CLR may have security issues, therefore, this feature is gone from Azure. Text manipulation works better in CLR then SQL.

To protect TDE keys, keep on-prem preferably few copies, easier central location. Other copy put on USB and give it HR to keep in a fire-proof safe. Never store with backup.

Anti-Virus:

SQL Server should not need an anti-virus running on the server. Anti-virus product can cause CPU issues, IO issues and RAM issues. Their behavior may change over time. Affects SQL server performance. To check if anti-virus is installed, Select * from sys.dm_os_loaded_module.

 

 

Tuesday, 11-06-2018 (pre-conference):

Fixing Query Performance Problems from Estimates, Statistics, Heuristics, and Cardinality

Speaker: Kimberly Tripp (of SQLSkills.com)

The first step is to find out that Query really has a problem. There are many tools out there to do that. This session is about what to do after that. The first step is to identify if it is a statistics not updated on the column issue or plan caching issue. This is very important because based on that you will apply the remedy.

In lots of instances, statistics are not a problem but gets blamed anyway. Sometimes caching and parametrizing is the problem.

Generalized and Simplified Query execution steps:

  • Parsing (syntax checking)
  • Standardization /algebrization (Optimizer looks at the predicate and pushes down the tree to select the correct index if exists)
  • Cost-based optimization (chooses the good plan faster not the best plan)
  • Compilation
  • Execution

Cost-Based Optimization: More statistics on a table is not a bad thing. Optimizer’s job is not to find the best plan but to find good plan faster. It sometimes re-writes your query to lower the cost. Calculates and estimates how much data it is going to process before processing it.

In the following example, the optimizer will use an index for the second query.

  • Where monthly_salary X  12 = Value  (maybe written in haste in an Adhoc query, cost higher)
  • Where monthly_salary = Value / 12 (uses index)

How the plan caching occurs? One way is when you are testing your procedure with certain parameters over and over and after your final change forget to clear the plan. The optimizer chooses (faster) existing plan in cache for different parameters and results in the performance problem.

Solution: use Execute procedure WITH Recompile. It recompiles the whole procedure, not just the statements inside the proc having trouble. Instead, use OPTION (Recompile) at the statement level. Less harsh.

A single query can have > 10 plans just because ANSI settings or spaces (SET OPTIONS) are different.

To find out session settings run this. Select * from sys.dm_exec_sessions where user_sessions =1

Selectivity: plan also varies based on what percent of data a query is returning. Lots of narrow non-clustered indexes are bad because they are non-selective. The tipping point between an index scan and seek is anywhere between ¼ pages to 1/3 pages of the table.

How to see the statistics?

  • Dbcc show_statistics
  • Sp_autostats tablename

Index density: If there are 10,000 rows and index is on last name column, say 400 rows return for a particular last name. But if an index is not narrow and contains last name and first name combo maybe 5 rows returned. It is more selective.

The histogram has 201 total steps (included 1 row for NULLs)

When the statistics were last updated?

Use stats_DATE (object_Id, index_id) in a query.

Demo: use the Credit database and Kimberly’s script to see the usage of statistics. If you see statistics has been used a lot then maybe a good idea to create an index on that column.

Do update statistics using FULL in off hours if possible. Sometimes updating on SAMPLE create more problems for the query. Sometimes full update is not possible on a Very Large Table. In that case, use filtered statistics. Never turn off auto update stats. But do run your own update statistics job regularly in a maintenance window. Auto update is there to catch if your job fails for some reason.

An index rebuild will update statistics and will invalidate plans in the cache. So be careful when you do it.

If you use a filtered index, use option recompile to use this index. You have to manage its update and it cannot depend on auto stats update.

 

Wednesday, 11-07-2018 sessions:

Session 2: Getting Better Query Plans by Improving SQL’s Estimates

Speaker: Brent Ozar

Abstract: You’ve been writing T-SQL queries for a few years now, and when you have performance issues, you’ve been updating stats and using OPTION (RECOMPILE). It’s served you well, but every now and then, you hit a problem you can’t solve. Your data’s been growing larger, your queries are taking longer to run, and you’re starting to wonder: how can I start getting better query plans? The secret is often comparing the query plan’s estimated number of rows to the actual number of rows. If they’re different, it’s up to you – not the SQL Server engine – to figure out why the guesses are wrong. To improve ’em, you can change your T-SQL, the way the data’s structured and stored, or how SQL Server thinks about the data. This session won’t fix every query – but it’ll give you a starting point to understand what you’re looking at, and where to go next as you learn about the Cardinality Estimator.

Prerequisites: You’re comfortable writing queries to get the data you want, but once you get a bad query plan, your only fix is OPTION RECOMPILE.

Error: spilling to disk- estimates are wrong maybe >10X magnitude.

Maxdop =1 option in the query for a single threaded demo.

SQL 2019 (latest)

Stackoverflow2013 DB. Open source scripts

Go to old compatibility mode. Get an estimated plan.

Statistics update help correct estimation.

If change the compatibility to SQL2017 still did not solve estimation error. It has new cardinality estimator (CE).

Changing to SQL2019 compatibility level, the same query resolves the spill problem. Estimate still wrong but memory grant is greater because it adds batch mode operator.

How you resolve it in 2017? Create statistics. Did not resolve. Because statistics was created on sample size, not full scan. Do full scan statistics, but Exec plan is same. So clean plan cache. Finally, it removes the spill error and correct estimate. But the query is still slow.

If you have a choice to create a statistic or index on a column, create index instead and will save you lot of work. Plus it will give you a sorted list.

Stream aggregate does the computation faster in buckets.

Read: Thomas Kejser, Ascending key problem

 

Thursday, 11-08-2018 sessions: 

Session 1: Containers, Pods, and Databases- The Future of Infrastructure

Speakers: Joseph D’Antoni

Notes:

Containers have quietly been taking over the world of infrastructure, especially amongst developers and CI/CD practitioners. However, in the database space, container adoption has been lower.

SQL Server 2017 introduced the concept of deploying databases into Docker containers. In this session, you will learn the fundamentals of creating containers, learning about Kubernetes for management, and how to further your learning in this new and emerging space.

Kubernates is still young but robust. MS and Google contributing. It is v 1.0 product, not beta. A code will have to be stateful so it can keep track of which node it ran. Kubernates pos can contain spark + SQL Server. (0% deployment will be on Linux.

A container is a standardized unit of software that includes everything needed to run it. Fast to deploy, self-contained, upgrades are easy, scalable and multi-tenant.

In SQL 2019, AG is also available in Kubernates. Virtualization was not easily accepted in 2003 when it was new. Containers are same in 2018 now. But this is the future of infrastructure.

Containers:

They are isolated but share OS, bins\libraries. VM has full copy of Host OS. Host don’t share OS. With containers you can get more density on less usage utilization. Faster to deploy than VMs. Storage is persisted. Better destroy than VMs. They are licensed just as VMs. Either license separately or license one host and install as many.

Compute and data are separated. Hard to do it in RDBMS. You can scale them independently.

Demo: SQL server 2019 and Docker

  1. Docker pull (path)
  2. Sudo Docker run

Worked in SQL operational Studio. Not a replacement for SSMS.

 Kubernates:

Initially developed by Google. Runs containerized application with HA. Greek word meaning a person who veers the ship. Frequently deployed in conjunction with Docker using contained service and Docker registry.

SQL Server and Kubernates:

Nodes per Kubernates. Each sql server and agent in its pod and in its node. There will be a master node in the cluster. Master runs the cluster. Nodes are individual hosts like VMs.

 

 

Session 3: Principle of Least Privilege: The Key to Strong Security

Speaker: Ed Leighton-Dick

Until now we have been putting more emphasis on network security so nobody can come inside and get our data. This is not enough anymore. 95% data breach happens with phishing. Next high is crypto jacking. This is not going to stop. Not only bad people outside but Inside Company who are malicious or going to work for competitors or just plain naïve. Till 2-3 years ago security was afterthought. Now it is forefront. In 2017, there were about 1500 reported data breach incidents in the US.

GDPR assumes that you will implement security and privacy built in the product not added afterwards. Requires huge investment.

Ed took 10 minutes, to explain why it is important and then started with what we can do. Gave an example of his experience where he was consulting, about a failure of security because everyone had full rights.

Watch for application service account with higher permissions.

Ideal security goal is that all accounts should be given permissions need to do the job and not more.

Applying least privilege for applications:

  • Require granular permissions, find out what it does

Applying least privileges for Developers:

  • They should not have access to production.
  • Should not have access to administrative rights.
  • Non-prod environment they should not have access to prod data.

Applying least permissions for DBA:

DBA should audit their own activity and ideal to have the least account and high account.

Ideally, there should be a security officer, not DBA and not a developer but manages keys. Even system administrator also should not have access to data.

Step 1: Analyze

Find out what vendor app does, what your app do, what your regulation requires you to do. Find out what is needed to be done not wanted to be done. What your users need to do. You may find them not to align.

Do Regular internal and external audits.

Step 2: Implementation

  • SQL server 2014+ – custom server roles
  • SQL server 2005 -2012 (separation of duties framework on CodePlex Archive)

[First demo of the session. Get the script from Ed’s blog.]

Execute as user

And run sys.fn_my_permissions.

Db_owner on DB can create sysadmin if trustworthy on db =1

Step 3: Repeat

It is not a onetime process, you have to periodically on a regular basis.

 

 

Friday, 11-09-2018 sessions:

Session 1: T-SQL Habits and Practices That Can Kill Performance

Speaker: Vern Rabe

 

  1. Union vs Union ALL

Union makes query unique by doing Sorts which is expensive. Run query with both to see execution plans.

  1. Using superfluous ORDER BY

If a query is only using columns in select that is covered by clustered index then list is already sorted. Run without it. See result of both. You will find order by was unnecessary.

  1. Not Joining on Keys

Usually, you want to join on keys. If not you may get duplicate. Then you will use duplicate which is a performance hit. The right way to do it is to use all the key (PK) columns in join predicate.

  1. Having too many JOINS

Optimizer figures out with an execution plan and comes up with good enough plans.

One way to solve the optimizer time out of big joins, break down into separate queries and store results in the temp tables and join the temp tables.

  1. Over-using or mis-using views

Views hide the complexity of join queries underneath. It starts with somebody writing a simple view then somebody comes and adds a few extra columns from another table. Make sure what is in the view before using the column in your query, maybe directly querying base table will be cheaper. Do not reference view in a view. Bad idea. Join view with a view increases complexity.

  1. User defined function in a predicate

Example month(F.orderdate). A month is not user-defined but is system defined.

Create a UDF using month(F.orderdate) built-in function.

Run both queries. Both plans are the same. Both are doing clustered index scan but hover over row arrow shows different numbers showing the second one is doing lot more work in the beginning and then filter out. Don’t rely on execution percentage for both queries for comparison.

 

  1.  Ignoring the transaction logs

Insert and update touches tlog. Anything you can do touch the diskless will improve the performance. Example. Insert into temp table from a regular table and check using sys.fn_dblog and then in next query run an update on temp table and verify using fn_dblog. Out of 200 rows only 5 updated, the first query will update all rows. The second query will only update matching the criteria and hence less written to tlog.

His stored procedure creates where clauses very easily. His script contains this stored proc. Using EXCEPT results in the same output.

  1. Ignoring data types

Mis-matching data types in predicate will cause implicit and explicit conversions. He used a number table with 0.5 million rows. In one table column is int and another table column is char. Now compare sp_spaceused of both tables. The second table will be bigger and size will affect the query performance. Duration of both queries will differ as you can see the Select properties (Query time stats) in execution plans.

  1. Not using OPTION (RECOMPILE)

It has special uses. When the first time you execute SP, optimizer comes up with a plan. The second time it uses the plan, good thing because it saves time. Recompile will ask for a new plan each time. If a cached plan is small values but next time your passing parameter with large number of rows, a previous plan will work badly. Analyze the parameter value range and decide if you want to use recompile. Every single statement in the SP will get a plan. Not all need to be a recompile. Sp_recompile and With recompile will recompile the entire SP. The nested loop works great when one of the join tables has small no. of rows.

  1. Not making predicates SARGABLE (Able to search by Argument).

Helps optimizer to select indexes.

Where number < @i+1 (Sargable)

Where number -1 < @I (32K pages read vs 3 pages)

 

 

Session 2: Columnstore Clustered vs Nonclustered, When to Choose What

Speaker: Niko Neugebauer

Index Types:  Clustered, physical structure and all columns are sorted. NC can be 999 per table.

Columnstore indexes: Rowstore is horizontal (clustered + nonclustered). Page and row compressions available. CS compresses data for each column hence can get more compression. Only one ColumnStore (CS) index allowed either clustered or non-clustered.

OLTP Architecture: insert, update and delete load, no CS index should be created because it will not benefit.

Operational analytics: need 1 columnstore index. 1 Non-clustered (NCCI) is needed.

DW: – 1 Clustered Columnstore Index (CCI)

BI: – hybrid, you have to think and do trial and error.

SQL 2012 had NCCI for DW. Could not update them unless you are switching the partition. No OLTP support

In SQL 2014, got CCI with updatable. Batch execution mode available. But still for DW only. Cannot use any rowstore indexes with CCI only with NCCI.

In SQL 2016, NCCI becomes updatable. More restriction for DW, CDC, CT supported only got NCCI. NCCI can be filtered. Support for replication, indexed views (not documented) added. Got OLTP NCCI, in-memory (Hekaton). Not CCI. Natively compiled SP does not support CCI

In SQL 2017, LOBs data type support is finally included CCI. Computed column supported.

SQL Server 2019 included an online rebuild of CCI. Expect 2 to 3 times execution time difference because optimizer will use delta store. If it takes 1 hour to rebuild offline, It will take 3 or 4 hours online.

 

Session 4: Lightning Talks: DBA

 

This Lightning Talk session will be comprised of five 10-minute talks on SQL Server, DevOps, TempDB Performance, Troubleshooting and more! Sessions and Speakers will include:

  • “SQL Server Performance Troubleshooting: Bottom to Top… or Top to Bottom?” Andy Yun
  • “How High Is Your Availability?” Allan Hirt
  • “The Top Five Things DBAs Need to Know About DevOps” Hamish Watson
  • “Evil Indexes Reducing Performance of SELECT” Pinal Dave
  • “SQL Server’s Most Wanted: Five Common Vulnerabilities” Ed Leighton-Dick

Allan Hirt:

Fedex example: Due to virus cost 300 million dollars. Hurricane mother nature outages. 3 short stories to make a point of what he is going to say.

It’s about the requirement. He asks the business what you are trying to achieve. He wants to make sure why they want something is clear to them.

Technology can provide business continuity but should not drive it. How much risk business is ready to take? This answer should be on paper in terms of RPO and RTO.

Keep systems simple, recovery will be faster. Do not implement anything you are not ready to monitor and support. Automation breaks down so have a plan B.

His main point was define availability requirements first.

Andy Yun:

What is a proper performance method? Driving to work analogy.

He used a lot of his driving emotional pictures and made a connection that who like to be stuck in a traffic and similarly we don’t like to be stuck behind slow queries. What is the root cause of congestion? Sp_whoisactive he used. This is one way to look at the query. Look at it differently like look at the traffic from a helicopter to see what is causing it. Take top down approach. Look at server level what is going on. Such as resource monitor or perf monitor. Use tools like SentryOne. It tells you the high level wait type and then looks at closer to a database. Narrow it down to wait types and queries.

Hamish Watson:

We don’t trust that’s why we do manual things. We do not use source control consistently. Application changes can break our database.

In SSMS, it is hard to use Source Control but VS does. tSQLT is a free unit testing tool. SSDT allows you to incorporate TSQL unit testing.

Continuous Integration (CI) – each check-in is verified by an automated process. This allows for continuous delivery (CD).

Culture is very important. Because this process needs partnership not silos.

Pinal Dave:

He showed an example where Select query improves when too many indexes created. But results in slow down when Deletes, insert and updates happen. He says SELECT query will not only run slower, if there is an index and optimizer does not choose it for some reason but unused indexes actually adversely affect the execution compared to if there was no index at all.

Ed Leighton_Dick:

Patching, logging, auditing all actions are good and needed. The best defense against SQL Injection is to use safeguarded stored procedures.

 

I hope you find these notes useful. I apologize for lot of grammatical and shorthand mistakes in my notes.

 

Speaking at SQL Saturday events:

June 11, 2016, Iowa City, IA http://www.sqlsaturday.com/523/Sessions/Schedule.aspx
September 24, 2016, Kansas City, MO http://www.sqlsaturday.com/548/Sessions/Schedule.aspx
January 14, 2017, Nashville, TN http://www.sqlsaturday.com/581/Sessions/Schedule.aspx
March 11, 2017, Chicago, IL http://www.sqlsaturday.com/600/Sessions/Schedule.aspx
April 08, 2017, Madison, WI http://www.sqlsaturday.com/604/Sessions/Schedule.aspx
August 05, 2017, Louisville, KY http://www.sqlsaturday.com/643/Sessions/Schedule.aspx
September 16, 2017, Wausau, WI http://www.sqlsaturday.com/631/Sessions/Schedule.aspx
July 21, 2018, Louisville, KY http://www.sqlsaturday.com/729/Sessions/Schedule.aspx
July 28, 2018, Columbus, OH http://www.sqlsaturday.com/736/Sessions/Schedule.aspx
August 11, 2018, Indianapolis, IN http://www.sqlsaturday.com/745/Sessions/Schedule.aspx
August 18, 2018, Sioux Falls, SD http://www.sqlsaturday.com/787/Sessions/Schedule.aspx
January 12, 2019, Nashville, TN https://www.sqlsaturday.com/815/Sessions/Schedule.aspx
February 02, 2019, Cleveland, OH https://www.sqlsaturday.com/821/Sessions/Schedule.aspx
April 06, 2019, Madison, WI https://www.sqlsaturday.com/842/Sessions/Schedule.aspx

 

I am hearing a lot about GDPR (General Data Protection Regulation) in past few months. I decided to do some digging up to know how it affects us in the United States. Those of you who directly want to go to the source of my information and avoid my 2 cents are welcome to check out these 2 links. This Law is explained in very easy terms for a layperson.

https://www.red-gate.com/simple-talk/opinion/opinion-pieces/questions-gdpr-shy-ask/?utm_source=simpletalk&utm_medium=pubemail&utm_content=20171121-slota2&utm_term=simpletalkmain

https://www.csoonline.com/article/3202771/data-protection/general-data-protection-regulation-gdpr-requirements-deadlines-and-facts.html

Those of you who are short on time and want to just get the skinny version, then keep on reading.

  1. This law is about protecting European Citizens from data breaches of companies inside and outside Europe
  2. This Law may affect us if and when we store data of EU citizens.
  3. This is for EU consumer safety.
  4. In case of a data breach, the companies even outside EU can become part of the lawsuit.
  5. To avoid the lawsuit, a USA based company have to provide the EU country representative with a notification within 3 days of a data breach.
  6. Companies can be fined if the governing body finds the proof that adequate data protection measures were not taken by the company whose data was breached.

It is good to keep this EU law on our radar even if our company is not storing EU consumer data. I think something like this law will become the norm in next 5 years across the global communities. After all, it is the responsibility of the company who collects and stores consumer data to provide safeguards to protect the data. The basic rationale of this law is simple. Since companies make a profit by using a consumer information, it is fair that they should spend a part of that profit to protect that information. This provides clear accountability on one hand and provides peace of mind to the citizens on the other hand. In my opinion, this is a win-win situation.

I would appreciate your comments on this topic.

What you like to LEARN?

Here I am putting down a simple method of removing SQL Server 2016 named or default instance. I am assuming here that there are no user databases on this instance. I stopped all SQL  Server services before I started this uninstall process.

  1. To begin the uninstall process, go to Control Panel and then Programs and Features.
  2. Select Microsoft SQL Server 2016 and select Uninstall. Then click Remove. This starts the Remove SQL Server 2016 Wizard.

                                                              

  1. Setup Support Rules runs to verify your computer configuration. Since there is only one instance of SQL server 2016 on this machine, wizard selects it for you. Otherwise, you can use the drop-down menu to select your instance for removal. To continue, click Next.

4. After hitting NEXT, you will get Ready to Remove dialog box showing the summary of Engine and features that are going to be removed. Click Remove. The removal process will begin and it may take several minutes depending on your installation.

  1. If everything goes smoothly, you will receive a success message. You can go to the hyperlinked text file to see the report of the whole process of removal. Click Close to complete the removal of SQL Server 2016 instance.

In the end, you can refer to this TechNet document for more detailed instructions. https://technet.microsoft.com/en-us/windows/ms143412(v=sql.90)

 

IS IT ONE MORE BORING BLOG?

I am writing this blog to remind myself how the query processing work. It is a forgettable item because it is different from the way you write a query. To begin with let me write a simple query.

Select * from AdventureWorks2012.Person.Person

SIMPLE ANALYSIS

We have 3 elements in this simple query. First is the keyword “SELECT”. This keyword means you are retrieving data from the table (as oppose to inserting or updating data in a table). A Second element is “*” which denotes the list of columns in a table. The third element is the “FROM” clause which tells you to retrieve data from a particular table.

When SQL Server Query Optimizer receives this query it processes it in a logical order instead of a physical order. Most logical and cost-effective thing is to first check if the table exists or not. So the “FROM” clause will be processed first. Next, it checks for columns requested also exist or not. If a table exists and the columns requested also exist then Query Optimizer looks at “SELECT” keyword and retrieves the data for you.

Look at the direction of a blue arrow in the execution plan of this query. It shows, that Query Optimizer scans the table first (using the FROM clause) and then do the SELECT.

Logical query processing 1

This is how the flow diagram will look

Flow Diagram 1

Flow Diagram 1

Let’s add some more spice to this query.

Select * from AdventureWorks2012.Person.Person where Lastname = ‘Ford’

This query returns only 3 rows. “WHERE” clause in the query is filtering the table data on Lastname column. Query optimizer first checks the physical objects (tables and columns) exist and then see the filter and uses the index if there is one on the filtering column. Note the key_lookup operator in the below execution plan.  The reason is that the Non-Clustered index chosen in the Seek operation is not the covering index (remember we are using * ). In that case clustered index of the table is used to point to the data which generates key_lookup operator. It is an important consideration when you are doing query tuning. But for our purpose here it is not important. Another point to note is the size of arrows. Arrows are thin in this execution plan because the query is returning only 3 rows compared to previous plan where the query was returning about 20K rows.

Logical Query Processing 2

Logical Query Processing 2

Flow Diagram 2

Flow Diagram 2

Next Step:

Having a clear concept of how the query is logically processed will help you write more efficient and Optimizer friendly queries. In my next blog, I will take the learning done here and extend it to other query clauses such as GROUP BY and ORDER BY.

Yesterday I attend this session after work and took some notes. Sian is a great presenter and possesses wealth of information.

Topic: How Mind Works Under Stress

Presenter: Sian Beilock (Author of books Choke, How the body knows the mind)

Venue: Heller Auditorium Francis Parker School.

Notes

What happens when we fail to reach our potential? Do we feel like we are choking under the pressure? Why we fail to perform under pressure? Sian’s research is about finding techniques to overcome this choking effect and perform better.

Frontal cortex, front part of brain, is much bigger in humans than any other primates. It defines us as humans. This defines what we are capable of. FC communicates with other parts of the brain. Under stress FC stops this communication and results in weird consequences. During stressful situations, FC goes haywire but it does that in a predictable way. We have to find that pattern; Paralysis by Analysis.

What happens under stress situation is that we second guess our ability and pay detail attention to things that should be on autopilot. We become very conscious of ourselves. We start breaking down the details of our skills. Focusing on the outcome instead will give better results. All we want to do is sail rather than fail.

A Study was done at Cornell University with medical students. They divided the group of medical students of the same age and level into 2 groups. One who is preparing to take their board exams and one not. They measured the brain activities showing how the brain gets ready for board exam compared with other people not having exam. What changes in the brain? Study showed people who were excessively worried about the exam performed bad at critical thinking problems. Study also concludes that stress in one area of life effects other parts of the lives.

Now let us talk about what can we do help our brain perform at its peak.

  • People are more likely to get the answers if they walk away from the difficult problem for a while. This is taking a step back and view the problem differently. Mind resets and you get the connection back.
  • Sleep is an important for our life. Mind does not stop working even when we are sleeping but it works on its on things. It makes new connections from the information we gathered in our waking hours.
  • Talking out your problems to someone who does not know the answer will also help you solve the problem on your own.
  • Our emotions also messes with Frontal Cortex’s functions. When we are under pressure, FC does not keep the negative emotions away and gets us. In teens, FC is under developed, it completes in 20s.

fMRI, Functional MRI (Magnetic resonance Imaging) is one way of testing brain activities. It shows functioning of brain in real time. People who are nervous about Math problems were told they are going to solve a math problem. The part of brain that lights up when we are physically harmed lighted up for these people who were fearing Math problems. To them anticipating Math was as painful as they have been pricked by a needle. This is caused by lack of blood supply to Neurons in the brain. If you tell them ahead about the problem, fear shuts down FC and they don’t perform better.

What can we do about this negativity? Every time negative emotion comes, practice changing it with one thing that you can change. The practice is important. Behavior will change gradually. How we think about ourselves matters very much. If we think we are better at something, chances are we will do better in that in that task. Asian girls did better on Math problems when they thought they are Asian and therefore better at Math than when they thought they are girls and therefore not good at Math.

  • Taking walks in nature is beneficial for brain and helps focus our attention. Even looking at pictures of nature will give some of the benefits.
  • Practice being in the stressful situation and when actual stressful condition happens you are more ready to cope and perform better in it. Simulating the actual situation will help.
  • Your body language send other messages weather you are ready or not. It also gives message to your own brain. So keeping your posture upright also keeps away negative emotions.
  • Journaling is best way to get the negativity out. This opportunity to download you thoughts on paper helps you give perspective.

 10-07-2015 

Brief Recap of the session:

This technical briefing’s main objective was to demonstrate the latest and greatest features of SQL Server 2016 and how running it on SanDisk hardware multiplies the benefits of new features.

This demo was conducted in Microsoft Technology Center in Chicago and was jointly presented by Microsoft and SanDisk representatives. It was a 300-level advanced preview of the hardware and software without any sales pitch.

Their Fusion IoMemory card goes directly in the PCI slot on the server’s motherboard and they have parity NAND chips on that card for fault tolerance. Price went down considerably for the same product since the April 2015 for two reasons. First their Shanghai, Chaina NAND chip manufacture has become more efficient in producing high quality chips and secondly partnership with Microsoft and other hardware vendors such as Dell and HP.

20151007_110956

These are some of the numbers we get with SanDisk flash when tables are stored as normal (i.e; on Sandisk without using columnstore feature of SQL Server).

20151007_110408

SQL server 2016 has opened the clogged artery meaning flash drive performance is out of the charts. They were talking in terms of IOPS, throughput and latency. Now SQL server 2016 (compared to sql server 2014) using Sandisk or any flash drive can deliver IOPS in millions, Throughput in GBs and latencies in microseconds. The enhancement is due to the improved ColumnStore feature in SQL Server 2016. IO bottleneck for OLTP or OLAP load almost vanishes.

20151007_115143

These improvements directly affects the HA and DR environments. For example Always on Availability Group. Log transfer from one database to another, on the secondary, is much faster because of separating the redo process at the secondary server. AG, in synchronized mode, waits for the confirmation from the secondary server. Before sql server 2016, a record has to be written to the disk on the secondary and applied to the database before confirmation is sent to the Primary node. Now confirmation is sent as soon as the records is written to the disk. Redo process than run in the background at secondary server to apply the changes to the database. SQL 2016 AG now have load balancing feature included.

20151007_113725

Another benefit is cost savings in SQL server license. Because most of the time sql load is IO bound and CPU hardly goes above 25% on average. These flash drives will make CPU work for what they are worth. So only 2 CPUs are needed for server load as big as 10 TB Datawarehouse.

My impressions:

Since our databases are not even 1 TB, the whole critical databases in production can be stored on these flash drives. Or we can have our log files and tempdb files on flash drives. There are lot of options. And when these cards are purchased with VM vendors, there is warranty included which replaces these cards when it fails without cost to us.

They also have caching feature, I think this a separate software based product that needed to be purchased separately and maybe useful for application level processes to run faster.

Another point is that we are skipping upgrade to SQL server 2014 so we are naturally going to upgrade to SQL server 2016 in year 2017. I think we can plan in 2016 for testing and purchasing these flash drives.

Additionally, these will also make our HA and DR very reliable in terms of failover.

In my opinion, we can take advantage of flash drives for production load in our VMs when we upgrade SQL Server to 2016.

FileStream directory

The file stream network path technically is this \\SQL Server name\Network Share Name\Database File stream Directory Name\<File Table Name>

  1. SQL Server Name – (starting with \\)
  2. Windows Share Name – It is visible in SQL Server Configuration tool -> SQL Server properties -> FileStream tab.
  3. Database File stream Directory Name – It can be seen in Database Properties -> Options -> FILESTREAM Directory Name
  4. The last part is the file table name.

Some helpful FileStream queries

Exploration Queries:

SELECT DB_NAME(database_id) AS DatabaseName,non_transacted_access_desc, directory_name  FROM sys.database_filestream_options WHERE directory_name IS NOT NULL

SELECT directory_name,is_enabled,filename_collation_name FROM sys.filetables;

SELECT * FROM sys.tables WHERE is_filetable = 1

SELECT object_id, OBJECT_NAME(object_id) AS ‘Object Names’    FROM sys.filetable_system_defined_objects

— To get the root level UNC path of a file table

SELECT FileTableRootPath()

SELECT FileTableRootPath(N’file_table_name’)

Enable Disable Queries:

https://msdn.microsoft.com/en-us/library/gg492061.aspx#HowToDisable

— Disable Non-Transactional write access.

ALTER DATABASE database_name

SET FILESTREAM ( NON_TRANSACTED_ACCESS = READ_ONLY );

— Disable non-transactional access.

ALTER DATABASE database_name     SET FILESTREAM ( NON_TRANSACTED_ACCESS = OFF );

— Enable full non-transactional Access

ALTER DATABASE database_name     SET FILESTREAM ( NON_TRANSACTED_ACCESS = FULL );

Find locks held by filestream queries:

Open handles to the files stored in a FileTable can prevent the exclusive access that is required for certain administrative tasks. To enable urgent tasks, you may have to kill open file handles associated with one or more FileTables.

SELECT * FROM sys.dm_filestream_non_transacted_handles;

— To identify open files and the associated locks

SELECT opened_file_name    FROM sys.dm_filestream_non_transacted_handles

WHERE fcb_id IN        ( SELECT request_owner_id FROM sys.dm_tran_locks );

— Kill all open handles in all the filetables in the database.

EXEC sp_kill_filestream_non_transacted_handles;

— Kill all open handles in a single filetable.

EXEC sp_kill_filestream_non_transacted_handles @table_name = ‘filetable_name’;

— Kill a single handle.

EXEC sp_kill_filestream_non_transacted_handles @handle_id = integer_handle_id;

Code Re-usability:

— Use the relative path for portable code

https://msdn.microsoft.com/en-us/library/gg492087(v=sql.110).aspx

DECLARE @root nvarchar(100);

DECLARE @fullpath nvarchar(1000);

SELECT @root = FileTableRootPath();

SELECT @fullpath = @root + file_stream.GetFileNamespacePath() FROM EDMS

WHERE name = N’document_name’;

PRINT @fullpath;

How FileStream works with some of the High Availability Features of SQL Server 2012?

 AlwaysOn Availability Groups and FileTables

https://msdn.microsoft.com/en-us/library/gg492086.aspx#alwayson

When the database that contains FILESTREAM or FileTable data belongs to an AlwaysOn availability group:

  • FileTable functionality is partially supported by AlwaysOn Availability Groups. After a failover, FileTable data is accessible on the primary replica, but FileTable data is not accessible on readable secondary replicas.
  • The FILESTREAM and FileTable functions accept or return virtual network names (VNNs) instead of computer names.
  • All access to FILESTREAM or FileTable data through the file system APIs should use VNNs instead of computer names.

Replication and FileTables

Replication and related features (including transactional replication, merge replication, change data capture, and change tracking) are not supported with FileTables.