A closer look at everything related to SQL Server

IS IT ONE MORE BORING BLOG?

I am writing this blog to remind myself how query processing works. It is easy to forget because the logical processing order is different from the order in which you write a query. To begin with, let me write a simple query.

SELECT * FROM AdventureWorks2012.Person.Person

SIMPLE ANALYSIS

We have three elements in this simple query. The first is the keyword “SELECT”, which means you are retrieving data from a table (as opposed to inserting or updating data in a table). The second element is “*”, which denotes all the columns in the table. The third element is the “FROM” clause, which tells SQL Server which table to retrieve the data from.

When the SQL Server Query Optimizer receives this query, it processes it in a logical order instead of the written order. The most logical and cost-effective thing is to first check whether the table exists, so the “FROM” clause is processed first. Next, it checks whether the requested columns exist. If the table and the requested columns both exist, the Query Optimizer then looks at the “SELECT” keyword and retrieves the data for you.

Look at the direction of the blue arrow in the execution plan of this query. It shows that the Query Optimizer scans the table first (using the FROM clause) and then does the SELECT.

Logical query processing 1

This is how the flow diagram will look

Flow Diagram 1
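To make the logical order concrete, here is the same query annotated with the steps described above (the comments are mine; the query itself is unchanged):

-- Logical processing order (not the written order):
--   1. FROM AdventureWorks2012.Person.Person   -- does the table exist?
--   2. column check                            -- do the requested columns exist?
--   3. SELECT *                                -- retrieve the data
SELECT * FROM AdventureWorks2012.Person.Person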

Let’s add some more spice to this query.

SELECT * FROM AdventureWorks2012.Person.Person WHERE LastName = 'Ford'

This query returns only 3 rows. The WHERE clause filters the table data on the LastName column. The Query Optimizer first checks that the physical objects (tables and columns) exist, then evaluates the filter, using an index on the filtering column if one exists. Note the Key Lookup operator in the execution plan below. It appears because the nonclustered index chosen for the Seek operation is not a covering index (remember, we are using *), so the table's clustered index is used to locate the remaining column data, and that generates the Key Lookup operator. This is an important consideration when you are tuning queries, though for our purpose here it is not critical. Another point to note is the size of the arrows: they are thin in this execution plan because the query returns only 3 rows, compared to the previous plan where the query returned about 20K rows. A sketch of how to avoid the lookup follows the flow diagram below.

Logical Query Processing 2

Flow Diagram 2
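As a hedged sketch of the tuning point above, here are two ways to avoid the Key Lookup. The first simply selects only the columns that the existing nonclustered index on LastName covers; the second creates a covering index. The index name and INCLUDE list below are illustrative assumptions, not objects from this post.

-- Option 1: request only columns the nonclustered index already carries,
-- so no Key Lookup into the clustered index is needed.
SELECT LastName, FirstName, MiddleName
FROM AdventureWorks2012.Person.Person
WHERE LastName = 'Ford'

-- Option 2: a covering index (hypothetical name and column list) that
-- INCLUDEs the extra columns a query needs.
CREATE NONCLUSTERED INDEX IX_Person_LastName_Covering
ON Person.Person (LastName)
INCLUDE (FirstName, MiddleName, PersonType);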

Next Step:

Having a clear concept of how a query is logically processed will help you write more efficient and Optimizer-friendly queries. In my next blog, I will take what we learned here and extend it to other query clauses such as GROUP BY and ORDER BY.

Yesterday I attended this session after work and took some notes. Sian is a great presenter and possesses a wealth of information.

Topic: How Mind Works Under Stress

Presenter: Sian Beilock (author of the books Choke and How the Body Knows Its Mind)

Venue: Heller Auditorium, Francis Parker School.

Notes

What happens when we fail to reach our potential? Do we feel like we are choking under the pressure? Why do we fail to perform under pressure? Sian's research is about finding techniques to overcome this choking effect and perform better.

The frontal cortex (FC), the front part of the brain, is much bigger in humans than in any other primate. It defines us as humans and what we are capable of. The FC communicates with other parts of the brain; under stress it stops this communication, with strange consequences. During stressful situations the FC goes haywire, but it does so in a predictable way. We have to find that pattern: paralysis by analysis.

What happens under stress is that we second-guess our ability and pay detailed attention to things that should be on autopilot. We become very conscious of ourselves and start breaking down the details of our skills. Focusing on the outcome instead gives better results. All we want to do is sail rather than fail.

A study was done at Cornell University with medical students. Students of the same age and level were divided into two groups: one preparing to take their board exams and one not. The researchers measured brain activity to see how the brain gets ready for a board exam compared with people not facing an exam. What changes in the brain? The study showed that people who were excessively worried about the exam performed poorly on critical-thinking problems. It also concluded that stress in one area of life affects other parts of our lives.

Now let us talk about what we can do to help our brain perform at its peak.

  • People are more likely to get the answer if they walk away from a difficult problem for a while. This is taking a step back and viewing the problem differently; the mind resets and you get the connection back.
  • Sleep is important for our lives. The mind does not stop working even when we are sleeping; it works on its own things, making new connections from the information we gathered in our waking hours.
  • Talking through your problems with someone who does not know the answer will also help you solve the problem on your own.
  • Our emotions also mess with the frontal cortex's functions. When we are under pressure, the FC does not keep negative emotions away, and they get to us. In teens the FC is underdeveloped; it finishes developing in our 20s.

fMRI (functional magnetic resonance imaging) is one way of testing brain activity; it shows the brain functioning in real time. People who are nervous about math were told they were going to solve a math problem, and the part of the brain that lights up when we are physically harmed lit up for them. To them, anticipating math was as painful as being pricked by a needle. This is caused by a lack of blood supply to neurons in the brain. If you tell them about the problem ahead of time, fear shuts down the FC and they do not perform well.

What can we do about this negativity? Every time a negative emotion comes, practice replacing it with one thing that you can change. The practice is important; behavior will change gradually. How we think about ourselves matters very much. If we think we are better at something, chances are we will do better at that task. Asian girls did better on math problems when they thought of themselves as Asian, and therefore good at math, than when they thought of themselves as girls, and therefore supposedly not good at math.

  • Taking walks in nature is beneficial for the brain and helps focus our attention. Even looking at pictures of nature gives some of the benefits.
  • Practice being in the stressful situation; when the actual stressful condition happens, you are more ready to cope and perform better. Simulating the actual situation helps.
  • Your body language sends others a message about whether you are ready or not, and it sends a message to your own brain as well. Keeping your posture upright also keeps negative emotions away.
  • Journaling is the best way to get the negativity out. The opportunity to download your thoughts onto paper helps give you perspective.

 10-07-2015 

Brief Recap of the session:

This technical briefing's main objective was to demonstrate the latest and greatest features of SQL Server 2016 and how running it on SanDisk hardware multiplies the benefits of those new features.

The demo was conducted at the Microsoft Technology Center in Chicago and was jointly presented by Microsoft and SanDisk representatives. It was a 300-level advanced preview of the hardware and software, without any sales pitch.

Their Fusion ioMemory card goes directly into a PCIe slot on the server's motherboard, and the card carries parity NAND chips for fault tolerance. The price has gone down considerably for the same product since April 2015, for two reasons: first, their NAND chip manufacturing in Shanghai, China has become more efficient at producing high-quality chips; and second, partnerships with Microsoft and other hardware vendors such as Dell and HP.

They shared some of the numbers you get with SanDisk flash when tables are stored normally (i.e., on SanDisk without using the columnstore feature of SQL Server).

SQL Server 2016 has opened the clogged artery, meaning flash drive performance is off the charts. They talked in terms of IOPS, throughput, and latency: SQL Server 2016 (compared to SQL Server 2014) using SanDisk or any flash drive can deliver IOPS in the millions, throughput in GBs, and latencies in microseconds. The enhancement is due to the improved columnstore feature in SQL Server 2016; the I/O bottleneck for OLTP or OLAP loads almost vanishes.
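For reference, this is roughly what turning on columnstore storage looks like. The table and index names are made-up examples, not anything shown at the briefing:

-- Store an entire (hypothetical) fact table column-wise and compressed:
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;

-- SQL Server 2016 also allows an updatable nonclustered columnstore index
-- on selected columns of an OLTP table (again, hypothetical names):
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales_Reporting
ON dbo.FactSales (SaleDate, ProductID, Amount);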

These improvements directly affect HA and DR environments, for example AlwaysOn Availability Groups. Log transfer from the primary to the secondary is much faster because the redo process at the secondary server has been separated out. An AG in synchronized mode waits for confirmation from the secondary server. Before SQL Server 2016, a log record had to be written to disk on the secondary and applied to the database before confirmation was sent to the primary node. Now confirmation is sent as soon as the record is written to disk, and the redo process then runs in the background on the secondary to apply the changes to the database. SQL Server 2016 AGs also include a load balancing feature.
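A hedged way to see how far a secondary is behind is the replica-state DMV, which exists in SQL Server 2012 and later:

-- Log send and redo queue sizes per availability-group database:
SELECT DB_NAME(database_id) AS DatabaseName,
       synchronization_state_desc,
       log_send_queue_size,  -- KB of log not yet sent to the secondary
       redo_queue_size,      -- KB of log received but not yet redone
       redo_rate             -- KB per second being redone
FROM sys.dm_hadr_database_replica_states;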

Another benefit is cost savings on SQL Server licensing. Most of the time the SQL load is I/O bound and the CPU hardly goes above 25% on average. These flash drives will make the CPUs work for what they are worth, so only 2 CPUs are needed for a server load as big as a 10 TB data warehouse.

My impressions:

Since our databases are not even 1 TB, all of our critical production databases could be stored on these flash drives. Or we could put just our log files and tempdb files on flash drives; there are a lot of options. And when these cards are purchased through VM vendors, a warranty is included that replaces a failed card at no cost to us.

They also have a caching feature. I think this is a separate software-based product that needs to be purchased separately and may be useful for making application-level processes run faster.

Another point is that we are skipping the upgrade to SQL Server 2014, so we will naturally upgrade to SQL Server 2016 in 2017. I think we can plan in 2016 for testing and purchasing these flash drives.

Additionally, these drives will also make our HA and DR very reliable in terms of failover.

In my opinion, we can take advantage of flash drives for production load in our VMs when we upgrade SQL Server to 2016.

FileStream directory

Technically, the FILESTREAM network path looks like this: \\<SQL Server Name>\<Network Share Name>\<Database FILESTREAM Directory Name>\<File Table Name>

  1. SQL Server Name – starts with \\
  2. Network Share Name – visible in SQL Server Configuration Manager -> SQL Server Properties -> FILESTREAM tab.
  3. Database FILESTREAM Directory Name – can be seen in Database Properties -> Options -> FILESTREAM Directory Name.
  4. The last part is the file table name (a hypothetical example follows this list).
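Putting the four parts together, a purely hypothetical path might look like this (every name below is made up for illustration):

\\SQLPROD01\MSSQLSERVER\FilestreamData\Documents

Here SQLPROD01 is the server name, MSSQLSERVER the network share name, FilestreamData the database FILESTREAM directory name, and Documents the file table name.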

Some helpful FileStream queries

Exploration Queries:

SELECT DB_NAME(database_id) AS DatabaseName, non_transacted_access_desc, directory_name FROM sys.database_filestream_options WHERE directory_name IS NOT NULL

SELECT directory_name,is_enabled,filename_collation_name FROM sys.filetables;

SELECT * FROM sys.tables WHERE is_filetable = 1

SELECT object_id, OBJECT_NAME(object_id) AS 'Object Names' FROM sys.filetable_system_defined_objects

-- To get the root-level UNC path of a file table

SELECT FileTableRootPath()

SELECT FileTableRootPath(N'file_table_name')

Enable/Disable Queries:

https://msdn.microsoft.com/en-us/library/gg492061.aspx#HowToDisable

-- Disable non-transactional write access.

ALTER DATABASE database_name

SET FILESTREAM ( NON_TRANSACTED_ACCESS = READ_ONLY );

-- Disable non-transactional access.

ALTER DATABASE database_name

SET FILESTREAM ( NON_TRANSACTED_ACCESS = OFF );

-- Enable full non-transactional access.

ALTER DATABASE database_name

SET FILESTREAM ( NON_TRANSACTED_ACCESS = FULL );

Find locks held by filestream queries:

Open handles to the files stored in a FileTable can prevent the exclusive access that is required for certain administrative tasks. To enable urgent tasks, you may have to kill open file handles associated with one or more FileTables.

SELECT * FROM sys.dm_filestream_non_transacted_handles;

-- To identify open files and the associated locks

SELECT opened_file_name
FROM sys.dm_filestream_non_transacted_handles
WHERE fcb_id IN ( SELECT request_owner_id FROM sys.dm_tran_locks );

-- Kill all open handles in all the filetables in the database.

EXEC sp_kill_filestream_non_transacted_handles;

-- Kill all open handles in a single filetable.

EXEC sp_kill_filestream_non_transacted_handles @table_name = 'filetable_name';

-- Kill a single handle.

EXEC sp_kill_filestream_non_transacted_handles @handle_id = integer_handle_id;

Code Re-usability:

-- Use the relative path for portable code

https://msdn.microsoft.com/en-us/library/gg492087(v=sql.110).aspx

DECLARE @root nvarchar(100);

DECLARE @fullpath nvarchar(1000);

SELECT @root = FileTableRootPath();

SELECT @fullpath = @root + file_stream.GetFileNamespacePath() FROM EDMS

WHERE name = N'document_name';

PRINT @fullpath;

How does FileStream work with some of the high availability features of SQL Server 2012?

 AlwaysOn Availability Groups and FileTables

https://msdn.microsoft.com/en-us/library/gg492086.aspx#alwayson

When the database that contains FILESTREAM or FileTable data belongs to an AlwaysOn availability group:

  • FileTable functionality is partially supported by AlwaysOn Availability Groups. After a failover, FileTable data is accessible on the primary replica, but FileTable data is not accessible on readable secondary replicas.
  • The FILESTREAM and FileTable functions accept or return virtual network names (VNNs) instead of computer names.
  • All access to FILESTREAM or FileTable data through the file system APIs should use VNNs instead of computer names.

Replication and FileTables

Replication and related features (including transactional replication, merge replication, change data capture, and change tracking) are not supported with FileTables.

In my second post about R and statistics, I am going to describe how I registered for online classes. See here for Part 1.

First, I registered on Coursera.org for a class named “R Programming”. It is part of the Data Science Specialization offered by JHU and will begin on August 3rd. The other online course is from EdX.org; it is self-paced and is called “Explore Statistics with R”. That class already started on July 7th, but I just joined it today. It will end on August 31st. Hopefully, I will catch up.

Between these 2 classes and the R study group that I mentioned in my last post, I think I will get a broad and reasonably deep understanding of how to use statistics with R to solve real-life problems. I will also be attending other complementary user groups such as ChicagoCityData, which explores the city's datasets at https://data.cityofchicago.org/. It has datasets such as landlord lists, police stations, crimes from 2001 to present, and other interesting ones.

Now let me explain more about the courses I am taking and how they are going to help me reach my goal. But wait! What is my goal? What do I want to achieve by going through all this trouble of learning R, statistics, and user groups?

As a Senior SQL Server Database Administrator, I am very familiar with data, datatypes, storage, performance, optimization, etc. But 20 years ago I graduated as an Electrical Engineer, and Statistics and Mathematics were my two favorite subjects. Since becoming a DBA 14 years ago, I have not gotten to use those two areas much. Learning R is just like learning any other programming language, such as T-SQL, which I am pretty familiar with (I am also somewhat familiar with the scripting language PowerShell). But learning R and statistics together, and applying them to solve practical problems, is my dream-come-true scenario. Currently I work with SQL Server 2012. SQL Server 2014 is out there, and SQL Server 2016 will be out next year. With the 2016 version, Microsoft is tightly integrating R functionality into the database engine, so Data Scientists will no longer have to wait to extract their big datasets before working on the analysis; they can do it right on the SQL Server. How much of a performance hit that will be is yet to be seen. So you can see my motivation here. I am not going to leave the world of SQL Server, because I fell in love with it, and I would like to fall in love with R and statistics too. I would like to find new meanings in the data that I have at my fingertips, bring new insight to my company, and become successful myself at the same time.

Enough about loves! Back to class descriptions.

1. R Programming:

I took this class online exactly a year ago, but it became harder for me after week 3 and I could not finish the last project. So after one year, the “Study Group for An Introduction to Statistical Learning with Applications in R” has reignited my interest in pursuing this course and the Data Science Specialization. It is a 4-week course starting on Aug 3rd, 2015, as mentioned before, and requires about 10 hours per week of your time. It uses a cool R self-teaching tool called “swirl” and a book. The course will cover the following material each week:

  • Week 1: Overview of R, R data types and objects, reading and writing data
  • Week 2: Control structures, functions, scoping rules, dates and times
  • Week 3: Loop functions, debugging tools
  • Week 4: Simulation, code profiling

2. Explore Statistics with R:

I joined this class today and, as mentioned earlier, it began on July 7th, 2015. It is an 8-week course and requires about 5 hours per week of time commitment. It uses the materials from here and here. It is self-paced, meaning all five weeks of materials were posted on July 7th, so you take your time and finish by August 31st, when a project is due. The main outline of this course is:

  • Week 1: Get to know R
  • Week 2: How to import and clean data in R
  • Week 3: Statistics under the hood: distributions and tests.
  • Week 4: Non-parametric tests
  • Week 5: Visit the research frontier

I can see some overlap between the 2 courses, and that is why I think I will be able to finish the statistics course by Aug 31st even after starting late. At the meetups, I would also like to help others, because I know how it feels when you are struggling and are not sure whether you can do it or whether it is for you. I would be able to encourage people to keep learning and not give up. The fruit is right within your reach; you just have to get a little closer.

See you in next post!

Hi Fellow Readers,

I recently joined a study group called “Statistical Learning with Application in R” on meetup.com and attended its first session yesterday at Braintree's office at 222 Merchandise Mart, Chicago.

This group is for people who have some familiarity with R programming but lack statistical depth, or who know statistics but are new to R programming. Basically, anyone with the motivation to learn something new and an inclination to share their knowledge can join. In the future, the meetup organizer may add a Google Hangout option so that remote people can also join in.

Statistical learning has recently become a hot topic with the explosion of Big Data and Machine Learning, and the new job market for Data Scientists has given it much hype. But what is statistical learning? Basically, it is the study of tools that help us predict and infer from data. For example, linear regression is used for predicting quantitative values such as a salesman's sales figures or an individual's salary. With the advent of computer technology in the 1980s, it became feasible to compute non-linear methods such as classification and regression trees. The subject of machine learning is essentially the study of the statistics of non-linear methods.

Our group will be following the book “An Introduction to Statistical Learning” (ISL) and the corresponding videos. This book is especially good for beginners.

From the book ISL itself, the basic premises of the book are:

  1. Many statistical learning methods are relevant and useful in a wide range of academic and non-academic disciplines, beyond just the statistical sciences.
  2. Statistical learning should not be viewed as a series of black boxes.
  3. While it is important to know what job is performed by each cog, it is not necessary to have the skills to construct the machine inside the box!
  4. We presume that the reader is interested in applying statistical learning methods to real-world problems.

This is enough for the first post on this topic. I am planning to write more as my jouRney progresses.

I have been doing log shipping since SQL Server 2000, and I did it with SQL Server 2005. But for the 4 years that I worked with SQL Server 2008 R2, I did not work with log shipping. Nothing has changed in SQL Server 2012, but to refresh my memory, here I am writing down the simple steps for setting up log shipping.

  1. Make sure the database is in Full recovery mode. In this case, the primary server is PriSQL\Prod01 and the primary database is PriDB.
  2. Pre-initialize the database at the secondary. This means taking a full backup and a transaction log backup of the primary database PriDB and restoring them on the secondary SecSQL\DR01 as SecDB in Standby mode (a T-SQL sketch follows these steps).
  3. Right-click the primary database, go to Properties, and select “Enable this as a primary database in a log shipping configuration”.
  4. Under the “Backup Settings” button, select how frequently log backups are taken and provide the path where the log backups go. Some other settings on this page are set as follows:
    1. Delete files older than: 72 hours
    2. Alert if no backup occurs within: 2 hours
    3. Backup schedule: Every 15 minutes
    4. Backup Path: E:\SQLBackups\PriDB_log
    5. Job name: LSBackup_PriDB
    6. Backup compression: Use default server setting.
  5. Add the secondary server. In this case it is SecSQL\DR01, and the secondary database is SecDB.
    1. Tab- Initialize secondary Database:
      • Select database is already initialized.
    2. Tab- Copy files:
      • Destination folder for copied file F:\SQLBackups\LogShip
      • Set 72 hours for Delete copied files.
      • Copy job: LSCopy_PriSQL\prod01_PriDB
      • Copy Schedule: Every 15 minutes.
    3. Tab- Restore Transaction Logs
      • Restore job: LSRestore_PriSQL\prod01_PriDB
      • Restore Schedule: Every 15 minutes
      • Delay restoring: 1 minute
      • Alert if no restore occurs: 2 hours
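Here is a minimal T-SQL sketch of step 2 (pre-initializing the secondary). The backup file names, logical file names, and data paths are assumptions for illustration; adjust them to your environment.

-- Run on the primary, PriSQL\Prod01:
BACKUP DATABASE PriDB TO DISK = N'E:\SQLBackups\PriDB_Full.bak';
BACKUP LOG PriDB TO DISK = N'E:\SQLBackups\PriDB_Log.trn';

-- Run on the secondary, SecSQL\DR01, after copying the files over.
-- STANDBY keeps the database readable between restores; the logical file
-- names (PriDB, PriDB_log) and paths below are assumed.
RESTORE DATABASE SecDB
FROM DISK = N'F:\SQLBackups\PriDB_Full.bak'
WITH MOVE N'PriDB' TO N'F:\SQLData\SecDB.mdf',
     MOVE N'PriDB_log' TO N'F:\SQLData\SecDB_log.ldf',
     STANDBY = N'F:\SQLBackups\SecDB_standby.tuf';

RESTORE LOG SecDB
FROM DISK = N'F:\SQLBackups\PriDB_Log.trn'
WITH STANDBY = N'F:\SQLBackups\SecDB_standby.tuf';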