A closer look at everything related to SQL Server

Archive for July, 2015

Studying Statistics with R – 2

In this second post on R and statistics, I describe the online classes I have registered for. See here for Part 1.

First, I registered on Coursera.org for a class named “R Programming”. It is part of the Data Science Specialization offered by JHU and begins on August 3rd. The other online course, from EdX.org, is self-paced and is called “Explore Statistics with R”. That class started on July 7th, but I only joined it today. It ends on August 31st. Hopefully, I will catch up.

Between these 2 classes and the R study group that I mentioned in my last post, I think I will gain a very broad and somewhat deep understanding of how to use statistics with R to solve real-life problems. I will also be attending other complementary user groups such as ChicagoCityData, which explores the city’s datasets at https://data.cityofchicago.org/. It has datasets such as the landlords list, police stations, crimes from 2001 to present, and other interesting ones.

Now let me explain more about the courses that I am taking and how they are going to help me reach my goal. But wait! What is my goal? What do I want to achieve by going through all this trouble of learning R and statistics and attending user groups?

As a Senior SQL Server Database Administrator, I am very familiar with data, data types, storage, performance, optimization etc. But 20 years ago I graduated as an Electrical Engineer, and Statistics and Mathematics were my two favorite subjects. Since becoming a DBA 14 years ago, I have not had much chance to use either. Learning R is just like learning any other programming language, such as T-SQL, which I am pretty familiar with. I am also somewhat familiar with the scripting language PowerShell. But learning R and statistics together, and applying them to solve practical problems, is my dream-come-true scenario. Currently I am working with SQL Server 2012. SQL Server 2014 is already out, and SQL Server 2016 will be out next year. With the 2016 version, Microsoft is tightly integrating R functionality into the database engine. So data scientists will no longer have to wait to get their big datasets before starting the analysis; they can do it right on the SQL Server. How big a performance hit that will be is yet to be seen. So you can see my motivation here. I am not going to leave the world of SQL Server, because I fell in love with it, and I would like to fall in love with R and statistics too. I would like to find new meaning in the data that I have at my fingertips, bring new insight to my company, and become successful myself at the same time.

Enough about loves! Back to class descriptions.

1. R Programming : –

I took this class online exactly a year ago, but it became harder for me after week 3 and I could not finish the last project. A year later, the “Study group for An Introduction to Statistical Learning with Applications in R” has reignited my interest in pursuing this course and the Data Science Specialization. This is a 4-week course starting on Aug 3rd, 2015, as mentioned before. It requires about 10 hours per week of your time. It uses a cool R self-teaching tool called “Swirl” and a book. The course will cover the following material each week:

  • Week 1: Overview of R, R data types and objects, reading and writing data
  • Week 2: Control structures, functions, scoping rules, dates and times
  • Week 3: Loop functions, debugging tools
  • Week 4: Simulation, code profiling

2. Explore Statistics with R : –

I joined this class today and, as mentioned earlier, it began on July 7th, 2015. This is an 8 week course and requires a time commitment of about 5 hours per week. It uses the materials from here and here. This is a self-paced course, meaning all 5 weeks of material were posted on July 7th. So you take your time and finish by August 31st, when a project is due. The main outline of this course is:

  • Week 1: Get to know R
  • Week 2: How to import and clean data in R
  • Week 3: Statistics under the hood: distributions and tests.
  • Week 4: Non-parametric tests
  • Week 5: Visit the research frontier

I can see some overlap between the 2 courses, and that is why I think I will be able to finish the Statistics course by Aug 31st even after starting late. At the meetups, I would also like to help others, because I know how it feels when you are struggling and not sure whether you can do it or whether it is for you at all. I would like to encourage people to keep learning and not give up. The fruits are right within your reach; you just have to go a little closer.

See you in the next post!

Studying Statistics with R – 1

Hi Fellow Readers,

I recently joined a study group called “Statistical Learning with Application in R” on meetup.com and attended its first session yesterday at BrainTree’s office at 222 Merchandise Mart, Chicago.

This group is for people who have some familiarity with R programming but lack the statistical depth, or who know the statistics but are new to R programming. Basically, anyone with the motivation to learn something new and the inclination to share their knowledge can join. In the future, the meetup organizer may add a Google Hangout option so that remote people can also join in.

Statistical learning has recently become a hot topic with the explosion of Big Data and Machine Learning. Moreover, the new job market for data scientists has given it much hype. But what is statistical learning? Basically, it is the study of tools that help us predict and infer from data. For example, linear regression is used for predicting quantitative values such as a salesman’s sales figures or an individual’s salary. With the advent of computer technology in the 1980s, it became feasible to compute non-linear methods such as Classification and Regression Trees. The subject of Machine Learning is essentially the study of the statistics of non-linear methods.

Our group will be following the book “An Introduction to Statistical Learning” (4th printing), or ISL, and the corresponding videos. This book is especially good for beginners.

From the book ISL itself, the basic premises of the book are:

  1. Many statistical learning methods are relevant and useful in a wide range of academic and non-academic disciplines, beyond just the statistical sciences.
  2. Statistical learning should not be viewed as a series of black boxes.
  3. While it is important to know what job is performed by each cog, it is not necessary to have the skills to construct the machine inside the box!
  4. We presume that the reader is interested in applying statistical learning methods to real-world problems.

This is enough for the first post on this topic. I am planning to write more as my jouRney progresses.

Setting up Log Shipping

I have been doing log shipping since SQL Server 2000, and I did it with SQL Server 2005 as well. But for the 4 years that I worked with SQL Server 2008 R2, I did not work with log shipping. Nothing has changed in SQL Server 2012. Still, to refresh my memory, I am writing down the simple steps for setting up log shipping here.

  1. Make sure the database uses the Full recovery model. In this case the primary server is PriSQL\Prod01 and the primary database is PriDB.
  2. Pre-initialize the database on the secondary. This means taking a full backup and a transaction log backup of the primary database PriDB and restoring them on the secondary SecSQL\DR01 as SecDB in Standby mode.
  3. Right-click the primary database, go to Properties and select “Enable this as a primary database in a log shipping configuration”.
  4. Click the “Backup Settings” button and choose how frequently log backups will run. Also provide the path where the log backups go. Some other settings on this page are set as follows:
    1. Delete files older than: 72 hours
    2. Alert if no backup occurs within: 2 hours
    3. Backup schedule: Every 15 minutes
    4. Backup Path: E:\SQLBackups\PriDB_log
    5. Job name: LSBackup_PriDB
    6. Backup compression: Use default server setting.
  5. Add the secondary server. In this case it is SecSQL\DR01. Secondary database is SecDB.
    1. Tab- Initialize secondary Database:
      • Select database is already initialized.
    2. Tab- Copy files:
      • Destination folder for copied files: F:\SQLBackups\LogShip
      • Set 72 hours for Delete copied files.
      • Copy job: LSCopy_PriSQL\prod01_PriDB
      • Copy Schedule: Every 15 minutes.
    3. Tab- Restore Transaction Logs
      • Restore job: LSRestore_PriSQL\prod01_PriDB
      • Restore Schedule: Every 15 minutes
      • Delay restoring: 1 minute
      • Alert if no restore occurs: 2 hours
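
Steps 1 and 2 above can also be scripted instead of done through the GUI. The following is only a minimal T-SQL sketch, not the exact commands used for this setup. The backup file names and the undo (.tuf) file name are made up for illustration. It also assumes that both backup files have already been copied to F:\SQLBackups\LogShip on the secondary, and that the secondary has the same drive layout as the primary (otherwise the first restore needs MOVE clauses for the data and log files).

```sql
-- Run on the primary instance PriSQL\Prod01.
-- Step 1: log shipping needs the FULL (or BULK_LOGGED) recovery model.
ALTER DATABASE PriDB SET RECOVERY FULL;

-- Confirm the setting.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'PriDB';

-- Step 2a: take a full backup, then a transaction log backup.
-- File names below are hypothetical.
BACKUP DATABASE PriDB
TO DISK = N'E:\SQLBackups\PriDB_log\PriDB_init_full.bak'
WITH INIT;

BACKUP LOG PriDB
TO DISK = N'E:\SQLBackups\PriDB_log\PriDB_init_log.trn'
WITH INIT;

-- Run on the secondary instance SecSQL\DR01 after copying both files over.
-- Step 2b: restore the full backup without recovering the database,
-- so that further log backups can still be applied.
RESTORE DATABASE SecDB
FROM DISK = N'F:\SQLBackups\LogShip\PriDB_init_full.bak'
WITH NORECOVERY;

-- Step 2c: restore the log backup in Standby mode, which leaves SecDB
-- read-only between restores. The undo file path is hypothetical.
RESTORE LOG SecDB
FROM DISK = N'F:\SQLBackups\LogShip\PriDB_init_log.trn'
WITH STANDBY = N'F:\SQLBackups\LogShip\SecDB_undo.tuf';
```

Once this has run, the secondary matches the “database is already initialized” choice in step 5, and the copy and restore jobs only have to apply the log backups taken afterwards.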