Saturday, 24 March 2012

SqlServer Indexes Cont..




In this post,I will be extending  the discussion on indexes from my previous post,as we were discussing on non clustered indexes ,I will be covering a specific type of non clustered index called covering index.Before going in detail ,let me explain    what happens when   we   execute a query which has got non clustered index  with  clustered index optional.

Take an example table Product  with columns as Pid  Pname PCategory PsubCategory ,assume we got a clustered  index on Pid  and non clustered index on Pname ,Lets   execute a query like Select  Pcategory,PSubCategory  from Product.

We got a  Non clustered index on Pcategory but not on PsubCategory so what sqlserver optimizer  does
when  a column  in the select query is not included in an index is  a book mark look up ,during this book mark look up it makes a look up to a  clustered index in this case or a row id if we haven't got clustered index and fetches relative rows  from the table,clearly as you can observe  there  is an extra step  for this and this degrades performance,so in this case if  we include a non clustered index on the other column PsubCategory also ,than we can avoid the book mark look as we got both the columns of the select query in the form of indexes and fetched directly  without needing a lookup to a clustered index or row location .This naturally improves performance .

I  recommend to  have look at  graphical execution plans in Management studio which will give you a quick overview of the Various metrics associated with the cost of the queries,but remember spend as much time as you can on indexes and the way optimizer executes in sqlserver 

Thursday, 22 March 2012

Sqlserver Indexes

In this post I will be explaining indexes ,So the big question is  What are Indexes?
Well Indexes are used mainly for performance so that the optimizer can choose the best method for fetching records for a query ,Before going in to the details of indexes,  we will look at, how exactly Sql sever retrieves the information when we issue a query against its tables,Basically what  Sqlserver does, when haven't got indexes on relevant table is, it  will make a table scan to find the relevant information  ,So what is a table scan?In a table scan  what Sqlserver does is that ,it goes to each row of the table   to find the relevant match.

How Does it Effects the Performance,Assume if you got Millions of  records and If  you  scan all the rows than  it consumes lot of time  as you have to go to each and every  row , this consumes lot of time that is why the concept of indexes was introduced.remember a lot factors contribute to the optimization of query but  i want to  limit my discussion only to indexes ,which play an  important role in improving performance,I will discuss rest of the associated factors with indexes and query performance like statistics and fragmentation in future posts

What Indexes does is it  stores the data in an orderly way ,So if you issue a query against a table which contain indexes ,the execution engine will see the existence of the index   and it  knows where exactly those values are going to reside  and picks up the relevant records quickly  by taking a particular logical path ,this operation  is called  a seek a operation,to do this seek operation it takes B tree approach Namely Balanced Tree approach
Levels of a clustered index 
Picture Taken from Msdn

This Balanced Tree Approach Makes the whole process of fetching the records  easy  as it takes relevant path as shown in the above figure as per its data to find the records   and this avoids row by row operation and improving performance

Before Going any further I will explain the two Main Type of Indexes  Namely
Clustered and NonClustered

So what is a Clustered Index.A Clustered Index is an Index  which will order all the records of a table in an orderly way.So this naturally will lead to a question ,How many Clustered Indexes a table can have?well the answer is simple, physically we can order a table in   in only  one way ,So the answer is simple    there will be only  one Clustered Index per table .Remember Clustered Index orders the entire table in a particular order and one more important thing to understand before going to Non Clustered Index is that the Leaflevel Nodes of a Clustered indexes are actual Data.

Now we will  dig deeper into NonClustered Index.
In non clustered index the leaf level nodes of the Btree is not  actual data as in clustered, but it contains a pointer, so you may think how this pointer will actually help to find me the actual record  ,well this pointer points to  either a RowID of the table or a Clustered index Key if there is a Clustered index on the table to fetch the actual data.As you might be wondering why you use a non clustered index when there is this extra process of pointer to fetch the records which will effect performance,  well we know we can only create one  clustered index per table but  we can create as many as non clustered indexes i will explain this further in detail later ,but for now remember even though  the performance of non clustered index is low to clustered index but it gives a flexibility to create more than one non clustered index and for some queries how the performance an be improved without the need of a clustered  index , and non-clustered index is far better than  than row by row operation for highly selective  columns of large tables

Before you read any further I strongly recommend to understand the above process,this might be basic but is lays  a solid foundation how the Optimizer will use the indexes and  fetches  the records for performance,  rest of the post is totally dependent on this.

I will explain in detail more about indexes in next  post .











Tuesday, 27 December 2011

SSRS ExecutionLog For Performance

SSRS Executing log table contains  important metrics for performance.It allows to see the reports execution results which contains information aboutthe user who executed reports,the amount of time it has taken  for data retrieval,time taken to process,time taken to render,no of bytes,number of rows  ,time the report started execution,time the report completed etc .These metrics can be further analysed ,suppose we can use the no of rows and bytes to check which report is giving large number of records ,also the execution time for data retrievel and further we can use other performance tools to look into the details of  query if there is any other issues such as blocking .


Thursday, 22 December 2011

SSIS Connection Managers

There are many connecton managers ,the most popular connection managers are ADO.NET,OLEDB
so 'when to choose what' is important.when you need speed, use oledb else use Ado.Net .It Provides greater flexibility  where you can execute different scripts with parameters  but marshaling can be overhead which gives ado.net that extra flexibility

Another important   connection manager which is less known is Multi Flat-file connection manager,it allows to load multiple files.It brings major performance benefits as everything is executed within a data flow

Upcoming SSIS Books


Two new SSIS  books are coming from well known authors in early 2012

Professional Microsoft SQL Server 2012 Integration Services

SSIS Design Patterns

SSIS Non Relational Source Optimisation



When dealing with non relational  sources  a lot factors can play important role increasing the performance of the flow.As we know ssis moves the data in buffers in the data flow,so  the buffer size play an important role in  the overall  speed of load.When we connect to non relational    sources such as text file ,the source component allocates  default data type string of  length 50 as shown below










which might consume a lot of buffer size.You can optimise by selecting a sample of rows and analysing the data types and length.This can be done on connection manager editor  advanced  properties  and suggest types  tab and accept the suggest column types as below



and after  accepting suggest column types it will optimise to minimum length. based on the column length as
shown below






This is an important  step as it impacts buffer size heavily


Wednesday, 21 December 2011

SSIS DataFlow Destination Optimisation

Use FastLoad  at Destination ,try to use the batch size depending upon the buffer size .If you use Fast load and Buffer size of 0 it will push all the rows in single transaction,it may create some back pressure on the flow .

Avoid using sqlserver destination if your destination is on different server.
Make sure to turn on trace flag 610 for minimal logging.

Power BI Report Server

Power BI Report Server  is a new product which can be used to host both Power BI reports and SSRS reports. Currently beta version is ava...