Saturday, 24 March 2012

SqlServer Indexes Cont..

In this post,I will be extending the discussion on indexes from my previous post,as we were discussing on non clustered indexes ,I will be covering a specific type of non clustered index called covering index.Before going in detail ,let me explain what happens when we execute a query which has got non clustered index with clustered index optional.

Take an example table Product with columns as Pid Pname PCategory PsubCategory ,assume we got a clustered index on Pid and non clustered index on Pname ,Lets execute a query like Select Pcategory,PSubCategory from Product.

We got a Non clustered index on Pcategory but not on PsubCategory so what sqlserver optimizer does

when a column in the select query is not included in an index is a book mark look up ,during this book mark look up it makes a look up to a clustered index in this case or a row id if we haven't got clustered index and fetches relative rows from the table,clearly as you can observe there is an extra step for this and this degrades performance,so in this case if we include a non clustered index on the other column PsubCategory also ,than we can avoid the book mark look as we got both the columns of the select query in the form of indexes and fetched directly without needing a lookup to a clustered index or row location .This naturally improves performance .

I recommend to have look at graphical execution plans in Management studio which will give you a quick overview of the Various metrics associated with the cost of the queries,but remember spend as much time as you can on indexes and the way optimizer executes in sqlserver

Thursday, 22 March 2012

Sqlserver Indexes

In this post I will be explaining indexes ,So the big question is What are Indexes?

Well Indexes are used mainly for performance so that the optimizer can choose the best method for fetching records for a query ,Before going in to the details of indexes, we will look at, how exactly Sql sever retrieves the information when we issue a query against its tables,Basically what Sqlserver does, when haven't got indexes on relevant table is, it will make a table scan to find the relevant information ,So what is a table scan?In a table scan what Sqlserver does is that ,it goes to each row of the table to find the relevant match.

How Does it Effects the Performance,Assume if you got Millions of records and If you scan all the rows than it consumes lot of time as you have to go to each and every row , this consumes lot of time that is why the concept of indexes was introduced.remember a lot factors contribute to the optimization of query but i want to limit my discussion only to indexes ,which play an important role in improving performance,I will discuss rest of the associated factors with indexes and query performance like statistics and fragmentation in future posts

What Indexes does is it stores the data in an orderly way ,So if you issue a query against a table which contain indexes ,the execution engine will see the existence of the index and it knows where exactly those values are going to reside and picks up the relevant records quickly by taking a particular logical path ,this operation is called a seek a operation,to do this seek operation it takes B tree approach Namely Balanced Tree approach

Picture Taken from Msdn

This Balanced Tree Approach Makes the whole process of fetching the records easy as it takes relevant path as shown in the above figure as per its data to find the records and this avoids row by row operation and improving performance

Before Going any further I will explain the two Main Type of Indexes Namely

Clustered and NonClustered

So what is a Clustered Index.A Clustered Index is an Index which will order all the records of a table in an orderly way.So this naturally will lead to a question ,How many Clustered Indexes a table can have?well the answer is simple, physically we can order a table in in only one way ,So the answer is simple there will be only one Clustered Index per table .Remember Clustered Index orders the entire table in a particular order and one more important thing to understand before going to Non Clustered Index is that the Leaflevel Nodes of a Clustered indexes are actual Data.

Now we will dig deeper into NonClustered Index.

In non clustered index the leaf level nodes of the Btree is not actual data as in clustered, but it contains a pointer, so you may think how this pointer will actually help to find me the actual record ,well this pointer points to either a RowID of the table or a Clustered index Key if there is a Clustered index on the table to fetch the actual data.As you might be wondering why you use a non clustered index when there is this extra process of pointer to fetch the records which will effect performance, well we know we can only create one clustered index per table but we can create as many as non clustered indexes i will explain this further in detail later ,but for now remember even though the performance of non clustered index is low to clustered index but it gives a flexibility to create more than one non clustered index and for some queries how the performance an be improved without the need of a clustered index , and non-clustered index is far better than than row by row operation for highly selective columns of large tables

Before you read any further I strongly recommend to understand the above process,this might be basic but is lays a solid foundation how the Optimizer will use the indexes and fetches the records for performance, rest of the post is totally dependent on this.

I will explain in detail more about indexes in next post .

Tuesday, 27 December 2011

SSRS ExecutionLog For Performance

SSRS Executing log table contains important metrics for performance.It allows to see the reports execution results which contains information aboutthe user who executed reports,the amount of time it has taken for data retrieval,time taken to process,time taken to render,no of bytes,number of rows ,time the report started execution,time the report completed etc .These metrics can be further analysed ,suppose we can use the no of rows and bytes to check which report is giving large number of records ,also the execution time for data retrievel and further we can use other performance tools to look into the details of query if there is any other issues such as blocking .

Thursday, 22 December 2011

SSIS Connection Managers

There are many connecton managers ,the most popular connection managers are ADO.NET,OLEDB
so 'when to choose what' is important.when you need speed, use oledb else use Ado.Net .It Provides greater flexibility where you can execute different scripts with parameters but marshaling can be overhead which gives ado.net that extra flexibility

Another important connection manager which is less known is Multi Flat-file connection manager,it allows to load multiple files.It brings major performance benefits as everything is executed within a data flow

Upcoming SSIS Books

Two new SSIS books are coming from well known authors in early 2012

Professional Microsoft SQL Server 2012 Integration Services

SSIS Design Patterns

SSIS Non Relational Source Optimisation

When dealing with non relational sources a lot factors can play important role increasing the performance of the flow.As we know ssis moves the data in buffers in the data flow,so the buffer size play an important role in the overall speed of load.When we connect to non relational sources such as text file ,the source component allocates default data type string of length 50 as shown below

which might consume a lot of buffer size.You can optimise by selecting a sample of rows and analysing the data types and length.This can be done on connection manager editor advanced properties and suggest types tab and accept the suggest column types as below

and after accepting suggest column types it will optimise to minimum length. based on the column length as
shown below

This is an important step as it impacts buffer size heavily

Wednesday, 21 December 2011

SSIS DataFlow Destination Optimisation

Use FastLoad at Destination ,try to use the batch size depending upon the buffer size .If you use Fast load and Buffer size of 0 it will push all the rows in single transaction,it may create some back pressure on the flow .

Avoid using sqlserver destination if your destination is on different server.

Make sure to turn on trace flag 610 for minimal logging.

SqlServer Business Intelligence