Wednesday, 25 July 2012
Friday, 13 April 2012
SSIS Indirect Configurations
Configurations in SSIS are mainly used to make the package independent of the environment dependent objects such as connection strings ,location of files etc.. When we deploy our package to different server with different configurations we can easily apply the environment dependent objects through configurations .
I will implement direct configurations initially than i will extend to indirect configurations.
I'm using the same package of my previous demo,a simple package of for each loop for a collection of excel files,which loads the files dynamically to a database table.So the connection string for excel source is implemented dynamically by using variables with expression as i explained in my previous post,and the destination is connection string pointing to a table in my local database SSIS Demo
I will create a configuration file for these two
In order to create configuration you can either go to SSIS menu or right click on your design surface and add configurations
I have selected configuration type as xml configuration file ,and given a location for it.remember configurations can be implemented using many ways with xml files or sqlserver or environment variables in this demo i will be using combination of xml files and environment variables.In the next step we will selecting the environment dependent variables
This example simply selects connection string for our destination database and variable value. By Capturing these we can make changes in our destination server configuration with relevant values which are local to that environment .In next step give a name and finish.
Save the package and manually copy the package to the other server.
Bring DTEXECUI by searching for programme file for DTEXECUI
Double click DTExecUI
Give the connection details of the package location and than add configuration file as shown above.The Configuration file should be edited as per your environment.
This is a direct configuration,as we need to provide the hard coded location of the configuration file which is not a flexible approach.In order to avoid this we will create an environment variable pointing to the location of configuration file
Environment variables can be set by right clicking the computer and advance system settings and environment variables .
and finally configure the environment variable. and give this environment variable name while executing the package on the destination server with the environment variable set.
Thursday, 12 April 2012
Dynamically Loading excel files using SSIS
In this post i will design a package for looping through excel files on a 64 bit computer as it requires few extra steps otherwise the package will fail.
The Package Looks like this
For Each Loop will be looping through all sales files and will be mapped to Salesfilename Variable
All the files are located under C:\
Open the Foreachloop container and set the Properties for Folder to enumerate and the File Qualifier as shown below
Variable Mapping
Next step is to configure connection string for the excel source
As we will provide the file name at run time we need to set few properties Add a connection string to the design surface and right click the connection string and go to properties ,expressions and set connection string as
The easiest way to set this expression is to connect to excel file initially and than copy the connection string and make necessary changes to include the variable name and reset the Expression with connection string with the expression
Open the DataFlow drag a Excel Data Source and Oledb Destination with a mapped table
Open the properties of the Excel Source and set the Validate external meta data to False other wise it will give validation error ,as the connection string gets its value at run time
Also go to the For each Loop Container properties and set delay validation to true and run the package on your 64 bit System
The Package Fails and throws an error
[Excel Source [1]] Error:
SSISErrorCodeDTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER.
The AcquireConnection method call to the connection manager "Excel Connection Manager 2"
failed with error code 0xC00F9304. There may be error messages posted before this with more information on why the AcquireConnection method call failed.
The solution is to go to Project Menu and Select ProjectProperties and go to Debugging and set Run64BitRuntime to False.
And re run the package the package will execute without any errors
And re run the package the package will execute without any errors
Saturday, 24 March 2012
SqlServer Indexes Cont..
In this post,I will be extending the discussion on indexes from my previous post,as we were discussing on non clustered indexes ,I will be covering a specific type of non clustered index called covering index.Before going in detail ,let me explain what happens when we execute a query which has got non clustered index with clustered index optional.
Take an example table Product with columns as Pid Pname PCategory PsubCategory ,assume we got a clustered index on Pid and non clustered index on Pname ,Lets execute a query like Select Pcategory,PSubCategory from Product.
We got a Non clustered index on Pcategory but not on PsubCategory so what sqlserver optimizer does
when a column in the select query is not included in an index is a book mark look up ,during this book mark look up it makes a look up to a clustered index in this case or a row id if we haven't got clustered index and fetches relative rows from the table,clearly as you can observe there is an extra step for this and this degrades performance,so in this case if we include a non clustered index on the other column PsubCategory also ,than we can avoid the book mark look as we got both the columns of the select query in the form of indexes and fetched directly without needing a lookup to a clustered index or row location .This naturally improves performance .
I recommend to have look at graphical execution plans in Management studio which will give you a quick overview of the Various metrics associated with the cost of the queries,but remember spend as much time as you can on indexes and the way optimizer executes in sqlserver
Thursday, 22 March 2012
Sqlserver Indexes
In this post I will be explaining indexes ,So the big question is What are Indexes?
Well Indexes are used mainly for performance so that the optimizer can choose the best method for fetching records for a query ,Before going in to the details of indexes, we will look at, how exactly Sql sever retrieves the information when we issue a query against its tables,Basically what Sqlserver does, when haven't got indexes on relevant table is, it will make a table scan to find the relevant information ,So what is a table scan?In a table scan what Sqlserver does is that ,it goes to each row of the table to find the relevant match.
How Does it Effects the Performance,Assume if you got Millions of records and If you scan all the rows than it consumes lot of time as you have to go to each and every row , this consumes lot of time that is why the concept of indexes was introduced.remember a lot factors contribute to the optimization of query but i want to limit my discussion only to indexes ,which play an important role in improving performance,I will discuss rest of the associated factors with indexes and query performance like statistics and fragmentation in future posts
What Indexes does is it stores the data in an orderly way ,So if you issue a query against a table which contain indexes ,the execution engine will see the existence of the index and it knows where exactly those values are going to reside and picks up the relevant records quickly by taking a particular logical path ,this operation is called a seek a operation,to do this seek operation it takes B tree approach Namely Balanced Tree approach
Picture Taken from Msdn
This Balanced Tree Approach Makes the whole process of fetching the records easy as it takes relevant path as shown in the above figure as per its data to find the records and this avoids row by row operation and improving performance
Before Going any further I will explain the two Main Type of Indexes Namely
Clustered and NonClustered
So what is a Clustered Index.A Clustered Index is an Index which will order all the records of a table in an orderly way.So this naturally will lead to a question ,How many Clustered Indexes a table can have?well the answer is simple, physically we can order a table in in only one way ,So the answer is simple there will be only one Clustered Index per table .Remember Clustered Index orders the entire table in a particular order and one more important thing to understand before going to Non Clustered Index is that the Leaflevel Nodes of a Clustered indexes are actual Data.
Now we will dig deeper into NonClustered Index.
In non clustered index the leaf level nodes of the Btree is not actual data as in clustered, but it contains a pointer, so you may think how this pointer will actually help to find me the actual record ,well this pointer points to either a RowID of the table or a Clustered index Key if there is a Clustered index on the table to fetch the actual data.As you might be wondering why you use a non clustered index when there is this extra process of pointer to fetch the records which will effect performance, well we know we can only create one clustered index per table but we can create as many as non clustered indexes i will explain this further in detail later ,but for now remember even though the performance of non clustered index is low to clustered index but it gives a flexibility to create more than one non clustered index and for some queries how the performance an be improved without the need of a clustered index , and non-clustered index is far better than than row by row operation for highly selective columns of large tables
Before you read any further I strongly recommend to understand the above process,this might be basic but is lays a solid foundation how the Optimizer will use the indexes and fetches the records for performance, rest of the post is totally dependent on this.
I will explain in detail more about indexes in next post .
Power BI Report Server
Power BI Report Server is a new product which can be used to host both Power BI reports and SSRS reports. Currently beta version is ava...
-
This article defines about Slowly Changing Dimension 3 ,the uses of slowly changing dimension Type 3 and its implementation using SSIS . ...
-
SSRS metada contains wealth of information which can be usefull under varieous circumstances , but extracting that informatio...
-
Data Depulication using SSIS 2012 The De Duplication componants used in this demo are Sort Lookup FuzzyLookup FuzzyGrouping ...