• home
  • forum
  • my
  • kt
  • download
  • Your First Script Component in SQL Server 2005 Integration Services

    Author: 2007-09-03 10:32:09 From:

    In Chapter 2, you learned about the three types of Data Flow components in SSIS: sources, transformations and destinations. This chapter describes these different types and show you how to create your first transformation component using script. Once you have mastered the Script Component, you will find that you can readily integrate almost any functionality into the SSIS Data Flow. Custom scripted data sources can handle otherwise unsupported file formats; custom transformations can call functions in managed assemblies, including the .NET Framework; and custom scripted destinations enable SSIS to output data in very flexible ways. In fact, it is quite possible to write sophisticated data flows for ETL and data integration using only script components!

    Script Component Types
    You will typically use script somewhat differently in each of the three component types.

    Script Source Component
    Source components have no input columns, but do have output columns. The purpose of the Script Component in this case is to deliver data to the output columns. For example, the component author could write a script using file and string handling functions to parse a complex text file whose format is not suitable for parsing with the Flat File Connection Manager.

    Script Transformation Component
    Transformation components have input columns and output columns. In these components, the script typically will transform the data in some way between inputs and outputs. What these transformations are is up to you¡ªthat is why scripts are so flexible. Chapter 2 discussed two different patterns of transformation: synchronous and asynchronous . Script components can handle both patterns quite easily.

    A synchronous component, you will remember, is particularly useful for row-by-row transformations. For example, I may have incoming data that includes customer names. Perhaps for easy cross-referencing with other customer records, I would like to calculate a SOUNDEX value for each customer name. SOUNDEX is an algorithm invented by the US Census for codifying names to take account of different spellings. T-SQL has a SOUNDEX function, but SSIS does not.

    An SSIS developer can code a SOUNDEX function in Visual Basic .NET script. With this, the script can transform every incoming customer name and emit a SOUNDEX value at the output.

    An asynchronous component is useful for performing operations which change the shape of the data significantly, or where incoming rows do not have related rows at the output. A good use of an asynchronous script would be to aggregate text. The SSIS Aggregate component is very powerful, but it can only perform Min and Max calculations against numeric columns. If you need to be able to calculate the Min and Max values of a string column, this can easily be achieved in Visual Basic .NET script, as we shall see in Chapter 8.

    Script Destination Component
    As mentioned previously, it is possible to have a text file that the Flat File Connection Manager cannot parse, but which a script source component can handle. In an enterprise where text files like this are important to legacy applications, you may also need the ability to write data to a file in this format for the legacy application to read. The script destination component is useful in these circumstances. You will have guessed by now that a script destination component has input columns but no output columns. Instead, the Visual Basic .NET script handles the data, perhaps using file and string routines from the .NET Framework to output text files in the appropriate format.

    Adding a Script Component to Your Package
    The first Script Component we are going to look at it is a transformation component. In fact, we are going to build a simple SOUNDEX component to transform a column containing a name to a codified value representing the sound of the name, just as the US Census would do. As this first component is a transformation, it requires some data to work with. We can quickly build a package containing a Data Flow and a source component to get started.

    Preparing the Package
    Use the following steps to prepare the package:

    1. Create a new SSIS package.
    2. In the designer, drag a Data Flow Task from the Control Flow Items tab of the Toolbox to the Control Flow design surface.
    3. Double-click the Data Flow Task to open the Data Flow design surface.
    4. Drag an OLE DB Source component from the Data Flow Sources tab of the Toolbox .
    5. Double-click the OLE DB Source component shape on the design surface to open the OLE DB Source Editor .
    6. Click the New button to create a new OLE DB Connection Manager .
    7. Select an existing connection to your AdventureWorksDW database, or create a new one now. This assumes you have installed the sample databases with SQL Server.
    8. Click the OK button to return to the OLE DB Source Editor .
    9. Select the Table or View Data Access Mode .
    10. Select the [dbo].[DimCustomer] table. ( DimCustomer stands for Customer Dimension and is not a reflection on the intelligence of AdventureWorks customers! DimEmployee, however, may be a different matter.)
    11. Click the Preview button if you would like to see the data in this table. Note the Last Name field that we shall be using later.
    12. Click the OK button to close the OLE DB Source Editor .

    Now we have some source data to work with. At this stage, your package design should look like the example in Figure 7.1.


    Figure 7.1: Package to Which We Will Add Our First Script Component.

    Adding the Script Component

    1. In the designer, drag a Script Component from the Data Flow Transformations tab of the Toolbox to the Data Flow design surface.
    2. When you drag the Script Component, the Select Script Component Type Dialog will appear, prompting you to select the kind of component you want to create (see Figure 7.2). This is important, as the configuration of the Script Component is somewhat different for each type. This dialog box sets up the component for you automatically, saving some work in creating this configuration yourself.


      Figure 7.2: Select Component Type Dialog Box.
    3. Select Transformation (the default) and click the OK button.
    4. Now connect the output of the OLE DB Source to the Script Component . At this stage your package should look like Figure 7.3. Note the warning icon that appears in the Script Component . You can mouse over that to see the text of the warning. In this case, we have not yet added any script code, so the component at this point is in an invalid state.


      Figure 7.3: Package with OLE DB Source and Script Component Connected.

    Adding Columns to Your Script Component
    The next step is to tell your Script Component which input columns to work with. The Script Component requires you explicitly to say in advance which columns are required. This is because the component creates a wrapper that exposes these columns to the scripting environment. It would be expensive and redundant to expose all the input columns by default when you may only be using one of them.

    Selecting the input columns is easy. Just follow these steps:

    1. Double-click the Script Component shape on the design surface to open the Script Transformation Editor .
    2. The default view is of the Input Columns tab. In this case, check the box next to the Last Name column. The component editor should now look like Figure 7.4.


      Figure 7.4: Component Editor with Last Name Column Selected.

    Now we can use the Last Name column in our component. Note that we have left the Usage Type of the column as Read Only . This is because we do not want to alter the last name. Instead, we want to calculate a new column from the values in this column. But where will these new values go? We need to create a new column to hold these new values. This new column will not appear at the input. It is being calculated by the script, so it can only appear at the output.

    We should add an Output Column, as follows:

    1. Select the Inputs and Outputs tab of the Script Transformation Editor .
    2. Expand the Output node of the Inputs and Outputs tree view.
    3. Select the Output Columns folder under the Output node.
    4. Click the Add Column button. This will add a new Output Column , named Column by default. The data type will be a four-byte signed integer. You can see this in Figure 7.5.


      Figure 7.5: New Output Column.

      This default column is not quite what we need. We can, however, edit its properties.
    5. Edit the Name property of the column to be Soundex. The column name should change in the tree view, too.
    6. SOUNDEX codes are strings, so select the Data Type property of the column and use the drop-down list to select the type String [DT_STR] .

    At this stage, your component should look Figure 7.6.


    Figure 7.6: Renamed Output Column.

    We have selected an Input Column ( LastName ) and created an Output Column ( Soundex ) to hold the transformed value. Now we are ready to write some code!

    Elements of the Script
    Navigate to the Script tab of the component. The component editor will now look like Figure 7.7.


    Figure 7.7: Script Tab of the Script Component.

    There are some useful properties available in the property grid, but they are, in fact, optional. However, it is a good idea to set the PreCompile property to True .

    To get started, just click the Design Script button. The VSA editor will open, looking like Figure 7.8.


    Figure 7.8: Script Editing Environment for a Component.

    In Chapter 3, you learned about the VSA environment and its various elements. These should be familiar to you now, but some aspects of the script for a Script Component will certainly be new, compared to the Script Task we looked at earlier. First, look at the Imports statements for the Script Component in Listing 7.1.

    Imports System
    Imports System.Data
    Imports System.Math
    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    Listing 7.1: Imports Statements for a Script Component.

    You can see that rather than importing the Dts.Runtime namespace, there are two ¡° wrappers¡± to be imported. These wrappers enable a script within a Data Flow Task to ¡° see out¡± from the task to objects such as variables. In Chapter 8, you will learn that this introduces a slightly different way of accessing variables.

    In the Script Task, the Main() subroutine was where most of the work was done, but here in the Script Component, we see something a little more involved, as in Listing 7.2.

    Public Class ScriptMain
       Inherits UserComponent

       Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)
          '
          ' Add your code here
          '
       End Sub
    End Class

    Listing 7.2: ScriptMain Class in the Script Component.

    The ScriptMain class performs those duties we mentioned earlier ¨C wrapping and exposing the Data Flow columns and objects to the script. Within ScriptMain you can see the following subroutine: Input_ProcessInputRow(ByVal Row As InputBuffer) .What does this mean? We will cover this in more detail later, but for now, all we need to know is that when the input to the Script Component is processed, SSIS in effect calls this routine for every row. In practice, this means that we can immediately start writing useful script code.

    Editing the Script Component
    The example we are going to implement is a function to convert the customer¡¯s last name to a SOUNDEX code. Listing 7.3 shows the function that we will use for SOUNDEX, written in Visual Basic .NET code.

    Function CalcSoundex(ByVal sName As String) As String
       Dim i, acode, dcode, prevCode As Integer
       Const codes As String = "01230120022455012623010202"
       SName = UCase(SName)
       CalcSoundex = Left(SName, 1)
       prevCode = Asc(Mid(codes, Asc(sName) - 64))

       For i = 2 To Len(SName)
          acode = Asc(Mid(SName, i, 1)) - 64
          ' we are not interested in symbols or numbers
          If acode >= 1 And acode <= 26 Then
             ' convert the character to a digit based on soundex code
             dcode = Asc(Mid(codes, acode, 1))
             ' ignore repeats
             If dcode <> 48 And dcode <> prevCode Then
                CalcSoundex = CalcSoundex & Chr(dcode)
                If Len(CalcSoundex) = 4 Then Exit For
             End If
             prevCode = dcode
          End If
       Next
    End Function

    Listing 7.3: A Simple SOUNDEX Function in VB.NET Script.

    We do not really need to understand much about the internals of this function for this example. It returns a code based on the consonants in a string according to the SOUNDEX algorithm described earlier. For now, write this function into your script code, immediately after the Input_ProcessInputRow function, but still within ScriptMain() .

    The next step is to reference the function in our script. For each row coming in to the component (in the input buffer) we will set the value of the Soundex column to the value returned by CalcSoundex based on the LastName column. Just below the line Add your code here , type: Row . That is, Row followed by a period. As shown in Figure 7.9, IntelliSense (within the editor) offers a list of options to complete this object, including the available columns (in this case LastName and Soundex ).


    Figure 7.9: Using IntelliSense in the Scripting Environment.

    In this case, select Soundex and complete the rest of the statement as shown in Listing 7.4, using IntelliSense if you like to complete the other references in this statement:

    Row.Soundex = CalcSoundex(Row.LastName)

    That is all! Your script is ready to run. Close the scripting environment. SSIS automatically saves the script for you. Now click the OK button in the Script Transformation Editor and return to the Data Flow design surface. The validation icon has now gone¡ªyour script is valid and you are ready to complete the Data Flow for testing.

    Debugging the Script Component
    In Chapter 5, you learned how to debug the Script Task by setting breakpoints within the script itself. The VSA design environment does enable you to set breakpoints in a Script Component, but the SSIS designer and debugger ignores them when executing a package.

    There are a number of reasons for this, but it is mainly due to the nature of the Script Component , which (unlike the Script Task) does not call a script only once, but many times within an execution.

    However, SSIS does provide some very elegant techniques for debugging Data Flows in general, and we can use these techniques to understand if our script is performing as expected. Later we will look at more fine-grained, row-by-row debugging. For now, we will see how to debug the script working against the Data Flow as many rows pass through it.

    Debugging a Data Flow with a RowCount Component
    You might think that you could execute the SSIS package right now, with just an OLE DB Source and a Script Component. In fact, this is possible, but not much will happen. If you were to try it, you would see no activity. You could find the explanation on the Progress tab of the designer, where you would see the following message:

    Warning: Source "OLE DB Source Output" () will not be
    read because none of its data ever becomes visible
    outside the Data Flow Task.

    SSIS has recognized that your data is not going anywhere, so it has optimized the Data Flow by removing the redundant components! Of course, the data from your Script Component (including your Soundex column) is not going anywhere, so you have no opportunity to see this data.

    In most data-integration applications, you would send the data to a temporary destination in order to examine it and see if the process had transformed the data correctly. SSIS provides an excellent way to debug your data without having to create temporary tables or files.

    The Row Count component simply counts the passing rows wherever you add it to the Data Flow. This component can be very useful for debugging. It gives you a count of the rows at a particular point and SSIS will not optimize it out of the Data Flow. This enables you to execute the Data Flow with a Row Count component at the end of the flow, even though you have added no destination. In other words, you can execute your SSIS package and debug it, even though the data is not going anywhere¡ªit is just counting rows!

    In Chapter 5, you learned how to create SQL Server Integration Services variables. Create a variable now, named RowCount of Int32 type, as follows:

    1. Select SSIS > Variables to show the Variables window.
    2. Click the New Variable button to create a new variable. This variable will be of type Int32 by default.
    3. Rename it to RowCount (see Figure 7.10). You can now close the Variables window if you like.


      Figure 7.10: Data Flow Designer and Variables Window with RowCount Variable Added.
    4. Drag a Row Count Component from the Data Flow Transformations tab of the Toolbox to the Data Flow design surface.
    5. Now connect the output of the Script Component to the Row Count Component .
    6. Edit the Row Count component by double-clicking the component shape in the designer.
    7. In the Advanced Editor for Row Count , set the VariableName property to the name of the variable that you created: RowCount .

    If you were to execute the package now, the number of rows passing through the Row Count component would be written to the named variable.

    Now that you can execute the package and the Data Flow within it, you can start to debug the output of the script. You shall see whether the script has calculated the SOUNDEX value correctly. The best way to see the output from the Script Component is to add a data viewer to the Data Flow on the path between the output of the Script Component and the input of the Row Count Component.

    1. Right-click the line between path between the output of the Script Component and the input of the Row Count Component .
    2. Select Data Viewers .
    3. The Data Flow Path Editor will appear, with the Data Viewers tab visible by default (see Figure 7.11).


      Figure 7.11: Data Flow Path Editor
    4. Click the Add ¡­ button in the Data Flow Path Editor to show the Configure Data Viewer dialog box (see Figure 7.12).


      Figure 7.12: Configure Data Viewer Dialog Box.
    5. Select the Grid type of Data Viewer, then click the Grid tab.
    6. The Grid tab shows all the columns currently available selected on the right hand side. In fact, we only need to see the LastName and Soundex columns in order to debug our script. Use the arrow buttons to move all the other rows to the Unused Columns list on the left-hand side. In fact, it may be easier to move all the columns over to the left-hand side and reselect LastName and Soundex (see Figure 7.13).


      Figure 7.13: Grid Data Viewer Configured to Show Only Required Columns.
    7. Click the OK button to close the Configure Data Viewer dialog, and click OK again to close the Data Flow Path Editor .

    Now the Data Flow is ready for debugging. The Data Flow designer shows the Data Viewer on the path between the Script Component and the Row Count Component with a ¡° spectacles¡± icon (see Figure 7.14).


    Figure 7.14: Completed Data Flow.

    Running a Data Flow with a Script Component
    To execute the package, and thus the Data Flow, you can use the F5 key, or select Debug? Start Debugging from the main menu, or right-click the package itself in the Solution Explorer and select Execute Package . When the package executes, you will notice that the Data Viewer window appears. You can drag and dock this Data Viewer to any size or convenient location in your designer or on your screen.

    On execution, the Data Flow task reads a buffer of data from the OLE DB Source component. The Script Component acts on this buffer, transforming each row using our function. The Row Count component should next count these rows, but before it can do so, the Data Viewer will pause execution of the Data Flow and the package.

    At this point, all the components in the flow will be colored yellow, to indicate that they are in progress and have not yet completed. When the Data Viewer has received a buffer of data, it displays the columns you selected in the viewer. As shown in Figure 7.15, the grid now shows the results of our SOUNDEX calculation for the first buffer of data¡ª4872 rows.


    Figure 7.15: Data Flow Executed.

    You can examine all the values to ensure that the function is working correctly. In this example they certainly appear to be correct. If necessary, you can even copy the data to the clipboard and paste it into other applications, such as Microsoft Excel, for further analysis.

    To see the next buffer of data, click the green Continue button in the top left-hand corner of the Data Viewer. The Data Viewer will show the next buffer of data and pause again. Alternatively, if you have finished examining the data, you could click the Detach button and the flow will continue without pausing. Or, you could just close the Data Viewer.

    When the Data Flow is complete, all the components in the Data Flow should be colored green to show that they have all completed successfully. You can now stop debugging, using F5 if you like, just like any other SQL SSIS package.

    Summary
    As you can see, script components are versatile. They are easy to add to your Data Flow, quite simple to program, and give excellent performance. Add to that the power of visual debugging and you can see why they are such an exciting feature of SSIS. Mastering the Script Component can be your key to a vast range of data integration functionality.

    discuss this topic to forum

    relation tutorial

    No relevant information

    Category

      Miscellaneous (8)

    New

    Hot