Greetings, and Happy Friday. I want to check myself with regard to the proper use of connections in a data flow script component being used as a transformation. I want to ensure that the solution is ideal from a performance standpoint, but more importantly that there won't be any resource leaks from connections that are never closed or objects that are never disposed properly.
This transformation script component writes to a table based on the rows that pass through Input0_ProcessInputRow(). So far, in the script component's code, I've declared a class-level variable of type System.Data.SqlClient.SqlConnection. In the PreExecute() event handler I initialize this connection with the following code (which comes after the call to MyBase.PreExecute()):
Dim connMgr As IDTSConnectionManager90
connMgr = Me.Connections.JobDB

' Ask the connection manager for its ADO.NET connection
adoConn = CType(connMgr.AcquireConnection(Nothing), SqlClient.SqlConnection)

If adoConn.State <> ConnectionState.Open Then
    adoConn.Open()
End If

connMgr = Nothing
The adoConn variable referred to there is the class-level variable. So, my first questions:
Is this the best way to initialize the connection? And is that code checking adoConn.State necessary? Can I pretty much assume that the connection manager is going to give me an open connection?
Then in the PostExecute() event handler I have this code (again, after the MyBase call):
If Not (adoConn Is Nothing) Then
    If adoConn.State = ConnectionState.Open Then
        adoConn.Close()
    End If
    adoConn.Dispose()
    adoConn = Nothing
End If
Next questions:
Is this adoConn variable actually holding a reference to an object whose primary owner is the connection manager? Is it necessary to close my SqlConnection this way, or will that actually close the connection manager's own connection?
I'm also wondering whether I need this same code in the script component's Finalize() handler as well; if there's an error, I don't want the resource to leak. Or will PostExecute() fire even if there's an error in Input0_ProcessInputRow()?
Thanks in advance,
Dan
Not necessarily in order:
The ADO.NET Connection Manager returns an open connection object. I would rely on it to do its job and assume a problem with the connection manager if by chance you get a connection object that's not open.
I would also rely on the connection manager to do its work afterwards. I suggest that you call ReleaseConnection rather than closing the connection directly.
While it's correct to use PreExecute and PostExecute for things that you only need to do once, you're not taking full advantage of the Integration Services component architecture by (apparently) collapsing a source, a transformation, and a destination into a single component. You may find that all three pieces will be more flexible, and more easily adapted for use elsewhere, if you separate them out. Otherwise, in a sense, you have little more than an ADO.NET application wrapped with the overhead of the SSIS runtime.
The ReleaseConnection method of the connection manager should be taking care of Dispose/Finalize housekeeping.
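For example, PostExecute could simply hand the connection back to the connection manager, something along these lines (a rough sketch only; it assumes connMgr is promoted to a class-level variable alongside adoConn and keeps the names from your post):

' Class-level variables (names follow the original post)
Private connMgr As IDTSConnectionManager90
Private adoConn As SqlClient.SqlConnection

Public Overrides Sub PreExecute()
    MyBase.PreExecute()
    connMgr = Me.Connections.JobDB
    adoConn = CType(connMgr.AcquireConnection(Nothing), SqlClient.SqlConnection)
End Sub

Public Overrides Sub PostExecute()
    MyBase.PostExecute()
    If Not (adoConn Is Nothing) Then
        ' Let the connection manager handle the Close/Dispose housekeeping
        connMgr.ReleaseConnection(adoConn)
        adoConn = Nothing
        connMgr = Nothing
    End If
End Sub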
-Doug
|||I would also rely on the connection manager to do its work afterwards. I suggest that you call ReleaseConnection rather than closing the connection directly.
Thank you. I will look into ReleaseConnection.
you're not taking full advantage of the Integration Services component architecture by (apparently) collapsing a source, a transformation, and a destination into a single component.
This is a keen observation, and one I'd like to discuss further if you're open to it. I struggled with finding the best design for this particular situation that fits within the SSIS paradigm and built-in tools. Believe me, if I thought there was a better way than using a script component, I would embrace it, because I've been struggling with getting the DataSet caching, managed by PreExecute and PostExecute, to work. If you were to peek into the other parts of my SSIS packages, you would find what I think are traditional uses of lookups, derivations, unions, and destinations.
My situation is that I have a row from a tab-delimited text file flowing through this dataflow that, among 185 other fields, has five fields in it that are not normalized. That is, these five fields each contain a comma-delimited string of value codes. Another way to say it is that these are multi-value fields.
So each multi-value field must be parsed, and then each sub-value must be looked up in a mapping table. I don't think there's a straightforward way to do that using the built-in transformations.
I can imagine what you might be thinking at this point as an obvious solution, but the kicker is that I will never be able to know in advance how many sub-values any of these five fields could have. There could be anywhere from zero to a dozen for a given row and field. That makes it tough to use the Derived Column transformation to just parse them and add new columns, because I can't know in advance how many columns would need to be added.
Furthermore, downstream from that I would not know how many Lookup transformations to create to resolve the values. Perhaps some kind of "Foreach" functionality at the field level within the dataflow would be helpful in a future SSIS version, but in the meantime it seems to me the script component is where it's at.
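For what it's worth, the parsing half of what my script component does boils down to something like this (a simplified sketch; ValueCodes1 is a stand-in name for one of the five multi-value columns, and the mapping-table lookup and the insert through adoConn are omitted):

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    ' ValueCodes1 stands in for one of the five comma-delimited multi-value columns
    If Not Row.ValueCodes1_IsNull Then
        For Each rawValue As String In Row.ValueCodes1.Split(","c)
            Dim subValue As String = rawValue.Trim()
            If subValue.Length > 0 Then
                ' Look up subValue in the mapping table and write the
                ' resolved value to the destination table via adoConn
            End If
        Next
    End If
End Sub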
Thoughts welcome, and thanks again, Doug, for your input on connection management.
Dan
|||
It sounds like what you're really designing is a source (more than a transformation) that needs to parse the multi-valued columns before presenting the data to the Data Flow.
The metadata-driven approach, which would be the ideal theoretical solution, is possible using the Integration Services object model, but it would require a heck of a lot more lines of code than you now have in your Script component. While lots of users have been asking about self-configuring packages driven by run-time metadata, I don't believe I've seen any examples of this yet. Changes in a source ripple all the way downstream through the data flow, so it would not be nearly as simple as just changing (programmatically) the number of output columns coming out of your source.
(As an aside: The IS team took great pains to prevent access from any one component to the package object model and to other components, beyond what was absolutely necessary, like variables and connections. However, in Service Pack 1 a means will be provided (IDtsPipelineEnvironmentService) to give custom data flow components programmatic access to the parent data flow task and the other objects in it. This would enable a custom solution along the lines of the Load SCD Wizard, which creates transformations etc. as needed...or like a custom source that creates or configures downstream Derived Column and Lookup transformations as needed.)
More clever minds may offer you more clever solutions. :-)
Best wishes,
-Doug