Transformation Pattern with SQL, XML and CROSS APPLY

data-integration-sql-xml-cross-apply

You don’t have to live with the repeated pain of transforming xml to rowset values when you need data from xml data stores. You also don’t need to transform and copy the data to relational database tables in MS SQL Server to be able to query easily in the future.

I like to leave the data in its original form. Store it in a column as xml. Done.

The T-SQL I will demonstrate is an excellent way to extract data from xml. However, that is not all. At the end of the post, I will show you an excellent way to deploy this T-SQL with views so that you can forget that the underlying data is stored and structured with xml.

The database objects and fictitious health information data shown below were created for this post to illustrate the techniques mentioned. Although they were created for illustration purposes only, similar data structures can be found in real health information systems.

Examine the following IncomingOrder table which includes an xml column.

XML-Stored-In-a-column

It was populated using this query:

insert IncomingOrder (IncomingOrderXml)
Values (‘
<IncomingOrder>
<Patient patientId=”10001″>
<ICD10Code>M00-031</ICD10Code>
<ICD10Code>M47.012</ICD10Code>
<ICD10Code>M75.102</ICD10Code>
</Patient>
<Patient patientId=”10002″>
<ICD10Code>M00-031</ICD10Code>
<ICD10Code>M47.012</ICD10Code>
</Patient>
<Patient patientId=”10003″>
<ICD10Code>M47.012</ICD10Code>
<ICD10Code>M75.102</ICD10Code>
</Patient>
</IncomingOrder>
‘)

insert IncomingOrder (IncomingOrderXml)
Values (‘
<IncomingOrder>
<Patient patientId=”10001″>
<ICD10Code>M75.102</ICD10Code>
</Patient>
<Patient patientId=”10002″>
<ICD10Code>M75.102</ICD10Code>
</Patient>
</IncomingOrder>
‘)

Notice that two rows are added. This will help illustrate how our script is able to pick up multiple ICD10Codes across rows.

Here is the CROSS APPLY query used to transform the xml data into a rowset.

select
dt.c.value(‘../@patientId[1]’, ‘INT’) as Patientid,
dt.c.value(‘.’, ‘VARCHAR(50)’) AS ICD10Code,
io.IncomingOrderDate
from IncomingOrder io
CROSS APPLY io.IncomingOrderXml.nodes(‘//Patient/ICD10Code’) as dt(c)

Essentially, dt is a derived table of ICD10Codes with a derived xml column (c) that can be queried with XQuery.

Cross-Apply-Xml-To-Rowset

Even if the results can be achieved with XQuery alone, employing CROSS APPLY drastically increases performance. This is a great segway for my discussion on the deployment of this script. Given acceptable performance, it is advisable to use this cross apply query inside a view, so that subsequent queries can be written without direct XQuery knowledge as well as without direct knowledge of the underlying xml structure.

XQuery-Cross-Apply-Inside-View

What is the benefit? No need to Extract, Transform, and Load into your precious RDBMS. Instead, you can treat the xml source as having relational data. Once the views are in place, you can share the view with the rest of the development team. Every level of SQL developer can then query the xml data, easily joining it with tables and other well known rowsets.

Here is a simple query that takes advantage of the new view and joins to other tables for rich interaction between data stored in traditional tables and data stored in the xml column. Notice that part of the data comes from xml but the structure of the underlying xml source has been completed hidden.

SQL-Query-Abstracts-XML-Store

What happens to the view output when the xml structure varies or outright changes?
The older rows with the known xml structure return values, while nothing is returned for the new rows with new xml structure. You may call this a limitation, but I call it “nothing found” or it doesn’t explode.

You then have the choice to write new views or to update the existing views to accomodate varying xml structures. I usually go with the former option. New xml structures inherently mean new types of data, so multiple views representing multiple structures makes sense – and, both views can coexist.

Leave a Comment

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>