Thanks to all who attended the Learn RegEx in TSQL presentation at the Philly.Net Code Camp 2011.2. Especially those who took the time to catch up with me throughout the day.
RegEx is a big topic and is hard to learn in only an hour and 20 minutes. I do hope that you got enough information to get yourself started in learning how to use RegEx in TSQL in your own environment.
All the links discussed in the presentation are included throughout this post. The PowerPoint was not included since it seemed kind of pointless without me also attached to the presentation. Some of the content was extracted and included in this post.
The Class!
Code Camp Heroes Bill and Rob
Code:
Down Load The code for the presentation.
The file 01 - installRegexAssembly.sql is my version of the create assembley and create of the RegEx functions in the [util] schema.
Goal:
Get you started using RegEx in TSQL
- Load the toolbox with techniques
- Exposure to variations
- Under the hood knowledge
Warning
- There are many ways to write a Regular Expression!
- RegEx isn’t inherently better then other solutions, even in TSQL.
What Are Regular Expressions?
Tools for matching and manipulating text RegEx pattern = as{2}
Will Match
- ass
- assume
- passenger
- class
Will NOT match
Why use RegEx?
Over standard TSQL RegEx can:
- Be faster
- Be easier to code
- Require fewer steps
- Accomplish more complex tasks
What can RegEx be used for?
- Validate Data Format
- Scrub Data
- Same as standard TSQL text functions but allow for more complex criteria
- Replace()
- LIKE ‘%’ --Technically not a function but you get the idea
- PATINDEX()
- SUBSTRING()
How to use RegEx in TSQL?
Formula for RegEx in TSQL
SQL 2005 and up supports .Net CLR for User Defined Functions
.Net has built in base class library for RegEx: System.Text.RegularExpressions
Combine above two facts -> RegEx in TSQL
Now how do we actually use RegEx in TSQL?
Write the code for the .NET based UDFs ourselves
OR
Use someone else's code!
http://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/
The is an excellent article by Phil Factor over at Simple-talk. There are a few parts to this article including the basics of writing .NET based functions in TSQL. There is also a file that lets you jump start using RegEx in TSQL.
Keep in mind you will need to be running SQL 2005 or newer for this code to work. It will also work on express versions of SQL server too. The CLR will also need to be enabled for the database you want to install the RegEx functions.
There are 4 files at the top of the article you can download:
- InstallRegex.sql
- installRegexAssembly.sql
- installRegexSample.sql
- RegexSQLCLR.vb
The fourth file is a text file with the VB code that gets compiled in to the DLL. You do not need to do this unless you are curious and interested in the details.
To jump right in to using the TSQL UDFs in your data base just run the first to files. The first one, despite its name creates the Assembly on the SQL db. The second creates the functions in TSQL that are implemented by the code in the assembly.
NOTE: The code linked to above installs the functions in the [dbo] schema. The version used in the presentation was modified to create a new Schema [util] and install the assembly and functions under that schema.
Overview of the RegEx Functions
The functions installed by the SQL scripts will create 9 funtions that will be useable in your TSQL code. The break down as follows based on return type:
- Boolean
- Text Returning
- RegExReplace ()
- RegExReplaceX ()
- RegExMatch()
- Table Valued
- RegExSplit()
- RegExMatches ()
- Integer
- RegEx Utility
- RegExEscape()
- RegExOptionEnumeration()
Tips for Learning RegEx patterns
Have a good cheat sheet. Here are some I've used
http://www.regular-expressions.info/reference.html
http://regexlib.com/CheatSheet.aspx
http://regexlib.com/DisplayPatterns.aspx This link is to a libray of expressions for matching common patterns.