Compiling Regular Expressions

For Speed, Security, and Better Code Organization


Updated November 18, 2007

In the article Regular Expressions in VB.NET, I show the basics of coding regular expressions in VB.NET. Regular expressions are a 'language in a language' with a history that starts before Visual Basic or even just B.A.S.I.C. and they're used in a lot of programming languages. For this article, I use the "telephone number" RegEx from the article above. Read that article for a more detailed explanation of why it works.

Although they're very handy by themselves, regular expressions can be even more useful as compiled DLL modules for all the same reasons that you compile anything: speed, security, and a better way to organize code in libraries.

Because it's a 'language in a language', there's no standalone compiler for RegEx in VB.NET. Instead, the Regex class in the System.Text.RegularExpressions namespace has a static (Shared) method called CompileToAssembly. You will usually call it with two parameters which, naturally enough, are the 'source code' input (an array of RegexCompilationInfo objects) and the 'compiled assembly' output (an AssemblyName object). (There are also overloaded methods that let you include custom attributes that can be passed to the compiled DLL. The article Attributes in VB .NET explains what attributes are in VB.NET.)

To see just how much improvement is possible using compiled regular expressions, this article shows how to compile one. Then the Stopwatch class in the System.Diagnostics namespace is used to compare the speed of a regular "inline" execution of the RegEx against the same RegEx compiled.
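As a rough sketch of the timing approach described above, the console module below times repeated "inline" matches with Stopwatch. The module name, the pattern, the input string, and the iteration count are all assumptions chosen for illustration; they are not the values used in this article's own test.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module TimingSketch
    Sub Main()
        ' Hypothetical pattern, input, and loop count (illustration only).
        Dim pattern As String = "^\d{3}-\d{4}$"
        Dim input As String = "555-1212"
        Dim iterations As Integer = 100000

        ' Start timing, run the inline RegEx repeatedly, then stop.
        Dim watch As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            Regex.IsMatch(input, pattern)
        Next
        watch.Stop()

        Console.WriteLine("Inline RegEx: {0} ms", watch.ElapsedMilliseconds)
    End Sub
End Module
```

Timing the compiled RegEx the same way, with the same input and loop count, gives a figure that can be compared directly.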

First, we have to compile the RegEx. If you use very many compiled regular expressions, you will probably end up writing a utility for this step. Here's the way I did it. (Arguments in the event Subs are not shown in this article to save space.)

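Imports System.Reflection
Imports System.Text.RegularExpressions

Private Sub CompileRegEx_Click( ...
   ' Opening part of the telephone number RegEx (the optional
   ' leading 1 and the area code). The rest of the pattern
   ' is in the article referenced above.
   Dim myRegexString As String = _
      "^1?\s*-?\s*(\d{3}|\(\s*\d{3}\s*\))"
   Dim RegExNameSpace As String = "myRegExNS"
   Dim RegExType As String = "myRegExType"
   Dim RegExIsPublic As Boolean = True
   ' AssemblyName is in the System.Reflection namespace
   Dim RegExAssembly As New _
      AssemblyName("myRegExAssembly")
   Dim CompileRegExParms As New RegexCompilationInfo( _
      myRegexString, _
      RegexOptions.Compiled, _
      RegExType, _
      RegExNameSpace, _
      RegExIsPublic)
   ' CompileToAssembly expects an array, even for one RegEx
   Dim CompileRegArray() _
      As RegexCompilationInfo = {CompileRegExParms}
   Regex.CompileToAssembly(CompileRegArray, RegExAssembly)
End Sub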

Notice that the actual RegEx is now just a string (instead of being declared as a RegEx as it was in the article referenced earlier). That's because it's passed to the RegexCompilationInfo constructor, which saves it in the string property Pattern. The illustration below shows the property displayed in a MsgBox.

(Illustration: the RegEx string property displayed in a MsgBox.)
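To check what a RegexCompilationInfo object holds, its properties can be read back after construction. The sketch below does this in a console module rather than with MsgBox; the module name and the short pattern are hypothetical stand-ins, not the telephone number RegEx itself.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternPropertySketch
    Sub Main()
        ' Hypothetical short pattern used only for illustration.
        Dim info As New RegexCompilationInfo( _
            "^\d{3}-\d{4}$", _
            RegexOptions.Compiled, _
            "myRegExType", _
            "myRegExNS", _
            True)

        ' The RegEx text is stored as a plain string property.
        Console.WriteLine(info.Pattern)
        ' The type name and visibility are stored the same way.
        Console.WriteLine(info.Name)
        Console.WriteLine(info.IsPublic)
    End Sub
End Module
```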

In addition to the actual text of the RegEx, we need to declare ...

  • The namespace that the compiled RegEx will be in: myRegExNS
  • The name of the type of the compiled RegEx: myRegExType
  • Whether the compiled RegEx will be public or not: RegExIsPublic
  • The name of the assembly (the compiled DLL) for the RegEx: myRegExAssembly

Most of this information is simply passed to the New constructor for the RegexCompilationInfo object. The assembly name is used when the RegEx is compiled.

One more detail needs attention. The Regex.CompileToAssembly method actually expects an array of RegexCompilationInfo objects, not just one. The assumption is that you will compile several different regular expressions into the same DLL assembly. That's why this statement is necessary:

Dim CompileRegArray() _
   As RegexCompilationInfo = {CompileRegExParms}
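The same array can carry more than one entry when several regular expressions should end up in one DLL. This is only a sketch: the module name, the two patterns, and the type names ZipCodeRegEx and YearRegEx are hypothetical and do not appear elsewhere in this article.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ArraySketch
    ' Builds the array that would be passed to Regex.CompileToAssembly.
    Function BuildCompilationList() As RegexCompilationInfo()
        ' Hypothetical pattern and type name: a five or nine digit ZIP code.
        Dim zipInfo As New RegexCompilationInfo( _
            "^\d{5}(-\d{4})?$", _
            RegexOptions.None, _
            "ZipCodeRegEx", _
            "myRegExNS", _
            True)

        ' Hypothetical pattern and type name: a four digit year.
        Dim yearInfo As New RegexCompilationInfo( _
            "^\d{4}$", _
            RegexOptions.None, _
            "YearRegEx", _
            "myRegExNS", _
            True)

        ' Each entry becomes its own type in the compiled assembly.
        Return New RegexCompilationInfo() {zipInfo, yearInfo}
    End Function
End Module
```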

Once all this is done, compiling is just a method call:

Regex.CompileToAssembly(CompileRegArray, RegExAssembly)
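Once the DLL exists, the compiled RegEx behaves like any other class derived from Regex. The sketch below assumes a project that targets the .NET Framework (CompileToAssembly is not supported on .NET Core or later versions) and that already has a reference to the generated myRegExAssembly DLL. The module name and the sample input are hypothetical.

```vbnet
' Assumes a project reference to the DLL created by CompileToAssembly.
Imports System
Imports myRegExNS

Module UseCompiledSketch
    Sub Main()
        ' The compiled type has a parameterless constructor;
        ' the pattern is already built into it.
        Dim phoneRegEx As New myRegExType()

        ' Hypothetical input string (illustration only).
        Console.WriteLine(phoneRegEx.IsMatch("(555) 555-1212"))
    End Sub
End Module
```

Because the pattern lives inside the type, calling code never sees or repeats the RegEx text, which is the code organization benefit mentioned at the start of this article.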

On the next page, we use our compiled RegEx and check the speed.

