Regular Expressions in VB.NET

Programming Regular Expressions

RegEx is built into VB.NET through the .NET Framework. To use the Regex classes by their short names, add an Imports statement at the top of the code file:

Imports System.Text.RegularExpressions

Here's a quick example showing how to use RegEx. MatchCollection, Match and Regex are classes in the RegularExpressions namespace. You can find complete documentation on them at MSDN.

' Search for all the words in a string
Dim myRegex As New Regex("\w+")
Dim t As String = _
   "Introduction to Regular Expressions " & _
   "in Visual Basic"
Dim myMatches As MatchCollection = myRegex.Matches(t)
' Display each word that was matched
For Each successfulMatch As Match In myMatches
   Debug.WriteLine(successfulMatch.Value)
Next

Add this code to a Button event subroutine and it will display each of the words in the string in the Immediate window.

Introduction
to
Regular
Expressions
in
Visual
Basic

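The "regular expression" is the pattern string "\w+" passed to the Regex constructor for myRegex. It matches one or more word characters (letters, digits, or underscores) in a row, so each word in the string becomes a separate match.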

We can also try out the "telephone number" example shown earlier. This code checks a (USA style) phone number and displays True in the Immediate window.

' Check a (USA style) telephone number
Dim myRegex As New Regex( _
   "^1?\s*-?\s*(\d{3}|\(\s*\d{3}\s*\))\s*-?\s*\d{3}\s*-?\s*\d{4}$")
Dim t As String = _
   "1-800-555-1212"
Debug.WriteLine(myRegex.IsMatch(t))

"800-555-1212" also works, but an area code is required, so just "555-1212" returns False.
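
To see how the pattern treats several inputs at once, here is a minimal sketch written as a console program rather than a Button event (an assumption made so the example is self-contained; the module name PhoneCheck and the sample list are hypothetical). It uses the same pattern as the code above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PhoneCheck
    Sub Main()
        ' Same pattern as the telephone number example in the article
        Dim phoneRegex As New Regex( _
            "^1?\s*-?\s*(\d{3}|\(\s*\d{3}\s*\))\s*-?\s*\d{3}\s*-?\s*\d{4}$")

        ' Sample inputs chosen for illustration only
        Dim samples() As String = { _
            "1-800-555-1212", _
            "800-555-1212", _
            "(800) 555-1212", _
            "8005551212", _
            "555-1212"}

        ' Print each input next to the result of the test
        For Each s As String In samples
            Console.WriteLine("{0,-16} {1}", s, phoneRegex.IsMatch(s))
        Next
    End Sub
End Module
```

The first four inputs should report True and the last one False, because the pattern has no way to skip the three digit area code.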

To start writing your own regular expressions, let's try something a little easier. Validating a USA social security number is both useful and a good starting exercise. Once you understand how to do it, you'll be part of the way through the learning curve for the telephone number expression above.

An SSN is a number that must be in the pattern:

999-99-9999

or

999999999

Matching just 9 digits (the second case)

If we were not going to test for the dash characters, it would be easy. The expression would simply be:

\d{9}

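The \d metacharacter matches any single digit. The {9} quantifier requires exactly 9 of them in a row. (You often see [0-9] used for the same purpose. This is a character class covering the range from 0 to 9. In .NET the two are not quite identical: by default, \d also matches decimal digits from other scripts, such as Arabic-Indic digits, while [0-9] matches only the ten ASCII digits.)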

This works well, with one exception: if the string contains 10 digits, the expression still matches, because it finds 9 digits inside the longer string. We need more qualification. Two more metacharacters will work here: "^" (caret) and "$" (dollar sign). Used in this context, the caret anchors the match to the start of the string. (Used inside square brackets, as in [^0-9], the caret means something completely different: it negates the character class. This is one of the things that can make RegEx so complicated.) The dollar sign anchors the match to the end of the string. So, if we want only 9 digits, anchor the match at both ends and nothing else can appear before or after them. This makes our evolving RegEx ...

^\d{9}$
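
The effect of the two anchors can be checked with a short sketch. It is written as a console program (an assumption for the sake of a self-contained example; the module name AnchorDemo is hypothetical) and uses the shared Regex.IsMatch method instead of creating a Regex object.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnchorDemo
    Sub Main()
        Dim tenDigits As String = "1234567890"   ' one digit too many

        ' Unanchored: 9 digits are found inside the longer string
        Console.WriteLine(Regex.IsMatch(tenDigits, "\d{9}"))      ' True

        ' Anchored: the 9 digits must begin at the start and run to the end
        Console.WriteLine(Regex.IsMatch(tenDigits, "^\d{9}$"))    ' False
        Console.WriteLine(Regex.IsMatch("123456789", "^\d{9}$"))  ' True
    End Sub
End Module
```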

Matching With And Without Dashes

But what about the dashes? Simply put each dash in where it belongs as a character by itself. Then split the count of 9 digits into three groups of 3, 2, and 4. This will match a "traditional" social security number:

^\d{3}-\d{2}-\d{4}$

But now it no longer matches a number without the dashes! To solve this problem we use the "?" (question mark) metacharacter. When a question mark follows a character or group, it means that the item will match 0 (zero) or 1 (one) time; in effect, it becomes optional. Note that each dash is optional independently, so a mixed form such as 123-456789 also matches. This is the final touch for our SSN RegEx:

^\d{3}-?\d{2}-?\d{4}$
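
As a minimal sketch of how the finished expression might be used, the console program below wraps it in a small function. The names SsnCheck and IsSsnFormat are hypothetical and not part of any library, and the function checks only the format of the string, not whether the number was ever issued.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SsnCheck
    ' IsSsnFormat is a hypothetical helper name used for illustration
    Function IsSsnFormat(ByVal candidate As String) As Boolean
        Return Regex.IsMatch(candidate, "^\d{3}-?\d{2}-?\d{4}$")
    End Function

    Sub Main()
        Console.WriteLine(IsSsnFormat("123-45-6789"))   ' True
        Console.WriteLine(IsSsnFormat("123456789"))     ' True
        Console.WriteLine(IsSsnFormat("123-456789"))    ' True: each dash is optional on its own
        Console.WriteLine(IsSsnFormat("12-345-6789"))   ' False: first group needs 3 digits
        Console.WriteLine(IsSsnFormat("1234567890"))    ' False: 10 digits
    End Sub
End Module
```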

Regular expressions are one of those topics, like programming itself, that you can learn more and more about and still not know everything. To make things more complex, there's no single "regular expression" standard that every tool follows, and the details differ slightly from one implementation to another. (Java will evaluate the same RegEx differently from VB.NET in some cases.) There's more of an "understanding" in the programming community about what a RegEx means than a clear definition.

©2014 About.com. All rights reserved.