Dan's Visual Basic Blog

By Dan Mabbutt, About.com Guide to Visual Basic since 2002

A Question of Regular Expressions

Thursday April 26, 2007

Can you think of a better solution?

I don't pretend to be an expert on regular expressions, although there are a couple of articles here that introduce them. This mini-tutorial, for example, shows how to use them in ASP.NET: Using Regular Expressions in ASP.NET

In brief, a regular expression is a string of text that describes a pattern; other strings either "match" it or don't. Most people have at least seen a DOS command (or even a file search) that finds executable programs using a wildcard argument: "*.exe". That means find all files with any name (that's what the "*" means) that end in the characters ".exe". A regular expression is the same thing with an advanced degree and a job at the bank.
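To make the wildcard comparison concrete, here is a minimal sketch of a regular expression that behaves roughly like "*.exe". The module name, the IsExeName helper and the file names are made up for illustration, and the IgnoreCase option is my assumption about how you would want file names treated.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WildcardDemo
    ' Rough regex equivalent of the wildcard "*.exe":
    ' ^ = start of the name, .* = any characters,
    ' \.exe = a literal ".exe", $ = end of the name.
    Public Const ExePattern As String = "^.*\.exe$"

    ' Hypothetical helper: True when the name ends in ".exe".
    Public Function IsExeName(fileName As String) As Boolean
        Return Regex.IsMatch(fileName, ExePattern, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(IsExeName("notepad.exe"))      ' True
        Console.WriteLine(IsExeName("NOTEPAD.EXE"))      ' True
        Console.WriteLine(IsExeName("notepad.exe.bak"))  ' False
        Console.WriteLine(IsExeName("readme.txt"))       ' False
    End Sub
End Module
```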

"Sheela" wrote to ask a question about this regular expression:

' Requires Imports System.Text.RegularExpressions at the top of the file
Dim mRegExp As Regex
mRegExp = _
    New Regex("[1-9][0-9]*(\.[0-9]*)?(\+[1-9][0-9]*(\.[0-9]*)?)*")

"Valid expressions are a number, or a decimal, or a number or a decimal strung together with other numbers or decimals with a "+" sign between them." For example, "2+2" would be a valid match. The problem was that expressions like "2+2+" were being matched as well. The way it works is:

[1-9] -- the match must start with a digit between 1 and 9

[0-9]* -- next come zero or more digits between 0 and 9 (the "*" means "repeat zero or more times", so these digits are optional)

(\.[0-9]*)? -- the next group of characters is optional (that's what the "?" means), but if it exists, it must start with a "." followed by zero or more digits between 0 and 9

( ... )* -- the group of characters described below, repeated zero or more times

   \+ -- a plus sign
   followed by
   [1-9][0-9]*(\.[0-9]*)? -- the same characters described earlier
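One way to see the structure described above is to build the pattern from its pieces. This is only a sketch: the names PatternPieces, NumberPart and SumPattern are mine, not Sheela's, but the assembled string is character-for-character the same as her pattern.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternPieces
    ' One number: a leading 1-9, then zero or more digits,
    ' then an optional "." followed by zero or more digits.
    Public Const NumberPart As String = "[1-9][0-9]*(\.[0-9]*)?"

    ' A number followed by zero or more "+number" groups.
    Public Const SumPattern As String = NumberPart & "(\+" & NumberPart & ")*"

    Sub Main()
        ' Prints the assembled pattern so it can be compared with the original.
        Console.WriteLine(SumPattern)
        Console.WriteLine(Regex.IsMatch("2+2", SumPattern))  ' True
    End Sub
End Module
```

Naming the repeated "number" piece once makes it harder for the two copies to drift apart when the pattern is edited later.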

The problem is that a Regex match does not have to cover the whole input. The Regex searches for the pattern anywhere in the string and reports a "match" as soon as it finds one, no matter what else might be there. So "2+2", "2+2+" and "2GeorgeWashington" all match. This was a problem because the purpose of the regular expression was to validate input typed into a TextBox, and a partial match lets invalid input through.
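A quick console sketch shows the behavior. The module name, the Matches helper and the extra "George" input are assumptions added for illustration; the pattern is the one from Sheela's question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UnanchoredDemo
    Public Const SumPattern As String = _
        "[1-9][0-9]*(\.[0-9]*)?(\+[1-9][0-9]*(\.[0-9]*)?)*"

    ' Hypothetical helper: True when the pattern is found anywhere in the text.
    Public Function Matches(entered As String) As Boolean
        Return Regex.IsMatch(entered, SumPattern)
    End Function

    Sub Main()
        Console.WriteLine(Matches("2+2"))                ' True
        Console.WriteLine(Matches("2+2+"))               ' True: "2+2" is found inside it
        Console.WriteLine(Matches("2GeorgeWashington"))  ' True: the leading "2" is enough
        Console.WriteLine(Matches("George"))             ' False: no digit 1-9 anywhere
    End Sub
End Module
```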

The solution I recommended was to extract the match from the TextBox input and use that instead.

Dim matchedString As String
' Match returns the first substring that fits the pattern; Value is "" if nothing fits
matchedString = Regex.Match(txtEntered.Text, _
    "([1-9]\d*(\.\d*)?(\+[1-9]\d*(\.\d*)?)*)").Value
Debug.WriteLine(matchedString)
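Here is a small sketch of what the extraction returns for the inputs discussed above, with plain strings standing in for txtEntered.Text. The ExtractSum and IsWholeInput helpers are hypothetical names; comparing the extracted text with the original input is one possible way to tell whether the user typed anything extra, not the only one.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractDemo
    Public Const SumPattern As String = "[1-9]\d*(\.\d*)?(\+[1-9]\d*(\.\d*)?)*"

    ' Returns the first substring that fits the pattern, or "" if none does.
    Public Function ExtractSum(entered As String) As String
        Return Regex.Match(entered, SumPattern).Value
    End Function

    ' True only when the extracted text is the entire input.
    Public Function IsWholeInput(entered As String) As Boolean
        Dim m As Match = Regex.Match(entered, SumPattern)
        Return m.Success AndAlso m.Value = entered
    End Function

    Sub Main()
        Console.WriteLine(ExtractSum("2+2"))                ' 2+2
        Console.WriteLine(ExtractSum("2+2+"))               ' 2+2
        Console.WriteLine(ExtractSum("2GeorgeWashington"))  ' 2
        Console.WriteLine(IsWholeInput("2+2"))              ' True
        Console.WriteLine(IsWholeInput("2+2+"))             ' False
    End Sub
End Module
```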

But ... like I said ... I don't pretend to be a Regex guru. Does anyone else have a better idea?

(The complete cartoon can be seen at: XKCD: A webcomic of romance, sarcasm, math, and language.)

Comments

April 30, 2007 at 12:53 am
(1) Glenn says:

End the regex with $, which will disallow anything such as + after the second number.
e.g. ("[1-9][0-9]*(\.[0-9]*)?(\+[1-9][0-9]*(\.[0-9]*)?)*$")

April 30, 2007 at 12:33 pm
(2) visualbasic says:

Thanks for the suggestion, Glenn.

Even though I use that example in my main article about regular expressions …

Regular Expressions in VB.NET

… I didn’t think of applying it to this problem.

Part of the reason is probably that the application of regular expressions to problems takes more than the normal amount of imagination and interpretation. For example, the definition that Microsoft gives for the “$” metacharacter is:

“Specifies that the match must occur at the end of the string, before \n at the end of the string, or at the end of the line.”

The interpretation that it “disallows anything … after the second number” doesn’t exactly jump out and grab you by the throat.
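For anyone who wants to see what the anchor does in practice, here is a minimal sketch. The constant names are made up for illustration. EndOnly is the pattern with "$" at the end; BothEnds adds "^" at the start, which is my addition and goes beyond the "$" suggestion; Strict swaps "$" for "\z", which matches only at the very end of the string.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module AnchoredDemo
    Public Const Core As String = _
        "[1-9][0-9]*(\.[0-9]*)?(\+[1-9][0-9]*(\.[0-9]*)?)*"

    Public Const EndOnly As String = Core & "$"          ' anchored at the end only
    Public Const BothEnds As String = "^" & Core & "$"   ' anchored at both ends
    Public Const Strict As String = "^" & Core & "\z"    ' no trailing newline allowed

    Sub Main()
        Console.WriteLine(Regex.IsMatch("2+2+", EndOnly))         ' False
        Console.WriteLine(Regex.IsMatch("abc2+2", EndOnly))       ' True: junk in front still passes
        Console.WriteLine(Regex.IsMatch("abc2+2", BothEnds))      ' False
        Console.WriteLine(Regex.IsMatch("2+2", BothEnds))         ' True
        Console.WriteLine(Regex.IsMatch("2+2" & vbLf, BothEnds))  ' True: "$" also matches before a final \n
        Console.WriteLine(Regex.IsMatch("2+2" & vbLf, Strict))    ' False
    End Sub
End Module
```

The "before \n at the end of the string" clause in Microsoft's definition is what the last two lines illustrate. For a single-line TextBox it probably doesn't matter, but it is worth knowing about when the input comes from somewhere else.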

To add consternation to confusion, you can find this behavior documented under the MSDN heading, “Atomic Zero-Width Assertions”. Every authority I have read agrees that Microsoft’s regular expression documentation sucks big time!!

But even though their documentation is pretty bad, the technology is pretty good.

May 17, 2007 at 8:04 pm
(3) rita says:

Easy. I just don’t get regex.

May 17, 2007 at 8:56 pm
(4) visualbasic says:

You, me, and 99 percent of humanity don’t get regex, Rita. It’s just hard to wrap your head around.

But it is …

1 – Very useful in some cases. It’s one of those things like … “when you need it, nothing else will do.”

2 – Efficient. It may be tough for people to understand, but computers eat it up.

3 – Representative of a whole class of computer technologies. If you don’t “get” regex, you’ll have trouble with all of them too. Regex is a great way to exercise your mind in a new way of thinking that will also prove useful.

Give it another shot. Maybe it will work for you this time.


©2009 About.com, a part of The New York Times Company.

All rights reserved.