What is RegEx all about?
Regular Expressions - also known as "RegEx" - are strings of text used to match patterns in other strings. The classic example is a search string that uses "wild cards" such as "*" or "?" (characters that are also used in "regular" regular expressions, although with somewhat different meanings). VB.NET has great support for RegEx, but if you're looking for information about how to use it in VB 6, About Visual Basic has that too.
To get the basic idea behind RegEx, open a "Search" in Windows Explorer to look for a file on your computer. Enter "*.ico" into the "All or part of the file name" text box (the "word or phrase" text box doesn't support wild card characters). You should get a list of hundreds of "icon" files if you search your whole computer. The "*" wild card will "match" any file name and the ".ico" will "match" icon files that end in just those characters.
For another explanation, check out the "Regular Expressions" cartoon at XKCD: A webcomic of romance, sarcasm, math, and language.
RegEx is exactly the same idea, but the idea has been developed to allow just about any kind of text searching you can imagine. Andrew Watt wrote an 800 page book (reviewed by About.Com here) that he called Beginning Regular Expressions. I wonder how many pages it would take to write the "Advanced" version.
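To make the comparison concrete, here is a minimal sketch of how the "*.ico" wild card search described earlier might be written as a RegEx in VB.NET. The module and function names (WildcardDemo, IsIconFile) are made up for illustration, and ignoring upper and lower case is an assumption that mimics how Windows file searches usually behave.

```vb
Imports System
Imports System.Text.RegularExpressions

Module WildcardDemo
    ' Hypothetical helper (not from the article): True when fileName
    ' ends in ".ico", which is roughly what "*.ico" asks for.
    Function IsIconFile(ByVal fileName As String) As Boolean
        ' ^     start of the name
        ' .*    any character, repeated zero or more times
        '       (the job that "*" alone does in a wild card)
        ' \.    a literal period (a bare "." means "any one character")
        ' ico   the literal letters
        ' $     end of the name
        Return Regex.IsMatch(fileName, "^.*\.ico$", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(IsIconFile("favicon.ico"))   ' True
        Console.WriteLine(IsIconFile("icons.txt"))     ' False
        Console.WriteLine(IsIconFile("photo.icon"))    ' False
    End Sub
End Module
```

Notice that the wild card "*" became ".*" and the period had to be escaped. The same characters show up in both notations, but they don't mean quite the same thing.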
The big problem with RegExes is that they are a sort of "write-only" language. A RegEx that does something meaningful (such as this example to match patterns for US telephone numbers, credit to author Jesse Sweetland) ...
... can be hard for anyone to figure out after it's written. Don't be fooled because this is just one line of code. It might take even an experienced programmer quite a while to craft something like this. (I'll use this RegEx in some VB code in a few paragraphs.)
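To get a feel for the "write-only" problem on a smaller scale, here is a sketch that uses a much simpler phone number pattern. This is a hypothetical stand-in written for illustration, not the pattern credited above, and it only accepts two layouts: 555-123-4567 and (555) 123-4567. The names SimplePhonePattern and LooksLikeUsPhone are assumptions too.

```vb
Imports System
Imports System.Text.RegularExpressions

Module PhonePatternDemo
    ' Hypothetical, simplified pattern (NOT the one credited in the article).
    '   ^                    start of the text
    '   (\(\d{3}\) |\d{3}-)  area code as "(555) " or as "555-"
    '   \d{3}-               three digits and a hyphen
    '   \d{4}                four digits
    '   $                    end of the text
    Const SimplePhonePattern As String = "^(\(\d{3}\) |\d{3}-)\d{3}-\d{4}$"

    Function LooksLikeUsPhone(ByVal candidate As String) As Boolean
        Return Regex.IsMatch(candidate, SimplePhonePattern)
    End Function

    Sub Main()
        Console.WriteLine(LooksLikeUsPhone("555-123-4567"))    ' True
        Console.WriteLine(LooksLikeUsPhone("(555) 123-4567"))  ' True
        Console.WriteLine(LooksLikeUsPhone("5551234567"))      ' False
    End Sub
End Module
```

Even this cut-down pattern needs five lines of comments to explain one line of RegEx, and it still rejects plenty of phone numbers that a person would call valid.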
For this reason, there are a lot of utilities that you can download that help you figure out what a regex is doing. Some examples:
- RegexBuddy from JGsoft
- RegexDesigner .NET from Chris Sells
Historical Sidetrack ...
A lot of people (myself included) get confused right away because we don't even understand why these things are called "regular expressions". Just to get this out of the way so we can move on to the important stuff, the term was first used by the American mathematician Stephen Kleene. For him, it was a branch of mathematics, and he worked out the mathematical rules that make it work. For programmers, it's just a name, so call them "widgets" or "thingies" if it will help you understand better.
But if you want to get geek points for knowing obscure stuff, the '*' character that we usually call a "wildcard" is sometimes called a Kleene star in academic circles and - here's a good one - Kleene pronounced his last name klay'nee. His son, Ken Kleene, wrote: "As far as I am aware this pronunciation is incorrect in all known languages. I believe that this novel pronunciation was invented by my father." Dr. "Clay Knee" must have been a geek of the first water, to be sure!
On the next page, we dig into how to use RegEx in your VB.NET code!