The Dopefly Tech Blog


The Viagra Expression

posted under category: ColdFusion on December 2, 2004 by Nathan

Rob Cockerham of cockeyed.com has inspired me. After reading his report on Viagra spam, titled "There Are 600,426,974,379,824,381,952 Ways To Spell Viagra," I realized that I needed to develop a regular expression that can catch every possible misspelling. Well, it looks like I'm getting close.

Here is the RegEx so far:

\b((V|v|\\/)+([\W]*[\w]{0,2}[\W]*)(I|i|1|l|\||ï|ì|:|Ì|Î|Í|Ï|y|Y)*([\W]*[\w]{0,2}[\W]*){0,2}(A|a|@|/\\|á|à|â|ã|ä|å|æ|À|Á|Â|Ã|Ä|Å)+([\W]*[\w]{0,3}[\W]*)(G|g|6|9){1,4}([\W]*[\w]{0,3}[\W]*)(R|r|®){0,4}([\W]*[\w]{0,2}[\W]*)(A|a|@|/\\|á|à|â|ã|ä|å|æ|À|Á|Â|Ã|Ä|Å)+)\b

I've tested it with ColdFusion and Java, and the syntax should carry over to Perl with little or no change. It's about 260 characters long so far; test it out for yourself. It's not perfect, but it does a darn good job!
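For anyone who wants to try the expression from Java, here is a minimal sketch using java.util.regex. The class name ViagraExpression, the firstMatch helper, and the sample strings are hypothetical, made up for illustration. The pattern itself is the one above, split into one piece per letter so each set of lookalike characters is easier to read, with the accented characters written as Java \u escapes so the file compiles regardless of the source encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical harness for the expression in this post (names are illustrative).
public class ViagraExpression {
    // Filler allowed between letters: optional non-word characters,
    // up to 2 (or 3) word characters, then optional non-word characters.
    static final String FILLER2 = "([\\W]*[\\w]{0,2}[\\W]*)";
    static final String FILLER3 = "([\\W]*[\\w]{0,3}[\\W]*)";

    // V: "V", "v", or the two-character lookalike backslash + slash.
    static final String V = "(V|v|\\\\/)+";
    // I: I, i, 1, l, pipe, colon, y, Y, and accented i's (U+00CC to U+00CF,
    // U+00EC, U+00EF); the whole group may be absent.
    static final String I =
        "(I|i|1|l|\\||\u00ef|\u00ec|:|\u00cc|\u00ce|\u00cd|\u00cf|y|Y)*";
    // A: A, a, @, the lookalike slash + backslash, and accented a's.
    static final String A = "(A|a|@|/\\\\|\u00e1|\u00e0|\u00e2|\u00e3|\u00e4|\u00e5|\u00e6"
        + "|\u00c0|\u00c1|\u00c2|\u00c3|\u00c4|\u00c5)+";
    // G: one to four of G, g, 6, 9.
    static final String G = "(G|g|6|9){1,4}";
    // R: zero to four of R, r, or the registered-trademark sign (U+00AE).
    static final String R = "(R|r|\u00ae){0,4}";

    // Pieces joined in the same order as the expression in the post.
    static final Pattern VIAGRA = Pattern.compile(
        "\\b(" + V + FILLER2 + I + FILLER2 + "{0,2}" + A + FILLER3 + G
            + FILLER3 + R + FILLER2 + A + ")\\b");

    // Returns the first spelling found anywhere in the text, or null if none.
    public static String firstMatch(String text) {
        Matcher m = VIAGRA.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {"Cheap V1AGRA here", "v.i.a.g.r.a", "V-i-@-g-r-a", "Niagara Falls"};
        for (String s : samples) {
            System.out.println(s + " -> " + firstMatch(s));
        }
    }
}
```

Using find() rather than matches() lets the pattern flag the word anywhere inside a longer message instead of requiring the whole string to be a misspelling.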

UPDATE:
1) Sorry to fullasagoog for throwing off your formatting.
2) I made a page to test the Viagra expression.

Nathan is a software developer at The Boeing Company in Charleston, SC. He is essentially a big programming nerd. Really, you could say that makes him a nerd among nerds. Aside from making software for the web, he plays with tech toys and likes to think about programming's big picture while speaking at conferences and generally impressing people with massive nerdiness and straight-faced sarcastic humor. Nathan got his programming start writing batch files in DOS. It should go without saying, but these thoughts and opinions have nothing to do with Boeing in any way.