The Dopefly Tech Blog


The Viagra Expression

posted under category: ColdFusion on December 2, 2004 at 12:00 am by Nathan

Rob Cockerham of cockeyed.com has inspired me. After reading his report on Viagra spam, titled "There Are 600,426,974,379,824,381,952 Ways To Spell Viagra", I realized that I needed to develop a regular expression that can handle every possible misspelling. Well, it looks like I'm getting close.

Here is the RegEx so far:

\b((V|v|\\/)+([\W]*[\w]{0,2}[\W]*)(I|i|1|l|\||ï|ì|:|Ì|Î|Í|Ï|y|Y)*([\W]*[\w]{0,2}[\W]*){0,2}(A|a|@|/\\|á|à|â|ã|ä|å|æ|À|Á|Â|Ã|Ä|Å)+([\W]*[\w]{0,3}[\W]*)(G|g|6|9){1,4}([\W]*[\w]{0,3}[\W]*)(R|r|®){0,4}([\W]*[\w]{0,2}[\W]*)(A|a|@|/\\|á|à|â|ã|ä|å|æ|À|Á|Â|Ã|Ä|Å)+)\b

It's been tested with ColdFusion and Java, and the syntax should be close to Perl's. It's about 260 characters long so far, so test it out for yourself. It's not perfect, but it does a darn good job!
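If you want to try it from Java, here is a minimal sketch of how that might look. The class name ViagraExpression and the method looksLikeViagra are made up for this example, and this is not the test page mentioned in the update below. The pattern is the same expression, split into named pieces only so the Java string escaping stays readable. The accented characters and ® are written as Unicode escapes so the file compiles whatever encoding it is saved in.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are assumptions made
// for illustration and do not come from the original post.
final class ViagraExpression {

    // Filler between letters: non-word chars, up to N word chars, non-word chars.
    private static final String FILL2 = "([\\W]*[\\w]{0,2}[\\W]*)";
    private static final String FILL3 = "([\\W]*[\\w]{0,3}[\\W]*)";

    // One piece per letter of the word, each listing its look-alikes.
    private static final String V = "(V|v|\\\\/)+";
    private static final String I =
        "(I|i|1|l|\\||\u00ef|\u00ec|:|\u00cc|\u00ce|\u00cd|\u00cf|y|Y)*";
    private static final String A =
        "(A|a|@|/\\\\|\u00e1|\u00e0|\u00e2|\u00e3|\u00e4|\u00e5|\u00e6"
        + "|\u00c0|\u00c1|\u00c2|\u00c3|\u00c4|\u00c5)+";
    private static final String G = "(G|g|6|9){1,4}";
    private static final String R = "(R|r|\u00ae){0,4}";

    // The pieces concatenated in the same order as the expression in the post.
    static final Pattern PATTERN = Pattern.compile(
        "\\b(" + V + FILL2 + I + FILL2 + "{0,2}" + A + FILL3 + G + FILL3
        + R + FILL2 + A + ")\\b");

    // Returns true if the expression matches anywhere in the given text.
    static boolean looksLikeViagra(String text) {
        return PATTERN.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = { "Viagra", "V1agra", "v.i.a.g.r.a", "hello world" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + looksLikeViagra(sample));
        }
    }
}
```

The sketch uses find() rather than matches() so the expression can hit anywhere inside a longer message, such as an email subject line.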

UPDATE:
1) Sorry to fullasagoog for throwing off your formatting.
2) I made a page to test the Viagra expression.