The Dopefly Tech Blog


The Viagra Expression

posted under category: ColdFusion on December 2, 2004 at 1:00 am by MrNate

Rob Cockerham of cockeyed.com has inspired me. After reading his report on Viagra spam, titled There Are 600,426,974,379,824,381,952 Ways To Spell Viagra, I realized that I needed to develop a regular expression that can handle every misspelling possible. Well, it looks like I'm getting close.

Here is the RegEx so far:

\b((V|v|\\/)+([\W]*[\w]{0,2}[\W]*)(I|i|1|l|\||ï|ì|:|Ì|Î|Í|Ï|y|Y)*([\W]*[\w]{0,2}[\W]*){0,2}(A|a|@|/\\|á|à|â|ã|ä|å|æ|À|Á|Â|Ã|Ä|Å)+([\W]*[\w]{0,3}[\W]*)(G|g|6|9){1,4}([\W]*[\w]{0,3}[\W]*)(R|r|®){0,4}([\W]*[\w]{0,2}[\W]*)(A|a|@|/\\|á|à|â|ã|ä|å|æ|À|Á|Â|Ã|Ä|Å)+)\b

It's been tested with ColdFusion and Java, and its syntax should be close to Perl's. It's about 260 characters long so far. Test it out for yourself. It's not perfect, but it does a darn good job!
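
As a rough sketch of how the expression might be put to work, the snippet below compiles it with Python's re module and checks a few strings. Python is an assumption here (the post only mentions ColdFusion and Java), and the names VIAGRA_RE and looks_like_viagra are made up for illustration.

```python
import re

# The expression from the post, unchanged; it is split across several
# raw-string pieces only to keep the lines readable.
VIAGRA_PATTERN = (
    r"\b((V|v|\\/)+([\W]*[\w]{0,2}[\W]*)"
    r"(I|i|1|l|\||ï|ì|:|Ì|Î|Í|Ï|y|Y)*([\W]*[\w]{0,2}[\W]*){0,2}"
    r"(A|a|@|/\\|á|à|â|ã|ä|å|æ|À|Á|Â|Ã|Ä|Å)+([\W]*[\w]{0,3}[\W]*)"
    r"(G|g|6|9){1,4}([\W]*[\w]{0,3}[\W]*)"
    r"(R|r|®){0,4}([\W]*[\w]{0,2}[\W]*)"
    r"(A|a|@|/\\|á|à|â|ã|ä|å|æ|À|Á|Â|Ã|Ä|Å)+)\b"
)
VIAGRA_RE = re.compile(VIAGRA_PATTERN)


def looks_like_viagra(text):
    """Return True if the expression matches anywhere in text."""
    return VIAGRA_RE.search(text) is not None


if __name__ == "__main__":
    # A few hand-picked samples; swap in your own spam subject lines.
    for sample in ["Viagra", "V1agra", "v.i.a.g.r.a", "hello world"]:
        print(repr(sample), looks_like_viagra(sample))
```

The search call looks for the pattern anywhere in the string, which suits scanning a subject line or message body; use fullmatch instead if the whole string must be the misspelling.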

UPDATE:
1) Sorry to fullasagoog for throwing off your formatting
2) I made a page to test the viagra expression

On Dec 2, 2004 at 1:00 AM Pete Freitag (http://www.petefreitag.com/) said:
You should put up a page where we can test it.

On Dec 2, 2004 at 1:00 AM Nathan Strutz (http://www.dopefly.com/) said:
Thanks Pete, I just updated the blog entry. I'll get it up soon.

On Dec 4, 2004 at 1:00 AM Bjorn (bjorn, by way of ninthcircle.com) said:
Hi,

Close... very close... but these common variations on everyone's favourite 'hardener' still 'slip' through:

v1@gr@, v1agr@, viagr@, vi@gr@

On Jun 24, 2005 at 1:00 AM Brian Scot (http://blogs.geekdojo.net/brian) said:
Thanks, nice regex, but the word boundary, \b, at the beginning and end of the expression lets variations beginning with \/ or ending with @ pass. These characters are not word characters, so the boundary actually falls inside them rather than at the edge of the string. Taking the \b anchors off catches more variations, but also catches variations with extra leading or trailing letters. Come to think of it, that may be better, in my opinion.
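
A small sketch of the boundary behaviour described above, using Python's re module and a deliberately cut-down pattern. The pattern is a hypothetical stand-in, not the full expression, and the variable names are made up for illustration.

```python
import re

# Simplified stand-in for the full expression: just enough alternatives
# to show how \b interacts with non-word characters such as \, / and @.
with_boundaries = re.compile(r"\b(v|\\/)iagr(a|@)\b")
without_boundaries = re.compile(r"(v|\\/)iagr(a|@)")

if __name__ == "__main__":
    # "\\/iagra" is the six-character string \/iagra.
    for text in ["viagra", "viagr@", "\\/iagra", "viagras"]:
        print(
            repr(text),
            bool(with_boundaries.search(text)),
            bool(without_boundaries.search(text)),
        )
```

With the anchors in place only the plain spelling matches; without them all four strings match, including "viagras", which is the trade-off of catching extra trailing letters.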

On Jul 23, 2006 at 1:00 AM Chimmy (http://www.chimmy.com/) said:
This RegEx will find the following valid strings as well as the "Viagra" strings:

On Aug 1, 2006 at 1:00 AM Den (www.google.com) said:
The "i" variants should also include the inverted exclamation mark (¡), used in Spanish and some other languages, since it looks like an "i" and can stand in for it.

On Dec 7, 2006 at 1:00 AM Bill Knight (william.j.knight, who breathes verizon.net) said:
You might also want to include ^ as a variant for A, as

V1@gr^
is not caught.

On Jan 15, 2007 at 1:00 AM Mark (markruny who can't believe it's not gmail.com) said:
The text (millions of ways of writing Viagra) is no longer on the site, but the Google cache has it:

http://72.14.209.104/search?q=cache:A0enVOcoXiAJ:www.cockeyed.com/lessons/viagra/viagra.html+http://cockeyed.com/lessons/viagra/viagra.html&hl=pt-BR&gl=br&ct=clnk&cd=2

On Jun 4, 2009 at 1:00 AM jblo (jbull0ck of yahoo.com) said:
catches vagina