
The Gaisan regular expression toolkit

If you want to match URLs reliably without building a regexp monster so big that you have to hook up the digital projector just to work on it, here is something tasty I’ve come up with. It’s demonstrated in Java, my language of choice:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class UrlMatch {
    public static void main(String[] args) {
        // Prefix (http://, url://, www. ...), then a hostname or dotted IPv4 address, optional :port, optional /path?query
        Pattern urlPattern = Pattern.compile("(((URL:|url:|http:|htt:)\\/\\/)|www\\.)(((([A-Za-z0-9][A-Za-z0-9-]*[A-Za-z0-9]"+
        "|[A-Za-z0-9])\\.)*([a-zA-Z][A-Za-z0-9-]*[A-Za-z0-9]|[a-zA-Z]))|([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+))"+
        "(:[0-9]+)?(\\/([a-zA-Z0-9$_.+!*'(,);:@&=\\~\\#-]|%[0-9A-Fa-f][0-9A-Fa-f])*(\\/([a-zA-Z0-9$_.+!*'(,)"+
        ";:@&=\\~\\#-]|%[0-9A-Fa-f][0-9A-Fa-f])*)*(\\?([a-zA-Z0-9$_.+!*'(,);:@&=\\~\\#-]|%[0-9A-Fa-f][0-9A-Fa-f])*)?)?";
        Matcher urlMatcher = urlPattern.matcher("http://streamserver.gaisan.com/ourapplication?sd=234324&cam=1");
        System.out.println("Match should be true:\t" + urlMatcher.matches());
    }
}
```
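Part of what keeps an expression like this manageable is that it breaks into a handful of small, named pieces. The sketch below is my own illustration, not a library API: the class name `UrlPatternParts` and the constant names (`PREFIX`, `LABEL`, `HOST`, `PATH` and so on) are hypothetical. Concatenated, the pieces are intended to spell out the same regular expression as above, one sub-pattern at a time. It also contrasts `Matcher.matches()`, which requires the entire input to be a URL, with `Matcher.find()`, which scans running text for URL-shaped substrings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlPatternParts {
    // Hypothetical names: the regex text is the post's pattern, split into labelled parts.

    // One path/query character: a listed literal or a %XX percent-escape.
    static final String URL_CHAR = "([a-zA-Z0-9$_.+!*'(,);:@&=\\~\\#-]|%[0-9A-Fa-f][0-9A-Fa-f])";
    // Scheme-style prefix ("http://", "url://", ...) or a bare "www."
    static final String PREFIX = "(((URL:|url:|http:|htt:)\\/\\/)|www\\.)";
    // A hostname label: alphanumerics, with hyphens allowed only in the middle.
    static final String LABEL = "([A-Za-z0-9][A-Za-z0-9-]*[A-Za-z0-9]|[A-Za-z0-9])";
    // The last label (e.g. "com") must start with a letter.
    static final String TOP_LABEL = "([a-zA-Z][A-Za-z0-9-]*[A-Za-z0-9]|[a-zA-Z])";
    static final String HOSTNAME = "((" + LABEL + "\\.)*" + TOP_LABEL + ")";
    // Four dot-separated digit runs; octet ranges (0-255) are not checked.
    static final String IPV4 = "([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)";
    static final String HOST = "(" + HOSTNAME + "|" + IPV4 + ")";
    static final String PORT = "(:[0-9]+)?";
    // Optional "/segment/segment...?query" tail.
    static final String PATH = "(\\/" + URL_CHAR + "*(\\/" + URL_CHAR + "*)*(\\?" + URL_CHAR + "*)?)?";

    static final Pattern URL = Pattern.compile(PREFIX + HOST + PORT + PATH);

    public static void main(String[] args) {
        String url = "http://streamserver.gaisan.com/ourapplication?sd=234324&cam=1";
        // matches() succeeds only if the whole input matches the pattern.
        System.out.println("Whole string is a URL: " + URL.matcher(url).matches());

        // find() scans the text and stops at each URL-shaped substring in turn.
        Matcher m = URL.matcher("Watch the stream at " + url + " or www.gaisan.com today");
        while (m.find()) {
            System.out.println("Found: " + m.group());
        }
    }
}
```

Run as-is, it should report `true` for the full-string check and then find both the `http://` address and `www.gaisan.com` in the sentence. Because `.` is one of the allowed path characters, a URL at the end of a sentence will pick up the trailing full stop when using `find()`.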