{"id":61,"date":"2004-12-10T00:20:37","date_gmt":"2004-12-10T00:20:37","guid":{"rendered":"http:\/\/gaisan.com\/wordp\/?p=61"},"modified":"2004-12-10T00:20:37","modified_gmt":"2004-12-10T00:20:37","slug":"fun-with-regular-expressions","status":"publish","type":"post","link":"http:\/\/gaisan.com\/blogs\/?p=61","title":{"rendered":"Fun with Regular Expressions"},"content":{"rendered":"<p>I was playing around with regular expressions in Java. AFAIK these have only been around since JDK 1.4 and are therefore quite new. As a sometime Perl programmer I&#8217;ve some experience with these.<br \/>\nHowever, all this hacking reminded me of the most amazing regular expression I ever saw. I saw it in ActiveState&#8217;s Rx Cookbook some time ago.<br \/>\nIt&#8217;s actually a useful and logically sound solution to a common problem&#8230; <b>How to match all RFC 1738 compliant URLs and turn them into hyperlinks!<\/b> It was posted by Abigail to comp.lang.perl.misc on 08\/14\/2000. Abigail, I love you!!!<br \/>\n<code><br \/>\n$string =~ 
s<\n(?:http:\/\/(?:(?:(?:(?:(?:[a-zA-Z\\d](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?)\\.\n)*(?:[a-zA-Z](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?))|(?:(?:\\d+)(?:\\.(?:\\d+)\n){3}))(?::(?:\\d+))?)(?:\/(?:(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\n\\d]{2}))|[;:@&#038;=])*)(?:\/(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{\n2}))|[;:@&#038;=])*))*)(?:\\?(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{\n2}))|[;:@&#038;=])*))?)?)|(?:ftp:\/\/(?:(?:(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?\n:%[a-fA-F\\d]{2}))|[;?&#038;=])*)(?::(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-\nfA-F\\d]{2}))|[;?&#038;=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\\d](?:(?:[a-zA-Z\\d]|-\n)*[a-zA-Z\\d])?)\\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?))|(?:(?\n:\\d+)(?:\\.(?:\\d+)){3}))(?::(?:\\d+))?))(?:\/(?:(?:(?:(?:[a-zA-Z\\d$\\-_.+!\n*'(),]|(?:%[a-fA-F\\d]{2}))|[?:@&#038;=])*)(?:\/(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'()\n,]|(?:%[a-fA-F\\d]{2}))|[?:@&#038;=])*))*)(?:;type=[AIDaid])?)?)|(?:news:(?:\n(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[;\/?:&#038;=])+@(?:(?:(\n?:(?:[a-zA-Z\\d](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?)\\.)*(?:[a-zA-Z](?:(?:[\na-zA-Z\\d]|-)*[a-zA-Z\\d])?))|(?:(?:\\d+)(?:\\.(?:\\d+)){3})))|(?:[a-zA-Z](\n?:[a-zA-Z\\d]|[_.+-])*)|\\*))|(?:nntp:\/\/(?:(?:(?:(?:(?:[a-zA-Z\\d](?:(?:[\na-zA-Z\\d]|-)*[a-zA-Z\\d])?)\\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d\n])?))|(?:(?:\\d+)(?:\\.(?:\\d+)){3}))(?::(?:\\d+))?)\/(?:[a-zA-Z](?:[a-zA-Z\n\\d]|[_.+-])*)(?:\/(?:\\d+))?)|(?:telnet:\/\/(?:(?:(?:(?:(?:[a-zA-Z\\d$\\-_.+\n!*'(),]|(?:%[a-fA-F\\d]{2}))|[;?&#038;=])*)(?::(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'()\n,]|(?:%[a-fA-F\\d]{2}))|[;?&#038;=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\\d](?:(?:[a\n-zA-Z\\d]|-)*[a-zA-Z\\d])?)\\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d]\n)?))|(?:(?:\\d+)(?:\\.(?:\\d+)){3}))(?::(?:\\d+))?))\/?)|(?:gopher:\/\/(?:(?:\n(?:(?:(?:[a-zA-Z\\d](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?)\\.)*(?:[a-zA-Z](?:\n(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?))|(?:(?:\\d+)(?:\\.(?:\\d+)){3}))(?::(?:\\d+\n))?)(?:\/(?:[a-zA-Z\\d$\\-_.+!*'()
,;\/?:@&#038;=]|(?:%[a-fA-F\\d]{2}))(?:(?:(?:[\na-zA-Z\\d$\\-_.+!*'(),;\/?:@&#038;=]|(?:%[a-fA-F\\d]{2}))*)(?:%09(?:(?:(?:[a-zA\n-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[;:@&#038;=])*)(?:%09(?:(?:[a-zA-Z\\d$\n\\-_.+!*'(),;\/?:@&#038;=]|(?:%[a-fA-F\\d]{2}))*))?)?)?)?)|(?:wais:\/\/(?:(?:(?:\n(?:(?:[a-zA-Z\\d](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?)\\.)*(?:[a-zA-Z](?:(?:\n[a-zA-Z\\d]|-)*[a-zA-Z\\d])?))|(?:(?:\\d+)(?:\\.(?:\\d+)){3}))(?::(?:\\d+))?\n)\/(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))*)(?:(?:\/(?:(?:[a-zA\n-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))*)\/(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(\n?:%[a-fA-F\\d]{2}))*))|\\?(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]\n{2}))|[;:@&#038;=])*))?)|(?:mailto:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),;\/?:@&#038;=]|(?:%\n[a-fA-F\\d]{2}))+))|(?:file:\/\/(?:(?:(?:(?:(?:[a-zA-Z\\d](?:(?:[a-zA-Z\\d]\n|-)*[a-zA-Z\\d])?)\\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?))|(?:\n(?:\\d+)(?:\\.(?:\\d+)){3}))|localhost)?\/(?:(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'()\n,]|(?:%[a-fA-F\\d]{2}))|[?:@&#038;=])*)(?:\/(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(\n?:%[a-fA-F\\d]{2}))|[?:@&#038;=])*))*))|(?:prospero:\/\/(?:(?:(?:(?:(?:[a-zA-Z\n\\d](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?)\\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\\d]|-)\n*[a-zA-Z\\d])?))|(?:(?:\\d+)(?:\\.(?:\\d+)){3}))(?::(?:\\d+))?)\/(?:(?:(?:(?\n:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[?:@&#038;=])*)(?:\/(?:(?:(?:[a-\nzA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[?:@&#038;=])*))*)(?:(?:;(?:(?:(?:[\na-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[?:@&#038;])*)=(?:(?:(?:[a-zA-Z\\d\n$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[?:@&#038;])*)))*)|(?:ldap:\/\/(?:(?:(?:(?:\n(?:(?:[a-zA-Z\\d](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?)\\.)*(?:[a-zA-Z](?:(?:\n[a-zA-Z\\d]|-)*[a-zA-Z\\d])?))|(?:(?:\\d+)(?:\\.(?:\\d+)){3}))(?::(?:\\d+))?\n))?\/(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\\d]|%(?:3\\d|[46][a-fA-F\\d]|[57][Aa\\d])\n)|(?:%20))+|(?:OID|oid)\\.(?:(?:\\d+)(?:\\.(?:\\d+))*))(?:(?:%0[Aa])?(?:%2\n0)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]
|(?:%[a-fA-F\n\\d]{2}))*))(?:(?:(?:%0[Aa])?(?:%20)*)\\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?\n:(?:(?:[a-zA-Z\\d]|%(?:3\\d|[46][a-fA-F\\d]|[57][Aa\\d]))|(?:%20))+|(?:OID\n|oid)\\.(?:(?:\\d+)(?:\\.(?:\\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])\n?(?:%20)*))?(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))*)))*)(?:(\n?:(?:(?:%0[Aa])?(?:%20)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))(?:(?:(?:(?:(\n?:(?:[a-zA-Z\\d]|%(?:3\\d|[46][a-fA-F\\d]|[57][Aa\\d]))|(?:%20))+|(?:OID|o\nid)\\.(?:(?:\\d+)(?:\\.(?:\\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(\n?:%20)*))?(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))*))(?:(?:(?:\n%0[Aa])?(?:%20)*)\\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?:(?:(?:[a-zA-Z\\d]|%(\n?:3\\d|[46][a-fA-F\\d]|[57][Aa\\d]))|(?:%20))+|(?:OID|oid)\\.(?:(?:\\d+)(?:\n\\.(?:\\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a\n-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))*)))*))*(?:(?:(?:%0[Aa])?(?:%2\n0)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))?)(?:\\?(?:(?:(?:(?:[a-zA-Z\\d$\\-_.+\n!*'(),]|(?:%[a-fA-F\\d]{2}))+)(?:,(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-f\nA-F\\d]{2}))+))*)?)(?:\\?(?:base|one|sub)(?:\\?(?:((?:[a-zA-Z\\d$\\-_.+!*'(\n),;\/?:@&#038;=]|(?:%[a-fA-F\\d]{2}))+)))?)?)?)|(?:(?:z39\\.50[rs]):\/\/(?:(?:(?\n:(?:(?:[a-zA-Z\\d](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?)\\.)*(?:[a-zA-Z](?:(?\n:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?))|(?:(?:\\d+)(?:\\.(?:\\d+)){3}))(?::(?:\\d+))\n?)(?:\/(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))+)(?:\\+(?:(?:\n[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))+))*(?:\\?(?:(?:[a-zA-Z\\d$\\-_\n.+!*'(),]|(?:%[a-fA-F\\d]{2}))+))?)?(?:;esn=(?:(?:[a-zA-Z\\d$\\-_.+!*'(),\n]|(?:%[a-fA-F\\d]{2}))+))?(?:;rs=(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA\n-F\\d]{2}))+)(?:\\+(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))+))*)\n?))|(?:cid:(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[;?:@&#038;=\n])*))|(?:mid:(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[;?:@\n&#038;=])*)(?:\/(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[;?:@&#038;=]\n
)*))?)|(?:vemmi:\/\/(?:(?:(?:(?:(?:[a-zA-Z\\d](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\n\\d])?)\\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?))|(?:(?:\\d+)(?:\\.(?:\\d+)){3}))(?::(?:\\d+))?)(?:\/(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a\n-fA-F\\d]{2}))|[\/?:@&#038;=])*)(?:(?:;(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a\n-fA-F\\d]{2}))|[\/?:@&#038;])*)=(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d\n]{2}))|[\/?:@&#038;])*))*))?)|(?:imap:\/\/(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\\d$\\-_.+\n!*'(),]|(?:%[a-fA-F\\d]{2}))|[&#038;=~])+)(?:(?:;[Aa][Uu][Tt][Hh]=(?:\\*|(?:(\n?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[&#038;=~])+))))?)|(?:(?:;[\nAa][Uu][Tt][Hh]=(?:\\*|(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2\n}))|[&#038;=~])+)))(?:(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[\n&#038;=~])+))?))@)?(?:(?:(?:(?:(?:[a-zA-Z\\d](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])\n?)\\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?))|(?:(?:\\d+)(?:\\.(?:\n\\d+)){3}))(?::(?:\\d+))?))\/(?:(?:(?:(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:\n%[a-fA-F\\d]{2}))|[&#038;=~:@\/])+)?;[Tt][Yy][Pp][Ee]=(?:[Ll](?:[Ii][Ss][Tt]|\n[Ss][Uu][Bb])))|(?:(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))\n|[&#038;=~:@\/])+)(?:\\?(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[\n&#038;=~:@\/])+))?(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-\n9]\\d*)))?)|(?:(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[&#038;=~\n:@\/])+)(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-9]\\d*\n)))?(?:\/;[Uu][Ii][Dd]=(?:[1-9]\\d*))(?:(?:\/;[Ss][Ee][Cc][Tt][Ii][Oo][Nn\n]=(?:(?:(?:[a-zA-Z\\d$\\-_.+!*'(),]|(?:%[a-fA-F\\d]{2}))|[&#038;=~:@\/])+)))?))\n)?)|(?:nfs:(?:(?:\/\/(?:(?:(?:(?:(?:[a-zA-Z\\d](?:(?:[a-zA-Z\\d]|-)*[a-zA-\nZ\\d])?)\\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\\d]|-)*[a-zA-Z\\d])?))|(?:(?:\\d+)(?:\n\\.(?:\\d+)){3}))(?::(?:\\d+))?)(?:(?:\/(?:(?:(?:(?:(?:[a-zA-Z\\d\\$\\-_.!~*'\n(),])|(?:%[a-fA-F\\d]{2})|[:@&#038;=+])*)(?:\/(?:(?:(?:[a-zA-Z\\d\\$\\-_.!~*'(),\n])|(?:%[a-fA-F\\d]{2})|[:@&#038;=+])*))*
)?)))?)|(?:\/(?:(?:(?:(?:(?:[a-zA-Z\\d\n\\$\\-_.!~*'(),])|(?:%[a-fA-F\\d]{2})|[:@&#038;=+])*)(?:\/(?:(?:(?:[a-zA-Z\\d\\$\\-_.!~*'(),])|(?:%[a-fA-F\\d]{2})|[:@&#038;=+])*))*)?))|(?:(?:(?:(?:(?:[a-zA-\nZ\\d\\$\\-_.!~*'(),])|(?:%[a-fA-F\\d]{2})|[:@&#038;=+])*)(?:\/(?:(?:(?:[a-zA-Z\\d\n\\$\\-_.!~*'(),])|(?:%[a-fA-F\\d]{2})|[:@&#038;=+])*))*)?)))\n&gt;&lt;a href = \"$1\"&gt;$1&lt;\/a&gt;&gt;gx;\n<\/code><br \/>\nNeedless to say, anything this complex comes with a license disclaiming that it may not work, which is reprinted <a href=\"http:\/\/aspn.activestate.com\/ASPN\/Cookbook\/Rx\/license\">here<\/a> <i>(even though it logically should work at all times)<\/i>. Wow, and wholehearted respect to Abigail...<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was playing around with regular expressions in Java. AFAIK these have only been around since JDK 1.4 and are therefore quite new. As a sometime Perl programmer I&#8217;ve some experience with these. However, all this hacking reminded me of the most amazing regular expression I ever saw. 
I saw this on the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"_links":{"self":[{"href":"http:\/\/gaisan.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/61"}],"collection":[{"href":"http:\/\/gaisan.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/gaisan.com\/blogs\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/gaisan.com\/blogs\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/gaisan.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=61"}],"version-history":[{"count":0,"href":"http:\/\/gaisan.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/61\/revisions"}],"wp:attachment":[{"href":"http:\/\/gaisan.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=61"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/gaisan.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=61"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/gaisan.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=61"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}