Regex prime checker
This Java snippet uses a regular expression for something way different than they were designed for: a primality check.
public static boolean
Who is the author?
After some digging I found out that the original author is Abigail and the code was originally written in his language of choice, Perl. The original one-liner can be found here, it looks like this:
perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/'
How does it work?
These two code snippets look a bit different, but the general idea behind them is the same. First, the code builds a string of the length n
, then a specially crafted regular expression is matched against it. If the match is positive, it means the number is not prime.
What is a prime?
By definition, a prime is a number that is greater than 1
and has no positive divisors other than 1
and itself.
The Perl version
The code expects a number for the primality check as an argument, and displays "Prime
" on the standard output when the number is found to be a prime.
% perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/' 12
(no output)
% perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/' 13
Prime
The string to run matches on is built using Perl's x
repetition operator. The expression 1 x shift
will repeat 1
as many times as given in the program argument, building a string consisting just of ones. If 12
is passed as the argument, the string will contain 111111111111
.
The regular expression is composed of two sub-expressions, ^1?$
and ^(11+?)\1+$
. Only one of these matches is required for the whole expression to match. Let's have a closer look at these.
First part: for 0 and 1
^1?$
Translated to English, this means: match the beginning of the string (^
), followed by the number one (1
) that occurs exactly zero or one time (?
), followed by the end of string ($
). This piece will match only strings of the length of 0
and 1
, detecting them as non-primes.
Second part: every other number
^(11+?)\1+$
This regular expression is more complicated as uses a repeated capturing group. It will match the string beginning (^
), followed by a group of at least two ones ((11+?)
), repeated at least once after the first match - so at least twice in total (\1+
), and followed by the end of string ($
). Its purpose is to check whether there is a number that will divide n
into at least two equal parts, with no remainder. If there is one, the number is by definition not a prime.
Differences in Java version
The Java code is a bit different. First, the string is constructed like this:
Which means: create a new array of char
s with the length of n
, and then create a String
out of it. The default value of newchar
s is \U0000
, which means the created string will contain exactly n
null characters.
Next, the regular expression has changed slightly.
.?|(..+?)\1+
The regular expression is matched against the string using String.matches(String regex) method, which always wants a complete match - so there is no need to use ^
and $
anymore as in Perl. Also, because the string content is different than in the Perl version, the regex now uses a dot (.
, one occurrence of any character) to match the null characters in the string.
Can it be used in production?
Most certainly not. Due to the number of matches that need to be done on the string, the code is slow.