Anatomy of the heart

13th Mar 2018
6 min read
Tags:
obfuscated,
Perl

#!/usr/bin/perl -w
use strict;

     my$f=           $[;my
   $ch=0;sub       l{length}
 sub r{join"",   reverse split
("",$_[$[])}sub ss{substr($_[0]
,$_[1],$_[2])}sub be{$_=$_[0];p
 (ss($_,$f,1));$f+=l()/2;$f%=l 
  ();$f++if$ch%2;$ch++}my$q=r
   ("\ntfgpfdfal,thg?bngbj".    
    "naxfcixz");$_=$q; $q=~
      tr/f[a-z]/ [l-za-k] 
        /;my@ever=1..&l
          ;my$mine=$q
            ;sub p{
             print
              @_;
               }
                         
       be $mine for @ever

This heart-shaped program is written in the language of choice for producing unreadable and obfuscated code: Perl 5. It also serves an unusual purpose of a proposal.

Who wrote the code?

The code was published on PerlMonks in 2004 by user Falkkin (Colin McMillen).

Let's run it

% perl ./heart.pl
Replacement list is longer than search list at ./heart.pl line 14.
kristen, will you marry me?

The author's intention was for the code to parse correctly even with strict mode enabled (use strict;) and all warnings turned on (-w parameter passed to the Perl executable). The modern Perl interpreter (5.26.1 used here) warns about more things than it did back in 2004, trying to spoil the great moment.

Thankfully back in 2004 the ugly warning message wasn't there yet, and nothing got in Falkkin's way of sending the beautiful message to Kristen.

How does it work?

The popular way of dealing with creatively formatted Perl code is to use B::Deparse to try and make it more readable. B::Deparse is a backend module for the Perl compiler, producing valid Perl code out of the internal structures after parsing the program.

% perl -MO=Deparse ./heart.pl > ./heart2.pl
Replacement list is longer than search list at ./heart.pl line 14.
./heart.pl syntax OK

The de-parsed code has lost its pretty heart shape, and "be mine for each ever" is not as romantic as the original, but it is now way easier to grasp. Here's the full code, we will take it apart soon.

BEGIN { $^W = 1; }
use strict;
use arybase ();
my $f = $[;
my $ch = 0;
sub l {
    length $_;
}
sub r {
    join '', reverse(split(//, $_[$[], 0));
}
sub ss {
    substr $_[0], $_[1], $_[2];
}
sub be {
    $_ = $_[0];
    p(ss($_, $f, 1));
    $f += l() / 2;
    $f %= l();
    ++$f if $ch % 2;
    $ch++;
}
my $q = r("\ntfgpfdfal,thg?bngbjnaxfcixz");
$_ = $q;
$q =~ tr/[]a-z/[]l-p r-za-k/;
my(@ever) = 1 .. &l;
my $mine = $q;
sub p {
    print @_;
}
be $mine foreach (@ever);

Open heart surgery

BEGIN { $^W = 1; }

First line of the code is a BEGIN block. This block will be executed as early as possible, and its purpose is to set the special variable $^W to 1. This is equivalent to passing the -w switch to the interpreter and has the effect of turning all warnings on.

use strict;
use arybase ();
my $f = $[;
my $ch = 0;

This includes the strict and arybase modules, then initializes two variables.

The arybase module is an interesting case -- we haven't seen it in the original code. It has been inserted here because later on, the code uses it to implement the special variable $[. Right here, the ugly-named variable is used as a confusing way to say 0, but the actual purpose of it is to modify the first index of an array. By assigning 1 to it, we can have 1-indexed arrays (and another warning), as demonstrated:

% perl -E 'my @months = qw/January February March/; $[ = 1; say $months[1]'
Use of assignment to $[ is deprecated at -e line 1.
January

Next, in the code we can see a couple of subroutines defined:

sub l {
    length $_;
}
sub r {
    join '', reverse(split(//, $_[$[], 0));
}
sub ss {
    substr $_[0], $_[1], $_[2];
}

These are just short names for string operations: l for length, r for reversing and ss for substring. The special variable $[ is used here again for obfuscation; what looks like unbalanced square brackets ($_[$[] in sub r, actually parses as $_[0] and means "the first parameter passed to the sub". There is another useful subroutine with a short name defined at line 28, p, which just calls print on its parameters.

sub be {
    $_ = $_[0];
    p(ss($_, $f, 1));
    $f += l() / 2;
    $f %= l();
    ++$f if $ch % 2;
    $ch++;
}

This subroutine actually prints letters to the standard output. It's easy to see because it calls p(ss(...)), which uses the previously defined short names and stands for print(substr(...)).

First, the variable $_ is assigned the value of the first parameter passed to the be subroutine.

Perl's substr function is passed three parameters here: string, offset and length. So the code prints a one-character substring of whatever it was given as the parameter, starting at position $f (which is initialized to 0).

Then, the counter variable $f is incremented in a creative way - by half the length of the string (l() / 2). The modulo operation in line 19 is performed so that the index is never out of bounds. The counter variable is additionally incremented by 1 for every second character in line 20. Due to this, the code selects characters to print alternatively from the beginning and from the middle of the passed string. So if the subroutine is passed a 20-character string, subsequent calls to it will print: first 0th element, then 10th, then 1st, then 11th... ad infinitum. This is easily checked by modifying the last line of code to print another string, like this:

be "1234567890abcdefghij" for 1..20;

Which will result in the code printing numbers and letters interleaved, as follows:

1a2b3c4d5e6f7g8h9i0j

But back to the code. Time for another obfuscated part -- the actual string that will be printed on the screen.

my $q = r("\ntfgpfdfal,thg?bngbjnaxfcixz");
$_ = $q;
$q =~ tr/[]a-z/[]l-p r-za-k/;

This looks like a cat has jumped on the author's keyboard, but it's written like this to obfuscate the message even further. First, the string is reversed (by passing to sub r). The reversed string still looks like gibberish and has the value of "zxicfxanjbgnb?ght,lafdfpgft\n".

Next, line 25 runs the string through a tr/// operator, a translation table replacing ranges of characters:

tr/[]a-z/[]l-p r-za-k/

For more readability, let's replace the ranges (a-z) by the actual characters:

[]abcdefghijklmnopqrstuvwxyz
[]lmnop rstuvwxyzabcdefghijk

This looks like a simple Caesar cipher, where all occurences of a are replaced by l, all letters b turn into m, and so on. The letter f will be replaced by a space.

In the original code, the translation tables were arranged diffently: tr/f[a-z]/ [l-za-k] /. The square brackets were probably included in the code to contribute to the confusion factor -- they are required in regular expressions, but don't do anything in the tr///. Hence in the the de-parsed code, the tables were rearranged placing the square brackets at the beginning, before the letters; they are on the same position on both "from" and "to" side of the tables, which means they won't be substituted with anything else in the string and left as they are.

After running the string through the cipher, the result is "kitn ilyumrym?rse,wl o ar e\n". Still doesn't look like the author's proposal, but it's beginning to bear some resemblance now.

my(@ever) = 1 .. &l;
my $mine = $q;
sub p {
    print @_;
}
be $mine for (@ever);

Here comes the final part -- we can finally print the string. (Sorry, I changed the "foreach (@ever)" back to the original meaning. Couldn't stand it like that.)

First, take notice that one invocation of be prints only one character, so it must be called as many times as there are letters in the string to deliver the whole message. This is why the code creates @ever, an array of numbers from 1 till the string length. The numbers aren't used for anything else in the code, only to run be the required number of times.

So that's it -- a reversed, enciphered and scrambled message that gets untangled by an obfuscated and confusing code, to deliver a beautiful message to Kristen.

Did she say yes?

Yes, she did! She replied just a thirteen minutes after the original posting, accepting the proposal with a "Yes! :)" and a $propose++;. Congratulations!