DEV Community

Bob Lied
Bob Lied

Posted on • Edited on

PWC 243 Sweeping the floor

Perl Weekly Challenge 243 had some more exercises in pair-wise iteration over arrays. While I was still contemplating it, James Curtis-Smith posted one of his characteristically code-dense solutions, and it prompted me to rethink what I was doing. Let's have a look.

Task 1, Reverse Pairs

You are given an array of integers.
Write a script to return the number of 
reverse pairs in the given array.

A reverse pair is a pair (i, j) where:
    a) 0 <= i < j < nums.length
and b) nums[i] > 2 * nums[j].
Enter fullscreen mode Exit fullscreen mode

Example 1

Input: @nums = (1, 3, 2, 3, 1)
Output: 2
(1, 4) => nums[1] = 3, nums[4] = 1, 3 > 2 * 1
(3, 4) => nums[3] = 3, nums[4] = 1, 3 > 2 * 1

Example 2

Input: @nums = (2, 4, 3, 5, 1)
Output: 3
(1, 4) => nums[1] = 4, nums[4] = 1, 4 > 2 * 1
(2, 4) => nums[2] = 3, nums[4] = 1, 3 > 2 * 1
(3, 4) => nums[3] = 5, nums[4] = 1, 5 > 2 * 1

Thoughts

Well, that's oddly specific, but okay. This is the typical two-index march over an array, finding all the pairs. It's simplified by specifying that i < j, so it's not really all pairs, but just the half. We've seen this many times in challenges, and the stereotypical solution is a nested loop, where i goes over all the indices, and j moves from i+1 to the end.

for ( my $i = 0 ; $i < $#nums ; $i++ )
{
    for ( my $j = $i + 1 ; $j <= $#nums ; $j++ )
    {
        $count++ if $nums[$i] > 2 * $nums[$j];
    }
}
Enter fullscreen mode Exit fullscreen mode

This is clean and obvious and has analogous code in most procedural languages with array indexing. But that indexing sure is annoying. It would be nice to deal with the values directly, wouldn't it?

A characteristic of this loop is that as we march along with i, we use that element once and then we're done with it. The j index is moving along the elements to the right of i. For all practical purposes, it's as if the array was getting smaller each time.

So let's make it smaller each time. Instead of indirectly getting the i-th elements, let's just take it off the array:

while ( my $i = shift @nums ) { ... }
Enter fullscreen mode Exit fullscreen mode

Now we have the next value in $i, and the remainder of the array in @nums. To process the rest, we can iterate over the elements directly, without having to access via index. With that, we get simpler code with many fewer brackets.

sub reversePairs(@nums)
{
    my $count = 0;
    while ( my $i = shift @nums )
    {
        for my $j ( @nums )
        {
            $count++ if $i > 2 * $j;
        }
    }
    return $count;
}
Enter fullscreen mode Exit fullscreen mode

It looks better, and Lisp programmers will be feeling some warm fuzzy car and cdr vibes, but what about efficiency? Is there a hidden cost in what looks like array modification? As a matter of fact, the shifting solution is much better for performance. Here's a sample Benchmark comparison:

            Rate basic     shifting 
basic     3125/s        --      -44%
shifting  5556/s       78%        --
Enter fullscreen mode Exit fullscreen mode

There is one thing to be careful of: this idiom destroys the array while operating on it. That's not much of a problem here, because the array is passed by value. If we were passed an array reference, however, and shifted values off that, the calling function might be surprised to find itself with an empty array.

Task 2, Floor Sum

You are given an array of positive integers (>=1).
Write a script to return the sum of 
floor(nums[i] / nums[j])
where 0 <= i,j < nums.length. 
The floor() function returns the integer 
part of the division.
Enter fullscreen mode Exit fullscreen mode

Example 1

Input: @nums = (2, 5, 9)
Output: 10

  floor(2 / 5) = 0  floor(2 / 9) = 0  floor(2 / 2) = 1
  floor(5 / 9) = 0. floor(5 / 5) = 1  floor(5 / 2) = 2
  floor(9 / 9) = 1  floor(9 / 2) = 4. floor(9 / 5) = 1
Enter fullscreen mode Exit fullscreen mode

Example 2

Input: @nums = (7, 7, 7, 7, 7, 7, 7)
Output: 49

Thoughts

Once again, we have a problem that searches for pairs in an array. This time, however, we have a random array, so we need to consider every possible pair. However, there is a curious kind of anti-symmetry: the floor function will be zero whenever the numerator is less than the denominator.

An aside on the floor function: the task is making it easy on us by specifying the array as containing positive integers. Not only can we skip over division-by-zero worries, that means the floor function is equivalent to integer truncation of the division, int(x/y). If we really need to handle the more general case of floating point and negative numbers, there has been a floor function in Perl for a long time in the POSIX module, and more recently, there is now a version among the experimental builtins:

use builtin qw/floor ceil/; no warnings "experimental::builtin";
Enter fullscreen mode Exit fullscreen mode

Alright, so let's start with obvious solutions. Taking the problem description literally, let's loop over all pairs of i and j and find the sum.

sub floorSum(@nums)
{
    my $sum = 0;
    for ( my $i = 0 ; $i <= $#nums ; $i++ )
    {
        for ( my $j = 0 ; $j <= $#nums ; $j++ )
        {
            $sum += int($nums[$i]/$nums[$j]);
        }
    }
    return $sum;
}
Enter fullscreen mode Exit fullscreen mode

That works, but leaves me a little queasy about the O(n^2) performance of that double loop. Could we do better? We could, if the array was sorted in descending order. In that case, the floor function would only give us non-zero numbers when the j index is less than the i index -- this will give us a problem similar to Task 1. Well, almost. Think about example 2, where all the values are equal. We can't really move j only to the right of i -- we have to consider the possibility of equal values. This leads to a solution like this:

sub fs(@nums)
{
    my @N = sort { $b <=> $a } @nums;
    my $sum = 0;
    my $jstart = 0;
    for ( my $i = 0 ; $i <= $#N ; $i++ )
    {
        $jstart++ while ( $N[$jstart] > $N[$i] );
        for ( my $j = $jstart ; $j <= $#N && $N[$i] >= $N[$j] ; $j++ )
        {
            $sum += int( $N[$i] / $N[$j] );
        }
    }
    return $sum;
}

Enter fullscreen mode Exit fullscreen mode

There's a sort step in here, which could be kind of expensive, but there will be many fewer floor operations. Does it pay off? Yes, it does. Sorting is probably O(nlogn), and the reduction of the loop will mean the number of operations is approximately in the neighborhood of sum-from-1-to-n. Instead of n^2 operations we'll have (n)(n+1)/2, plus the cost of the sort. If n is 100, n^2 is 10,000 operations, but the sort is about 200, and the double loop is about 5000. We'll benchmark later, but this seems worth doing.

Now back to that thought from James Curtis-Smith. All those square brackets and indirection are mental clutter. Could we access the array directly as we did in Task 1? James points out how to do it, and gives a concise and minimal solution. Credit for the answer is his, but here's my version of it, made a bit more readable.

sub fs_s(@nums)
{
    my $sum = scalar @nums;
    while ( my $i = shift @nums )
    {
        for my $j ( @nums )
        {
            $sum += int($i/$j) + int($j/$i);
        }
    }
    return $sum;
}
Enter fullscreen mode Exit fullscreen mode

What's going on? First of all, the floor will always be 1 when i=j, so we start with the sum being at least the length of the array. Then, as before, we can take each i value and apply it against the remainder of the array. We want the contributions of each pair i,j and j,i but one of them will be zero when the floor function is applied. Since were graced with having no zeroes or negative values, we can use simple integer division to get our sum.

So, was all that worth it? Here's a Benchmark comparison, showing the four versions from slowest to fastest. basic is the brute-force double loop; sort is the first optimization with a sort step; golf is James' version; and ungolf is my version, more readable and optimizing the integer division instead of using POSIX::floor

            Rate basic     sort      golf      ungolf   
basic      885/s        --      -18%      -49%      -62%
sort      1079/s       22%        --      -37%      -53%
golf      1724/s       95%       60%        --      -25%
ungolf    2308/s      161%      114%       34%        --
Enter fullscreen mode Exit fullscreen mode

I will note that I did another version that passed an array reference instead of copying the array by value, but it was hard to benchmark because the versions that shift values destroy the array -- as noted above -- and therefore the repeated calls in the benchark loop don't work. I had to recopy the array to make the benchmark work, and that defeats the purpose of using a reference.

Once again, letting Perl do the annoying bookkeeping work of iterating over arrays pays off, not just in simplicity of code, but in performance.

Top comments (0)