DEV Community

Bob Lied
Bob Lied

Posted on

PWC 255 Odd character cordoctahedra and the most most most frequent word word

Task 1: Odd Character

You are given two strings, $s and $t. The string
$t is generated using the shuffled characters of
the string $s with an additional character.
Write a script to find the additional character
in the string $t.
Enter fullscreen mode Exit fullscreen mode
  • Example 1. Input: $s = "Perl" $t = "Preel" Output: "e"
  • Example 2. Input: $s = "Weekly" $t = "Weeakly" Output: "a"
  • Example 3. Input: $s = "Box" $t = "Boxy" Output: "y"

Excogitation

This looks like set difference. Or more accurately, multiset, since characters can appear more than once. Or as we call it in the erudite world of computer science, bags. There's the Set::Bag module that could do that.

Alternatives without external dependencies exist. One is that we go through the characters of $t and delete them from $s, leaving only the odd one out. A similar way is to tally up occurrences in $s and then tally down from $t.

Using string operations, here's one way it could be done.

sub oddChar($s, $t)
{
    die qq("$t" has wrong length compared to "$s")
        if length($t) != length($s)+1;

    for my $c ( split(//, $t) )
    {
        if ( ( my $i = index($s, $c) ) < 0 )
        {
            return $c;
        }
        else
        {
            substr($s, $i, 1) = "";
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

Fairly comprehensible code. The only mildly odd thing is assigning a value to a substring to erase characters from $s. As soon as we've found something in $t that isn't in $s, that must be the odd character. We have to remove from $s in case the extra character is a duplicate character. This isn't very robust if $s and $t don't conform to the input specification. For instance, oddChar("plover", "xyzzyp") would return "x", even though the inputs are wildly non-conforming.

We can catch more error cases if we analyze the strings before coughing up an answer.

sub oddChar($s, $t)
{
    my %freq;
    $freq{$_}++ for split(//, $s);
    $freq{$_}-- for split(//, $t);

    my @diff = grep { $freq{$_} != 0 } keys %freq;
    if ( @diff > 1 )
    {
        die "Too many differences $s => $t, [@diff]"
    }
    elsif ( @diff == 0 )
    {
        die "No difference between $s and $t"
    }
    elsif ( $freq{$diff[0]} != -1 )
    {
        die "Not exactly one diff for @diff"
    }

    return $diff[0];
}
Enter fullscreen mode Exit fullscreen mode

We count each distinct character in $s in the hash %freq. Then, we subtract again as we find the characters in $t. If the strings are well-formed, every character will count up and then back to zero, except for the extra one, which will have a count of -1. We can check if the expected condition happened and report on problems.

The grep will cast aside things that matched between $s and $t, leaving a list of differences in the @diff array. The size of @diff is expected to be 1; the count for that one letter should be -1; and the return value is the first element of @diff.

I've chosen to kill the process (die) if any of these errors are detected. That's pretty unfriendly in a library function, but good enough in this context, especially since there's no convincing rationale for any sensible return value for these errors. But that raises the question of how to test a function that throws an exception. With the Test2::V0::Exception module, the unit test looks like

   use Test2::V0;
   like(
        dies { oddChar("xyzzy","xyzzy") }, qr/No difference/,
        "Dies if no difference");
Enter fullscreen mode Exit fullscreen mode

Task 2: Most Frequent Word

You are given a paragraph $p and a banned word $w.
Write a script to return the most frequent word
that is not banned.
Enter fullscreen mode Exit fullscreen mode
  • Example 1. $p = "Joe hit a ball, the hit ball flew far after it was hit."
    $w = "hit"
    Output: "ball" The banned word "hit" occurs 3 times. The other word "ball" occurs 2 times.

  • Example 2. $p = "Perl and Raku belong to the same family. Perl is the most popular language in the weekly challenge."
    $w = "the"
    Output: "Perl" The banned word "the" occurs 3 times. The other word "Perl" occurs 2 times.

Meditations

Word frequencies are easily done by tallying counts in a hash that has the words as the keys. Then, delete the banned word from the hash, and report the one with the highest count.

The difficult part is splitting words, which touches on natural language processing. Does case matter? What do you do with numbers? Is "Perl", the world's best programming language, equivalent to "perl", the knitting action? What about contractions, possessives, and plurals? Can we reduce "can't" to "cant"? Should "can't" and "cannot" be considered the same word? Is "words" a form of "word" and "cooked" a form of "cook", or are they distinct? And that's just English. Really, one could write a book about it.

When in doubt, YAGNI. The Simplest Thing That [probably] Works is to remove the punctuation, and split the words on white space. Into the breach!

sub mfw($p, $w)
{
    my %freq;
    $freq{$_}++ for ( split(" ", $p =~ s/[[:punct:]]+/ /gr) );
    delete $freq{$w};
    return (sort { $freq{$b} <=> $freq{$a} } keys %freq)[0];
  }
Enter fullscreen mode Exit fullscreen mode

Dissection

  • $p =~ s/[[:punct:]]+/ /gr -- Change all the punctuation into spaces. [[:punct:]] is a Unicode character class that sweeps up all the variations of punctuation. The + sign matches one or more. Adding /g to the substitution does it globally; and /r returns the modified string (without it the s/// substitution operator returns a count of the number of changes).
  • split(" ", ...) -- turns the paragraph into a list of words by splitting wherever white space is found.
  • $freq{$_}++ for (...) -- each word in the list is temporarily referenced by the $_ topicalizer, and its count is incremented. Perl will make new entries in the hash automatically, and assume the initial integer value is 0; there's no need to test if the entry exists first.
  • delete $freq{$w} -- remove the hash entry for the banned word
  • sort { $freq{$b} <=> $freq{$a} } keys %freq -- sort the words by their frequency counts, in descending order (b before a).
  • (sort...)[0] -- the most frequent word is at the start of the sorted list. Enclosing it in parentheses asserts that we wish to take this list in array context, and then we can de-reference the first element directly, without needing to declare a temporary variable.

As always, my solutions are subjective and quite likely wrong. Other contestants' submitted solutions are on Github.

Top comments (0)