Bob Lied

Posted on Sep 20

PWC 287 Strength in Numbers

#perl #pwc #perlweeklychallenge

Let's knock off Task 2: Valid Number first, since it's the easy one. This asks us to recognize floating point numbers, with or without exponents. In the background, we hear Bob Seger's Feel Like a Number while reading the description.

Task 2: Valid Number

You are given a string, $str. Write a script to find if it is a valid number. Conditions for a valid number:

An integer number followed by an optional exponent.
A decimal number followed by an optional exponent.
An integer number is defined with an optional sign '-' or '+' followed by digits.

Decimal Number: A decimal number is defined with an optional sign '-' or '+' followed by one of the following definitions:

Digits followed by a dot '.'.
Digits followed by a dot '.' followed by digits.
A dot '.' followed by digits.

Exponent: An exponent is defined with an exponent notation 'e' or 'E' followed by an integer number.

Example 1: Input: $str = "1" Output: true
Example 2: Input: $str = "a" Output: false
Example 3: Input: $str = "." Output: false
Example 4: Input: $str = "1.2e4.2" Output: false
Example 5: Input: $str = "-1." Output: true
Example 6: Input: $str = "+1E-8" Output: true
Example 7: Input: $str = ".44" Output: true

Solution

The primary virtue of a Perl programmer is laziness.
Scalar::Util::looks_like_number. Our work here is done.

There also exists Regexp::Common::number. Or you can re-invent the regular expression and come up with something like

$str =~ m/^[+-]?(?:\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$/

But looks_like_number is going to win. Here's a benchmark result:

            Rate common  regex scalar
common  379507/s     --   -31%   -64%
regex   549451/s    45%     --   -48%
scalar 1058201/s   179%    93%     --

On to Task 1.

Task1: Strong Password

Among all the "Stronger" songs out there, I'm going to let Sheryl Crow pose the musical question, "Are you Strong Enough to be my [password]", but this is going to take enough time that I would also need Kelly Clarkson's Stronger (What Doesn't Kill You)

You are given a string, $str. Write a program to return the minimum number of steps required to make the given string a very strong password. If it is already strong then return 0.

Criteria:

It must have at least 6 characters.
It must contains at least one lowercase letter, at least one upper case letter and at least one digit.
It shouldn't contain 3 repeating characters in a row.

Following can be considered as one step: Insert one character; delete one character; replace one character with another

Example 1: Input: $str = "a" Output: 5
Example 2: Input: $str = "aB2" Output: 3
Example 3: Input: $str = "PaaSW0rd" Output: 0
Example 4: Input: $str = "Paaasw0rd" Output: 1
Example 5: Input: $str = "aaaaa" Output: 2

Thoughts

I note first that we're not asked to actually transform the string, only to figure out how little work it should take (the prime virtue again).

Of the possible operations that the task allows us, deleting a character never gets us closer, so let's never do that. Adding a character has the same cost as replacement, so if we need a new character, we might as well append it rather than trying to figure out where a good replacement might be. Breaking up sequences is the trickiest part.

I think the laziest approach looks like this:

For every set of repeating characters, replace every third character. If we're missing a class of characters, use that class to do the replacement.
If we're still missing a class, append a character from that class.
If we're still less than six characters long, append characters.

sub calcOp($str)
{
    my $s = length($str);
    my $r =()= $str =~ m/(.)\1\1/g; # Number of triples
    my $n = 3 - ($str =~ m/\p{Lower}/) 
              - ($str =~ m/\p{Upper}/)
              - ($str =~ m/\p{Digit}/);

    my $opCount = $r;
    if ( $r > 0 && $n > 0 )
    {
        # Some triplets can be used to swap in missing classes.
        if ( $r >= $n ) { $n = 0 }
        else            { $n -= $r }
    }

    if ( $n > 0 )
    {
        $s += $n; # Add missing classes, makes string longer
        $opCount += $n;
    }

    $opCount += (6-$s) if $s < 6; # Pad string if too short.
    return $opCount;
}

Number of repeating three-character strings

There was a challenge recently that required us to find repeating characters. The regular expression to find a sequence of three characters that are the same looks like m/(.)\1\1/ -- the . matches any character, captures it, and the \1 refers back to whatever was captured.

To count the number of times it occurs, add the g flag (which will return a list of matches in array context) and use the Saturn operator to switch into scalar mode and get the count instead of the array. Yes, we're calling it the Saturn operator; we're not 14-year-olds anymore.

Character classes

To check for upper, lower, and digits, do regular expression matches. [a-z] would probably work. [:lower:] and [:upper:] is readable, too. Using the Unicode character class will handle international character sets, but can get into a real rabbit hole (what's the difference between Upper, PosixUpper, XPosixUpper and Uppercase_Letter?). Want to have something to do during your Zoom/Teams call? Here you go.

Upping the Fun Factor

It might be amusing to implement actually strengthening the password string. It's one thing to say we're going to replace the third character of a sequence, but another to actually pick the replacement and not inadvertently create a new triple in the process. Let's make some design decisions.

I will want to know how close my string is to being strong. Specifically, I want to know which classes of characters it contains. I'm going to choose to represent that with a bit map, where bits 0, 1, and 2 are booleans that say whether lower case, upper case, or digit characters are present. Then, I'm going to write a simple utility function to determine which classes are present, and to encode that into a single integer.

use constant { C_LC => 1, C_UC => 2, C_DIG => 4, C_OTHER => 8 };
sub hasClass($s)
{
    return ( ($s =~ m/\p{Lower}/) && C_LC)
         | ( ($s =~ m/\p{Upper}/) && C_UC)
         | ( ($s =~ m/\p{Digit}/) && C_DIG);
}

Now, if I know which classes are present, I know the complementary sets of characters that aren't. Let's make a table of those possibilities.

my $LOWER = join("", ("a".."z"));
my $UPPER = join("", ("A".."Z"));
my $DIGIT = join("", ( 0 .. 9 ));

my @NeedClass;
$NeedClass[ 0     | 0    | 0    ] = "$DIGIT$UPPER$LOWER";
$NeedClass[ 0     | 0    | C_LC ] = "$DIGIT$UPPER";
$NeedClass[ 0     | C_UC | 0    ] = "$DIGIT$LOWER";
$NeedClass[ 0     | C_UC | C_LC ] = "$DIGIT";
$NeedClass[ C_DIG | 0    | 0    ] = "$UPPER$LOWER";
$NeedClass[ C_DIG | 0    | C_LC ] = "$UPPER";
$NeedClass[ C_DIG | C_UC | 0    ] = "$LOWER";
$NeedClass[ C_DIG | C_UC | C_LC ] = "$DIGIT$UPPER$LOWER";

sub need($charClass) { return $NeedClass[$charClass]  }

The need function will take a bitmap of what I have, and return a string of missing characters to choose from.

One special case comes up. If I already have all the character classes, it seems like the complementary set should be the empty string. It turns out to be more useful to say that, if I already have all the classes accounted for, and I need another character, then any character from any class would do.

That need function is convenient, but there's a twist: when I choose a character to replace, I don't want to choose its replacement as an adjacent character -- that might inadvertently create a new triplet that would have to be cleaned up. For instance, in the string "aaaAA1", I don't want to replace the third "a" with "A", because it would create "aaAAA1", which needs an extra step to clean up "AAA". What I want is a function that says, "Choose one of these. No, not that one!".

sub randFromExcl($from, $exclude)
{
    $from =~ s/[$exclude]//g if $exclude ne "";
    substr($from, int(rand(length($from))), 1);
}

I would have liked to use tr//d to eliminate characters, but tr has the quirk that you can't interpolate strings into its argument. Using rand is setting myself up for making testing harder, but it feels "stronger."; I'll tap-dance around this later.

We have several of the pieces in place to implement our strategy from above. Let's defer the repeated-character problem for the moment by assuming we have a function that handles it. Then, our main function will look like:

my $opCount;
sub strongPassword($str)
{
    $OpCount = 0;
    $str = rmvRepeat($str);

    while ( (my $have = hasClass($str)) != (C_UC|C_LC|C_DIG) )
    {
        $str .= randFromExcl( need($have), "");
        $OpCount++;
    }

    while ( length($str) < 6 )
    {
        $str .= randFromExcl( need(0), "");
        $OpCount++;
    }

    return $OpCount;
}

Okay, what about removing repeated characters? The core of that is going to be a loop that keeps finding triplets, and replaces the third character of the triplet with something that we need (or any random thing if we have one of each).

sub rmvRepeat($s, $rplc = "")
{
    while ( $s =~ m/(.)\1\1/ )
    {
        my $char  = $1;
        my $notThis = substr($POSTMATCH, 0, 1);

        my $use = ( $rplc eq "" )
                ?  randFromExcl( need(hasClass($s)), "$char$notThis")
                : $rplc;

        $s =~ s/$char$char$char/$char$char$use/;
        $OpCount++;
    }
    return $s;
}

Notes on this function:

, $rplc = "") -- It's annoying to test functions that have random results, so I'm giving myself a way to pass in a known replacement value. If it's empty, I'll generate a random value, but if it's given, I'll use it.
while ( $s = m/(.)\1\1/ ) -- repeat while we keep finding sets of three identical characters. Matching has side effects.
my $char = $1 -- Side effect 1. We captured the repeating character and there it is in $1.
my $notThis = substr($POSTMATCH, 0, 1) -- Side effect 2. When a regular expression matches, Perl also records the piece of the string before the match, the match itself, and the remainder after the match in special variables $`, $&, and $'. There's ample punctuation in Perl, so I added a use English; statement to access the more readable $PREMATCH, $MATCH, and $POSTMATCH. Taking the first character of $POSTMATCH tells me a character I don't want. Long ago, using these variables was discouraged because of possible performance effects, but that was fixed many releases ago.
randFromExcl( need(hasClass($s)), "$char$notThis") -- here's my integration of the parts we developed earlier. hasClass tells me what is already present in $s. need tells me the missing characters I could choose from. I want to exclude the repeating character and the next character after the match, but otherwise any random character from the needed classes would do.
s/\Q$char$char$char/$char$char/$use/ -- Here's the character replacement. What is the \Q? Consider if the input string is '??????'. Then substitution would look like s/???/??x/. But ??? is not a valid pattern. The \Q sanitizes the input by quoting any regular expression meta-characters.
$OpCount++ -- every time I do a replacement, the global variable is bumped. I'm not terribly bothered by using a global variable. I could pedantically pass a reference to a counter variable. Or I could implement a Singleton. Or I could build a class around this (call it PasswordStrengthener maybe) and make the operation count be a member variable. But that would be too much fun, more than people should be allowed to have on a weekly basis.

The complete code, with unit tests, is up on GitHub.

DEV Community

PWC 287 Strength in Numbers

Task 2: Valid Number

Solution

Task1: Strong Password

Thoughts

Number of repeating three-character strings

Character classes

Upping the Fun Factor

Top comments (0)

Read next

WIP Notes working though Render hosting Flask + Vite + React + Wouter

Frontend Challenge - December Edition

Navigating Search Solutions: A Comprehensive Comparison Guide to Meilisearch, Algolia, and ElasticSearch

Is sourcehut git access denied for anyone?