Ben Steadman

Posted on Apr 21, 2019 • Edited on Jun 17, 2019

Advent of Code 2018 Day 2: Inventory Management System

#aoc2018 #python #puzzle #challenge

This post originally appeared on steadbytes.com

Complete solution can be found on GitHub

Part One

We are asked to calculate a checksum for the lis of box IDs in the puzzle input.

A checksum is defined as the number of IDs containing exactly two of any letter multiplied by t the number of IDs containing exactly three of any letter.

Parse the input to get a list of box IDs
Maintain a counter for the total number of IDs containing exacly two of any letter and a counter for the total number of IDs containing exactly three of any letter
For each box ID:
- Find the frequency of each letter
- Increment the appropriate counter if any letter occurs exactly two/three times
Multiply the final values of the two counters to obtain a checksum

Parsing Input

Each line of the puzzle input represents a box ID:

pbopcmjeizuhxlqnwasfgtycdm
pjokrmjeizuhxlqnfasfguycdv
pbokrpjejkuhxlqnwasfgtycdv
sbokrmjeizuhxaqnwastgtycdv
pbokrmjeizuhxljnnasfgtycn

We could read the lines one at a time, however I'm going to read them all into a list for re-use in part two.

This is feasible as the input file is small (6750 bytes)

with open("input.txt") as f:
    box_ids = [l.strip() for l in puzzle_input_f.readlines()]

The use of str.strip to remove trailing/leading whitespace isn't strictly necessary as Eric Wastl has provided us tidy, clean input data. However it's usually a good practice when dealing with external data where trailing/leading whitespace isn't desired.

Box ID Letter Frequency

A simple approach could be to iterate over the box ID and for each character search the remaining characters for repeat occurences:

frequencies = {}
for i, ch in enumerate(box_id):
    if ch not in frequencies:
        frequencies[ch]
    for j, ch2 in enumerate(box_id):
        if i != j and ch == ch2:
            frequencies[ch] += 1

This works, but we can do better! Some issues with this approach:

Inefficient O(n²) runtime caused by the nested loop. For each letter in the box ID, it 'looks' at every other letter.
The problem doesn't require keeping track of what the letters are, only their frequencies

To improve this, we can first reduce the box ID to it's unique letters, then count the occurences of each within the full box ID. Uniquifying a string in Python can be done by creating a set from the string (set will appear a lot throughout this series):

box_id = "pboxwmjeituhxlqnwasfgtycdv"
unique_letters = set(box_id)

Then we can use str.count to retrieve the frequencies for each unique letter:

freqs = [box_id.count(l) for l in unique_letters]

Comparing performance of the two by average execution time over 700000 loops:

Approach 1: 30.4 µs
Approach 2: 3.54 µs

Calculating The Checksum

With the list of letter frequencies, calculating the checksum requires finding the number of times a frequency is exactly two, the number of times a frequency is exactly three and then multiplying those counts together:

def part_1(box_ids):
    twos = 0
    threes = 0
    for _id in box_ids:
        unique_letters = set(_id)
        freqs = [_id.count(l) for l in unique_letters]
        twos += 1 if [f for f in freqs if f == 2] else 0
        threes += 1 if [f for f in freqs if f == 3] else 0
    return twos * threes

For my puzzle input, this gives a checksum of 6000

Part Two

We are told there are only two correct box IDs in the list and that they are the only two that differ by exactly one character. We are asked to find the common letters between these two box IDs.

For each box ID in the list, compare it with every other box ID as follows:
- Compare the letters at each position in the two box IDs
- If the letters differ, increment a counter
- If after comparing all the letters the counter is equal to one, return the two box IDs
  - We can return early here to prevent searching the remaining box IDs as we are told in the problem that there is only one pair of box IDs matching this invariant.

To compare each possible pair of box IDs we could use a double for loop:

for id_1 in box_ids:
    for id_2 in box_ids:
        # compare

However, I am going to use itertools.product to be more Pythonic:

import itertools

for id_1, id_2 in itertools.product(box_ids, box_ids):
    # compare

Then to count the differences between the letters of box ID pair:

diffs = 0
for i, letters in enumerate(zip(id_1, id_2)):
    if letters[0] != letters[1]:
        diffs +=1

The correct box IDs can then be found by breaking out of the loop when diffs is equal to one:

for id_1, id_2 in product(box_ids, box_ids):
    diffs = 0
    for i, chars in enumerate(zip(id_1, id_2)):
        if chars[0] != chars[1]:
            # we know these aren't the correct box IDs -> stop searching early
            if diffs > 1:
                break
            else:
                diffs += 1
    if diffs == 1:
        # id_1 and id_2 are correct
        break

Since only one letter is different between the two correct box IDs, finding the letters that are the same can be achieved by splitting the ID's at the position i where the differing letter is located and joining the remaining pieces together:

id_1[i:] + id_2[i:]

Putting it all together:

def part_2(box_ids):
    for id_1, id_2 in product(box_ids, box_ids):
        diffs = 0
        for i, chars in enumerate(zip(id_1, id_2)):
            if chars[0] != chars[1]:
                if diffs > 1:
                    break
                else:
                    diffs += 1
        if diffs == 1:
            return id_1[:i] + id_2[i:]

For my puzzle input, the result is "pbykrmjmizwhxlqnasfgtycdv".

DEV Community

Advent of Code 2018 Day 2: Inventory Management System

Part One

Parsing Input

Box ID Letter Frequency

Calculating The Checksum

Part Two

Resources

Top comments (0)

Read next

10 Python Scripts to Automate Your Daily Tasks

Building Enterprise Agent Systems: Core Component Design and Optimization

#? List vs Tuples in python

DUCK (file structure) YOU!