HackerRank is an excellent website for writing code against prompt-based challenges, preparing for coding interviews, searching for jobs, and seeing how the community has approached solutions over time. The author wanted to dive into the Python-focused solutions, and is in no way affiliated with HackerRank itself.
The Challenge: Mean, Median, Mode
From [10 Days of Statistics] Day 0: Mean, Median, and Mode:
Output Format
Print 3 lines of output in the following order:
- Print the mean on a new line, to a scale of 1 decimal place.
- Print the median on a new line, to a scale of 1 decimal place.
- Print the mode on a new line; if more than one such value exists, print the numerically smallest one.
Sample Input
10
64630 11735 14216 99233 14470 4978 73429 38120 51135 67060
Sample Output
43900.6
44627.5
4978
The top-voted Python 3 solution came out to be:
Python 3 - Dont reinvent the wheel ;)
import numpy as np
from scipy import stats

size = int(input())
numbers = list(map(int, input().split()))
print(np.mean(numbers))
print(np.median(numbers))
print(int(stats.mode(numbers)[0]))
To those who were introduced to Python through data science courses and tools, this may look like the obvious solution. It is only a good fit, though, if the project already includes the SciPy package.
Wait, Why Could This Be Bad Practice?
The scipy and numpy packages are third-party libraries, and they would have to be added to a requirements.txt, setup.py, or Pipfile in order to make use of them in a project. This adds complexity by piling onto the software supply chain.[1]
Installing scipy (which includes installing numpy as a dependency) results in:
- Downloading ~45 MB worth of files (more than 3,000 files)
- Introducing potential for vulnerabilities in a project
Just this year, numpy had an Arbitrary Code Execution (ACE) vulnerability raised around how numpy.load unpickled data by default, a default which has since changed. The pickle module is known for this vulnerability risk, and has a big red warning about it in the Python docs.[2]
Using these third-party packages is overkill for a project that doesn't already contain them, unless you'd really like to be on the lookout for long GitHub Issue conversations and Common Vulnerabilities and Exposures (CVE) database entries (such as CVE-2019-6446 in this case), trying to decipher how big a problem each one is, if it is a problem at all.
Using Standard Libraries
How can we solve this problem with standard libraries that come with Python?
# With standard lib imports only
from statistics import mean, median

def basicstats(numbers):
    print(round(mean(numbers), 1))
    print(median(numbers))
    print(max(sorted(numbers), key=numbers.count))

input()  # Don't need array length, so ignore input
numbers = list(map(float, input().split()))
basicstats(numbers)
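As a quick sanity check, the same three expressions can be run against the sample input without reading from stdin. This is only a sketch: the names sample_line, sample_mean, sample_median, and sample_mode are illustrative and not part of the original solution.

```python
from statistics import mean, median

# Second line of the sample input (the count line is skipped, as above).
sample_line = "64630 11735 14216 99233 14470 4978 73429 38120 51135 67060"
numbers = list(map(float, sample_line.split()))

sample_mean = round(mean(numbers), 1)
sample_median = median(numbers)
# Every value appears exactly once, so the mode is simply the smallest value.
sample_mode = max(sorted(numbers), key=numbers.count)

print(sample_mean)    # 43900.6
print(sample_median)  # 44627.5
print(sample_mode)    # 4978.0
```

Because the values were parsed as floats, the mode prints as 4978.0 rather than 4978. If a judge compares output strictly, wrapping the mode in int(...) is one possible adjustment.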
Detailed Code Breakdown
from statistics import mean, median
- statistics has been included with Python 3 since Python 3.4 (released in 2014).
- We only want mean and median from this library, so we are explicitly importing each rather than importing the entire library.
- Why aren't we using mode from statistics? Because mode will error out when there is more than one most common value: "...if there is not exactly one most common value, StatisticsError is raised."[3] (That quote describes Python 3.7 and earlier. From Python 3.8 onward, mode returns the first mode encountered instead of raising, which is still not guaranteed to be the smallest.)
  - This is a problem, due to the last requirement of the challenge for mode output: "...if more than one such [mode] value exists, print the numerically smallest one."
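For readers on Python 3.8 or newer, the statistics module also provides multimode(), which returns every most common value as a list. A minimal sketch of meeting the "numerically smallest" requirement that way follows; smallest_mode is a hypothetical helper name, and this is an alternative to the approach used in this article rather than part of it.

```python
from statistics import multimode  # available from Python 3.8


def smallest_mode(numbers):
    """Return the numerically smallest of the most common values."""
    return min(multimode(numbers))


tied = [3.0, 1.0, 3.0, 1.0, 2.0]  # 1.0 and 3.0 both appear twice

print(multimode(tied))      # [3.0, 1.0] (modes in first-seen order)
print(smallest_mode(tied))  # 1.0
```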
input() # Don't need array length, so ignore input
numbers = list(map(float, input().split()))
- We do nothing with the first input(), which is meant to be a count of the numbers being input in the second prompt. It is dropped because it is not needed in order to produce the mean, median, and mode output.
- For numbers, let's start from the innermost parentheses and move outward:
  - input().split() breaks apart the single-string input into a list of strings, as split() defaults to whitespace as the sep delimiter: "If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns []."[4]
  - map(float, input().split()): Here, map() is being used to convert the resulting list of strings into float type values.
  - list(map(...)): The reason we need to convert the map back into a list is because map() returns an iterator. This means we can only iterate over its elements once. If all we wanted was the median, for example, we wouldn't need to convert the map to a list type because we may not care about the values anymore after the median is returned.
- NOTE: Instead of list(map(...)), we could use a list comprehension[5] like so: numbers = [float(number) for number in input().split()]
  This is argued as a better approach on StackOverflow,[6] and if you are up for an interesting side note of history, you can read about how map() was nearly removed from Python 3 at one point.[7]
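The parsing steps above can be seen in isolation with a small sketch. The string raw stands in for what input() would return, and all variable names here are illustrative.

```python
raw = "  64630 11735   14216 "  # stand-in for input(); extra whitespace on purpose

tokens = raw.split()            # runs of whitespace collapse into one separator
lazy = map(float, tokens)       # an iterator; nothing has been converted yet

first_pass = list(lazy)         # consumes the iterator
second_pass = list(lazy)        # the iterator is already exhausted

print(tokens)       # ['64630', '11735', '14216']
print(first_pass)   # [64630.0, 11735.0, 14216.0]
print(second_pass)  # []
print("".split())   # []

# The list comprehension from the NOTE builds the same list in one step.
comprehension = [float(token) for token in tokens]
print(comprehension == first_pass)  # True
```

The empty second_pass is why the solution stores the result of list(map(...)): mean, median, and the mode calculation each need to read the values again.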
After we have our list of floats, basicstats(numbers) is called, running the following:
def basicstats(numbers):
    print(round(mean(numbers), 1))
    print(median(numbers))
    print(max(sorted(numbers), key=numbers.count))
- print(round(mean(numbers), 1)): let's start from the innermost parentheses and move outward to see what we are printing out:
  - mean(numbers): Simply returns the mean without a third-party package!
  - round(mean(numbers), 1) rounds the resulting float to one number after the decimal point (per requirements).
- print(median(numbers)): Simply returns the median without a third-party package!
- print(max(sorted(numbers), key=numbers.count)): how is this providing the mode?
  - sorted(numbers): First, we need the list sorted, as we are only meant to return the lowest-value mode if there is more than one. This is needed for max(...) to properly return the lowest value we want.
  - max(sorted(numbers), key=numbers.count): Providing key=numbers.count as an arg ensures we get the value with the highest count within the list. max() only returns a single value, and when several values tie for the highest count it returns the first one it encounters, which is the lowest (due to us using sorted(numbers)).
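To see why the sorted() call matters, here is a small sketch with a deliberately tied list. The data and variable names are made up for illustration.

```python
numbers = [3.0, 1.0, 3.0, 1.0, 2.0]  # 1.0 and 3.0 both appear twice

# Without sorting, max() keeps the first value that reaches the top count.
unsorted_pick = max(numbers, key=numbers.count)

# With sorting, the first value to reach the top count is also the smallest.
sorted_pick = max(sorted(numbers), key=numbers.count)

print(unsorted_pick)  # 3.0
print(sorted_pick)    # 1.0
```

One trade-off worth knowing: numbers.count scans the whole list each time it is called, so this approach does work proportional to the square of the list length. That is fine for challenge-sized input but can be slow for very large lists.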
Optional Approach to Retrieving Mode: Using Counter()
Instead of max(), we could alternatively use Counter()[8] from collections, which is argued to be a better approach to this problem.[9] Counter() was added to the collections module way back with Python 2.7.0 (released in 2010):
# With standard lib imports only
from statistics import mean, median
from collections import Counter

def basicstats(numbers):
    print(round(mean(numbers), 1))
    print(median(numbers))
    # Optional approach to 'mode'
    print(Counter(sorted(numbers)).most_common(1)[0][0])

input()  # Don't need array length, so ignore input
numbers = list(map(float, input().split()))
basicstats(numbers)
- Counter(sorted(numbers)).most_common(1)[0][0], working from the inside out:
  - sorted(numbers) is needed for the later call of most_common() to return the lowest mode. This relies on Counter keeping elements with equal counts in the order they were first encountered, which is the behavior on Python 3.7 and later.
  - Counter(...): Creates a dictionary with count values of all elements in the list.
  - Counter(...).most_common(1): Returns a list of tuples. Using 1 as an arg means it returns only one tuple, being the first value that appears the most often.
  - Counter(...).most_common(1)[0][0]: The first [0] means we are calling the tuple in the 0 index position of the list, with the second [0] calling the 0 index value of that tuple.
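Printing the intermediate values makes the chain easier to follow. This sketch uses a made-up tied list, and the final variant (explicit) is an assumption-free alternative added for illustration, since it does not depend on the ordering of tied counts.

```python
from collections import Counter

numbers = [3.0, 1.0, 3.0, 1.0, 2.0]  # 1.0 and 3.0 both appear twice

counts = Counter(sorted(numbers))    # keys are inserted in ascending order
top = counts.most_common(1)          # a list holding one (value, count) tuple
smallest_mode = top[0][0]            # first tuple, then its first item

print(counts[1.0], counts[2.0], counts[3.0])  # 2 1 2
print(top)            # [(1.0, 2)]
print(smallest_mode)  # 1.0

# Variant that does not rely on tie ordering: take the minimum of all
# values that share the highest count.
highest = max(counts.values())
explicit = min(value for value, count in counts.items() if count == highest)
print(explicit)       # 1.0
```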
Conclusion
There are many ways to come to a solution, and depending on the situation, some are better than others. If packages like scipy and/or numpy are already included within a project, it certainly makes sense to use them.
Though, it is a great idea to take a look at whether built-in or standard libraries can solve a problem before looking into third-party solutions. This helps you:
- Learn what Python is capable of out-of-the-box
- Make your code more portable for use in other projects without installing additional resources
- Reduce the security complexity of the software supply chain[1] by avoiding unnecessary inclusion of third-party packages
Was this helpful? Have thoughts to add? Please add to the conversation below!
Originally published at https://icanteven.io on October 5th, 2019
[1] Software Supply Chain: Fewer, Better Suppliers. Written by Shannon Lietz @ DevSecOps, 2016.
[5] Comprehending Python's Comprehensions. Written by Dan Bader @ dbader.org.
[7] The Fate of reduce() in Python 3. Written by Guido van Rossum, 2005. NOTE: He's the creator, and previous BDFL, of Python. The article includes thoughts on map(), filter(), and lambda.
[9] StackOverflow: Python - Find The Item with Maximum Occurrences in A List.