DEV Community

Jeff Hale

The Weird World of Missing Values in Pandas

If you use the Python pandas library for data science and data analysis things, you'll eventually see NaN, NaT, and None in your DataFrame. These values all represent missing data. However, there are subtle and not-so-subtle differences in how they behave and when they appear.

Let's take a look at the three types of missing values and learn how to find them.

NaN, NaT, and None

NaN

If a column is numeric and you have a missing value, that value will be a NaN. NaN stands for Not a Number.

NaNs are always floats. So if you have an integer column and a NaN is added to it, the column is upcast to a float column. This behavior may seem strange, but it stems from NumPy, which (as of this writing) has no native missing-value marker for integer arrays. Using the float NaN to mark missing numbers was largely a choice made for simplicity and performance. The pandas dev team is hoping NumPy will provide a native NA solution soon.
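Here's a quick sketch of that upcasting, assuming a recent pandas and NumPy. Reindexing to a label that has no data is just one convenient way to sneak a NaN into an integer Series.

```python
import numpy as np
import pandas as pd

# An all-integer Series gets an integer dtype
ints = pd.Series([1, 2, 3])
print(ints.dtype)  # int64

# Label 3 has no data, so pandas fills it with NaN.
# NaN is a float, so the whole Series is upcast to float64.
expanded = ints.reindex([0, 1, 2, 3])
print(expanded.dtype)     # float64
print(expanded.tolist())  # [1.0, 2.0, 3.0, nan]

# NaN itself is just a Python float
print(isinstance(np.nan, float))  # True
```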

NaT

If a column is a DateTime and you have a missing value, then that value will be a NaT. NaT stands for Not a Time.
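For example, parsing a column of date strings that includes a None gives you a datetime column with a NaT in the gap. The dates below are made up for illustration.

```python
import pandas as pd

# None can't be parsed as a date, so it becomes a missing timestamp
dates = pd.to_datetime(pd.Series(["2021-01-01", None, "2021-03-01"]))

print(dates.dtype.kind)        # M (a datetime64 dtype)
print(dates.iloc[1])           # NaT
print(dates.isna().tolist())   # [False, True, False]
```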

None

A pandas object dtype column - the dtype for strings as of this writing - can hold None, NaN, NaT or all three at the same time!
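Here's a minimal sketch of an object column holding all three at once. The values are made up, and dtype=object is passed explicitly so the example doesn't depend on how your pandas version infers a dtype for string data.

```python
import numpy as np
import pandas as pd

# dtype=object is set explicitly so the result doesn't depend on how
# your pandas version infers a dtype for string data
mixed = pd.Series(["apple", None, np.nan, pd.NaT], dtype=object)

print(mixed.dtype)              # object
print(mixed.iloc[1] is None)    # True: None is stored as-is
print(mixed.iloc[3] is pd.NaT)  # True: so is NaT
print(mixed.isna().tolist())    # [False, True, True, True]
```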

What are these NaN values anyway?

NaN is a NumPy value: np.nan. (Older code often writes np.NaN, an alias that was removed in NumPy 2.0.)
NaT is a Pandas value. pd.NaT
None is a vanilla Python value. None

However, they display in a DataFrame as NaN, NaT, and None.
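You can confirm where each value comes from by checking its type. This is a small sketch; the full repr of the NaT class varies between pandas versions, so the comment only names the class.

```python
import numpy as np
import pandas as pd

print(type(np.nan))   # <class 'float'>: NumPy's NaN is a plain Python float
print(type(pd.NaT))   # pandas' own NaTType class
print(type(None))     # <class 'NoneType'>: the built-in Python value
```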

Strange Things Are Afoot with Missing Values


Behavior with missing values can get weird. Let's make a Series with each type of missing value.

import numpy as np
import pandas as pd

pd.Series([np.nan, pd.NaT, None])

0   NaT
1   NaT
2   NaT
dtype: datetime64[ns]

Pandas created the Series as a DateTime dtype. Ok.

You can cast it to an object dtype if you like.

pd.Series([np.nan, pd.NaT, None]).astype('object')

0    NaT
1    NaT
2    NaT
dtype: object

But you can't cast it to a numeric dtype. pandas raises a TypeError, although the exact wording of the message depends on your pandas version.

pd.Series([np.nan, pd.NaT, None]).astype('float')

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-255-66ec4de18835> in <module>
    ----> 1 pd.Series([np.nan, pd.NaT, None]).astype('float')
     ...
    TypeError: cannot astype a datetimelike from [datetime64[ns]] to [float64]

Also note that you can change an object column with Nones into a numeric column with pd.to_numeric. No problem.
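Here's a small sketch of that conversion, using a made-up object column of numeric strings with a None mixed in. Because the result contains a NaN, pandas gives it a float dtype.

```python
import pandas as pd

# An object column holding numeric strings and a None
raw = pd.Series(["1", None, "3"], dtype=object)

# pd.to_numeric parses the strings and turns None into NaN;
# the NaN forces a float64 result
nums = pd.to_numeric(raw)
print(nums.dtype)     # float64
print(nums.tolist())  # [1.0, nan, 3.0]
```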

Equality Check

Another bizarre thing about missing values in Pandas is that some varieties are equal to themselves and others aren't.

NaN doesn't equal NaN.

np.nan == np.nan

    False

And NaT doesn't equal NaT.

pd.NaT == pd.NaT

    False

But None does equal None.

None == None

    True

Fun! 😁
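The practical upshot: you can't reliably find missing values with ==. Here's a sketch contrasting == with pd.isna(), which recognizes all three markers.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Because NaN != NaN, comparing with == never matches the missing value
print((s == np.nan).tolist())  # [False, False, False]

# isna() flags it correctly
print(s.isna().tolist())       # [False, True, False]

# pd.isna() also works on single values of all three kinds
print(pd.isna(np.nan), pd.isna(pd.NaT), pd.isna(None))  # True True True
```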

Now let's turn our attention to finding missing values.

Finding Missing Values with df.isna()

Use df.isna() to find NaN, NaT, and None values. They all evaluate to True with this method.

A boolean DataFrame is returned if df.isna() is called on a DataFrame, and a boolean Series is returned if it's called on a Series.
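Here's a sketch of both cases, using a tiny hypothetical DataFrame.

```python
import numpy as np
import pandas as pd

# A tiny hypothetical DataFrame with a couple of missing values
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})

mask_df = df.isna()      # called on a DataFrame -> boolean DataFrame
mask_s = df["b"].isna()  # called on a Series -> boolean Series

print(type(mask_df).__name__)  # DataFrame
print(type(mask_s).__name__)   # Series
print(mask_s.tolist())         # [False, True]
```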

Let's see df.isna() in action! Here's a DataFrame with all three types of missing values:

[Image: DataFrame with all three types of missing values]

Here's the code to return a boolean DataFrame with True for missing values.

df.isna()

[Image: boolean DataFrame returned by df.isna(), with True wherever a value is missing]

A one-liner that flags every missing value in your DataFrame is pretty cool. Deciding what to do with those missing values is a whole other question that I'll be exploring in my upcoming Memorable Pandas book.

Note that it's totally fine to have all three Pandas missing value types in your DataFrame at the same time, assuming you are okay with missing values.
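If you want to reproduce something like the DataFrame in the screenshot, here's a sketch that builds a small hypothetical DataFrame with one of each missing-value type. The column names and values are made up, not the ones from the screenshot. Summing the boolean mask from df.isna() counts the missing values in each column, since True counts as 1.

```python
import numpy as np
import pandas as pd

# A hypothetical DataFrame with one of each missing-value type
df = pd.DataFrame({
    "price": [9.99, np.nan, 4.50],                                  # float64 column -> NaN
    "shipped": pd.to_datetime(["2021-01-05", None, "2021-02-10"]),  # datetime column -> NaT
    "note": pd.Series(["fragile", None, "gift"], dtype=object),     # object column -> None
})

print(df)
print(df.isna())

# Summing the boolean mask counts missing values per column
print(df.isna().sum())
```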

Wrap

I hope you found this intro to missing values in the Python pandas library to be useful. 😀

If you did, please do all the nice things on Dev and share it on your favorite social media so other people can find it, too. 👏

I write about Python, Docker, and data science things. Check out my other guides if you're into that stuff. 👍

You don't want to MISS them! (Missing values. Get it?) 🙄

Thanks to Kevin Markham of Data School for suggestions on an earlier version of this article!
