DEV Community

Lillian McVeigh for Codecademy


A Day in the Life: Catherine Zhou, Codecademy Data Scientist

This episode of A Day in the Life features Catherine, Codecademy's Data Science Lead. In addition to the breadth of her data science expertise, Catherine is an avid traveler, a keen observer, and a Chemex coffee drinker.

Meet Catherine! ☕


1. Tell us a little about yourself.

Hi, I’m Catherine! I manage the data science team at Codecademy, where I was hired as the company’s first in-house data scientist.

At Codecademy, we’ve accumulated tens of millions of users over the years, which means we have petabytes of data (if you don’t know how big a petabyte is… it’s a lot). Our job is to find patterns within the data to help us understand our learners and make smarter business decisions. My programming toolbox includes SQL, R, and sometimes Python.

I was born and raised in NYC and never really left! In college, I majored in Sociology and learned relevant research methods: experimental design, data collection, data cleaning, and statistical modeling. That said, I had to do a ton of outside learning to keep up with new technologies and frameworks. You can learn all the necessary skills in school, but there’s still a ramp-up period while you learn how to apply them in the real world.

2. How did you end up working for Codecademy?

About two years ago, a recruiter reached out to me about a role as Codecademy’s first data scientist. I had been working for large Fortune 500 companies for a long time, where teams specialize in one research area or domain with little cross-departmental collaboration. A lot of the data infrastructure was siloed, which meant I couldn't readily access data from other parts of the company. I wanted to move back into a data science generalist role at a smaller startup, where I could work on a variety of things. I’ve always been interested in education and the future of technology, so the problems Codecademy was working on really resonated with me.

3. Did you always want to be a Data Scientist?

Although the term “Data Science” is fairly new, statistics-based quantitative research has existed for a long time. In high school, I was super interested in how statistics can be used to build a greater understanding of people and society. I liked reading books and articles that cited studies and surveys to help contextualize current events.

This might be weird, but I was also really into probabilistic thinking and used to think about how it applied to my day-to-day decisions. I would try to calculate things like: if I miss this traffic light, what are the chances I’ll miss the next two lights? How much longer would that make my commute?

I knew pretty early on that I wanted to work in a research role that involved analyzing human behavior, but I didn't know what kinds of jobs like that were out there. I used statistical computing software for the first time in high school, but I didn't become serious about programming until after college. I've worked a lot of different jobs in the past: law firm filing, patent copyeditor, community organizer, bike mechanic.

4. What are the best aspects of working as a Data Scientist?

The best moments are when you find something in the data that gets other people excited. A lot of people think the most exciting part of data science is building machine learning models. It’s true that getting a model up and running can feel like a personal accomplishment, but that work isn't always interesting or accessible to the people around you. The best parts of the job are the inspiring human moments, such as discovering that we have thousands of active Codecademy users coming from K-12 schools everywhere from Chicago Public Schools to, all the way around the world, New South Wales Public Schools in Australia.

5. What are the worst aspects of working as a Data Scientist?

I think there are a lot of misconceptions about what data science is, what skills it takes to be a data scientist, and the role of data scientists in the workplace. Data science, as it exists today, is a pretty amorphous space. This can cause confusion and mismatched expectations between data scientists and non-data scientists. It also can lead to imposter syndrome and confusion about career trajectory (I talk to so many people working in data science who aren’t sure if they can call themselves data scientists!). I'm optimistic that this will improve over time, but in the meantime, a big part of my job includes educating others about the field of data science and aligning on expectations.

6. If you could make one piece of fictional tech reality, what would it be?

Oof, I’m a pretty realistic person, so this question is hard for me lol. I guess it would be the Pensieve from Harry Potter, which Dumbledore uses to extract his memories and preserve them. I have a pretty bad memory and I love learning, so it would be so useful if archiving memories were a real thing.

7. Do you have any advice for the learners?

I’ll recycle this tweet from Nate Silver: “A few times a year, I get asked to be a judge of student statistical projects in politics or sports. While the students are very bright, they spend WAY too much time using fancy statistical methods and not enough time framing the right questions and contextualizing their answers.”

I often see people prematurely dump their data into black box models or neural nets without doing any of the necessary exploratory analysis and general background research on where the data comes from. As data scientists, our job is to ask and answer the right questions about the data. This entails taking the time to understand the context and using that information to shape your approach. Make sure your results are interpretable and grounded in the right context. Be prepared to justify any assumptions you make about your data.

8. If you could make one brand-new course, what would it be?

Causal Inference (with R)!

9. What does a typical day look like for you?

Mornings

  • We’re currently building out the data science team at Codecademy, so I’ve been doing a lot of hiring calls in the mornings. I’m not a morning person, so I try to do the type of work in the mornings that I can ease into.

Lunch

  • We’re super lucky in that we get catered lunch most days. This means I don’t have to deal with Manhattan lunch crowds or throw leaky tupperware in my backpack. I try to spend time over lunch chatting with coworkers. Codecademy is a pretty quirky and nerdy place to be. My coworkers keep me inspired and motivated.

Afternoons

  • At most tech companies, technical managers are expected to do a mix of IC (individual contributor) and people management work. Mondays and Tuesdays are heavy meeting days for me. This is when I have my 1-on-1s and meetings with other teams to align on projects. As a manager, it's my job to make sure my team feels focused and engaged, and to balance their needs with the needs of the company.
  • I typically do IC work Wednesday through Friday. Examples of projects we've worked on include building a model of user engagement and retention, redesigning our database and ETLs to improve speed and performance, and designing experiments and choosing the right statistical tests to evaluate the results.

After Work

  • I usually leave the office between 5 and 7 pm, depending on my workload. On Tuesdays, I stay a little later because Codecademy has a soccer team and our games run between 7 and 10 pm. Other days, I go home and relax with my partner or hang out with friends and family.

A fun fact about me is:

I didn't get to travel growing up, but I have visited 40 countries over the past 7 years. I travel a few times a year, and I try to take my parents with me at least once a year.

I mention this because my last piece of advice is to make sure your life isn't defined solely by your work or education goals. That is the quickest way to burn out. Make sure you take time for yourself, your friends, and your family. And take a step back from time to time to reflect on the progress you've made!

Catherine speaking at the NYC Open Statistical Programming Meetup.

Catherine (right) with Codecademy Curriculum Developers, Natalia (left) and Ian (middle) at the New York R Conference.
