I'm happy to announce the first public release of Hippotable — a tool that lets you analyze data without leaving your browser, on desktop & mobile.
I often analyze small- to mid-sized datasets for work and for fun — e.g. to find out the distribution of a certain bug by platform, or calculate unique affected users. But what tools do I have to help me here?
- Bash lets you `uniq | wc -l`: handy, but making advanced pipelines is hard.
- Google Sheets does the job, but struggles above 10K rows due to all the cruft, and using it for sensitive data such as personal budgets or user data is a no-no.
- Python + Jupyter + pandas can handle any data problem, but it's overkill for my simple use cases and requires a lot of code.
So I set out to build a simple browser-based tool to do the job. Hippotable can:
- Open CSV files up to 100 MB in size.
- Scroll through thousands of rows.
- Filter and sort your data in real time.
- Aggregate / groupby data to gain deeper insights.
- 🏗️ Build powerful data pipelines with multiple filter / aggregate steps.
- Share results with CSV export.
It's also free and open source.
Example
Now, let me walk you through an example of analyzing an annotated movie dataset from Kaggle. Let's start simple and see which countries, on average, make the best movies. Group by country, sort by average rating:
Hm, this looks like a selection of countries which happened to co-produce a decent film once, not that interesting. Let's try again, removing countries that have <10 movies:
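If you think in code, the two steps so far (group by country, then drop countries with too few movies and sort by average rating) boil down to something like the sketch below. It is plain Python, not how Hippotable works internally. The rows, the `country` and `rating` field names, and the `average_rating_by_country` helper are all made up for illustration, and the toy threshold is 2 rather than 10 because the sample is tiny.

```python
from collections import defaultdict

# Made-up rows for illustration only; not taken from the Kaggle dataset.
movies = [
    {"country": "A", "rating": 9.0},
    {"country": "B", "rating": 6.0},
    {"country": "B", "rating": 8.0},
    {"country": "B", "rating": 7.0},
    {"country": "C", "rating": 5.0},
    {"country": "C", "rating": 6.0},
]


def average_rating_by_country(rows, min_movies=1):
    """Group rows by country, drop small groups, sort by mean rating (best first)."""
    ratings = defaultdict(list)
    for row in rows:
        ratings[row["country"]].append(row["rating"])
    # Each result is (country, average rating, number of movies).
    averages = [
        (country, sum(values) / len(values), len(values))
        for country, values in ratings.items()
        if len(values) >= min_movies
    ]
    return sorted(averages, key=lambda item: item[1], reverse=True)


# Without a minimum, a country with one lucky film tops the list.
unfiltered = average_rating_by_country(movies)
# With a minimum group size, one-off countries drop out.
filtered = average_rating_by_country(movies, min_movies=2)
```

In the toy data, country "A" leads the unfiltered ranking on the strength of a single film and disappears once the minimum is applied, which is the same effect as the filter described above.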
Now that's unexpected! In case you're curious, lots and lots of bad films come from Italy:
Combining multiple filter and aggregation layers enables really powerful processing pipelines. For example, here are countries that were home to most great directors (see, not all is lost for Italy):
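To make "multiple layers" concrete, here is a rough Python sketch of a pipeline with two aggregation steps: aggregate movies per director, filter to the "great" ones, then aggregate again per country. This is only an illustration, not Hippotable's implementation. The sample rows, the field names, and the definition of a great director (average rating of at least 8.0) are my assumptions for the example.

```python
from collections import Counter, defaultdict

# Made-up rows for illustration only; not taken from the Kaggle dataset.
movies = [
    {"director": "d1", "country": "A", "rating": 8.5},
    {"director": "d1", "country": "A", "rating": 8.1},
    {"director": "d2", "country": "A", "rating": 6.0},
    {"director": "d3", "country": "B", "rating": 8.2},
    {"director": "d3", "country": "B", "rating": 8.4},
    {"director": "d4", "country": "B", "rating": 8.4},
    {"director": "d5", "country": "B", "rating": 8.8},
]


def great_directors_by_country(rows, min_avg_rating=8.0):
    """Count directors per country whose average rating clears the threshold."""
    # Layer 1: aggregate ratings per (country, director).
    ratings = defaultdict(list)
    for row in rows:
        ratings[(row["country"], row["director"])].append(row["rating"])
    # Filter: keep only directors with a high enough average (assumed definition).
    great = [
        key
        for key, values in ratings.items()
        if sum(values) / len(values) >= min_avg_rating
    ]
    # Layer 2: aggregate again, counting great directors per country,
    # sorted from most to fewest.
    return Counter(country for country, _ in great).most_common()


ranking = great_directors_by_country(movies)
```

Each layer consumes the output of the previous one, which is what makes stacking filter and aggregate steps so flexible.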
That's it for today! Give Hippotable a try and star it on GitHub to help spread the word. Join me next time to learn about the amazing tech I used to make this happen.