Over the past couple of days in the US, the Democrats have been debating who should get to run against Trump. The transcripts seemed like a fun subject for dataviz.
All the code for these visualizations is posted here, spread across various commits.
First, I thought it would be helpful to make a simple bar chart showing how much each candidate spoke.
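For anyone who wants to try the counting step, here is a minimal sketch using only the standard library. It is not the code from the linked commits: the "SPEAKER: utterance" line format, the placeholder speaker names, and the helpers `words_per_speaker` and `text_bar_chart` are all assumptions made for illustration, and the bars are plain text rather than a real plot.

```python
from collections import Counter

# Hypothetical transcript lines in "SPEAKER: utterance" form. The real
# transcript's format and speaker names may differ.
transcript = [
    "SPEAKER_A: we need to talk about health care",
    "SPEAKER_B: I agree",
    "SPEAKER_A: and about jobs",
    "SPEAKER_C: let me respond to that point directly",
]


def words_per_speaker(lines):
    """Return a Counter mapping each speaker to their total word count."""
    totals = Counter()
    for line in lines:
        speaker, sep, utterance = line.partition(":")
        if not sep:
            continue  # skip lines that carry no speaker label
        totals[speaker.strip()] += len(utterance.split())
    return totals


def text_bar_chart(totals, width=40):
    """Render the counts as horizontal text bars, largest first."""
    if not totals:
        return ""
    longest = max(totals.values())
    label_width = max(len(name) for name in totals)
    rows = []
    for name, count in totals.most_common():
        bar = "#" * max(1, round(width * count / longest))
        rows.append(f"{name.ljust(label_width)} | {bar} {count}")
    return "\n".join(rows)


totals = words_per_speaker(transcript)
print(text_bar_chart(totals))
```

Counting words rather than lines is a judgment call; counting speaking turns or characters would give a different picture of who dominated the stage.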
Note: as you'll see, I didn't take time to ensure a perfect cleanse of the data. There are some artifacts and errors, which will be obvious in the word clouds.
I was also surprised to find that if you create a TF-IDF based distance matrix...
... The speakers sort themselves out nicely. The lowest-polling person I've seen described as T1 is Mayor Pete, and the pattern holds whether or not you count him as T1.
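To make the idea of a TF-IDF based distance matrix concrete, here is a rough standard-library sketch. It treats everything one speaker said as a single document, weights terms with a smoothed TF-IDF, and measures cosine distance between speakers. The sample text, the speaker names, the choice of cosine distance, and the helpers `tfidf_vectors`, `cosine_distance`, and `distance_matrix` are assumptions for illustration; the code in the linked commits may use a library and different settings.

```python
import math
from collections import Counter

# Hypothetical per-speaker text. In practice each value would be all of one
# candidate's transcript lines joined together.
speaker_text = {
    "SPEAKER_A": "health care health care jobs",
    "SPEAKER_B": "health care taxes",
    "SPEAKER_C": "climate climate energy",
}


def tfidf_vectors(docs):
    """Map each document name to a {term: tf-idf weight} dict."""
    counts = {name: Counter(text.lower().split()) for name, text in docs.items()}
    n_docs = len(counts)
    # Number of documents each term appears in (each term counted once per doc).
    doc_freq = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        vectors[name] = {
            # Smoothed IDF, so a term found in every document keeps some weight.
            term: (n / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, n in c.items()
        }
    return vectors


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(docs):
    """Return (sorted names, square matrix of pairwise cosine distances)."""
    vectors = tfidf_vectors(docs)
    names = sorted(vectors)
    matrix = [[cosine_distance(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


names, matrix = distance_matrix(speaker_text)
for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```

Speakers who share vocabulary end up close together, while speakers with no terms in common sit at distance 1. Feeding a matrix like this into a clustering or ordering step is one way speakers could end up visibly grouped.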
Does this mean anything? I don't think so, at least not all on its own.
Finally, here are some word clouds:
Overall I think this was a fun little exercise, but I don't suspect that it says too much about the race.
Let me know what you think! Especially if you notice a mistake.
Top comments (6)
The word clouds are really difficult to read due to the colors, but regardless, this is great!
Thanks for the feedback! I also did a light-themed version but in my quick testing I felt that it looked worse. You can check it out below:
Let me know if you think that's better.
Probably easier to read, but both versions have contrast issues. The purples on the dark background, the yellows on the light background.
Additionally, word clouds are a bit difficult to read generally.
Yeah, I agree with that. The Python wordcloud package does the best it can.
Any chance we could get links to large format images? They look interesting but are small (when I click on them). I esp. like the word cloud concept.
Thanks! The source images are in the kaggle link I shared at the top.
kaggle.com/charleslandau/democrati...