F1-GOAT.com
This one started as a genuine data science problem and ended as a voting website. That’s not quite the failure it sounds like.
The question I wanted to answer was one that comes up constantly in Formula 1 circles: who is the greatest driver of all time? I was sick of the argument, not because it isn’t interesting but because most people were making it badly. You can’t compare a 1962 driver with a 2022 driver and pretend the numbers mean the same thing. Different cars, different points structures, different numbers of races per season, different eras of safety. The early drivers were talented beyond measure and also, frankly, dying at a rate that puts the modern debate in a fairly sharp perspective.
I wanted to normalise all of it. Take the Kaggle dataset, clean it up, and ask questions like: what would a 1962 season look like mapped onto a 2022 points structure? What would modern championship tallies look like if only the top five finishers had scored points? Strip away the era and see what’s underneath.
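A minimal sketch of that re-scoring idea, not the code from the project. It assumes the results have already been reduced to (race, driver, finishing position) tuples; the `rescore` function, the sample rows and the "top five only" scale are all made up for illustration. The two real scales are simplified: the 2022 one ignores the fastest-lap bonus point and sprint points, and the 1962 one ignores the rule that only a driver's best results counted.

```python
# Points for finishing positions 1, 2, 3, ... under each scale.
POINTS_2022 = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]  # top ten score (no fastest-lap bonus)
POINTS_1962 = [9, 6, 4, 3, 2, 1]                   # top six score (dropped results ignored)
POINTS_TOP_FIVE = POINTS_2022[:5]                  # hypothetical "only the top five score" scale


def rescore(results, scale):
    """Total each driver's points under `scale`.

    `results` is an iterable of (race, driver, position) tuples, where
    position is a 1-based finishing place or None for a non-finish.
    Returns (driver, points) pairs, highest total first.
    """
    totals = {}
    for _race, driver, position in results:
        totals.setdefault(driver, 0)  # keep drivers who never score
        if position is not None and 1 <= position <= len(scale):
            totals[driver] += scale[position - 1]
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Invented three-race season with placeholder drivers A, B and C.
SAMPLE = [
    ("R1", "A", 1), ("R1", "B", 2), ("R1", "C", 7),
    ("R2", "A", 3), ("R2", "B", 1), ("R2", "C", 6),
    ("R3", "A", None), ("R3", "B", 2), ("R3", "C", 1),
]

for name, scale in [("1962", POINTS_1962), ("2022", POINTS_2022), ("top five", POINTS_TOP_FIVE)]:
    print(name, rescore(SAMPLE, scale))
```

Even on three invented races the gap between second and third changes with the scale, which is the whole reason raw points totals don't travel between eras.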
It was considerably more work than I’d anticipated.
I started in Jupyter, using the Python I knew. Hit the limits of what I knew fairly quickly. Switched to RStudio, tried the same problems from a different angle. Hit those limits too. The analysis I wanted to do was ahead of what I could execute at the time, and I didn’t have the same quality of AI assistance available that I’d take for granted now. So I scaled back.
What I ended up with was still interesting — Python-generated visualisations of things like wins by constructor, wins by engine manufacturer, wins by country of driver and team, rendered as bar chart races. Genuinely fun to watch. And alongside all of that, a simple mechanic: vote for who you think is the GOAT, and vote for who you think is the TOAD.
The site was built with Vite, with Firebase on the back end and no authentication required for voting. That last decision came back to bite me. One day I had a few hundred votes. The next I had somewhere between twenty thousand and twenty-four thousand, with no corresponding traffic spike that could explain it. Someone had figured out the Firebase store and decided to have a bit of fun. The data was ruined.
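For what it's worth, the fix is conceptually small. The sketch below is plain Python, not Firebase code and not what the site ran: it assumes each vote arrives with a `voter_id` supplied by some sign-in step, and the `VoteBox` class and its method names are hypothetical. The idea is simply to key each ballot on the voter, so repeat submissions overwrite rather than accumulate.

```python
class VoteBox:
    """Stores at most one ballot per (voter, category) pair."""

    def __init__(self):
        # Maps (voter_id, category) -> chosen driver.
        self._ballots = {}

    def cast(self, voter_id, category, driver):
        # Reject anonymous votes outright; voter_id is assumed to come
        # from an authentication step that is not shown here.
        if not voter_id:
            raise PermissionError("voting requires a signed-in user")
        # A repeat vote replaces the earlier one instead of adding to it.
        self._ballots[(voter_id, category)] = driver

    def tally(self, category):
        # Count one ballot per voter for the requested category.
        counts = {}
        for (_voter, ballot_category), driver in self._ballots.items():
            if ballot_category == category:
                counts[driver] = counts.get(driver, 0) + 1
        return counts


box = VoteBox()
for _ in range(20_000):              # one account hammering the endpoint
    box.cast("user-1", "GOAT", "Driver A")
box.cast("user-2", "GOAT", "Driver B")
print(box.tally("GOAT"))
```

This only limits each account to one vote per category. It does nothing about someone creating accounts in bulk, which would need rate limiting or similar on top.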
I did promote it — posted about it on Twitter more times than I’d care to admit, got some real visitors, got some real votes before the stuffing incident.
The domain has since lapsed. Someone else has it now.
What I got out of it was a proper introduction to data cleaning, a working knowledge of R alongside Python, some experience with data visualisation that went beyond bar charts in a notebook, and a reminder that if you’re storing votes in Firebase you should probably make people log in first.
