Lessons learned from 6 months of operating a teensy-tiny news archive

The best websites are home-cooked meals. Andrew’s Selkouutiset Archive was born after I realized there was no obvious way to fetch the previous articles of the “Easy Finnish” daily news broadcast. This annoyed me as a student of the language. “Here we have a stream,” I thought, “of high-quality, human-written, interesting practice material, and no easy way to access it!” So I went out of my way to build one, and my language skills and I have been profiting from it ever since....

June 1, 2024

fd + xargs + bat = quick document review

I’ve been on vacation this week, and part of what I’ve been up to is fixing up the Selkouutiset Archive. Like most of my websites these days, SA is powered by Hugo, which means handling a lot of Markdown documents, which means I opted to use an intermediate Git repo as a submodule to actually store the custom-processed documents. After a few tweaks here and there, I found myself wanting to quickly flip through all of the Markdown documents I had generated for each news day....
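As a rough illustration of the pipeline the title names, here is a minimal sketch, not necessarily the command the post itself arrives at. The function name `review_docs` and its optional directory argument are hypothetical; the sketch assumes `fd` and `bat` are installed under those names (some distributions ship them as `fdfind` and `batcat`).

```shell
# Hypothetical helper: flip through every Markdown file under a directory.
# Usage: review_docs [directory]   (defaults to the current directory)
review_docs() {
    # fd:    "." is a catch-all pattern, --extension md keeps only *.md files,
    #        and --print0 separates the paths with NUL bytes.
    # xargs: -0 reads those NUL-separated paths, so file names containing
    #        spaces survive intact, and hands them all to bat.
    # bat:   shows each file with a header and syntax highlighting.
    fd --extension md --print0 . "${1:-.}" | xargs -0 bat
}
```

Running `review_docs` from the directory that holds the generated documents should show them one after another in bat's pager. The order in which fd lists files is not guaranteed.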

February 25, 2024

Outage postmortem: Why didn't SelkoArchive get today's news?

For some reason, my daily archive of Finland’s “clear news” broadcast didn’t work today. Why not? TL;DR: Just a Git snafu. Quick recap of the archive: There are 3 Git repos: selkouutiset-scrape simply scrapes the HTML of https://yle.fi/selkouutiset at 6 PM every day via a GitHub Action. selkouutiset-scrape-cleaned pulls in scrape and turns it into a stack of translated, properly-named Markdown files, by the magic of shell scripts, systemd timers, and a tiny VM in a datacenter somewhere....

December 1, 2023