zerosleeps

Since 2010

Search!

Something that bugged me the whole time this site was just a pile of static files was that I didn’t have a good solution for search. The previous incarnation of zerosleeps did have a search form on every blog page, but it just sent the user off to DuckDuckGo and performed a “site:zerosleeps.com” search there.

The results were never great, presumably because DuckDuckGo had no reason to make a decent index of a tiny site like this. Plus it was an external dependency which I’m never a fan of.

Well that changes today: blog posts are now searchable entirely in-house. I’m using PostgreSQL’s full-text search functionality, which is made a little easier by Django’s support for it in django.contrib.postgres. Just blog posts at the moment, but I want to get the reading log in there as well and have one global search.

Post titles carry more weight than the post body, and the results are sorted by whatever rank PostgreSQL comes up with. I’m also using the “headline” function to show the most relevant snippets. Behind the scenes it looks a little something like this:

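from django.contrib.postgres.search import (
    SearchHeadline, SearchQuery, SearchRank, SearchVector,
)
from django.db.models import Value
from django.db.models.functions import Concat

from .models import Post  # assumed location of the Post model


def search_posts(term):  # wrapper name is illustrative
    vector = SearchVector("title", weight="A") + SearchVector("body")
    query = SearchQuery(term)  # default search type maps to plainto_tsquery
    return (
        Post.objects.annotate(
            search=vector,
            rank=SearchRank(vector, query),
            headline=SearchHeadline(
                Concat("title", Value(" "), "body"), query, max_fragments=2
            ),
        )
        .filter(search=query)
        .order_by("-rank")
    )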

It’s not perfect but it’s absolutely good enough. Better than good enough. It also makes me glad I chose PostgreSQL over my other choice - SQLite. SQLite does have full-text search built in, but from what I can tell it involves creating virtual tables and keeping them up to date with the real content. Seems messy. This solution instead boils down to just one (slightly verbose and repetitive but who cares) SQL query which contains:

SELECT
    ts_rank(
        (
            setweight(to_tsvector(COALESCE("blog_post"."title", '')), 'A')
            ||
            to_tsvector(COALESCE("blog_post"."body", ''))
        ),
        plainto_tsquery('search term')
    ) AS "rank",
    ts_headline(
        CONCAT("blog_post"."title", ' ', "blog_post"."body"),
        plainto_tsquery('search term'),
        'MaxFragments=2'
    ) AS "headline"
FROM
    "blog_post"
WHERE
    (
        setweight(to_tsvector(COALESCE("blog_post"."title", '')), 'A')
        ||
        to_tsvector(COALESCE("blog_post"."body", ''))
    ) @@ (plainto_tsquery('search term'))
ORDER BY "rank" DESC;
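For comparison, here is a rough sketch of what the SQLite route mentioned above might involve, using Python's built-in sqlite3 module. It assumes the SQLite build includes the FTS5 extension. The `blog_post_fts` table, the trigger names, and the `make_db` and `search` helpers are all made up for illustration and are not part of this site's code. The virtual table holds the index, and the three triggers are the "keeping them up to date" part.

```python
import sqlite3

SCHEMA = """
CREATE TABLE blog_post (id INTEGER PRIMARY KEY, title TEXT, body TEXT);

-- The full-text index is a separate virtual table that points at blog_post.
CREATE VIRTUAL TABLE blog_post_fts USING fts5(
    title, body, content='blog_post', content_rowid='id'
);

-- Triggers keep the index in step with the real table.
CREATE TRIGGER blog_post_ai AFTER INSERT ON blog_post BEGIN
    INSERT INTO blog_post_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;
CREATE TRIGGER blog_post_ad AFTER DELETE ON blog_post BEGIN
    INSERT INTO blog_post_fts(blog_post_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
END;
CREATE TRIGGER blog_post_au AFTER UPDATE ON blog_post BEGIN
    INSERT INTO blog_post_fts(blog_post_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
    INSERT INTO blog_post_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;
"""


def make_db():
    """Create an in-memory database with the table, index and triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def search(conn, term):
    """Return (rowid, title, headline) rows for posts matching term."""
    # bm25() scores better matches lower, so ascending order puts them first.
    # The 10.0 / 1.0 weights make title hits count for more than body hits.
    return conn.execute(
        """
        SELECT rowid, title,
               snippet(blog_post_fts, -1, '[', ']', '...', 10) AS headline
        FROM blog_post_fts
        WHERE blog_post_fts MATCH ?
        ORDER BY bm25(blog_post_fts, 10.0, 1.0)
        """,
        (term,),
    ).fetchall()
```

It works, but that's one extra table and three triggers to get roughly what the single PostgreSQL query does on its own.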

Reading log for 2022

Reading log summary for 2022: just 22 books completed, continuing my downward yearly trend. I abandoned an additional two.

The first half of the year was a real mixed bag which probably put me off books a bit. Plenty of higher-than-average ratings in the second half of the year though.

Powered by Django

zerosleeps.com is now served by my very own Django application 🥳

I first mentioned it 14 months ago and have posted about it several times since, but it turns out I needed to actually sit down for a few hours and build the friggin’ thing. Who knew?!

Still a lot to do - I need to re-upload images for posts that have them. You’ll see there’s no favicon, and page titles are non-existent. I haven’t built my reading log yet, which I’m personally most excited about. And the database isn’t being automatically backed up yet, but that’s no big deal as so far it’s only this post that would be lost in a disaster.

Plus RSS/JSON feeds, maybe a site map, and site search. And the ability for me to create draft posts. And maybe different post “types”, like quick posts that don’t have a title, or image-only posts.

But at least the barrier of deploying this thing is now behind me.

Site rebuild update

A little update on my rebuild of zerosleeps.com:

  • Post migration was sorted a while ago. Python-Markdown is super easy to work with.
  • Code blocks are handled by the migration as well. Pygments is also a delight to work with.
  • My migration script also takes care of Jekyll image tags, converting them to plain Markdown image tags.
    • The image files themselves are going to be trickier because the current site uses a mishmash of image sizes and thumbnails. I’ve got things working for the happy-path, and will probably just have to manually update the markdown for anything else.
    • Not sure how I’m going to model post images in Django either. Should I bother with a model for them? I suppose so, or I won’t be able to leverage Django’s tools for uploading files. But do they belong to a post, or are they their own thing?
  • Post URLs… I said I wanted to change the structure of post URLs, but that I also want to keep the existing structure for compatibility. Well if I’m going to do the latter why bother with the former?
    • I didn’t think about post slugs though, which are just URL-safe copies of the post title. I have toyed with creating a custom Transform in Django to avoid storing both the post title and the slug, but it’s safer (and more flexible) to just add a slug field to my Post model.
  • “Static” pages, i.e. CV and About, took 30 seconds each - they’re both just using TemplateView.
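To make the slug point above concrete, here is a minimal standard-library sketch of turning a post title into a URL-safe slug. Django ships its own `slugify` in `django.utils.text`, which this loosely imitates. The function below is only an illustration of the idea, not the code behind this site.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Return a lowercase, hyphen-separated, URL-safe copy of a post title."""
    # Fold accented characters to their closest ASCII form and drop the rest.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Remove anything that isn't a word character, whitespace or a hyphen.
    cleaned = re.sub(r"[^\w\s-]", "", ascii_title.lower())
    # Collapse runs of whitespace and hyphens into one hyphen, then trim.
    return re.sub(r"[-\s]+", "-", cleaned).strip("-_")


print(slugify("Site rebuild update"))
```

Storing the result in its own field means a title can be edited later without the post's URL changing, which a slug computed on the fly from the title wouldn't allow.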

Oh and I’m using Bulma for the whole lot - I am not a designer!

Once I get post URLs sorted I reckon I’ll “go live” as they say in the business. If I don’t then I’ll never deploy the thing. Everything else can follow later: search, tags and categories (maybe?), visitor statistics, JSON and XML feeds, maybe a sitemap. There’s more to a stupid blog than you’d think.