zerosleeps

Since 2010

mediaanalysisd has gone rogue

So mediaanalysisd has gone rogue in my installation of macOS Sequoia 15.5 (24F74). I noticed it about a week ago, but I don’t know when it started. I’ve set my M1 MacBook Pro to not sleep when plugged in, and every time I woke the displays up I could see - thanks to iStat Menus - that something had been very consistently using about 20% of available CPU since the last time I’d used the machine. Activity Monitor made it easy to find the culprit: mediaanalysisd was showing as clocking up hundreds of hours of CPU time.

I’ve been able to resolve this by excluding the directory where I store all my development work (/Users/scott/Developer/) from being indexed by Spotlight. This is obviously not a fix, it’s a workaround for what is very definitely a bug: mediaanalysisd is throwing errors for the same handful of files repeatedly and constantly. Just random files stored in my home directory - there’s nothing wrong with them. My Photos library contains about 12,000 items and I’ve got hundreds of other images and videos strewn about my home directory, but for whatever reason it’s just the few locations mentioned below that are resulting in errors.

I will report this to Apple, but I’m not sure how, and I have no expectation of a response, which makes the process feel a bit futile. But pop “mediaanalysisd” into your preferred search engine and you’ll get hundreds of hits for Reddit posts or Apple Community posts from people asking what the hell mediaanalysisd is and why it’s consuming so many system resources. I can’t find any documentation about this process, it doesn’t have a man page, and none of the errors it outputs are actionable. I am not in control of this particular corner of my Mac, and I do not like that.

The only thing I can find that looks remotely like my particular problem is from this Mastodon thread by user Glyph back in February 2025. A couple of extracts:

Okay after looking at fs_usage from this process, the stuff it is repeatedly scanning is just … some images I have in my Documents folder, and … the image resources attached to various Python installations?!!

And:

…when you deploy software to an operator, even if that operator is relatively non-technical, you MUST supply some sort of operator-facing surface that makes its behavior legible.

Nicely said. Anyway, here’s a bit more about my thing. This is after a couple of days of futzing around with different tools trying to get some actionable data. It’s also worth pointing out that the paths mentioned below haven’t changed for months, sometimes years. I didn’t suddenly add hundreds of new media files to my home directory or anything like that.

So in one Terminal window I ran sudo fs_usage mediaanalysisd | grep 'open.*Users\/scott\/'. That will start outputting all filesystem activity generated by a process called “mediaanalysisd”, filtered to just lines which contain “open” events for stuff in my home directory to reduce as much noise as possible.

In another terminal window: log stream --process mediaanalysisd. That streams anything sent to the system log by mediaanalysisd.

The numbers below were taken after running those commands and walking away from my Mac for about 3 hours.

  • log stream --process mediaanalysisd output 1,641,196 lines, 992,607 (60%) of which were errors
  • The fs_usage/grep command output 271,375 entries, but only for 675 unique files:
    • The entire content of /Users/scott/Developer/[redacted]/public/images/photos/ (320 images)
    • The entire content of /Users/scott/Developer/[redacted]/public/images/thumbnails/ (320 images)
    • The entire content of /Users/scott/Developer/[redacted]/Screenshots/ (6 images)
    • favicon-sized assets from 4 other projects in /Users/scott/Developer/

I’d have thought all the files mediaanalysisd has a problem with would have had roughly the same number of hits, but not so. There are about 15,800 lines for each of the following 17 files:

  • /Users/scott/Developer/[redacted]/core/bulma/docs/favicons/favicon-16x16.png
  • /Users/scott/Developer/[redacted]/[redacted]/static/favicon-16x16.png
  • /Users/scott/Developer/[redacted]/[redacted]/static/favicon-32x32.png
  • /Users/scott/Developer/[redacted]/core/bulma/docs/favicons/favicon-32x32.png
  • /Users/scott/Developer/[redacted]/Screenshots/SNMAINS.GIF
  • /Users/scott/Developer/[redacted]/Screenshots/SNFRONTS.GIF
  • /Users/scott/Developer/[redacted]/Screenshots/SNFULLS.GIF
  • /Users/scott/Developer/[redacted]/public/favicon-16x16.png
  • /Users/scott/Developer/[redacted]/public/images/preloader.gif
  • /Users/scott/Developer/[redacted]/public/images/default-skin.png
  • /Users/scott/Developer/[redacted]/public/favicon-32x32.png
  • /Users/scott/Developer/[redacted]/core/static/core/favicon-32x32.png
  • /Users/scott/Developer/[redacted]/core/bulma/docs/assets/images/bulma-type.png
  • /Users/scott/Developer/[redacted]/core/static/core/favicon-16x16.png
  • /Users/scott/Developer/[redacted]/core/bulma/docs/assets/images/patreon.png
  • /Users/scott/Developer/[redacted]/core/bulma/docs/assets/brand/Bulma Logo.png
  • /Users/scott/Developer/[redacted]/core/bulma/docs/assets/brand/Bulma Icon.png

The other files that show up have only two or three entries each. It’s interesting that the 17 files which account for almost all of the entries are all small favicon-or-logo-style things.
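For anyone wanting to produce the same per-file tally, here is a rough Python sketch. It is not the script I used, and it assumes things about the fs_usage output that I haven’t verified: that each path starts at “/Users/” and is followed by a run of two or more spaces before the next column. The function names are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the path starts at "/Users/" and ends where a run of two or more
# spaces (or the end of the line) begins. Single spaces inside a file name,
# such as "Bulma Logo.png", are kept as part of the path.
PATH_PATTERN = re.compile(r"(/Users/.+?)(?:\s{2,}|$)")


def count_opens(lines):
    """Return a Counter mapping each file path to the number of lines it appears on."""
    counts = Counter()
    for line in lines:
        match = PATH_PATTERN.search(line.rstrip())
        if match:
            counts[match.group(1)] += 1
    return counts


def summarise(counts, top=20):
    """Print the total and unique counts, then the most frequently seen paths."""
    print(f"{sum(counts.values())} entries for {len(counts)} unique files")
    for path, hits in counts.most_common(top):
        print(f"{hits:>8}  {path}")
```

With the fs_usage/grep output saved to a file (the name is hypothetical), summarise(count_opens(open("fs_usage.txt"))) prints the totals followed by the busiest paths.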

If I remove what are obviously identifiers from the 992,607 errors output by log stream I end up with just 15 unique errors:

  • 918,968 occurrences of “Embedding version: 0 not supported, skip embedding publishing”
  • 21,127 occurrences of “Image has invalid or too small dimensions (1x1)”
  • 21,127 occurrences of “Failed to decode image”
  • 21,127 occurrences of “Failed to load Scene Taxonomy for analysis version: 0. Unable to translate scenes.”

Most of the rest of the errors are “Preparaing to restart query” (Apple’s typo, not mine).
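Collapsing the log lines down to unique messages could look something like the Python sketch below. What counts as an “obvious identifier” is an assumption here: the sketch only replaces UUIDs and long hexadecimal strings, and the real log lines may well contain other kinds. Short numbers such as the “0” in “version: 0” or the “1x1” dimensions are deliberately left alone.

```python
import re
from collections import Counter

# Assumption: identifiers are UUIDs or hexadecimal strings of 12+ characters.
UUID = re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b")
LONG_HEX = re.compile(r"\b(?:0x)?[0-9a-fA-F]{12,}\b")


def normalise(message):
    """Replace identifier-looking tokens with a placeholder."""
    message = UUID.sub("<id>", message)
    message = LONG_HEX.sub("<id>", message)
    return message.strip()


def count_unique(messages):
    """Return a Counter of messages after identifiers have been masked."""
    return Counter(normalise(m) for m in messages)
```

Passing the error lines to count_unique and calling most_common() on the result gives a ranked list of distinct messages like the one above.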

I assume the last three counts are all for the same problem - one issue results in three errors? That error sounds reasonable as well - if the image is 1×1 skip it. But why is mediaanalysisd revisiting the same file thousands of times an hour? Is the processing done in batches, where the whole batch is dropped if any one thing inside it fails? That could explain all the “Preparaing [sic] to restart query” lines I guess?

The top count - 918,968 occurrences of “Embedding version: 0 not supported, skip embedding publishing” - is the one I want to know more about. I have no idea what that means. It’s unlikely I’d be able to do anything about it even if I did understand the error: like I say, there’s nothing wrong with any of the problematic images, and even if there were, mediaanalysisd should handle the problem gracefully rather than thrash the same files over and over and over. But I’m a developer and I’m curious. I want to know what’s stuck in its craw.

Also, presumably mediaanalysisd relies on Spotlight’s indexes, or maybe it just uses the same ignore-list, which is why excluding the directory at the root of all of these errors makes this go away (at the expense of making Spotlight much less useful for me)?

I’ll post a follow-up if I ever get an answer to any of the above. Don’t hold your breath.

(PS folders called “Developer” in your home directory get a nice icon in Finder, which is why I’ve called mine that.)

Natural keys

Most of the discussion on Hacker News about this article by Eduardo Bellani is about the use of natural keys when designing databases.

I also disagree with the use of natural keys. The real world is messy and it changes all the time, including and especially the things that you’d think would never change. Education (and I’m sure all industries) is obsessed with giving things codes that mean something, and as a result the software I develop against uses natural keys almost exclusively.

It’s a pain in the arse.

This is one of those self-propelling death spiral things. People want codes to mean something because nobody’s ever shown them the alternative: design objects with attributes that describe the object and nobody will give a shit what primary key the database uses. But if the enterprise software you’ve bought doesn’t allow that (and perhaps even encourages some kind of coding convention) then nobody will ever see the alternative which puts us back at the start of the cycle.
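To make “the alternative” concrete, here is a minimal sketch using Python’s built-in sqlite3 module. Every table, column, and value is invented for illustration and has nothing to do with the software I actually work with. The code that people care about is just another attribute, so it can change without touching anything that references the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE subject (
        id         INTEGER PRIMARY KEY,   -- surrogate key, meaningless on purpose
        code       TEXT NOT NULL UNIQUE,  -- human-facing code, free to change
        name       TEXT NOT NULL,
        year_level INTEGER NOT NULL       -- an attribute, not baked into the key
    );
    CREATE TABLE enrolment (
        id         INTEGER PRIMARY KEY,
        student    TEXT NOT NULL,
        subject_id INTEGER NOT NULL REFERENCES subject(id)
    );
""")

subject_id = conn.execute(
    "INSERT INTO subject (code, name, year_level) VALUES (?, ?, ?)",
    ("10MAT", "Mathematics", 10),
).lastrowid
conn.execute(
    "INSERT INTO enrolment (student, subject_id) VALUES (?, ?)",
    ("Alex", subject_id),
)

# Somebody decides the coding convention should change. Only one row is updated.
conn.execute("UPDATE subject SET code = ? WHERE id = ?", ("MATH-Y10", subject_id))

row = conn.execute("""
    SELECT e.student, s.code, s.name
    FROM enrolment AS e
    JOIN subject AS s ON s.id = e.subject_id
""").fetchone()
print(row)  # ('Alex', 'MATH-Y10', 'Mathematics')
```

Had the code been the primary key, that rename would have meant updating every table that references it.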

JavaScript Temporal

I am very excited about this. The opening paragraphs of MDN’s documentation are perfect: they explain the problem, state how JavaScript ended up where it is, and outline the solution before launching into the details.

Dates and times are hard.

systemd.timer

This article about systemd timers made it to the front page of Hacker News today. I always get a little bit upset when I see documentation or advice that talks about defining a cron job, as if cron is the only way of scheduling tasks. Cron served us well, but if you know what cron and systemd are you should read the documentation for systemd timer units.

This:

OnCalendar=Thursday 18:00 Australia/Melbourne

Is so much nicer than:

0 7 * * 4
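It isn’t only about readability. Assuming that cron line lives on a host whose clock runs on UTC (an assumption for this example), the “0 7” only lines up with 18:00 in Melbourne while daylight saving is in effect. A small Python sketch using the standard zoneinfo module shows the drift; the function name is made up for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

MELBOURNE = ZoneInfo("Australia/Melbourne")


def utc_hour_of_local_6pm(year, month, day):
    """Return the UTC hour at which 18:00 Melbourne time falls on the given date."""
    local = datetime(year, month, day, 18, 0, tzinfo=MELBOURNE)
    return local.astimezone(timezone.utc).hour


# A Thursday while Melbourne is on daylight saving time (UTC+11)
print(utc_hour_of_local_6pm(2025, 3, 20))  # 7
# A Thursday while Melbourne is on standard time (UTC+10)
print(utc_hour_of_local_6pm(2025, 7, 17))  # 8
```

So a fixed UTC cron entry would fire an hour early, by local time, for half the year, whereas the OnCalendar form names the time zone and leaves the conversion to systemd.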

Now I will concede that, as with most of the components of systemd, the manuals don’t make it very easy to discover all the magical things you can do with unit files, but the few minutes I took several years ago doing just this have saved me way more time since then.

Bonus tip: systemd-analyze calendar is a cute little tool to know about:

$ systemd-analyze calendar "Thursday 18:00 Australia/Melbourne"
  Original form: Thursday 18:00 Australia/Melbourne
Normalized form: Thu *-*-* 18:00:00 Australia/Melbourne
    Next elapse: Thu 2025-03-20 07:00:00 UTC           
       From now: 6 days left

Plus! systemctl list-timers gives you a nice list of loaded timers with human-readable “last run” and “time to next run”, and systemctl status timername.timer works as you’d expect as well, including a tail of the output of the last run. You get all of that for free because the timers trigger regular systemd units, which in turn means (by default) output gets sent to the systemd journal. It also means you can manually trigger the same unit at any time and know it will behave the same whether run by a timer or not.

Way better than cron. Fight me.