zerosleeps

Since 2010

Ubiquiti USG: the conclusion

Predictably and frustratingly, Ubiquiti’s support staff have just shrugged their shoulders at my USG’s bizarre failure, and pointed me at a document explaining how to return it to them, even though the device is over two years out of warranty (which I was careful to point out in my first contact with Ubiquiti) and therefore not eligible for their RMA process.

I knew going into this that there wasn’t a hope in hell I’d get to talk to someone who could at least hypothesise about what’s gone wrong with my little gateway. Could have been a fun project to replace the busted chip/capacitor/whatever.

CalDigit TS3 Plus, Logitech StreamCam, and macOS

It’s another of my “putting this here in case it’s useful to someone” posts.

I have Logitech’s StreamCam, and when I connect it to one particular port on my CalDigit TS3 Plus - which is in turn connected to my MacBook Pro (16-inch, 2019) running macOS Big Sur 11.1 - I can receive audio from the StreamCam, but no video.

I contacted CalDigit about this, and they got me to triple check the port combination:

| MacBook Thunderbolt port | TS3 port | Result |
| --- | --- | --- |
| Back left | Back Thunderbolt port | OK |
| Back left | Back USB Type-C port | No video |
| Back left | Front USB Type-C port | OK |
| Front left | Back Thunderbolt port | OK |
| Front left | Back USB Type-C port | No video |
| Front left | Front USB Type-C port | OK |
| Back right | Back Thunderbolt port | OK |
| Back right | Back USB Type-C port | No video |
| Back right | Front USB Type-C port | OK |
| Front right | Back Thunderbolt port | OK |
| Front right | Back USB Type-C port | No video |
| Front right | Front USB Type-C port | OK |
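The pattern in those twelve results can be checked mechanically. Here's a rough Python sketch that groups the outcomes first by dock port and then by MacBook port. The rows are transcribed from the table above; the names `OBSERVATIONS` and `results_by` are made up for illustration.

```python
from collections import defaultdict

# (MacBook Thunderbolt port, TS3 Plus port, result), transcribed from the table.
OBSERVATIONS = [
    ("Back left", "Back Thunderbolt port", "OK"),
    ("Back left", "Back USB Type-C port", "No video"),
    ("Back left", "Front USB Type-C port", "OK"),
    ("Front left", "Back Thunderbolt port", "OK"),
    ("Front left", "Back USB Type-C port", "No video"),
    ("Front left", "Front USB Type-C port", "OK"),
    ("Back right", "Back Thunderbolt port", "OK"),
    ("Back right", "Back USB Type-C port", "No video"),
    ("Back right", "Front USB Type-C port", "OK"),
    ("Front right", "Back Thunderbolt port", "OK"),
    ("Front right", "Back USB Type-C port", "No video"),
    ("Front right", "Front USB Type-C port", "OK"),
]


def results_by(observations, index):
    """Collect the set of results seen for each value in the given column."""
    grouped = defaultdict(set)
    for row in observations:
        grouped[row[index]].add(row[2])
    return dict(grouped)


by_dock_port = results_by(OBSERVATIONS, 1)  # one result per dock port
by_mac_port = results_by(OBSERVATIONS, 0)   # both results for every Mac port

for port, results in by_dock_port.items():
    print(f"{port}: {sorted(results)}")
```

Each dock port maps to exactly one result, while every MacBook port sees both results, so the MacBook side of the connection doesn't matter and the dock's back USB Type-C port does.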

I also confirmed that the back USB Type-C port works a treat with other devices I connect to it.

CalDigit’s response:

This appears to be a conflict in macOS between the Streamcam and the 10gbps Controller on the dock. The USB-C 10gbps port and the USB-A port beside it are part of a different controller than the other ports on the dock. I’m afraid we don’t have another workaround aside from connecting to one of the other USB-C ports on the dock. As the drives are working through the same port, its likely related to compatibility over macOS.

Thumbs up to CalDigit support by the way. Fast, friendly, and knowledgeable.

Dead (but not dead) Ubiquiti UniFi Security Gateway

This is the story of my USG, which has failed in the weirdest way. The summary, which is repeated at the end of this post, is that after years of flawless operation it suddenly stopped communicating with the rest of my LAN via one particular port, but it can still send data on that port, has other ports which function perfectly, and the problem is definitely not software-related.

This isn’t a how-to or anything, because I can’t remember every command I typed or setting I changed, but I’m putting the general gist out there because it might help some poor sod someday. If you’re hoping for a solution, stop reading, but if you know anything about network equipment, please send help. I sent out a couple of cries for assistance about this on Ubiquiti’s community forums and on reddit, but crickets.

Basic troubleshooting

So one afternoon we suddenly lost internet connectivity. First thing I did was jump onto my UniFi Controller to see where the problem lay, and it told me that the USG had failed to adopt. Since the device was previously working fine, that status was UniFi saying “it’s disappeared from your network”.

I can honestly say that in three years of running Ubiquiti equipment I’ve never had to reboot or reset anything. In fact, it was because I got so fed up with flaky, crappy all-in-one consumer “routers” that I went all-in on Ubiquiti in the first place. Anyway, that’s where I started - pull power from the USG, count to 10, plug it back in.

All indications were good: all the status lights and Ethernet link lights illuminated, flashed, and changed colour as they should, but the Controller still couldn’t find the USG.

It’s worth pointing out that as well as routing traffic between WAN and LAN and performing firewall duties, the USG also provides DHCP services, so LAN traffic was starting to fail as well: new clients couldn’t join the network and existing DHCP leases couldn’t be renewed. Fabulous.

I tried a few reboots of the entire network stack with no change, so I dug out a paper-clip and did a hardware reset of the USG. Again, all the lights did the correct thing, and the USG appeared ready to adopt, except… it still didn’t appear to be connected to the network. Sure it was physically connected - OSI layers 1 and 2 looked good - but nothing above that. After a reset of that nature the USG sets its LAN port to 192.168.1.1/24 and runs a DHCP server, so at this point it should have been dishing out DHCP leases and routing traffic, even without adoption.
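For anyone wondering what "dishing out DHCP leases" starts with: the client broadcasts a DHCPDISCOVER and waits for an offer. Below is a minimal Python sketch of that first message, following the fixed BOOTP layout from RFC 2131. It only builds the bytes; actually sending them (UDP from port 68 to 255.255.255.255 port 67) normally needs elevated privileges and is left out. The function name, MAC address, and transaction ID are hypothetical values chosen for illustration.

```python
import struct

# Marks the start of the DHCP options area (RFC 2131).
DHCP_MAGIC_COOKIE = bytes([0x63, 0x82, 0x53, 0x63])


def build_dhcp_discover(mac: bytes, xid: int) -> bytes:
    """Build a minimal DHCPDISCOVER payload for the given client MAC."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte MAC address")
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,            # op: BOOTREQUEST
        1,            # htype: Ethernet
        6,            # hlen: MAC address length
        0,            # hops
        xid,          # transaction ID chosen by the client
        0,            # secs
        0x8000,       # flags: ask the server to reply by broadcast
        b"\x00" * 4,  # ciaddr: client has no address yet
        b"\x00" * 4,  # yiaddr
        b"\x00" * 4,  # siaddr
        b"\x00" * 4,  # giaddr
        mac,          # chaddr: struct pads this to 16 bytes with nulls
        b"",          # sname: 64 null bytes
        b"",          # file: 128 null bytes
    )
    # Option 53 (message type), length 1, value 1 (DISCOVER), then end marker.
    options = bytes([53, 1, 1, 255])
    return header + DHCP_MAGIC_COOKIE + options


# Hypothetical client MAC and transaction ID.
packet = build_dhcp_discover(bytes.fromhex("001122334455"), 0x12345678)
print(len(packet), "bytes")
```

A healthy, freshly reset USG should answer a broadcast like this with a DHCPOFFER from 192.168.1.1.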

Corrupt storage?

I then ended up on a path to nowhere. I discovered that the USG boots from a USB drive tucked away inside it. There were lots of grumbles online about these internal drives failing. I was reasonably confident that wasn’t my problem, because my USG was booting. Well, the status lights certainly indicated it was booting, and if the boot drive had failed it would have been more obvious, but with no way of communicating with the device via Ethernet I didn’t know for sure. So I replaced the internal drive, farted about with dd and .img files and blah blah blah. Didn’t make the blindest bit of difference so I’m not going to dwell on this part of the story. Sure did learn a lot about USB drive initialisation times and Octeon boot commands though…

There’s nothing wrong with this thing

I bought a rollover cable from eBay. The USG has a console port, and with no packet data it was my only hope of making progress. The cable arrived, and immediately revealed that the USG was indeed properly booting, all services were running, and it correctly reported eth1 link status. It was also connected to the WAN (the USG is a regular DHCP client on that side of the network) and able to communicate with the outside world.

What the hell? At this point I’m starting to suspect a bad port, despite positive link status. But that can’t be a thing that happens, can it? Let’s see if we can confirm that:

As well as console, WAN, and LAN ports, the USG has a third port which is labelled “WAN 2 / LAN 2”, and it shows up in software as just another Ethernet interface. I mucked about for a while trying to work out how to configure this port to assume the duties of eth1. To Ubiquiti’s credit, once you understand a few of EdgeOS’s basic commands it’s actually pretty enjoyable to view the device’s configuration settings and change them.

To my surprise I managed to set up LAN 2 and disable LAN 1, and it worked. Connecting a device to LAN 2 immediately resulted in a DHCP offer, and all traffic flowed as it should. As it did before the start of this story. With the same attached devices and cables!

So there we are: bad port. Well, yes, I’m 90% sure about that, but there’s more…

(By this stage I’d already bought a new USG, and there was no way I was going to put the broken USG back into service. I can’t trust it, and I’m pretty sure the UniFi Controller will fight me at every step of the way if I try to use LAN 2 instead of LAN 1.)

Can speak but not hear

While I was poking about via console during all of the above, I discovered that I could capture traffic flowing through the USG’s ports using a command like show interfaces ethernet eth1 capture, and I could see some activity being logged. Surely if the port was dead nothing would work, or it would fail in some other obvious way: no link light, no traffic, or errors in a log perhaps, but as far as I could tell the USG was reporting all systems green. Remember, the OS was even correctly reporting link status on eth1, on top of this trickle of data I was now seeing.

I cracked out Wireshark, and it’s at this point that I gave up. This makes no sense:

I could see that upon connecting my MacBook to the USG’s LAN 1 port, DHCP discover requests were being sent from my Mac, but the USG’s capturing tool didn’t show them ever being received. What the USG did capture were its own outgoing UniFi discovery requests, and they were being received by my Mac. Here’s a screenshot from Wireshark (I’ve filtered out some irrelevant junk caused by my Mac sending out mDNS and ARP probes):

And here’s the corresponding USG capture:

ubnt@ubnt:~$ show interfaces ethernet eth1 capture
Capturing traffic on eth1 ...
10:27:21.461396 IP 192.168.1.1.51602 > 255.255.255.255.10001: UDP, length 145
10:27:31.682842 IP 192.168.1.1.51456 > 255.255.255.255.10001: UDP, length 145
10:27:41.915718 IP 192.168.1.1.59045 > 255.255.255.255.10001: UDP, length 145
10:27:52.156490 IP 192.168.1.1.41723 > 255.255.255.255.10001: UDP, length 145
10:28:02.381654 IP 192.168.1.1.47105 > 255.255.255.255.10001: UDP, length 145
10:28:12.599347 IP 192.168.1.1.41440 > 255.255.255.255.10001: UDP, length 145
10:28:22.819994 IP 192.168.1.1.54372 > 255.255.255.255.10001: UDP, length 145

The purple lines from the screenshot are my Mac’s DHCP requests, which never show up in the USG’s capture, but the 7 yellow lines match up perfectly with the 7 generated and captured by the USG.
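The same check can be done on the text of the capture itself. This is a rough Python sketch, assuming the tcpdump-style line format shown above; `parse_capture` and the regular expression are my own illustrative names, not part of any Ubiquiti tool. It pulls out the timestamp, source, and destination of each line, then lists who was talking and how far apart the packets were.

```python
import re
from datetime import datetime

# The USG capture output, copied from above.
CAPTURE = """\
10:27:21.461396 IP 192.168.1.1.51602 > 255.255.255.255.10001: UDP, length 145
10:27:31.682842 IP 192.168.1.1.51456 > 255.255.255.255.10001: UDP, length 145
10:27:41.915718 IP 192.168.1.1.59045 > 255.255.255.255.10001: UDP, length 145
10:27:52.156490 IP 192.168.1.1.41723 > 255.255.255.255.10001: UDP, length 145
10:28:02.381654 IP 192.168.1.1.47105 > 255.255.255.255.10001: UDP, length 145
10:28:12.599347 IP 192.168.1.1.41440 > 255.255.255.255.10001: UDP, length 145
10:28:22.819994 IP 192.168.1.1.54372 > 255.255.255.255.10001: UDP, length 145
"""

LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) IP "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): UDP, length (?P<length>\d+)$"
)


def parse_capture(text):
    """Return one dict per recognised capture line; other lines are skipped."""
    packets = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        packets.append({
            "time": datetime.strptime(match["time"], "%H:%M:%S.%f"),
            "src": match["src"],
            "dst": match["dst"],
            "dport": int(match["dport"]),
            "length": int(match["length"]),
        })
    return packets


packets = parse_capture(CAPTURE)
sources = {p["src"] for p in packets}
gaps = [
    (later["time"] - earlier["time"]).total_seconds()
    for earlier, later in zip(packets, packets[1:])
]

print("packets:", len(packets))
print("sources:", sources)
print("gaps (s):", [round(g, 2) for g in gaps])
```

Every one of the seven packets comes from 192.168.1.1, the USG itself, roughly ten seconds apart. Nothing in the capture originates from any other host, which is the "can speak but not hear" behaviour in a nutshell.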

So, to summarise, I have an Ethernet device with factory-supplied software and settings, no software-related issues, warnings, or errors, which suddenly stopped receiving packet data on one port, but can still negotiate a link and send data on that port, and has other ports which function perfectly.

🤯

Please, if anyone reading this has any idea how this kind of failure is possible, get in touch.

Astonishing indeed

Yep to this entire thread on Ruby on Rails Discussions. Rails is a shit-show at the moment, and this thread shows why. You need (deep breath): Ruby, probably a Ruby version manager, RubyGems, Bundler, Rails itself, Node.js, the Webpacker gem, the Webpacker Node.js package, Webpack itself, Yarn (because Webpacker doesn’t use npm), Python (WTF), and some sort of database layer.

What versions of all of the above work together? Nobody knows. And the first time Bundler is run a bunch of Rails dependencies are compiled - in my experience, in a brand new environment, at least one of those always fails (usually sass).

And even if you get all of that working together, the “Getting Started with Rails” guide is woefully out-of-date. Last time I looked, the Rails Guides didn’t mention Webpack at all. Anywhere.

Yet front-and-centre on the Rails home page it states:

Learning to build a modern web application is daunting. Ruby on Rails makes it much easier and more fun.

Not any more it doesn’t.

Reading log for 2020

According to my own records I completed 39 books last year, and abandoned an additional two. It’s been decades since I’ve read as much as I did in 2020.

I used to devour books when I was a pre-teen: The Hardy Boys, The Secret Seven, stuff like that. When I started high school, my first English teacher decided those books were beneath my ability and somewhat forcibly tried to get me to read other stuff. For whatever reason her suggestions never stuck, and the implied ridicule put me off reading for a very long time.

She must have had good intentions, but her execution needed some work. Never liked her much. Ms. Awlson was her name.