zerosleeps

Since 2010

Dead (but not dead) Ubiquiti UniFi Security Gateway

This is the story of my USG, which has failed in the weirdest way. The summary, which is repeated at the end of this post, is that after years of flawless operation it suddenly stopped communicating with the rest of my LAN via one particular port, but it can still send data on that port, has other ports which function perfectly, and the problem is definitely not software-related.

This isn’t a how-to or anything, because I can’t remember every command I typed or setting I changed, but I’m putting the general gist out there because it might help some poor sod someday. If you’re hoping for a solution, stop reading, but if you know anything about network equipment, please send help. I sent out a couple of cries for assistance about this on Ubiquiti’s community forums and on reddit, but crickets.

Basic troubleshooting

So one afternoon we suddenly lost internet connectivity. First thing I did was jump onto my UniFi Controller to see where the problem lay, and it told me that the USG had failed to adopt. Since the device was previously working fine, that status was UniFi saying “it’s disappeared from your network”.

I can honestly say that in three years of running Ubiquiti equipment I’ve never had to reboot or reset anything. In fact, it was because I got so fed up with flaky crappy all-in-one consumer “routers” that I went all-in on Ubiquiti in the first place. Anyway, that’s where I started - pull power from the USG, count to 10, plug it back in.

All indications were good: all the status lights and Ethernet link lights illuminated, flashed, and changed colour as they should, but the Controller still couldn’t find the USG.

It’s worth pointing out that as well as routing traffic between WAN and LAN and performing firewall duties the USG also provides DHCP services, so LAN traffic was starting to fail as well: new clients couldn’t join the network and existing DHCP leases couldn’t be renewed. Fabulous.

I tried a few reboots of the entire network stack with no change, so I dug out a paper-clip and did a hardware reset of the USG. Again, all the lights did the correct thing, and the USG appeared ready to adopt, except… it still didn’t appear to be connected to the network. Sure, it was physically connected - OSI layers 1 and 2 looked good - but nothing above that. After a reset of that nature the USG sets its LAN port to 192.168.1.1/24 and runs a DHCP server, so at this point it should have been dishing out DHCP leases and routing traffic, even without adoption.
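For anyone checking a client after a reset like this, here's a minimal Python sketch of whether an address actually lands on that default LAN. It assumes the factory-default 192.168.1.1/24 settings described above, and on_default_lan is a hypothetical helper for illustration, not anything Ubiquiti provides.

```python
import ipaddress

# Assumption: the factory-default LAN settings described above,
# with the USG itself at 192.168.1.1 on a /24.
USG_LAN = ipaddress.ip_interface("192.168.1.1/24")

def on_default_lan(client_ip: str) -> bool:
    """True if client_ip is a usable host address on the USG's default LAN."""
    addr = ipaddress.ip_address(client_ip)
    network = USG_LAN.network  # 192.168.1.0/24
    # Exclude the network address, the broadcast address and the USG itself.
    reserved = {network.network_address, network.broadcast_address, USG_LAN.ip}
    return addr in network and addr not in reserved

if __name__ == "__main__":
    for ip in ("192.168.1.23", "192.168.1.1", "169.254.10.20"):
        print(ip, on_default_lan(ip))
```

A client sitting on a 169.254.x.x self-assigned address, which is what a Mac falls back to when no DHCP offer arrives, fails this check.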

Corrupt storage?

I then ended up on a path to nowhere. I discovered that the USG boots from a USB drive tucked away inside it, and there were lots of grumbles online about these internal drives failing. I was reasonably confident that wasn’t my problem, because my USG was booting. Well, the status lights certainly indicated it was booting, and if the boot drive had failed it would have been more obvious, but with no way of communicating with the device via Ethernet I didn’t know for sure. So I replaced the internal drive, farted about with dd and .img files and blah blah blah. Didn’t make the blindest bit of difference, so I’m not going to dwell on this part of the story. Sure did learn a lot about USB drive initialisation times and Octeon boot commands though…

There’s nothing wrong with this thing

I bought a rollover cable from eBay. The USG has a console port, and with no packet data getting through it was my only hope of making progress. The cable arrived, and immediately revealed that the USG was indeed booting properly, all services were running, and it correctly reported the link status of eth1 (the LAN 1 port). It was also connected to the WAN (the USG is a regular DHCP client on that side of the network) and able to communicate with the outside world.

What the hell? At this point I’m starting to suspect a bad port, despite positive link status. But that can’t be a thing that happens, can it? Let’s see if we can confirm that:

As well as console, WAN, and LAN ports, the USG has a third port which is labelled “WAN 2 / LAN 2”, and it shows up in software as just another Ethernet interface. I mucked about for a while trying to work out how to configure this port to assume the duties of eth1. To Ubiquiti’s credit, once you understand a few of EdgeOS’s basic commands it’s actually pretty enjoyable to view the device’s configuration settings and change them.

To my surprise I managed to set up LAN 2 and disable LAN 1, and it worked. Connecting a device to LAN 2 immediately resulted in a DHCP offer, and all traffic flowed as it should. As it did before the start of this story. With the same attached devices and cables!

So there we are: bad port. Well, yes, I’m 90% sure about that, but there’s more…

(By this stage I’d already bought a new USG, and there was no way I was going to put the broken USG back into service. I can’t trust it, and I’m pretty sure the UniFi Controller would fight me every step of the way if I tried to use LAN 2 instead of LAN 1.)

Can speak but not hear

While I was poking about via console during all of the above, I discovered that I could capture traffic flowing through the USG’s ports using a command like show interfaces ethernet eth1 capture, and I could see some activity being logged. Surely if the port was dead nothing would work, or it would fail in some other obvious way: no link light, no traffic, or errors in a log perhaps, but as far as I could tell the USG was reporting all systems green. Remember I mentioned the OS was even correctly reporting link status in addition to this trickle of data I was now seeing.

I cracked out Wireshark, and it’s at this point that I gave up. This makes no sense:

I could see that upon connecting my MacBook to the USG’s LAN 1 port, DHCP discover requests were being sent from my Mac, but the USG’s capturing tool never showed them arriving. What the USG did capture were its own outgoing UniFi discovery requests, and those were being received by my Mac. Here’s a screenshot from Wireshark (I’ve filtered out some irrelevant junk caused by my Mac sending out mDNS and ARP probes):

And here’s the corresponding USG capture:

ubnt@ubnt:~$ show interfaces ethernet eth1 capture
Capturing traffic on eth1 ...
10:27:21.461396 IP 192.168.1.1.51602 > 255.255.255.255.10001: UDP, length 145
10:27:31.682842 IP 192.168.1.1.51456 > 255.255.255.255.10001: UDP, length 145
10:27:41.915718 IP 192.168.1.1.59045 > 255.255.255.255.10001: UDP, length 145
10:27:52.156490 IP 192.168.1.1.41723 > 255.255.255.255.10001: UDP, length 145
10:28:02.381654 IP 192.168.1.1.47105 > 255.255.255.255.10001: UDP, length 145
10:28:12.599347 IP 192.168.1.1.41440 > 255.255.255.255.10001: UDP, length 145
10:28:22.819994 IP 192.168.1.1.54372 > 255.255.255.255.10001: UDP, length 145

The purple lines from the screenshot are my Mac’s DHCP requests, which never show up in the USG’s capture, but the 7 yellow lines match up perfectly with the 7 generated and captured by the USG.
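If you'd rather compare captures programmatically than by colour, here's a rough Python sketch that parses tcpdump-style lines like the USG output above and tags each packet by destination port. The regular expression only covers the UDP line format shown here, and parse_capture, intervals and PORT_NAMES are hypothetical names for illustration. Port 67 is the standard DHCP server port, where a client's DHCP discover is sent; 10001 is simply the port the USG's discovery broadcasts used in this capture.

```python
import re
from datetime import datetime

# Matches tcpdump-style UDP lines such as:
# 10:27:21.461396 IP 192.168.1.1.51602 > 255.255.255.255.10001: UDP, length 145
LINE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) IP "
    r"(?P<src>\d+(?:\.\d+){3})\.(?P<sport>\d+) > "
    r"(?P<dst>\d+(?:\.\d+){3})\.(?P<dport>\d+): UDP, length (?P<length>\d+)"
)

# Destination ports worth labelling: 67 is the DHCP server port,
# 10001 is the port the USG's discovery broadcasts used above.
PORT_NAMES = {67: "DHCP", 10001: "UniFi discovery"}

def parse_capture(text):
    """Return one dict per recognised packet line, skipping banners and prompts."""
    packets = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m is None:
            continue
        dport = int(m["dport"])
        packets.append({
            "time": datetime.strptime(m["time"], "%H:%M:%S.%f"),
            "src": m["src"],
            "dst": m["dst"],
            "dport": dport,
            "length": int(m["length"]),
            "kind": PORT_NAMES.get(dport, "other"),
        })
    return packets

def intervals(packets):
    """Seconds between consecutive packets (assumes no midnight rollover)."""
    return [
        (later["time"] - earlier["time"]).total_seconds()
        for earlier, later in zip(packets, packets[1:])
    ]
```

Fed the USG capture above, this finds seven packets, all bound for port 10001 and roughly 10.2 seconds apart, and none bound for port 67. That absence is the whole problem in a nutshell.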

So, to summarise, I have an Ethernet device with factory-supplied software and settings, no software related issues, warnings, or errors, which suddenly stopped receiving packet data on one port, but can still negotiate a link and send data on that port, and has other ports which function perfectly.

🤯

Please, if anyone reading this has any idea how this kind of failure is possible, get in touch.

Astonishing indeed

Yep to this entire thread on Ruby on Rails Discussions. Rails is a shit-show at the moment, and this thread shows why. You need (deep breath): Ruby, probably a Ruby version manager, Rubygems, Bundler, Rails itself, Node.js, the Webpacker gem, the Webpacker Node.js package, Webpack itself, Yarn (because Webpacker doesn’t use npm), Python (WTF), and some sort of database layer.

What versions of all of the above work together? Nobody knows. And the first time Bundler is run a bunch of Rails dependencies are compiled - in my experience, in a brand new environment, at least one of those always fails (usually sass).

And even if you get all of that working together, the “Getting Started with Rails” guide is woefully out-of-date. Last time I looked, the Rails Guides didn’t mention Webpack at all. Anywhere.

Yet front-and-centre on the Rails home page it states:

Learning to build a modern web application is daunting. Ruby on Rails makes it much easier and more fun.

Not any more it doesn’t.

Reading log for 2020

According to my own records I completed 39 books last year, and abandoned an additional two. It’s been decades since I’ve read as much as I did in 2020.

I used to devour books when I was a pre-teen: The Hardy Boys, The Secret Seven, stuff like that. When I started high school, my first English teacher decided those books were beneath my ability and somewhat forcibly tried to get me to read other stuff. For whatever reason her suggestions never stuck, and the implied ridicule put me off reading for a very long time.

She must have had good intentions, but her execution needed some work. Never liked her much. Ms. Awlson was her name.

Advent of Code 2020 day 25

Advent of Code 2020 day 25. A bit overcooked considering (spoiler) there’s no part 2 🙁

Very happy to have gotten all 50 stars this year. With the exception of day 10 part 2 the code is all my own. Yes, I’ve used hints and pointers, and I’m already embarrassed by some of my solutions, but that all comes hand-in-hand with being a developer, right?

Great fun. Was nice to have a little puzzle to look forward to at the end of each day.

class Handshake:
    def __init__(self, public_key):
        self.public_key = public_key
        self.value = 1
        self.loop_size = self.get_loop_size()

    def transform(self):
        self.value = (self.value * 7) % 20201227

    def private_key(self, public_key):
        value = 1
        for _ in range(self.loop_size):
            value = (value * public_key) % 20201227
        return value

    def get_loop_size(self):
        loop_size = 0
        while self.value != self.public_key:
            self.transform()
            loop_size += 1
        return loop_size

def part_one(public_keys):
    card = Handshake(public_keys[0])
    return card.private_key(public_keys[1])

if __name__ == '__main__':
    print(f"Part one: {part_one([5764801, 17807724])}")
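For comparison, a leaner Python sketch of the same handshake, assuming the same subject number 7 and modulus 20201227 as the class above. It swaps the explicit loop in private_key for Python's built-in three-argument pow(), which performs modular exponentiation; loop_size and encryption_key are illustrative names only.

```python
MODULUS = 20201227
SUBJECT = 7

def loop_size(public_key):
    """Brute-force how many transform steps turn 1 into public_key."""
    value, steps = 1, 0
    while value != public_key:
        value = value * SUBJECT % MODULUS
        steps += 1
    return steps

def encryption_key(card_public_key, door_public_key):
    # pow(base, exp, mod) does modular exponentiation in one call,
    # replacing the loop that multiplies by the public key loop_size times.
    return pow(door_public_key, loop_size(card_public_key), MODULUS)

if __name__ == "__main__":
    print(encryption_key(5764801, 17807724))
```

Either public key works as the starting point, because both routes compute 7 raised to the product of the two loop sizes, modulo 20201227.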

Advent of Code 2020 day 24

Advent of Code 2020 day 24. Two fun days in a row! Kinda wish I’d gone a bit more object-orientated with this one - the result looks a bit messy and “scripty”.

I vaguely remembered a previous Advent of Code puzzle that used a hexagonal grid, and after a little poke around I used the cube coordinates system described by Red Blob Games.
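To make the cube coordinate idea concrete, here's a small Python sketch, assuming the same offsets as DIRECTIONS in the solution below. Every step changes two of the three axes in opposite directions, so x + y + z always stays 0, and the distance between two hexes is half the sum of the absolute coordinate differences, as Red Blob Games describes. walk, distance and OFFSETS are illustrative names; the test paths nwwswee (back to the reference tile) and esew (the adjacent south-east tile) come from the puzzle description.

```python
import re

# Assumption: the same cube-coordinate offsets as DIRECTIONS below.
OFFSETS = {
    "e": (1, -1, 0), "se": (0, -1, 1), "sw": (-1, 0, 1),
    "w": (-1, 1, 0), "nw": (0, 1, -1), "ne": (1, 0, -1),
}
STEP = re.compile(r"e|se|sw|w|nw|ne")

def walk(path):
    """Follow a path like 'esew' from the reference tile and return its cube coordinates."""
    x = y = z = 0
    for step in STEP.findall(path):
        dx, dy, dz = OFFSETS[step]
        x, y, z = x + dx, y + dy, z + dz
    # Cube coordinates always satisfy this invariant.
    assert x + y + z == 0
    return (x, y, z)

def distance(a, b):
    """Hex distance in cube coordinates: half the summed absolute differences."""
    return sum(abs(p - q) for p, q in zip(a, b)) // 2
```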

Spent a while wondering why I wasn’t getting the correct answers for the part two examples, before I realised that I’d need to somehow cater for tiles beyond those I’d already looked at, as the puzzle clearly stated that they exist and default to white. pad_floor was therefore born.

from pathlib import Path
import re
import unittest

DIRECTIONS = {
    'e': [1, -1, 0],
    'se': [0, -1, 1],
    'sw': [-1, 0, 1],
    'w': [-1, 1, 0],
    'nw': [0, 1, -1],
    'ne': [1, 0, -1]
}

def get_raw_input():
    return (Path(__file__).parent / 'day_24_input.txt').read_text()

def parse_raw_input(raw_input):
    regexp = re.compile(r'e|se|sw|w|nw|ne')
    return [
        regexp.findall(line.strip())
        for line in raw_input.strip().splitlines()
    ]

def build_floor(parsed_input):

    tiles = {}
    # 0/None: white, 1: black

    for line in parsed_input:
        x, y, z = 0, 0, 0
        for direction in line:
            x += DIRECTIONS[direction][0]
            y += DIRECTIONS[direction][1]
            z += DIRECTIONS[direction][2]

        if tiles.get((x, y, z), 0) == 0:
            tiles[(x, y, z)] = 1
        else:
            tiles[(x, y, z)] = 0

    return tiles

def pad_floor(floor):
    new_tiles = {}
    for tile in floor:
        for adjacent_direction in DIRECTIONS.values():
            k = (
                tile[0] + adjacent_direction[0],
                tile[1] + adjacent_direction[1],
                tile[2] + adjacent_direction[2]
            )
            if k not in floor:
                new_tiles[k] = 0

    floor.update(new_tiles)
    return floor

def part_one(parsed_input):
    return len([tile for tile in build_floor(parsed_input).values() if tile == 1])

def adjacent_tile_colours(floor, tile):
    x, y, z = tile
    return [
        floor.get((x + 1, y - 1, z), 0),
        floor.get((x, y - 1, z + 1), 0),
        floor.get((x - 1, y, z + 1), 0),
        floor.get((x - 1, y + 1, z), 0),
        floor.get((x, y + 1, z - 1), 0),
        floor.get((x + 1, y, z - 1), 0)
    ]

def part_two(parsed_input):
    floor = build_floor(parsed_input)

    for _ in range(100):
        floor = pad_floor(floor)
        changes = {}
        for position, colour in floor.items():
            adjacent_black_tiles = len([
                adjacent_colour
                for adjacent_colour
                in adjacent_tile_colours(floor, position)
                if adjacent_colour == 1
            ])
            if colour == 1 and (adjacent_black_tiles == 0 or adjacent_black_tiles > 2):
                changes[position] = 0
            elif colour == 0 and adjacent_black_tiles == 2:
                changes[position] = 1
        floor.update(changes)
    return len([tile for tile in floor.values() if tile == 1])

class TestExamples(unittest.TestCase):
    def setUp(self):
        self.example_input = """sesenwnenenewseeswwswswwnenewsewsw
                                neeenesenwnwwswnenewnwwsewnenwseswesw
                                seswneswswsenwwnwse
                                nwnwneseeswswnenewneswwnewseswneseene
                                swweswneswnenwsewnwneneseenw
                                eesenwseswswnenwswnwnwsewwnwsene
                                sewnenenenesenwsewnenwwwse
                                wenwwweseeeweswwwnwwe
                                wsweesenenewnwwnwsenewsenwwsesesenwne
                                neeswseenwwswnwswswnw
                                nenwswwsewswnenenewsenwsenwnesesenew
                                enewnwewneswsewnwswenweswnenwsenwsw
                                sweneswneswneneenwnewenewwneswswnese
                                swwesenesewenwneswnwwneseswwne
                                enesenwswwswneneswsenwnewswseenwsese
                                wnwnesenesenenwwnenwsewesewsesesew
                                nenewswnwewswnenesenwnesewesw
                                eneswnwswnwsenenwnwnwwseeswneewsenese
                                neswnwewnwnwseenwseesewsenwsweewe
                                wseweeenwnesenwwwswnew"""

    def test_part_one_example(self):
        self.assertEqual(part_one(parse_raw_input(self.example_input)), 10)

    def test_part_two_example(self):
        self.assertEqual(part_two(parse_raw_input(self.example_input)), 2208)

class TestPuzzleInput(unittest.TestCase):
    def test_part_one(self):
        self.assertEqual(part_one(parse_raw_input(get_raw_input())), 465)

if __name__ == '__main__':
    print(f"Part one: {part_one(parse_raw_input(get_raw_input()))}")
    print(f"Part two: {part_two(parse_raw_input(get_raw_input()))}")
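pad_floor grows the dict every day so that white tiles next to black ones get considered. An alternative worth sketching, and not what the solution above does, is to track only black tiles in a set and let a Counter of neighbour hits discover the relevant white tiles automatically, sparse Game of Life style. The rules are the same: a black tile stays black with one or two black neighbours, and a white tile turns black with exactly two. step and part_two_sets are hypothetical names, and the neighbour offsets are assumed to match DIRECTIONS.

```python
from collections import Counter

# Assumption: the same cube-coordinate neighbour offsets as DIRECTIONS above.
NEIGHBOURS = [(1, -1, 0), (0, -1, 1), (-1, 0, 1), (-1, 1, 0), (0, 1, -1), (1, 0, -1)]

def step(black):
    """Apply one day's flipping rules to a set of black tile coordinates."""
    # Count black neighbours for every tile touching at least one black tile.
    # Black tiles with zero black neighbours never appear, so they drop out.
    counts = Counter(
        (x + dx, y + dy, z + dz)
        for x, y, z in black
        for dx, dy, dz in NEIGHBOURS
    )
    return {
        tile for tile, n in counts.items()
        if n == 2 or (n == 1 and tile in black)
    }

def part_two_sets(floor, days=100):
    """Take a dict like build_floor's (1 = black) and count black tiles after `days`."""
    black = {tile for tile, colour in floor.items() if colour == 1}
    for _ in range(days):
        black = step(black)
    return len(black)
```

Since the rules are identical, part_two_sets(build_floor(...)) should agree with part_two on the same input, without any padding step.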