Recently, our area was struck with thunderstorms. These are rare for us, and I always enjoy them. However, it's been a few years since I've lived in an area where they are commonplace, so the power flickering out was a surprise. Unfortunately, the flicker was just long enough to completely shut off our home networking equipment as well as the lights. I went to assess the damage, and I found a perfect storm.
The first significant problem is a circular dependency I didn't realise existed in the network. The following components make up my network:
- NAS - hosting a Pi-hole container and an Unbound container, acting as DNS servers
- Modem/Router - the principal gateway and DHCP server, using the DNS servers hosted by the NAS
- Wireless AP - serving the wireless network, originally an Apple AirPort Extreme and, now, a Netgear Orbi with two satellites
All three of these components are core to the network running correctly. The ordering above was intentional. The DNS servers use static IP addresses - 192.168.1.175 and 192.168.1.200. The router's DHCP server only issues addresses from 192.168.1.2 to 192.168.1.174, handing out 192.168.1.175 and 192.168.1.200 as the primary and secondary DNS servers and 192.168.1.1 as the gateway. The wireless AP, before the Orbis, was an Apple AirPort Extreme in bridge mode, which booted and provided a wireless network, using the router as the gateway and DHCP server.
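A quick way to sanity-check an addressing plan like this is to confirm that every static address sits outside the DHCP pool. The sketch below uses the addresses listed above; the /24 subnet and the helper names `in_pool` and `check_static` are my assumptions for illustration, not something read from the router.

```python
import ipaddress

# Addresses from the description above. The /24 subnet is an assumption.
SUBNET = ipaddress.ip_network("192.168.1.0/24")
GATEWAY = ipaddress.ip_address("192.168.1.1")
POOL_START = ipaddress.ip_address("192.168.1.2")
POOL_END = ipaddress.ip_address("192.168.1.174")
DNS_SERVERS = [
    ipaddress.ip_address("192.168.1.175"),
    ipaddress.ip_address("192.168.1.200"),
]


def in_pool(addr):
    """True if addr falls inside the DHCP range the router hands out."""
    return POOL_START <= addr <= POOL_END


def check_static(addr):
    """Return a list of problems for a statically assigned address."""
    problems = []
    if addr not in SUBNET:
        problems.append("outside subnet")
    if in_pool(addr):
        problems.append("inside DHCP pool")
    return problems


if __name__ == "__main__":
    for addr in [GATEWAY, *DNS_SERVERS]:
        problems = check_static(addr)
        print(addr, "ok" if not problems else ", ".join(problems))
```

With the values above, the gateway and both DNS servers pass, since the pool ends at .174 and the first DNS server sits at .175.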
The only hitch in this design was that if the NAS rebooted, I had to unlock the storage volumes before the containers hosting the DNS servers could start running. After a power outage, the network would come back up but could not resolve domain names. Not a huge deal, but not great.
A few months back, I bought a Netgear Orbi kit with an extra satellite. I hoped to overcome the slow throughput of a single AirPort Extreme and a PowerLine adapter kit that didn't live up to expectations. The Orbis promised excellent throughput and a way to provide hard-wired connections to devices, which simplifies setup. After setting up the Orbis, I found I was getting 500+ Mbps for intra-network throughput as well as 350+ Mbps to the Internet.
All of this was great until the power went out. While trying to restore the home network, I found the Orbis wouldn't "boot up" correctly. They'd reach a point where they claimed to be trying to boot up and then would finally claim no network connectivity. I checked the routing table, and it was clear the Orbis had been issued addresses. I was also able to connect to the network with my laptop, but no DHCP address would be issued.
Back to the NAS
As part of triaging the Orbis, I connected an ethernet cable to the router and checked connectivity to other components. I found that the NAS needed some time to recover. Because the NAS had lost power, it was buried in a RAID scrub after I unlocked the storage pool. DNS services were not going to work until the scrub finished. To bypass the issue of no DNS, I updated the router to use Cloudflare DNS for the time being and discovered that suddenly the Orbis began to work.
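The manual switch to Cloudflare could in principle be scripted. The sketch below only decides which resolvers to use; it does not touch the router, since I don't know of a supported way to push that setting. The local addresses are the ones from this post, 1.1.1.1 and 1.0.0.1 are Cloudflare's public resolvers, and the function names `tcp_port_open` and `choose_dns` are hypothetical. A TCP connection to port 53 is only a rough proxy for "DNS is answering".

```python
import socket

LOCAL_DNS = ["192.168.1.175", "192.168.1.200"]  # DNS containers on the NAS
FALLBACK_DNS = ["1.1.1.1", "1.0.0.1"]           # Cloudflare public resolvers


def tcp_port_open(host, port=53, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def choose_dns(local, fallback, probe=tcp_port_open):
    """Prefer local resolvers that respond; otherwise return the fallback list."""
    alive = [host for host in local if probe(host)]
    return alive if alive else list(fallback)


if __name__ == "__main__":
    print(choose_dns(LOCAL_DNS, FALLBACK_DNS))
```

The `probe` argument exists so the selection logic can be exercised without a live network, for example `choose_dns(LOCAL_DNS, FALLBACK_DNS, probe=lambda host: False)` returns the Cloudflare list.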
And back to the Orbis
This was disappointing. They're wireless access points, so why do they need DNS? I'm sure it's for some managed service that supports the Orbis. Consulting my DNS logs later, I found that the Orbis were connecting to netgear.com and orbilogin.com.
Now some might ask, why is the wireless network so important? Well, many of the appliances connected to my network are hard-wired into the Orbi base or one of its satellites. This design allows for a simplified network setup: there is no need to enter wireless credentials everywhere, and speeds improve because the only wireless hop is the Orbi backhaul.
Additionally, my router and the NAS are in an inconvenient location for triaging network issues by wire. They sit inside and on top of a cabinet with no real surface to hook up my laptop. The wireless network lets me triage most of the problems we encounter.
Cutting the circular dependency
There are two practical next steps to cut the circular dependency:
- get a UPS (Uninterruptible Power Supply) to power the core infrastructure
- reduce the dependence on the NAS by moving DNS service hosting to another component
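
One way to make the loop visible is to write the dependencies down as a graph and search for a cycle. The sketch below is illustrative only: the node names and edges are my shorthand for the setup described in this post (the Orbis need DNS, DNS runs on the NAS, the NAS needs a manual unlock, and I normally do that over wireless), and `find_cycle` is a plain depth-first search rather than anything the devices provide.

```python
# Each key depends on the items in its list. Names are shorthand, not device IDs.
DEPENDS_ON = {
    "orbi": ["dns"],            # Orbis will not finish booting without DNS
    "dns": ["nas"],             # DNS containers run on the NAS
    "nas": ["manual_unlock"],   # storage volumes must be unlocked by hand
    "manual_unlock": ["orbi"],  # unlocking is normally done over wireless
    "router": [],
}


def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    visiting = []  # nodes on the current depth-first path, in order
    done = set()   # nodes fully explored and known to be cycle-free

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Found a back edge: slice out the loop and close it.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for dep in graph.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for node in graph:
        cycle = visit(node)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(DEPENDS_ON))
```

Moving DNS onto a separate machine corresponds to replacing the `"dns": ["nas"]` edge with a dependency on a box that needs no manual step, at which point `find_cycle` returns `None`.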
I haven't found a UPS I like yet, but the second point gives me a reason to buy a new computer. Initially, I thought a Pine64 or Raspberry Pi would be a good choice for hosting the infrastructure. After some research, I found that embedded boards handle sudden power loss particularly badly: they often become unbootable afterwards. Scratch that idea.
Time to buy another computer…
After some thought, a single-board computer or low-power desktop seems like a good fit. The Intel NUC comes to mind, though they're more expensive than I'd like to pay. Reading a little further, I found several low-power machines in the ~$150-200 range. I settled on a quad-core Celeron machine (https://www.amazon.com/gp/product/B07VX5BYBR/) with 6GB of RAM.
Breaking out the DNS services, and possibly DHCP services, into a separate network component will provide more resiliency. I'll throw a Linux distribution on it, probably Alpine or CentOS, and set up a couple of containers running in Docker or LXC.
Moral of the story
Be careful with your networking hardware. Some products introduce unexpected dependencies. In the future, I'm not sure if I'd buy another set of Netgear Orbis or a similar product. They have excellent network throughput, but I'm disappointed that my wireless access points (OSI Layer 2) suddenly have a DNS dependency (OSI Layer 7).