Mastodon, Part IV: SSL with Let’s Encrypt

DNS is the cause of, and solution to, all our networking problems

personal
raspberry pi
k3s
mastodon
ssl
dns
Author

Shannon Quinn

Published

Posted on the 16th of February in the year 2023, at 10:30am. It was Thursday.

Sean Bean LOTR meme, stating: 'One does not simply use HTTPS when it doesn't work' (obtained from imgflip).

This article is part of a series about installing and configuring a Mastodon instance on a cluster of Raspberry Pi computers running k3s. To go back to previous articles in the series, try any of the links below:

  1. Introduction
  2. Part I: My home network topology
  3. Part II: The Mastodon Helm chart
  4. Part III: Configuring and installing prerequisites
  5. Part IV: The waking nightmare that is Let’s Encrypt (this post)
  6. Part V: Actually installing Mastodon
  7. Conclusions

I want to start by saying: I am no expert here. As such, it is not only possible, but is in fact likely, that I was missing something so painfully obvious as to be worthy of derision for the rest of my natural life.

To be fair, that’d be mean, but still very possibly deserved.

When I started the whole Mastodon installation process, I loosely followed the recipe outlined here at Geek Cookbook, which (until this series of blog posts!) was the only resource I could find anywhere on the world wide interwubs on installing Mastodon on Kubernetes (thank you again, funkypenguin; go sponsor him!).

Of course, I could only follow it to a point; I didn’t make use of flux (though I know I really, really should; that’s a blog series for another time, methinks) so a lot of the constructs weren’t precisely relevant to me. As such there was a lot I had to infer. Ingress was no exception.

So much tra[e|f]fik

This bit is actually entirely separate from Mastodon; it just felt like it was part of Mastodon, since it was Mastodon that I couldn’t reach. When SSL is misconfigured, your experience is strictly browser-dependent: in my case, Chrome[1] would throw up myriad warnings and, in regular mode, wouldn’t let me navigate to the page at all. In incognito mode, it would allow me to navigate there provided I clicked a link to acknowledge I was doing something “unsafe”.

So let’s back up a bit. I’m using traefik as my ingress controller. Not, I should add, because it’s packaged with k3s; I actually disabled that because it’s an old version. Rather, I installed traefik via its official Helm chart.

Now. In keeping with my philosophy of doing the simplest thing that gets results, I tried for the http-01 option when it came to Let’s Encrypt challenge modes, as it’s by far the most straightforward, at least when it comes to configuring the ingress. I had to make only very minor modifications to the traefik Helm chart; it ended up looking something like this:

ports:
  websecure:
    tls:
      certResolver: "leresolver"
      domains:
      - main: "quinnwitz.house"

certResolvers:
  leresolver:
    email: <my email>
    httpChallenge:
      entryPoint: "web"

That’s pretty much it! I made a few other modifications around debugging and logging, and a small addition around load balancing (to account for metallb), but simplicity won the day.

…and lo, did the problems come thence unceasingly.

GET problems

Here’s the part that drove me absolutely insane.

E1129 00:45:08.092782       1 sync.go:190] cert-manager/challenges "msg"="propagation check failed" "error"="failed to perform self check GET request 'http://quinnwitz.house/.well-known/acme-challenge/jVKFmi_ulnESi0pCj-KnVC5mPt0RouNcfUOEKcYe9ro': Get \"http://quinnwitz.house/.well-known/acme-challenge/jVKFmi_ulnESi0pCj-KnVC5mPt0RouNcfUOEKcYe9ro\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" "dnsName"="quinnwitz.house" "resource_kind"="Challenge" "resource_name"="mastodon-tls-lzwbv-1403372365-523260434" "resource_namespace"="mastodon" "resource_version"="v1" "type"="HTTP-01"

This is what I saw appear over and over on Let’s Debug (thank you for that website, by the way!). No matter what (http-01) configuration I tried, this was what came back: timeouts.

As best I can tell, this recent comment on GitHub miiight identify the source of the problem and how to fix it:

So basically, your node cannot access http://1.2.3.4/.well-known/whatever and http://your-page.com/.well-known/whatever, but computers on the internet (such as letsencrypt) can.

though I’m still not certain, because the suggested way to test for this (running dig or curl against the URL from the Kubernetes nodes themselves) worked fine for me.
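For anyone who wants to repeat that check, here is a minimal Python sketch of it, standard library only. It approximates what the log above describes (a plain GET against the challenge URL, with a timeout); it is not cert-manager's actual code, and the function names and the 10-second timeout are assumptions made up for illustration.

```python
import urllib.error
import urllib.request


def challenge_url(domain: str, token: str) -> str:
    """Build the URL an http-01 validator requests for a given token."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def self_check(url: str, timeout: float = 10.0, fetch=urllib.request.urlopen) -> str:
    """GET the URL and describe the outcome in one line."""
    try:
        with fetch(url, timeout=timeout) as response:
            return f"reachable (HTTP {response.status})"
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success status
        return f"reachable, but returned HTTP {err.code}"
    except OSError as err:
        # Covers URLError (DNS failures, refused connections) and socket timeouts
        return f"unreachable: {err}"
```

Running `self_check(challenge_url("quinnwitz.house", "<token>"))` on a node and again from inside a pod can be informative, since pods may resolve and route traffic differently than the host does.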

After (I kid you not) an entire week of this, I gave up. It was time to switch to dns-01.

It’s always DNS

This took more doing. First, I moved my DNS nameservers from Google over to Cloudflare (Google was and continues to be my domain name provider, so initially that’s where my nameservers were, too). I made the move because Google charges for DNS API access (which we’re going to need in a minute), whereas Cloudflare does not[2].

Second, I created API keys with Cloudflare specifically for modifying DNS entries. The cert-manager documentation has a whole page on dns-01 verification steps and uses Cloudflare as its exemplar.

FUN LITTLE FACTOID: Using the method outlined in the cert-manager documentation, I still encountered weird errors involving “invalid request headers.” For whatever bloody reason, when creating the Secret that stores the Cloudflare API tokens, I had to use data (with base64-encoded values) rather than stringData; otherwise it simply wouldn’t work. Here’s the solution. The Secret also needs to be in the same namespace as Mastodon. Also also, I needed to use api-key, api-token, and email fields, since these are specific to the permissions LE will need.
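Since data expects base64-encoded values where stringData takes plain text, here is a minimal Python sketch that builds such a Secret. The Secret name cloudflare-api-credentials and the mastodon namespace are assumptions (the namespace is taken from the log output earlier), and the credential values are placeholders. It also strips surrounding whitespace before encoding, on the guess that a stray newline is a plausible way to end up with malformed headers; whether that was the culprit here is unknown.

```python
import base64
import json


def build_secret(name: str, namespace: str, values: dict) -> dict:
    """Build a Secret manifest whose values live under `data`, base64-encoded."""
    encoded = {
        # strip() guards against a trailing newline sneaking into the credential
        key: base64.b64encode(value.strip().encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


# Hypothetical name and placeholder credentials; the key names follow the post.
secret = build_secret(
    "cloudflare-api-credentials",
    "mastodon",
    {
        "email": "me@example.com",
        "api-key": "<global api key>",
        "api-token": "<scoped api token>",
    },
)

# kubectl accepts JSON manifests as well as YAML
print(json.dumps(secret, indent=2))
```

Saving the output to a file and passing it to `kubectl apply -f` should work the same way as a YAML manifest would.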

Third, I had to make more modifications to the traefik Helm chart. These included flipping certResolvers.leresolver from httpChallenge to dnsChallenge, and adding environment variables for each of the data fields in the Cloudflare API secret. Here’s what those changes looked like:

certResolvers:
  leresolver:
    email: <my email>
    dnsChallenge:
      provider: cloudflare
      resolvers:
        - 1.1.1.1
        - 8.8.8.8
env:
  - name: CF_API_EMAIL
    valueFrom:
      secretKeyRef:
        name: <my secret>
        key: email
  - name: CF_API_KEY
    valueFrom:
      secretKeyRef:
        name: <my secret>
        key: api-key
  - name: CF_DNS_API_TOKEN
    valueFrom:
      secretKeyRef:
        name: <my secret>
        key: api-token

The final step was to create a new ClusterIssuer, one that specifically used dns-01 verification. Here’s what it looked like:

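apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-dns
spec:
  acme:
    email: <my email>
    # Let's Encrypt production endpoint
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret where cert-manager stores the ACME account private key
      name: <cert key>
    solvers:
    - dns01:
        cloudflare:
          # Cloudflare API token used to create the challenge TXT records
          apiTokenSecretRef:
            name: <my secret>
            key: api-token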

I wouldn’t call this super-complicated, but it definitely wasn’t as easy as the http-01 configuration. But unlike http-01, this actually worked!
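For a sense of what dns-01 asks for under the hood: per the ACME spec (RFC 8555), the client proves control of the domain by publishing a TXT record at _acme-challenge.<domain> containing the base64url-encoded SHA-256 digest of the key authorization, which is why cert-manager needs API access to the DNS zone. A minimal Python sketch of that computation follows; the function names and the sample key authorization are made up for illustration, and cert-manager handles all of this itself.

```python
import base64
import hashlib


def txt_record_name(domain: str) -> str:
    """Name of the TXT record a dns-01 validator looks up."""
    # A wildcard certificate is validated against its base domain
    if domain.startswith("*."):
        domain = domain[2:]
    return f"_acme-challenge.{domain.rstrip('.')}"


def txt_record_value(key_authorization: str) -> str:
    """base64url (unpadded) SHA-256 digest of the key authorization."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Placeholder key authorization; real ones look like "<token>.<account key thumbprint>"
name = txt_record_name("quinnwitz.house")
value = txt_record_value("sample-token.sample-thumbprint")
print(name, value)
```

Once a challenge is in flight, something like `dig TXT _acme-challenge.quinnwitz.house` should show whether the record actually made it out to the resolvers.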

Other musings

Oddly enough, this was the part that took the longest on the road to getting my own Mastodon instance running. Not the obfuscated and undocumented network topology hiccups, not the literally incompatible and also required prerequisites; no, we got to bang our heads against a 9-year-old technology, and it took nearly a week to find a working configuration.

I’ve had some folks suggest kcert as a drop-in replacement for cert-manager. For specific use-cases it certainly seems as though its simplicity could grease the skids of development; personally, I did not have a chance to try it out before landing on a working dns-01 setup, but I wanted to mention it as others have spoken positively about it.

Another promising bit of technology the Mastodon devs pointed me to was Tailscale, which recently released their own implementation of HTTPS certificates for nodes on a Tailscale network. Traefik has even started working on integrating Tailscale network overlays into its proxy configuration, so this could be worth looking at instead of LE in the very near future.

Oh hey!

Until next time!

Footnotes

  1. Yeah yeah yeah I know I need to switch to Brave or Firefox, IN MY SPARE TIME FOLKS.

  2. At least, not at the rates I’d be using it.

Citation

BibTeX citation:
@online{quinn2023,
  author = {Quinn, Shannon},
  title = {Mastodon, {Part} {IV:} {SSL} with {Let’s} {Encrypt}},
  date = {2023-02-16},
  url = {https://magsol.github.io/2023-02-16-ssl-with-lets-encrypt},
  langid = {en}
}
For attribution, please cite this work as:
Quinn, Shannon. 2023. “Mastodon, Part IV: SSL with Let’s Encrypt.” February 16, 2023. https://magsol.github.io/2023-02-16-ssl-with-lets-encrypt.