deep·tech·intuition
intermediate

Traefik Deep Intuition

An experienced engineer's guide to Traefik

1. One-Sentence Essence

Traefik is a reverse proxy that treats its routing table as a derived projection of your infrastructure’s current state, not as a configuration file you maintain.

That sentence is the whole thing. Once you understand it, everything else — the labels, the providers, the hot reloads, the gotchas, the comparisons to nginx — falls out as consequences. Nginx says “here is my config, follow it.” Traefik says “tell me where your services live, and I’ll figure out the config myself, continuously, forever.”

Hold that idea while you read the rest of this document. Most of what makes Traefik confusing — and most of what makes it powerful — comes from that one design choice.


2. The Problem It Solved

Picture the world circa 2014–2015. Docker was exploding. Microservices were becoming a thing real teams shipped. People were spinning up dozens of containers per host, hundreds per cluster. And in front of all that sat… nginx. Or HAProxy. Or Apache.

Those are excellent pieces of software. They’re also fundamentally static. You write a config file that says “host api.example.com → 10.0.0.5:8080.” You reload nginx. It listens. When 10.0.0.5 dies and Docker schedules the container on 10.0.0.6:32918, nginx has no idea. Your traffic 502s. You SSH in, edit the config, reload, hope nothing else changed in the meantime.

People built bandaids. nginx-proxy watched the Docker socket and regenerated nginx configs on container events. consul-template did the same with Consul. confd watched etcd. Kubernetes invented Ingress controllers, which were basically the same bandaid wrapped in a Kubernetes API resource. All of them shared a pattern: an external loop watched the world, generated a config file, and asked the proxy to reload. The proxy itself was oblivious to anything outside its file.

This worked, but the architecture leaked everywhere. Every reload was a potential blip. Configs drifted. Race conditions during scaling were silent traffic loss. And every team running this stack was reinventing the same template generator with slightly different bugs.

Emile Vauge — then a software engineer at Zenika, frustrated trying to expose services in a Mesos/Marathon cluster — went the other direction in late 2015. Instead of bolting a watcher onto a static proxy, what if the proxy itself watched the orchestrator? What if the routing table was a function of cluster state, recomputed live, and the “config file” was just the bootstrap that told the proxy where to look? Written in Go, single binary, no reload concept — when state changes, the in-memory routing table is rebuilt and swapped atomically. No kill -HUP. No regenerated files. No bandaid.

That’s Traefik. Released March 2016 as v1.0, hit a million Docker pulls within a year, now over 3.3 billion pulls and the second most popular ingress controller in Kubernetes after nginx-ingress. The thing it got right wasn’t features — it was inverting the relationship between the proxy and the world. Everything else Traefik does, well or badly, follows from that inversion.


3. The Concepts You Need

Traefik has its own vocabulary, and trying to learn it without these terms is like trying to learn chess by watching games without knowing which piece is which. Memorize this section. The rest of the document leans on it constantly.

The four pillars (the routing pipeline)

These are the things you’ll work with every day. Internalize the order: a request enters through an EntryPoint, gets matched by a Router, optionally passes through a chain of Middlewares, and is forwarded to a Service.

EntryPoint — A network port that Traefik listens on, plus the protocol (TCP or UDP). web on :80, websecure on :443, metrics on :8082. EntryPoints know nothing about your applications; they’re just doors. You define a small number of them once and rarely touch them again.

Router — The thing that decides, for a request that has entered through an EntryPoint, which Service should handle it. A router has a rule (Host(`api.example.com`) && PathPrefix(`/v1`)), an entryPoints list (which doors it accepts requests from), an optional middlewares chain, and a target service. Routers are where the routing logic lives. You’ll create hundreds of these in a real system, one per exposed app.

Middleware — A function that wraps a request/response: rate limiting, auth, header rewriting, compression, retries, redirects, IP allowlists, circuit breaking, prefix stripping. Middlewares chain. The order matters. They live between the router match and the service forward.

Service — The thing that knows how to reach your actual backend. For a Docker container, the Service is the container’s IP and port. For a Kubernetes Service, it’s the endpoints behind it. The Service also encodes load-balancing strategy (round-robin, sticky sessions), health checks, and connection pooling.
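To make the Service’s load-balancing job concrete, here is a minimal round-robin picker in Go. It is an illustrative sketch, not Traefik’s balancer (which also deals with weights, health checks, and sticky sessions); the roundRobin type and pick method are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin is a hypothetical, minimal Service-side load balancer:
// it hands out backend addresses in strict rotation.
type roundRobin struct {
	backends []string
	next     atomic.Uint64 // shared counter, safe for concurrent requests
}

// pick returns the next backend. It panics on an empty list; a real proxy
// would answer 503 instead.
func (rr *roundRobin) pick() string {
	n := rr.next.Add(1) - 1
	return rr.backends[n%uint64(len(rr.backends))]
}

func main() {
	rr := &roundRobin{backends: []string{"10.0.0.5:8080", "10.0.0.6:8080", "10.0.0.7:8080"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick()) // .5, .6, .7, then .5 again
	}
}
```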

The flow, end to end:

Request → EntryPoint (:443) → Router (matches Host(`api.example.com`)) → Middleware chain (rate limit → auth → strip prefix) → Service (round-robin across 3 backend pods) → Backend.
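The same flow can be sketched with nothing but Go’s net/http. This is a toy model under simplifying assumptions, not Traefik’s code: the rule is a hard-coded host and path-prefix check, the only middleware is a hypothetical stripPrefix, and the “service” is an in-process handler rather than a proxied backend.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// middleware uses Go's standard handler-wrapping idiom.
type middleware func(http.Handler) http.Handler

// stripPrefix is a hypothetical stand-in for Traefik's stripPrefix middleware.
func stripPrefix(prefix string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			r.URL.Path = strings.TrimPrefix(r.URL.Path, prefix)
			next.ServeHTTP(w, r)
		})
	}
}

// router pairs a hard-coded Host + PathPrefix rule with a chain and a service.
type router struct {
	host, pathPrefix string
	middlewares      []middleware
	service          http.Handler
}

func (rt router) matches(r *http.Request) bool {
	return r.Host == rt.host && strings.HasPrefix(r.URL.Path, rt.pathPrefix)
}

// handler wraps the service so the first middleware listed sees the request first.
func (rt router) handler() http.Handler {
	h := rt.service
	for i := len(rt.middlewares) - 1; i >= 0; i-- {
		h = rt.middlewares[i](h)
	}
	return h
}

// entryPoint tries its routers in order and answers 404 when none matches.
func entryPoint(routers []router) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for _, rt := range routers {
			if rt.matches(r) {
				rt.handler().ServeHTTP(w, r)
				return
			}
		}
		http.NotFound(w, r)
	})
}

// backend stands in for the real Service: it echoes the path it received.
var backend = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "backend saw "+r.URL.Path)
})

// serve pushes one in-memory request through the pipeline.
func serve(host, path string) (int, string) {
	ep := entryPoint([]router{{
		host:        "api.example.com",
		pathPrefix:  "/v1",
		middlewares: []middleware{stripPrefix("/v1")},
		service:     backend,
	}})
	rec := httptest.NewRecorder()
	ep.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "http://"+host+path, nil))
	body, _ := io.ReadAll(rec.Result().Body)
	return rec.Code, string(body)
}

func main() {
	fmt.Println(serve("api.example.com", "/v1/users"))   // 200 backend saw /users
	fmt.Println(serve("other.example.com", "/v1/users")) // 404 plus Go's default not-found body
}
```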

The configuration split

This catches everyone. Traefik has two completely separate kinds of configuration, and they live in different places, change at different rates, and reload differently.

Static configuration — Set at startup, requires a Traefik restart to change. Defines EntryPoints, which Providers Traefik watches, certificate resolvers, logging, metrics, plugins, API/dashboard exposure. Lives in traefik.yml, CLI flags, or environment variables. The thing Traefik is, before it knows anything about your apps.

Dynamic configuration — The Routers, Middlewares, Services, and TLS certs. Comes from Providers. Hot-reloads continuously, no restart. The thing Traefik does, which changes minute by minute as services come and go.

Mixing these up is the single most common cause of “why doesn’t my change take effect” pain. Anything about a specific app is dynamic. Anything about Traefik itself is static.

Providers

A Provider is a source of dynamic configuration. Each provider watches some piece of infrastructure and translates what it sees into Traefik’s internal Router/Middleware/Service model.

The ones that matter in practice:

  • Docker — Watches the Docker socket. Reads container labels. Most common for single-host setups.
  • Kubernetes Ingress — Implements the standard Ingress resource. Works but limited; doesn’t expose Traefik’s richer features.
  • Kubernetes CRD (IngressRoute) — Traefik-native CRDs. The way you actually want to run Traefik in Kubernetes. Exposes everything.
  • Kubernetes Gateway API — The newer, vendor-neutral standard. Traefik supports it. Slowly replacing Ingress for serious deployments.
  • File — A YAML or TOML file (or directory of them) on disk. Traefik watches it for changes. Useful for routes that don’t belong to any container — external services, redirects, global middlewares.
  • Consul, Nomad, ECS, Redis, etcd, ZooKeeper, HTTP — Various flavors of “watch some other store.” Real but rarely the main provider for greenfield setups.

You can run multiple providers simultaneously. Most production setups run two: an orchestrator provider (Docker or Kubernetes) for app-level routing, plus the File provider for shared middlewares, TLS options, and external-service routes.

Labels and annotations

The Docker provider reads labels. The Kubernetes Ingress provider reads annotations. The IngressRoute CRD reads its own structured spec. They’re different syntactic forms of the same underlying concept: attach Traefik configuration to the object that defines the service.

A label like traefik.http.routers.myapp.rule=Host(`app.example.com`) says: “create a router named myapp with this rule.” Traefik mechanically translates labels into the Router/Middleware/Service model in memory. The format is rigid and verbose; it’ll feel ugly the first dozen times. You get used to it.

TLS, certificates, and ACME

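Traefik terminates TLS on the EntryPoint’s listener, but TLS is switched on per router (tls=true or a tls.certresolver on the router), or for every router on an EntryPoint at once through that EntryPoint’s http.tls settings. Either way, once a router requires TLS, Traefik needs a certificate.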

Certificate Resolver — A configured method for getting certificates. The most common is acme, which means “use the ACME protocol to fetch certificates from a CA, usually Let’s Encrypt.”

ACME Challenges — How a CA verifies you control a domain before issuing a cert. Three types:

  • HTTP-01 — CA hits http://yourdomain/.well-known/acme-challenge/.... Simple. Requires port 80 reachable from the internet. Cannot do wildcards.
  • TLS-ALPN-01 — Similar, but on port 443 via a TLS extension (ALPN), so port 80 doesn’t need to be reachable. Also cannot do wildcards.
  • DNS-01 — CA queries a TXT record you publish via your DNS provider’s API. Works behind firewalls, supports wildcard certs, but requires Traefik to have DNS API credentials.

acme.json — The local file where Traefik stores fetched certs and account info. Permissions must be 600 or Traefik refuses to use it.
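The 600 rule boils down to “no permission bits for group or others.” A minimal Go sketch of such a check follows; the function names and the error wording are hypothetical, not Traefik’s.

```go
package main

import (
	"fmt"
	"os"
)

// checkPerm is a hypothetical acme.json permission check: reject the file if
// group or others hold any permission bit (anything looser than 600).
func checkPerm(perm os.FileMode, path string) error {
	if perm&0o077 != 0 {
		return fmt.Errorf("permissions %o for %s are too open, want 600", perm, path)
	}
	return nil
}

// checkACMEStorage applies checkPerm to a real file on disk.
func checkACMEStorage(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	return checkPerm(info.Mode().Perm(), path)
}

func main() {
	f, err := os.CreateTemp("", "acme-*.json")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	f.Close()

	for _, mode := range []os.FileMode{0o644, 0o600} {
		if err := os.Chmod(f.Name(), mode); err != nil {
			panic(err)
		}
		fmt.Printf("%o -> %v\n", mode, checkACMEStorage(f.Name()))
	}
}
```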

The Pieces You Won’t Use At First But Should Know Exist

ServersTransport — Configuration for how Traefik talks to backends: TLS settings to backends (yes, you can re-encrypt to a backend), connection limits, timeouts. You won’t touch this until you do, and then you’ll touch it a lot.

TLSOption — Named bundles of TLS cipher/protocol settings. Useful for “modern” vs “intermediate” profiles or pinning specific ciphers for compliance.

Plugins — Custom middlewares or providers loaded dynamically. Originally written in Go and run via Yaegi (an embedded Go interpreter). Traefik 3 added WebAssembly (Wasm) plugins, which lift the language restriction. The Plugin Catalog has hundreds; the most famous is Coraza WAF.

The Dashboard / API — A web UI at :8080 (default) that shows routers, services, middlewares, and live config. Read-only. Brilliantly useful when debugging “why isn’t my route working.”

That’s the vocabulary. With these terms in your head, the rest of the document will read like English instead of Esperanto.


4. The Distilled Introduction

Everything a tutorial covers, with the “why” baked in. We’ll start with Docker because that’s where most people meet Traefik first; Kubernetes gets its own subsection at the end.

Setup: Traefik on Docker, the minimum that works

Traefik is a single Go binary. The Docker image is traefik:v3.x (use a pinned tag in production, never :latest). A working setup needs three things: a static config, port bindings, and access to the Docker socket.

# docker-compose.yml — minimum viable Traefik
services:
  traefik:
    image: traefik:v3.4
    command:
      - --api.dashboard=true
      - --api.insecure=true                   # dashboard on :8080 with no auth — dev only
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - proxy

networks:
  proxy:
    name: proxy

Three things to notice and never forget:

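  1. The Docker socket is mounted read-only (:ro). Traefik only reads it to learn what containers exist, so never give it more. But don’t mistake :ro for a security boundary: it doesn’t make the Docker API read-only, and anything that can connect to the socket can still start or stop containers. Treat the Traefik container as privileged.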
  2. exposedbydefault=false means “don’t route to a container unless its labels explicitly say so.” Always set this. The default-true behavior leaks internal services to the internet.
  3. api.insecure=true is for development only. It exposes the dashboard on port 8080 with no auth. We’ll secure it properly in a minute.

Bring this up: docker compose up -d. Visit http://localhost:8080/dashboard/ (the trailing slash matters). You’ll see Traefik’s UI, with one entry: itself. No services yet, because nothing is labeled.

Exposing your first service

Add another container to the compose file:

  whoami:
    image: traefik/whoami
    networks:
      - proxy
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.whoami.rule=Host(`whoami.localhost`)"
      - "traefik.http.routers.whoami.entrypoints=web"
      - "traefik.http.services.whoami.loadbalancer.server.port=80"

traefik/whoami is a tiny test image that responds with the container’s IP and headers. Useful for sanity-checking routing. Bring it up with docker compose up -d, then try curl -H "Host: whoami.localhost" http://localhost/. You should get a response. If you don’t, you’ve just had your first Traefik debugging experience; head to the “How It Breaks” section.

The label syntax follows a strict pattern: traefik.http.<routers|middlewares|services>.<name>.<key>=<value>. The <name> is yours to choose; it just has to be unique. The router whoami reaches the service whoami not because the names match but because it’s the container’s only service: the Docker provider attaches a router that doesn’t name a service to that single service. Once a container defines several services, each router must name its own, e.g. traefik.http.routers.whoami.service=whoami-svc.
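Because the <name> segment is just a string, grouping labels into objects is mechanical. This hypothetical Go sketch groups HTTP labels by kind and name; the real Docker provider decodes labels into typed structs, but the consequence is the same: a different spelling of a name yields a different object.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabels groups labels of the form traefik.http.<kind>.<name>.<key>=<value>
// into kind -> name -> key -> value. Hypothetical sketch for illustration only.
func parseLabels(labels map[string]string) map[string]map[string]map[string]string {
	out := map[string]map[string]map[string]string{}
	for k, v := range labels {
		parts := strings.SplitN(k, ".", 5) // traefik, http, kind, name, rest of key
		if len(parts) != 5 || parts[0] != "traefik" || parts[1] != "http" {
			continue // not an HTTP object label (e.g. traefik.enable)
		}
		kind, name, key := parts[2], parts[3], parts[4]
		if out[kind] == nil {
			out[kind] = map[string]map[string]string{}
		}
		if out[kind][name] == nil {
			out[kind][name] = map[string]string{}
		}
		out[kind][name][key] = v
	}
	return out
}

func main() {
	cfg := parseLabels(map[string]string{
		"traefik.enable":                                        "true",
		"traefik.http.routers.whoami.rule":                      "Host(`whoami.localhost`)",
		"traefik.http.routers.whoami.entrypoints":               "web",
		"traefik.http.services.whoami.loadbalancer.server.port": "80",
	})
	fmt.Println(cfg["routers"]["whoami"]["rule"])                      // Host(`whoami.localhost`)
	fmt.Println(cfg["services"]["whoami"]["loadbalancer.server.port"]) // 80
}
```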

The loadbalancer.server.port=80 tells Traefik which port on the container to hit. Traefik can only work this out on its own when the container exposes exactly one port (via the image’s EXPOSE or the compose file); with several exposed ports, or none, you must say which one. Even with exactly one, be explicit: being explicit is faster than debugging.
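The port rule can be written as a small decision function. This is a hypothetical Go sketch of the behavior described above (string ports for simplicity, and treating zero exposed ports as an error too), not Traefik’s provider code.

```go
package main

import (
	"errors"
	"fmt"
)

// pickPort: an explicit loadbalancer.server.port label wins; otherwise a single
// exposed port is used; zero or several exposed ports require the label.
func pickPort(labelPort string, exposed []string) (string, error) {
	switch {
	case labelPort != "":
		return labelPort, nil
	case len(exposed) == 1:
		return exposed[0], nil
	case len(exposed) == 0:
		return "", errors.New("no exposed port: set loadbalancer.server.port")
	default:
		return "", fmt.Errorf("%d exposed ports: set loadbalancer.server.port", len(exposed))
	}
}

func main() {
	fmt.Println(pickPort("", []string{"80"}))            // 80 <nil>
	fmt.Println(pickPort("", []string{"80", "443"}))     // ambiguous: error
	fmt.Println(pickPort("8080", []string{"80", "443"})) // 8080 <nil>
}
```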

Adding HTTPS with Let’s Encrypt

Traefik’s most-praised feature is genuinely good: automatic TLS. Adjust the static config:

    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --entrypoints.web.http.redirections.entryPoint.to=websecure
      - --entrypoints.web.http.redirections.entryPoint.scheme=https
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.le.acme.email=you@example.com
      - --certificatesresolvers.le.acme.storage=/letsencrypt/acme.json
      - --certificatesresolvers.le.acme.tlschallenge=true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./letsencrypt:/letsencrypt

And on each service that should be HTTPS:

    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.whoami.rule=Host(`whoami.example.com`)"
      - "traefik.http.routers.whoami.entrypoints=websecure"
      - "traefik.http.routers.whoami.tls.certresolver=le"

Things that will trip you up the first time:

  • The domain must actually point to your server. Let’s Encrypt validates by hitting yourdomain.com:443 (TLS-ALPN-01) or yourdomain.com:80 (HTTP-01). If DNS doesn’t resolve to your server, you get an obscure “unable to issue certificate” error. Validate with dig yourdomain.com first.
  • You must reference the resolver somewhere: per router with tls.certresolver=le, or once for every router on an EntryPoint with --entrypoints.websecure.http.tls.certresolver=le. Configuring a resolver in static config doesn’t apply it to anything by itself; something has to opt in.
  • acme.json must be chmod 600. Traefik refuses to use it otherwise, and certificate issuance fails with a permissions error. When Traefik creates the file itself it sets permissions correctly, but if you copy it between hosts you’ll need to fix them.
  • Use the staging server while testing. Let’s Encrypt’s production server has rate limits (50 certs per registered domain per week). One bad config in a loop will exhaust them and lock you out for a week. Add --certificatesresolvers.le.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory while debugging, then remove it and rm acme.json to get real certs.

Middlewares: where the actual work gets done

A reverse proxy that just forwards bytes is rare. You almost always want headers tweaked, auth checked, rate limits enforced, paths rewritten. Middlewares are how Traefik does this.

Defining and attaching a middleware via labels:

    labels:
      - "traefik.http.middlewares.myapp-auth.basicauth.users=alice:$$apr1$$abc$$xyz"
      - "traefik.http.middlewares.myapp-strip.stripprefix.prefixes=/api"
      - "traefik.http.routers.myapp.middlewares=myapp-auth,myapp-strip"

Note: the $ characters are doubled because docker-compose interprets $VAR. In a file provider YAML, you’d write single $.

Middlewares run in the order you list them. myapp-auth,myapp-strip means: first auth, then strip. If you stripped first, the auth middleware would see the stripped path. That doesn’t matter for basicAuth, but it matters enormously for things like forwardAuth, where the external auth service may base its decision on the request path.
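Here is a small net/http sketch of why the order matters. recordPath stands in for an auth middleware that inspects the path, and strip for stripPrefix; both are hypothetical, and the chain helper assumes, as the text says, that the first middleware listed sees the request first.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type middleware func(http.Handler) http.Handler

// chain wraps h so that the first middleware listed is the first to see the request.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// strip is a hypothetical stand-in for the stripPrefix middleware.
func strip(prefix string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			r.URL.Path = strings.TrimPrefix(r.URL.Path, prefix)
			next.ServeHTTP(w, r)
		})
	}
}

// recordPath is a hypothetical stand-in for a path-aware auth check:
// it records the path it was shown, then lets the request through.
func recordPath(seen *string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*seen = r.URL.Path
			next.ServeHTTP(w, r)
		})
	}
}

// authSees returns the path the "auth" step observed for a request to /api/users.
func authSees(authFirst bool) string {
	var seen string
	ok := http.HandlerFunc(func(http.ResponseWriter, *http.Request) {})
	mws := []middleware{strip("/api"), recordPath(&seen)}
	if authFirst {
		mws = []middleware{recordPath(&seen), strip("/api")}
	}
	chain(ok, mws...).ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/api/users", nil))
	return seen
}

func main() {
	fmt.Println(authSees(true))  // /api/users
	fmt.Println(authSees(false)) // /users
}
```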

The middlewares you’ll reach for repeatedly:

  • stripPrefix / addPrefix — Path surgery. “Strip /api before forwarding so the backend sees /users not /api/users.”
  • headers — Set CORS, HSTS, X-Frame-Options, X-Forwarded-*. The most-used middleware in any real setup.
  • rateLimit — Token bucket per source IP. average is the sustained rate (requests per second with the default one-second period); burst is how many requests a spike may use up at once.
  • basicAuth / digestAuth — Quick HTTP auth. Fine for dashboards, never for user-facing apps.
  • forwardAuth — Delegates auth to an external HTTP endpoint. The bridge to OAuth proxies, Authelia, Authentik, Keycloak-via-oauth2-proxy. Pattern: Traefik sends the request headers to your auth service; if it returns 2xx, request proceeds; otherwise it returns the auth service’s response.
  • redirectScheme / redirectRegex — HTTP→HTTPS, naked-domain to www, old-path to new-path. Use entryPoint-level redirection for the global HTTP→HTTPS case (shown above); use these middlewares for app-specific redirects.
  • compress — Gzip/Brotli the response. Defaults are fine. Just turn it on.
  • circuitBreaker — Trip if LatencyAtQuantileMS(50.0) > 100 or similar. Niche but life-saving when a backend is misbehaving.
  • chain — A meta-middleware that bundles a sequence of other middlewares. Useful for “the standard middleware stack for all our internal services.”
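The average/burst pair of rateLimit is easiest to see in a token bucket. This is a hand-rolled Go sketch for a single source, treating average as tokens per second; it is not Traefik’s implementation, and the bucket type and helper names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a hand-rolled token bucket for one source (hypothetical sketch).
type bucket struct {
	rate   float64 // tokens refilled per second: "average"
	burst  float64 // maximum tokens held: "burst"
	tokens float64
	last   time.Time
}

func newBucket(average, burst float64, now time.Time) *bucket {
	return &bucket{rate: average, burst: burst, tokens: burst, last: now}
}

// allow refills for the time elapsed since the last call, then spends one token if it can.
func (b *bucket) allow(now time.Time) bool {
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// admitted counts how many of n requests arriving at the same instant get through.
func admitted(average, burst float64, n int) int {
	t0 := time.Unix(0, 0)
	b := newBucket(average, burst, t0)
	ok := 0
	for i := 0; i < n; i++ {
		if b.allow(t0) {
			ok++
		}
	}
	return ok
}

func main() {
	// average=100, burst=200: an instant spike of 500 requests admits 200.
	fmt.Println(admitted(100, 200, 500)) // 200

	// After the spike, refill runs at "average": half a second later, 50 tokens are back.
	t0 := time.Unix(0, 0)
	b := newBucket(100, 200, t0)
	for b.allow(t0) {
	}
	n := 0
	later := t0.Add(500 * time.Millisecond)
	for b.allow(later) {
		n++
	}
	fmt.Println(n) // 50
}
```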

Putting it together: a realistic compose

services:
  traefik:
    image: traefik:v3.4
    restart: unless-stopped
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --providers.file.directory=/etc/traefik/dynamic
      - --providers.file.watch=true
      - --entrypoints.web.address=:80
      - --entrypoints.web.http.redirections.entryPoint.to=websecure
      - --entrypoints.web.http.redirections.entryPoint.scheme=https
      - --entrypoints.websecure.address=:443
      - --entrypoints.websecure.http.tls.certresolver=le
      - --certificatesresolvers.le.acme.email=ops@example.com
      - --certificatesresolvers.le.acme.storage=/letsencrypt/acme.json
      - --certificatesresolvers.le.acme.tlschallenge=true
      - --metrics.prometheus=true
      - --accesslog=true
      - --log.level=INFO
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./letsencrypt:/letsencrypt
      - ./dynamic:/etc/traefik/dynamic:ro
    networks:
      - proxy

  api:
    image: my-api:1.2.3
    networks:
      - proxy
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.api.rule=Host(`api.example.com`)"
      - "traefik.http.routers.api.middlewares=security-headers@file,api-rl"
      - "traefik.http.middlewares.api-rl.ratelimit.average=100"
      - "traefik.http.middlewares.api-rl.ratelimit.burst=200"

networks:
  proxy:
    name: proxy

And ./dynamic/middlewares.yml:

http:
  middlewares:
    security-headers:
      headers:
        frameDeny: true
        contentTypeNosniff: true
        browserXssFilter: true
        stsSeconds: 31536000
        stsIncludeSubdomains: true

That’s a real Traefik setup. Note how shared middlewares live in the file provider with @file references, while app-specific config lives on the container labels. This split is the idiom — read it twice. We’ll revisit it in The Taste Test.

Kubernetes, briefly

In Kubernetes, you don’t use labels — you use either standard Ingress resources or Traefik’s IngressRoute CRDs. The CRD is what you want; it’s strictly more capable.

Install via Helm:

helm repo add traefik https://traefik.github.io/charts
helm install traefik traefik/traefik

Then define an IngressRoute:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: my-app
spec:
  entryPoints: [websecure]
  routes:
    - match: Host(`app.example.com`)
      kind: Rule
      services:
        - name: my-app-svc
          port: 80
      middlewares:
        - name: rate-limit
  tls:
    certResolver: le

The structure mirrors the label model (entryPoints, a match rule, middlewares, services), but as a typed YAML resource that kubectl can validate. Middlewares are also CRDs (apiVersion: traefik.io/v1alpha1, kind: Middleware). Same with TLSOption, ServersTransport, etc.

One painful note for Kubernetes: Let’s Encrypt does not work cleanly with multiple Traefik replicas, because Traefik’s ACME implementation doesn’t share state across instances. The fix in production-grade Kubernetes is to use cert-manager to provision certs into Kubernetes Secrets, and have Traefik read those secrets via tls.secretName. cert-manager handles renewals; Traefik just consumes. We’ll come back to this in Judgment Calls and Downsides.

What you now know

If you stopped reading here, you could deploy a real Traefik setup. You’d be a tutorial graduate. The rest of this document is what separates “I followed a guide” from “I understand this.”


5. The Mental Model

Four ideas. Internalize these and you can predict Traefik’s behavior in situations you’ve never seen.

Core Idea 1: The routing table is a projection, not a configuration

The most important thing about Traefik. Every other tool — nginx, HAProxy, Apache, Envoy with static config — treats its routing table as state you own and maintain. Traefik treats it as derived state, computed from inputs (Providers) on a continuous reconciliation loop.

This is the same shift as imperative vs declarative, or React’s diff-and-render vs jQuery’s DOM manipulation, or Kubernetes’ controllers vs SSH-and-edit-files. Traefik is the React of reverse proxies.

What this predicts:

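  • You don’t restart Traefik to add a route. Adding a route means creating a thing the Provider sees (a container with labels, a CRD, a file entry). Traefik recomputes the projection. The new route typically exists within a second or two; Traefik deliberately throttles bursts of provider updates rather than applying each one instantly.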
  • You don’t have a “live config” to introspect via files. The dashboard and API show what Traefik currently believes. If a router isn’t there, it’s because no Provider is currently reporting it. Don’t grep configs; look at the dashboard.
  • Drift between “what’s configured” and “what’s running” cannot happen. There’s no canonical config to drift from. Whatever the orchestrator says exists, Traefik routes to.
  • A typo in a label is a silent failure. Traefik computes the projection from whatever it sees. If your label is misspelled, that misspelled property is simply absent in the projection. No syntax error. No reload failure. The route just… isn’t there. (This is the source of 80% of “I configured it but it’s not working” pain.)
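A toy version of “routing table as projection” in Go: project is a pure function from observed containers to routes, so adding a container means nothing more than calling it again. The container type, and the rule that each router is named after its container, are simplifying assumptions for illustration.

```go
package main

import "fmt"

// container is a hypothetical stand-in for what a provider observes.
type container struct {
	name   string
	labels map[string]string
}

// project derives a routing table (rule -> service) from current state alone.
// Simplification: each container's router is assumed to be named after it.
func project(cs []container) map[string]string {
	table := map[string]string{}
	for _, c := range cs {
		if c.labels["traefik.enable"] != "true" {
			continue // exposedbydefault=false: containers must opt in
		}
		if rule, ok := c.labels["traefik.http.routers."+c.name+".rule"]; ok {
			table[rule] = c.name
		}
	}
	return table
}

func main() {
	state := []container{{"api", map[string]string{
		"traefik.enable":                "true",
		"traefik.http.routers.api.rule": "Host(`api.example.com`)",
	}}}
	fmt.Println(len(project(state))) // 1

	// A new container appears; recomputing the projection is the whole "reload".
	state = append(state, container{"web", map[string]string{
		"traefik.enable":               "true",
		"traefik.http.routers.web.rul": "Host(`www.example.com`)", // typo: "rul"
	}})
	fmt.Println(len(project(state))) // still 1: this sketch never looks for "rul"
}
```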

Core Idea 2: Static is for Traefik; dynamic is for your apps

Two configurations, two lifecycles, two reload models. Conflating them is the second-biggest source of pain after typos.

Static is who Traefik is: which ports it listens on, which providers it watches, which cert resolvers exist, where logs go, what plugins to load. Set at startup. Changing it requires a Traefik restart.

Dynamic is what Traefik does: the actual routers, middlewares, services, certs. Pulled from providers continuously. Hot reloaded on change.

This predicts:

  • You will be tempted to put a TLS option or a middleware in static config. Don’t. They go in dynamic config, even though they feel “global.” TLS options, default certificates, named middlewares, ServersTransport — all dynamic.
  • When you change traefik.yml (the static file) and “nothing happens,” it’s because you didn’t restart the container. Traefik does not watch its static config.
  • The File provider is the bridge. It’s a dynamic provider that reads YAML/TOML files, hot-reloading on change. People often confuse traefik.yml (static) with dynamic.yml (dynamic via File provider). They are not the same file, even though both are YAML on disk. Name them clearly. I beg you.

Core Idea 3: Everything is named, and names match by string

Routers, middlewares, services — every Traefik object has a name. Labels and CRDs reference each other by string. There is no type-checking, no foreign-key enforcement, nothing.

traefik.http.routers.myapp.middlewares=auth,strip

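This says “router myapp uses middlewares named auth and strip.” Nothing checks that reference when you write it. If auth doesn’t exist as a middleware anywhere Traefik knows about, you find out only at runtime: the router fails to build, shows up with an error in the dashboard (the log says the middleware does not exist), and requests it would have matched fall through to another router or a 404.

This predicts:

  • Misspelled middleware names surface late. The config is accepted without complaint and the router simply turns up broken. A common version of the typo footgun above.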
  • You can reference middlewares across providers using name@provider syntax: middlewares=auth@file,rl@docker. Without the suffix, Traefik looks in the same provider that defined the router.
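  • Names live in a flat namespace per provider. Two containers both defining traefik.http.middlewares.rate-limit.ratelimit.average=100 aren’t each creating a private rate-limit; they’re both writing to the same name. If the two definitions differ, that’s a conflict, and the Docker provider doesn’t pick a winner: it logs an error about the middleware being defined multiple times with different configurations and drops it, which breaks every router that referenced it. Either namespace your middleware names per-app (myapp-ratelimit) or define shared middlewares once in a file provider.
  • A router that names no service still gets one. With label-based providers such as Docker, a router on a container that defines exactly one service is attached to it automatically, and if the container defines no service at all, Traefik creates one for it. Convenient when starting out, confusing when you start splitting things up: as soon as a container has two services, every router on it must name its service explicitly.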
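The name@provider convention amounts to a small resolution step. In this hypothetical Go sketch, an unqualified reference inherits the router’s provider, and a reference that matches nothing is reported as an error at resolution time; the function names and error text are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// qualify turns a middleware reference into name@provider. Without an @ suffix,
// the router's own provider is assumed.
func qualify(ref, routerProvider string) string {
	if strings.Contains(ref, "@") {
		return ref
	}
	return ref + "@" + routerProvider
}

// resolve checks every reference against the known middlewares; in this sketch,
// this lookup is the first moment a bad name is noticed.
func resolve(refs []string, routerProvider string, known map[string]bool) ([]string, error) {
	var out []string
	for _, ref := range refs {
		q := qualify(ref, routerProvider)
		if !known[q] {
			return nil, fmt.Errorf("middleware %q does not exist", q)
		}
		out = append(out, q)
	}
	return out, nil
}

func main() {
	known := map[string]bool{"security-headers@file": true, "api-rl@docker": true}
	fmt.Println(resolve([]string{"security-headers@file", "api-rl"}, "docker", known))
	// [security-headers@file api-rl@docker] <nil>
	fmt.Println(resolve([]string{"api-rI"}, "docker", known)) // typo (capital I): error
}
```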

Core Idea 4: Middlewares are a pipeline, not a checklist

The middleware chain is a Unix pipe: each one’s output becomes the next one’s input. Order is meaningful. Some middlewares only make sense in certain positions.

Router matches → MW1 → MW2 → MW3 → Service forward

Client receives ← MW1 ← MW2 ← MW3 ← Service responds

Each middleware sees the request on the way in, can let it through, modify it, or short-circuit it. On the way out, the same middlewares see the response in reverse. Most middlewares only operate on one direction (rate-limit on the way in, compress on the way out). Some operate on both (headers can rewrite request and response headers).

This predicts:

  • stripPrefix before forwardAuth versus after produces different behavior for any auth service that cares about the request path. The auth service either sees /api/users or /users depending on order. Both are valid, both work, but they’re different.
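  • Rate limiting before auth is the right default. You want to drop floods before paying the cost of an auth round-trip. Putting auth first means every request in a flood, fake auth attempts included, reaches your auth backend before the rate limiter gets a chance to drop it.
  • Compress essentially always goes on the outside of the chain, which means first in the router’s middleware list. On the way out it is then the last middleware to touch the response, so it compresses the final body. Placed deeper in the chain, it would compress something another middleware is about to rewrite.
  • Headers middlewares stack, and the last write to a header wins. Which write is last depends on direction: for request headers it’s the later middleware in the list; for response headers it’s the earlier one, because the response travels back through the chain in reverse. So a customResponseHeaders middleware setting X-Foo: 1 listed before another setting X-Foo: 2 should leave the client seeing X-Foo: 1.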
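A quick way to convince yourself of the two-way flow is to log what each layer sees. In this hypothetical net/http sketch, each tag middleware records when the request passes inward and when control returns on the way back out; the chain helper makes the first-listed middleware the outermost.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

type middleware func(http.Handler) http.Handler

// chain wraps h so the first middleware listed is the outermost layer.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// tag is a hypothetical middleware that records the inbound and outbound passes.
func tag(name string, events *[]string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*events = append(*events, "in:"+name)
			next.ServeHTTP(w, r)
			*events = append(*events, "out:"+name)
		})
	}
}

// trace returns the order in which each layer touched the request/response.
func trace() []string {
	var events []string
	svc := http.HandlerFunc(func(http.ResponseWriter, *http.Request) {
		events = append(events, "service")
	})
	h := chain(svc, tag("MW1", &events), tag("MW2", &events), tag("MW3", &events))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return events
}

func main() {
	fmt.Println(trace())
	// [in:MW1 in:MW2 in:MW3 service out:MW3 out:MW2 out:MW1]
}
```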

6. The Architecture in Plain English

Let’s follow a single request all the way through. You curl https://api.example.com/v1/users on a Kubernetes cluster running Traefik. What happens?

Boot time

When Traefik starts:

  1. It reads the static configuration from traefik.yml, CLI flags, or environment variables (pick one source; Traefik doesn’t merge them). From this it learns: what EntryPoints to open, which Providers to enable, which CertResolvers exist, what plugins to load, where to send metrics and logs.
  2. It binds the EntryPoints. The kernel hands it sockets listening on :80, :443, etc. At this point, requests arriving get a 404 because no routers are configured yet.
  3. It starts the Provider goroutines. Each provider runs an event loop watching its source. The Kubernetes provider establishes a watch against the API server for IngressRoutes, Middlewares, Services, Endpoints, and Secrets in its watched namespaces. The Docker provider connects to the socket and subscribes to container events.
  4. Each provider does an initial sync — fetches the current state and produces a dynamic.Configuration struct (Go type). Sends it to Traefik’s internal config channel.
  5. Traefik’s reconciliation loop receives the config from each provider, merges them into a unified internal state, and builds the routing tables: a map from EntryPoint to a list of routers, each router with a compiled rule (the AST of Host() && PathPrefix() etc.), its middleware chain (resolved by name), and its target service (resolved to a load-balanced set of backend endpoints).
  6. This unified state is swapped in atomically. From the next request onward, Traefik routes against it.

The whole thing typically takes a second or two. There’s no “config validation” step in a separate phase; if your config is broken, you find out by seeing errors in the logs and missing routers in the dashboard.
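The “swapped in atomically” step can be modeled with sync/atomic. This is a hypothetical Go sketch, with the routing table reduced to host → backend, not how Traefik structures its handlers: the reconciler builds a complete new table and publishes it with a single pointer store, so each lookup sees either the old table or the new one, never a half-built mix.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// routingTable is a hypothetical snapshot: host -> backend. Treat it as
// immutable once published.
type routingTable map[string]string

// proxy holds the current table behind an atomic pointer; readers never lock.
type proxy struct {
	table atomic.Pointer[routingTable]
}

func (p *proxy) lookup(host string) (string, bool) {
	t := p.table.Load()
	if t == nil {
		return "", false // nothing loaded yet: the "404 before first sync" state
	}
	backend, ok := (*t)[host]
	return backend, ok
}

// apply publishes a freshly built table; lookups already in flight keep the old one.
func (p *proxy) apply(next routingTable) {
	p.table.Store(&next)
}

func main() {
	var p proxy
	_, ok := p.lookup("api.example.com")
	fmt.Println(ok) // false
	p.apply(routingTable{"api.example.com": "10.0.0.5:8080"})
	fmt.Println(p.lookup("api.example.com")) // 10.0.0.5:8080 true
}
```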

Request time

A request arrives on :443:

  1. TLS handshake. Traefik looks at the SNI (Server Name Indication) from the ClientHello — the domain the client is asking for. It picks the right certificate. If multiple certs match the SNI, the most specific one wins. If none match, it serves a self-signed default and HTTPS clients will scream. The handshake completes; we now have a TLS-decrypted byte stream.
  2. HTTP parsing. Traefik’s HTTP/2 (or HTTP/1.1, or HTTP/3 if enabled) implementation parses the request line and headers. It now has a Go *http.Request.
  3. Router matching. Traefik consults the routing table for this EntryPoint (websecure). It walks the list of routers in priority order, evaluating each router’s compiled rule against the request. The first match wins; that’s the chosen router. (Rules are not a longest-prefix match. Routers are sorted by priority, and a router without an explicit priority gets the length of its rule string as its priority, so longer rules are tried first by default. We’ll come back to this.)
  4. Middleware chain. The chosen router has a list of middleware names, resolved at config-build time to actual Go function handlers. Traefik invokes them as nested http.Handlers — the standard Go middleware composition. Each one can pass through, modify, or short-circuit.
  5. Service resolution and load balancing. The router points at a service. The service is a Go struct holding a list of backend URLs (from the Kubernetes Endpoints, Docker container IPs, or the File provider). The load balancer picks one: weighted round-robin by default (plain round-robin when all servers have equal weight), with optional sticky sessions via a cookie.
  6. Reverse proxy to backend. Traefik opens a connection (or pulls one from the per-backend connection pool) and forwards the request. The request gets X-Forwarded-For, X-Forwarded-Proto, X-Real-IP, and friends added. The backend handles it.
  7. Response streams back. Through the same middleware chain in reverse. compress might gzip the body. headers might add response headers. Logs are written if access logging is on. Metrics are incremented if metrics is on. The TLS layer encrypts. The byte stream goes back to the client.
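Router selection can be sketched as “sort by priority, take the first match.” This Go sketch assumes the documented default that a router without an explicit priority uses the length of its rule string; the rtr type and the predicate standing in for a compiled rule are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// rtr is a hypothetical router: its rule text, an optional explicit priority,
// and a predicate standing in for the compiled rule.
type rtr struct {
	name     string
	rule     string
	priority int // 0 = unset, fall back to len(rule)
	match    func(host, path string) bool
}

func (r rtr) effectivePriority() int {
	if r.priority != 0 {
		return r.priority
	}
	return len(r.rule)
}

// pick sorts routers by descending priority and returns the first match, or "" (a 404).
func pick(routers []rtr, host, path string) string {
	sorted := append([]rtr(nil), routers...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].effectivePriority() > sorted[j].effectivePriority()
	})
	for _, r := range sorted {
		if r.match(host, path) {
			return r.name
		}
	}
	return ""
}

func demoRouters() []rtr {
	return []rtr{
		{name: "api", rule: "Host(`api.example.com`)",
			match: func(h, _ string) bool { return h == "api.example.com" }},
		{name: "api-v1", rule: "Host(`api.example.com`) && PathPrefix(`/v1`)",
			match: func(h, p string) bool { return h == "api.example.com" && strings.HasPrefix(p, "/v1") }},
	}
}

func main() {
	rs := demoRouters()
	fmt.Println(pick(rs, "api.example.com", "/v1/users")) // api-v1: longer rule, higher default priority
	fmt.Println(pick(rs, "api.example.com", "/health"))   // api
	rs[0].priority = 1000                                 // explicit priority overrides rule length
	fmt.Println(pick(rs, "api.example.com", "/v1/users")) // api
}
```

Setting priority explicitly is the escape hatch when rule length would pick the wrong router.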

Where state lives

This is the key insight. Most reverse proxies hold their config on disk (and a working copy in memory after reload). Traefik holds all config in memory, derived from sources of truth that live elsewhere:

  • Routing table — In-memory only. Rebuilt on every reconciliation.
  • Container/pod IPs and ports (the Services’ backends) — Discovered from the Docker socket / Kubernetes API, kept in memory, updated on events.
  • Certificates from ACME — Stored in acme.json (file) on disk. This is the one piece of stateful, durable data Traefik owns and must protect.
  • In-flight connections — In-memory, lost on restart. Sticky sessions are not kept here: the sticky cookie lives on the client and identifies the chosen backend server, so any Traefik replica that sees the same set of backends can honor it without a shared store.
  • Metrics counters — In-memory between scrapes, exported to Prometheus.

Traefik is almost stateless. The one exception — acme.json — is why scaling Traefik horizontally is awkward. Two Traefik instances each managing their own acme.json will independently request certificates from Let’s Encrypt, racing for the same domain, and eventually hitting rate limits. That problem and its workarounds shape a lot of Traefik production design.

The provider machine in more detail

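Each provider runs its own event loop in a goroutine. In the Traefik source the core method looks like:

Provide(configurationChan chan<- dynamic.Message, pool *safe.Pool) error

The provider writes dynamic.Message values (its name plus a dynamic.Configuration) into that channel for as long as it runs.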

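It sends a full snapshot every time anything changes. Not deltas: a complete picture. Traefik keeps the latest snapshot from each provider, merges them, and if the merged result differs from what is running, rebuilds its routers and handlers and swaps them in. Bursts of events are throttled (providers.providersThrottleDuration, 2s by default), so a rolling deploy doesn’t trigger one rebuild per container.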

For the Docker provider, the trigger is Docker socket events. When the provider hears that something changed (container X started, say), it lists the running containers again, decodes their labels into routers, middlewares, and services, and pushes a new full snapshot.

For Kubernetes, the trigger is informer events on watched resources. Same pattern — assemble the snapshot, push it.

This architecture means you should think of providers as conveyor belts. Things go on, things come off, the routing table reflects the current cargo. The internal config is always whatever the most recent push says. There’s no merge logic between snapshots from the same provider — last-write-wins per provider.
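The conveyor-belt model can be sketched with plain Go channels. The types below are hypothetical stand-ins, not Traefik’s dynamic.Message or dynamic.Configuration: each push carries a provider’s complete router table, and the consumer replaces that provider’s previous snapshot wholesale (last write wins) and notices when a push changed nothing.

```go
package main

import (
	"fmt"
	"maps"
)

// snapshot is a hypothetical stand-in for a provider's full configuration:
// router name -> rule. Real snapshots also carry services, middlewares,
// TLS options, and more.
type snapshot map[string]string

// message pairs a snapshot with the provider that produced it.
type message struct {
	provider string
	config   snapshot
}

// watcher keeps only the most recent snapshot per provider.
type watcher struct {
	latest map[string]snapshot
}

// apply replaces the provider's previous snapshot and reports whether
// anything changed, so an identical push can skip a rebuild.
func (w *watcher) apply(m message) bool {
	if prev, ok := w.latest[m.provider]; ok && maps.Equal(prev, m.config) {
		return false
	}
	w.latest[m.provider] = m.config
	return true
}

func main() {
	ch := make(chan message, 3)
	// A provider pushes complete pictures, never deltas.
	ch <- message{"docker", snapshot{"api": "Host(`api.example.com`)"}}
	ch <- message{"docker", snapshot{"api": "Host(`api.example.com`)"}} // unchanged
	ch <- message{"docker", snapshot{"web": "Host(`www.example.com`)"}} // api container gone
	close(ch)

	w := &watcher{latest: map[string]snapshot{}}
	for m := range ch {
		changed := w.apply(m)
		fmt.Println("rebuild:", changed, "routers:", len(w.latest[m.provider]))
	}
}
```

Because the third push is a full replacement, the api router disappears without any explicit "delete" event, which is exactly the conveyor-belt behavior. maps.Equal needs Go 1.21 or later.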

Multiple providers never collide with each other: each provider’s objects live in their own namespace and are addressed as name@provider. Collisions inside a single provider are a different story. Label-based providers such as Docker drop a router or service that is defined more than once with different settings and log an error; the file provider keeps the first definition it reads and logs a warning about the rest. Either way nothing fails loudly, and the only trace is in the logs.
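The cross-provider namespacing can be sketched as a merge that qualifies every name with its provider. This is a hypothetical illustration of the name@provider idea, not Traefik’s merge code; function and variable names are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// qualify merges per-provider router tables into one global table keyed
// as name@provider, so equal names from different providers can coexist.
func qualify(perProvider map[string]map[string]string) map[string]string {
	out := make(map[string]string)
	for provider, routers := range perProvider {
		for name, rule := range routers {
			out[name+"@"+provider] = rule
		}
	}
	return out
}

func main() {
	merged := qualify(map[string]map[string]string{
		"docker": {"api": "Host(`api.example.com`)"},
		"file":   {"api": "Host(`legacy.example.com`)"}, // same name, no collision
	})
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Println(k, "=>", merged[k])
	}
}
```

This is also why a Docker label can reference internal-stack@file: the suffix selects the namespace to look in.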


7. The Things That Bite You

The non-obvious behaviors that consume the first six months of your time. Each one connects back to a Mental Model concept.

Gotcha 1: The silent typo

What you’d expect: A misspelled label or annotation produces a config error, the same way a typo in nginx config fails the reload.

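What actually happens: Nothing fails loudly, because there is no reload step to fail. A typo in a label key (say, middlwares instead of middlewares) makes Traefik log an error and skip that container’s configuration; a typo in a value, such as a misspelled middleware name, leaves the router in an error state; a typo in the traefik. prefix itself means the label is never read at all. Your route is missing from the dashboard, or it appears but works without the protection you thought you added.

This follows directly from Mental Model #1: Traefik computes a projection. Input that doesn’t fit the schema is left out of the projection, and the rest of the routing table keeps serving. There’s no validation gate that rejects the change before it takes effect, and nothing surfaces the problem except the logs and the dashboard.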

How to handle: Use the dashboard religiously while developing. Every router you add — go look at it. Verify the middlewares list, the service, the rule, the TLS config. If something isn’t there that you expected, you have a typo. The dashboard is the source of truth for “what does Traefik think your config is.”

Gotcha 2: The 502 from a container that looks healthy

What you’d expect: If a container with the right labels is up and reachable on its declared port, Traefik should route to it.

What actually happens: Sometimes you get a 502 Bad Gateway even when the container appears healthy. Almost always one of these three causes:

  1. Wrong network. Traefik and the target container must share a Docker network. The labels say “route here,” but Traefik can’t actually reach the container’s IP because they’re on different networks. The dashboard will show the service exists, with an endpoint, but the endpoint isn’t reachable. Fix: put both in the same explicit network, never rely on the default bridge.
  2. Wrong port. traefik.http.services.X.loadbalancer.server.port must be the port the app listens on inside the container, not the host-published port. Confusing this is universal among beginners.
  3. App not bound to 0.0.0.0. Container apps that bind to 127.0.0.1 aren’t reachable from outside the container. The container’s listen address must be the all-interfaces wildcard.

This is the corollary to Mental Model #1: when Traefik discovers a service via labels, it trusts that the discovered endpoint is actually reachable. Discovery is metadata, not health-checking.

How to handle: Add health checks (traefik.http.services.X.loadbalancer.healthcheck.path=/health). Traefik will probe the backend and remove unhealthy ones from rotation. Without health checks, Traefik will happily route to a container that’s still booting and serve users 502s.

Gotcha 3: Router rule priority surprises

What you’d expect: Given two rules Host(`api.example.com`) and Host(`api.example.com`) && PathPrefix(`/v1`), the more specific one wins for /v1 paths.

What actually happens: Sort of, but the rule is “longest rule string wins” by default — which usually does the right thing but occasionally doesn’t. The rule length is the length of the rule as a string, not its specificity. A HostRegexp rule that matches everything but has a long regex pattern will beat a Host rule for a specific domain.

How to handle: When you have overlapping rules, set an explicit priority on each router. Higher priority wins. Keep in mind that a router without an explicit priority still gets its rule length as its priority, so a small explicit value like 1 loses to almost every default. Make priorities sparse so you can insert later: 100, 200, 300, not 1, 2, 3.

This connects to Mental Model #3: every router is just a named object with properties. Priority is one of those properties. There’s no inheritance, no fallback chain — explicit priority is the way to be sure.
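The default ordering is easy to model: effective priority is the explicit value if set, otherwise the rule’s string length. A minimal Go sketch with hypothetical types (Traefik’s router struct looks nothing like this):

```go
package main

import (
	"fmt"
	"sort"
)

// router is a hypothetical minimal router: a name, a rule string, and an
// optional explicit priority (0 means "not set").
type router struct {
	name     string
	rule     string
	priority int
}

// effectivePriority models the default described above: without an
// explicit priority, a router's priority is the length of its rule string.
func effectivePriority(r router) int {
	if r.priority != 0 {
		return r.priority
	}
	return len(r.rule)
}

// order sorts routers so the one evaluated first comes first (higher
// effective priority wins). Ties keep input order in this sketch.
func order(rs []router) []router {
	out := append([]router(nil), rs...)
	sort.SliceStable(out, func(i, j int) bool {
		return effectivePriority(out[i]) > effectivePriority(out[j])
	})
	return out
}

func main() {
	rs := []router{
		{name: "specific", rule: "Host(`api.example.com`)"},
		{name: "catch-all", rule: "HostRegexp(`^[a-z0-9-]+\\.example\\.(com|net)$`)"},
	}
	// The long regex outranks the specific host purely on string length.
	for _, r := range order(rs) {
		fmt.Println(r.name, effectivePriority(r))
	}
	// Explicit, sparse priorities put specificity back under your control.
	rs[0].priority, rs[1].priority = 200, 100
	fmt.Println("first with explicit priorities:", order(rs)[0].name)
}
```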

Gotcha 4: ACME and multiple replicas

What you’d expect: Run Traefik with 3 replicas behind a load balancer. Each one fetches certs from Let’s Encrypt as needed. Works fine.

What actually happens: Each Traefik replica independently runs the ACME flow. They race to register accounts, race to request certificates, race to update their own (separate!) acme.json files. Let’s Encrypt’s HTTP-01 challenge requires the challenge response to be served by the same instance the CA hits — but the load balancer might send the validation request to a different replica. You hit rate limits, certificates flap, half your replicas serve self-signed defaults.

This is documented but trips up everyone the first time. It follows from Mental Model #2 and from the simple fact that acme.json is the one piece of state Traefik owns and doesn’t share.

How to handle (Docker): Run exactly one Traefik replica. If you need HA, put a TCP load balancer in front and use active-passive, or accept that your “HA” is “restart fast.” Or use the DNS-01 challenge, which removes the problem of the validation request hitting the wrong replica; open-source Traefik still has each replica request and store its own certificates, though, so duplicate issuance and rate limits remain a risk. Or move the cert lifecycle outside Traefik entirely (cert-manager + an external store).

How to handle (Kubernetes): Don’t use Traefik’s ACME at all in production. Use cert-manager. Have cert-manager issue certs into Kubernetes Secrets. Tell Traefik to use those secrets via tls.secretName in your IngressRoute. cert-manager handles renewals across replicas, Traefik just consumes.

Gotcha 5: HTTP-01 challenge during initial deploy

What you’d expect: Bring up Traefik with HTTP-01 ACME enabled. It’ll get its certs and start serving HTTPS.

What actually happens: Traefik can’t get certs because the domain doesn’t yet point to your server (you haven’t set DNS yet, it hasn’t propagated, or you’re still in CI). Repeated failed attempts count against Let’s Encrypt’s failed-validation rate limit, and if you iterated against the staging server, its certificates stay persisted in acme.json and follow you into production.

How to handle: Always use the Let’s Encrypt staging server while you’re iterating. Add --certificatesresolvers.le.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory. Once everything works, remove that flag and rm acme.json to force re-issuance against production. Test DNS propagation with dig before flipping the switch.

Gotcha 6: PathPrefix routes the entire prefix tree

What you’d expect: PathPrefix(`/api`) matches /api/users, but not /apicat, which is a different path segment.

What actually happens: It matches both. PathPrefix is literal string-prefix matching, not segment matching, so PathPrefix(`/api`) routes anything starting with /api, including /apicat, /apixyz, and so on.

How to handle: Use Path() for exact match, PathPrefix() only when you actually want all sub-paths. If you want strict segment matching (/api and /api/... but not /apicat), construct your rule explicitly: Path(`/api`) || PathPrefix(`/api/`). Or use a regex via PathRegexp(`^/api(/|$)`).

This connects to Mental Model #4: rules are exactly what they say. There is no implicit “smart matching.” Traefik does exactly what the rule string says, in the most literal interpretation.
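The difference between literal-prefix and segment-aware matching fits in a few Go functions. These are hypothetical equivalents of the rule semantics, not Traefik’s matcher code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pathPrefix mirrors PathPrefix's literal semantics: a plain string prefix.
func pathPrefix(prefix, path string) bool {
	return strings.HasPrefix(path, prefix)
}

// segmentRule mirrors Path(`/api`) || PathPrefix(`/api/`): the exact path,
// or anything underneath it as a full segment.
func segmentRule(path string) bool {
	return path == "/api" || strings.HasPrefix(path, "/api/")
}

// apiRegexp is the PathRegexp(`^/api(/|$)`) alternative.
var apiRegexp = regexp.MustCompile(`^/api(/|$)`)

func main() {
	for _, p := range []string{"/api", "/api/users", "/apicat"} {
		fmt.Printf("%-11s prefix=%-5v segment=%-5v regexp=%v\n",
			p, pathPrefix("/api", p), segmentRule(p), apiRegexp.MatchString(p))
	}
}
```

Only the first column says yes to /apicat; the other two agree with each other on every path.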

Gotcha 7: The dashboard doesn’t show what you think

What you’d expect: The dashboard lists all your routers and their current state, including which middlewares are attached.

What actually happens: It does, but only for configuration that made it through decoding. A router with an unresolved middleware reference (Gotcha 1) or a rule that fails to parse shows up flagged with an error, which is easy to miss in a long list. A container whose labels failed to decode never reaches the dashboard at all; the only trace is an error line in Traefik’s logs.

How to handle: Combine the dashboard with --log.level=DEBUG (temporarily) when something isn’t matching what you expect. The logs at startup show every router as it’s processed. Errors in rule parsing or service resolution appear there, not in the dashboard.

Gotcha 8: Trailing slash sensitivity in routes

What you’d expect: A request to https://api.example.com/v1 and https://api.example.com/v1/ route to the same place.

What actually happens: They might not, if your rule is PathPrefix(`/v1/`) (trailing slash) or Path(`/v1`) (no trailing slash). Different backend apps care about this differently: some 301 you, some 404 you, some treat them as the same.

How to handle: Pick a convention for your URLs (with or without trailing slash) and enforce it with a redirect middleware. Don’t assume Traefik will handle it for you; it won’t. Or, if you genuinely want to accept both, use a rule like Path(`/v1`) || PathPrefix(`/v1/`). Plain PathPrefix(`/v1`) also accepts both, but it matches /v1beta too (see Gotcha 6).

Gotcha 9: exposedByDefault=true is a security footgun

What you’d expect: Containers without Traefik labels are simply not exposed. Traefik only routes to labeled containers.

What actually happens: That’s the safe behavior, but it’s not the default in the Docker provider. With exposedByDefault=true (the default), Traefik treats every container as if it had traefik.enable=true and generates a router for it from the provider’s defaultRule, which is built from the container name. With a defaultRule such as Host(`{{ normalize .Name }}.example.com`), your internal database with no labels suddenly has a public route at Host(`postgres.example.com`).

How to handle: Always, always set --providers.docker.exposedbydefault=false in the static config. Then opt each service in with traefik.enable=true. This is one of those defaults that should be inverted but isn’t, for backwards-compat reasons.
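The opt-in decision reduces to a small function. This is a simplified sketch of the behavior described above, not Traefik’s code; treating an unparsable traefik.enable value as "not exposed" is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// exposed models the opt-in decision: an explicit traefik.enable label
// always wins; otherwise the provider-wide exposedByDefault decides.
func exposed(labels map[string]string, exposedByDefault bool) bool {
	v, ok := labels["traefik.enable"]
	if !ok {
		return exposedByDefault
	}
	enabled, err := strconv.ParseBool(v)
	return err == nil && enabled // unparsable value: hidden (assumption)
}

func main() {
	db := map[string]string{}                          // no labels at all
	api := map[string]string{"traefik.enable": "true"} // explicit opt-in
	for _, def := range []bool{true, false} {
		fmt.Printf("exposedByDefault=%v db=%v api=%v\n", def, exposed(db, def), exposed(api, def))
	}
}
```

With the flag flipped to false, only the labeled container survives, which is the whole point of the recommendation.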

Gotcha 10: The Kubernetes Ingress provider is not the IngressRoute CRD provider

What you’d expect: “Kubernetes provider” is one thing.

What actually happens: There are several Kubernetes providers, you can enable any combination, and they behave differently:

  • kubernetesingress — handles standard Ingress resources.
  • kubernetescrd — handles IngressRoute and other Traefik CRDs.
  • kubernetesgateway — handles Gateway API resources.

If you defined IngressRoute CRDs but only enabled kubernetesingress, nothing happens. The CRDs sit in etcd, untouched. You get no error, just nothing routing. Same pattern as Gotcha 1 — silent because the projection doesn’t include things from providers you didn’t turn on.

How to handle: In Helm chart values, enable both kubernetesingress and kubernetescrd unless you have a specific reason not to. Most production clusters use IngressRoute for app-team workloads and Ingress for things installed by Helm charts that emit Ingress resources.


8. The Judgment Calls

What separates experienced operators from people who followed a tutorial. Each one is a real, recurring decision.

Judgment 1: Labels vs file provider for dynamic config

The decision: Where do your routes, middlewares, and services live?

Option A — Labels on the workload. Each container or each Kubernetes resource carries its own routing config in labels/annotations. The config travels with the workload.

Option B — File provider with a structured YAML tree. Routes defined separately from workloads, in a Git-managed directory.

When labels win: Single-host Docker setups, dev environments, monolithic compose files where the routes change with the apps. The colocation is genuinely valuable — adding an app and adding its route is one commit, one file.

When file provider wins: When you have shared infrastructure (security headers, rate-limit policies, TLS options) that applies to many services. When ops owns the proxy config but devs own the apps. When your routes have logic that doesn’t map cleanly to a single container (multi-backend routing, weighted splits, external services). When you want code review on routing changes separately from app deploys.

What experienced engineers actually do: Both. Workload-specific config (the router rule, the service port) on labels/annotations on the workload. Shared, ops-controlled config (security headers, rate-limit policies, TLS options) in the file provider, referenced by name with @file. Middleware chains (the standard stack of headers + rate limit + compression for all internal services) defined once in file, attached by reference.

Signal: Are you copy-pasting the same middleware definition into 12 different containers? Move it to the file provider.

Judgment 2: HTTP-01 vs DNS-01 challenge

The decision: How does ACME validate domain ownership?

Option A — HTTP-01. Let’s Encrypt hits :80 on your public IP and reads a challenge file Traefik serves. Simple, works out of the box.

Option B — DNS-01. Traefik puts a TXT record at your DNS provider via their API. Let’s Encrypt queries DNS. Requires API credentials.

When HTTP-01 wins: Public services, single instance, port 80 reachable, no wildcard certs needed. The dead-simple case.

When DNS-01 wins: You need wildcard certs (*.example.com). You’re behind a firewall, on a private network, or in front of another proxy that doesn’t expose Traefik on port 80. You have many Traefik replicas (DNS-01 doesn’t require the validating request to hit a specific instance). You want certs issued before DNS points at the box (useful for blue/green deploys).

What experienced engineers actually do: DNS-01 for anything serious. The setup cost is one-time (provider API token), and it removes an entire class of failure modes. The wildcard support alone is usually worth it — one cert covers all your subdomains, you stop touching cert config when you launch a new service.

Signal: If you’ve ever had a cert renewal fail because port 80 was temporarily blocked, you should be on DNS-01.

Judgment 3: Traefik’s built-in ACME vs cert-manager in Kubernetes

The decision: Who owns certificate lifecycle in K8s?

Option A — Traefik’s built-in ACME (the certResolver).

Option B — cert-manager (separate operator) provisions certs into Secrets; Traefik reads them.

When Traefik’s built-in wins: Single-replica Traefik, dev/staging clusters, low-stakes services where a brief cert flap doesn’t matter.

When cert-manager wins: Anything multi-replica. Anything where the cert lifecycle is regulated (audit, rotation requirements). Anything where you might one day swap the ingress (the certs are in Secrets, not coupled to Traefik). Production. Always production.

What experienced engineers actually do: cert-manager. Period. The argument against — “more components, more complexity” — gets refuted the first time you scale Traefik to 3 replicas. cert-manager is a one-time setup that solves the problem properly.

Signal: If you have more than one Traefik pod, you should be on cert-manager.

Judgment 4: Traefik vs nginx vs Caddy vs HAProxy vs Envoy

The decision: Which reverse proxy?

This is the perennial question. Here’s the honest take, by use case:

  • Cloud-native, container-orchestrated, dynamic service discovery is your dominant pattern. Traefik or Envoy. Traefik if you want defaults that just work and your team is small. Envoy if you have a platform team that can carry the complexity in exchange for serious traffic control.
  • Static config, raw performance matters (>50k req/s, low-latency P99). Nginx or HAProxy. Both are old, fast, and battle-tested. HAProxy is the better load balancer; nginx is the better web server.
  • Single-host with a handful of services, you value simplicity over flexibility. Caddy. Its Caddyfile is the friendliest config syntax in the industry. Automatic HTTPS is even slicker than Traefik’s.
  • You need a full service mesh. Envoy-based meshes (Istio, Consul Connect), or Linkerd, which uses its own Rust proxy (linkerd2-proxy) rather than Envoy. Traefik Mesh existed but is essentially deprecated.
  • You’re running in Kubernetes and want vendor-neutral. ingress-nginx (historically the most common), Traefik, or one of the Gateway API implementations (Cilium, Envoy Gateway, or Traefik’s own kubernetesgateway provider).

What experienced engineers actually do: Choose by team and pattern, not by benchmark. Traefik genuinely shines in dynamic, container-native environments run by teams that don’t have a dedicated platform group. It’s not the fastest, it’s not the most powerful, it’s not the simplest. It’s the one that hits the right point for “we want a real reverse proxy without dedicating a platform engineer to it.”

Signal: Are you spending more time fighting your proxy than configuring your apps? You picked wrong.

Judgment 5: Where to put the middleware chain — entryPoint, router, or both

The decision: Some middlewares (security headers, default rate limits) should apply to everything. Where do they live?

Option A — On each router individually. Repetitive, but explicit.

Option B — Default middlewares at the entryPoint level. Static config; applies to every router on that entryPoint.

Option C — A chain middleware definition, referenced by every router. DRY but still per-router opt-in.

When entryPoint defaults win: Truly universal middlewares — global rate limiting, IP allowlist, audit logging. Things that must apply, no exceptions.

When chain wins: Standard but not universal — your “internal service” stack of (security headers + compression + access log enrichment). Most services use it, but the dashboard or a special API endpoint might not.

What experienced engineers actually do: Entry-point-level for a small set of true universals. Named chains for the common cases. Inline references for everything else. The trap is putting “universal” middlewares at the entryPoint level and then needing one exception — entryPoint middlewares apply to everything on that entryPoint, and the workaround is awkward.

Signal: If you find yourself looking for a way to exempt one router from an entryPoint-level middleware, your “universal” middleware wasn’t universal.

Judgment 6: TCP/UDP routing vs HTTP routing

The decision: Traefik supports TCP and UDP routers, not just HTTP. Should you put non-HTTP services behind it?

When yes: Postgres, Redis, raw MQTT, gRPC over TCP (without HTTP/2 termination), game servers. Traefik can do SNI-based TCP routing (route by domain even on TCP, because TLS ClientHello carries SNI). Useful for multi-tenant Postgres-as-a-service kinds of setups.

When no: When you want any kind of L7 features (path routing, header rewrites, retries) for those protocols — Traefik can’t give them. When you’d be using Traefik as a glorified port forwarder — use a real L4 LB.

What experienced engineers actually do: Use Traefik’s TCP routing only when SNI-based routing of TLS-terminated TCP genuinely adds value (multiple Postgres clusters on the same port, each with their own cert). Otherwise, run a dedicated L4 LB or use Kubernetes Services with NodePort/LoadBalancer for L4.

Signal: If you’re routing TCP and not using SNI, you’re using a sledgehammer where a hammer would do.

Judgment 7: Sticky sessions or not?

The decision: Some apps assume the same client keeps hitting the same backend. Do you enable sticky session routing in Traefik?

Option A — Sticky sessions (loadbalancer.sticky.cookie). Traefik plants a cookie identifying the backend; subsequent requests with that cookie go to that backend.

Option B — Round-robin everything, fix the app to be stateless.

What experienced engineers actually do: Option B, almost always. Sticky sessions are a band-aid for stateful backends, and they create operational nightmares: deploys cause sessions to die en masse, unbalanced load on whichever backend has the long-lived users, sticky cookies surviving past backend rotation, etc. The right fix is almost always making the backend stateless (store session in Redis, JWT, etc.).

When sticky sessions are actually correct: WebSocket-heavy apps with hand-rolled session state on the connection. Even then, prefer sharding by user ID at a layer Traefik can see (e.g., a header), not opaque cookie stickiness.

Signal: If you’re enabling stickiness “for performance,” you’re probably misdiagnosing.

Judgment 8: How much access logging to enable

The decision: Access logs are useful for debugging and analytics, but they’re expensive in volume and CPU.

Options: Off / sampled / per-router / always-on.

What experienced engineers actually do: Always on, JSON format, sent to a structured log pipeline (Loki, Elastic, CloudWatch). Sample only if the volume genuinely overwhelms downstream. Per-router access log enabling exists but is rarely worth the operational complexity.

The CPU cost of access logging in Traefik is real but small (~5-10%). The information cost of not having access logs when something breaks is catastrophic.

Signal: If you’ve ever been stuck debugging without access logs, you’ll always have them on after that.

Judgment 9: Plugins — when to write one vs find an alternative

The decision: You need behavior Traefik doesn’t have. Write a plugin, or find another path?

Plugin pros: Stays in-process, no extra network hop. Loaded straight from the static config as interpreted Go (Yaegi) or Wasm, with no custom Traefik build. Catalog has 100+ existing options.

Plugin cons: Yaegi plugins are slower than compiled middlewares (interpreted Go has ~5-10x overhead). Wasm plugins are better but newer. Restart required to load a new plugin. Limited debugging. Authoring is fiddly.

What experienced engineers actually do:

  1. Check the plugin catalog first. Many problems have solutions.
  2. Look for an off-the-shelf sidecar (a small auth service behind ForwardAuth, oauth2-proxy, Authelia).
  3. Consider whether the behavior belongs in the application instead.
  4. Write a plugin only when the behavior is genuinely about request routing/transformation that can’t be done well anywhere else.

Signal: If your plugin is implementing business logic, it’s in the wrong place.

Judgment 10: When to give up and pick something else

The decision: When does Traefik stop being the right tool?

Real signals it’s time to leave:

  • You’re regularly hitting CPU walls because of TLS overhead at >5k req/s — consider an L4 LB doing TLS termination upstream of Traefik, or move to nginx/HAProxy for the edge.
  • You need fine-grained, programmable traffic control — header-based weighted routing, gradual rollouts, traffic mirroring with sampling, Lua scripting. Envoy.
  • You’re building a service mesh, not just an ingress. Istio (Envoy under the hood), Linkerd2.
  • You need real-time, mutable config from an external control plane via xDS. Envoy is the only adult in the room.
  • Your team has deep nginx muscle memory and reverse proxy is not where you want to spend learning budget. Stay on nginx.

Signal: Are you using <30% of Traefik’s features and frustrated with the parts that don’t fit your workflow? Wrong tool.


9. The Commands/APIs That Actually Matter

The 20% you’ll use 80% of the time, with the why.

Static configuration: the canonical structure

# traefik.yml — the install-time config

global:
  checkNewVersion: false       # turn off telemetry phone-home
  sendAnonymousUsage: false    # ditto

log:
  level: INFO                  # DEBUG only when chasing a problem
  format: json                 # JSON in prod, common in dev
  filePath: /var/log/traefik/traefik.log  # or stdout (default)

accessLog:
  format: json
  filePath: /var/log/traefik/access.log
  bufferingSize: 100           # buffer this many entries before flush

api:
  dashboard: true
  insecure: false              # NEVER true in production
  # In production, expose api@internal via a router with auth middleware

entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
          permanent: true
  websecure:
    address: ":443"
    http:
      tls:
        certResolver: le
    transport:
      respondingTimeouts:
        readTimeout: 60s
        writeTimeout: 60s
        idleTimeout: 180s

providers:
  docker:
    exposedByDefault: false
    network: proxy             # the shared network for all routed containers
    watch: true
  file:
    directory: /etc/traefik/dynamic
    watch: true

certificatesResolvers:
  le:
    acme:
      email: ops@example.com
      storage: /letsencrypt/acme.json
      dnsChallenge:
        provider: cloudflare
        resolvers:
          - "1.1.1.1:53"
          - "8.8.8.8:53"

metrics:
  prometheus:
    addEntryPointsLabels: true
    addServicesLabels: true
    addRoutersLabels: true
    buckets: [0.1, 0.3, 1.2, 5.0]

tracing:
  otlp:
    http:
      endpoint: http://otel-collector:4318/v1/traces

That’s a production-shaped static config. Read each block and know what each one does. Most “advanced” Traefik tutorials are just showing you minor variations on this.

Router rules — the operators you’ll actually use

Host(`example.com`)                          # exact host match
Host(`example.com`) || Host(`www.example.com`)  # multiple hosts (v3 Host takes one argument)
HostRegexp(`^.+\.example\.com$`)             # regex host match
Path(`/api/users`)                           # exact path
PathPrefix(`/api`)                           # path starts with
PathRegexp(`^/api/v[12]/`)                   # regex path
Header(`X-Custom-Header`, `value`)           # exact header value (v2: Headers)
HeaderRegexp(`X-Custom`, `^v[0-9]+$`)        # regex header (v2: HeadersRegexp)
Method(`POST`)                               # HTTP method
ClientIP(`192.168.0.0/24`)                   # IP-based routing
Query(`mobile`, `true`)                      # query param match

Combine with &&, ||, !, and (...):

Host(`api.example.com`) && (PathPrefix(`/v1`) || PathPrefix(`/v2`)) && !ClientIP(`10.0.0.0/8`)

The matching is fast (compiled to a tree) but you can absolutely shoot yourself with regex backtracking. Keep regexes simple.

Essential middlewares, with the actual invocation

# In file provider, dynamic/middlewares.yml
http:
  middlewares:

    # Standard security headers — every internal service should have these
    security-headers:
      headers:
        frameDeny: true
        contentTypeNosniff: true
        browserXssFilter: true
        referrerPolicy: "strict-origin-when-cross-origin"
        stsSeconds: 31536000
        stsIncludeSubdomains: true
        stsPreload: true
        customRequestHeaders:
          X-Forwarded-Proto: "https"

    # Rate limit by source IP
    api-ratelimit:
      rateLimit:
        average: 100        # sustained rate
        burst: 200          # short-term burst
        period: 1s

    # Forward auth to an external service
    forward-auth:
      forwardAuth:
        address: http://auth-service:8080/verify
        trustForwardHeader: true
        authResponseHeaders:
          - X-User-Id
          - X-User-Role

    # Strip a prefix before forwarding
    strip-api-prefix:
      stripPrefix:
        prefixes:
          - /api/v1

    # IP allowlist for admin endpoints
    admin-only:
      ipAllowList:
        sourceRange:
          - 10.0.0.0/8
          - 192.168.0.0/16

    # The standard chain for internal services
    internal-stack:
      chain:
        middlewares:
          - security-headers
          - api-ratelimit

Then reference: traefik.http.routers.myapp.middlewares=internal-stack@file,forward-auth@file.

Dashboard and API

# Default dashboard (only with api.insecure=true, which serves it on :8080):
http://localhost:8080/dashboard/    # trailing slash matters

# API endpoints (read-only):
http://localhost:8080/api/http/routers
http://localhost:8080/api/http/services
http://localhost:8080/api/http/middlewares
http://localhost:8080/api/overview

The /api endpoints return JSON. Useful for scripting (curl localhost:8080/api/http/routers | jq).

In production, expose the dashboard via a regular router with api@internal as the service, auth middleware in front, and ideally restricted by IP:

# in dynamic config
http:
  routers:
    dashboard:
      rule: "Host(`traefik.example.com`)"
      entryPoints: [websecure]
      service: api@internal
      tls:
        certResolver: le
      middlewares:
        - dashboard-auth
        - admin-only

  middlewares:
    dashboard-auth:
      basicAuth:
        users:
          - "admin:$apr1$abc...xyz"

Health check configuration

# Docker labels:
- "traefik.http.services.myapp.loadbalancer.healthcheck.path=/healthz"
- "traefik.http.services.myapp.loadbalancer.healthcheck.interval=10s"
- "traefik.http.services.myapp.loadbalancer.healthcheck.timeout=3s"

In Kubernetes, Traefik trusts the Service’s Endpoints, which are already filtered by readiness probes. You don’t need Traefik-level health checks in K8s — let the kubelet do it. In Docker, Traefik-level health checks are essential.

Metrics that matter

When scraped by Prometheus, Traefik emits a small but useful set:

traefik_router_requests_total{router, code, method, protocol}
traefik_router_request_duration_seconds_bucket{router, ...}
traefik_service_requests_total{service, code, ...}
traefik_service_open_connections{service}
traefik_entrypoint_requests_total{entrypoint, ...}
traefik_tls_certs_not_after{cn}              # cert expiry timestamp

Three alerts you should have day one:

  1. Request rate of 5xx on any router > some threshold — your backends are unhappy.
  2. Cert expiry within 14 days — your ACME setup is broken.
  3. Traefik down — basic uptime monitor.

Common operations

# Check a config file: there is no dedicated validate command, so start
# Traefik against it and read the first log lines for configuration errors:
traefik --configFile=traefik.yml --log.level=DEBUG

# Reopen log files after logrotate (Traefik handles USR1 for this; dynamic
# config needs no signal because providers are watched):
docker kill --signal=USR1 traefik

# In Kubernetes: roll the deployment (deleting a pod also works; the deployment recreates it).
kubectl rollout restart deployment/traefik

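# Tail JSON access logs minus health checks (file path from the static config above):
docker exec traefik tail -f /var/log/traefik/access.log | grep -v '"RequestPath":"/healthz"'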

# Watch the dashboard via API for changes:
watch -n 1 "curl -s localhost:8080/api/http/routers | jq 'length'"

10. How It Breaks

Failure modes, what to check first, the debugging workflow.

Symptom: 404 on a route you just configured

Root cause: The router doesn’t exist, or its rule doesn’t match. Connects to Mental Model #1 (silent projection failures) and Gotcha 1.

Diagnose:

  1. Open the dashboard at /dashboard/. Is the router listed under HTTP > Routers?
  2. If no — the labels/annotations didn’t register. Check docker logs traefik | grep -i error or kubectl logs deploy/traefik | grep -i error. Look for typos in label keys. Verify traefik.enable=true is set (when exposedByDefault=false).
  3. If yes but routing fails, click the router. Check the rule. Run curl -v with the exact host/path you expect to match. The Host header matters: for Host(`api.example.com`) to match, the request must arrive with Host: api.example.com, not the server’s IP.
  4. Check that the router is on the right entryPoint. A router on web only catches :80 traffic; if you’re hitting :443, it needs to be on websecure.

Symptom: 502 Bad Gateway

Root cause: The service has no reachable backends. Connects to Gotcha 2.

Diagnose:

  1. Dashboard > Services. Is the service listed? If no, you have a label/CRD problem.
  2. If yes, click it. How many “servers” does it show? If 0, no endpoint was discovered.
  3. Is the container in the same Docker network as Traefik? docker network inspect proxy | jq '.[0].Containers' should list both.
  4. Is the port correct? Check the service’s loadbalancer.server.port against what the app actually listens on inside the container (not the host port).
  5. Can Traefik reach the backend? Exec into the Traefik container and wget -O- http://my-app:port/ — if that fails, it’s a network/listen issue, not a Traefik issue.

Symptom: 503 Service Unavailable

Root cause: Traefik knows about the service but every backend failed its health check (or the circuit breaker is tripped).

Diagnose:

  1. Check the health check path. Run wget -O- http://backend-ip:port/health from inside the Traefik container. Does it return 2xx or 3xx (the range Traefik treats as healthy by default)?
  2. Check that the path is correct (/health vs /healthz vs /).
  3. Disable the health check temporarily to confirm. If 503 → 502 with no health check, the backend is genuinely down. If 503 → 200, your health check is misconfigured.
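Why does a /health vs /healthz mix-up turn into a 503? An active health check is just a periodic GET, and a 404 counts as a failure. A minimal Go sketch of a single probe, assuming the default rule described in Traefik's docs (2xx and 3xx are healthy); probe and newBackend are hypothetical and leave out intervals, retries, and recovery.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

const probeTimeout = 2 * time.Second

// probe performs one active health check against baseURL+path. 2xx and 3xx
// count as healthy; any other status, a timeout, or a connection error counts
// as unhealthy. Go's client follows redirects, so the final status is judged.
func probe(baseURL, path string) bool {
	client := &http.Client{Timeout: probeTimeout}
	resp, err := client.Get(baseURL + path)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 400
}

// newBackend is a hypothetical app that only serves /healthz.
func newBackend() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newBackend()
	defer srv.Close()

	fmt.Println("/healthz healthy:", probe(srv.URL, "/healthz")) // true
	fmt.Println("/health healthy: ", probe(srv.URL, "/health"))  // false (404)
}
```

Once every server behind a service fails its probe, Traefik has nothing left to route to, which is the 503 this section describes.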

Symptom: Certificate errors / self-signed cert served

Root cause: ACME didn’t succeed for this domain. Connects to Gotcha 4 and Gotcha 5.

Diagnose:

  1. Check acme.json exists and has content. cat acme.json | jq '.le.Certificates[].domain.main' should list your domains (le is the resolver name).
  2. Check the cert resolver is referenced: either traefik.http.routers.X.tls.certresolver=le on the router, or a default certResolver in the entryPoint’s tls block.
  3. Check Traefik logs for ACME errors. Common ones:
    • “unable to obtain ACME certificate” — DNS doesn’t resolve, port 80 unreachable, rate limit hit.
    • “challenge timeout” — DNS propagation slower than the challenge timeout, fix with dnsChallenge.delayBeforeCheck or use better DNS resolvers.
    • “too many requests” — Let’s Encrypt rate limit. Wait, or use staging while debugging.
  4. If running multiple Traefik replicas with ACME — that’s your problem. See Gotcha 4.
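If jq isn't installed where acme.json lives, or you want the check in a script, the same question can be answered in Go. A minimal sketch, assuming acme.json maps each resolver name (le here) to an object whose Certificates entries carry a domain with main and sans; listDomains is hypothetical. Go's encoding/json matches keys case-insensitively, so the struct decodes whether the file spells them domain or Domain.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// acmeFile is a minimal view of acme.json: resolver name -> stored certificates.
// Keys, account data, and the certificate bodies are ignored here.
type acmeFile map[string]struct {
	Certificates []struct {
		Domain struct {
			Main string   `json:"main"`
			SANs []string `json:"sans"`
		} `json:"domain"`
	} `json:"Certificates"`
}

// listDomains returns the main domain and SANs of every certificate stored
// for the given resolver, sorted for stable output.
func listDomains(data []byte, resolver string) ([]string, error) {
	var f acmeFile
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	r, ok := f[resolver]
	if !ok {
		return nil, fmt.Errorf("resolver %q not found in acme.json", resolver)
	}
	var out []string
	for _, c := range r.Certificates {
		out = append(out, c.Domain.Main)
		out = append(out, c.Domain.SANs...)
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	// Path and resolver name are assumptions; match your acme storage and resolver.
	data, err := os.ReadFile("acme.json")
	if err != nil {
		fmt.Println("cannot read acme.json:", err)
		return
	}
	domains, err := listDomains(data, "le")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, d := range domains {
		fmt.Println(d)
	}
}
```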

Symptom: Routing was working, then stopped after a deploy

Root cause: Almost always one of: a network changed, a label was lost, or a name collision was resolved against you.

Diagnose:

  1. Open the dashboard and compare its current state to what you expected. What’s different?
  2. docker network ls and verify the proxy network still exists and contains both containers.
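  3. Check whether anything else uses the same router/middleware name. (If two containers both set traefik.http.routers.api.rule=... with different values, Traefik logs that the router is defined multiple times with different configurations and drops it, so neither wins.)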

Symptom: Slow / high latency on routes that should be fast

Root cause: Several possibilities, ranked by likelihood:

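  1. Compression spending CPU for little gain. Traefik’s compress middleware skips responses that already carry a Content-Encoding header, so double compression only happens when a backend compresses without setting it. The more common waste is compressing payloads that barely shrink (already-compressed images, video, archives) or tiny responses. Fix: tune compress with excludedContentTypes and minResponseBodyBytes, or remove it for that route.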
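  2. DNS lookups on new backend connections. Go doesn’t cache DNS results, so file-provider Services that use hostnames pay a lookup whenever Traefik opens a new connection, which adds up when keep-alive connections churn. Set up proper resolvers; prefer container/pod IPs.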
  3. TLS termination overhead with weak ciphers. Check tls.options — modern profile is faster than legacy.
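  4. Backend slow, not Traefik. Compare traefik_router_request_duration_seconds (router labels need metrics.prometheus.addRoutersLabels: true) with traefik_service_request_duration_seconds. Router time includes service time, so the delta is Traefik’s own overhead (routing and middlewares); the service time is mostly backend latency.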
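The comparison is arithmetic on Prometheus histograms, which expose _sum (seconds) and _count (requests) series. Mean latency over a window is the increase in _sum divided by the increase in _count, which is what rate(x_sum[5m]) / rate(x_count[5m]) computes in PromQL. A minimal Go sketch with made-up numbers (not measurements); window and meanLatency are hypothetical names.

```go
package main

import "fmt"

// window holds how much a histogram's _sum (seconds) and _count (requests)
// grew over the same time window, e.g. between two scrapes five minutes apart.
type window struct {
	SumDelta   float64
	CountDelta float64
}

// meanLatency returns average seconds per request over the window.
func meanLatency(w window) float64 {
	if w.CountDelta == 0 {
		return 0
	}
	return w.SumDelta / w.CountDelta
}

func main() {
	// Hypothetical numbers for one router and the service behind it.
	router := window{SumDelta: 1250, CountDelta: 10000}  // 125 ms mean
	service := window{SumDelta: 1180, CountDelta: 10000} // 118 ms mean

	r, s := meanLatency(router), meanLatency(service)
	fmt.Printf("router %.0f ms, service %.0f ms, Traefik-side %.0f ms\n",
		r*1000, s*1000, (r-s)*1000)
}
```

Means hide the tail; for P99, run histogram_quantile over the _bucket series instead.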

Symptom: Traefik crashed / restarted

Root cause: Rare. Almost always one of:

  1. OOM — Traefik with massive numbers of certs and routers can use real memory. Check docker stats or pod memory limits.
  2. Plugin panic — Yaegi-loaded plugins can crash. Logs show a stack trace.
  3. Provider connection lost catastrophically, which happens if the Docker daemon dies or the K8s API is unreachable for too long.

Traefik’s restart behavior is graceful; in-flight requests usually complete. New requests get refused briefly during the restart window.

The general debugging workflow

When something is wrong with Traefik routing, run these in order:

  1. curl -v the failing endpoint. Look at the actual response status and headers. A 404 vs 502 vs 503 narrows the problem space dramatically.
  2. Check the dashboard. Is the router there? With the rule you expect? With the middlewares attached? With a service that has servers?
  3. docker logs traefik | tail -200 (or kubectl logs deploy/traefik --tail=200). Errors and warnings. Look for ACME, parser, provider errors.
  4. Bump log level to DEBUG temporarily. --log.level=DEBUG. This is noisy but tells you exactly which router matched, what middlewares ran, what backend was chosen. Restart, reproduce, examine, restore INFO.
  5. From inside the Traefik container, request the backend’s IP:port directly (for example wget -O- http://backend-ip:port/). Network and listen-binding issues become obvious.
  6. Check the access log (if enabled). It shows what router/service handled each request. Useful for “did my request even reach this route?”

That six-step loop solves 95% of Traefik issues. The other 5% involve weirder things (network policies, eBPF, kernel-level connection tracking) that aren’t really about Traefik.


11. The Downsides / Disadvantages

The honest accounting. Traefik is a good piece of software. It also has structural problems you should know about before betting your platform on it.

Downside 1: ACME at scale is fundamentally broken

The downside: Traefik’s built-in ACME implementation does not scale beyond one replica.

Where it comes from: ACME state lives in a local file (acme.json). Each Traefik instance owns its own. HTTP-01 challenges arrive at port 80; the load balancer can’t guarantee they hit the instance that initiated the challenge. The team tried distributed state via KV store in v1 and dropped it in v2 because it never worked well enough. Direct quote from the docs: “it is not possible to run multiple instances of Traefik 2.0 with Let’s Encrypt enabled.”

What it costs you: In any production-grade setup (multiple replicas for HA), you must either move the cert lifecycle out of Traefik (cert-manager), use DNS-01 with a coordination dance, or pay for Traefik Enterprise which has distributed ACME as a supported feature. Most teams discover this on day 1 of “let me scale Traefik to 2 replicas” and have to rework their setup.

When it’s a dealbreaker: Never quite — there are workarounds. But it’s an embarrassing rough edge for a tool marketed as cloud-native.

What people think mitigates it but doesn’t: Shared acme.json over NFS or some shared volume. Don’t. The locking is wrong and you get corruption.

Downside 2: Performance ceiling is real

The downside: Traefik is meaningfully slower than nginx or HAProxy under load.

Where it comes from: It’s written in Go. Go has GC. Go’s HTTP stack, while excellent for application code, has more overhead than nginx’s hand-tuned C event loop. Add per-request rule matching and middleware composition through Go interfaces, and the overhead is measurable. (Label parsing isn’t part of the per-request cost; it happens when configuration changes.)

What it costs you: Benchmarks consistently show nginx handling 2-3x the requests-per-second of Traefik on the same hardware, with lower P99 latency. For a service doing 50,000 req/s, “5-10ms extra latency from Traefik” is meaningful. Traefik 3 added an experimental “FastProxy” mode that helps, but it’s still not at parity.

When it’s a dealbreaker: If you’re running a serious public edge — say, an API gateway in front of a high-traffic site, or a CDN-adjacent component — Traefik is the wrong tool. nginx, HAProxy, or Envoy will all give you better perf. Traefik is fine for “in front of your apps in your cluster”; not fine for “the thing handling 100k req/s of public traffic.”

What people think mitigates it but doesn’t: “Just scale it horizontally.” That buys throughput, not efficiency: the cost per request is what’s higher, and every request still pays the same added latency. You can scale around it, but you’ll spend more on infrastructure.
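The “scale around it” bill is easy to estimate: replicas needed is target throughput divided by per-replica capacity, rounded up. A minimal Go sketch reusing the 50,000 req/s figure from above; the per-replica capacities are hypothetical values chosen to reflect a 2.5x gap, not benchmark results.

```go
package main

import (
	"fmt"
	"math"
)

// replicasNeeded returns how many instances are required to serve target
// req/s when each instance sustains perReplica req/s (rounded up).
func replicasNeeded(target, perReplica float64) int {
	return int(math.Ceil(target / perReplica))
}

func main() {
	const target = 50000.0 // req/s
	// Hypothetical per-replica capacities; substitute your own measurements.
	nginxPer, traefikPer := 20000.0, 8000.0

	fmt.Println("nginx replicas:", replicasNeeded(target, nginxPer))     // 3
	fmt.Println("traefik replicas:", replicasNeeded(target, traefikPer)) // 7
}
```

With these assumed numbers, Traefik needs 7 replicas where nginx needs 3. More replicas raise throughput, but each request still pays the same per-hop latency.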

Downside 3: The config syntax is genuinely ugly

The downside: Label-based configuration is verbose, repetitive, and visually noisy. CRDs are better but still have ergonomic warts.

Where it comes from: The need to express deeply nested Go structures as flat key-value pairs (Docker labels). There’s no natural way to make traefik.http.middlewares.foo.headers.customResponseHeaders.X-Frame-Options=DENY pretty.

What it costs you: Your docker-compose files become long sequences of labels. Adding a non-trivial middleware to a service means 4-8 lines of labels. Code review of routing changes is painful — diffs are noisy. Onboarding new engineers takes longer because the syntax doesn’t match anything else they know.

When it’s a dealbreaker: Rarely on its own, but combined with team scale issues — at some point you start hand-writing tools to generate the labels from a higher-level spec, which means you’re building a config compiler for your config compiler.

What people think mitigates it but doesn’t: “Move everything to the file provider.” Helps for shared stuff, but per-app routing still has to live somewhere, and labels are the natural home for that. You can’t get away from labels entirely without giving up Traefik’s discovery model.

Downside 4: Silent failure on misconfiguration

The downside: Traefik’s “compute a projection from inputs” architecture means bad input produces a wrong-but-running system, not a failure to start.

Where it comes from: Mental Model #1, the core design choice. If you wanted Traefik to refuse to start when a label is malformed, you’d lose the “deploy a new container and it just appears” magic — because that container’s labels might be malformed too, and refusing to start would cascade.

What it costs you: Hours of debugging “why isn’t my middleware being applied.” It’s not because of an error message; it’s because the projection didn’t include the thing you thought it would. The dashboard becomes essential as a debugging tool. You learn to triple-check labels before deploying anything time-sensitive.

When it’s a dealbreaker: Never; it’s a tax you pay forever, not a wall you hit.

What people think mitigates it but doesn’t: “We’ll write a linter for our labels.” You can, but it’s never complete because the schema evolves and the rules are subtle. Better to just lean on the dashboard.
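Leaning on the dashboard can be automated, because the dashboard is a view over Traefik's API. A minimal Go sketch, assuming the API is reachable (for example on :8080 with api.insecure=true, or through a router to api@internal) and that each /api/http/routers entry carries name, provider, status, and an optional error list, as recent versions return. filterProblems and fetchRouters are hypothetical helpers. It only flags routers Traefik knows about but disabled or warned on; a router that never registered (a typo in a label key) won't appear at all, and very large setups may need to page through results.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// routerInfo is the subset of each /api/http/routers entry used here.
type routerInfo struct {
	Name     string   `json:"name"`
	Provider string   `json:"provider"`
	Status   string   `json:"status"`
	Errors   []string `json:"error"`
}

// filterProblems decodes the API response and keeps routers whose status is
// anything other than "enabled" (e.g. "disabled" or "warning").
func filterProblems(data []byte) ([]routerInfo, error) {
	var routers []routerInfo
	if err := json.Unmarshal(data, &routers); err != nil {
		return nil, err
	}
	var bad []routerInfo
	for _, r := range routers {
		if r.Status != "enabled" {
			bad = append(bad, r)
		}
	}
	return bad, nil
}

// fetchRouters reads the raw router list from the API base URL.
func fetchRouters(apiBase string) ([]byte, error) {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(apiBase + "/api/http/routers")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	data, err := fetchRouters("http://localhost:8080") // assumed API address
	if err != nil {
		fmt.Println("API not reachable:", err)
		return
	}
	bad, err := filterProblems(data)
	if err != nil {
		fmt.Println("unexpected response:", err)
		return
	}
	for _, r := range bad {
		fmt.Printf("%s (%s) status=%s errors=%v\n", r.Name, r.Provider, r.Status, r.Errors)
	}
}
```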

Downside 5: Plugins are slower than they look

The downside: Yaegi-based Go plugins run 5-10x slower than equivalent compiled middlewares. WebAssembly plugins are better but newer and rougher.

Where it comes from: Yaegi is an interpreter for Go. It’s a remarkable piece of engineering — running real Go code at runtime without compilation — but interpretation has costs. Wasm plugins compile ahead-of-time and are faster, but Wasm-Go interop has its own overhead.

What it costs you: If you put a Yaegi plugin in the hot path of every request, you’ll see latency increase. Authentication via plugin, header munging via plugin, anything that hits every request — bad fit. Webhooks, occasional rewrites, batch processing — fine fit.

When it’s a dealbreaker: When the plugin you wanted to write is the obvious answer to a hot-path problem. You’ll either bear the latency or move that logic into a sidecar.

What people think mitigates it but doesn’t: “We’ll just write a fast Yaegi plugin.” Yaegi has a fixed-cost overhead per Go function call; you can’t optimize past it.

Downside 6: HTTP/2 and HTTP/3 implementation quality lags

The downside: Traefik’s HTTP/2 and especially HTTP/3 implementations are functional but not battle-tested at scale the way nginx’s are. Bug reports trickle in for edge cases that nginx solved years ago.

Where it comes from: Smaller team, fewer eyes, and reliance on Go’s net/http stack (plus the quic-go library for HTTP/3) rather than a custom implementation.

What it costs you: Subtle bugs at the protocol level. gRPC over HTTP/2 mostly works but occasionally gets weird with trailers, streaming, and connection pooling. HTTP/3 is opt-in and you should treat it as experimental for now.

When it’s a dealbreaker: When you’re running heavy gRPC traffic with bidirectional streams. Envoy is purpose-built for this. Traefik can do it but you’ll find rough edges.

What people think mitigates it but doesn’t: “We’ll just upgrade Traefik versions when issues come up.” Fixes for these protocol edge cases arrive slowly because the Traefik team has many priorities.

Downside 7: Observability is good, but not great

The downside: Traefik’s metrics, logs, and tracing are usable but not best-in-class.

Where it comes from: Default metrics are coarser than nginx’s stub_status + ngx_http_vts_module. Access logs are JSON but the schema is less flexible than nginx’s log_format. Tracing works but the per-middleware spans are noisy.

What it costs you: When you want to slice traffic by some dimension that isn’t a default label (request size buckets, custom user attributes), you can’t. The “what was Traefik doing during that latency spike” question requires more work than it should.

When it’s a dealbreaker: If your SRE practice depends on rich, multi-dimensional latency analysis at the edge, Envoy’s stats sinks and the depth of its observability dwarf Traefik’s.

What people think mitigates it but doesn’t: “We’ll add custom labels via a plugin.” See Downside 5 — plugins on the hot path have costs.

Downside 8: Vendor pull toward Traefik Enterprise / Hub

The downside: Several features that are essential at scale exist only in the commercial product. Distributed ACME. Advanced API management. Full WAF integration (Coraza is free, but the full ecosystem is commercial).

Where it comes from: Traefik Labs is a company. They need to monetize. The features that are commercial are the ones that lock in enterprise users.

What it costs you: A subtle pressure to “graduate” to the commercial product as you scale. The free product remains fully featured for most use cases — Traefik isn’t doing open-core-bait-and-switch — but you’ll hit some walls that have commercial answers.

When it’s a dealbreaker: When you discover one of those walls and your budget can’t fund the commercial license. Now you’re refactoring around the missing capability.

What people think mitigates it but doesn’t: “We’ll just use the OSS one forever.” You can. But the walls are real, particularly in API management and distributed cert ops.

Downside 9: The community is small relative to nginx

The downside: When you Google a problem, Stack Overflow has 10x more nginx answers than Traefik answers.

Where it comes from: nginx has roughly a 12-year head start; its first public release was in 2004, and Traefik launched in 2016.

What it costs you: Slightly more time per debugging session. Niche use cases sometimes have no community answer at all. The official forum is active but moves slowly. ChatGPT/Claude are less reliable on Traefik than on nginx because their training data is thinner.

When it’s a dealbreaker: Almost never. It’s friction, not a wall.


12. The Taste Test

What good Traefik usage looks like vs. what cargo-culted defaults look like.

Good vs Bad: The traefik.yml static config

Bad — beginner default, dropped here unchanged from the docs:

api:
  insecure: true
providers:
  docker: {}
entryPoints:
  web:
    address: ":80"

This is “I followed the quickstart.” Insecure API exposed. No exposedByDefault: false. No HTTPS. No metrics. No log format. Production this is not.

Good — what an experienced operator writes:

global:
  checkNewVersion: false
  sendAnonymousUsage: false

log:
  level: INFO
  format: json

accessLog:
  format: json
  fields:
    headers:
      defaultMode: drop
      names:
        User-Agent: keep
        Referer: keep
        X-Forwarded-For: keep

api:
  dashboard: true
  insecure: false

entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint: { to: websecure, scheme: https, permanent: true }
  websecure:
    address: ":443"
    http:
      tls: { certResolver: le }

providers:
  docker:
    exposedByDefault: false
    network: proxy
  file:
    directory: /etc/traefik/dynamic
    watch: true

certificatesResolvers:
  le:
    acme:
      email: ops@example.com
      storage: /letsencrypt/acme.json
      dnsChallenge:
        provider: cloudflare

metrics:
  prometheus:
    addEntryPointsLabels: true
    addServicesLabels: true

Every choice in the second version is deliberate. JSON logs because they go to a structured log pipeline. Headers filtered because access log volume is real. Default cert resolver on the entryPoint because every router on websecure needs TLS. File provider for shared config. DNS-01 because we want wildcards and HA. Prometheus because metrics aren’t optional.

Good vs Bad: Labels on a containerized app

Bad — every middleware redefined per-service:

labels:
  - "traefik.enable=true"
  - "traefik.http.routers.myapp.rule=Host(`myapp.example.com`)"
  - "traefik.http.routers.myapp.entrypoints=websecure"
  - "traefik.http.routers.myapp.tls=true"
  - "traefik.http.middlewares.myapp-headers.headers.frameDeny=true"
  - "traefik.http.middlewares.myapp-headers.headers.contentTypeNosniff=true"
  - "traefik.http.middlewares.myapp-headers.headers.browserXssFilter=true"
  - "traefik.http.middlewares.myapp-headers.headers.stsSeconds=31536000"
  - "traefik.http.middlewares.myapp-rl.ratelimit.average=100"
  - "traefik.http.middlewares.myapp-rl.ratelimit.burst=200"
  - "traefik.http.routers.myapp.middlewares=myapp-headers,myapp-rl"
  - "traefik.http.services.myapp.loadbalancer.server.port=8080"

Twelve labels, half of them defining things that should be shared. Now do this for 30 services and watch the copy-paste drift.

Good — shared middlewares in file provider, app-specific labels minimal:

labels:
  - "traefik.enable=true"
  - "traefik.http.routers.myapp.rule=Host(`myapp.example.com`)"
  - "traefik.http.routers.myapp.middlewares=standard-stack@file"
  - "traefik.http.services.myapp.loadbalancer.server.port=8080"

The standard-stack@file is a chain middleware in dynamic/middlewares.yml that bundles security-headers, rate-limit, compression, and access-log enrichment. One source of truth. Per-app labels stay short.

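Notice: no entrypoints=websecure. A router that lists no entryPoints attaches to all of them, and the web entryPoint redirects everything to HTTPS first, so in practice the app is served on websecure. No tls=true either: the tls block on the websecure entryPoint enables TLS for every router there. The static config does the right default; the per-app labels stay focused on what’s app-specific.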

Good vs Bad: Router rules

Bad:

traefik.http.routers.api.rule=PathPrefix(`/api`)

Routes everything starting with /api, including /api, /apicat, /apidocs, etc. (See Gotcha 6.) Probably not what you wanted.
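PathPrefix is a raw string-prefix test, which is why /apicat slips through. A minimal Go sketch contrasting it with a segment-aware check; stringPrefix and segmentPrefix are hypothetical helpers for illustration, not Traefik's matcher code.

```go
package main

import (
	"fmt"
	"strings"
)

// stringPrefix mirrors PathPrefix: a raw string-prefix comparison.
func stringPrefix(path, prefix string) bool {
	return strings.HasPrefix(path, prefix)
}

// segmentPrefix only matches whole path segments: /api and /api/... but not /apicat.
func segmentPrefix(path, prefix string) bool {
	prefix = strings.TrimSuffix(prefix, "/")
	return path == prefix || strings.HasPrefix(path, prefix+"/")
}

func main() {
	for _, p := range []string{"/api", "/api/users", "/apicat", "/apidocs"} {
		fmt.Printf("%-11s PathPrefix=%-5v segment=%v\n", p, stringPrefix(p, "/api"), segmentPrefix(p, "/api"))
	}
}
```

Only /apicat and /apidocs differ: the prefix check accepts them, the segment check rejects them. In a Traefik rule, one way to get segment behavior is Path(`/api`) || PathPrefix(`/api/`).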

Good:

traefik.http.routers.api.rule=Host(`api.example.com`) && PathPrefix(`/v1`)

Anchored by host. Path is a specific version. Other API versions can coexist on routers with their own paths.

Bad:

traefik.http.routers.everything.rule=HostRegexp(`{subdomain:[a-z]+}.example.com`)

A catch-all regex router that handles every subdomain. Now your routing is opaque — which subdomain goes where? The backend has to figure it out from the Host header. You’ve shoved L7 logic into your app.

Good:

traefik.http.routers.api.rule=Host(`api.example.com`)
traefik.http.routers.admin.rule=Host(`admin.example.com`)
traefik.http.routers.docs.rule=Host(`docs.example.com`)

Explicit routers per service. The routing decision happens in Traefik, where the dashboard shows it.

Good vs Bad: Middleware ordering

Bad:

middlewares=compress,forward-auth,rate-limit

Runs compress first, then forward-auth (might 401), then the rate limit. So a flood of bad-auth requests still hits the auth service, which is expensive, because the limiter only sees requests that already passed auth. Compression isn’t actually broken here: compress wraps the response, so it works wherever it sits, and its position mainly decides which responses it sees.

Good:

middlewares=rate-limit,forward-auth,compress

Rate limit first: drop floods before paying for auth. Then auth. Compress last, closest to the backend, where the real response bodies come from.
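The ordering argument follows from how a middleware list composes: the first name is the outermost wrapper, so it sees the request first and can end it before later middlewares run. A minimal Go sketch with hypothetical stand-ins (a limiter that admits a fixed number of requests, an auth step that counts its calls); it uses the same http.Handler wrapping pattern but is not Traefik's implementation.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

type middleware func(http.Handler) http.Handler

// chain applies middlewares so that mws[0] is outermost, matching list order.
func chain(final http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		final = mws[i](final)
	}
	return final
}

// rateLimit admits the first `allow` requests and rejects the rest with 429.
// Not concurrency-safe; a stand-in for a real token bucket.
func rateLimit(allow int) middleware {
	seen := 0
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			seen++
			if seen > allow {
				http.Error(w, "too many requests", http.StatusTooManyRequests)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// forwardAuth stands in for a call to an external auth service; calls counts them.
func forwardAuth(calls *int) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*calls++
			if r.Header.Get("Authorization") == "" {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// authCallsForFlood sends n unauthenticated requests and reports auth calls.
func authCallsForFlood(rateLimitFirst bool, n int) int {
	calls := 0
	app := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {})
	var h http.Handler
	if rateLimitFirst {
		h = chain(app, rateLimit(10), forwardAuth(&calls))
	} else {
		h = chain(app, forwardAuth(&calls), rateLimit(10))
	}
	for i := 0; i < n; i++ {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	}
	return calls
}

func main() {
	fmt.Println("auth calls, auth first:      ", authCallsForFlood(false, 1000)) // 1000
	fmt.Println("auth calls, rate-limit first:", authCallsForFlood(true, 1000))  // 10
}
```

For a flood of 1,000 unauthenticated requests, auth-first calls the auth service 1,000 times; rate-limit-first calls it 10 times and rejects the rest with 429.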

Good vs Bad: TLS configuration

Bad:

Default TLS, accept everything, no opinion. Half your clients on TLS 1.0 with weak ciphers.

Good:

# dynamic/tls.yml
tls:
  options:
    modern:
      minVersion: VersionTLS13
    intermediate:
      minVersion: VersionTLS12
      cipherSuites:
        - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
        - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
        - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
        - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
        - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
      curvePreferences: [CurveP256, X25519]

Two named profiles. Routers opt in: tls.options=modern@file for internal stuff, intermediate@file for public-facing endpoints that need broader client support.

Code review red flags

When reviewing Traefik PRs, treat these as “explain this” signals:

  • A middleware defined in service labels rather than the file provider when more than one service uses the same name.
  • Any router rule using HostRegexp that could have been multiple Host rules.
  • priority set to specific numbers without comments explaining why.
  • ACME-related labels on individual services rather than entryPoint-level defaults.
  • tls.options=default (or omitted) on public-facing routers — should be explicit.
  • Any change touching acme.json directly. (You don’t edit acme.json. Ever.)
  • A new EntryPoint without a matching change to security middlewares.

13. Where to Go Deeper

Curated and opinionated. Skip the random Medium articles.

  1. The official Traefik docs at doc.traefik.io/traefik/ — Genuinely good. The “Concepts” and “Routing” sections especially. Read them once cover-to-cover. The reference sections are excellent for “what does this option actually do.” When to read: as the canonical reference once you’ve absorbed this document.

  2. The Traefik 3.0 release blog series at traefik.io/blog/ — Posts on WebAssembly plugins, OpenTelemetry, FastProxy, Gateway API support. These walk through the design rationale, not just the features. When to read: when you’ve used Traefik for a few months and want to understand where it’s heading.

  3. The Traefik GitHub source code, specifically the pkg/server/ and pkg/middlewares/ directories. Don’t be intimidated — Go is readable. Reading the middleware implementations is the fastest way to understand what each one actually does at byte level. When to read: when you’re debugging weird behavior or considering writing a plugin.

  4. “Reverse Proxy Hot Dog Eating Contest” by Tyler Langlois at blog.tjll.net — Honest benchmarking of Caddy vs nginx (and by extension Traefik). The methodology matters more than the numbers; you’ll learn how to think about proxy performance. When to read: when “is Traefik fast enough” becomes a real question.

  5. “The Cloud Native Reverse Proxy Landscape” (search for current versions; the space moves) — Comparative pieces written by working engineers. The DEV.to and Medium piece by ruchira (“Reverse proxies & traefik: how it actually works”) is one of the clearer recent takes. When to read: when you’re deciding between options for a real project.

  6. The Kubernetes Gateway API specification at gateway-api.sigs.k8s.io — Traefik supports it; it’s likely the future of ingress in Kubernetes. Reading the spec gives you the meta-view that lets you see why IngressRoute exists and where it’s heading. When to read: when you’re designing for the next 2-3 years of Kubernetes ingress.

  7. The cert-manager docs at cert-manager.io — If you’re running Traefik in Kubernetes, you need to know cert-manager. The docs are good and the mental model (Issuer, ClusterIssuer, Certificate, Order, Challenge) is worth absorbing. When to read: before deploying production K8s Traefik.

  8. Hands-on project: build a homelab. Run Traefik on a single VPS or Raspberry Pi behind a real domain. Deploy 5-10 services. Use DNS-01 challenge. Add Authelia for SSO. Wire up Prometheus + Grafana. The breadth of issues you encounter teaches you faster than any reading. When to do: now, today.


14. The Final Verdict

Traefik is the best reverse proxy you can pick when you genuinely value a small team’s time and your traffic pattern is “dynamic containers behind a single edge.” It is not the fastest. It is not the most flexible. It is not the simplest. It is the one whose design choices most closely match what cloud-native infrastructure actually feels like to operate, day to day, when the people operating it aren’t dedicated proxy engineers.

What Traefik gets profoundly right: the projection-from-providers model. Treating routing as derived state, recomputed continuously from infrastructure-as-it-is, was a real insight in 2016 and it has aged beautifully. Every other proxy has had to graft this capability on. Traefik was born with it. The result is that adding a service feels like adding a service — labels go on, traffic flows, no proxy-team ticket. That is a fundamentally different developer experience from anything before it, and once you’ve worked with it you find it hard to go back.

What it costs you: the silent-failure tax, the performance ceiling, the ugly label syntax, and an ACME story that breaks the moment you scale past one replica. None of these are dealbreakers in isolation. Combined, they mean that the gap between “Traefik works great” and “Traefik is fighting me” is narrower than the marketing suggests. You will spend more time peering at the dashboard than you would peering at an nginx config. You will swear, more than once, at a label typo that consumed an hour. You will eventually hit a feature that the OSS version doesn’t have, and you will have to decide whether to pay for Enterprise, replace Traefik, or work around it. None of that is hidden — it’s just unromantic.

Who should reach for Traefik: small-to-medium platform teams running Kubernetes or Docker, where the proxy is one of fifty things you own and you can’t afford for it to be a tarpit. Teams whose traffic is internal-or-near-edge, not the public edge of a global service. Teams who want sensible defaults more than they want infinite knobs. Teams who will respect the “use cert-manager, use the file provider for shared config, run one replica or graduate to Enterprise” guidance and not try to bend the tool past its design.

Who should not: teams running serious public edge traffic where every millisecond matters — go nginx or HAProxy. Teams building a service mesh — go Envoy/Istio/Linkerd. Teams whose entire infrastructure is static and rarely changes — nginx is simpler. Teams who need vendor-grade support and SLAs and aren’t willing to pay Traefik Labs — pick a cloud-managed gateway (AWS ALB, GCP HTTPS LB, Cloudflare). Teams whose use case is primarily gRPC with heavy streaming — Envoy will be less painful.

What you should now believe: Traefik is genuinely a category-leading product for its sweet spot, not a marketing fiction. Don’t believe that “modern means better”; nginx is older and still better at several things. When you hear someone say “Traefik is slow” or “Traefik is the future of ingress,” they’re usually overgeneralizing from one slice of the landscape; the truth depends on the sub-category. Traefik is excellent at exactly what it’s good at, mediocre at things adjacent to that, and bad at things that are someone else’s specialty.

The hard-won line: Traefik works best when you let it be the simple thing. The moment you start fighting its model — wrestling labels into shapes they don’t want, scaling ACME past where it scales, building elaborate plugins for hot-path logic — you’re using it wrong, and the right move is either to bend back toward its grain or to admit you need a different tool. Most of the operators I respect who have run Traefik for years have arrived at the same posture: Traefik for the bulk of the inside-the-cluster routing, paired with something else (nginx, a cloud LB, Envoy) for the parts where Traefik isn’t the right fit. That’s not a failure of Traefik. It’s the mature answer.


The ideas are mine. The writing is AI-assisted.