tailscale-operator
Metadata
| Field | Value |
|---|---|
| Service | tailscale-operator |
| Purpose | Expose ozilab through Tailscale and Tailscale Funnel without a public IP or inbound NAT |
| Criticality | Tier 1 |
| Owner | Platform / Networking owner |
| Clusters | ozilab |
| Namespace | tailscale |
| Exposure | VPN and internet via Funnel |
| Stateful | no |
| Backup class | none |
| RPO / RTO | Git-backed config, 15 to 30 minutes to restore |
| Last reviewed | 2026-04-22 |
1. Service Overview
Tailscale Operator replaces cloudflared for the ozilab cluster edge path. It gives the cluster an outbound-only connectivity model that works in the corporate environment, then exposes Traefik through a Tailscale Funnel ingress.
Summary
Without this service, ozilab cannot publish HTTP services from inside the corporate network without reopening the Cloudflare Tunnel path that is currently blocked by the firewall.
Dependencies
| Dependency | Type | Why it matters |
|---|---|---|
| traefik | ingress | Shared L7 router and TLS endpoint for cluster workloads |
| Tailscale control plane | control plane | Authenticates the operator and provisions ingress proxies |
| Cloudflare DNS | DNS | Hosts the public domain and points it at the generated Funnel hostname |
| Tailnet ACL policy | identity | Grants tag ownership, Funnel permission, and OAuth client scope |
2. Architecture Diagram
[Client]
-> [Cloudflare DNS / proxy]
-> [ozilab-edge.<tailnet>.ts.net]
-> [Tailscale Funnel ingress proxy]
-> [Service traefik:443]
-> [Traefik routes by Host header]
-> [Cluster services]
[Tailscale operator]
-> [Tailscale control plane]
-> [OAuth client and tagged proxy devices]
3. Deployment Specifications
| Item | Value |
|---|---|
| Source path | tailscale-operator |
| Deployment model | Fleet native Helm chart with Kustomize post-render |
| Namespace | tailscale |
| Workload kind | Deployment plus Tailscale-managed ingress proxy StatefulSet |
| Chart or image version | tailscale-operator chart 1.96.5, appVersion v1.96.5 |
| Config files | fleet.yaml, overlays/ozilab/kustomization.yaml, overlays/ozilab/values.yaml, overlays/ozilab/.operator-oauth.env.example, overlays/ozilab/traefik-funnel-ingress.yaml |
Cluster mapping
| Cluster | Overlay path | Notes |
|---|---|---|
| ozilab | tailscale-operator/overlays/ozilab | Replaces cloudflared as the edge transport for Traefik |
4. Configuration Guide
Environment variables
Fleet renders the external Tailscale Helm chart directly from tailscale-operator/fleet.yaml, then runs the ozilab Kustomize overlay as a post-render step to add the namespace, bootstrap Secret, and Funnel ingress.
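A minimal sketch of what such a fleet.yaml could look like, assuming the public Tailscale chart repository and a release named tailscale-operator. The chart name, version, namespace, and overlay path come from this page; the repository URL, release name, and values file wiring are assumptions and may differ from the committed file.

```yaml
# tailscale-operator/fleet.yaml (illustrative sketch, not the committed file)
defaultNamespace: tailscale

helm:
  # Assumed: the public Tailscale Helm repository
  repo: https://pkgs.tailscale.com/helmcharts
  chart: tailscale-operator
  version: 1.96.5
  # Assumed release name
  releaseName: tailscale-operator
  valuesFiles:
    - overlays/ozilab/values.yaml

# Kustomize runs against the rendered chart output as a post-render step
kustomize:
  dir: overlays/ozilab
```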
The Helm chart mounts a precreated Secret named operator-oauth into the operator pod. The Secret is generated locally by Kustomize from a non-committed env file.
| Variable | Source | Purpose | Secret? |
|---|---|---|---|
| client_id | tailscale-operator/overlays/ozilab/.operator-oauth.env via secretGenerator to Secret operator-oauth | OAuth client identity for the operator | yes |
| client_secret | tailscale-operator/overlays/ozilab/.operator-oauth.env via secretGenerator to Secret operator-oauth | OAuth client secret for the operator | yes |
| operatorConfig.hostname | tailscale-operator/overlays/ozilab/values.yaml | Device hostname registered in the tailnet | no |
| proxyConfig.defaultTags | tailscale-operator/overlays/ozilab/values.yaml | Default tag for ingress proxies and Funnel devices | no |
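As a rough illustration, the two non-secret values above might appear in overlays/ozilab/values.yaml as follows. The hostname shown is a hypothetical placeholder, and tag:k8s is taken from the tailnet prerequisites on this page; check both against the committed values file.

```yaml
# overlays/ozilab/values.yaml (illustrative sketch)
operatorConfig:
  # Hypothetical device hostname; the real value is not stated on this page
  hostname: ozilab-operator

proxyConfig:
  # Tag applied to generated ingress proxies and Funnel devices
  defaultTags: "tag:k8s"
```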
ConfigMaps
The operator creates its own runtime configuration artifacts for proxies inside the tailscale namespace. They are not stored as static manifests in this repository.
Secrets management
- Secret names: operator-oauth for bootstrap credentials, plus Tailscale-managed runtime secrets for generated proxy pods
- Source of truth: local tailscale-operator/overlays/ozilab/.operator-oauth.env file rendered by secretGenerator into Secret operator-oauth, populated from a Tailscale OAuth client created in the admin console
- Rotation trigger: OAuth client rotation, tailnet policy changes, or compromise response
- Recovery note: replacing the OAuth client requires rotating the client in Tailscale, updating the local .operator-oauth.env file, and reconciling the overlay so the operator pod remounts fresh credentials
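The secretGenerator wiring described above can be sketched as follows. The file names come from the config file list in section 3; disabling the name suffix hash is an assumption, included because the chart looks up the Secret by the fixed name operator-oauth, and the committed overlay may list additional resources.

```yaml
# overlays/ozilab/kustomization.yaml (illustrative sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: tailscale

resources:
  - traefik-funnel-ingress.yaml

secretGenerator:
  - name: operator-oauth
    envs:
      # Local, non-committed file containing client_id=... and client_secret=...
      - .operator-oauth.env

generatorOptions:
  # Assumed: keep the Secret name stable so the operator pod can mount it
  disableNameSuffixHash: true
```

After rotating the OAuth client, updating .operator-oauth.env and re-rendering the overlay regenerates the Secret contents.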
Tailnet prerequisites
Before rollout, the tailnet policy must provide:
- tagOwners for tag:k8s-operator and tag:k8s
- an OAuth client scoped with Devices Core, Auth Keys, and Services write permissions
- a nodeAttrs rule granting the funnel attribute to tag:k8s or whatever proxy tag you choose
5. Access Protocols
| Path | URL or endpoint | Audience | Auth | TLS terminates at |
|---|---|---|---|---|
| Internal | traefik.traefik.svc.cluster.local:443 | Cluster workloads and the Funnel proxy | Traefik middleware chain | Traefik |
| Tailnet | https://ozilab-edge.<tailnet>.ts.net | Tailnet members | Tailnet ACLs | Tailscale ingress |
| Public | Cloudflare-hosted domain proxied to the Funnel hostname | Internet users | Cloudflare plus app-specific auth at Traefik | Cloudflare edge then Tailscale ingress |
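The Tailnet and Public paths both terminate on the Funnel ingress. A hedged sketch of what traefik-funnel-ingress.yaml might contain is shown here; the resource name and the traefik namespace are assumptions (Ingress backends resolve within the Ingress's own namespace, and this page places the Traefik Service in the traefik namespace), so the committed manifest may be structured differently.

```yaml
# overlays/ozilab/traefik-funnel-ingress.yaml (illustrative sketch)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  # Assumed name and namespace
  name: ozilab-edge
  namespace: traefik
  annotations:
    # Publish this ingress to the internet through Tailscale Funnel
    tailscale.com/funnel: "true"
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: traefik
      port:
        number: 443
  tls:
    - hosts:
        # Becomes the device name, giving ozilab-edge.<tailnet>.ts.net
        - ozilab-edge
```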
Cloudflare integration
The Kubernetes manifests only create the Tailscale side. To keep using your Cloudflare-hosted domain, point the public hostname at the generated Funnel hostname and keep the public host name distinct from the upstream TLS name used between Cloudflare and Tailscale.
Recommended pattern:
- Create a proxied CNAME from the public hostname to ozilab-edge.<tailnet>.ts.net.
- Keep Cloudflare SSL mode on Full (strict) so the edge validates the origin certificate.
- Keep the HTTP Host header sent upstream as the public hostname so Traefik can continue routing by Host or HostRegexp.
- If your Cloudflare plan exposes Origin Rules SNI override, set only the upstream SNI to ozilab-edge.<tailnet>.ts.net.
- Do not override the upstream Host header to the ts.net hostname unless you intentionally want Traefik to route on that ts.net host.
Precise flow for a public hostname such as app.example.com:
| Hop | Value to expect |
|---|---|
| Browser URL | https://app.example.com |
| Browser to Cloudflare Host header | app.example.com |
| Cloudflare proxied DNS target | ozilab-edge.<tailnet>.ts.net |
| Cloudflare to origin TCP destination | ozilab-edge.<tailnet>.ts.net |
| Cloudflare to origin TLS SNI | ozilab-edge.<tailnet>.ts.net |
| Cloudflare to origin HTTP Host header | app.example.com |
| Certificate presented by Tailscale Funnel | ozilab-edge.<tailnet>.ts.net |
| Host seen by Traefik for routing | app.example.com |
Failure modes to avoid:
- If Cloudflare sends SNI for the public hostname instead of the ts.net hostname, Cloudflare can return error 526 (invalid SSL certificate) because the Tailscale certificate only covers the ts.net hostname.
- If Cloudflare overrides the HTTP Host header to the ts.net hostname, Traefik will route on the wrong host and your existing IngressRoute matches may fall through to the default router.
- Cloudflare documents that a Host header override also rewrites SNI unless you set a separate SNI override, so do not use a Host header override as a shortcut here.
If your Cloudflare plan does not support Origin Rules SNI override, start with the proxied CNAME only and validate the cutover end to end. If the public hostname returns 526, you need either a plan or feature path that lets you override SNI, or a different edge pattern for that hostname.
6. Operations and Observability
- Primary health indicators: operator deployment available, Funnel ingress has a hostname in status, and direct access to the ts.net hostname succeeds
- Dashboards or alerts: operator and proxy pod logs, plus Tailscale admin console machine state
- Log locations: kubectl logs in the tailscale namespace for the operator and generated proxy pods
- Known failure modes: missing operator-oauth Secret, bad OAuth client scopes, missing Funnel nodeAttrs, namespace PSA blocking privileged pods, Cloudflare sending an SNI other than the generated ts.net hostname, or Cloudflare sending a Host header other than the public hostname to Traefik
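For the Pod Security Admission failure mode, one common mitigation is to label the namespace so privileged proxy pods are admitted. This is a sketch under the assumption that the overlay manages the namespace manifest itself; whether the committed overlay uses this exact label is not confirmed here.

```yaml
# Namespace manifest allowing privileged Tailscale proxy pods (illustrative sketch)
apiVersion: v1
kind: Namespace
metadata:
  name: tailscale
  labels:
    # Assumed: relax Pod Security Admission enforcement for this namespace only
    pod-security.kubernetes.io/enforce: privileged
```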
7. Backup and Recovery Notes
- Backup method: Git plus Tailscale admin console credential recreation
- Restore prerequisites: working Tailscale tailnet, OAuth client, tag policy, and the traefik service in the ozilab cluster
- Related runbook: ../runbooks/tailscale-operator.md
8. Release and Change Notes
- Current deployed app version: Tailscale operator v1.96.5
- Current chart version: 1.96.5
- Last significant change: ozilab bundle switched from Kustomize helmCharts inflation to Fleet-native Helm rendering so Fleet can reconcile the chart without requiring unsupported --enable-helm behavior
- Rollback reference: restore cloudflared to fleet/layer7/gitrepo-ozilab.yaml and remove the tailscale-operator path