aitbc/.windsurf/skills/gitea-runner-log-debugger.md
aitbc 3df724d9fc docs: add gitea-runner SSH-based CI log debugging skill and workflow
Added comprehensive documentation for autonomous investigation of failed Gitea Actions runs via SSH access to gitea-runner host. Includes log location mapping, classification heuristics for distinguishing workflow/dependency/application/service/infrastructure failures, and evidence-based debug suggestion templates. Provides read-only investigation sequences with safety constraints to prevent conflating application failures with runner inst
2026-04-20 12:05:31 +02:00


description: Autonomous skill for SSH-based investigation of gitea-runner CI logs, runner health, and root-cause-oriented debug guidance
title: Gitea Runner Log Debugger
version: 1.0

Gitea Runner Log Debugger Skill

Purpose

Use this skill to diagnose failed Gitea Actions runs by connecting to the gitea-runner host over SSH, reading the CI log files, correlating them with runner health, and producing targeted debug suggestions.

Activation

Activate this skill when:

  • a Gitea workflow fails and the UI log is incomplete or inconvenient
  • Windsurf needs direct access to runner-side CI logs
  • you need to distinguish workflow failures from runner failures
  • you need evidence-backed debug suggestions instead of generic guesses
  • a job appears to fail because of OOM, restart loops, path mismatches, or missing dependencies

Known Environment Facts

  • Runner host: ssh gitea-runner
  • Runner service: gitea-runner.service
  • Runner binary: /opt/gitea-runner/act_runner
  • Persistent CI logs: /opt/gitea-runner/logs
  • Indexed log manifest: /opt/gitea-runner/logs/index.tsv
  • Latest log symlink: /opt/gitea-runner/logs/latest.log
  • Gitea Actions on this runner exposes GitHub-compatible runtime variables, so prefer GITHUB_RUN_ID over GITEA_RUN_ID as the run identifier
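
The run-identifier preference above can be sketched as a small shell helper for use inside a workflow step. This is a minimal sketch: the function name resolve_run_id is hypothetical, and the fallback to GITEA_RUN_ID and then to the literal string "unknown" is an assumption, not documented runner behavior.

```shell
# Hypothetical helper: prefer GITHUB_RUN_ID, fall back to GITEA_RUN_ID,
# and print "unknown" when neither variable is set or both are empty.
resolve_run_id() {
  printf '%s\n' "${GITHUB_RUN_ID:-${GITEA_RUN_ID:-unknown}}"
}
```

A step could call `run_id="$(resolve_run_id)"` and use the result when naming or indexing its log file.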

Inputs

Minimum Input

  • failing workflow name, job name, or pasted error output

Best Input

{
  "workflow_name": "Staking Tests",
  "job_name": "test-staking-service",
  "run_id": "1787",
  "symptoms": [
    "ModuleNotFoundError: No module named 'click'"
  ],
  "needs_runner_health_check": true
}

Expected Outputs

{
  "failure_class": "workflow_config | dependency_packaging | application_test | service_readiness | runner_infrastructure | unknown",
  "root_cause": "string",
  "evidence": ["string"],
  "minimal_fix": "string",
  "follow_up_checks": ["string"],
  "confidence": "low | medium | high"
}
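
Because failure_class is a closed set of values, a report can be checked before it is returned. The following is a minimal sketch; the function name is_valid_failure_class is hypothetical and only the values listed in the schema above are accepted.

```shell
# Hypothetical validator: succeed (exit 0) only for a failure_class value
# that appears in the output schema above.
is_valid_failure_class() {
  case "$1" in
    workflow_config|dependency_packaging|application_test|service_readiness|runner_infrastructure|unknown)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

For example, `is_valid_failure_class "$class" || class=unknown` keeps an unexpected value from leaking into the output.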

Investigation Sequence

1. Connect and Verify Runner

ssh gitea-runner 'hostname; whoami; systemctl is-active gitea-runner'

2. Locate Relevant CI Logs

Check the indexed job logs first.

ssh gitea-runner 'tail -n 20 /opt/gitea-runner/logs/index.tsv'
ssh gitea-runner 'tail -n 200 /opt/gitea-runner/logs/latest.log'

If a run id is known:

ssh gitea-runner "awk -F '\t' '\$2 == \"1787\" {print}' /opt/gitea-runner/logs/index.tsv"

If only workflow/job names are known:

ssh gitea-runner 'grep -i "production tests" /opt/gitea-runner/logs/index.tsv | tail -n 20'
ssh gitea-runner 'grep -i "test-production" /opt/gitea-runner/logs/index.tsv | tail -n 20'
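
The run-id lookup can also be wrapped in a reusable function that avoids the nested quoting. This is a sketch under one assumption taken from the awk command above: the run id is the second tab-separated column of index.tsv. The function name find_run_rows is hypothetical, and no other column is interpreted.

```shell
# Hypothetical helper: print index.tsv rows whose second tab-separated
# column equals the given run id. Appending "" to both sides forces a
# string comparison, so "01787" does not match "1787".
find_run_rows() {
  run_id="$1"
  index_file="${2:-/opt/gitea-runner/logs/index.tsv}"
  awk -F '\t' -v id="$run_id" '($2 "") == (id "")' "$index_file"
}
```

One way to use it without running extra logic on the runner is to copy the manifest first, for example `ssh gitea-runner 'cat /opt/gitea-runner/logs/index.tsv' > index.tsv`, then call `find_run_rows 1787 index.tsv` locally.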

3. Read the Job Log Before the Runner Log

ssh gitea-runner 'tail -n 200 /opt/gitea-runner/logs/<resolved-log>.log'

4. Correlate With Runner State

ssh gitea-runner 'systemctl status gitea-runner --no-pager'
ssh gitea-runner 'journalctl -u gitea-runner -n 200 --no-pager'
ssh gitea-runner 'tail -n 200 /opt/gitea-runner/runner.log'

5. Check for Resource Exhaustion Only if Indicated

ssh gitea-runner 'free -h; df -h /opt /var /tmp'
ssh gitea-runner 'dmesg -T | grep -i -E "oom|out of memory|killed process" | tail -n 50'

Classification Rules

Workflow Config Failure

Evidence patterns:

  • script path not found
  • wrong repo path
  • wrong service/unit name
  • wrong import target or startup command
  • missing environment export

Default recommendation:

  • patch the workflow with the smallest targeted fix

Dependency / Packaging Failure

Evidence patterns:

  • ModuleNotFoundError
  • ImportError
  • failed editable install
  • Poetry package discovery failure
  • missing pip/Node dependency in lean CI setup

Default recommendation:

  • add only the missing dependency when truly required
  • otherwise fix the import chain or packaging metadata root cause

Application / Test Failure

Evidence patterns:

  • normal environment setup completes
  • tests collect and run
  • failure is an assertion or application traceback

Default recommendation:

  • patch code or tests, not the runner

Service Readiness Failure

Evidence patterns:

  • health endpoint timeout
  • process exits immediately
  • server log shows startup/config exception

Default recommendation:

  • inspect service startup logs and verify host/path/port assumptions

Runner / Infrastructure Failure

Evidence patterns:

  • oom-kill entries in journalctl or dmesg output
  • runner daemon restart loop
  • truncated logs across unrelated workflows
  • disk exhaustion or temp space errors

Default recommendation:

  • treat as runner capacity/stability issue only when evidence is direct
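
The classification rules above can be approximated by a first-pass pattern matcher over a job log. This is only a sketch: the function name classify_log and the specific grep patterns are illustrative assumptions, the first matching rule wins, and the result should be confirmed by reading the log rather than trusted on its own.

```shell
# Hypothetical first-pass classifier: reads a job log file and prints one
# failure_class value. Patterns are illustrative, not exhaustive, and the
# order matters because the first match wins.
classify_log() {
  log="$1"
  if grep -q -i -E 'oom-kill|out of memory|no space left on device' "$log"; then
    echo runner_infrastructure      # direct resource-exhaustion evidence
  elif grep -q -E 'ModuleNotFoundError|ImportError' "$log"; then
    echo dependency_packaging       # missing or broken import chain
  elif grep -q -i -E 'no such file or directory|unit .* not found' "$log"; then
    echo workflow_config            # wrong path or wrong unit name
  elif grep -q -i -E 'connection refused|health.*timed? ?out' "$log"; then
    echo service_readiness          # service never became reachable
  elif grep -q -E 'AssertionError|Traceback \(most recent call last\)' "$log"; then
    echo application_test           # tests ran and failed in app code
  else
    echo unknown
  fi
}
```

Dependency patterns are tested before the generic traceback pattern because a ModuleNotFoundError also prints a traceback; checking in the other order would mislabel it as an application failure.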

Decision Heuristics

  • Prefer the job log over journalctl for code/workflow failures
  • Prefer the smallest fix that explains all evidence
  • Do not suggest restarting the runner unless the user asks or the runner is clearly unhealthy
  • Do not use internal task <id> values to infer workflow names or to look up log files
  • If /opt/gitea-runner/logs is missing a run, check whether the workflow had the logging initializer at that time

Debug Suggestion Template

When reporting back, use this structure:

Failure Class

<workflow_config | dependency_packaging | application_test | service_readiness | runner_infrastructure | unknown>

Root Cause

One sentence describing the most likely issue.

Evidence

  • <specific log line>
  • <specific log line>
  • <runner health correlation if relevant>

Minimal Fix

One focused change that addresses the root cause.

Optional Follow-up

  • <verification step>
  • <secondary diagnostic if needed>

Confidence

low | medium | high
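
The template above can be filled in mechanically once the findings are known. The sketch below is one possible renderer; the function name render_report and its argument order are hypothetical, and the Optional Follow-up section is left out for brevity.

```shell
# Hypothetical renderer: print the report sections in template order.
# Arguments: failure class, root cause, minimal fix, confidence.
# Evidence lines are read from stdin, one per line.
render_report() {
  printf 'Failure Class\n\n%s\n\n' "$1"
  printf 'Root Cause\n\n%s\n\n' "$2"
  printf 'Evidence\n\n'
  # The extra test keeps a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '  • %s\n' "$line"
  done
  printf '\nMinimal Fix\n\n%s\n\n' "$3"
  printf 'Confidence\n\n%s\n' "$4"
}
```

A call might look like `grep -n 'ModuleNotFoundError' job.log | render_report dependency_packaging "click is not installed in the CI environment" "add click to the CI dependencies" high`, where job.log is a locally copied job log.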

Safety Constraints

  • Read-only first
  • No service restarts without explicit user approval
  • No deletion of runner files during diagnosis
  • Do not conflate application tracebacks with runner instability
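
The read-only constraint can be partly enforced with a guard that runs before any command is sent to the runner. This is a minimal sketch, not a security boundary: the function name is_read_only_cmd and the word list are assumptions, and the check only catches mutating verbs that appear as separate space-delimited words.

```shell
# Hypothetical guard: reject commands that contain obviously mutating
# verbs as whole words. Returns 0 when no listed verb is found.
is_read_only_cmd() {
  case " $1 " in
    *" restart "*|*" stop "*|*" start "*|*" rm "*|*" mv "*|*" truncate "*)
      return 1 ;;
    *)
      return 0 ;;
  esac
}
```

A wrapper could then run `is_read_only_cmd "$cmd" && ssh gitea-runner "$cmd"` and ask the user for explicit approval whenever the guard refuses.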

Fast First-Pass Bundle

ssh gitea-runner '
  echo "=== latest runs ===";
  tail -n 10 /opt/gitea-runner/logs/index.tsv 2>/dev/null || true;
  echo "=== latest log ===";
  tail -n 120 /opt/gitea-runner/logs/latest.log 2>/dev/null || true;
  echo "=== runner service ===";
  systemctl status gitea-runner --no-pager | tail -n 40 || true;
  echo "=== runner journal ===";
  journalctl -u gitea-runner -n 80 --no-pager || true
'

Related Files

  • .windsurf/workflows/gitea-runner-ci-debug.md
  • scripts/ci/setup-job-logging.sh