docs: improve gitea-runner CI debug workflow with ripgrep and failure markers

- Added ripgrep (rg) usage notes and preference over grep for targeted searches
- Updated log discovery examples to use rg with --fixed-strings for workflow/job name searches
- Added failure-marker search pattern (|Traceback|FAILED|etc.) for quick issue identification
- Replaced grep with rg in runner health checks (dmesg, journalctl)
- Added failure marker search to quick-start one-liner
- Preserved awk usage for tab
aitbc
2026-04-20 13:21:19 +02:00
parent 3d4300924e
commit 92f175c54f
2 changed files with 17 additions and 5 deletions

.gitignore

@@ -46,6 +46,7 @@ htmlcov/
 *.db-shm
 data/
 apps/blockchain-node/data/
+!apps/coordinator-api/src/app/data/
 # ===================
 # Runtime Directories (System Standard)


@@ -20,6 +20,7 @@ Use this workflow when a Gitea Actions job fails and you need Windsurf to:
 - Prefer `GITHUB_RUN_ID` and `GITHUB_RUN_NUMBER`, not `GITEA_RUN_ID`
 - Internal runner `task <id>` messages in `journalctl` are useful for runner debugging, but are not stable workflow-facing identifiers
 - CI job logs created by the reusable logging wrapper live under `/opt/gitea-runner/logs`
+- `rg` is installed on `gitea-runner`; prefer it over `grep` for targeted log discovery and failure-marker searches
 ## Safety Rules
 - Start with read-only inspection only
@@ -58,10 +59,12 @@ If you know the workflow or job name, start there.
 ```bash
 ssh gitea-runner 'ls -lah /opt/gitea-runner/logs'
 ssh gitea-runner 'tail -n 20 /opt/gitea-runner/logs/index.tsv'
+ssh gitea-runner 'rg -n --fixed-strings "Production Tests" /opt/gitea-runner/logs/index.tsv | tail -n 20'
+ssh gitea-runner 'rg -n --fixed-strings "test-production" /opt/gitea-runner/logs/index.tsv | tail -n 20'
 ssh gitea-runner 'tail -n 200 /opt/gitea-runner/logs/latest.log'
 ```
-If you know the run id:
+If you know the run id, keep using `awk` because `index.tsv` is tab-separated and you want an exact column match:
 ```bash
 ssh gitea-runner "awk -F '\t' '\$2 == \"1787\" {print}' /opt/gitea-runner/logs/index.tsv"
@@ -70,8 +73,8 @@ ssh gitea-runner "awk -F '\t' '\$2 == \"1787\" {print}' /opt/gitea-runner/logs/i
 If you know the workflow/job name:
 ```bash
-ssh gitea-runner 'grep -i "staking tests" /opt/gitea-runner/logs/index.tsv | tail -n 20'
-ssh gitea-runner 'grep -i "test-staking-service" /opt/gitea-runner/logs/index.tsv | tail -n 20'
+ssh gitea-runner 'rg -n -i --fixed-strings "staking tests" /opt/gitea-runner/logs/index.tsv | tail -n 20'
+ssh gitea-runner 'rg -n -i --fixed-strings "test-staking-service" /opt/gitea-runner/logs/index.tsv | tail -n 20'
 ```
 ### Step 3: Read the Most Relevant Job Log
@@ -87,6 +90,12 @@ If `latest.log` already matches the failing run:
 ssh gitea-runner 'tail -n 200 /opt/gitea-runner/logs/latest.log'
 ```
+For a fast failure-marker pass inside a resolved log file:
+```bash
+ssh gitea-runner 'rg -n "❌|Traceback|FAILED|FAILURES|ModuleNotFoundError|AssertionError|not ready|oom|Killed" /opt/gitea-runner/logs/<resolved-log-file>.log'
+```
 ### Step 4: Correlate With Runner Health
 Only do this after reading the job log, so you do not confuse test failures with runner failures.
@@ -101,8 +110,8 @@ Use these when the log suggests abrupt termination, hanging setup, missing conta
 ```bash
 ssh gitea-runner 'free -h; df -h /opt /var /tmp'
-ssh gitea-runner 'dmesg -T | grep -i -E "oom|out of memory|killed process" | tail -n 50'
-ssh gitea-runner 'journalctl -u gitea-runner --since "2 hours ago" --no-pager | grep -i -E "oom|killed|failed|panic|error"'
+ssh gitea-runner 'dmesg -T | rg -i "oom|out of memory|killed process" | tail -n 50'
+ssh gitea-runner 'journalctl -u gitea-runner --since "2 hours ago" --no-pager | rg -i "oom|killed|failed|panic|error"'
 ```
 ### Step 6: Classify the Failure
@@ -208,6 +217,8 @@ ssh gitea-runner '
 tail -n 10 /opt/gitea-runner/logs/index.tsv 2>/dev/null || true;
 echo "=== latest job log ===";
 tail -n 120 /opt/gitea-runner/logs/latest.log 2>/dev/null || true;
+echo "=== latest job markers ===";
+rg -n "❌|Traceback|FAILED|FAILURES|ModuleNotFoundError|AssertionError|not ready|oom|Killed" /opt/gitea-runner/logs/latest.log 2>/dev/null | tail -n 40 || true;
 echo "=== runner journal ===";
 journalctl -u gitea-runner -n 80 --no-pager || true
 '
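
The doc change keeps `awk` for run-id lookups and moves free-text searches to `rg`. A minimal sketch of that split as reusable shell helpers, assuming only what the diff shows (column 2 of `index.tsv` holds the run id). The function names `find_run`, `find_name`, and `scan_markers` are hypothetical, and the `grep` fallback is an addition for hosts without `rg`, not part of the commit:

```shell
# Hypothetical helpers; names and the grep fallback are illustrative only.

# Print rows whose second tab-separated column equals the run id exactly.
# Appending "" to id forces a string comparison, as in the quoted "1787" form.
find_run() {
  local run_id="$1" index_file="$2"
  awk -F '\t' -v id="$run_id" '$2 == (id "")' "$index_file"
}

# Print numbered lines containing a literal string, case-insensitively.
# Uses rg when installed, otherwise falls back to grep -F.
find_name() {
  local needle="$1" index_file="$2"
  if command -v rg >/dev/null 2>&1; then
    rg -n -i --fixed-strings -- "$needle" "$index_file"
  else
    grep -n -i -F -- "$needle" "$index_file"
  fi
}

# Print numbered lines matching the failure-marker alternation from the doc.
scan_markers() {
  local log_file="$1"
  local pattern='❌|Traceback|FAILED|FAILURES|ModuleNotFoundError|AssertionError|not ready|oom|Killed'
  if command -v rg >/dev/null 2>&1; then
    rg -n -- "$pattern" "$log_file"
  else
    grep -n -E -- "$pattern" "$log_file"
  fi
}
```

For example, `find_run 1787 /opt/gitea-runner/logs/index.tsv` would print only rows whose run id is exactly `1787`, whereas a substring search for `1787` would also match a run id such as `17870`.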