Building an Autonomous Coding Agent with OpenClaw: PR Reviews, Tests, Bug Fixes
Why a Dedicated Coding Agent?
When you start working with AI agents, the first temptation is to let the team lead agent do everything. Sam writes the blog, reviews code, replies to emails, and manages the CI/CD pipeline on the side.
That works — until context explodes. An agent thinking simultaneously about marketing strategy and TypeScript race conditions does both things mediocrely.
The solution: specialization. Peter is our coding agent. He does nothing but code work. No blog, no emails, no calendar. Just:
- Pull request reviews
- Test generation
- Bug diagnosis from logs
- Dependency checks
That sounds like a narrow scope — and that's exactly the point. A narrow scope means deep contextual knowledge and consistent quality.
---
The Skill Setup: coding-agent
Peter runs with the `coding-agent` skill that ships with OpenClaw. Among other things, this skill gives the agent:
- Shell access to run code and tests in the workspace
- Git operations (fetch, diff, blame, log) for analyzing changes
Installation and activation:
```bash
# Skill is already included in OpenClaw
ls ~/.npm-global/lib/node_modules/openclaw/skills/coding-agent/
# Activate in the coding agent's workspace:
# In Peter's SOUL.md / AGENTS.md:
# "Skill: coding-agent is active. Use it for all code operations."
# Alternatively: specify explicitly in OpenClaw config
openclaw config set skills.enabled "coding-agent"
```
The `coding-agent` skill differs from other skills in that it can actively execute code — not just read it. That makes it powerful, but also security-relevant. More on that below.
---
Peter's SOUL.md: Precision Above All
Peter's SOUL.md is the shortest of our six agents' files — but the most concrete. Every line is operational:
```markdown
# SOUL.md — Peter (Coding Agent)

## Identity
You are Peter, coding specialist at Humanizing Technologies.
Your only job: code quality. No marketing, no emails, no CEO support.
You are not a co-pilot — you are an independent reviewer with your own judgment.

## Communication
Short and technical. No "I would suggest..." — just "Problem: X. Fix: Y."
Findings as a list, prioritized by severity: CRITICAL → HIGH → MEDIUM → LOW.
Code examples always in code blocks, with language tag.

## Dev Rules (absolute, no exceptions)
bun, never npm. No console.log in committed code. No `any` types.

## What You CAN Do
Review PRs, write and run tests, analyze logs, suggest fixes.

## What You MUST NOT Do (absolute)
git push — never push directly. No merges, no branch deletion.
```
The last section — "What You MUST NOT Do" — is critical. Without these boundaries, Peter could accidentally push directly to main. That's not a hypothetical scenario.
---
GitHub Integration: Triggering PR Reviews Automatically
The most interesting use case: Peter reviews pull requests as soon as they're created — without any manual trigger.
Approach 1: Cron-based polling (simple)
Peter's cron job checks every 15 minutes for new PRs on GitHub:
```
Schedule: */15 * * * *
Prompt:
Check for new pull requests in the repository humanizing/humanizing-agents-monorepo
created in the last 15 minutes that don't yet have a review from Peter.
For each such PR:
1. Fetch the diff: git fetch origin pull/<PR-ID>/head:pr-<PR-ID>
2. Analyze the code for:
- Security issues (SQL injection, XSS, unsafe dependencies)
- Missing error handling
- TypeScript type errors or 'any' usage
- Missing or insufficient tests
- Violations of our dev rules (npm instead of bun, console.log, etc.)
3. Post the review as a GitHub comment with prioritization (CRITICAL/HIGH/MEDIUM/LOW)
4. If CRITICAL issues: Request Changes. Otherwise: Comment.
5. Write the review result to status/pr-<PR-ID>.md
If no new PRs: HEARTBEAT_OK
```
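The check behind step 1 can be sketched in code. This is a hedged sketch, not part of OpenClaw: it uses the public GitHub REST API (`/pulls` and `/pulls/{n}/reviews`), assumes Node 18+ for the global `fetch`, and assumes the bot's GitHub login is `Peter-CodingAgent` (the same identity name used in the Git hook later in this post). The pure helpers are separated from the API calls so they are easy to test.

```javascript
const REPO = 'humanizing/humanizing-agents-monorepo';

// Keep only PRs created after the cutoff timestamp (ms since epoch).
function filterFresh(prs, cutoffMs) {
  return prs.filter((pr) => new Date(pr.created_at).getTime() >= cutoffMs);
}

// A PR still needs a review if none of its existing reviews is from the bot.
function needsReview(reviews, botLogin = 'Peter-CodingAgent') {
  return !reviews.some((r) => r.user.login === botLogin);
}

// Glue: fetch open PRs, filter to the last N minutes, drop already-reviewed ones.
async function newUnreviewedPRs(token, sinceMinutes = 15) {
  const headers = {
    Authorization: `Bearer ${token}`,
    Accept: 'application/vnd.github+json',
  };
  const prs = await (
    await fetch(`https://api.github.com/repos/${REPO}/pulls?state=open`, { headers })
  ).json();
  const hits = [];
  for (const pr of filterFresh(prs, Date.now() - sinceMinutes * 60_000)) {
    const reviews = await (
      await fetch(`https://api.github.com/repos/${REPO}/pulls/${pr.number}/reviews`, { headers })
    ).json();
    if (needsReview(reviews)) hits.push(pr.number);
  }
  return hits;
}
```

In practice the agent does this reasoning itself from the prompt; the sketch just shows that the "new and unreviewed" condition is two cheap API calls per PR.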
Approach 2: Webhook-based (reactive, more precise)
For faster response we use GitHub webhooks. A webhook fires on PR events and sends a message to Peter's Discord channel:
```bash
# GitHub Repository → Settings → Webhooks → Add webhook
# Payload URL: https://your-webhook-relay.example.com/github
# Content type: application/json
# Events: Pull requests
```
The webhook relay (a simple Express server) transforms the GitHub event into a Discord message:
```javascript
// webhook-relay/index.js
const express = require('express');

const app = express();
app.use(express.json()); // GitHub sends application/json payloads

app.post('/github', (req, res) => {
  const event = req.headers['x-github-event'];
  const payload = req.body;

  if (event === 'pull_request' && payload.action === 'opened') {
    const pr = payload.pull_request;
    // Discord message to Peter's channel (sendDiscord is our own helper)
    sendDiscord({
      channel: 'dev-code-review',
      content: `@Peter New PR: "${pr.title}" from ${pr.user.login}
PR #${pr.number}: ${pr.html_url}
Branch: ${pr.head.ref} → ${pr.base.ref}
Please review.`
    });
  }
  res.status(200).send('ok');
});

app.listen(3000);
```
Peter sees the Discord message, reads the PR, and reviews it.
Security note: The webhook relay needs a public URL — or an internal relay via Tailscale. We run ours on a dedicated Hetzner Nano server (€2/month) behind nginx.
---
Test Generation: The Use Case That Saves the Most Time
The most useful workflow for us isn't the PR review — it's automatic test generation.
In our monorepo, tests are frequently missing for new utility functions. Peter's job: if a PR contains new files without corresponding test files, he writes the tests.
```
Prompt (as extension of the PR review prompt):
If the PR contains new TypeScript files that don't have a test file yet
(convention: *.test.ts next to the source file):
1. Analyze the exported functions/classes
2. Write Bun tests for the most important paths:
- Happy path (normal input, expected output)
- Edge cases (empty input, null/undefined, boundary values)
- Error scenarios (invalid input, missing dependencies)
3. Create the test file directly in the source file's directory
4. Run the tests: bun test <path>
5. If tests fail: analyze the error and fix the tests
6. Post the test file as a suggestion in the PR comment
```
The result: PR authors get not only code feedback but also a test suggestion. In practice, ~70% of these tests are adopted unchanged.
Time saved per PR: ~45 minutes of developer time (writing tests + iterating).
---
Bug Diagnosis: Log Analysis on Demand
Another Peter workflow: bug reports from logs.
When a staging server error occurs, Dimitrios sends a message to Peter's Discord channel:
```
Dimitrios: "@Peter Error on staging since 14:30: [paste stacktrace]"
```
Peter:
```
1. Analyze the stacktrace: which file, which line?
2. git blame the affected line: who changed this last?
3. git log -20 --oneline: which PRs were merged today?
4. Correlate: which PR might have caused the bug?
5. Form hypothesis: "Likely cause: PR #182 (merged 14:12)
removed error handling from getUserById."
6. Fix suggestion as a code snippet
7. Create ClickUp task: "Bug TC-XX: [description], caused by PR #182"
```
In practice this takes 3-8 minutes instead of 20-40 minutes manually.
---
Security: What Peter Can Actually Do
This is the most critical aspect of a coding agent. Peter has shell access and can execute code — that's by design. But without boundaries, that's dangerous.
Container Isolation as a Hard Boundary
Peter runs in a Docker container. His volume mounts only the project directory:
```yaml
# docker-compose.yml
services:
  peter:
    image: openclaw/openclaw:latest
    volumes:
      - ./workspaces/peter:/workspace:rw
      - /home/sam/projects/humanizing-agents-monorepo:/project:rw
      # No access to other projects, no /etc, /home, etc.
    environment:
      - GITHUB_TOKEN=${GITHUB_TOKEN_PETER}
      # Peter has his own, restricted GitHub token (read + PR comment only)
```
Peter's GitHub token has only these permissions:
- Contents: read-only
- Pull requests: read & write (for review comments)

He cannot push, cannot merge, cannot delete branches. Even if someone manipulates Peter's prompt — the token doesn't have those permissions.
The "No Push" Principle
Peter's SOUL.md says "git push — never push directly." But instructions can be circumvented.
Our technical safeguard: Git hooks in the project repository:
```bash
#!/bin/bash
# .git/hooks/pre-push
# Block pushes made under Peter's git identity
# (git doesn't set GIT_AUTHOR_NAME during a push, so check the configured user)
if [ "$(git config user.name)" = "Peter-CodingAgent" ]; then
  echo "BLOCKED: Coding agent is not allowed to push directly"
  exit 1
fi
exit 0
```
Two layers of protection: instruction (SOUL.md) + technical block (Git hook). Both have to fail simultaneously for a direct push to happen.
---
Peter's Daily Workflow in Practice
To illustrate: what Peter actually does in a typical day for us.
9:00 AM — first cron check:
Three open PRs since yesterday evening. Peter reviews all three. PR #201: no CRITICAL, two HIGH (missing input validation). PR #202: CRITICAL (SQL string concatenation without escaping). PR #203: tests missing — Peter writes them.
11:00 AM — message from Dimitrios:
"Peter, staging is throwing 500s since the deploy." Peter analyzes the logs, identifies the commit, posts a fix suggestion within 5 minutes.
2:00 PM — dependency cron:
Peter runs `bun outdated`, compares against changelogs, identifies a breaking change in the next major version of a core library. Creates a ClickUp task with migration notes.
5:00 PM — last PR of the day:
A junior developer used three `any` types in TypeScript. Peter comments with an explained type fix suggestion.
Total human code review time for the team that day: ~30 minutes (for final decisions only). Peter's contribution: ~3 hours equivalent review work.
---
What Doesn't Work (and Why)
Honesty: a coding agent isn't a developer replacement.
Peter is bad at:
- Architecture decisions and design tradeoffs
- Tasks with ambiguous requirements or missing context
- Judging the product impact of a change

Peter is good at:
- Pattern-based review (security issues, missing error handling, type problems)
- Test generation for well-defined functions
- Log analysis and git correlation
- Dependency checks
The rule: the more structured the task, the better Peter performs. The more context and judgment required, the more a human needs to be involved.
---
The Complete Setup
The complete configuration for Peter — Docker Compose, SOUL.md, GitHub token setup with minimal permissions, the exact cron job prompts, and the Git hook safeguard — is documented in the OpenClaw Setup Playbook.
The playbook doesn't just show the setup — it shows the iteration: how we refined Peter's prompts over weeks, which review comments were too vague, and how we improved them.
18 chapters, based on real production experience — available in English and German. 🇩🇪
Want to learn more?
Get the Playbook