Building an Autonomous Coding Agent with OpenClaw: PR Reviews, Tests, Bug Fixes
Why a Dedicated Coding Agent?
When you start working with AI agents, the first temptation is to let the team lead agent do everything. Sam writes the blog, reviews code, replies to emails, and manages the CI/CD pipeline on the side.
That works — until context explodes. An agent thinking simultaneously about marketing strategy and TypeScript race conditions does both things mediocrely.
The solution: specialization. Peter is our coding agent. He does nothing but code work. No blog, no emails, no calendar. Just:
- Pull request reviews
- Test generation
- Bug diagnosis from logs
- Dependency checks
That sounds like a narrow scope — and that's exactly the point. A narrow scope means deep contextual knowledge and consistent quality.
---
The Skill Setup: coding-agent
Peter runs with the `coding-agent` skill that ships with OpenClaw. Among other things, this skill gives the agent:
- Shell access to run code and tests in the workspace
- Git operations (fetch, diff, blame, log) for analyzing changes
Installation and activation:
```bash
# Skill is already included in OpenClaw
ls ~/.npm-global/lib/node_modules/openclaw/skills/coding-agent/
# Activate in the coding agent's workspace:
# In Peter's SOUL.md / AGENTS.md:
# "Skill: coding-agent is active. Use it for all code operations."
# Alternatively: specify explicitly in OpenClaw config
openclaw config set skills.enabled "coding-agent"
```
The `coding-agent` skill differs from other skills in that it can actively execute code — not just read it. That makes it powerful, but also security-relevant. More on that below.
---
Peter's SOUL.md: Precision Above All
Peter's SOUL.md is the shortest of our six agents' files — but the most concrete. Every line is operational:
```markdown
# SOUL.md — Peter (Coding Agent)

## Identity
You are Peter, coding specialist at Humanizing Technologies.
Your only job: code quality. No marketing, no emails, no CEO support.
You are not a co-pilot — you are an independent reviewer with your own judgment.

## Communication
Short and technical. No "I would suggest..." — just "Problem: X. Fix: Y."
Findings as a list, prioritized by severity: CRITICAL → HIGH → MEDIUM → LOW.
Code examples always in code blocks, with language tag.

## Dev Rules (absolute, no exceptions)
bun, never npm. No console.log in committed code. No `any` types.

## What You CAN Do
Review PRs, write and run tests, analyze logs, suggest fixes.

## What You MUST NOT Do (absolute)
git push — never push directly. No merges, no branch deletion.
```
The last section — "What You MUST NOT Do" — is critical. Without these boundaries, Peter could accidentally push directly to main. That's not a hypothetical scenario.
---
GitHub Integration: Triggering PR Reviews Automatically
The most interesting use case: Peter reviews pull requests as soon as they're created — without any manual trigger.
Approach 1: Cron-based polling (simple)
Peter's cron job checks every 15 minutes for new PRs on GitHub:
```
Schedule: */15 * * * *
Prompt:
Check for new pull requests in the repository humanizing/humanizing-agents-monorepo
created in the last 15 minutes that don't yet have a review from Peter.
For each such PR:
1. Fetch the diff: git fetch origin pull/<PR-ID>/head:pr-<PR-ID>
2. Analyze the code for:
- Security issues (SQL injection, XSS, unsafe dependencies)
- Missing error handling
- TypeScript type errors or 'any' usage
- Missing or insufficient tests
- Violations of our dev rules (npm instead of bun, console.log, etc.)
3. Post the review as a GitHub comment with prioritization (CRITICAL/HIGH/MEDIUM/LOW)
4. If CRITICAL issues: Request Changes. Otherwise: Comment.
5. Write the review result to status/pr-<PR-ID>.md
If no new PRs: HEARTBEAT_OK
```
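The check behind step 1 can be sketched in code. This is a hedged sketch, not part of OpenClaw: it uses the public GitHub REST API (`/pulls` and `/pulls/{n}/reviews`), assumes Node 18+ for the global `fetch`, and assumes the bot's GitHub login is `Peter-CodingAgent` (the same identity name used in the Git hook later in this post). The pure helpers are separated from the API calls so they are easy to test.

```javascript
const REPO = 'humanizing/humanizing-agents-monorepo';

// Keep only PRs created after the cutoff timestamp (ms since epoch).
function filterFresh(prs, cutoffMs) {
  return prs.filter((pr) => new Date(pr.created_at).getTime() >= cutoffMs);
}

// A PR still needs a review if none of its existing reviews is from the bot.
function needsReview(reviews, botLogin = 'Peter-CodingAgent') {
  return !reviews.some((r) => r.user.login === botLogin);
}

// Glue: fetch open PRs, filter to the last N minutes, drop already-reviewed ones.
async function newUnreviewedPRs(token, sinceMinutes = 15) {
  const headers = {
    Authorization: `Bearer ${token}`,
    Accept: 'application/vnd.github+json',
  };
  const prs = await (
    await fetch(`https://api.github.com/repos/${REPO}/pulls?state=open`, { headers })
  ).json();
  const hits = [];
  for (const pr of filterFresh(prs, Date.now() - sinceMinutes * 60_000)) {
    const reviews = await (
      await fetch(`https://api.github.com/repos/${REPO}/pulls/${pr.number}/reviews`, { headers })
    ).json();
    if (needsReview(reviews)) hits.push(pr.number);
  }
  return hits;
}
```

In practice the agent does this reasoning itself from the prompt; the sketch just shows that the "new and unreviewed" condition is two cheap API calls per PR.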
Approach 2: Webhook-based (reactive, more precise)
For faster response we use GitHub webhooks. A webhook fires on PR events and sends a message to Peter's Discord channel:
```bash
# GitHub Repository → Settings → Webhooks → Add webhook
# Payload URL: https://your-webhook-relay.example.com/github
# Content type: application/json
# Events: Pull requests
```
The webhook relay (a simple Express server) transforms the GitHub event into a Discord message:
```javascript
// webhook-relay/index.js
const express = require('express');

const app = express();
app.use(express.json()); // GitHub sends application/json payloads

app.post('/github', (req, res) => {
  const event = req.headers['x-github-event'];
  const payload = req.body;

  if (event === 'pull_request' && payload.action === 'opened') {
    const pr = payload.pull_request;
    // Discord message to Peter's channel (sendDiscord is our own helper)
    sendDiscord({
      channel: 'dev-code-review',
      content: `@Peter New PR: "${pr.title}" from ${pr.user.login}
PR #${pr.number}: ${pr.html_url}
Branch: ${pr.head.ref} → ${pr.base.ref}
Please review.`
    });
  }
  res.status(200).send('ok');
});

app.listen(3000);
```
Peter sees the Discord message, reads the PR, and reviews it.
Security note: The webhook relay needs a public URL — or an internal relay via Tailscale. We run ours on a dedicated Hetzner Nano server (€2/month) behind nginx.
---
Test Generation: The Use Case That Saves the Most Time
The most useful workflow for us isn't the PR review — it's automatic test generation.
In our monorepo, tests are frequently missing for new utility functions. Peter's job: if a PR contains new files without corresponding test files, he writes the tests.
```
Prompt (as extension of the PR review prompt):
If the PR contains new TypeScript files that don't have a test file yet
(convention: *.test.ts next to the source file):
1. Analyze the exported functions/classes
2. Write Bun tests for the most important paths:
- Happy path (normal input, expected output)
- Edge cases (empty input, null/undefined, boundary values)
- Error scenarios (invalid input, missing dependencies)
3. Create the test file directly in the source file's directory
4. Run the tests: bun test <path>
5. If tests fail: analyze the error and fix the tests
6. Post the test file as a suggestion in the PR comment
```
The result: PR authors get not only code feedback but also a test suggestion. In practice, ~70% of these tests are adopted unchanged.
Time saved per PR: ~45 minutes of developer time (writing tests + iterating).
---
Bug Diagnosis: Log Analysis on Demand
Another Peter workflow: bug reports from logs.
When a staging server error occurs, Dimitrios sends a message to Peter's Discord channel:
```
Dimitrios: "@Peter Error on staging since 14:30: [paste stacktrace]"
```
Peter:
```
1. Analyze the stacktrace: which file, which line?
2. git blame the affected line: who changed this last?
3. git log -20 --oneline: which PRs were merged today?
4. Correlate: which PR might have caused the bug?
5. Form hypothesis: "Likely cause: PR #182 (merged 14:12)
removed error handling from getUserById."
6. Fix suggestion as a code snippet
7. Create ClickUp task: "Bug TC-XX: [description], caused by PR #182"
```
In practice this takes 3-8 minutes instead of 20-40 minutes manually.
---
Security: What Peter Can Actually Do
This is the most critical aspect of a coding agent. Peter has shell access and can execute code — that's by design. But without boundaries, that's dangerous.
Container Isolation as a Hard Boundary
Peter runs in a Docker container. His volume mounts only the project directory:
```yaml
# docker-compose.yml
services:
  peter:
    image: openclaw/openclaw:latest
    volumes:
      - ./workspaces/peter:/workspace:rw
      - /home/sam/projects/humanizing-agents-monorepo:/project:rw
      # No access to other projects, no /etc, /home, etc.
    environment:
      - GITHUB_TOKEN=${GITHUB_TOKEN_PETER}
      # Peter has his own, restricted GitHub token (read + PR comment only)
```
Peter's GitHub token has only these permissions:
- Contents: read-only
- Pull requests: read & write (for review comments)

He cannot push, cannot merge, cannot delete branches. Even if someone manipulates Peter's prompt — the token doesn't have those permissions.
The "No Push" Principle
Peter's SOUL.md says "git push — never push directly." But instructions can be circumvented.
Our technical safeguard: Git hooks in the project repository:
```bash
#!/bin/bash
# .git/hooks/pre-push
# Block pushes made under Peter's git identity
# (git doesn't set GIT_AUTHOR_NAME during a push, so check the configured user)
if [ "$(git config user.name)" = "Peter-CodingAgent" ]; then
  echo "BLOCKED: Coding agent is not allowed to push directly"
  exit 1
fi
exit 0
```
Two layers of protection: instruction (SOUL.md) + technical block (Git hook). Both have to fail simultaneously for a direct push to happen.
---
Peter's Daily Workflow in Practice
To illustrate: what Peter actually does in a typical day for us.
9:00 AM — first cron check:
Three open PRs since yesterday evening. Peter reviews all three. PR #201: no CRITICAL, two HIGH (missing input validation). PR #202: CRITICAL (SQL string concatenation without escaping). PR #203: tests missing — Peter writes them.
11:00 AM — message from Dimitrios:
"Peter, staging is throwing 500s since the deploy." Peter analyzes the logs, identifies the commit, posts a fix suggestion within 5 minutes.
2:00 PM — dependency cron:
Peter runs `bun outdated`, compares against changelogs, identifies a breaking change in the next major version of a core library. Creates a ClickUp task with migration notes.
5:00 PM — last PR of the day:
A junior developer used three `any` types in TypeScript. Peter comments with an explained type fix suggestion.
Total human code review time for the team that day: ~30 minutes (for final decisions only). Peter's contribution: ~3 hours equivalent review work.
---
What Doesn't Work (and Why)
Honesty: a coding agent isn't a developer replacement.
Peter is bad at:
- Architecture decisions and design tradeoffs
- Tasks with ambiguous requirements or missing context
- Judging the product impact of a change

Peter is good at:
- Pattern-based review (security issues, missing error handling, type problems)
- Test generation for well-defined functions
- Log analysis and git correlation
- Dependency checks
The rule: the more structured the task, the better Peter performs. The more context and judgment required, the more a human needs to be involved.
---
The Complete Setup
The complete configuration for Peter — Docker Compose, SOUL.md, GitHub token setup with minimal permissions, the exact cron job prompts, and the Git hook safeguard — is documented in the OpenClaw Setup Playbook.
The playbook doesn't just show the setup — it shows the iteration: how we refined Peter's prompts over weeks, which review comments were too vague, and how we improved them.
18 chapters, based on real production experience — available in English and German. 🇩🇪
Want to learn more?
Get the Playbook