Son of Anton: building an AI agent that job-hunts while I sleep

I read an entire open-source agent system end to end, then rebuilt it for one mission and made it cheaper. A websocket bus, real-browser connectors, skills instead of sub-agents, and a soul that bans em-dashes.

I spent a long time reading the entire architecture of an open-source agent system called OpenClaw, end to end. The gateway, the connectors, the webhooks, the memory, the way its underlying agent runtime actually thinks. And then I did the obvious thing: I rebuilt it myself, pointed it at exactly one mission, and made it run on a fraction of the tokens. I named it Son of Anton.

Here is what it does. It sits on my LinkedIn, X, and Reddit feeds and watches for companies that are hiring. When it spots one, it researches the person who posted, writes them a cold email in my actual voice, and waits for me to tap approve. Then it sends it, tracks the reply, follows up if they ghost me, and can even fill out the real job application in a real browser, 2FA and all. While I sleep.

It is, with no exaggeration, a tiny version of me that does the part of job hunting I hate. This is how it is built, chapter by chapter.

1. The shape of it

Son of Anton is not one program. It is a little society of them talking over a websocket bus, and the whole thing is a TypeScript monorepo.

flowchart TD
    subgraph SRC["my feeds"]
      LI[LinkedIn]
      X[X]
      RD[Reddit]
      GM[Gmail]
      WA[WhatsApp]
    end
    SRC --> CN["connectors<br/>(real browser sessions)"]
    CN -->|websocket| GW["GATEWAY<br/>a bus that never calls an LLM"]
    CRON([crons poll feeds]) -->|tickets| GW
    YOU([you]) -->|approve / revise / drop| GW
    GW -->|one ticket at a time| MA["MAIN AGENT, the brain<br/>PI runtime · fresh session per ticket"]

The cast:

The Gateway is a bus that never thinks. It owns the work queue and routes messages, and it deliberately never calls an LLM.
The Main Agent is the brain. It runs on PI, a minimalist agent runtime, and it is the only thing in the system that actually reasons.
The connectors are the hands, one per platform, and they drive real logged-in browser sessions.
The crons are the heartbeat. They poll my feeds and my inbox on a schedule so the agent has something to react to.
Me. Nothing goes out without my sign-off.

The clever part is not any single piece, it is that they are decoupled. A connector crashing cannot take down the brain. The brain crashing cannot lose a task. And I can bolt on a whole new platform without touching anything else. That separation is the entire reason a system this sprawling stays sane.

2. Everything is a ticket

The single best decision in the whole thing is that there is exactly one unit of work: the ticket. A hiring post spotted is a ticket. An email arriving is a ticket. A follow-up coming due is a ticket. My approval is a ticket. Everything funnels through the same shape.

The Gateway owns those tickets, and it is proudly dumb. It never makes a decision, it just enforces opinions: one ticket in flight at a time, strict FIFO, and anything that needs my approval gets parked off to the side so it never clogs the line.
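
Those three rules fit in very little code. What follows is a minimal sketch, not the real Gateway: the `Ticket` fields and the `TicketQueue` class are names I made up for illustration, and a real version would persist the queue instead of holding it in memory.

```typescript
// Hypothetical ticket shape; field names are illustrative, not from the real codebase.
type TicketKind = "post_spotted" | "email_received" | "followup_due" | "approval";

interface Ticket {
  id: string;
  kind: TicketKind;
  needsApproval: boolean;
}

class TicketQueue {
  private ready: Ticket[] = [];                 // the strict FIFO line
  private parked = new Map<string, Ticket>();   // waiting on a human, off to the side
  private inFlight: Ticket | null = null;       // at most one ticket at a time

  enqueue(ticket: Ticket): void {
    if (ticket.needsApproval) {
      this.parked.set(ticket.id, ticket);
    } else {
      this.ready.push(ticket);
    }
  }

  // Hands out the oldest ready ticket, or null while another one is in flight.
  next(): Ticket | null {
    if (this.inFlight !== null) return null;
    this.inFlight = this.ready.shift() ?? null;
    return this.inFlight;
  }

  complete(id: string): void {
    if (this.inFlight !== null && this.inFlight.id === id) this.inFlight = null;
  }

  // A human approval moves a parked ticket to the back of the line.
  approve(id: string): boolean {
    const ticket = this.parked.get(id);
    if (!ticket) return false;
    this.parked.delete(id);
    this.ready.push({ ...ticket, needsApproval: false });
    return true;
  }
}
```

Note that nothing in there decides anything. A parked ticket has no path into `ready` except `approve`, which is the whole point.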

flowchart TD
    P([a hiring post in my feed]) --> QU["qualify<br/>real job? does it fit me?"]
    QU --> RS["research<br/>who posted it, find an email"]
    RS --> DR["draft<br/>a cold email in my voice, real numbers"]
    DR --> PK{park for me}
    PK -->|approve| SD["send via Gmail<br/>(the right resume attached)"]
    PK -->|revise| DR
    PK -->|drop| ENDN([dropped])
    SD --> FU([follow up in 7 days if they ghost])

This is where the reliability actually lives. Every ticket carries its lineage (this draft came from that research, which came from that post), so I can trace a single Reddit post all the way through to a sent email. If the brain dies mid-task, a heartbeat check notices the silence and the ticket goes back on the queue. And every outbound action is idempotent, which is a fancy way of saying a crash-and-retry can never accidentally send the same email twice. For a system that touches my real inbox, that last property is non-negotiable.
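
The idempotency part is easiest to see as an idempotency key. This is a sketch under assumptions, not the real send path: `IdempotentSender`, the key format, and the in-memory `Set` are all hypothetical, and a real version would record the key in durable storage (the Postgres layer, say) so it survives the crash it is guarding against.

```typescript
// Hypothetical names throughout; the post does not show the real send path.
interface OutboundEmail {
  ticketId: string;
  to: string;
  subject: string;
  body: string;
}

class IdempotentSender {
  private sentKeys = new Set<string>(); // stand-in for a durable table
  private deliver: (email: OutboundEmail) => void;

  constructor(deliver: (email: OutboundEmail) => void) {
    this.deliver = deliver;
  }

  // One key per logical action: same ticket and recipient, same key.
  private keyFor(email: OutboundEmail): string {
    return `${email.ticketId}:${email.to.toLowerCase()}`;
  }

  // Returns true if the email went out now, false if this action already ran.
  send(email: OutboundEmail): boolean {
    const key = this.keyFor(email);
    if (this.sentKeys.has(key)) return false;
    this.sentKeys.add(key); // claim the key before delivering
    try {
      this.deliver(email);
    } catch (err) {
      this.sentKeys.delete(key); // delivery failed, so a retry is allowed
      throw err;
    }
    return true;
  }
}
```

A requeued ticket that reaches `send` a second time gets `false` back and nothing leaves the outbox.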

3. The brain, and why it has skills instead of sub-agents

The Main Agent runs on PI, a tiny open-source agent runtime that happens to be the exact same one OpenClaw uses under the hood. I only know that because I read OpenClaw's source, and finding it felt like discovering the cheat codes. Each ticket gets one fresh PI session: the model sees its tools, calls them, reads the results, loops until it is done, and then the session is thrown away. No state bleeds from one task into the next.

Now here is my favorite place where I split from OpenClaw. OpenClaw spreads work across a bunch of specialist sub-agents: a researcher, a writer, a qualifier, each with its own prompts and plumbing. I deleted all of that and replaced it with skills: plain markdown files, each one a written procedure for a job (how to qualify a post, how to research a person, how to write a cold email).

The move that makes it cheap is progressive disclosure. The system prompt only ever holds a one-line summary of each skill, like a table of contents. The instant the agent actually needs the "write a cold email" procedure, it calls loadSkill and the full 200-line body drops into context, just for that one ticket.

flowchart TD
    SP["system prompt, always loaded, tiny:<br/>one line per skill, ~50 tokens each"] --> NEED{agent needs one}
    NEED -->|"loadSkill(write/cold-email)"| BODY["the FULL 200-line procedure loads,<br/>only for this ticket"]
    NEED -.->|the rest stay closed| REST(["the other 99 skills cost nothing"])

So I can have a hundred skills and the agent only pays tokens for the two or three it touches on a given task. It is basically demand paging, the trick operating systems use to pull a page into memory only when something actually touches it, except applied to an LLM's context window. One fewer layer of scaffolding than OpenClaw, and a much smaller prompt. And because skills are just files in git, the agent can later improve its own, which we will get to.
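
Here is roughly what progressive disclosure looks like in code. Treat it as a sketch: `loadSkill` is the tool name from above, but the `SkillRegistry` class and the `Skill` fields are assumptions, and real skills would be read from markdown files on disk instead of registered in memory.

```typescript
// Hypothetical registry; only the loadSkill name comes from the post.
interface Skill {
  name: string;    // e.g. "write/cold-email"
  summary: string; // the one-liner that always lives in the system prompt
  body: string;    // the full procedure, loaded only on demand
}

class SkillRegistry {
  private skills = new Map<string, Skill>();
  private loaded = new Set<string>();

  register(skill: Skill): void {
    this.skills.set(skill.name, skill);
  }

  // The table of contents: one line per skill, never the bodies.
  systemPromptIndex(): string {
    return Array.from(this.skills.values())
      .map((s) => `- ${s.name}: ${s.summary}`)
      .join("\n");
  }

  // Exposed to the agent as a tool; returns the full body for the current session.
  loadSkill(name: string): string {
    const skill = this.skills.get(name);
    if (!skill) throw new Error(`unknown skill: ${name}`);
    this.loaded.add(name);
    return skill.body;
  }

  // Which skills this session actually paid for.
  loadedNames(): string[] {
    return Array.from(this.loaded);
  }
}
```

Since each ticket gets a fresh session, a fresh registry view per ticket means a loaded body never lingers into the next task.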

4. Spending far fewer tokens than the thing I copied

This was the whole point of the exercise: do OpenClaw's job for a fraction of the cost. A few moves stack up to get there.

First, not every task deserves a smart model. So every call is tagged by its lane: classifying a post and quick utility calls are cheap, writing an email is expensive, composing the final prose is its own thing. A local proxy reads that tag, swaps in the right model, and for the cheap lanes it forcibly switches the model's "thinking" off.

flowchart LR
    C[classify a post] --> PX{{proxy}}
    U[quick utility] --> PX
    W[write the email] --> PX
    F[final prose] --> PX
    PX -->|cheap lanes| CH["cheap model, thinking OFF<br/>884 → 4 tokens on one model"]
    PX -->|writing| GD[a good model, short leash]
    PX -->|compose| CX["Codex, low reasoning effort<br/>~70x fewer think tokens"]
    GATE(["a regex catches obvious hiring posts<br/>before any model runs: 0 tokens"]) -.-> PX

That one flag is brutal in a good way. On one of the models, a throwaway classification call went from 884 reasoning tokens down to 4. That is measured, not vibes. The final-prose lane runs Codex at low reasoning effort, which cut its thinking roughly 70x while the output stayed just as correct.
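
The routing itself is a lookup table. A sketch, with loud assumptions: the lane names mirror the diagram, but the model names are placeholders, the `thinking` and `reasoningEffort` fields are invented for illustration and are not any provider's real request parameters, and I am guessing that the writing lane keeps thinking on.

```typescript
type Lane = "classify" | "utility" | "write" | "compose";

interface ModelRequest {
  lane: Lane;
  prompt: string;
}

// Hypothetical shape of what the proxy forwards upstream.
interface RoutedRequest {
  model: string;
  thinking: boolean;
  reasoningEffort?: "low";
  prompt: string;
}

// Placeholder model names; the post does not say which models sit behind each lane.
const LANE_CONFIG: Record<Lane, Omit<RoutedRequest, "prompt">> = {
  classify: { model: "cheap-model", thinking: false },
  utility: { model: "cheap-model", thinking: false },
  write: { model: "good-model", thinking: true },
  compose: { model: "codex", thinking: true, reasoningEffort: "low" },
};

// The caller only names a lane; the proxy decides the model and the thinking flag.
function route(req: ModelRequest): RoutedRequest {
  return { ...LANE_CONFIG[req.lane], prompt: req.prompt };
}
```

Because the caller cannot set `thinking` itself, a cheap lane cannot accidentally opt back into expensive reasoning.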

Second, the cheapest call is the one you never make. A little regex catches the obvious "we are hiring, send me your resume" posts before any model runs at all. The easy third of my feed costs zero tokens.
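
A gate like that can be this small. The patterns below are examples I wrote for illustration, not the real list, and `preFilter` is a hypothetical name.

```typescript
// Illustrative patterns only; the real gate's regex is not shown in the post.
const HIRING_PATTERNS: RegExp[] = [
  /\bwe(?:'re| are)\s+hiring\b/i,
  /\bsend (?:me|us) your (?:resume|cv)\b/i,
  /\bnow hiring\b/i,
];

type GateResult = "obvious_hiring" | "needs_model";

// Runs before any model call. Anything that does not match falls through to the classifier.
function preFilter(postText: string): GateResult {
  return HIRING_PATTERNS.some((p) => p.test(postText))
    ? "obvious_hiring"
    : "needs_model";
}
```

The gate only ever short-circuits in one direction: a match skips the model, a miss costs nothing extra because the post goes to the classifier as it would have anyway.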

Third, hard limits. Every ticket has a 30,000-token ceiling so a confused agent cannot spiral, and there is a flat $8-a-day budget that quietly parks non-urgent work once it is hit. None of this is exotic. It is just refusing to swing a sledgehammer at every nail, and enforcing that in code instead of hoping for it.
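
Enforcing it in code can look like the sketch below. The two numbers come from above; the `BudgetGuard` class and its method names are hypothetical, and a real version would reset the daily total on a schedule and keep it somewhere durable.

```typescript
const TICKET_TOKEN_CEILING = 30000; // per-ticket cap from the post
const DAILY_BUDGET_USD = 8;         // flat daily budget from the post

// Hypothetical guard; the real accounting is not shown in the post.
class BudgetGuard {
  private spentTodayUsd = 0;
  private tokensByTicket = new Map<string, number>();

  // Call after every model response with what it actually cost.
  record(ticketId: string, tokens: number, costUsd: number): void {
    const soFar = this.tokensByTicket.get(ticketId) ?? 0;
    this.tokensByTicket.set(ticketId, soFar + tokens);
    this.spentTodayUsd += costUsd;
  }

  // True once a ticket has used its whole allowance, so the session should stop.
  ticketOverCeiling(ticketId: string): boolean {
    return (this.tokensByTicket.get(ticketId) ?? 0) >= TICKET_TOKEN_CEILING;
  }

  // Non-urgent work is parked once the daily budget is hit; urgent work still runs.
  shouldPark(urgent: boolean): boolean {
    return !urgent && this.spentTodayUsd >= DAILY_BUDGET_USD;
  }

  resetDay(): void {
    this.spentTodayUsd = 0;
  }
}
```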

5. The hands: real browsers, not nice APIs

An agent that can only think is useless. Son of Anton has to actually act on LinkedIn, X, Reddit, and Gmail, and the inconvenient truth is that most of these no longer hand you a usable API (Reddit straight up killed self-serve API keys in late 2025). So the connectors drive real, logged-in browser sessions instead.

Each platform has a small command-line tool (one for LinkedIn, one for X, one for Reddit) built on Patchright, an undetected fork of Playwright. They run a persistent Chrome profile so every visit looks like the same human coming back, they scroll with humanized, jittered wheel events instead of teleporting down the page, and they bail the instant they hit a checkpoint instead of hammering into a ban. I never hand them a password. I log in by hand once, and the session sticks around.
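
The humanized scrolling is mostly arithmetic: break one long scroll into uneven wheel steps with uneven pauses. This sketch only plans the steps and touches no browser. The function name, the pixel range, and the pause range are numbers I picked for illustration, not the real connector's tuning.

```typescript
interface ScrollStep {
  deltaY: number;  // pixels for one wheel event
  pauseMs: number; // how long to wait after it
}

// Splits a scroll of totalPx into jittered steps. `rand` is injectable so the plan is testable.
function planScroll(totalPx: number, rand: () => number = Math.random): ScrollStep[] {
  const steps: ScrollStep[] = [];
  let remaining = totalPx;
  while (remaining > 0) {
    // Each wheel event moves 80 to 240 px, never more than what is left.
    const deltaY = Math.min(remaining, Math.round(80 + rand() * 160));
    // Pause 120 to 600 ms between events, like a person reading.
    const pauseMs = Math.round(120 + rand() * 480);
    steps.push({ deltaY, pauseMs });
    remaining -= deltaY;
  }
  return steps;
}
```

Each step would then be handed to the browser one at a time. In Playwright that is `page.mouse.wheel(0, step.deltaY)` followed by a wait of `step.pauseMs`, and I am assuming Patchright keeps the same call since it is a fork.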

The flashiest hand is the application filler, which runs on Skyvern, a browser agent that reads a page the way a person does (a screenshot plus the page structure) and fills the form. The genuinely cute part is the 2FA handling:

flowchart TD
    A([approved: apply to this job]) --> SK[Skyvern opens a real browser, fills the form]
    SK --> Q{a question it<br/>cannot answer?}
    Q -->|yes| ANS[agent answers from my profile, hands it back]
    ANS --> SK
    SK --> TF{a 2FA wall?}
    TF -->|yes| OTP["a Gmail watcher reads the code,<br/>relays it, Skyvern types it in"]
    OTP --> SK
    SK --> DONE([submitted · ~$0.03 · ~10 min · no human])

When a careers portal demands an emailed code, a Gmail watcher reads it out of my inbox, drops it on a tiny local relay, and Skyvern types it in. A full application (login, 2FA, a multi-page form) costs about three cents and ten minutes and exactly zero of my attention.
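
The watcher's core job is pulling the code out of the email text. A sketch of that one step, assuming numeric codes of 4 to 8 digits; the keyword list and `extractOtp` are my own illustration, and reading the inbox and relaying the result are left out.

```typescript
// Hypothetical helper: returns the one-time code from an email body, or null if none is found.
function extractOtp(emailBody: string): string | null {
  // Prefer digits that appear shortly after a keyword like "code" or "OTP".
  const near = emailBody.match(/(?:code|otp|passcode|pin)\D{0,40}?(\d{4,8})\b/i);
  if (near) return near[1] ?? null;

  // Fall back to a standalone 6-digit number, the most common format.
  const bare = emailBody.match(/\b(\d{6})\b/);
  return bare ? bare[1] ?? null : null;
}
```

Returning `null` instead of guessing matters here: a wrong code typed into a login form is a good way to trip the checkpoint the connectors work so hard to avoid.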

6. Giving it a memory and a soul

For an agent to feel like a persistent thing instead of a goldfish, it needs memory, and Son of Anton has three layers of it. Plain markdown files for the things a human writes (who I am, what jobs I want, my real achievements). A Postgres database for the things it queries (a little CRM of every person and company it has met, every email sent, every follow-up scheduled). And a semantic memory layer so a lesson learned on one task can resurface on a later one. It even keeps a dated daily log, in its own words, of everything it did that day.

flowchart TD
    L1["layer 1 · identity files, markdown, git-tracked<br/>who I am, my voice, my targets, my real metrics"]
    L2[("layer 2 · Postgres, the queryable brain<br/>people, companies, emails sent, follow-ups due")]
    L3["layer 3 · semantic memory<br/>last time a post looked like this, here is what happened"]
    L1 --> L2 --> L3

The file I love most is SOUL.md. It is the agent's voice constitution: first person, terse, every claim backed by a real number, no hedging, no boasting. And it carries a list of hard bans, which includes, and I promise I am not making this up, no em-dashes, plus a blacklist of AI-slop words like "leverage," "synergy," and "passionate." So my job-hunting agent writes under the exact same rules I am writing this post under right now. Both of us, forbidden from the em-dash. Solidarity.
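
Bans like these are easy to check mechanically before a draft ever reaches me. A sketch, assuming a simple whole-word match: the three words are the ones named above, the real blacklist is longer, and `lintDraft` is a hypothetical name.

```typescript
// The three banned words named in the post; the full list is an assumption away.
const BANNED_WORDS = ["leverage", "synergy", "passionate"];
const EM_DASH = "\u2014"; // written as an escape so this file obeys its own rule

// Returns every rule the draft breaks; an empty array means it is clean.
function lintDraft(draft: string): string[] {
  const violations: string[] = [];
  if (draft.includes(EM_DASH)) violations.push("em-dash");
  for (const word of BANNED_WORDS) {
    // Whole-word, case-insensitive match. Inflected forms like "leveraging" are not caught.
    if (new RegExp(`\\b${word}\\b`, "i").test(draft)) {
      violations.push(`banned word: ${word}`);
    }
  }
  return violations;
}
```

A non-empty result is a natural reason to send the draft back through the writing skill instead of parking it for approval.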

7. Keeping me in the loop, and letting it improve itself

The one rule I never bent: it cannot send anything without me. Drafts get parked, and I approve, revise, or drop them, either from a little web inbox (with keyboard shortcuts, because I am not a savage) or straight from WhatsApp when I am out. That is not a polite convention, it is enforced in the plumbing. A ticket that needs approval physically cannot proceed on its own.

And every time I revise a draft, that is a signal. The system logs what I changed and why, and once a pattern builds up (say I keep softening the same overeager opener) it spins up a ticket to rewrite the underlying skill, proposes the diff, and waits for me to approve that too. So it is quietly running a feedback loop on itself, off my edits. Think of it as a loose, prompt-level cousin of RLHF: only the skill files change, never any model weights. Day 30's skills are sharper than day 1's, tuned by a month of me hitting "revise."
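
The "once a pattern builds up" part can be as plain as a counter with a threshold. This is a sketch with invented details: the `Revision` fields, the threshold of three, and the idea that revisions arrive with a normalized reason label are all assumptions, since the real pattern detection is not described.

```typescript
// Hypothetical record of one human edit to a draft.
interface Revision {
  skill: string;  // which skill produced the draft, e.g. "write/cold-email"
  reason: string; // normalized label for what was changed, e.g. "softened opener"
}

class RevisionTracker {
  private counts = new Map<string, number>();
  private threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // Returns a rewrite proposal once the same (skill, reason) pair has been seen
  // `threshold` times, otherwise null.
  record(rev: Revision): Revision | null {
    const key = `${rev.skill}::${rev.reason}`;
    const seen = (this.counts.get(key) ?? 0) + 1;
    if (seen < this.threshold) {
      this.counts.set(key, seen);
      return null;
    }
    this.counts.set(key, 0); // reset so the pattern does not re-fire on every edit
    return { skill: rev.skill, reason: rev.reason };
  }
}
```

A non-null result would become a new ticket that needs approval, so the skill rewrite goes through the same parked queue as any outgoing email.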

There is a deliberate asymmetry in the safety, too. If the post classifier ever errors out, it keeps the post instead of dropping it, because a missed job lead is gone forever while an extra thing in my review queue costs me five seconds. When the two failure modes are that lopsided, you build the bias right into the architecture.
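
In code, that bias is a single catch block. A minimal sketch: `classifyFailOpen` and the `Verdict` type are hypothetical names, and the classifier is passed in as a plain synchronous function to keep the example short, where the real one would be an async model call.

```typescript
type Verdict = "keep" | "drop";

// Wraps any classifier so that an error keeps the post for human review.
function classifyFailOpen(
  post: string,
  classify: (post: string) => Verdict,
): Verdict {
  try {
    return classify(post);
  } catch {
    // A missed lead is gone for good; an extra review item costs seconds.
    return "keep";
  }
}
```

The wrapper never turns a "drop" into a "keep" on its own. It only changes what happens when there is no verdict at all.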

What I actually got out of it

The thing works. It has sat on my feeds, found real roles, written real cold emails grounded in my real numbers, and sent them the second I tapped approve. It is a genuinely strange feeling to watch a system you built quietly do your networking for you at 2am.

But the job pipeline was never really the prize. The prize was learning that you can take a serious piece of software, read it end to end until you understand why every single piece is shaped the way it is, and then rebuild it leaner for your exact problem. OpenClaw taught me the architecture. Son of Anton is me proving I actually understood it. 🤝