CxOPack
KIT · 02 / 05

CTO Kit

Ship more, rewrite less. Decisions documented. Blast radius scoped.

The CTO Kit is a senior-staff-engineer co-founder that documents your decisions, scopes your MVP, reviews your PRs, and refuses to let you add auth, admin, or analytics before signal. Install once, then let it push back when you drift.

#1. Review, plan, brainstorm

A CTO Kit is only useful if you can articulate what you're building, how, and where the cracks already are. Write these out first.

Answer these first

  1. In one sentence: what's the hypothesis your MVP is testing? (If >1 sentence, the hypothesis is too big.)
  2. Name the 3 biggest architectural decisions you've made in the last 6 months. Are any of them documented? Anywhere?
  3. Last production incident: what broke, how did you find out, how long was recovery, what changed after?
  4. What's one piece of tech debt you feel at least once a week but haven't fixed? Why haven't you?

Brainstorm prompts

  • If you got hit by a bus tomorrow, what's the single hardest thing for a new dev to rebuild? That's your highest-leverage ADR.
  • What have you built that a 3-person team would have bought instead? Could you buy it now?
  • Which part of your codebase do you avoid on Fridays? That's the triage list.

#2. Connect MCPs and tools

Install these MCP servers (or their equivalent integrations) to make the CTO Kit 10× more useful. Not all are required — pick the top 1–2 to start.

GitHub

The code-review subagent reads the diff directly; the adr skill writes ADRs to docs/adr/ in your repo.

Linear

mvp-scope pulls your feature list and writes back the cut version tagged 'not in MVP'.

Sentry

tech-debt-triage pulls top error frequencies + user counts to score what actually hurts.

Vercel

Deploy + rollback context for incident post-mortems. Links errors to specific deployments.

PostgreSQL (your DB)

build-vs-buy queries your database directly, checking whether existing tables can already answer the question before you build a new feature.

Don't have MCP set up yet?
MCP is the Model Context Protocol — Anthropic's standard for letting AI tools read from your software. Install guides: Claude Code MCP setup. ChatGPT uses Actions (OpenAPI) instead of MCP — same capabilities, different protocol.

#3. Define your cadence

The CTO Kit works because you run it consistently. Block the weekly ritual in your calendar now — even 15 minutes, non-negotiable, beats 2 hours every 3 months.

Daily

  • Before any merge · 2 min · code-review
    Run code-review subagent on the diff. Only merge if Must-fix list is empty.

Weekly

  • Friday afternoon · 20 min · tech-debt-triage
    Tech debt triage. Pick 1 item to fix next week. Close the rest.

Monthly

  • First Monday · 30 min
    Scan your 4 most-used third-party tools. Any worth replacing? Any consolidating?

Ad-hoc

  • Before any decision that is hard to reverse · 15 min · adr
    Write an ADR. The skill forces one negative consequence + an observable revisit trigger.
  • When scoping a new feature or MVP · 30 min · mvp-scope
    RICE-score everything, cut to 5, produce the 'explicitly not in MVP' list.

#4. Use the skills

Every asset in the CTO Kit — with when to trigger it, the exact step-by-step, an example in/out, and the common pitfalls. Read them once; refer back as you run the cadence.

adr · Skill
Trigger
"write an ADR" / "document this decision"
When to use
Any decision that (a) is hard to reverse, (b) you'd need to explain to a new dev in 6 months, or (c) involves paid dependency or platform lock-in.
Step-by-step
  1. Give the skill the title, context, and at least 2 alternatives you considered.
  2. It writes docs/adr/NNN-title.md (auto-incremented number) in the Michael Nygard format.
  3. Sections: Status, Date, Context, Decision, Consequences (+/−/neutral), Alternatives, Revisit trigger.
  4. It forces at least one negative consequence — if you can't name one, you haven't thought hard enough.
Example input
"We're picking Postgres as primary store. Alternatives: DynamoDB (we tried), Planetscale (expensive)."
Example output
docs/adr/0014-use-postgres-as-primary-store.md with explicit negative consequence ('JSON column queries will be slower than DynamoDB at >1M rows') and revisit trigger ('if p99 > 200ms for 3 consecutive days').
Pitfalls
  • Revisit triggers must be observable. 'If performance becomes an issue' → the skill will ask for a specific threshold.
  • Don't write ADRs for reversible choices. Use a simple decision log entry instead.
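The auto-incremented filename from step 2 can be sketched in a few lines of Python. This is an illustrative assumption about how the skill might derive the next number and slug, not its actual implementation (the 4-digit padding matches the 0014 example above, even though the step-by-step writes it as NNN):

```python
import re
from pathlib import Path

def next_adr_path(adr_dir: str, title: str) -> Path:
    """Hypothetical helper: next ADR number in the directory, plus a slugged title."""
    d = Path(adr_dir)
    d.mkdir(parents=True, exist_ok=True)
    # Collect existing leading numbers like "0013-..." and take the max.
    nums = [int(m.group(1)) for p in d.glob("*.md")
            if (m := re.match(r"(\d+)-", p.name))]
    n = max(nums, default=0) + 1
    # Lowercase, non-alphanumerics collapsed to single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return d / f"{n:04d}-{slug}.md"
```

With an existing docs/adr/0013-*.md, `next_adr_path("docs/adr", "Use Postgres as primary store")` yields `docs/adr/0014-use-postgres-as-primary-store.md`.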
mvp-scope · Skill
Trigger
"scope my MVP" / "what should be in v1"
When to use
Every new feature set, every time the scope starts creeping past 2 weeks.
Step-by-step
  1. State the hypothesis in one sentence: If I ship X, then Y will do Z in N days, proven by metric M.
  2. The skill RICE-scores every feature on the list.
  3. Cuts to top 5 by score. If they don't deliver the hypothesis together, adds 1 more; if still not, says the hypothesis is too big.
  4. Produces the 'NOT in MVP' list with the trigger for each item.
  5. Defines kill criteria: if metric < threshold by day N, pivot before adding more features.
Example input
10-feature list for a new landing + onboarding flow
Example output
5 locked features (scored, totaling ~8 dev-days), 'not in MVP' list of 5 with 'revisit when X' triggers, kill criterion: 'if <10% signup-to-activation in week 1, pause and talk to users.'
Pitfalls
  • If the founder lists 'polish' or 'analytics' as features → the skill auto-cuts them.
  • Admin panels: rejected until you have 50+ users. Use the DB directly.
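The RICE cut in steps 2–3 can be sketched as follows. The field names and the cut-to-5 helper are illustrative assumptions; the standard RICE formula is (Reach × Impact × Confidence) / Effort:

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    reach: int          # users affected per period
    impact: float       # 0.25 (minimal) .. 3 (massive)
    confidence: float   # 0.0 .. 1.0
    effort: float       # person-days

def rice(f: Feature) -> float:
    # RICE = (Reach x Impact x Confidence) / Effort
    return (f.reach * f.impact * f.confidence) / f.effort

def cut_to_mvp(features: list[Feature], keep: int = 5):
    """Rank by RICE; return (locked MVP list, explicitly-not-in-MVP list)."""
    ranked = sorted(features, key=rice, reverse=True)
    return ranked[:keep], ranked[keep:]
```

The "not in MVP" tail is the list the skill writes back with a 'revisit when X' trigger per item.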
code-review · Subagent
Trigger
Invoke on a diff or changed-files set. Works with git show, gh pr view, Cursor PR view.
When to use
Before every merge into main. Non-negotiable for anything touching payments or auth.
Step-by-step
  1. Feed the diff to the subagent.
  2. It prioritizes: P1 correctness + blast radius, P2 robustness at boundaries, P3 clarity.
  3. Skips style nits a linter catches.
  4. Output: ✋ Must-fix / 🤔 Should-consider / 👀 Nit / What's good.
Example input
Diff: a new /api/refund endpoint with no auth check, using SQL string interpolation.
Example output
✋ auth missing (unauthenticated refund = financial loss). ✋ SQL injection risk. 🤔 no idempotency key — duplicate refunds possible. What's good: error messages avoid leaking internal IDs.
Pitfalls
  • If the review returns zero Must-fix items and no 'What's good' bullet, re-prompt. The subagent should always call out a good choice.
  • For very large diffs (>500 lines), split into logical chunks. Otherwise signal-to-noise collapses.
stack-advisor · Skill
Trigger
"what should I build this with" / new project
When to use
Starting a new project, rewriting a component, evaluating a migration.
Step-by-step
  1. 5-question intake: strongest lang/framework, product shape, deadline, ops capacity, acceptable lock-ins.
  2. Default boring and proven — novelty requires justification.
  3. Output: stack table, 'Why NOT <the tempting option>', setup checklist, kill criteria.
Example input
"New SaaS, TS bg, 4 weeks, solo, no infra chops"
Example output
Next.js + Supabase + Drizzle + Stripe + Resend + Vercel + Sentry. Reject: AWS primitives, microservices, rolled-own auth. Setup checklist (6 steps, first hour).
Pitfalls
  • Picking 'best-in-class' for everything: the integration tax wipes out the gains.
  • Locking into cloud primitives on day 1 as a 1-person team.
build-vs-buy · Skill
Trigger
"should I build X" / "<SaaS tool> or ship myself"
When to use
Any non-trivial capability where build is an option.
Step-by-step
  1. Score 0-5 on Differentiation / Speed-to-value / Ops burden / Switching cost.
  2. Rule: Diff ≥4 & ops ≤2 → BUILD · Switching ≥4 & Diff ≤2 → BUILD · Diff ≤2 & speed ≥3 → BUY · else BUY.
  3. Common capabilities default to BUY: auth, email, payments, analytics, storage, search, queues, flags.
Example input
"Build auth with Lucia?"
Example output
Diff 1 / speed 4 (Clerk 2h) / ops 5 (security forever) / switching 3. Verdict: BUY Clerk.
Pitfalls
  • "But we can do it better" — at what cost?
  • Building for imagined future scale.
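The decision rule in step 2 is mechanical enough to write down directly. A minimal sketch, assuming the four 0–5 scores from step 1 (the function name is an illustration, not part of the skill):

```python
def build_vs_buy(diff: int, speed: int, ops: int, switching: int) -> str:
    """Apply the rule: each input is a 0-5 score.
    Diff >=4 & ops <=2 -> BUILD; Switching >=4 & Diff <=2 -> BUILD; else BUY."""
    if diff >= 4 and ops <= 2:
        return "BUILD"   # core differentiator you can afford to operate
    if switching >= 4 and diff <= 2:
        return "BUILD"   # commodity capability, but lock-in would be too costly
    return "BUY"         # default: buy speed, keep your focus
```

Running the Lucia/Clerk example above (Diff 1, speed 4, ops 5, switching 3) returns BUY, matching the verdict.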
tech-debt-triage · Skill
Trigger
"what should I fix" / Friday cadence
When to use
Every Friday afternoon, 20 min.
Step-by-step
  1. Inventory: TODOs, Sentry errors, avoided files.
  2. Score: (pain × frequency) / fix-cost. Max 10 candidates.
  3. Pick 1 to ship next week.
  4. Save to founder-log/tech-debt/YYYY-WW.md.
Example input
(runs on Friday)
Example output
Top: 'Stripe webhook retries flaky' — 4×5/3h = 6.7. Ship next week. Deferred 2. Killed 1 that wasn't really debt.
Pitfalls
  • Big refactor weeks are a trap.
  • Score inflation (everything at pain 5).
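The scoring in step 2 can be sketched in a few lines of Python. The tuple shape and helper names are illustrative assumptions; the formula is the one above, (pain × frequency) / fix-cost:

```python
def debt_score(pain: int, freq: int, fix_cost_hours: float) -> float:
    # (pain x frequency) / fix-cost: higher means fix sooner.
    return (pain * freq) / fix_cost_hours

def triage(candidates: list[tuple[str, int, int, float]]):
    """candidates: (name, pain 1-5, frequency 1-5, fix-cost in hours).
    Returns (the one item to ship next week, everything deferred)."""
    ranked = sorted(candidates, key=lambda c: debt_score(*c[1:]), reverse=True)
    return ranked[0], ranked[1:]
```

The example output above is this arithmetic: pain 4 × frequency 5 / 3 hours ≈ 6.7, which tops the list.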
Your first win
Run the code-review subagent on your next PR. If it flags something you would've missed, the kit has paid for itself.