Chapter 19 · Projects to Build

Theory means nothing without projects. Do three of these in the next quarter.

Ten project ideas, in rough order of increasing difficulty. Each is scoped to a real learning outcome. All of them are portfolio-worthy if shipped in public.

In plain English. Reading about AI without building is like reading about swimming. You can understand the strokes perfectly and still drown in three feet of water.

How the projects build on each other

flowchart TB
    P1[1. Slack docs bot<br/>RAG basics] --> P3[3. PR reviewer bot]
    P1 --> P6[6. Internal search]
    P2[2. Personal MCP server] --> P3
    P2 --> P4[4. SQL analyst agent]
    P4 --> P5[5. Incident triage agent]
    P3 --> P9[9. Autonomous coding agent]
    P5 --> P9
    P6 --> P10[10. Research agent]
    P7[7. Evals harness] --> P9
    P8[8. Distilled tiny classifier] --> P9
    P9 --> Star((Architect of<br/>agentic systems))
    P10 --> Star
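
The arrows in the diagram can also be read as a prerequisite map. The sketch below transcribes those edges into a dict and uses the standard-library graphlib module to produce one valid build order. The names PREREQS, prerequisites_of, and build_order are illustrative and not part of the original text.

```python
from graphlib import TopologicalSorter

# Each project maps to the projects that feed directly into it,
# transcribed from the edges of the diagram.
PREREQS: dict[int, set[int]] = {
    1: set(), 2: set(), 7: set(), 8: set(),
    3: {1, 2},
    4: {2},
    5: {4},
    6: {1},
    9: {3, 5, 7, 8},
    10: {6},
}

def prerequisites_of(project: int) -> set[int]:
    """Return every project that directly or transitively feeds `project`."""
    seen: set[int] = set()
    stack = list(PREREQS[project])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PREREQS[current])
    return seen

def build_order() -> list[int]:
    """One valid order in which every project comes after its inputs."""
    return list(TopologicalSorter(PREREQS).static_order())
```

Calling prerequisites_of(9) shows how much groundwork the autonomous coding agent assumes, which is a useful check before picking it as one of your three.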

Pick three. Do them. Write a blog post per project. By the end of the quarter, you will be a different engineer.

19.1 "Ask our team's docs" Slack bot

Time: 1–2 weeks. Difficulty: low.

Ingest Confluence, Notion, or a Git-stored set of Markdown docs.
Chunk, embed, store in pgvector.
Expose a Slack slash command /ask that retrieves top-k chunks, passes them to Claude or GPT, and replies with an answer + citations.
Log { query, retrieved_ids, answer, thumbs } for future evals.

Teaches: RAG end-to-end, embeddings, pgvector, hybrid search, structured outputs, feedback logging.

flowchart LR
    S[Slack /ask] --> API[FastAPI]
    API --> PG[(pgvector)]
    API --> LLM[Claude or GPT]
    PG --> LLM
    LLM --> API --> S
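
The retrieval core of this project fits in a few functions. The sketch below is a toy under stated assumptions: embed is a bag-of-words stand-in for a real embedding model, the index is an in-memory list rather than a pgvector table, and every function name is hypothetical. It only shows the shape of chunk, embed, retrieve top-k, build a cited prompt, and log.

```python
import math
import re
from collections import Counter

def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(docs: dict[str, str]) -> list[dict]:
    """One row per chunk: id, text, vector (a database table in the real project)."""
    return [
        {"id": f"{name}#{i}", "text": piece, "vec": embed(piece)}
        for name, body in docs.items()
        for i, piece in enumerate(chunk(body))
    ]

def retrieve(query: str, index: list[dict], k: int = 3) -> list[dict]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(index, key=lambda row: cosine(q, row["vec"]), reverse=True)[:k]

def build_prompt(query: str, hits: list[dict]) -> str:
    """Put chunk ids in the context so the model can cite them."""
    context = "\n\n".join(f"[{h['id']}] {h['text']}" for h in hits)
    return f"Answer using only the sources below and cite their ids.\n\n{context}\n\nQuestion: {query}"

def log_record(query: str, hits: list[dict], answer: str, thumbs=None) -> dict:
    """The record shape from the project description, kept for future evals."""
    return {"query": query, "retrieved_ids": [h["id"] for h in hits], "answer": answer, "thumbs": thumbs}
```

Swapping embed for a real model and retrieve for a database query leaves the other functions unchanged, which is why the log record is worth defining on day one.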

19.2 Personal MCP server for something you touch daily

Time: 1 weekend. Difficulty: low.

Wrap your todo list, RSS feeds, home automation, or anything you use daily in an MCP server.
Expose 3–5 tools (list, add, update, search).
Register it with Claude Desktop or Cowork.
Now the model in those apps can actually read and act on your data.

Teaches: MCP server development, tool design, stdio transport.
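
Most of the work here is tool design, and that can be prototyped before touching any SDK. The sketch below deliberately does not use an MCP library: the tool decorator and call_tool function are hypothetical stand-ins for the registration and dispatch an MCP server library would provide, and the todo store is in-memory. It shows four small tools with narrow, typed arguments.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Callable

@dataclass
class Todo:
    id: int
    title: str
    done: bool = False

@dataclass
class TodoStore:
    items: dict[int, Todo] = field(default_factory=dict)
    next_id: int = 1

STORE = TodoStore()
TOOLS: dict[str, Callable[..., Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name (stand-in for an SDK decorator)."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add_todo(title: str) -> dict:
    """Create a todo and return it."""
    todo = Todo(id=STORE.next_id, title=title)
    STORE.items[todo.id] = todo
    STORE.next_id += 1
    return asdict(todo)

@tool
def list_todos(include_done: bool = False) -> list[dict]:
    """List open todos, optionally including completed ones."""
    return [asdict(t) for t in STORE.items.values() if include_done or not t.done]

@tool
def update_todo(todo_id: int, done: bool) -> dict:
    """Mark a todo done or not done; raises KeyError for unknown ids."""
    todo = STORE.items[todo_id]
    todo.done = done
    return asdict(todo)

@tool
def search_todos(query: str) -> list[dict]:
    """Case-insensitive substring search over titles."""
    q = query.lower()
    return [asdict(t) for t in STORE.items.values() if q in t.title.lower()]

def call_tool(name: str, arguments: dict) -> Any:
    """Dispatch a tool call by name, as a server would for an incoming request."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)
```

Keeping each tool to one verb and plain JSON-serializable return values tends to make the later move to a real MCP server mostly mechanical.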

19.3 PR reviewer bot

Time: 1 week. Difficulty: low–medium.

GitHub Action that triggers on PRs.
Reads the diff, runs a Claude prompt with your team's style guide in the system prompt.
Posts review comments on specific lines.
Surfaces obvious bugs, missing tests, and deviations from conventions.

Teaches: CI integration, structured outputs mapping to specific lines, prompt-as-policy, prompt caching.
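
Posting comments on specific lines requires knowing which new-file line numbers a diff actually touches. A minimal sketch of that mapping for a unified diff follows. The function names are hypothetical, and the parser ignores edge cases such as renamed files or added lines whose own text begins with "++ ".

```python
import re

# Hunk header, e.g. "@@ -10,4 +12,6 @@"; group 1 is the new-file start line.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")

def added_lines(diff: str) -> dict[str, list[tuple[int, str]]]:
    """Map each file path to (new_line_number, text) for every added line."""
    result: dict[str, list[tuple[int, str]]] = {}
    path = None
    new_line = 0
    for raw in diff.splitlines():
        if raw.startswith("+++ "):
            path = raw[4:].removeprefix("b/")
            result.setdefault(path, [])
        elif raw.startswith("--- "):
            continue  # old-file header
        elif (match := HUNK_RE.match(raw)):
            new_line = int(match.group(1))
        elif path is None:
            continue  # preamble before the first file header
        elif raw.startswith("+"):
            result[path].append((new_line, raw[1:]))
            new_line += 1
        elif raw.startswith("-"):
            continue  # removed lines do not advance new-file numbering
        elif raw.startswith(" "):
            new_line += 1  # context line
    return result

def is_valid_comment(comment: dict, added: dict[str, list[tuple[int, str]]]) -> bool:
    """Accept a model-proposed comment only if it targets a line the PR added."""
    lines = {number for number, _ in added.get(comment.get("path", ""), [])}
    return comment.get("line") in lines and bool(comment.get("body"))
```

Validating the model's structured output against this map before posting is one way to avoid comments that land on lines the PR never changed.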

19.4 Natural-language to SQL agent

Time: 2 weeks. Difficulty: medium.

Read-only connection to your staging DB.
Agent takes a question, inspects schema via tools (list_tables, describe_table, sample_rows), generates SQL, runs it, summarizes.
Guardrails: read-only role, LIMIT default, query timeout, row-count caps.
Streamlit or Next.js UI.

Teaches: tool use, schema introspection as context, safety, pagination, evals on SQL correctness.

flowchart LR
    U[User question] --> A[Agent]
    A --> T1[list_tables]
    A --> T2[describe_table]
    A --> T3[sample_rows]
    A --> T4[run_sql_readonly]
    T4 --> DB[(DB)]
    A --> R[Answer + SQL + table]
    R --> U
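
The guardrails are easier to reason about as code. The sketch below uses the standard-library sqlite3 module as a stand-in for a staging database. guard_sql and run_sql_readonly are hypothetical names, and the string checks are a second line of defense only: in Postgres the real protection would be a read-only role plus a statement_timeout setting.

```python
import re
import sqlite3
import time

DEFAULT_LIMIT = 100

def guard_sql(sql: str, default_limit: int = DEFAULT_LIMIT) -> str:
    """Allow one SELECT/WITH statement and append a LIMIT if none is present."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:
        # Conservative: also rejects semicolons inside string literals.
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?i)^(select|with)\b", stmt):
        raise ValueError("only SELECT queries are allowed")
    if not re.search(r"(?i)\blimit\s+\d+", stmt):
        stmt = f"{stmt} LIMIT {default_limit}"
    return stmt

def run_sql_readonly(conn: sqlite3.Connection, sql: str,
                     max_rows: int = DEFAULT_LIMIT, timeout_s: float = 5.0):
    """Run a guarded query with a row cap and a wall-clock timeout."""
    conn.execute("PRAGMA query_only = ON")  # SQLite refuses writes on this connection
    deadline = time.monotonic() + timeout_s
    # The handler runs every 1000 VM instructions; a truthy return aborts the query.
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 1000)
    try:
        cur = conn.execute(guard_sql(sql, max_rows))
        columns = [d[0] for d in cur.description]
        rows = cur.fetchmany(max_rows)  # cap applies even if a LIMIT was supplied
        return columns, rows
    finally:
        conn.set_progress_handler(None, 0)
```

Returning the column names alongside the rows gives the agent what it needs to produce the "Answer + SQL + table" output shown in the diagram.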

19.5 Incident triage agent

Time: 3 weeks. Difficulty: medium.

Listens on PagerDuty or Opsgenie webhook.
On incident creation, fans out tool calls in parallel to Prometheus, the log service (Loki or Datadog), the deploy tool, and the list of recent PRs.
Summarizes what changed, hypothesizes a cause, drafts an RCA starter.
Posts to the incident Slack channel.

Teaches: multi-tool parallel use, time-series queries, deploy introspection, report generation, on-call empathy.
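
The fan-out step is a natural fit for asyncio. In the sketch below the four fetch functions are hypothetical stubs that return placeholders instead of calling Prometheus, a log service, or a deploy tool. The part worth copying is the pattern: run every tool concurrently, give each its own timeout, and turn failures into data so one slow backend cannot block the summary.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Tool = Callable[[str], Awaitable[dict]]

# Hypothetical stubs; real versions would query the monitoring stack.
async def fetch_metrics(service: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "metrics", "service": service}

async def fetch_logs(service: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "logs", "service": service}

async def fetch_deploys(service: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "deploys", "service": service}

async def fetch_recent_prs(service: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "prs", "service": service}

DEFAULT_TOOLS: dict[str, Tool] = {
    "metrics": fetch_metrics,
    "logs": fetch_logs,
    "deploys": fetch_deploys,
    "prs": fetch_recent_prs,
}

async def gather_context(service: str, tools: Optional[dict[str, Tool]] = None,
                         timeout_s: float = 2.0) -> dict[str, dict]:
    """Call every tool concurrently; a failure or timeout becomes an error entry."""
    tools = DEFAULT_TOOLS if tools is None else tools

    async def run(name: str, fn: Tool) -> tuple[str, dict]:
        try:
            return name, await asyncio.wait_for(fn(service), timeout_s)
        except Exception as exc:  # includes the timeout raised by wait_for
            return name, {"error": f"{type(exc).__name__}: {exc}"}

    results = await asyncio.gather(*(run(n, f) for n, f in tools.items()))
    return dict(results)
```

The resulting dict can go straight into the summarization prompt, and error entries tell the model (and the on-call engineer) which signals were missing.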

19.6 "Ask our company" unified search

Time: 3–4 weeks. Difficulty: medium.

Ingest Slack, Confluence, Notion, Google Drive, Linear/Jira.
Per-source parsing, chunking, embedding.
Unified search endpoint with hybrid retrieval and ACL filtering.
Reranker on top-50 → top-5.
Web UI with citations and per-source filters.

Teaches: multi-source ingestion, ACL at query time, rerankers, hybrid search, UI design for AI, feedback loops.
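
Hybrid retrieval with ACL filtering can be sketched without any search engine. The example below assumes reciprocal rank fusion as the merge step, which is one common choice and not something the project description prescribes. The function names and the group-based ACL model are also assumptions. Filtering happens before fusion so that documents the user cannot see never influence the ranking.

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(doc) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, then by id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))

def acl_filter(doc_ids: list[str], doc_groups: dict[str, set[str]],
               user_groups: set[str]) -> list[str]:
    """Keep documents that share at least one group with the user."""
    return [d for d in doc_ids if doc_groups.get(d, set()) & user_groups]

def search(keyword_hits: list[str], vector_hits: list[str],
           doc_groups: dict[str, set[str]], user_groups: set[str],
           top_n: int = 5) -> list[str]:
    """Filter each ranked list by ACL at query time, then fuse and truncate."""
    allowed = [
        acl_filter(keyword_hits, doc_groups, user_groups),
        acl_filter(vector_hits, doc_groups, user_groups),
    ]
    return rrf_fuse(allowed)[:top_n]
```

In the full project a reranker would sit between the fused candidate list and the final top five. It is left out here to keep the sketch short.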

19.7 Evals harness for your team

Time: 1 week. Difficulty: medium. Leverage: enormous.

50–200 canonical test cases (real inputs, expected outcomes) for every prompt your team ships.
promptfoo or pytest runner.
CI integration: block PRs that regress eval scores.
LLM-as-judge for subjective dimensions with a written rubric.
Dashboard of scores over time.

Teaches: eval design, LLM-as-judge patterns, CI/CD for AI, regression discipline.
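
The heart of the harness is small: a list of cases, a function that scores a prompt against them, and a gate that CI can call. The sketch below uses exact-match scoring and a hypothetical predict callable in place of a real model call. Case, run_evals, and gate are illustrative names, and the tolerance value is an assumption to tune per team.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Case:
    name: str
    input: str
    expected: str

def run_evals(cases: list[Case], predict: Callable[[str], str]) -> dict:
    """Score a prompt or model against canonical cases using normalized exact match."""
    if not cases:
        raise ValueError("at least one eval case is required")
    failures = [
        c.name for c in cases
        if predict(c.input).strip().lower() != c.expected.strip().lower()
    ]
    return {"score": (len(cases) - len(failures)) / len(cases), "failures": failures}

def gate(score: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Return True if the change may merge: no drop beyond `tolerance` below baseline."""
    return score >= baseline - tolerance
```

A CI job could run run_evals on the PR branch, compare the score with the one stored for main, and fail the build when gate returns False. Subjective dimensions would swap the exact-match comparison for an LLM judge following a written rubric.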

19.8 Fine-tuned tiny classifier (distillation)

Time: 2 weeks. Difficulty: medium–hard.

Pick a high-volume classification task at your company (ticket routing, content moderation, intent detection).
Generate 10k training examples using a frontier model as the teacher.
Hand-curate or clean ~500 of them carefully.
QLoRA fine-tune a 3–7B open model on a single GPU.
Serve with vLLM behind a feature flag; A/B vs the frontier API.
Compare per-request cost and accuracy across the two arms; the target is a 10–100× cost reduction at comparable accuracy.

Teaches: distillation, LoRA/QLoRA, vLLM serving, A/B testing, drift monitoring.
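
The feature-flag and A/B step can be sketched independently of any serving stack. Below, small_model and frontier_model are hypothetical callables standing in for a vLLM endpoint and a frontier API. Hashing the request id is one simple way to get sticky, reproducible assignment, and agreement_rate is a rough quality check for the student against the teacher.

```python
import hashlib
from typing import Callable

def in_treatment(request_id: str, rollout_pct: int) -> bool:
    """Deterministically place a request in the small-model arm."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < rollout_pct

def route(request_id: str, text: str,
          small_model: Callable[[str], str],
          frontier_model: Callable[[str], str],
          rollout_pct: int = 10) -> tuple[str, str]:
    """Return (arm, label) so the arm can be logged next to the prediction."""
    arm = "small" if in_treatment(request_id, rollout_pct) else "frontier"
    model = small_model if arm == "small" else frontier_model
    return arm, model(text)

def agreement_rate(pairs: list[tuple[str, str]]) -> float:
    """Share of examples where the student label equals the teacher label."""
    if not pairs:
        raise ValueError("no label pairs to compare")
    return sum(1 for student, teacher in pairs if student == teacher) / len(pairs)
```

Because assignment depends only on the request id, raising rollout_pct from 10 to 50 keeps every request that was already in the small-model arm there, which makes before-and-after comparisons cleaner.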

19.9 Autonomous coding agent in a sandbox repo

Time: ongoing. Difficulty: hard.

Small, well-tested repo (yours or a toy open-source one).
Ticket queue in a file or issue tracker.
Claude Code or a custom loop picks a ticket, opens a branch, plans, codes, runs the tests, and opens a PR.
You review in the morning.
Iterate on the system prompt, the CLAUDE.md, the tool palette.

Teaches: real agentic engineering, prompt-as-product, human-in-the-loop design, your own appetite for autonomy.
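
For the custom-loop option, the control flow matters more than the model call. The sketch below treats propose_patch and run_tests as hypothetical injected callables (a model call and a test runner in practice) and shows a bounded retry loop: test output feeds back into the next attempt, and anything that still fails is parked for a human instead of retried forever.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Ticket:
    id: str
    description: str
    status: str = "open"  # open -> pr_opened | needs_human
    attempts: int = 0
    log: list[str] = field(default_factory=list)

ProposePatch = Callable[[str, str], str]          # (description, feedback) -> patch
RunTests = Callable[[str], tuple[bool, str]]      # patch -> (passed, output)

def work_ticket(ticket: Ticket, propose_patch: ProposePatch,
                run_tests: RunTests, max_attempts: int = 3) -> Ticket:
    """Try a ticket until the tests pass or the attempt budget runs out."""
    feedback = ""
    while ticket.attempts < max_attempts:
        ticket.attempts += 1
        patch = propose_patch(ticket.description, feedback)
        passed, output = run_tests(patch)
        ticket.log.append(f"attempt {ticket.attempts}: {'pass' if passed else 'fail'}")
        if passed:
            ticket.status = "pr_opened"  # a human still reviews before merge
            return ticket
        feedback = output  # failing test output goes into the next attempt
    ticket.status = "needs_human"
    return ticket

def run_queue(tickets: list[Ticket], propose_patch: ProposePatch,
              run_tests: RunTests) -> list[Ticket]:
    """Process every open ticket once; skip tickets already handled."""
    return [work_ticket(t, propose_patch, run_tests) for t in tickets if t.status == "open"]
```

The per-ticket log is what you read in the morning review, so it is worth keeping even in a first version.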

19.10 Multi-hop research agent for a domain you love

Time: 2–3 weeks. Difficulty: hard.

Pick a domain (a sport, a company's financials, a scientific topic).
Build a multi-agent system: planner → researchers (web search + read) → synthesizer → citation checker.
Output: a structured report with citations and confidence levels.
Ship as a tool internally or publicly.

Teaches: multi-agent orchestration, planning, source triangulation, citation design, cost management on long runs.

flowchart TB
    Q[Question] --> P[Planner]
    P --> R1[Researcher 1] --> S1[Summary + citations]
    P --> R2[Researcher 2] --> S2[Summary + citations]
    P --> R3[Researcher 3] --> S3[Summary + citations]
    S1 --> SYN[Synthesizer]
    S2 --> SYN
    S3 --> SYN
    SYN --> CC[Citation checker]
    CC --> O[Report]
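
The pipeline in the diagram can be prototyped with plain functions before any agents are involved. In the sketch below planner and researcher are hypothetical injected callables, the corpus is a dict of fetched page text keyed by URL, and the citation check is a naive substring match. The confidence labels are an illustrative heuristic, not a method taken from the project description.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Finding:
    claim: str
    source_url: str

def synthesize(findings: list[Finding]) -> list[dict]:
    """Merge duplicate claims and collect the distinct sources for each."""
    merged: dict[str, list[str]] = {}
    for f in findings:
        sources = merged.setdefault(f.claim, [])
        if f.source_url not in sources:
            sources.append(f.source_url)
    return [{"claim": claim, "sources": sources} for claim, sources in merged.items()]

def check_citations(report: list[dict], corpus: dict[str, str]) -> list[dict]:
    """Keep only sources whose fetched text actually contains the claim."""
    checked = []
    for item in report:
        verified = [
            url for url in item["sources"]
            if item["claim"].lower() in corpus.get(url, "").lower()
        ]
        if len(verified) >= 2:
            confidence = "high"
        elif len(verified) == 1:
            confidence = "low"
        else:
            confidence = "unsupported"
        checked.append({**item, "verified_sources": verified, "confidence": confidence})
    return checked

def run_pipeline(question: str,
                 planner: Callable[[str], list[str]],
                 researcher: Callable[[str], list[Finding]],
                 corpus: dict[str, str]) -> list[dict]:
    """planner -> researchers -> synthesizer -> citation checker."""
    findings = [f for sub_question in planner(question) for f in researcher(sub_question)]
    return check_citations(synthesize(findings), corpus)
```

Running the checker against the fetched text, not against the researcher's own summary, is what lets it catch citations that do not support the claim.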

19.11 Bonus — teach one of them in public

Write each project up: a blog post, a GitHub README, a short video, or a talk at a meetup. Explaining a project in public forces you to understand it well enough to teach it, and it makes the work visible. Your future self will search for your own name and find these posts; so will hiring managers.

19.12 How to pick which three

Choose one project from each of these buckets, not three from the same bucket:

Bucket A: Foundation. Project 1 (Slack bot), 2 (MCP), or 3 (PR reviewer). Teaches core RAG or tool use.
Bucket B: Integration. Project 4 (NL-to-SQL), 5 (incident triage), or 6 (unified search). Teaches you to wire AI to real systems.
Bucket C: Advanced. Project 7 (evals), 8 (fine-tune), 9 (autonomous coding), or 10 (research agent). Teaches a specific depth.

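Starting over today, I'd do Project 1, Project 7, then Project 8, in that order, over a quarter. RAG for leverage. Evals for durability. Fine-tuning for depth and cost savings. Note that this pick takes two projects from Bucket C and none from Bucket B, so treat the one-per-bucket rule as a default, not a hard constraint.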

19.13 Rules for shipping

Scope down ruthlessly. Every project above is intentionally small. If it grows, cut.
Ship the ugly v1. Perfection is the enemy of deployed.
Measure something. Users, queries, thumbs-up rate, latency, cost. Pick a number and watch it.
Document as you go. An AI can draft the README from your commits, but you should edit it.
Stay honest in the post-mortem. What you thought would be hard vs what actually was. This is where your real growth is.