Case study: a 22-unit property manager saves 18 hours a week

A four-week build replaced the manual intake-to-lease pipeline at a small Toronto property management operation. Here's the architecture, the cost, and the part the case studies usually leave out.

Published: 2026-03-26 · Author: Ahmed Heshmat · 8 min read

Key takeaways

  • A four-week build for a 22-unit Toronto property management operation gave the owner roughly 18 hours back per week — across inquiry intake, maintenance triage, lease renewal coordination, and owner reporting.
  • The recovered time was used to acquire two more buildings, not to lay off staff. The owner hired another part-time coordinator three months later.
  • The stack is deliberately boring: FastAPI on a single VPS, Postgres for state, Anthropic SDK for model calls, a thin Buildium client (~220 lines of Python), and a Streamlit operator dashboard. Total codebase under 3,000 lines.
  • The operator owns the source code, the credentials, and the GitHub access on day one. If she fired us tomorrow, the system would keep running. That is the actual deliverable.

The setup

A property manager in Toronto, 22 doors across six small buildings, two staff. The operation was profitable but stretched. The owner was doing 60 hours a week and most of that time was being eaten by the same handful of workflows: tenant inquiries, maintenance triage, lease renewal coordination, and owner reporting at month-end.

She had tried Make.com once, six months earlier. The workflow she built routed Gmail leads to a spreadsheet. It worked for three weeks and then quietly broke when a Gmail thread arrived in a format the workflow didn't expect. She turned it off. By the time we started the audit, the assumption in the office was that "AI doesn't really work for our size yet."

Two weeks of audit, four weeks of build, one week of operator handoff. Here is what we shipped and what it actually cost.

What we built

Four systems. None of them are clever. All of them are inspectable.

1. Inquiry intake to qualified lead, in 6 minutes.

Inquiries arrive across three channels — the website form, a personal Gmail address the owner had been using for years, and the listing platforms the buildings advertise on. We built a single ingestion service (Python, 280 lines) that normalizes inquiries from all three into a common schema, runs a structured Claude call to extract the obvious fields (move-in date, budget, household size, pets), and flags any inquiry that looks like a duplicate of one we've seen in the last 30 days.

The deliverable to the operator is a Slack message in a channel she opens twice a day: structured summary, link to the original message, and one of three suggested next actions. She picks the action and the next step fires. The model is not making the leasing decision. It is removing the 4 minutes of copy-paste-and-classify that every inquiry used to require.

Time spent per inquiry, before: 6–8 minutes. After: 60–90 seconds of decision time. At ~40 inquiries a week, that recovered roughly four hours.

2. Maintenance triage, with a paper trail.

The maintenance workflow was the messiest. Tenants reported issues across email, text message, and (occasionally) a voicemail the owner translated into a Notion task. The classification was inconsistent. The history was incomplete. When something escalated — a recurring leak, say — there was no easy way to look back and see how many tickets the unit had generated in the last six months.

We built a triage service that ingests all three channels, classifies the issue by urgency and category (using a tight prompt, not a fine-tuned model), and writes the result to a Postgres database the operator owns. Urgent items page the on-call vendor automatically. Non-urgent items land in a daily digest. Every ticket has a complete audit trail: original message, model classification, model confidence, vendor dispatched, resolution timestamp.

The classification is wrong about 6% of the time. That 6% is the actual job. The dashboard surfaces every low-confidence classification for human review at the start of the next business day. The 6% has a written policy. The 94% is the demo.

Time recovered: ~6 hours/week. The bigger win wasn't time — it was the audit trail. When the building's insurance carrier asked, in March, for evidence of how the operator had responded to a specific moisture report from October, the answer was in the database in 30 seconds. Before, it would have been a two-hour archaeological dig through Gmail.

3. Lease renewal coordination, 90 days out.

The owner had been managing renewals on a paper calendar. Three months before each lease expiry, she'd send a renewal letter, follow up twice if she heard nothing, and then either negotiate or post the unit for re-leasing. It worked. It was also fragile and depended entirely on her remembering.

We built a renewal cadence service that watches the lease database (Buildium-integrated via a thin Python client we wrote — for the reasoning behind why we built our own rather than using Make's pre-built Buildium connector, see [our tooling decision rule](/blog/make-vs-n8n-vs-custom-claude)), generates renewal correspondence at the 90/60/30-day marks, and routes tenant responses back to her for the decisions that actually require her. She no longer remembers when renewals are due. The system does. She just decides.

Time recovered: ~3 hours/week, with the variance being highest at month-end.

4. Owner reporting, generated.

The fourth system is the smallest and the most appreciated by the owner. At month-end, owners receive a one-page PDF: occupancy, rent collected, maintenance summary, notable events, and a forward look at the next 30 days. Before, the operator was assembling these in Google Docs from Buildium exports and her own Notion notes. Each report took 30–45 minutes. She has eight owners.

The system generates first drafts. She reviews and edits them. About 70% go out without substantive edits. The 30% with edits get her attention, which is correct — those are the owners with situations that need a human paragraph, not a templated one.

Time recovered: ~5 hours, once a month, but felt during the most stressful week.

The numbers

| Item | Before | After |

|---|---|---|

| Hours/week on these four workflows | ~22 | ~4 |

| Maintenance ticket audit trail | Manual reconstruction | Queryable database |

| Lease renewals missed in a quarter | 1–2 | 0 |

| Owner report turnaround | 45 min × 8 = 6 hrs | 1.5 hrs of review |

| Setup cost (one-time) | — | Audit + Build engagement |

| Operating cost (ongoing) | — | ~$340/month in compute + API |

| Operator retainer with us | — | Monthly, 30-day exit clause |

18 hours a week, roughly, depending on the week. About 75 hours a month back in the owner's schedule. She used the first three months of recovered time to actually take Fridays off. The four months after that she used to acquire two more buildings.

That second point matters more than the first. The point of automation is rarely "save money on labour." The point is "free a constrained person to do the high-value work only they can do." This owner is the constrained resource in her own business. Buying back 18 hours of her week is what allowed the operation to grow.

What we won't say

Some things you would normally read in a case study, that you won't read here:

We won't say "AI replaced a part-time employee." It didn't. The owner did not have anyone to replace; she was doing the work herself. She also did not stop hiring. She hired another part-time coordinator three months into the operate phase, because the new doors required it. The automation didn't reduce headcount. It made the headcount she already had work on different things.

This is the part of the case study we are most careful about. We don't build AI systems whose purpose is to cut staff. We build systems that give a constrained operator their evenings back. The economic story isn't "fewer humans." It's "the humans you have working on what humans should work on." That distinction is not cosmetic. It is the difference between deploying AI responsibly and deploying it in a way that quietly hollows out a small business. ([Why we named the company "the system"](/blog/why-we-named-the-company-the-system) goes into the reasoning behind that refusal.)

We won't say the system is "97% accurate." It isn't, in any meaningful sense, because accuracy isn't a single number across four different workflows. What we'll say is: the maintenance classifier is right about 94% of the time, the 6% it gets wrong is reviewed every morning, and the false-positive rate on urgent classifications is under 2%. Those are the numbers the operator actually cares about, because those are the numbers that determine whether the dashboard she relies on is trustworthy.

We won't say "we used GPT-5 with proprietary RAG." We used Claude Sonnet 4.6 with carefully written prompts and a Postgres database. The technology was deliberately boring. The work that mattered was in the operations design — what the system does, what the operator does, where the handoffs are — not in the model selection.

The architecture, in one paragraph

For the engineers reading: FastAPI service on a single VPS (~$24/month), Postgres for state, Anthropic SDK for model calls, a thin Buildium client for property data (handling auth, pagination, and rate-limiting in 220 lines of Python), Slack for operator notifications, and a Streamlit dashboard the operator opens in the morning. The entire codebase is under 3,000 lines. The operator has the GitHub access. The credentials are hers. If she fired us tomorrow, the system would keep running.

That last sentence is the actual deliverable. Everything else is implementation.