OpenAI shows the next step for agents: learning from real tax work
OpenAI and Thrive built a Codex-powered tax agent that worked on 7,000 returns, saved roughly a third of prep time, and improves from expert corrections.

OpenAI has shown one of the more practical versions of an AI agent yet: not a flashy assistant that promises to do everything, but a tax workflow system that helps professionals draft business returns, compute schedules, and learn from real expert corrections.
The project was built with Thrive Holdings for Crete, a network of more than 30 CPA firms. In pilot work across 7,000 tax returns, the agent reportedly saved roughly one-third of manual preparation time and reached up to 97% draft accuracy.
That combination matters. AI agents are often discussed as if the hard part is making them "autonomous." In professional services, the harder part is usually making them useful, reviewable, and boring enough to trust.
This is why the tax example is interesting. It points to a more serious future for agents: systems that do not merely answer questions, but absorb a firm's workflow, expose their reasoning, accept corrections, and improve the process behind the work.
What OpenAI actually built
The agent is designed around tax preparation, especially the messy business-return work that sits between raw documents and a finished filing.
It can help draft returns, prepare supporting schedules, handle calculations, and account for state or local amendments. That is not a small chatbot wrapper over a form. It is a domain workflow with many moving parts: documents, rules, edge cases, firm preferences, review trails, and deadlines.
OpenAI says the system was built using Codex, the same model family and tooling associated with software engineering tasks. That is notable because tax preparation can look surprisingly like programming when you zoom out.
There are inputs, transformations, validation rules, exceptions, review comments, and final artifacts. A human professional still owns the judgment, but much of the mechanical preparation follows repeatable logic.
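To make the programming analogy concrete, here is a minimal sketch of what "inputs, transformations, validation rules, exceptions" could look like for one supporting schedule. Nothing here comes from OpenAI or Thrive: the class, the function names, the categories, and the tie-out rule are all hypothetical, chosen only to show the shape of the work.

```python
from dataclasses import dataclass, field


@dataclass
class DraftSchedule:
    """Hypothetical supporting schedule: line items plus flags for the reviewer."""
    name: str
    line_items: dict
    review_flags: list = field(default_factory=list)

    @property
    def total(self) -> float:
        return round(sum(self.line_items.values()), 2)


def build_schedule(name, transactions):
    """Transformation: group raw (category, amount) pairs into line items."""
    items = {}
    for category, amount in transactions:
        items[category] = round(items.get(category, 0.0) + amount, 2)
    return DraftSchedule(name=name, line_items=items)


def validate(schedule, ledger_total, tolerance=0.01):
    """Validation: the schedule should tie out to the ledger (an assumed rule)."""
    if abs(schedule.total - ledger_total) > tolerance:
        schedule.review_flags.append(
            f"{schedule.name}: total {schedule.total} does not match ledger {ledger_total}"
        )
    # Exceptions are flagged for a human rather than silently "fixed".
    for category, amount in schedule.line_items.items():
        if amount < 0:
            schedule.review_flags.append(f"{category}: negative amount {amount}")
    return schedule


transactions = [("rent", 1200.0), ("software", 300.0), ("rent", 1200.0)]
draft = validate(build_schedule("Deductions", transactions), ledger_total=2750.0)
print(draft.total)         # 2700.0
print(draft.review_flags)  # one flag: the total does not tie out to the ledger
```

The point of the sketch is the division of labor: the code assembles and checks, and anything that fails a check becomes a note for the professional instead of an automatic correction.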
The better framing is not "AI replaces accountants." It is "AI starts to operate inside the workflow language of accounting firms."
Why tax is a brutal test
Tax work is a strong proving ground because mistakes are not merely annoying. They can cost money, create compliance problems, or damage client trust.
The task also has a difficult shape. A model has to read structured and unstructured documents, apply rules that change by jurisdiction, remember firm-specific conventions, and produce work that a professional can inspect quickly.
That last point is critical. A tool that saves time only when everything is perfect is not enough. In a real firm, the agent has to be useful even when the draft needs review.
The output must make sense to a tax professional. It must be organized, traceable, and easy to correct. If a reviewer spends more time untangling the agent's work than preparing the return manually, the tool loses.
The reported time savings suggest the agent cleared that first practical hurdle. Saving about a third of prep time does not sound like science fiction. It sounds like a product a busy firm might actually keep.
The self-improvement loop
The most important part of the announcement is not the headline accuracy number. It is the learning loop.
OpenAI describes a system that improves from production traces, expert corrections, evaluations, and proposed workflow changes. In plain language: the agent does the work, professionals review it, the system captures what was wrong or inefficient, and the workflow is updated.
That is a different pattern from asking a model the same question again and hoping the next answer is better.
A serious agent needs memory at the process level. It should learn that a certain document type is handled in a specific way, that a firm reviews a calculation with a particular checklist, or that a recurring mistake deserves a rule rather than another reminder.
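One way to picture "a recurring mistake deserves a rule" is a small routine that counts expert corrections and proposes a workflow rule once the same issue keeps coming back. This is a sketch under stated assumptions, not a description of how OpenAI's system works: the correction records, the document types, the issue labels, and the threshold of three are all invented for illustration.

```python
from collections import Counter


def propose_rules(corrections, threshold=3):
    """Return (document_type, issue) pairs that reviewers corrected
    at least `threshold` times. The threshold is a hypothetical choice."""
    counts = Counter((c["document_type"], c["issue"]) for c in corrections)
    return sorted(pair for pair, n in counts.items() if n >= threshold)


# Hypothetical corrections captured during review.
corrections = [
    {"document_type": "bank_statement", "issue": "missing_period_end_date"},
    {"document_type": "bank_statement", "issue": "missing_period_end_date"},
    {"document_type": "fixed_asset_list", "issue": "wrong_useful_life"},
    {"document_type": "bank_statement", "issue": "missing_period_end_date"},
]

print(propose_rules(corrections))
# [('bank_statement', 'missing_period_end_date')]
```

The output is a proposal, not an automatic change. In the loop described above, a person would still decide whether the repeated correction becomes part of the firm's procedure.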
This is where professional agents become more like living operating procedures. The firm does not only get an assistant. It gets a workflow that can be measured, corrected, and improved.
Why Codex matters here
Codex is usually associated with code, but this project hints at a broader pattern: many business processes are code-adjacent even when they do not look like software.
Tax preparation has conditional logic. So does insurance underwriting, loan review, compliance reporting, healthcare documentation, procurement, and internal audit.
These domains are full of tasks where the work is not creative in the open-ended sense. It is structured judgment: follow the rules, handle exceptions, produce a draft, and make it reviewable.
That is exactly where code-oriented agents may have an advantage. They are built to reason through steps, manipulate structured artifacts, and revise outputs after feedback.
The lesson is not that every office job becomes programming. The lesson is that high-value knowledge work often contains hidden program-like routines. Agents become useful when they can see and improve those routines.
Human review is still the product
The tax agent is not interesting because it removes humans from the loop. It is interesting because it makes the loop more productive.
Human review is not a temporary limitation here. It is part of the product.
A CPA firm does not need an AI system that confidently files complicated returns without oversight. It needs a system that prepares strong drafts, surfaces uncertainty, respects firm standards, and makes expert review faster.
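"Surfaces uncertainty" can be as simple as sorting a draft so the reviewer sees the shakiest items first. The sketch below assumes each drafted field carries a confidence score between 0 and 1. The field names, the scores, and the 0.9 cutoff are hypothetical, and the announcement does not say the agent exposes confidence this way.

```python
def split_for_review(draft_fields, min_confidence=0.9):
    """Partition drafted fields into routine spot-checks and items
    that need close review. Every field still goes to a human."""
    routine, needs_review = {}, {}
    for name, (value, confidence) in draft_fields.items():
        target = routine if confidence >= min_confidence else needs_review
        target[name] = value
    return routine, needs_review


# Hypothetical draft: field name -> (value, confidence).
draft_fields = {
    "gross_receipts": (125000.0, 0.99),
    "officer_compensation": (40000.0, 0.72),
    "rent_expense": (18000.0, 0.95),
}

routine, needs_review = split_for_review(draft_fields)
print(needs_review)  # {'officer_compensation': 40000.0}
```

Nothing is filed or approved by this step. It only orders the reviewer's attention, which is the sense in which review is part of the product.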
That model is more realistic for regulated and reputation-sensitive industries. The AI handles preparation. The professional handles judgment, accountability, and final sign-off.
This is also where the economics become compelling. If expert time is scarce, the best agent is not the one that pretends experts are unnecessary. It is the one that lets experts spend less time on routine assembly and more time on the decisions that actually need them.
What this means for professional services
Tax is one vertical, but the pattern travels.
Law firms have contract review and discovery workflows. Accounting firms have audit support and month-end close. Consultancies have research, model building, and client-ready deliverables. Banks have compliance reviews. Insurers have claims files.
In each case, the agent has to do more than summarize. It has to enter an existing operating system of documents, approvals, exceptions, and review habits.
The winners will not be the teams with the longest prompt. They will be the teams that turn their expertise into measurable workflows: what good work looks like, what errors matter, where humans intervene, and how corrections are fed back.
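A measurable workflow needs at least one number that moves when corrections are fed back. One plausible candidate is field-level draft accuracy: the share of fields in the reviewed return that the draft already had right. This is only one possible definition. The source does not say how the reported 97% figure was calculated, and the field names and values below are made up.

```python
def draft_accuracy(draft, final):
    """Fraction of fields in the reviewed return that the draft matched exactly."""
    if not final:
        raise ValueError("final return has no fields")
    matches = sum(1 for key, value in final.items() if draft.get(key) == value)
    return matches / len(final)


# Hypothetical draft and the version signed off after expert review.
draft = {"gross_receipts": 125000.0, "rent_expense": 18000.0,
         "depreciation": 5200.0, "state": "OH"}
final = {"gross_receipts": 125000.0, "rent_expense": 18000.0,
         "depreciation": 5400.0, "state": "OH"}

print(draft_accuracy(draft, final))  # 0.75
```

Tracked per document type or per firm over time, a metric like this shows whether corrections are actually improving the workflow or just being repeated.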
That is the quiet but important shift. AI adoption is moving from "give employees a chatbot" to "teach the organization how to improve its own work loops."
The risk of over-reading the numbers
The pilot numbers are impressive, but they should not be treated as universal promises.
Seven thousand returns inside a specific network of firms can tell us that the system worked in that environment. It does not mean every accounting office can plug in the same tool and get the same accuracy or time savings on day one.
Data quality, firm process, document types, reviewer discipline, and integration depth all matter.
That is not a criticism. It is the real lesson. Agents become powerful when they are embedded, measured, and corrected in the place where the work happens.
A generic model may be impressive. A domain agent with feedback from actual professionals is what starts to change the cost structure.
The bottom line
This is one of the clearest signs that AI agents are moving out of demo mode and into operational work.
The story is not that tax professionals disappear. The story is that a firm can start converting repetitive expert workflows into agentic systems that improve with use.
That is a more grounded version of the agent future, and probably the one that matters most.
The next wave of AI in professional services will not be won by the loudest assistant. It will be won by systems that can sit inside real workflows, take correction gracefully, and make skilled people faster without making the work harder to trust.
(Photo: Kelly Sikkema / Unsplash.)


