SpecKit vs Superpowers: Same Spec, Two Agentic Workflows

I ran a Forex visualizer spec through both tools and sent the output to independent code review. Superpowers got the domain right. SpecKit got the structure right. Here's what that means in practice.

Peter Puglisi May 14, 2026

I ran the same specification through two agentic workflows and sent both codebases for independent code review. The comparison report came back with a verdict I did not fully expect: one workflow produced clean, well-structured code; the other produced correct code. They failed on opposite axes.

That framing is more useful than any feature table.

The experiment

I built a live Forex market-hours visualizer twice: once with GitHub’s SpecKit, once with obra’s Superpowers. Same requirements. Same tech stack: Next.js App Router, TypeScript, Tailwind, shadcn/ui, Vitest.

Both repos are public:

If you want to go deep on how each workflow is actually used, the repos and the respective documentation are where to do that. This post is about what came out the other end, and when you would choose one over the other.

The independent review assessed correctness, architecture, SOLID/SRP, DRY, YAGNI, and test discipline across both codebases. The summary verdict: “superpowers is more correct; speckit is better factored and better tested. If forced to ship one, ship superpowers and back-port speckit’s structure and test discipline into it.”

That is where I will spend the rest of this post.

What the code actually showed

The decisive finding was in the domain model, and it was not subtle.

The SpecKit output baked session times in as fixed UTC integers: London 08:00 to 17:00, New York 13:00 to 22:00, hardcoded. Those are the winter values. In summer (BST), London actually trades 07:00 to 16:00 UTC, and New York shifts by an hour too. For a live trading tool, that puts every session in a DST-observing center off by an hour for months at a time. The timeline cursor drifts. The overlap zones are off. The code looks fine on a January morning and is silently wrong by April.

The Superpowers output modeled each center as local hours and converted through the real timezone at runtime using Intl.DateTimeFormat. Its tests proved it: they asserted Sydney is UTC+11 in January and UTC+10 in July. That is the correct call for the domain.
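To make the difference concrete, here is a minimal TypeScript sketch of the runtime-conversion idea. It is not the code from either repo. The function names (utcOffsetMinutes, localHourToUtcHour) and the 08:00 local open for London are assumptions for illustration. The only API it relies on is the standard Intl.DateTimeFormat with a timeZone option.

```typescript
// Illustrative sketch only; names are hypothetical and not taken from either repo.

// Offset of an IANA time zone from UTC, in minutes, at a given instant.
function utcOffsetMinutes(timeZone: string, at: Date): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour12: false,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    second: "2-digit",
  }).formatToParts(at);

  const get = (type: string): number => {
    const part = parts.find((p) => p.type === type);
    return part ? Number(part.value) : 0;
  };

  // Re-read the zone's wall-clock time as if it were UTC, then diff it
  // against the real instant. Some engines print midnight as hour 24,
  // hence the modulo.
  const wallAsUtc = Date.UTC(
    get("year"),
    get("month") - 1,
    get("day"),
    get("hour") % 24,
    get("minute"),
    get("second"),
  );
  const instant = Math.floor(at.getTime() / 1000) * 1000; // drop milliseconds
  return Math.round((wallAsUtc - instant) / 60000);
}

// Convert a session's local hour to the UTC hour it maps to on a given day.
function localHourToUtcHour(localHour: number, timeZone: string, at: Date): number {
  const offsetHours = utcOffsetMinutes(timeZone, at) / 60;
  return (((localHour - offsetHours) % 24) + 24) % 24;
}

const winter = new Date(Date.UTC(2026, 0, 15, 12));
const summer = new Date(Date.UTC(2026, 6, 15, 12));

// Assumed 08:00 local open for London: 8 UTC in January, 7 UTC in July.
console.log(
  localHourToUtcHour(8, "Europe/London", winter),
  localHourToUtcHour(8, "Europe/London", summer),
);
```

A hardcoded UTC table can only ever match one of those two values. Storing local hours plus a zone name lets the platform's timezone database decide which one applies on any given day.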

The hydration story was similar. The SpecKit hook seeded state in the useState initializer, which runs once during SSR and again on the client, at different times and potentially in different timezones. That guarantees a hydration mismatch on every page load. The Superpowers output returned null until a useEffect set the clock, which is the correct pattern for time-dependent UI in Next.js. The SpecKit output also shipped with the default “Create Next App” boilerplate metadata still in place.
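The mismatch is easy to reproduce without React. The sketch below is framework-free and hypothetical (ClockSnapshot and clockText are made-up names, and the timestamps and zones are arbitrary). It treats the server render and the client's first render as two calls to the same pure function. In a real hook, the null-first variant would correspond to useState(null) plus a useEffect that sets the clock after mount.

```typescript
// Hypothetical shape: null means "not mounted yet, clock unknown".
type ClockSnapshot = { now: Date; timeZone: string } | null;

// Stand-in for the markup a clock component would produce.
function clockText(snapshot: ClockSnapshot): string {
  if (snapshot === null) return "--:--"; // same placeholder in every environment
  return new Intl.DateTimeFormat("en-GB", {
    timeZone: snapshot.timeZone,
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  }).format(snapshot.now);
}

// Seeding at render time: each environment reads its own clock and its own zone.
const serverSeeded = clockText({
  now: new Date("2026-05-14T09:00:00Z"),
  timeZone: "UTC",
});
const clientSeeded = clockText({
  now: new Date("2026-05-14T09:00:02Z"),
  timeZone: "America/New_York",
});
console.log(serverSeeded === clientSeeded); // false: the two first renders disagree

// Null-first: both environments render the placeholder, and the real time
// is filled in only after mount, outside the hydration comparison.
const serverFirst = clockText(null);
const clientFirst = clockText(null);
console.log(serverFirst === clientFirst); // true
```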

On architecture, the results flipped. The SpecKit codebase is cleanly separated: types, session data, pure logic, a single clock hook, presentational components with no timer coupling. LiveClock takes a ClockState prop and is timer-free. There is even a test asserting it never calls setInterval. That is textbook dependency inversion and it makes the logic trivially testable. I also spent some time reading the two codebases side by side as a professional developer, and SpecKit was the winner here.
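The separation described above can be sketched without React. This is a hypothetical, framework-free illustration: the ClockState fields, liveClockText, and startClock are assumed names and shapes, not the repo's actual code. The view is a pure function of its input, and the one piece that owns a timer lives elsewhere.

```typescript
// Hypothetical shape; the real ClockState in the repo may differ.
interface ClockState {
  utcHour: number;
  utcMinute: number;
}

const pad2 = (n: number): string => (n < 10 ? "0" : "") + n;

// Presentational piece: a pure function of its input. No timers, no Date reads.
function liveClockText(state: ClockState): string {
  return pad2(state.utcHour) + ":" + pad2(state.utcMinute) + " UTC";
}

// The only place that owns a timer. Returns a function that stops the clock.
// The `now` parameter is injectable so callers can supply a fixed time.
function startClock(
  onTick: (state: ClockState) => void,
  now: () => Date = () => new Date(),
): () => void {
  const emit = (): void => {
    const d = now();
    onTick({ utcHour: d.getUTCHours(), utcMinute: d.getUTCMinutes() });
  };
  emit(); // first tick immediately, then once per second
  const id = setInterval(emit, 1000);
  return () => clearInterval(id);
}
```

Because liveClockText takes plain data, a test can hand it a fixed ClockState and, in the spirit of the SpecKit test, wrap setInterval in a spy to assert that rendering never touches it.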

The Superpowers output consolidated config and five functions into a single file and coupled the clock tick with session computation. Readable, but less granular. A mild SRP smell in one function that does two jobs in a single two-pass map.

On tests: SpecKit produced six test files covering logic, hooks, and three components. Better discipline. Superpowers produced one test file covering only the timezone math, which happens to be the genuinely hard part. But zero component or hook coverage.

The ideal is SpecKit’s breadth applied to Superpowers’s correct logic. Which is more or less what the recommendation said.

You can find the full code quality review report here.

How the two workflows actually feel different

SpecKit is an artifact you drive. You type /speckit.specify, the agent runs iterative dialogue to clarify the requirements into a structured PRD, you review it, and you move to /speckit.plan. The spec lives in your repo as a markdown file, tracked in version control. The constitution step encodes your standards once and they propagate into every subsequent spec and plan. It is a CLI you command.

Superpowers is more like a methodology the agent inhabits. The brainstorming skill fires before any code is written and will not proceed until the domain is clarified. The TDD skill fires during implementation and deletes code written before tests. The code review skill fires between tasks and blocks progress on critical issues. The skills are mandatory. The agent checks for them before any task and will not skip them.

The other thing that felt different was how each handled domain ambiguity. The Superpowers brainstorming phase kept probing the domain before committing to any implementation. That is probably why the timezone model came out correct. The spec phase is where you discover what you do not know about the problem. SpecKit’s iterative spec dialogue does similar work, but the developer drives the pace and the agent can be redirected. In Superpowers, the agent refuses to be redirected until it has answers.

Systematic debugging: the skill most write-ups miss

The Superpowers library ships a systematic-debugging skill. Most posts bury it in a bullet point. It warrants more than that.

It is a four-phase root cause process: root cause investigation, pattern analysis, hypothesis and testing, implementation. The skill explicitly forbids fixing what you have not understood. Three failed fix attempts trigger an architectural review rather than a fourth attempt at the same approach.

For brownfield work, this changes the calculus. The default agent behavior on a failing test or a broken integration is to look at the error message, make a plausible change, check if it passes, and continue. That works until it does not. When it does not, you have a chain of changes made by different context windows with no documented reasoning, and you are now debugging the debugging.

The systematic-debugging skill forces the agent to build a model of the actual behavior before touching the code. On a complex refactor, that discipline compounds quickly. SpecKit has no equivalent.

When SpecKit earns its place

The constitution is the feature most teams ignore. Encoding security requirements, testing expectations, naming conventions, and interface contracts once, and having them propagated into every spec and plan, is how you stop the same arguments from happening in every PR. That value grows with team size.

SpecKit also earns its place in multi-tool teams. It supports 30+ agent integrations. The spec is the shared artifact. The /speckit.taskstoissues command creates GitHub Issues from the task list, which drops into an existing project management workflow without requiring anyone to change their tool. When three developers use Claude Code and two use Copilot, Superpowers is not a viable option. SpecKit is.

The broader pattern: SpecKit suits teams that have established implementation standards and detailed business requirements that need to be documented, reviewed, and agreed upon before implementation starts. The rigor is about specification quality and traceability, not about forcing the agent to follow a sequence.

When Superpowers earns its place

Greenfield projects where the domain needs exploration before the implementation can be correct. The Forex comparison is the clearest example: the timezone handling came out right because the workflow kept asking clarifying questions rather than letting the agent make plausible assumptions and proceed. If the brainstorming phase had not surfaced the DST question before any code was written, I would have had the same hardcoded UTC bug and discovered it weeks later.

Projects where discipline is a genuine constraint rather than an aspiration. TDD enforcement that deletes pre-test code is annoying until you are working on a codebase that has never had tests. The enforcement model does not care what the team aspires to. It cares what the agent does next.

And any work where debugging is the primary activity. The systematic-debugging skill is the sleeper feature on this list.

Until recently, Superpowers was limited to Claude Code, but as of the v5.0 series it also supports Cursor, Gemini CLI, GitHub Copilot CLI, Codex, and OpenCode. At the time of writing, though, I am not sure how easy it is to switch agents between phases, for example brainstorming with Claude Code on Opus and then implementing with Codex. I do know that this is easy with SpecKit once the tasks file has been generated for a feature.

The quick comparison

	SpecKit	Superpowers
Enforcement model	Structured guidance, skippable	Mandatory, agent will not proceed
Multi-agent support	Yes (30+ providers)	Yes (v5.0+)
Constitution / standards encoding	Yes	No direct equivalent
TDD enforcement	No (though the constitution file can require it)	Yes, deletes pre-test code
Git worktree isolation	No	Yes
Systematic debugging	No	Yes, 4-phase root cause process
GitHub Issues integration	Yes	No
Mid-implementation spec updates	Yes	No direct equivalent
Ideal for	Teams with standards, multi-tool environments, spec-review cycles	Greenfield, solo enforcement, brownfield debugging

The verdict

SpecKit produced clean code in my experimental run. By comparison, Superpowers produced correct code.

If you are working in a team with established standards and stakeholders who need to review specs before implementation, SpecKit’s artifact-based workflow is the right fit. The constitution and the GitHub Issues integration are genuinely useful features that have no direct equivalents in Superpowers.

If you are starting a project where the domain needs exploration before the implementation can be right, or you need an enforcement model that does not rely on developer discipline at deadline pressure, Superpowers is the better pick. The brainstorming phase is not overhead. It is the mechanism by which the agent discovers what it does not yet know.

The output quality difference I found was real. A well-structured codebase whose domain model is wrong for half the Forex trading year is not a win.

GitHub SpecKit: github.com/github/spec-kit

obra/superpowers: github.com/obra/superpowers