Engineering Practitioner Brief / 18 May 2026

Feature Flag Refactor Cost

Feature flags turn a risky refactor into a series of small reversible steps. The new code path is gated behind a flag, enabled gradually for an increasing share of traffic, then made permanent when proven stable. The pattern works well at the per-refactor level but accumulates flag-debt at the system level if the post-rollout cleanup never happens. This page covers both the per-vendor cost of the tooling and the systemic cost of flag-debt.
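The gating and gradual-enablement steps can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the flag key `refactor-order-total`, the helper names, and the hash-bucketing scheme are all assumptions chosen for the example.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..9999."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_enabled(flag_key: str, user_id: str, rollout_percent: float) -> bool:
    """True when the user's bucket falls inside the current rollout share."""
    return bucket(flag_key, user_id) < rollout_percent * 100


def legacy_total(prices):
    # Old code path: kept intact until the rollout reaches 100 percent.
    total = 0
    for price in prices:
        total += price
    return total


def refactored_total(prices):
    # New code path: the refactor being rolled out behind the flag.
    return sum(prices)


def order_total(prices, user_id, rollout_percent):
    # The flag check is the only place where the two paths meet.
    if is_enabled("refactor-order-total", user_id, rollout_percent):
        return refactored_total(prices)
    return legacy_total(prices)
```

Because the bucket depends only on the flag key and the user ID, a user who sees the new path at 10 percent keeps seeing it at 50 percent, and setting the percentage back to 0 returns everyone to the old path without a deploy.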

The Vendor Landscape

The feature-flag market consolidated around five major vendors plus a healthy open-source layer through 2024 and 2025. The vendor choice matters less now than it did three years ago, because OpenFeature (the CNCF feature-flag specification) provides a common API that lets teams switch vendors without rewriting calling code.

Vendor	Starting Price	Differentiator	Self-Host Option
LaunchDarkly	$10/seat/mo	Most mature, deepest enterprise integrations	Yes (Federal tier)
Statsig	Free tier + usage	Bundled experimentation, generous free tier	No
Split (Harness)	Contact	Bundled with Harness CD pipeline	Yes
Flagsmith	$0 self-host	Open source, self-host friendly	Yes (primary)
ConfigCat	$0 to $99/mo	Simple, predictable pricing	No
AWS AppConfig	$0.20/1K requests	AWS-native, IAM integration	Managed
GrowthBook	$0 self-host	Open source, A/B + flags	Yes (primary)

For most teams the cost is in the $0 to $20,000 per year band. The choice is rarely about pricing alone; integration depth, audit-log requirements, regional data residency, and SDK language coverage all matter.

OpenFeature: the End of Vendor Lock-in

OpenFeature is a CNCF specification for feature-flag SDK interfaces. The idea: define a common API for flag evaluation, with vendor-specific providers slotting in behind it. Teams using OpenFeature can switch flag vendors without rewriting calling code; they only need to swap the provider.

As of 2026, OpenFeature has provider implementations for LaunchDarkly, Flagsmith, Split, ConfigCat, GrowthBook, AWS AppConfig, and several others. SDKs exist for Java, Python, Node.js, Go, .NET, PHP, Ruby, Swift, Kotlin, Rust, and several smaller languages.

The economic effect is that the code-level cost of switching vendors has dropped to near zero; flag definitions and targeting rules still have to be migrated, but calling code does not change. This has compressed vendor pricing on the high end (enterprise contracts that previously commanded premium pricing on lock-in arguments) and pushed differentiation into the surrounding capabilities (experimentation, observability integration, governance). For teams selecting a vendor in 2026, the recommendation is to wrap all flag calls in the OpenFeature SDK from the start, so the vendor choice stays reversible.
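The provider-swap idea can be illustrated without any vendor SDK. The sketch below is not the OpenFeature API; `FlagProvider`, `FlagClient`, and both provider classes are hypothetical names that only mimic the shape of the pattern (calling code talks to one client, and the backend behind it is replaceable).

```python
from typing import Mapping, Protocol


class FlagProvider(Protocol):
    """Anything that can answer a boolean flag lookup (hypothetical interface)."""

    def get_boolean(self, key: str, default: bool) -> bool: ...


class InMemoryProvider:
    """Stand-in for one vendor: flags held in a dict."""

    def __init__(self, flags: Mapping[str, bool]) -> None:
        self._flags = dict(flags)

    def get_boolean(self, key: str, default: bool) -> bool:
        return self._flags.get(key, default)


class EnvStyleProvider:
    """Stand-in for another vendor: flags read from FLAG_<KEY> style entries."""

    def __init__(self, environ: Mapping[str, str]) -> None:
        self._environ = environ

    def get_boolean(self, key: str, default: bool) -> bool:
        raw = self._environ.get("FLAG_" + key.upper().replace("-", "_"))
        if raw is None:
            return default
        return raw.strip().lower() in {"1", "true", "on"}


class FlagClient:
    """The only flag API that application code is allowed to call."""

    def __init__(self, provider: FlagProvider) -> None:
        self._provider = provider

    def set_provider(self, provider: FlagProvider) -> None:
        self._provider = provider

    def is_enabled(self, key: str, default: bool = False) -> bool:
        try:
            return self._provider.get_boolean(key, default)
        except Exception:
            # A failing backend falls back to the caller's default.
            return default


def checkout_path(client: FlagClient) -> str:
    # Calling code never names a vendor, so it survives a provider swap.
    return "new" if client.is_enabled("new-checkout") else "old"
```

Swapping backends is then a single `set_provider` call at startup; `checkout_path` and every other call site stay untouched.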

Per-Refactor Cost Profile

A typical feature-flag-driven refactor of a single function or service follows a predictable cost shape:

Stage	Engineer-Hours	What Happens
Flag creation and gating	1 to 4	New flag in vendor UI, code wrapped in if-flag branch
New implementation	Varies (the actual refactor)	The refactor itself, behind the flag, off in production
Internal rollout	2 to 8	Flag enabled for team members, light monitoring
Gradual user rollout	4 to 16	1% to 10% to 50% to 100% over 1 to 4 weeks
Cleanup (post-100%)	2 to 8	Remove flag check, delete old code path, remove flag
Total flag overhead	9 to 36	On top of the refactor itself

The total flag-pattern overhead is 9 to 36 engineer-hours per refactor, on top of the refactor work itself. For a refactor that would have been 80 to 200 hours, the flag overhead is typically 5 to 20 percent, although a short refactor paired with a slow rollout (36 hours of overhead on 80 hours of work) reaches 45 percent. The benefit is the ability to revert in seconds without a code deploy if the new path misbehaves in production.
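The arithmetic behind these figures can be checked directly. The stage ranges below are copied from the table above; the function names are illustrative.

```python
# (low, high) engineer-hours per overhead stage, taken from the table.
FLAG_OVERHEAD_HOURS = {
    "flag creation and gating": (1, 4),
    "internal rollout": (2, 8),
    "gradual user rollout": (4, 16),
    "cleanup": (2, 8),
}


def total_overhead(stages):
    """Sum the low ends and the high ends of every stage separately."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


def overhead_percent(overhead_hours, refactor_hours):
    """Flag overhead as a percentage of the refactor's own hours."""
    return 100.0 * overhead_hours / refactor_hours


low, high = total_overhead(FLAG_OVERHEAD_HOURS)
for refactor_hours in (80, 200):
    print(
        refactor_hours,
        overhead_percent(low, refactor_hours),
        overhead_percent(high, refactor_hours),
    )
```

The four pairings give 4.5 and 18 percent for a 200-hour refactor, and 11.25 and 45 percent for an 80-hour one, so the relative overhead is driven mostly by how small the refactor is.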

The Flag-Debt Accumulation Problem

The flag pattern works only if the cleanup step actually happens. LaunchDarkly's own customer data (published in their 2023 Feature Flag Lifecycle report) shows that flag-deletion rates lag flag-creation rates across nearly all customer cohorts. The result is flag debt: an accumulation of stale flags whose intended cleanup never happened.

Each stale flag costs in three ways. First, the never-taken branch behind the flag is dead code that still compiles and still has to be maintained (see dead code cost). Second, the routing layer that evaluates flags has more decisions to make per request, so aggregate latency rises. Third, the flag-management UI holds more flags than anyone can remember, and the cluttered namespace makes creating new flags harder.

In a typical mature organisation using a feature-flag system, 30 to 60 percent of flags are stale. The cost of cleaning them up is modest (typically 0.5 to 2 hours per flag once the team commits to the work), but the cleanup tends not to happen because it does not ship a customer-facing improvement.
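To size a cleanup backlog, the two ranges above can be multiplied out. The stale share and hours per flag come from the text; the flag count of 400 used in the example call is purely a hypothetical input.

```python
def stale_flag_backlog(total_flags, stale_percent=(30, 60), hours_per_flag=(0.5, 2.0)):
    """Return (low, high) estimates for stale flag count and cleanup hours."""
    low_count = total_flags * stale_percent[0] / 100
    high_count = total_flags * stale_percent[1] / 100
    return {
        "stale_flags": (low_count, high_count),
        "cleanup_hours": (
            low_count * hours_per_flag[0],
            high_count * hours_per_flag[1],
        ),
    }


# Hypothetical organisation with 400 flags in its flag system.
estimate = stale_flag_backlog(400)
print(estimate)
```

For 400 flags this gives 120 to 240 stale flags and 60 to 480 hours of cleanup. The wide spread is a reminder that the estimate is only as good as the stale-share measurement behind it.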

The standard mitigation is a flag-expiration policy: every flag is created with a TTL (typically 60 to 90 days). When the flag reaches its TTL, the system notifies the owner. Persistent unanswered notifications eventually escalate to a hard-fail (the flag is forcibly removed or set to a defined state). LaunchDarkly, Statsig, and others all support this pattern in their UIs as of 2026.
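A flag-expiration policy of this shape can be sketched as a small state check. The `Flag` record, the limit of three reminders, and the status strings are assumptions for illustration; vendor tooling will differ in the details.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_REMINDERS = 3  # hypothetical escalation threshold


@dataclass
class Flag:
    key: str
    owner: str
    created: date
    ttl_days: int = 90
    reminders_sent: int = 0

    @property
    def expires(self) -> date:
        return self.created + timedelta(days=self.ttl_days)


def review(flag: Flag, today: date) -> str:
    """Decide what the expiration policy does with one flag today."""
    if today < flag.expires:
        return "active"
    if flag.reminders_sent < MAX_REMINDERS:
        # Past TTL: notify the owner and count the reminder.
        flag.reminders_sent += 1
        return "notify-owner"
    # Reminders went unanswered: escalate to the hard-fail state.
    return "hard-fail"
```

Running `review` on a schedule (for example, once a day per flag) gives the notify-then-escalate behaviour described above. What "hard-fail" does in practice (removal or pinning to a defined state) is left to the team's policy.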

Build vs Buy

A basic in-house feature-flag system is a small project: a boolean column in a config table, a check in the application code, and a simple admin UI to toggle the value. Total build cost: 60 to 120 engineer-hours.
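The core of such a system (minus the admin UI) fits in a few functions. The sketch below uses SQLite from the Python standard library; the table name `feature_flags` and the function names are assumptions for the example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the flag store and create the config table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_flags ("
        " key TEXT PRIMARY KEY,"
        " enabled INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def set_flag(conn, key, enabled):
    """What the admin UI would call when someone toggles a flag."""
    conn.execute(
        "INSERT OR REPLACE INTO feature_flags (key, enabled) VALUES (?, ?)",
        (key, int(enabled)),
    )
    conn.commit()


def is_enabled(conn, key, default=False):
    """The check the application code performs; unknown flags use the default."""
    row = conn.execute(
        "SELECT enabled FROM feature_flags WHERE key = ?", (key,)
    ).fetchone()
    return default if row is None else bool(row[0])
```

Everything here is a global on/off switch. There is no targeting, no percentage rollout, and no record of who changed what, which is where the cost starts to climb.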

The build-vs-buy decision flips when the production-grade requirements arrive: per-user targeting, percentage-of-traffic rollout, an audit log of every flag change with attribution, rollout-safety guards (blocking flag changes during incidents), SDKs across multiple languages with consistent semantics, and edge caching for low-latency evaluation. Once the in-house system needs to match a commercial vendor on these capabilities, the build cost rises into the 1,000+ engineer-hour range with ongoing maintenance, a cumulative cost that overtakes even Enterprise LaunchDarkly pricing within 18 to 36 months for most teams.
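Three of these requirements (per-user targeting, an attributed audit log, and an incident guard) already change the shape of the code noticeably. The following is a deliberately small sketch with hypothetical class and field names, not a production design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class ChangeBlocked(Exception):
    """Raised when a flag change is attempted during an incident."""


@dataclass
class AuditEntry:
    at: datetime
    actor: str
    flag: str
    old: bool
    new: bool


@dataclass
class GuardedFlagStore:
    flags: dict = field(default_factory=dict)
    allow_users: dict = field(default_factory=dict)  # flag key -> set of user IDs
    audit_log: list = field(default_factory=list)
    incident_active: bool = False

    def set_flag(self, flag: str, enabled: bool, actor: str) -> None:
        # Rollout-safety guard: no flag changes while an incident is open.
        if self.incident_active:
            raise ChangeBlocked(f"{flag}: changes are frozen during incidents")
        old = self.flags.get(flag, False)
        self.flags[flag] = enabled
        # Audit log: every change is recorded with who made it and when.
        self.audit_log.append(
            AuditEntry(datetime.now(timezone.utc), actor, flag, old, enabled)
        )

    def is_enabled(self, flag: str, user_id: str) -> bool:
        # Per-user targeting: listed users get the new path regardless of the global value.
        if user_id in self.allow_users.get(flag, set()):
            return True
        return self.flags.get(flag, False)
```

Even this toy version omits durable storage, multi-language SDKs, and edge caching, which is where most of the remaining build cost would sit.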

The exceptions are large organisations with specialised needs: tight data-residency requirements, very high request volumes that make per-evaluation vendor pricing untenable, or governance models that require the system to be on-premises with on-staff support. For most teams, the buy decision (often starting with a free tier and upgrading later) is the lower-cost path.

Frequently Asked Questions

What is feature-flag-driven refactoring?

A pattern for shipping a refactor incrementally. The refactored code is gated behind a feature flag. The flag is initially off (old path runs); the engineer enables it for internal traffic, then for 1 percent of users, then 10 percent, then 100 percent. When the new path is fully proven, the old path and the flag are removed in a cleanup commit.

How much does LaunchDarkly cost?

Tiered pricing. LaunchDarkly public pricing in 2026 starts around $10 per seat per month for the Starter tier, with Pro and Enterprise tiers priced on contact. Most mid-sized teams using LaunchDarkly for refactor flags pay in the $2,000 to $20,000 per year range. Volume discounts apply at scale.

What does Statsig cost?

Statsig has a free tier up to a generous monthly event limit, then usage-based pricing for higher volumes. Most teams stay on the free tier or pay $1,000 to $10,000 per year, less than the equivalent LaunchDarkly tier in many cases. Statsig also bundles experimentation features (A/B testing, statistical analysis) that LaunchDarkly charges extra for.

Is build-your-own feature flagging worth it?

Only for very large organisations or for specialised needs. A basic in-house flag system takes about two engineer-weeks (60 to 120 engineer-hours) to build, plus ongoing maintenance. The cost of running it at production reliability (UI for flag management, audit log, rollout safety) usually exceeds the vendor cost within 12 to 24 months for any non-trivial team.

What is OpenFeature?

OpenFeature is a vendor-neutral feature-flag specification under the CNCF umbrella. It provides a common SDK API; the actual flag evaluation is done by a vendor's OpenFeature provider (LaunchDarkly, Flagsmith, Split, ConfigCat, GrowthBook, AWS AppConfig, among others). OpenFeature lets you switch vendors without rewriting the calling code, which removes the lock-in argument that vendors used to win on.

What is flag debt?

The accumulation of feature flags that should have been cleaned up but were not. Each stale flag is a line item of code debt: dead code behind the never-taken branch, extra complexity in the routing layer, and noise in the audit log. LaunchDarkly's own customer data, published in their 2023 Feature Flag Lifecycle report, shows flag-deletion rates significantly below flag-creation rates across nearly all customer cohorts.