Introduction
Shipping features without data is guessing. A/B testing transforms product decisions from opinions into evidence. The checkout button colour debate that consumes hours of meetings gets resolved in a week with actual user behaviour data.
Mobile A/B testing presents unique challenges compared to web. App store review cycles mean you cannot deploy changes instantly. Users update at different times, creating version fragmentation. Network conditions vary wildly, affecting experiment reliability.
This guide covers battle-tested approaches to mobile experimentation that work at scale. We will implement feature flags, design statistically valid experiments, and build the infrastructure to make data-driven decisions confidently.
Why Mobile A/B Testing Differs from Web
The Update Problem
Web deploys take seconds. Mobile deploys take days or weeks:
Web: Deploy → Live in seconds → Test immediately
Mobile: Deploy → App review (1-3 days) → User updates (weeks/months) → Test when enough users update
This delay fundamentally changes experimentation strategy. You cannot test UI changes through code deployments alone—by the time enough users update, market conditions may have changed.
The Solution: Server-Driven Configuration
// Instead of hardcoding UI decisions
const checkoutButtonColor = '#007AFF';
// Use server-driven configuration
interface ExperimentConfig {
checkoutButtonColor: string;
showDiscountBanner: boolean;
pricingTier: 'standard' | 'premium' | 'enterprise';
}
const config = await fetchExperimentConfig(userId);
const checkoutButtonColor = config.checkoutButtonColor;
Server-driven configuration lets you change app behaviour without app updates. The app fetches configuration on launch, enabling real-time experimentation.
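In practice that means: fetch on launch, cache the last response, and fall back to bundled defaults when the network is unavailable. Here is one possible shape for the fetchExperimentConfig used above, a minimal sketch reusing the ExperimentConfig interface (the endpoint, storage key, and web APIs are illustrative; on device you would use UserDefaults, DataStore, or similar):
// Network first, cached config second, bundled defaults last
const DEFAULT_CONFIG: ExperimentConfig = {
  checkoutButtonColor: '#007AFF',
  showDiscountBanner: false,
  pricingTier: 'standard'
};

async function fetchExperimentConfig(userId: string): Promise<ExperimentConfig> {
  try {
    const response = await fetch(`https://api.example.com/config?user=${encodeURIComponent(userId)}`);
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    const config = (await response.json()) as ExperimentConfig;
    localStorage.setItem('experiment_config', JSON.stringify(config)); // cache for next launch
    return config;
  } catch {
    // Offline or server error: use the last cached config, else bundled defaults
    const cached = localStorage.getItem('experiment_config');
    return cached ? (JSON.parse(cached) as ExperimentConfig) : DEFAULT_CONFIG;
  }
}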
Setting Up Firebase Remote Config
Firebase Remote Config is the most accessible entry point for mobile A/B testing. It integrates with Firebase Analytics for experiment analysis and provides generous free tiers.
iOS Implementation
import Combine // for ObservableObject / @Published
import FirebaseAnalytics // for Analytics.logEvent
import FirebaseRemoteConfig
class ExperimentManager: ObservableObject {
static let shared = ExperimentManager()
private let remoteConfig = RemoteConfig.remoteConfig()
@Published var checkoutButtonVariant: String = "control"
@Published var showOnboardingV2: Bool = false
@Published var priceDisplayFormat: PriceFormat = .standard
enum PriceFormat: String {
case standard // $29.99
case rounded // $30
case monthly // $2.50/mo
}
private init() {
setupDefaults()
fetchConfig()
}
private func setupDefaults() {
let defaults: [String: NSObject] = [
"checkout_button_variant": "control" as NSObject,
"show_onboarding_v2": false as NSObject,
"price_display_format": "standard" as NSObject,
"feature_social_sharing": false as NSObject,
"min_cart_for_free_shipping": 100 as NSObject
]
remoteConfig.setDefaults(defaults)
// Development settings - fetch frequently during testing
#if DEBUG
let settings = RemoteConfigSettings()
settings.minimumFetchInterval = 0 // No caching in debug
remoteConfig.configSettings = settings
#endif
}
func fetchConfig() {
remoteConfig.fetchAndActivate { [weak self] status, error in
guard let self = self else { return }
if let error = error {
print("Remote config fetch failed: \(error.localizedDescription)")
return
}
DispatchQueue.main.async {
self.updatePublishedValues()
self.logExperimentExposure()
}
}
}
private func updatePublishedValues() {
checkoutButtonVariant = remoteConfig["checkout_button_variant"].stringValue ?? "control"
showOnboardingV2 = remoteConfig["show_onboarding_v2"].boolValue
if let formatString = remoteConfig["price_display_format"].stringValue,
let format = PriceFormat(rawValue: formatString) {
priceDisplayFormat = format
}
}
private func logExperimentExposure() {
// Log that user was exposed to these experiment variants
Analytics.logEvent("experiment_exposure", parameters: [
"checkout_button_variant": checkoutButtonVariant,
"onboarding_version": showOnboardingV2 ? "v2" : "v1",
"price_format": priceDisplayFormat.rawValue
])
}
// Type-safe accessors for feature flags
var freeShippingThreshold: Double {
remoteConfig["min_cart_for_free_shipping"].numberValue?.doubleValue ?? 100.0
}
var isSocialSharingEnabled: Bool {
remoteConfig["feature_social_sharing"].boolValue
}
}
Android Implementation
import android.util.Log
import com.google.firebase.analytics.ktx.analytics
import com.google.firebase.analytics.ktx.logEvent
import com.google.firebase.ktx.Firebase
import com.google.firebase.remoteconfig.FirebaseRemoteConfig
import com.google.firebase.remoteconfig.FirebaseRemoteConfigSettings
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.tasks.await
class ExperimentManager private constructor() {
private val remoteConfig = FirebaseRemoteConfig.getInstance()
private val _checkoutButtonVariant = MutableStateFlow("control")
val checkoutButtonVariant: StateFlow<String> = _checkoutButtonVariant
private val _showOnboardingV2 = MutableStateFlow(false)
val showOnboardingV2: StateFlow<Boolean> = _showOnboardingV2
private val _priceDisplayFormat = MutableStateFlow(PriceFormat.STANDARD)
val priceDisplayFormat: StateFlow<PriceFormat> = _priceDisplayFormat
enum class PriceFormat(val value: String) {
STANDARD("standard"), // $29.99
ROUNDED("rounded"), // $30
MONTHLY("monthly") // $2.50/mo
}
init {
setupDefaults()
}
private fun setupDefaults() {
val defaults = mapOf(
"checkout_button_variant" to "control",
"show_onboarding_v2" to false,
"price_display_format" to "standard",
"feature_social_sharing" to false,
"min_cart_for_free_shipping" to 100L
)
remoteConfig.setDefaultsAsync(defaults)
// Development settings
if (BuildConfig.DEBUG) {
val configSettings = FirebaseRemoteConfigSettings.Builder()
.setMinimumFetchIntervalInSeconds(0)
.build()
remoteConfig.setConfigSettingsAsync(configSettings)
}
}
suspend fun fetchConfig() {
try {
remoteConfig.fetchAndActivate().await()
updateStateFlows()
logExperimentExposure()
} catch (e: Exception) {
Log.e("ExperimentManager", "Remote config fetch failed", e)
}
}
private fun updateStateFlows() {
_checkoutButtonVariant.value = remoteConfig.getString("checkout_button_variant")
_showOnboardingV2.value = remoteConfig.getBoolean("show_onboarding_v2")
val formatString = remoteConfig.getString("price_display_format")
_priceDisplayFormat.value = PriceFormat.values()
.find { it.value == formatString } ?: PriceFormat.STANDARD
}
private fun logExperimentExposure() {
Firebase.analytics.logEvent("experiment_exposure") {
param("checkout_button_variant", _checkoutButtonVariant.value)
param("onboarding_version", if (_showOnboardingV2.value) "v2" else "v1")
param("price_format", _priceDisplayFormat.value.value)
}
}
val freeShippingThreshold: Double
get() = remoteConfig.getDouble("min_cart_for_free_shipping")
val isSocialSharingEnabled: Boolean
get() = remoteConfig.getBoolean("feature_social_sharing")
companion object {
@Volatile
private var instance: ExperimentManager? = null
fun getInstance(): ExperimentManager {
return instance ?: synchronized(this) {
instance ?: ExperimentManager().also { instance = it }
}
}
}
}
Designing Statistically Valid Experiments
The most common A/B testing mistake is declaring winners too early. Statistical significance requires adequate sample size.
Sample Size Calculation
Before running any experiment, calculate required sample size:
interface ExperimentDesign {
baselineConversionRate: number; // Current conversion rate
minimumDetectableEffect: number; // Smallest improvement worth detecting
statisticalPower: number; // Usually 0.8 (80%)
significanceLevel: number; // Usually 0.05 (5%)
}
function calculateSampleSize(design: ExperimentDesign): number {
const {
baselineConversionRate: p1,
minimumDetectableEffect: mde,
statisticalPower,
significanceLevel
} = design;
const p2 = p1 * (1 + mde); // Expected conversion rate with improvement
const pooledP = (p1 + p2) / 2;
// Z-scores hardcoded for the common case (significanceLevel = 0.05, statisticalPower = 0.8)
const zAlpha = 1.96; // For 95% confidence (two-tailed)
const zBeta = 0.84; // For 80% power
// Sample size formula
const numerator = 2 * pooledP * (1 - pooledP) * Math.pow(zAlpha + zBeta, 2);
const denominator = Math.pow(p1 - p2, 2);
return Math.ceil(numerator / denominator);
}
// Example: Testing checkout flow improvement
const experiment: ExperimentDesign = {
baselineConversionRate: 0.03, // 3% current conversion
minimumDetectableEffect: 0.15, // Want to detect 15% improvement
statisticalPower: 0.8,
significanceLevel: 0.05
};
const sampleSizePerVariant = calculateSampleSize(experiment);
console.log(`Need ${sampleSizePerVariant} users per variant`);
// Output: Need approximately 24,000 users per variant
Running Duration Estimation
function estimateExperimentDuration(
requiredSamplePerVariant: number,
dailyActiveUsers: number,
numberOfVariants: number,
exposureRate: number = 1.0 // Fraction (0-1) of users who see the experiment
): number {
const totalSampleNeeded = requiredSamplePerVariant * numberOfVariants;
const dailyExposures = dailyActiveUsers * exposureRate;
// Account for users who return multiple days (rough estimate)
const uniqueUsersPerDay = dailyExposures * 0.7;
return Math.ceil(totalSampleNeeded / uniqueUsersPerDay);
}
// Example
const daysNeeded = estimateExperimentDuration(
24167, // Sample per variant (from the calculation above)
5000, // Daily active users
2, // Control + 1 variant
1.0 // 100% exposure
);
console.log(`Experiment needs approximately ${daysNeeded} days`);
// Output: Experiment needs approximately 14 days
Implementing Feature Flags with LaunchDarkly
For more sophisticated experimentation, LaunchDarkly provides advanced targeting, instant kill switches, and enterprise-grade infrastructure.
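The kill-switch idea is worth spelling out before the integration code: gate the risky path behind a flag that defaults to off, so flipping the flag in the dashboard reverts every session to safe behaviour without a release. A platform-neutral sketch (the flag shape and the stand-in ranking logic are ours, not LaunchDarkly API):
// Kill-switch pattern: the new code path is opt-in and revocable server-side
interface Flags {
  aiRecommendations: boolean;
}

function getRecommendations(flags: Flags, catalogue: string[]): string[] {
  if (!flags.aiRecommendations) {
    return catalogue; // flag off, SDK unreachable, or rollback: safe legacy behaviour
  }
  // Trivial stand-in for the model-backed ranking guarded by the switch
  return [...catalogue].sort((a, b) => a.length - b.length);
}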
iOS LaunchDarkly Integration
import LaunchDarkly
class FeatureFlagManager: ObservableObject {
static let shared = FeatureFlagManager()
private var ldClient: LDClient?
@Published var features: FeatureSet = .defaults
struct FeatureSet {
var newCheckoutFlow: Bool
var aiRecommendations: Bool
var darkModeDefault: Bool
var maxCartItems: Int
var promotionalBanner: PromoBanner?
static let defaults = FeatureSet(
newCheckoutFlow: false,
aiRecommendations: false,
darkModeDefault: false,
maxCartItems: 50,
promotionalBanner: nil
)
}
struct PromoBanner: Codable {
let title: String
let subtitle: String
let backgroundColor: String
let ctaText: String
let ctaUrl: String
}
func initialize(user: User) {
let config = LDConfig(mobileKey: "mob-XXXXX-XXXXXXXX")
var ldUser = LDUser(key: user.id)
ldUser.email = user.email
ldUser.custom = [
"subscription_tier": user.subscriptionTier,
"account_age_days": user.accountAgeDays,
"country": user.country,
"app_version": Bundle.main.appVersion
]
LDClient.start(config: config, user: ldUser) { [weak self] in
self?.ldClient = LDClient.get()
self?.setupObservers()
self?.refreshFeatures()
}
}
private func setupObservers() {
// Real-time updates when flags change
ldClient?.observe(keys: [
"new-checkout-flow",
"ai-recommendations",
"dark-mode-default",
"max-cart-items",
"promotional-banner"
], owner: self) { [weak self] _ in
self?.refreshFeatures()
}
}
private func refreshFeatures() {
guard let client = ldClient else { return }
DispatchQueue.main.async {
self.features = FeatureSet(
newCheckoutFlow: client.boolVariation(
forKey: "new-checkout-flow",
defaultValue: false
),
aiRecommendations: client.boolVariation(
forKey: "ai-recommendations",
defaultValue: false
),
darkModeDefault: client.boolVariation(
forKey: "dark-mode-default",
defaultValue: false
),
maxCartItems: client.intVariation(
forKey: "max-cart-items",
defaultValue: 50
),
promotionalBanner: self.decodePromoBanner(
client.jsonVariation(
forKey: "promotional-banner",
defaultValue: LDValue.null
)
)
)
}
}
private func decodePromoBanner(_ value: LDValue) -> PromoBanner? {
guard case .object = value else { return nil }
do {
let data = try JSONEncoder().encode(value)
return try JSONDecoder().decode(PromoBanner.self, from: data)
} catch {
return nil
}
}
// Track conversion events
func trackConversion(event: String, data: [String: LDValue]? = nil) {
ldClient?.track(key: event, data: data.map { LDValue.object($0) })
}
}
Using Feature Flags in SwiftUI
struct CheckoutView: View {
@ObservedObject var featureFlags = FeatureFlagManager.shared
@StateObject var viewModel: CheckoutViewModel
var body: some View {
VStack {
if featureFlags.features.newCheckoutFlow {
NewCheckoutFlowView(viewModel: viewModel)
.onAppear {
featureFlags.trackConversion(event: "checkout-started", data: [
"variant": "new-flow"
])
}
} else {
LegacyCheckoutView(viewModel: viewModel)
.onAppear {
featureFlags.trackConversion(event: "checkout-started", data: [
"variant": "legacy"
])
}
}
if let banner = featureFlags.features.promotionalBanner {
PromotionalBannerView(banner: banner)
}
}
}
}
struct NewCheckoutFlowView: View {
@ObservedObject var viewModel: CheckoutViewModel
var body: some View {
// New streamlined checkout implementation
ScrollView {
VStack(spacing: 16) {
CartSummaryCard(items: viewModel.cartItems)
ExpressPaymentOptions(
applePay: viewModel.applePayAvailable,
googlePay: viewModel.googlePayAvailable
)
Divider()
AddressSection(address: $viewModel.shippingAddress)
PaymentSection(payment: $viewModel.paymentMethod)
OrderSummary(
subtotal: viewModel.subtotal,
shipping: viewModel.shipping,
tax: viewModel.tax,
total: viewModel.total
)
PlaceOrderButton(action: viewModel.placeOrder)
}
.padding()
}
}
}
Building a Custom Experimentation Platform
For full control, build your own experimentation infrastructure. This approach suits teams with specific requirements or those wanting to avoid vendor lock-in.
Experiment Assignment Service
// experiment-service.ts
interface Experiment {
id: string;
name: string;
variants: Variant[];
targetingRules: TargetingRule[];
startDate: Date;
endDate: Date | null;
status: 'draft' | 'running' | 'paused' | 'completed';
}
interface Variant {
id: string;
name: string;
weight: number; // 0-100, must sum to 100 across variants
config: Record<string, any>;
}
interface TargetingRule {
attribute: string;
operator: 'equals' | 'contains' | 'gt' | 'lt' | 'in';
value: any;
}
interface UserContext {
userId: string;
deviceId: string;
platform: 'ios' | 'android';
appVersion: string;
country: string;
subscriptionTier: string;
accountCreatedAt: Date;
customAttributes: Record<string, any>;
}
class ExperimentService {
private experiments: Map<string, Experiment> = new Map();
private assignments: Map<string, Map<string, string>> = new Map(); // userId -> experimentId -> variantId
private analyticsClient!: { track(event: object): Promise<void> }; // injected analytics pipeline client (wiring elided)
async getVariant(
experimentId: string,
user: UserContext
): Promise<Variant | null> {
const experiment = this.experiments.get(experimentId);
if (!experiment || experiment.status !== 'running') {
return null;
}
// Check if user passes targeting rules
if (!this.passesTargeting(experiment.targetingRules, user)) {
return null;
}
// Check for existing assignment
const existingAssignment = this.getExistingAssignment(user.userId, experimentId);
if (existingAssignment) {
return experiment.variants.find(v => v.id === existingAssignment) || null;
}
// Deterministic assignment based on user ID
const variant = this.assignVariant(experiment, user.userId);
// Store assignment
this.storeAssignment(user.userId, experimentId, variant.id);
// Log exposure
await this.logExposure(experiment, variant, user);
return variant;
}
private passesTargeting(rules: TargetingRule[], user: UserContext): boolean {
return rules.every(rule => {
const value = this.getAttribute(user, rule.attribute);
switch (rule.operator) {
case 'equals':
return value === rule.value;
case 'contains':
return String(value).includes(rule.value);
case 'gt':
return Number(value) > Number(rule.value);
case 'lt':
return Number(value) < Number(rule.value);
case 'in':
return Array.isArray(rule.value) && rule.value.includes(value);
default:
return false;
}
});
}
private assignVariant(experiment: Experiment, userId: string): Variant {
// Deterministic hash ensures same user always gets same variant
const hash = this.hashString(`${experiment.id}:${userId}`);
const bucket = hash % 100;
let cumulative = 0;
for (const variant of experiment.variants) {
cumulative += variant.weight;
if (bucket < cumulative) {
return variant;
}
}
// Fallback to first variant
return experiment.variants[0];
}
private hashString(str: string): number {
let hash = 0;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash; // Convert to 32-bit integer
}
return Math.abs(hash);
}
private async logExposure(
experiment: Experiment,
variant: Variant,
user: UserContext
): Promise<void> {
const event = {
eventType: 'experiment_exposure',
experimentId: experiment.id,
experimentName: experiment.name,
variantId: variant.id,
variantName: variant.name,
userId: user.userId,
deviceId: user.deviceId,
platform: user.platform,
appVersion: user.appVersion,
timestamp: new Date().toISOString()
};
// Send to analytics pipeline
await this.analyticsClient.track(event);
}
private getExistingAssignment(userId: string, experimentId: string): string | null {
return this.assignments.get(userId)?.get(experimentId) ?? null;
}
private storeAssignment(userId: string, experimentId: string, variantId: string): void {
const userAssignments = this.assignments.get(userId) ?? new Map<string, string>();
userAssignments.set(experimentId, variantId);
this.assignments.set(userId, userAssignments);
}
private getAttribute(user: UserContext, attribute: string): any {
return attribute in user ? (user as any)[attribute] : user.customAttributes[attribute];
}
}
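Resolving a flag is then a single call. A usage sketch with a hypothetical experiment ID (assumes the experiment has already been registered in the experiments map):
// Resolve a variant for one user; null means control experience
const service = new ExperimentService();

async function resolveCheckoutVariant(user: UserContext): Promise<string> {
  const variant = await service.getVariant('checkout-redesign', user);
  // null: experiment not running or user excluded by targeting
  return (variant?.config['buttonColor'] as string) ?? '#007AFF';
}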
Mobile SDK for Custom Platform
// iOS SDK
class CustomExperimentSDK {
static let shared = CustomExperimentSDK()
private let apiClient: ExperimentAPIClient
private var cachedAssignments: [String: ExperimentVariant] = [:]
private var userContext: UserContext?
struct UserContext: Codable {
let userId: String
let deviceId: String
let platform: String
let appVersion: String
let country: String
let subscriptionTier: String
let customAttributes: [String: AnyCodable]
}
struct ExperimentVariant: Codable {
let experimentId: String
let variantId: String
let variantName: String
let config: [String: AnyCodable]
}
func initialize(userId: String, attributes: [String: Any]) {
self.userContext = UserContext(
userId: userId,
deviceId: UIDevice.current.identifierForVendor?.uuidString ?? "",
platform: "ios",
appVersion: Bundle.main.appVersion,
country: Locale.current.region?.identifier ?? "AU",
subscriptionTier: attributes["subscription_tier"] as? String ?? "free",
customAttributes: attributes.mapValues { AnyCodable($0) }
)
// Prefetch all active experiments
Task {
await prefetchExperiments()
}
}
private func prefetchExperiments() async {
guard let context = userContext else { return }
do {
let assignments = try await apiClient.fetchAllAssignments(context: context)
await MainActor.run {
for assignment in assignments {
cachedAssignments[assignment.experimentId] = assignment
}
}
} catch {
print("Failed to prefetch experiments: \(error)")
}
}
func getVariant(experimentId: String) async -> ExperimentVariant? {
// Return cached if available
if let cached = cachedAssignments[experimentId] {
return cached
}
// Fetch from server
guard let context = userContext else { return nil }
do {
let variant = try await apiClient.getVariant(
experimentId: experimentId,
context: context
)
cachedAssignments[experimentId] = variant
return variant
} catch {
print("Failed to get variant for \(experimentId): \(error)")
return nil
}
}
func trackConversion(experimentId: String, event: String, value: Double? = nil) {
guard let variant = cachedAssignments[experimentId],
let context = userContext else { return }
Task {
try? await apiClient.trackConversion(
experimentId: experimentId,
variantId: variant.variantId,
event: event,
value: value,
context: context
)
}
}
}
Analysing Experiment Results
Statistical Significance Calculator
interface ExperimentResults {
control: VariantResults;
treatment: VariantResults;
}
interface VariantResults {
visitors: number;
conversions: number;
conversionRate: number;
}
interface AnalysisResult {
isSignificant: boolean;
pValue: number;
confidenceLevel: number;
relativeUplift: number;
confidenceInterval: [number, number];
recommendation: string;
}
function analyzeExperiment(results: ExperimentResults): AnalysisResult {
const { control, treatment } = results;
// Calculate conversion rates
const p1 = control.conversions / control.visitors;
const p2 = treatment.conversions / treatment.visitors;
// Pooled proportion
const pooledP = (control.conversions + treatment.conversions) /
(control.visitors + treatment.visitors);
// Standard error
const se = Math.sqrt(
pooledP * (1 - pooledP) * (1/control.visitors + 1/treatment.visitors)
);
// Z-score
const zScore = (p2 - p1) / se;
// P-value (two-tailed)
const pValue = 2 * (1 - normalCDF(Math.abs(zScore)));
// Confidence interval for difference
const marginOfError = 1.96 * se;
const difference = p2 - p1;
const confidenceInterval: [number, number] = [
difference - marginOfError,
difference + marginOfError
];
// Relative uplift
const relativeUplift = ((p2 - p1) / p1) * 100;
const isSignificant = pValue < 0.05;
let recommendation: string;
if (!isSignificant) {
recommendation = "No significant difference detected. Continue running the experiment or accept null hypothesis.";
} else if (relativeUplift > 0) {
recommendation = `Treatment shows ${relativeUplift.toFixed(1)}% improvement. Consider rolling out to all users.`;
} else {
recommendation = `Treatment shows ${Math.abs(relativeUplift).toFixed(1)}% decline. Recommend keeping control.`;
}
return {
isSignificant,
pValue,
confidenceLevel: (1 - pValue) * 100,
relativeUplift,
confidenceInterval,
recommendation
};
}
// Normal CDF approximation
function normalCDF(x: number): number {
const a1 = 0.254829592;
const a2 = -0.284496736;
const a3 = 1.421413741;
const a4 = -1.453152027;
const a5 = 1.061405429;
const p = 0.3275911;
const sign = x < 0 ? -1 : 1;
x = Math.abs(x) / Math.sqrt(2);
const t = 1.0 / (1.0 + p * x);
const y = 1.0 - (((((a5 * t + a4) * t) + a3) * t + a2) * t + a1) * t * Math.exp(-x * x);
return 0.5 * (1.0 + sign * y);
}
// Usage example
const results: ExperimentResults = {
control: {
visitors: 10000,
conversions: 300,
conversionRate: 0.03
},
treatment: {
visitors: 10000,
conversions: 360,
conversionRate: 0.036
}
};
const analysis = analyzeExperiment(results);
console.log(analysis);
// {
// isSignificant: true,
// pValue: 0.018,
// confidenceLevel: 98.2,
// relativeUplift: 20,
// confidenceInterval: [0.001, 0.011],
// recommendation: "Treatment shows 20.0% improvement. Consider rolling out to all users."
// }
Common A/B Testing Mistakes
Mistake 1: Peeking at Results Too Early
// BAD: Checking daily and stopping when "significant"
const earlyResults = {
day1: { pValue: 0.15, significant: false },
day2: { pValue: 0.08, significant: false },
day3: { pValue: 0.04, significant: true }, // "Let's ship it!"
};
// This inflates false positive rate dramatically
// The 5% significance level assumes you check ONCE at the end
// GOOD: Pre-commit to duration and check only at the end
interface ExperimentProtocol {
minimumDuration: number; // days
minimumSampleSize: number;
checkDate: Date;
earlyStoppingRules: EarlyStoppingRule[];
}
interface EarlyStoppingRule {
type: 'harm' | 'overwhelming_success';
threshold: number;
minimumSample: number;
}
const protocol: ExperimentProtocol = {
minimumDuration: 14,
minimumSampleSize: 17000,
checkDate: new Date('2026-09-01'),
earlyStoppingRules: [
// Only stop early if treatment is causing clear harm
{ type: 'harm', threshold: -0.20, minimumSample: 5000 }
]
};
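The early-stopping rule above still needs something to evaluate it. A sketch of the check a monitoring job might run daily (the function is ours, and only the 'harm' rule type is handled):
// Stop only for clear harm, never for early "significance"
function shouldStopEarly(
  plan: ExperimentProtocol,
  observedRelativeUplift: number, // e.g. -0.25 for a 25% decline vs control
  currentSamplePerVariant: number
): boolean {
  return plan.earlyStoppingRules.some(rule =>
    rule.type === 'harm' &&
    currentSamplePerVariant >= rule.minimumSample &&
    observedRelativeUplift <= rule.threshold
  );
}

shouldStopEarly(protocol, -0.25, 6000); // true: past the -20% harm threshold with enough sample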
Mistake 2: Multiple Comparisons Without Correction
// Testing 5 variants without correction
const variants = ['A', 'B', 'C', 'D', 'E'];
const significanceLevel = 0.05;
// Probability of at least one false positive:
// 1 - (1 - 0.05)^4 = 18.5% (comparing each to control)
// Solution: Bonferroni correction
const correctedSignificance = significanceLevel / (variants.length - 1);
// 0.05 / 4 = 0.0125
// Better solution: Sequential testing or Benjamini-Hochberg
function benjaminiHochberg(pValues: number[], alpha: number = 0.05): boolean[] {
const sorted = pValues
.map((p, i) => ({ p, i }))
.sort((a, b) => a.p - b.p);
const m = pValues.length;
const significant = new Array(m).fill(false);
for (let k = m; k >= 1; k--) {
const threshold = (k / m) * alpha;
if (sorted[k - 1].p <= threshold) {
// This and all smaller p-values are significant
for (let j = 0; j < k; j++) {
significant[sorted[j].i] = true;
}
break;
}
}
return significant;
}
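A quick usage example: with four variant-vs-control p-values, only the smallest survives the correction.
// Four variants compared against control
const pValues = [0.01, 0.04, 0.03, 0.20];
console.log(benjaminiHochberg(pValues));
// [true, false, false, false]: only p = 0.01 clears its threshold of (1/4) * 0.05 = 0.0125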
Mistake 3: Ignoring Novelty and Learning Effects
// Users interact differently with new features initially
interface TimeSeriesAnalysis {
experimentId: string;
dailyMetrics: DailyMetric[];
}
interface DailyMetric {
date: Date;
controlConversionRate: number;
treatmentConversionRate: number;
sampleSize: number;
}
const average = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

function detectNoveltyEffect(data: TimeSeriesAnalysis): boolean {
const metrics = data.dailyMetrics;
if (metrics.length < 7) return false;
// Compare first week to subsequent weeks
const firstWeek = metrics.slice(0, 7);
const subsequent = metrics.slice(7);
const firstWeekUplift = average(
firstWeek.map(d => (d.treatmentConversionRate - d.controlConversionRate) / d.controlConversionRate)
);
const subsequentUplift = average(
subsequent.map(d => (d.treatmentConversionRate - d.controlConversionRate) / d.controlConversionRate)
);
// If uplift decreased significantly, novelty effect likely
const decline = (firstWeekUplift - subsequentUplift) / firstWeekUplift;
return decline > 0.3; // More than 30% decline suggests novelty effect
}
// Recommendation: Run experiments for at least 2-3 weeks to capture true long-term behaviour
Best Practices Summary
Experiment Design Checklist
- Define success metrics before starting
- Calculate required sample size based on minimum detectable effect
- Set experiment duration and commit to it
- Document targeting rules and exclusions
- Implement proper randomisation (deterministic by user ID)
- Log all exposures for accurate analysis
Technical Implementation Checklist
- Cache experiment assignments locally to ensure consistency (see the sketch after this list)
- Handle offline gracefully with sensible defaults
- Track exposure events separately from conversion events
- Implement kill switches for rapid rollback
- Version your experiment configs for reproducibility
- Monitor experiment health (sample ratio mismatch, data quality)
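A sketch of the first two items, persisting assignments so a user never switches variants mid-experiment (web storage APIs used for brevity; swap in UserDefaults or DataStore on device):
// Persist assignments across launches and offline sessions
interface StoredAssignments {
  [experimentId: string]: string; // experimentId -> variantId
}

function getAssignment(experimentId: string): string | null {
  const raw = localStorage.getItem('experiment_assignments');
  const stored: StoredAssignments = raw ? JSON.parse(raw) : {};
  return stored[experimentId] ?? null;
}

function saveAssignment(experimentId: string, variantId: string): void {
  const raw = localStorage.getItem('experiment_assignments');
  const stored: StoredAssignments = raw ? JSON.parse(raw) : {};
  stored[experimentId] = variantId;
  localStorage.setItem('experiment_assignments', JSON.stringify(stored));
}

// Offline or first launch before the fetch completes: fall back to control
const variantId = getAssignment('checkout-redesign') ?? 'control';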
Analysis Checklist
- Wait for full sample size before analysing
- Check for sample ratio mismatch (should be close to the expected split; see the sketch after this list)
- Segment analysis (iOS vs Android, new vs returning users)
- Consider practical significance, not just statistical
- Document learnings for future experiments
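Sample ratio mismatch, flagged above, is easy to automate: under a 50/50 split the control count is binomially distributed, so a large z-score signals broken assignment or exposure logging. A sketch for the two-variant case:
// Flag sample ratio mismatch for a two-variant experiment
function hasSampleRatioMismatch(
  controlCount: number,
  treatmentCount: number,
  expectedRatio: number = 0.5
): boolean {
  const total = controlCount + treatmentCount;
  const expected = total * expectedRatio;
  const sd = Math.sqrt(total * expectedRatio * (1 - expectedRatio));
  const zScore = Math.abs(controlCount - expected) / sd;
  // |z| > 3 is roughly p < 0.003: unlikely to be chance, so investigate before trusting results
  return zScore > 3;
}

hasSampleRatioMismatch(10000, 10000); // false: perfect split
hasSampleRatioMismatch(10300, 9700); // true: 3 sigma is only ~212 users here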
Conclusion
A/B testing transforms product development from opinion-driven to evidence-based. The technical implementation is straightforward—Firebase Remote Config gets you started in an afternoon. The challenge lies in discipline: designing experiments properly, waiting for statistical significance, and acting on results even when they contradict assumptions.
Start with high-impact experiments on critical user flows: onboarding, checkout, and retention features. Build experimentation infrastructure incrementally. Track experiment velocity as a team metric—the more experiments you run, the faster you learn.
The best product teams treat every feature as a hypothesis to validate, not a solution to ship. A/B testing provides the feedback loop that makes continuous improvement possible.
Building an app that needs experimentation infrastructure? We have helped dozens of Australian startups implement data-driven product development. Contact us to discuss your experimentation strategy.