Introduction

Shipping features without data is guessing. A/B testing transforms product decisions from opinions into evidence. The checkout button colour debate that consumes hours of meetings gets resolved in a week with actual user behaviour data.

Mobile A/B testing presents unique challenges compared to web. App store review cycles mean you cannot deploy changes instantly. Users update at different times, creating version fragmentation. Network conditions vary wildly, affecting experiment reliability.

This guide covers battle-tested approaches to mobile experimentation that work at scale. We will implement feature flags, design statistically valid experiments, and build the infrastructure to make data-driven decisions confidently.

Why Mobile A/B Testing Differs from Web

The Update Problem

Web deploys take seconds. Mobile deploys take days or weeks:

Web:      Deploy → Live in seconds → Test immediately
Mobile:   Deploy → App review (1-3 days) → User updates (weeks/months) → Test when enough users update

This delay fundamentally changes experimentation strategy. You cannot test UI changes through code deployments alone—by the time enough users update, market conditions may have changed.

The Solution: Server-Driven Configuration

// Instead of hardcoding UI decisions
const checkoutButtonColor = '#007AFF';

// Use server-driven configuration
interface ExperimentConfig {
  checkoutButtonColor: string;
  showDiscountBanner: boolean;
  pricingTier: 'standard' | 'premium' | 'enterprise';
}

const config = await fetchExperimentConfig(userId);
const checkoutButtonColor = config.checkoutButtonColor;

Server-driven configuration lets you change app behaviour without app updates. The app fetches configuration on launch, enabling real-time experimentation.

Setting Up Firebase Remote Config

Firebase Remote Config is the most accessible entry point for mobile A/B testing. It integrates with Firebase Analytics for experiment analysis and has a generous free tier.

iOS Implementation

import FirebaseAnalytics
import FirebaseRemoteConfig

class ExperimentManager: ObservableObject {
    static let shared = ExperimentManager()

    private let remoteConfig = RemoteConfig.remoteConfig()

    @Published var checkoutButtonVariant: String = "control"
    @Published var showOnboardingV2: Bool = false
    @Published var priceDisplayFormat: PriceFormat = .standard

    enum PriceFormat: String {
        case standard // $29.99
        case rounded  // $30
        case monthly  // $2.50/mo
    }

    private init() {
        setupDefaults()
        fetchConfig()
    }

    private func setupDefaults() {
        let defaults: [String: NSObject] = [
            "checkout_button_variant": "control" as NSObject,
            "show_onboarding_v2": false as NSObject,
            "price_display_format": "standard" as NSObject,
            "feature_social_sharing": false as NSObject,
            "min_cart_for_free_shipping": 100 as NSObject
        ]

        remoteConfig.setDefaults(defaults)

        // Development settings - fetch frequently during testing
        #if DEBUG
        let settings = RemoteConfigSettings()
        settings.minimumFetchInterval = 0 // No caching in debug
        remoteConfig.configSettings = settings
        #endif
    }

    func fetchConfig() {
        remoteConfig.fetchAndActivate { [weak self] status, error in
            guard let self = self else { return }

            if let error = error {
                print("Remote config fetch failed: \(error.localizedDescription)")
                return
            }

            DispatchQueue.main.async {
                self.updatePublishedValues()
                self.logExperimentExposure()
            }
        }
    }

    private func updatePublishedValues() {
        checkoutButtonVariant = remoteConfig["checkout_button_variant"].stringValue ?? "control"
        showOnboardingV2 = remoteConfig["show_onboarding_v2"].boolValue

        if let formatString = remoteConfig["price_display_format"].stringValue,
           let format = PriceFormat(rawValue: formatString) {
            priceDisplayFormat = format
        }
    }

    private func logExperimentExposure() {
        // Log that user was exposed to these experiment variants
        Analytics.logEvent("experiment_exposure", parameters: [
            "checkout_button_variant": checkoutButtonVariant,
            "onboarding_version": showOnboardingV2 ? "v2" : "v1",
            "price_format": priceDisplayFormat.rawValue
        ])
    }

    // Type-safe accessors for feature flags
    var freeShippingThreshold: Double {
        remoteConfig["min_cart_for_free_shipping"].numberValue?.doubleValue ?? 100.0
    }

    var isSocialSharingEnabled: Bool {
        remoteConfig["feature_social_sharing"].boolValue
    }
}

Android Implementation

import android.util.Log
import com.google.firebase.analytics.ktx.analytics
import com.google.firebase.analytics.ktx.logEvent
import com.google.firebase.ktx.Firebase
import com.google.firebase.remoteconfig.FirebaseRemoteConfig
import com.google.firebase.remoteconfig.FirebaseRemoteConfigSettings
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.tasks.await

class ExperimentManager private constructor() {

    private val remoteConfig = FirebaseRemoteConfig.getInstance()

    private val _checkoutButtonVariant = MutableStateFlow("control")
    val checkoutButtonVariant: StateFlow<String> = _checkoutButtonVariant

    private val _showOnboardingV2 = MutableStateFlow(false)
    val showOnboardingV2: StateFlow<Boolean> = _showOnboardingV2

    private val _priceDisplayFormat = MutableStateFlow(PriceFormat.STANDARD)
    val priceDisplayFormat: StateFlow<PriceFormat> = _priceDisplayFormat

    enum class PriceFormat(val value: String) {
        STANDARD("standard"),  // $29.99
        ROUNDED("rounded"),    // $30
        MONTHLY("monthly")     // $2.50/mo
    }

    init {
        setupDefaults()
    }

    private fun setupDefaults() {
        val defaults = mapOf(
            "checkout_button_variant" to "control",
            "show_onboarding_v2" to false,
            "price_display_format" to "standard",
            "feature_social_sharing" to false,
            "min_cart_for_free_shipping" to 100L
        )

        remoteConfig.setDefaultsAsync(defaults)

        // Development settings
        if (BuildConfig.DEBUG) {
            val configSettings = FirebaseRemoteConfigSettings.Builder()
                .setMinimumFetchIntervalInSeconds(0)
                .build()
            remoteConfig.setConfigSettingsAsync(configSettings)
        }
    }

    suspend fun fetchConfig() {
        try {
            remoteConfig.fetchAndActivate().await()
            updateStateFlows()
            logExperimentExposure()
        } catch (e: Exception) {
            Log.e("ExperimentManager", "Remote config fetch failed", e)
        }
    }

    private fun updateStateFlows() {
        _checkoutButtonVariant.value = remoteConfig.getString("checkout_button_variant")
        _showOnboardingV2.value = remoteConfig.getBoolean("show_onboarding_v2")

        val formatString = remoteConfig.getString("price_display_format")
        _priceDisplayFormat.value = PriceFormat.values()
            .find { it.value == formatString } ?: PriceFormat.STANDARD
    }

    private fun logExperimentExposure() {
        Firebase.analytics.logEvent("experiment_exposure") {
            param("checkout_button_variant", _checkoutButtonVariant.value)
            param("onboarding_version", if (_showOnboardingV2.value) "v2" else "v1")
            param("price_format", _priceDisplayFormat.value.value)
        }
    }

    val freeShippingThreshold: Double
        get() = remoteConfig.getDouble("min_cart_for_free_shipping")

    val isSocialSharingEnabled: Boolean
        get() = remoteConfig.getBoolean("feature_social_sharing")

    companion object {
        @Volatile
        private var instance: ExperimentManager? = null

        fun getInstance(): ExperimentManager {
            return instance ?: synchronized(this) {
                instance ?: ExperimentManager().also { instance = it }
            }
        }
    }
}

Designing Statistically Valid Experiments

The most common A/B testing mistake is declaring winners too early. Statistical significance requires adequate sample size.

Sample Size Calculation

Before running any experiment, calculate required sample size:

interface ExperimentDesign {
  baselineConversionRate: number;  // Current conversion rate
  minimumDetectableEffect: number; // Smallest improvement worth detecting
  statisticalPower: number;        // Usually 0.8 (80%)
  significanceLevel: number;       // Usually 0.05 (5%)
}

function calculateSampleSize(design: ExperimentDesign): number {
  const {
    baselineConversionRate: p1,
    minimumDetectableEffect: mde,
    statisticalPower,
    significanceLevel
  } = design;

  const p2 = p1 * (1 + mde); // Expected conversion rate with improvement
  const pooledP = (p1 + p2) / 2;

  // Z-scores hardcoded for the conventional significanceLevel = 0.05
  // (two-tailed) and statisticalPower = 0.8; substitute values from a
  // normal quantile table if you change those inputs
  const zAlpha = 1.96;  // For 95% confidence (two-tailed)
  const zBeta = 0.84;   // For 80% power

  // Sample size formula
  const numerator = 2 * pooledP * (1 - pooledP) * Math.pow(zAlpha + zBeta, 2);
  const denominator = Math.pow(p1 - p2, 2);

  return Math.ceil(numerator / denominator);
}

// Example: Testing checkout flow improvement
const experiment: ExperimentDesign = {
  baselineConversionRate: 0.03,  // 3% current conversion
  minimumDetectableEffect: 0.15, // Want to detect 15% improvement
  statisticalPower: 0.8,
  significanceLevel: 0.05
};

const sampleSizePerVariant = calculateSampleSize(experiment);
console.log(`Need ${sampleSizePerVariant} users per variant`);
// Output: Need 24167 users per variant

Running Duration Estimation

function estimateExperimentDuration(
  requiredSamplePerVariant: number,
  dailyActiveUsers: number,
  numberOfVariants: number,
  exposureRate: number = 1.0 // Fraction of users (0-1) who see the experiment
): number {
  const totalSampleNeeded = requiredSamplePerVariant * numberOfVariants;
  const dailyExposures = dailyActiveUsers * exposureRate;

  // Account for users who return multiple days (rough estimate)
  const uniqueUsersPerDay = dailyExposures * 0.7;

  return Math.ceil(totalSampleNeeded / uniqueUsersPerDay);
}

// Example
const daysNeeded = estimateExperimentDuration(
  8500,   // Sample per variant
  5000,   // Daily active users
  2,      // Control + 1 variant
  1.0     // 100% exposure
);
console.log(`Experiment needs approximately ${daysNeeded} days`);
// Output: Experiment needs approximately 5 days

Implementing Feature Flags with LaunchDarkly

For more sophisticated experimentation, LaunchDarkly provides advanced targeting, instant kill switches, and enterprise-grade infrastructure.

iOS LaunchDarkly Integration

import LaunchDarkly

class FeatureFlagManager: ObservableObject {
    static let shared = FeatureFlagManager()

    private var ldClient: LDClient?

    @Published var features: FeatureSet = .defaults

    struct FeatureSet {
        var newCheckoutFlow: Bool
        var aiRecommendations: Bool
        var darkModeDefault: Bool
        var maxCartItems: Int
        var promotionalBanner: PromoBanner?

        static let defaults = FeatureSet(
            newCheckoutFlow: false,
            aiRecommendations: false,
            darkModeDefault: false,
            maxCartItems: 50,
            promotionalBanner: nil
        )
    }

    struct PromoBanner: Codable {
        let title: String
        let subtitle: String
        let backgroundColor: String
        let ctaText: String
        let ctaUrl: String
    }

    func initialize(user: User) {
        let config = LDConfig(mobileKey: "mob-XXXXX-XXXXXXXX")

        var ldUser = LDUser(key: user.id)
        ldUser.email = user.email
        ldUser.custom = [
            "subscription_tier": user.subscriptionTier,
            "account_age_days": user.accountAgeDays,
            "country": user.country,
            "app_version": Bundle.main.appVersion
        ]

        LDClient.start(config: config, user: ldUser) { [weak self] in
            self?.ldClient = LDClient.get()
            self?.setupObservers()
            self?.refreshFeatures()
        }
    }

    private func setupObservers() {
        // Real-time updates when flags change
        ldClient?.observe(keys: [
            "new-checkout-flow",
            "ai-recommendations",
            "dark-mode-default",
            "max-cart-items",
            "promotional-banner"
        ], owner: self) { [weak self] _ in
            self?.refreshFeatures()
        }
    }

    private func refreshFeatures() {
        guard let client = ldClient else { return }

        DispatchQueue.main.async {
            self.features = FeatureSet(
                newCheckoutFlow: client.boolVariation(
                    forKey: "new-checkout-flow",
                    defaultValue: false
                ),
                aiRecommendations: client.boolVariation(
                    forKey: "ai-recommendations",
                    defaultValue: false
                ),
                darkModeDefault: client.boolVariation(
                    forKey: "dark-mode-default",
                    defaultValue: false
                ),
                maxCartItems: client.intVariation(
                    forKey: "max-cart-items",
                    defaultValue: 50
                ),
                promotionalBanner: self.decodePromoBanner(
                    client.jsonVariation(
                        forKey: "promotional-banner",
                        defaultValue: LDValue.null
                    )
                )
            )
        }
    }

    private func decodePromoBanner(_ value: LDValue) -> PromoBanner? {
        guard case .object = value else { return nil }

        do {
            let data = try JSONEncoder().encode(value)
            return try JSONDecoder().decode(PromoBanner.self, from: data)
        } catch {
            return nil
        }
    }

    // Track conversion events
    func trackConversion(event: String, data: [String: Any]? = nil) {
        ldClient?.track(key: event, data: data.map { LDValue(dictionaryLiteral: $0) })
    }
}

Using Feature Flags in SwiftUI

struct CheckoutView: View {
    @ObservedObject var featureFlags = FeatureFlagManager.shared
    @StateObject var viewModel: CheckoutViewModel

    var body: some View {
        VStack {
            if featureFlags.features.newCheckoutFlow {
                NewCheckoutFlowView(viewModel: viewModel)
                    .onAppear {
                        featureFlags.trackConversion(event: "checkout-started", data: [
                            "variant": "new-flow"
                        ])
                    }
            } else {
                LegacyCheckoutView(viewModel: viewModel)
                    .onAppear {
                        featureFlags.trackConversion(event: "checkout-started", data: [
                            "variant": "legacy"
                        ])
                    }
            }

            if let banner = featureFlags.features.promotionalBanner {
                PromotionalBannerView(banner: banner)
            }
        }
    }
}

struct NewCheckoutFlowView: View {
    @ObservedObject var viewModel: CheckoutViewModel

    var body: some View {
        // New streamlined checkout implementation
        ScrollView {
            VStack(spacing: 16) {
                CartSummaryCard(items: viewModel.cartItems)

                ExpressPaymentOptions(
                    applePay: viewModel.applePayAvailable,
                    googlePay: viewModel.googlePayAvailable
                )

                Divider()

                AddressSection(address: $viewModel.shippingAddress)

                PaymentSection(payment: $viewModel.paymentMethod)

                OrderSummary(
                    subtotal: viewModel.subtotal,
                    shipping: viewModel.shipping,
                    tax: viewModel.tax,
                    total: viewModel.total
                )

                PlaceOrderButton(action: viewModel.placeOrder)
            }
            .padding()
        }
    }
}

Building a Custom Experimentation Platform

For full control, build your own experimentation infrastructure. This approach suits teams with specific requirements or those wanting to avoid vendor lock-in.

Experiment Assignment Service

// experiment-service.ts
interface Experiment {
  id: string;
  name: string;
  variants: Variant[];
  targetingRules: TargetingRule[];
  startDate: Date;
  endDate: Date | null;
  status: 'draft' | 'running' | 'paused' | 'completed';
}

interface Variant {
  id: string;
  name: string;
  weight: number; // 0-100, must sum to 100 across variants
  config: Record<string, any>;
}

interface TargetingRule {
  attribute: string;
  operator: 'equals' | 'contains' | 'gt' | 'lt' | 'in';
  value: any;
}

interface UserContext {
  userId: string;
  deviceId: string;
  platform: 'ios' | 'android';
  appVersion: string;
  country: string;
  subscriptionTier: string;
  accountCreatedAt: Date;
  customAttributes: Record<string, any>;
}

class ExperimentService {
  private experiments: Map<string, Experiment> = new Map();
  private assignments: Map<string, Map<string, string>> = new Map(); // userId -> experimentId -> variantId

  // Any analytics client with an async track() method works here
  constructor(private readonly analyticsClient: { track(event: object): Promise<void> }) {}

  async getVariant(
    experimentId: string,
    user: UserContext
  ): Promise<Variant | null> {
    const experiment = this.experiments.get(experimentId);

    if (!experiment || experiment.status !== 'running') {
      return null;
    }

    // Check if user passes targeting rules
    if (!this.passesTargeting(experiment.targetingRules, user)) {
      return null;
    }

    // Check for existing assignment
    const existingAssignment = this.getExistingAssignment(user.userId, experimentId);
    if (existingAssignment) {
      return experiment.variants.find(v => v.id === existingAssignment) || null;
    }

    // Deterministic assignment based on user ID
    const variant = this.assignVariant(experiment, user.userId);

    // Store assignment
    this.storeAssignment(user.userId, experimentId, variant.id);

    // Log exposure
    await this.logExposure(experiment, variant, user);

    return variant;
  }

  private passesTargeting(rules: TargetingRule[], user: UserContext): boolean {
    return rules.every(rule => {
      const value = this.getAttribute(user, rule.attribute);

      switch (rule.operator) {
        case 'equals':
          return value === rule.value;
        case 'contains':
          return String(value).includes(rule.value);
        case 'gt':
          return Number(value) > Number(rule.value);
        case 'lt':
          return Number(value) < Number(rule.value);
        case 'in':
          return Array.isArray(rule.value) && rule.value.includes(value);
        default:
          return false;
      }
    });
  }

  // Helpers referenced above; simple in-memory versions for this sketch
  private getAttribute(user: UserContext, attribute: string): any {
    const topLevel = (user as any)[attribute];
    return topLevel !== undefined ? topLevel : user.customAttributes[attribute];
  }

  private getExistingAssignment(userId: string, experimentId: string): string | null {
    return this.assignments.get(userId)?.get(experimentId) ?? null;
  }

  private storeAssignment(userId: string, experimentId: string, variantId: string): void {
    const userAssignments = this.assignments.get(userId) ?? new Map<string, string>();
    userAssignments.set(experimentId, variantId);
    this.assignments.set(userId, userAssignments);
  }

  private assignVariant(experiment: Experiment, userId: string): Variant {
    // Deterministic hash ensures same user always gets same variant
    const hash = this.hashString(`${experiment.id}:${userId}`);
    const bucket = hash % 100;

    let cumulative = 0;
    for (const variant of experiment.variants) {
      cumulative += variant.weight;
      if (bucket < cumulative) {
        return variant;
      }
    }

    // Fallback to first variant
    return experiment.variants[0];
  }

  private hashString(str: string): number {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
      const char = str.charCodeAt(i);
      hash = ((hash << 5) - hash) + char;
      hash = hash & hash; // Convert to 32-bit integer
    }
    return Math.abs(hash);
  }

  private async logExposure(
    experiment: Experiment,
    variant: Variant,
    user: UserContext
  ): Promise<void> {
    const event = {
      eventType: 'experiment_exposure',
      experimentId: experiment.id,
      experimentName: experiment.name,
      variantId: variant.id,
      variantName: variant.name,
      userId: user.userId,
      deviceId: user.deviceId,
      platform: user.platform,
      appVersion: user.appVersion,
      timestamp: new Date().toISOString()
    };

    // Send to analytics pipeline
    await this.analyticsClient.track(event);
  }
}

Mobile SDK for Custom Platform

// iOS SDK
class CustomExperimentSDK {
    static let shared = CustomExperimentSDK()

    // Assumes ExperimentAPIClient wraps your experiment service's REST API
    // and has a parameterless initialiser (implementation elided)
    private let apiClient = ExperimentAPIClient()
    private var cachedAssignments: [String: ExperimentVariant] = [:]
    private var userContext: UserContext?

    struct UserContext: Codable {
        let userId: String
        let deviceId: String
        let platform: String
        let appVersion: String
        let country: String
        let subscriptionTier: String
        let customAttributes: [String: AnyCodable]
    }

    struct ExperimentVariant: Codable {
        let experimentId: String
        let variantId: String
        let variantName: String
        let config: [String: AnyCodable]
    }

    func initialize(userId: String, attributes: [String: Any]) {
        self.userContext = UserContext(
            userId: userId,
            deviceId: UIDevice.current.identifierForVendor?.uuidString ?? "",
            platform: "ios",
            appVersion: Bundle.main.appVersion,
            country: Locale.current.region?.identifier ?? "AU",
            subscriptionTier: attributes["subscription_tier"] as? String ?? "free",
            customAttributes: attributes.mapValues { AnyCodable($0) }
        )

        // Prefetch all active experiments
        Task {
            await prefetchExperiments()
        }
    }

    private func prefetchExperiments() async {
        guard let context = userContext else { return }

        do {
            let assignments = try await apiClient.fetchAllAssignments(context: context)

            await MainActor.run {
                for assignment in assignments {
                    cachedAssignments[assignment.experimentId] = assignment
                }
            }
        } catch {
            print("Failed to prefetch experiments: \(error)")
        }
    }

    func getVariant(experimentId: String) async -> ExperimentVariant? {
        // Return cached if available
        if let cached = cachedAssignments[experimentId] {
            return cached
        }

        // Fetch from server
        guard let context = userContext else { return nil }

        do {
            let variant = try await apiClient.getVariant(
                experimentId: experimentId,
                context: context
            )
            cachedAssignments[experimentId] = variant
            return variant
        } catch {
            print("Failed to get variant for \(experimentId): \(error)")
            return nil
        }
    }

    func trackConversion(experimentId: String, event: String, value: Double? = nil) {
        guard let variant = cachedAssignments[experimentId],
              let context = userContext else { return }

        Task {
            try? await apiClient.trackConversion(
                experimentId: experimentId,
                variantId: variant.variantId,
                event: event,
                value: value,
                context: context
            )
        }
    }
}

Analysing Experiment Results

Statistical Significance Calculator

interface ExperimentResults {
  control: VariantResults;
  treatment: VariantResults;
}

interface VariantResults {
  visitors: number;
  conversions: number;
  conversionRate: number;
}

interface AnalysisResult {
  isSignificant: boolean;
  pValue: number;
  confidenceLevel: number;
  relativeUplift: number;
  confidenceInterval: [number, number];
  recommendation: string;
}

function analyzeExperiment(results: ExperimentResults): AnalysisResult {
  const { control, treatment } = results;

  // Calculate conversion rates
  const p1 = control.conversions / control.visitors;
  const p2 = treatment.conversions / treatment.visitors;

  // Pooled proportion
  const pooledP = (control.conversions + treatment.conversions) /
                  (control.visitors + treatment.visitors);

  // Standard error
  const se = Math.sqrt(
    pooledP * (1 - pooledP) * (1/control.visitors + 1/treatment.visitors)
  );

  // Z-score
  const zScore = (p2 - p1) / se;

  // P-value (two-tailed)
  const pValue = 2 * (1 - normalCDF(Math.abs(zScore)));

  // Confidence interval for difference
  const marginOfError = 1.96 * se;
  const difference = p2 - p1;
  const confidenceInterval: [number, number] = [
    difference - marginOfError,
    difference + marginOfError
  ];

  // Relative uplift
  const relativeUplift = ((p2 - p1) / p1) * 100;

  const isSignificant = pValue < 0.05;

  let recommendation: string;
  if (!isSignificant) {
    recommendation = "No significant difference detected. Continue running the experiment or accept null hypothesis.";
  } else if (relativeUplift > 0) {
    recommendation = `Treatment shows ${relativeUplift.toFixed(1)}% improvement. Consider rolling out to all users.`;
  } else {
    recommendation = `Treatment shows ${Math.abs(relativeUplift).toFixed(1)}% decline. Recommend keeping control.`;
  }

  return {
    isSignificant,
    pValue,
    confidenceLevel: (1 - pValue) * 100,
    relativeUplift,
    confidenceInterval,
    recommendation
  };
}

// Normal CDF approximation
function normalCDF(x: number): number {
  const a1 =  0.254829592;
  const a2 = -0.284496736;
  const a3 =  1.421413741;
  const a4 = -1.453152027;
  const a5 =  1.061405429;
  const p  =  0.3275911;

  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x) / Math.sqrt(2);

  const t = 1.0 / (1.0 + p * x);
  const y = 1.0 - (((((a5 * t + a4) * t) + a3) * t + a2) * t + a1) * t * Math.exp(-x * x);

  return 0.5 * (1.0 + sign * y);
}

// Usage example
const results: ExperimentResults = {
  control: {
    visitors: 10000,
    conversions: 300,
    conversionRate: 0.03
  },
  treatment: {
    visitors: 10000,
    conversions: 360,
    conversionRate: 0.036
  }
};

const analysis = analyzeExperiment(results);
console.log(analysis);
// {
//   isSignificant: true,
//   pValue: 0.018,
//   confidenceLevel: 98.2,
//   relativeUplift: 20,
//   confidenceInterval: [0.001, 0.011],
//   recommendation: "Treatment shows 20.0% improvement. Consider rolling out to all users."
// }

Common A/B Testing Mistakes

Mistake 1: Peeking at Results Too Early

// BAD: Checking daily and stopping when "significant"
const earlyResults = {
  day1: { pValue: 0.15, significant: false },
  day2: { pValue: 0.08, significant: false },
  day3: { pValue: 0.04, significant: true }, // "Let's ship it!"
};

// This inflates false positive rate dramatically
// The 5% significance level assumes you check ONCE at the end

// GOOD: Pre-commit to duration and check only at the end
interface ExperimentProtocol {
  minimumDuration: number;  // days
  minimumSampleSize: number;
  checkDate: Date;
  earlyStoppingRules: EarlyStoppingRule[];
}

interface EarlyStoppingRule {
  type: 'harm' | 'overwhelming_success';
  threshold: number;
  minimumSample: number;
}

const protocol: ExperimentProtocol = {
  minimumDuration: 14,
  minimumSampleSize: 17000,
  checkDate: new Date('2026-09-01'),
  earlyStoppingRules: [
    // Only stop early if treatment is causing clear harm
    { type: 'harm', threshold: -0.20, minimumSample: 5000 }
  ]
};
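
To see why peeking inflates false positives, here is a rough Monte Carlo sketch (hypothetical traffic numbers, reusing the normalCDF helper from the analysis section). It simulates an A/A test (two identical variants, so any "win" is a false positive) with a significance check after every day:

// Simulate an A/A test with daily peeking at a two-proportion z-test
function simulatePeeking(
  dailyUsersPerVariant: number,
  conversionRate: number,
  days: number
): boolean {
  let conversionsA = 0;
  let conversionsB = 0;
  let usersPerVariant = 0;

  for (let day = 1; day <= days; day++) {
    usersPerVariant += dailyUsersPerVariant;
    for (let i = 0; i < dailyUsersPerVariant; i++) {
      if (Math.random() < conversionRate) conversionsA++;
      if (Math.random() < conversionRate) conversionsB++;
    }

    // Daily peek: the same z-test as analyzeExperiment
    const p1 = conversionsA / usersPerVariant;
    const p2 = conversionsB / usersPerVariant;
    const pooled = (conversionsA + conversionsB) / (2 * usersPerVariant);
    const se = Math.sqrt(pooled * (1 - pooled) * (2 / usersPerVariant));
    if (se === 0) continue;

    const pValue = 2 * (1 - normalCDF(Math.abs((p2 - p1) / se)));
    if (pValue < 0.05) return true; // stopped early: a false positive
  }
  return false;
}

// With 14 daily peeks, the false positive rate typically lands around
// 15-25% in simulations like this, far above the nominal 5%
const runs = 2000;
let falsePositives = 0;
for (let r = 0; r < runs; r++) {
  if (simulatePeeking(500, 0.03, 14)) falsePositives++;
}
console.log(`False positive rate with daily peeking: ${((100 * falsePositives) / runs).toFixed(1)}%`);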

Mistake 2: Multiple Comparisons Without Correction

// Testing 5 variants without correction
const variants = ['A', 'B', 'C', 'D', 'E'];
const significanceLevel = 0.05;

// Probability of at least one false positive:
// 1 - (1 - 0.05)^4 = 18.5%  (comparing each to control)

// Solution: Bonferroni correction
const correctedSignificance = significanceLevel / (variants.length - 1);
// 0.05 / 4 = 0.0125

// Better solution: Sequential testing or Benjamini-Hochberg
function benjaminiHochberg(pValues: number[], alpha: number = 0.05): boolean[] {
  const sorted = pValues
    .map((p, i) => ({ p, i }))
    .sort((a, b) => a.p - b.p);

  const m = pValues.length;
  const significant = new Array(m).fill(false);

  for (let k = m; k >= 1; k--) {
    const threshold = (k / m) * alpha;
    if (sorted[k - 1].p <= threshold) {
      // This and all smaller p-values are significant
      for (let j = 0; j < k; j++) {
        significant[sorted[j].i] = true;
      }
      break;
    }
  }

  return significant;
}
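
For example, applying the correction to four treatment p-values at alpha = 0.05:

// Four treatments compared against one control
const treatmentPValues = [0.012, 0.034, 0.041, 0.30];
console.log(benjaminiHochberg(treatmentPValues));
// [true, false, false, false]: only the smallest p-value survives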

Mistake 3: Ignoring Novelty and Learning Effects

// Users interact differently with new features initially
interface TimeSeriesAnalysis {
  experimentId: string;
  dailyMetrics: DailyMetric[];
}

interface DailyMetric {
  date: Date;
  controlConversionRate: number;
  treatmentConversionRate: number;
  sampleSize: number;
}

const average = (values: number[]): number =>
  values.reduce((sum, v) => sum + v, 0) / values.length;

function detectNoveltyEffect(data: TimeSeriesAnalysis): boolean {
  const metrics = data.dailyMetrics;
  if (metrics.length < 7) return false;

  // Compare first week to subsequent weeks
  const firstWeek = metrics.slice(0, 7);
  const subsequent = metrics.slice(7);

  const firstWeekUplift = average(
    firstWeek.map(d => (d.treatmentConversionRate - d.controlConversionRate) / d.controlConversionRate)
  );

  const subsequentUplift = average(
    subsequent.map(d => (d.treatmentConversionRate - d.controlConversionRate) / d.controlConversionRate)
  );

  // If uplift decreased significantly, novelty effect likely
  const decline = (firstWeekUplift - subsequentUplift) / firstWeekUplift;

  return decline > 0.3; // More than 30% decline suggests novelty effect
}

// Recommendation: Run experiments for at least 2-3 weeks to capture true long-term behaviour

Best Practices Summary

Experiment Design Checklist

  1. Define success metrics before starting
  2. Calculate required sample size based on minimum detectable effect
  3. Set experiment duration and commit to it
  4. Document targeting rules and exclusions
  5. Implement proper randomisation (deterministic by user ID)
  6. Log all exposures for accurate analysis

Technical Implementation Checklist

  1. Cache experiment assignments locally to ensure consistency
  2. Handle offline gracefully with sensible defaults
  3. Track exposure events separately from conversion events
  4. Implement kill switches for rapid rollback (items 1, 2, and 4 are sketched below)
  5. Version your experiment configs for reproducibility
  6. Monitor experiment health (sample ratio mismatch, data quality)
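
A minimal sketch of items 1, 2, and 4 (the LocalFlagStore class and the experiments_killed flag name are illustrative, not from any particular SDK): assignments from the last successful fetch are cached so a returning user sees the same variant offline, hardcoded defaults cover the first launch, and a server-controlled kill switch forces everything back to defaults:

// Hypothetical local flag store: cached assignments, defaults, kill switch
interface FlagSnapshot {
  fetchedAt: string;
  flags: Record<string, unknown>;
}

class LocalFlagStore {
  private cache: FlagSnapshot | null = null;

  constructor(private readonly defaults: Record<string, unknown>) {}

  // Persist the last successful fetch so offline launches stay consistent
  // (a real app would also write this snapshot to disk)
  applyFetched(flags: Record<string, unknown>): void {
    this.cache = { fetchedAt: new Date().toISOString(), flags };
  }

  get<T>(key: string, fallback: T): T {
    // Kill switch: the server can force every experiment back to defaults
    if (this.cache?.flags["experiments_killed"] === true) {
      return (this.defaults[key] as T) ?? fallback;
    }
    const value = this.cache?.flags[key] ?? this.defaults[key];
    return (value as T) ?? fallback;
  }
}

// First launch uses defaults; fetched values win once available
const store = new LocalFlagStore({ "new-checkout-flow": false });
console.log(store.get("new-checkout-flow", false)); // false (default)
store.applyFetched({ "new-checkout-flow": true });
console.log(store.get("new-checkout-flow", false)); // true (cached fetch)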

Analysis Checklist

  1. Wait for full sample size before analysing
  2. Check for sample ratio mismatch (observed counts should be close to the expected split; see the check below)
  3. Segment analysis (iOS vs Android, new vs returning users)
  4. Consider practical significance, not just statistical
  5. Document learnings for future experiments
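
The sample ratio mismatch check from item 2 is a chi-square goodness-of-fit test on assignment counts. A short sketch (the 10.83 critical value corresponds to p = 0.001 with one degree of freedom, a common SRM alert threshold):

// Flag suspicious traffic splits between two variants
function hasSampleRatioMismatch(
  controlCount: number,
  treatmentCount: number,
  expectedControlShare: number = 0.5
): boolean {
  const total = controlCount + treatmentCount;
  const expectedControl = total * expectedControlShare;
  const expectedTreatment = total * (1 - expectedControlShare);

  const chiSquare =
    Math.pow(controlCount - expectedControl, 2) / expectedControl +
    Math.pow(treatmentCount - expectedTreatment, 2) / expectedTreatment;

  // Chi-square critical value for 1 df at p = 0.001
  return chiSquare > 10.83;
}

// A 10000 vs 10500 split on an intended 50/50 experiment is suspicious
console.log(hasSampleRatioMismatch(10000, 10500)); // true: investigate assignment logic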

Conclusion

A/B testing transforms product development from opinion-driven to evidence-based. The technical implementation is straightforward—Firebase Remote Config gets you started in an afternoon. The challenge lies in discipline: designing experiments properly, waiting for statistical significance, and acting on results even when they contradict assumptions.

Start with high-impact experiments on critical user flows: onboarding, checkout, and retention features. Build experimentation infrastructure incrementally. Track experiment velocity as a team metric—the more experiments you run, the faster you learn.

The best product teams treat every feature as a hypothesis to validate, not a solution to ship. A/B testing provides the feedback loop that makes continuous improvement possible.


Building an app that needs experimentation infrastructure? We have helped dozens of Australian startups implement data-driven product development. Contact us to discuss your experimentation strategy.