Crash recovery architecture
Overview
Content Analytics uses PersistentHitQueue to protect against data loss during the batching window (0-5 seconds). Events are written to disk as soon as they are tracked. On the next app launch, any persisted events are read from disk back into memory for processing and only then cleared from disk, so no event is removed from disk before it is safely in memory.
How It Works
```
User tracks event
  └─> Event added to memory + disk (crash-safe)
        │
        └─> Batching (0-5 seconds)
              │
              └─> Flush triggered
                    │
                    ├─> Process accumulated events
                    ├─> Calculate aggregated metrics
                    └─> Dispatch to Edge Network (Edge guarantees delivery)
```
Architecture Components
BatchCoordinator
Responsibilities:
- Manages batching logic (count threshold and time-based flush)
- Writes incoming events to disk immediately via PersistentHitQueue
- Maintains in-memory event counters
- Triggers flush when threshold reached (10 events or 5 seconds)
- Coordinates between DirectHitProcessor and ContentAnalyticsOrchestrator
Key Methods:
Android
```
fun addAssetEvent(event: Event)
  ├─> assetHitProcessor.accumulateEvent(event)     // Add to memory
  ├─> persistEventImmediately(event, queue)        // Write to disk
  └─> checkAndFlushIfNeeded()                      // Check thresholds

suspend fun performFlush()
  ├─> val events = assetHitProcessor.processAccumulatedEvents()
  └─> [Orchestrator processes events → dispatches to Edge]
        └─> Edge guarantees delivery from here
```
iOS
```
func addAssetEvent(_ event: Event)
  ├─> assetHitProcessor.accumulateEvent(event)     // Add to memory
  ├─> persistEventImmediately(event, to: queue)    // Write to disk
  └─> checkAndFlushIfNeeded()                      // Check thresholds

func performFlush()
  ├─> let events = assetHitProcessor.processAccumulatedEvents()
  └─> [Orchestrator processes events → dispatches to Edge]
        └─> Edge guarantees delivery from here
```
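The count-or-time rule behind checkAndFlushIfNeeded() can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SDK's implementation: BatchPolicy, onEvent, shouldFlush, onFlushed, and the explicit nowMillis clock are hypothetical names, and the 5000 ms limit is the upper bound of the batching window quoted above.

```kotlin
// Hypothetical sketch of the count-or-time flush rule; not the SDK's actual code.
class BatchPolicy(
    private val maxBatchSize: Int = 10,        // count threshold from the text
    private val maxWaitMillis: Long = 5_000    // upper bound of the batching window
) {
    private var pendingCount = 0
    private var firstEventAtMillis: Long? = null

    /** Records one event and returns true if a flush should be triggered. */
    fun onEvent(nowMillis: Long): Boolean {
        if (firstEventAtMillis == null) firstEventAtMillis = nowMillis
        pendingCount++
        return shouldFlush(nowMillis)
    }

    /** True when either threshold is reached; also callable from a periodic timer. */
    fun shouldFlush(nowMillis: Long): Boolean {
        val startedAt = firstEventAtMillis ?: return false   // nothing pending
        return pendingCount >= maxBatchSize || nowMillis - startedAt >= maxWaitMillis
    }

    /** Resets the counters once a flush has been performed. */
    fun onFlushed() {
        pendingCount = 0
        firstEventAtMillis = null
    }
}
```

Passing the clock in as a parameter keeps the rule testable without real timers; a periodic timer would simply call shouldFlush with the current time.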
DirectHitProcessor
Responsibilities:
- Implements HitProcessing protocol for PersistentHitQueue integration
- Accumulates events in memory for fast batching
- On recovery: loads events from disk into memory, then clears disk (no data loss)
Event Lifecycle:
Android
```
override suspend fun processHit(entity: DataEntity): Boolean
  ├─> Decode event from disk
  ├─> Accumulate in memory (if not already present)
  └─> return true → clear from disk (event now in memory)
```
iOS
```
func processHit(entity: DataEntity, completion: (Bool) -> Void)
  ├─> Decode event from disk
  ├─> Accumulate in memory (if not already present)
  └─> completion(true) → clear from disk (event now in memory)
```
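The recovery contract above (accumulate if not already present, then acknowledge so the disk copy can be cleared) might look roughly like this. StoredHit, RecoveringProcessor, and drain are hypothetical stand-ins for illustration, not the SDK's DataEntity or DirectHitProcessor types.

```kotlin
// Hypothetical stand-ins for the SDK types; only the accumulate-then-acknowledge shape matters.
data class StoredHit(val id: String, val payload: String)

class RecoveringProcessor {
    // Insertion-ordered map keyed by event ID, so a re-delivered hit is not counted twice.
    private val accumulated = LinkedHashMap<String, StoredHit>()

    /**
     * Mirrors the processHit() contract described above: returning true tells the
     * queue that the hit may be removed from disk because it is now held in memory.
     */
    fun processHit(hit: StoredHit): Boolean {
        if (hit.id !in accumulated) accumulated[hit.id] = hit   // accumulate if not already present
        return true
    }

    /** Hands the buffered events to the flush path and empties the buffer. */
    fun drain(): List<StoredHit> {
        val events = accumulated.values.toList()
        accumulated.clear()
        return events
    }
}
```

Keying the buffer by event ID is what makes recovery idempotent in this sketch: replaying the same persisted hit twice leaves a single entry in memory.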
PersistentHitQueue (AEPServices)
Provides:
- Two separate queues: asset.events and experience.events
- SQLite-backed persistence (survives crashes, force-quit, background termination)
- Automatic processing via beginProcessing()
- Thread-safe operations
Storage:
- Events encoded as JSON via Event: Codable
- Each event wrapped with type metadata (asset or experience)
- Unique identifier: event.id.uuidString
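To make the storage layout concrete, here is a small sketch of wrapping an event with its type metadata and routing it to one of the two queues. The envelope field names (type, id, event) and the HitType, StorageEnvelope, and wrapForStorage names are assumptions for illustration; the source only states that events are JSON-encoded, tagged as asset or experience, and keyed by the event's UUID string.

```kotlin
// Hypothetical envelope layout; the real on-disk format is not specified in this document.
enum class HitType(val label: String, val queueName: String) {
    ASSET("asset", "asset.events"),
    EXPERIENCE("experience", "experience.events")
}

data class StorageEnvelope(
    val uniqueIdentifier: String,   // the event's UUID string
    val queueName: String,          // which of the two queues receives the record
    val json: String                // event JSON wrapped with its type metadata
)

/**
 * Wraps an already-encoded event. eventJson is assumed to be valid JSON and
 * eventId a UUID string, so neither is escaped here.
 */
fun wrapForStorage(eventId: String, type: HitType, eventJson: String): StorageEnvelope {
    val json = """{"type":"${type.label}","id":"$eventId","event":$eventJson}"""
    return StorageEnvelope(eventId, type.queueName, json)
}
```

Storing the type next to the payload lets a single decoder restore events from either queue without guessing which kind of event it is reading.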
Detailed Timeline Example
```
Time   │ Event                                │ Memory │ Disk │ Safe?
───────┼──────────────────────────────────────┼────────┼──────┼────────
00.00s │ User views Asset A                   │   ✓    │  ✓   │ ✅ YES
00.01s │ Event written to disk                │   ✓    │  ✓   │ ✅ YES
00.50s │ User clicks Asset B                  │   ✓    │  ✓   │ ✅ YES
01.00s │ User clicks Asset B                  │   ✓    │  ✓   │ ✅ YES
       │ [Batching window - events on disk]   │        │      │
02.00s │ Timer fires → Flush triggered        │   ✓    │  ✓   │ ✅ YES
02.01s │ Process accumulated events           │   ✓    │  ✓   │ ✅ YES
02.02s │ Calculate metrics (1 view, 2 clicks) │   ✓    │  ✓   │ ✅ YES
02.03s │ Dispatch to Edge Network             │   ✗    │  ✗   │ ✅ YES*
       │ (*Edge guarantees delivery)          │        │      │

Legend:
✓ = Present
✗ = Not present
```
Events stay on disk during the entire batching window. Once events are handed off to Edge, the Edge extension's own persistence takes over.
Crash Scenarios
Scenario 1: Crash During Batching (0-5s window)
```
Status: Events in memory + disk
Crash:  ⚡ App terminated
  └─> Memory lost ✗
  └─> Disk persists ✓

Recovery on Next Launch:
1. PersistentHitQueue.beginProcessing() starts
2. DirectHitProcessor.processHit() called for each persisted event
3. Events accumulated in memory, cleared from disk
4. Normal batch processing resumes

Result: ✅ ZERO DATA LOSS
```
Scenario 2: Crash During Flush
```
Status: Events being processed
Crash:  ⚡ App terminated mid-dispatch
  └─> Memory lost ✗
  └─> Events may still be on disk if not yet processed

Recovery on Next Launch:
1. Any remaining events on disk are recovered
2. Re-accumulated and dispatched on next flush

Result: ✅ ZERO DATA LOSS (possible duplicate if crash after Edge dispatch)
```
Scenario 3: Crash After Edge Dispatch
```
Status: Events dispatched to Edge
Crash:  ⚡ App terminated
  └─> Disk already cleared during processHit()
  └─> Edge has the events

Result: ✅ ZERO DATA LOSS - Edge guarantees delivery
```
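Scenario 1 can be modelled without any SDK code. In this sketch, FakeDisk is a hypothetical stand-in for the SQLite-backed queue and TrackerSession for one app process; a crash is modelled by discarding the session object while the disk object survives.

```kotlin
// Hypothetical in-memory model of the write-through scheme; not the SDK's classes.
class FakeDisk {
    private val rows = LinkedHashMap<String, String>()   // event ID -> encoded event
    fun write(id: String, event: String) { rows[id] = event }
    fun remove(id: String) { rows.remove(id) }
    fun readAll(): Map<String, String> = rows.toMap()    // snapshot copy, safe to iterate
}

class TrackerSession(private val disk: FakeDisk) {
    private val memory = LinkedHashMap<String, String>()

    /** Normal tracking path: add to memory, then write to disk immediately. */
    fun track(id: String, event: String) {
        memory[id] = event
        disk.write(id, event)
    }

    /** Launch-time recovery: copy each persisted event into memory, then clear it from disk. */
    fun recover() {
        for ((id, event) in disk.readAll()) {
            if (id !in memory) memory[id] = event
            disk.remove(id)
        }
    }

    /** IDs currently waiting in memory for the next flush. */
    fun pendingIds(): List<String> = memory.keys.toList()
}
```

Creating a second TrackerSession over the same FakeDisk and calling recover() brings back every event tracked by the first session, which is the behaviour the scenarios above rely on.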
Edge Network Handoff
Once events are dispatched to the Edge extension:
```
ContentAnalytics → runtime.dispatch(event) → Event Hub → Edge Extension
  └─> Edge.PersistentHitQueue
        └─> Network retries
              └─> Exponential backoff
```
Handoff Point: After eventDispatcher.dispatch() completes, the Edge extension owns persistence.
Edge Guarantees: Once Edge receives the event, it handles persistence, retries, and delivery confirmation.
Metrics Calculation
Metrics are derived from events, not stored separately:
```kotlin
// On flush (ContentAnalyticsOrchestrator.kt)
private fun buildAssetMetricsCollection(events: List<Event>): AssetMetricsCollection {
    val groupedEvents = events.groupBy { it.assetKey ?: "" }
    val metricsMap = mutableMapOf<String, AssetMetrics>()
    for ((key, events) in groupedEvents) {
        val views = events.count { it.interactionType == InteractionType.VIEW }
        val clicks = events.count { it.interactionType == InteractionType.CLICK }
        metricsMap[key] = AssetMetrics(viewCount = views, clickCount = clicks, ...)
    }
    return AssetMetricsCollection(metricsMap)
}
```
```swift
// On flush (ContentAnalyticsOrchestrator.swift)
func buildAssetMetricsCollection(from events: [Event]) -> AssetMetricsCollection {
    let groupedEvents = Dictionary(grouping: events) { $0.assetKey ?? "" }
    var metricsMap: [String: AssetMetrics] = [:]
    for (key, events) in groupedEvents {
        let views = events.filter { $0.interactionType == .view }.count
        let clicks = events.filter { $0.interactionType == .click }.count
        metricsMap[key] = AssetMetrics(viewCount: views, clickCount: clicks, ...)
    }
    return AssetMetricsCollection(metrics: metricsMap)
}
```
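This avoids state synchronization issues: there are no separately stored counters to keep in step with the event list. Counts are computed from the events at flush time, so events restored after a crash produce the same metrics.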
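Because the counts are a pure function of the event list, the order in which events arrive (live or recovered from disk) does not change the result. The following self-contained sketch uses simplified, hypothetical types (TrackedEvent, Counts, countByAsset) rather than the SDK's Event and AssetMetrics:

```kotlin
// Simplified, hypothetical types; the SDK's Event and AssetMetrics carry more fields.
enum class InteractionType { VIEW, CLICK }

data class TrackedEvent(val assetKey: String, val interactionType: InteractionType)

data class Counts(val viewCount: Int, val clickCount: Int)

/** Derives per-asset counts purely from the events; no counters are stored anywhere. */
fun countByAsset(events: List<TrackedEvent>): Map<String, Counts> =
    events.groupBy { it.assetKey }.mapValues { (_, assetEvents) ->
        Counts(
            viewCount = assetEvents.count { it.interactionType == InteractionType.VIEW },
            clickCount = assetEvents.count { it.interactionType == InteractionType.CLICK }
        )
    }
```

For the timeline example (one view of Asset A, two clicks on Asset B), this yields 1 view and 0 clicks for A, and 0 views and 2 clicks for B, whether the list is processed in its original or reversed order.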
Configuration
```json
{
  "contentanalytics.batchingEnabled": true,
  "contentanalytics.maxBatchSize": 10,
  "contentanalytics.batchFlushInterval": 2000
}
```
Parameters:
- maxBatchSize: Event count threshold (default: 10)
- batchFlushInterval: Timer interval for periodic flush in milliseconds (default: 2000 ms = 2s). Max wait time is derived from this (2.5× = 5000 ms).
- batchingEnabled: Set to false for immediate dispatch (no batching)
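As a rough sketch of how these settings could be read, the snippet below parses the three keys from a configuration map and derives the maximum wait time as 2.5× the flush interval. BatchingConfig and configFrom are hypothetical names, and the default of true for batchingEnabled is an assumption (the source gives defaults only for the other two parameters).

```kotlin
// Hypothetical config holder; key names come from the JSON above, everything else is illustrative.
data class BatchingConfig(
    val batchingEnabled: Boolean = true,           // assumed default; not stated in the source
    val maxBatchSize: Int = 10,
    val batchFlushIntervalMillis: Long = 2_000
) {
    /** Maximum wait is derived as 2.5x the flush interval (2000 ms -> 5000 ms). */
    val maxWaitMillis: Long
        get() = batchFlushIntervalMillis * 5 / 2
}

/** Reads the three settings from a configuration map, falling back to defaults. */
fun configFrom(settings: Map<String, Any?>): BatchingConfig {
    val defaults = BatchingConfig()
    return BatchingConfig(
        batchingEnabled = settings["contentanalytics.batchingEnabled"] as? Boolean
            ?: defaults.batchingEnabled,
        maxBatchSize = (settings["contentanalytics.maxBatchSize"] as? Number)?.toInt()
            ?: defaults.maxBatchSize,
        batchFlushIntervalMillis = (settings["contentanalytics.batchFlushInterval"] as? Number)?.toLong()
            ?: defaults.batchFlushIntervalMillis
    )
}
```

With the defaults, maxWaitMillis evaluates to 5000, matching the 0-5 second batching window; doubling the interval to 4000 ms would stretch the window to 10000 ms.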
Performance Characteristics
| Operation | Time | Notes |
|---|---|---|
| Event persistence | ~1-2ms | SQLite write |
| Event recovery | ~5-10ms | SQLite read on launch |
| Batch flush | ~10-20ms | Metrics calculation + Edge dispatch |
| Memory per event | ~2KB | Event object + metadata |
| Disk per event | ~1-2KB | JSON encoding |
Memory Usage: With default batch size (10), worst-case memory is ~20-40KB (negligible).
Network Efficiency: With the default batch size of 10, batching reduces Edge Network calls by up to 10x for high-volume tracking.
Thread Safety
On Android, all operations use Kotlin coroutines with a Mutex for thread-safe access:
Android
```kotlin
// BatchCoordinator
private val scope = CoroutineScope(Dispatchers.IO + SupervisorJob())
private val stateMutex = kotlinx.coroutines.sync.Mutex()

// DirectHitProcessor
private val mutex = Mutex()
// All state mutations wrapped in mutex.withLock { }
```
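The pattern of wrapping every state mutation in a lock can be illustrated without the coroutines library. Note that this sketch swaps in a plain JVM synchronized block for the coroutine Mutex shown above, purely so that it runs with the standard library alone; GuardedBuffer is a hypothetical class.

```kotlin
// Hypothetical buffer; uses a JVM synchronized block in place of the coroutine Mutex.
class GuardedBuffer {
    private val lock = Any()
    private val events = mutableListOf<String>()

    /** Every mutation happens while holding the lock. */
    fun add(event: String) {
        synchronized(lock) { events.add(event) }
    }

    /** Takes a snapshot and clears the buffer atomically, so no event is lost or seen twice. */
    fun drain(): List<String> = synchronized(lock) {
        val snapshot = events.toList()
        events.clear()
        snapshot
    }
}
```

The important part is that drain() copies and clears under the same lock acquisition; doing the two steps separately would let an event added in between be dropped.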
Testing Crash Recovery
Test 1: Crash During Batching
- Track 5 asset events
- DO NOT wait for flush timer
- Force-quit app (⌘+Q or kill process)
- Relaunch app
- Track 5 more asset events
- Wait 2 seconds for flush
- Verify: 1 Edge event with 10 aggregated interactions
Test 2: Crash During Flush
- Track 10 asset events (triggers immediate flush)
- Set breakpoint in sendToEdge()
- Force-quit app at breakpoint
- Relaunch app
- Wait 5 seconds
- Verify: Events re-dispatched (possible duplicate)
Test 3: Background Termination
- Track events
- Background app
- OS terminates app (memory pressure)
- Relaunch app
- Verify: Events recovered and dispatched
Implementation Details
Key Files
- BatchCoordinator.swift - Batching logic and persistence coordination
- DirectHitProcessor.swift - Crash recovery and event accumulation
- ContentAnalyticsOrchestrator.swift - Metrics calculation and Edge dispatch
- PersistentHitQueue (AEPServices) - SQLite-backed queue
Thread Safety
- All operations use serial dispatch queues
  - batchQueue (BatchCoordinator) - batch operations
  - queue (DirectHitProcessor) - hit processing
Data Flow
```
Event tracked
  └─> BatchCoordinator.addAssetEvent()
        ├─> DirectHitProcessor.accumulateEvent()  [memory]
        ├─> PersistentHitQueue.queue()            [disk]
        └─> checkAndFlushIfNeeded()
              └─> performFlush()
                    └─> DirectHitProcessor.processAccumulatedEvents()
                          └─> Orchestrator.processAssetEvents()
                                └─> EventDispatcher.dispatch()  [→ Edge]
```
Callback Chain Architecture
The SDK uses a callback chain to decouple components while maintaining type safety:
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            INITIALIZATION PHASE                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ContentAnalyticsFactory.createOrchestrator()                               │
│       │                                                                     │
│       ├─> Creates BatchCoordinator(assetQueue, experienceQueue, state)      │
│       │     └─> DirectHitProcessor initialized with no-op callbacks         │
│       │                                                                     │
│       ├─> Creates ContentAnalyticsOrchestrator(batchCoordinator, ...)       │
│       │                                                                     │
│       └─> Wires callbacks: batchCoordinator.setCallbacks(                   │
│               assetCallback: orchestrator.processAssetEvents,               │
│               experienceCallback: orchestrator.processExperienceEvents      │
│           )                                                                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                              RUNTIME DATA FLOW                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  User calls ContentAnalytics.trackAssetInteraction()                        │
│           │                                                                 │
│           v                                                                 │
│  ┌──────────────────┐                                                       │
│  │ BatchCoordinator │                                                       │
│  │ addAssetEvent()  │──────────────────────────────┐                        │
│  └────────┬─────────┘                              │                        │
│           │                                        │                        │
│           v                                        v                        │
│  ┌────────────────────┐                 ┌─────────────────────┐             │
│  │ DirectHitProcessor │                 │ PersistentHitQueue  │             │
│  │ accumulateEvent()  │                 │ queue() [disk]      │             │
│  │ [memory buffer]    │                 └─────────────────────┘             │
│  └────────┬───────────┘                                                     │
│           │                                                                 │
│           │ (on flush trigger: count >= 10 or timer >= 2s)                  │
│           v                                                                 │
│  ┌────────────────────────────┐                                             │
│  │ DirectHitProcessor         │                                             │
│  │ processAccumulatedEvents() │                                             │
│  └────────┬───────────────────┘                                             │
│           │                                                                 │
│           │ invokes processingCallback([events])                            │
│           v                                                                 │
│  ┌─────────────────────────────────┐                                        │
│  │ ContentAnalyticsOrchestrator    │                                        │
│  │ processAssetEvents([events])    │                                        │
│  │   ├─> Group by asset key        │                                        │
│  │   ├─> Calculate metrics         │                                        │
│  │   └─> Build XDM payload         │                                        │
│  └────────┬────────────────────────┘                                        │
│           │                                                                 │
│           v                                                                 │
│  ┌─────────────────────┐                                                    │
│  │ EdgeEventDispatcher │                                                    │
│  │ dispatch()          │──────────────> Edge Network                        │
│  └─────────────────────┘                                                    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
Callbacks avoid circular dependencies: BatchCoordinator does not need to import the Orchestrator. They also make testing easier, since mocks can be injected in place of the real callbacks.
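The wiring can be sketched with heavily simplified classes. Coordinator, Orchestrator, and BatchCallback here are hypothetical reductions of the components in the diagram (events are plain strings and only the asset callback is shown), intended only to show the no-op default and the later injection:

```kotlin
// Hypothetical, simplified shapes; the real classes take more parameters.
typealias BatchCallback = (List<String>) -> Unit

class Coordinator {
    // Starts as a no-op, as in the initialization phase, until setCallbacks() is called.
    private var assetCallback: BatchCallback = {}
    private val buffer = mutableListOf<String>()

    fun setCallbacks(assetCallback: BatchCallback) {
        this.assetCallback = assetCallback
    }

    fun addAssetEvent(event: String) {
        buffer.add(event)
    }

    /** Hands the buffered batch to whichever callback is currently wired in. */
    fun flush() {
        if (buffer.isEmpty()) return
        val batch = buffer.toList()
        buffer.clear()
        assetCallback(batch)   // the coordinator never references the orchestrator type
    }
}

class Orchestrator {
    val dispatched = mutableListOf<List<String>>()

    fun processAssetEvents(events: List<String>) {
        dispatched.add(events)
    }
}
```

Production wiring would pass orchestrator::processAssetEvents to setCallbacks, while a test can pass a lambda that records the batch it receives, with no orchestrator involved at all.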
Logging
Enable verbose logging to debug crash recovery:
```
Log.setLogLevel(.trace)
```
Look for:
```
[BATCH_PROCESSOR] Accumulated ASSET event | ID: <uuid>
[BATCH_PROCESSOR] Recovered event from disk | Type: asset | ID: <uuid>
[BATCH_PROCESSOR] Processing 5 asset events
```
Comparison with Edge Extension
| Feature | Content Analytics | Edge Extension |
|---|---|---|
| Pre-dispatch persistence | ✅ YES (0-5s) | ❌ NO |
| Batching | ✅ YES | ❌ NO |
| Post-dispatch persistence | ✅ Edge's queue | ✅ PersistentHitQueue |
| Network retries | ✅ Edge handles | ✅ Exponential backoff |
| Crash recovery during batch | ✅ FULL | N/A |
Content Analytics batches events for 0-5 seconds before dispatch. Without disk persistence during that window, crashes would lose data. Edge dispatches immediately so it doesn't need this.
Known Limitations
- No dispatch confirmation: Extensions cannot receive callbacks from Edge to confirm receipt
- Possible duplicates: Crash during Edge dispatch may cause duplicate events (Edge deduplication handles this)
- Memory overhead: Events held in memory + disk during batching (minimal: ~40KB)
