Hindsight
2025 · Creator
Overview
Hindsight is a cross-platform desktop application for gaze tracking, screen recording, and webcam capture. It records what users look at on screen by combining real-time eye tracking with synchronized video recording, producing timestamped gaze data alongside the captured footage. The application ships with multiple gaze estimation models, streams video directly to disk for constant memory usage, and supports cloud uploads to Azure Blob Storage and AWS S3.
Built with Electron and TypeScript, Hindsight runs on macOS, Windows, and Linux with platform-specific installers for each. It features a deep link protocol (hindsight://) for third-party integration, webhook event notifications, and a setup wizard that guides users through calibration and configuration.
The core challenge was getting accurate gaze estimation to run in real-time inside a browser-based renderer process, while simultaneously recording multiple video streams without blowing up memory. The solution was a service-oriented architecture with streaming I/O, pluggable gaze providers, and a centralized event bus.
Architecture
Hindsight follows Electron's two-process model — a Node.js main process for system-level operations and a Chromium renderer process for the UI and computer vision pipeline. All services communicate through an event-driven architecture with a dependency injection container for loose coupling.
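As a rough illustration of the dependency injection idea, the sketch below shows a minimal container that creates services lazily and hands them to whoever asks. The `Container`, `Logger`, and `Recorder` names and the string tokens are hypothetical and chosen only for this example; Hindsight's real container may look quite different.

```typescript
// Minimal dependency-injection container: services are registered as
// factories and created lazily, once, on first resolve.
type Factory<T> = (container: Container) => T;

class Container {
  private factories = new Map<string, Factory<unknown>>();
  private instances = new Map<string, unknown>();

  register<T>(token: string, factory: Factory<T>): void {
    this.factories.set(token, factory);
  }

  resolve<T>(token: string): T {
    if (this.instances.has(token)) return this.instances.get(token) as T;
    const factory = this.factories.get(token);
    if (!factory) throw new Error(`No service registered for "${token}"`);
    const instance = factory(this);
    this.instances.set(token, instance); // cache so every consumer shares it
    return instance as T;
  }
}

// Hypothetical services, used only to show the wiring.
class Logger {
  lines: string[] = [];
  log(message: string): void {
    this.lines.push(message);
  }
}

class Recorder {
  constructor(private readonly logger: Logger) {}
  start(): void {
    this.logger.log("recording started");
  }
}

const container = new Container();
container.register("logger", () => new Logger());
container.register("recorder", (c) => new Recorder(c.resolve<Logger>("logger")));
```

Because `Recorder` only receives a `Logger` through its constructor, a test can register a fake logger under the same token without touching the recorder.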
Gaze Estimation Models
Gaze tracking is the hardest part of the application. Getting it right means combining face detection, landmark extraction, and gaze regression — all running in real-time within a browser. Hindsight implements multiple approaches behind a pluggable provider interface, so the estimation strategy can be swapped without touching the rest of the application.
WebGazer — Ridge Regression
The primary gaze provider. WebGazer uses a combination of BlazeFace for face detection, a custom eye patch CNN for feature extraction, and ridge regression for mapping eye features to screen coordinates. Users calibrate by clicking 16 screen positions (5 clicks each, 80 total), building a per-session regression model that adapts to their head position and screen geometry.
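For reference, the textbook closed-form ridge regression estimate is shown below. The symbols are introduced here for illustration: $X$ is the matrix of eye-feature vectors collected during calibration, $y$ holds the matching click coordinates along one screen axis (one fit per axis), $\lambda$ is the regularization strength, and $x_{\text{new}}$ is the feature vector of a new frame. WebGazer's actual feature layout and choice of $\lambda$ are not described in this write-up.

```latex
\hat{w} = \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} y,
\qquad
\hat{s} = x_{\text{new}}^{\top} \hat{w}
```

With 80 calibration clicks, $X$ has 80 rows, and the $\lambda I$ term keeps the inverse well conditioned even when neighbouring eye-patch features are strongly correlated.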
WebGazer runs entirely in the browser via TensorFlow.js. The ML models (BlazeFace, Facemesh) are served from a local HTTP server on port 47821 to avoid Chromium's security restrictions on file:// access — no --disable-web-security flags needed.
Iris Gaze Estimator — Pose-Invariant Features
A custom estimator inspired by the 3DGazeNet architecture. Instead of using raw eye patches, it computes iris-to-eye-corner ratios from MediaPipe's 478-point face landmarks. These ratios are pose-invariant — they stay stable when the user tilts or rotates their head — making the gaze estimate more robust to natural movement.
The feature vector is 6-dimensional: left and right iris ratios (X/Y) plus head pose (X/Y). An ordinary least squares (OLS) regression model is fitted during calibration, and the left-eye and right-eye estimates are blended. The result is smoother tracking with less head-position sensitivity than pure eye patch approaches.
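A minimal sketch of how such ratios could be computed is shown below. It assumes the relevant landmarks (eye corners, eyelids, iris center) have already been picked out of the face mesh; the `EyeLandmarks` shape and function names are hypothetical, and the projection-based ratio is one plausible formulation rather than the estimator's confirmed math.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical per-eye landmark bundle, already selected from the face mesh.
interface EyeLandmarks {
  outer: Point;  // outer eye corner
  inner: Point;  // inner eye corner
  top: Point;    // upper eyelid
  bottom: Point; // lower eyelid
  iris: Point;   // iris center
}

// Position of p projected onto the segment a -> b, normalized by its length:
// 0 means "at a", 1 means "at b". Normalizing makes the value independent of
// translation, scale, and in-plane rotation of the eye in the image.
function ratioAlong(a: Point, b: Point, p: Point): number {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const lenSq = dx * dx + dy * dy;
  if (lenSq === 0) return 0.5; // degenerate eye box: fall back to center
  return ((p.x - a.x) * dx + (p.y - a.y) * dy) / lenSq;
}

function irisRatios(eye: EyeLandmarks): Point {
  return {
    x: ratioAlong(eye.outer, eye.inner, eye.iris),
    y: ratioAlong(eye.top, eye.bottom, eye.iris),
  };
}

// Assemble the 6-dimensional feature vector: left ratios, right ratios, head pose.
function featureVector(left: EyeLandmarks, right: EyeLandmarks, headPose: Point): number[] {
  const l = irisRatios(left);
  const r = irisRatios(right);
  return [l.x, l.y, r.x, r.y, headPose.x, headPose.y];
}
```

An iris sitting exactly between the corners and eyelids yields (0.5, 0.5) no matter how large the eye appears in the frame, which is the property the regression relies on.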
MediaPipe Face Landmarker — 478 Points + Blendshapes
The face landmark service uses Google's MediaPipe Vision tasks (WASM-based) to extract 478 3D face landmarks and 52 facial expression blendshapes in real-time. This feeds both the Iris Gaze Estimator (for computing iris ratios) and the heatmap system (for face presence detection). A separate FaceDistance module estimates how far the user is from the screen using inter-pupillary distance, which is used to scale gaze sensitivity.
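One common way to turn inter-pupillary distance into a depth estimate is the pinhole camera relation, sketched below. The focal length in pixels and the 63 mm average IPD are assumptions made for this example (a real setup would calibrate both), and the function name is hypothetical rather than the FaceDistance module's actual API.

```typescript
interface Point {
  x: number;
  y: number;
}

// Assumed average adult inter-pupillary distance; individuals vary.
const ASSUMED_IPD_MM = 63;

// Pinhole model: distance = focalLengthPx * realIpdMm / ipdPx.
// Returns null when the pupils coincide and no estimate is possible.
function estimateDistanceMm(
  leftPupil: Point,
  rightPupil: Point,
  focalLengthPx: number,
  realIpdMm: number = ASSUMED_IPD_MM,
): number | null {
  const ipdPx = Math.hypot(rightPupil.x - leftPupil.x, rightPupil.y - leftPupil.y);
  if (ipdPx === 0) return null;
  return (focalLengthPx * realIpdMm) / ipdPx;
}
```

The closer the user sits, the larger the pupil separation in pixels and the smaller the returned distance, which is the signal that could then be used to scale gaze sensitivity.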
Streaming Architecture
Recording sessions can run for hours. Buffering video in memory would be catastrophic — a 1-hour screen recording at decent quality easily exceeds 4GB. Hindsight solves this with a streaming writer that pipes video chunks directly to disk as they arrive, maintaining constant memory usage regardless of recording duration.
StreamingWriter: Opens a file handle via IPC, queues incoming video chunks, and writes them sequentially to disk. A write queue prevents ordering issues, and exponential backoff retry logic (max 3 retries) handles transient I/O failures.
Dual-Stream Recording: Webcam and screen are captured by separate MediaRecorder instances, each with its own StreamingWriter. Codec selection cascades through VP9, VP8, and generic WebM depending on platform support.
Session Folders: Each recording creates a timestamped folder containing the webcam video, screen video, gaze data, and recording metadata (watcher name, content, screen dimensions, pixel ratio).
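The write queue and retry behaviour described for StreamingWriter can be sketched roughly as follows. The `ChunkWriteQueue` class, the injected `writeChunk` callback (standing in for the IPC-backed file write), and the 100 ms base delay are assumptions for illustration; only the sequential ordering, exponential backoff, and three-retry limit come from the description above.

```typescript
// Stand-in for the real write path (in Hindsight, an IPC call to a file handle).
type WriteChunk = (chunk: Uint8Array) => Promise<void>;

class ChunkWriteQueue {
  private tail: Promise<void> = Promise.resolve();
  private failed = false;

  constructor(
    private readonly writeChunk: WriteChunk,
    private readonly maxRetries = 3,
    private readonly baseDelayMs = 100, // assumed value
  ) {}

  // Enqueue a chunk. It is written only after every earlier chunk has finished,
  // so chunks reach the disk in arrival order.
  enqueue(chunk: Uint8Array): Promise<void> {
    const run = this.tail.then(() => {
      if (this.failed) throw new Error("writer is in a failed state");
      return this.writeWithRetry(chunk);
    });
    // Keep the chain usable for flush(), but remember a permanent failure so
    // later chunks are not appended after a gap.
    this.tail = run.catch(() => {
      this.failed = true;
    });
    return run;
  }

  // Resolves once every queued chunk has been written or rejected.
  flush(): Promise<void> {
    return this.tail;
  }

  private async writeWithRetry(chunk: Uint8Array): Promise<void> {
    for (let attempt = 0; ; attempt++) {
      try {
        await this.writeChunk(chunk);
        return;
      } catch (err) {
        if (attempt >= this.maxRetries) throw err;
        // Exponential backoff: base, 2x base, 4x base, ...
        const delay = this.baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
}
```

Memory stays bounded by whatever is waiting in the queue at a given moment rather than by the length of the recording.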
Gaze Visualization
Raw gaze coordinates are useful for analysis, but not for real-time feedback. Hindsight includes two complementary visualization systems.
Heatmap Overlay
Grid-based intensity map with configurable blur radius. Supports multiple decay modes (none, slow, fast) so users can see either cumulative attention or recent focus. Separate heatmap instances run for the screen and YouTube tabs.
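A grid accumulator with decay might look roughly like the sketch below. The class name, grid dimensions, and the per-tick decay multipliers are assumptions made for the example; blur is treated as a rendering concern and left out.

```typescript
type DecayMode = "none" | "slow" | "fast";

// Per-tick multipliers are illustrative assumptions, not Hindsight's values.
const DECAY_FACTOR: Record<DecayMode, number> = { none: 1, slow: 0.99, fast: 0.9 };

class GazeHeatmap {
  readonly cells: Float64Array;

  constructor(
    readonly cols: number,
    readonly rows: number,
    readonly width: number,  // overlay width in pixels
    readonly height: number, // overlay height in pixels
    readonly decay: DecayMode = "none",
  ) {
    this.cells = new Float64Array(cols * rows);
  }

  // Add one gaze sample; samples outside the overlay are ignored.
  addSample(x: number, y: number, weight = 1): void {
    if (x < 0 || y < 0 || x >= this.width || y >= this.height) return;
    const col = Math.floor((x / this.width) * this.cols);
    const row = Math.floor((y / this.height) * this.rows);
    this.cells[row * this.cols + col] += weight;
  }

  // Called once per frame: fades old attention so recent focus stands out.
  tick(): void {
    const factor = DECAY_FACTOR[this.decay];
    if (factor === 1) return; // "none" keeps cumulative attention
    for (let i = 0; i < this.cells.length; i++) this.cells[i] *= factor;
  }
}
```

With `decay` set to `"none"` the grid shows cumulative attention; with `"fast"` a cell falls to about a third of its value after ten ticks, so only recent focus remains visible.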
Region Classifier
Divides the screen into a 5×5 grid (25 regions) and classifies each gaze sample into its corresponding region with timestamps. This produces discrete attention data (which region of the screen got the most focus, and when) that is useful for downstream analysis.
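The classification step is simple enough to sketch directly. The function and field names below are hypothetical, as is the row-major region numbering (0 at top-left, 24 at bottom-right); clamping off-screen samples to the nearest edge cell is also an assumption.

```typescript
const GRID_SIZE = 5; // 5x5 grid, 25 regions

interface RegionSample {
  timestamp: number;
  row: number;
  col: number;
  region: number; // row-major index, 0..24
}

function classifyGaze(
  x: number,
  y: number,
  screenWidth: number,
  screenHeight: number,
  timestamp: number,
): RegionSample {
  // Clamp so samples slightly off-screen fall into the nearest edge cell.
  const clamp = (v: number): number => Math.min(GRID_SIZE - 1, Math.max(0, v));
  const col = clamp(Math.floor((x / screenWidth) * GRID_SIZE));
  const row = clamp(Math.floor((y / screenHeight) * GRID_SIZE));
  return { timestamp, row, col, region: row * GRID_SIZE + col };
}

// Count samples per region, e.g. to find where attention was concentrated.
function dwellCounts(samples: RegionSample[]): number[] {
  const counts = new Array<number>(GRID_SIZE * GRID_SIZE).fill(0);
  for (const s of samples) counts[s.region] += 1;
  return counts;
}
```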
Cloud Storage Backends
Recordings can be automatically uploaded when a session ends. Two cloud providers are fully implemented, with GCP stubbed for future work.
Azure Blob Storage
- SAS token or connection string auth
- Uploads all session files individually
- Real-time progress events via IPC
AWS S3
- Auto-detects bucket region
- Combines WebM files into ZIP
- Multi-stage: prepare → compress → upload
GCP Storage
- Configuration type defined
- Bucket + credentials interface ready
- Implementation pending
Cross-Platform Design
Electron provides the cross-platform foundation, but the platform differences go deeper than just packaging. Camera permissions, deep link registration, tray icon handling, and installer behavior all vary by OS.
macOS
macOS requires explicit entitlements for camera, microphone, and JIT compilation access. A custom entitlements.mac.plist grants these permissions. Deep links arrive through the app.on('open-url') event. Distribution includes DMG (drag-to-install), ZIP, and a curl-based installer script that avoids Gatekeeper quarantine, because files fetched with curl are not tagged with the quarantine attribute. This is useful for enterprise deployments where users can't approve unsigned apps through System Preferences.
Windows
Windows builds use NSIS for the installer, which handles registry entries for the deep link protocol and start menu shortcuts. Deep links arrive as command-line arguments rather than OS events, so the app parses process.argv on startup and on second-instance activation. Auto-updates are supported through electron-builder's update mechanism.
Linux
Linux targets both AppImage (portable, no install required) and DEB (system package for Debian/Ubuntu). No special entitlements are needed; camera access depends on ordinary video device permissions rather than an app-level entitlement. The electron-builder configuration ensures glibc compatibility across distributions.
Key Design Decisions
1. Local Model Server
ML models (BlazeFace, Facemesh, Iris, WebGazer) are served over localhost:47821 instead of loading from the filesystem. This avoids disabling Chromium's web security while keeping everything local. The model server runs in the main process, and the renderer loads models via standard HTTP requests — same as any web app loading from a CDN, except the CDN is on localhost.
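A local model server along these lines could be as small as the sketch below, using Node's built-in `http` module. The directory layout, MIME table, permissive CORS header, and function names are assumptions for illustration; only the port (47821) and the localhost-only idea come from the description above.

```typescript
import { createServer, type Server } from "node:http";
import { readFile } from "node:fs/promises";
import { extname, resolve, sep } from "node:path";

// Content types for typical model assets (assumed set).
const MIME: Record<string, string> = {
  ".json": "application/json",
  ".bin": "application/octet-stream",
  ".wasm": "application/wasm",
};

// Map a request path onto rootDir, refusing anything that escapes it.
function resolveModelPath(rootDir: string, urlPath: string): string | null {
  let decoded: string;
  try {
    decoded = decodeURIComponent(urlPath.split("?")[0]);
  } catch {
    return null; // malformed percent-encoding
  }
  const root = resolve(rootDir);
  const full = resolve(root, "." + decoded);
  return full.startsWith(root + sep) ? full : null;
}

function startModelServer(rootDir: string, port = 47821): Server {
  const server = createServer(async (req, res) => {
    const filePath = resolveModelPath(rootDir, req.url ?? "/");
    if (!filePath) {
      res.writeHead(403).end();
      return;
    }
    try {
      const body = await readFile(filePath);
      res.writeHead(200, {
        "Content-Type": MIME[extname(filePath)] ?? "application/octet-stream",
        // Assumption: the renderer's origin differs from the server's.
        "Access-Control-Allow-Origin": "*",
      });
      res.end(body);
    } catch {
      res.writeHead(404).end(); // missing file or directory
    }
  });
  // Bind to loopback only so the models are never exposed on the network.
  server.listen(port, "127.0.0.1");
  return server;
}
```

The renderer would then request something like `http://127.0.0.1:47821/<model path>`, where the path layout is up to the application.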
2. Provider Pattern for Gaze
The IGazeProvider interface defines a contract for gaze implementations: initialize, start/stop tracking, calibrate. WebGazer is the current provider, but the interface makes it straightforward to plug in hardware eye trackers (Tobii, etc.) without modifying the rest of the application. The GazeService facade abstracts this away from consumers.
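In TypeScript, the contract and facade could be shaped roughly as follows. The method names mirror the operations listed above (initialize, start/stop tracking, calibrate), but the exact signatures, the `GazePoint` and `CalibrationTarget` types, and the facade's methods are assumptions for illustration.

```typescript
interface GazePoint {
  x: number;
  y: number;
  timestamp: number;
}

interface CalibrationTarget {
  screenX: number;
  screenY: number;
}

// Contract every gaze implementation would satisfy (signatures assumed).
interface IGazeProvider {
  initialize(): Promise<void>;
  startTracking(onGaze: (point: GazePoint) => void): void;
  stopTracking(): void;
  calibrate(targets: CalibrationTarget[]): Promise<void>;
}

// Facade: consumers talk to GazeService and never see the concrete provider.
class GazeService {
  private tracking = false;

  constructor(private provider: IGazeProvider) {}

  init(): Promise<void> {
    return this.provider.initialize();
  }

  calibrate(targets: CalibrationTarget[]): Promise<void> {
    return this.provider.calibrate(targets);
  }

  start(onGaze: (point: GazePoint) => void): void {
    if (this.tracking) return; // avoid double-starting the provider
    this.tracking = true;
    this.provider.startTracking(onGaze);
  }

  stop(): void {
    if (!this.tracking) return;
    this.tracking = false;
    this.provider.stopTracking();
  }

  // Swap strategies (for example a hardware tracker) without touching consumers.
  useProvider(next: IGazeProvider): void {
    this.stop();
    this.provider = next;
  }
}
```

A hardware-backed provider would implement the same four methods, and `useProvider` would be the only place that needs to know about it.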
3. Event-Driven Services
All services extend a shared EventEmitter base class and communicate via events (ready, start, complete, error). The AppController acts as a central mediator — it wires services together without them knowing about each other. This keeps the dependency graph shallow and makes it possible to test services in isolation.
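The mediator wiring can be illustrated with a pared-down sketch. `ServiceBase` here is a tiny stand-in for the shared EventEmitter base class, and `RecorderService`, `UploadService`, and their methods are hypothetical names chosen for the example.

```typescript
type Listener = (payload: unknown) => void;

// Tiny stand-in for the shared EventEmitter base class.
class ServiceBase {
  private listeners = new Map<string, Listener[]>();

  on(event: string, listener: Listener): this {
    const list = this.listeners.get(event) ?? [];
    list.push(listener);
    this.listeners.set(event, list);
    return this;
  }

  protected emit(event: string, payload?: unknown): void {
    for (const listener of this.listeners.get(event) ?? []) listener(payload);
  }
}

// Hypothetical services: neither one references the other.
class RecorderService extends ServiceBase {
  stop(sessionDir: string): void {
    this.emit("complete", sessionDir);
  }
}

class UploadService extends ServiceBase {
  readonly uploaded: string[] = [];

  upload(sessionDir: string): void {
    this.uploaded.push(sessionDir);
    this.emit("complete", sessionDir);
  }
}

// The mediator is the only place that knows both services exist.
class AppController {
  constructor(recorder: RecorderService, uploader: UploadService) {
    recorder.on("complete", (dir) => uploader.upload(String(dir)));
  }
}
```

In a test, `RecorderService` can be exercised alone by attaching a listener to `"complete"`, with no uploader or controller involved.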
4. Single Instance Lock
Only one instance of Hindsight can run at a time. If a second instance launches (e.g., from a deep link), the existing instance receives focus and processes the deep link URL. This prevents resource conflicts from duplicate camera access and ensures the system tray stays consistent.
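Electron exposes this through `app.requestSingleInstanceLock()` and the `second-instance` event. The sketch below codes against a narrow `SingleInstanceApp` interface instead of importing Electron, so the logic can be exercised with a fake; Electron's `app` object is expected to fit that shape, but the interface, the `enforceSingleInstance` helper, and its callbacks are assumptions for illustration.

```typescript
// Narrow view of the parts of Electron's `app` that this logic needs.
interface SingleInstanceApp {
  requestSingleInstanceLock(): boolean;
  quit(): void;
  on(event: "second-instance", listener: (event: unknown, argv: string[]) => void): unknown;
}

const PROTOCOL_PREFIX = "hindsight://";

// Returns true if this process is the primary instance and should keep running.
function enforceSingleInstance(
  app: SingleInstanceApp,
  onDeepLink: (url: string) => void,
  focusWindow: () => void,
): boolean {
  if (!app.requestSingleInstanceLock()) {
    // Another instance already holds the lock; it will receive our argv.
    app.quit();
    return false;
  }
  app.on("second-instance", (_event, argv) => {
    focusWindow();
    // The deep link arrives as one of the second instance's arguments.
    const url = argv.find((arg) => arg.startsWith(PROTOCOL_PREFIX));
    if (url) onDeepLink(url);
  });
  return true;
}
```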
5. Wizard State Machine
First-time setup is guided by a state machine with two tracks: quick (account → loading → calibration) and advanced (account → loading → viewer → app → calibration → accuracy → storage → overview). The state machine ensures users can't skip critical steps like calibration while letting power users configure every detail.
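The two tracks map naturally onto a small linear state machine. The step names below are taken from the tracks listed above; the `SetupWizard` class and its methods are a hypothetical sketch, and the real wizard may carry additional state or validation per step.

```typescript
type Track = "quick" | "advanced";

type Step =
  | "account"
  | "loading"
  | "viewer"
  | "app"
  | "calibration"
  | "accuracy"
  | "storage"
  | "overview";

const TRACKS: Record<Track, readonly Step[]> = {
  quick: ["account", "loading", "calibration"],
  advanced: ["account", "loading", "viewer", "app", "calibration", "accuracy", "storage", "overview"],
};

class SetupWizard {
  private index = 0;

  constructor(private readonly track: Track) {}

  get current(): Step {
    return TRACKS[this.track][this.index];
  }

  get isLastStep(): boolean {
    return this.index === TRACKS[this.track].length - 1;
  }

  // Only single-step transitions exist, so no step (such as calibration)
  // can be skipped.
  next(): Step {
    if (!this.isLastStep) this.index += 1;
    return this.current;
  }

  back(): Step {
    if (this.index > 0) this.index -= 1;
    return this.current;
  }
}
```

Because there is no "jump to step" operation, both tracks are forced through calibration before the wizard can finish.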
Tech Stack
Application
- Electron 28 (Chromium + Node.js)
- TypeScript 5.3
- esbuild for bundling
- Plain CSS (no framework)
Computer Vision
- WebGazer 3.4 (ridge regression)
- MediaPipe Vision (WASM, face landmarks)
- TensorFlow.js (BlazeFace, Facemesh)
- Custom Iris Gaze Estimator
Cloud & Integration
- Azure Blob Storage SDK
- AWS S3 SDK
- Webhook event notifications
- Deep link protocol (hindsight://)
Packaging & Testing
- electron-builder (DMG, NSIS, AppImage, DEB)
- Vitest + happy-dom
- ASAR packaging for security
- Separate tsconfigs per process
Installers & Distribution
electron-builder produces platform-native installers from a single codebase. Each platform gets the installer format its users expect.
macOS
- DMG with drag-to-Applications
- ZIP archive
- curl script (quarantine-free)
Windows
- NSIS installer (.exe)
- Start menu + registry entries
- Auto-update support
Linux
- AppImage (portable)
- DEB package (Debian/Ubuntu)
- glibc-compatible builds
Build targets are invoked via npm run dist:mac, dist:win, dist:linux, or dist:all for a full multi-platform release. The app is packaged as an ASAR archive, which keeps source files out of casual view but is not encryption, and source maps and declaration files are excluded from the bundle.
Integration: Deep Links & Webhooks
Hindsight is designed to be triggered and monitored by external systems. The deep link protocol allows other applications to launch recordings with pre-configured settings, while webhooks push recording events to any HTTP endpoint.
Deep Links
The hindsight:// protocol can pass source selection parameters and trigger actions. On macOS the URL is delivered through the OS URL handler (the open-url event); on Windows and Linux it arrives as a command-line argument, which is forwarded to the already-running single instance.
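Parsing such a URL is straightforward with the standard `URL` class. In the sketch below, the `record` action and the `source` parameter are hypothetical examples of what a link might carry; the write-up does not list the actual actions or parameter names.

```typescript
interface DeepLinkRequest {
  action: string;
  params: Record<string, string>;
}

// Parse a hindsight:// URL into an action and its query parameters.
// Returns null for anything that is not a well-formed hindsight:// link.
function parseDeepLink(raw: string): DeepLinkRequest | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not a URL at all
  }
  if (url.protocol !== "hindsight:") return null;

  // For hindsight://record?..., the action sits in the host position.
  const action = url.hostname || url.pathname.replace(/^\/+/, "");
  if (!action) return null;

  const params: Record<string, string> = {};
  url.searchParams.forEach((value, key) => {
    params[key] = value;
  });
  return { action, params };
}
```

Rejecting every other scheme up front matters because the same code path receives arbitrary command-line arguments on Windows and Linux.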
Webhooks
The WebhookService sends event notifications when recordings start, stop, or complete uploading. This enables integration with pipelines that need to process recordings as they become available — for example, triggering gaze analysis when a session finishes uploading to S3.
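A webhook delivery could be sketched as below. The event names, payload fields, and the injected `post` function are all assumptions for illustration; the write-up does not document the actual payload schema. Injecting the HTTP call keeps the example testable without a network.

```typescript
// Hypothetical event names for the three notifications described above.
type RecordingEvent = "recording.started" | "recording.stopped" | "upload.completed";

interface WebhookPayload {
  event: RecordingEvent;
  sessionId: string;
  timestamp: string; // ISO 8601
}

interface HttpResponseLike {
  ok: boolean;
  status: number;
}

// Minimal shape of an HTTP POST function, so a fake can be used in tests.
type HttpPost = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<HttpResponseLike>;

// Returns true when the endpoint acknowledged the event with a 2xx status.
async function sendWebhook(
  endpoint: string,
  event: RecordingEvent,
  sessionId: string,
  post: HttpPost,
): Promise<boolean> {
  const payload: WebhookPayload = {
    event,
    sessionId,
    timestamp: new Date().toISOString(),
  };
  try {
    const response = await post(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
    return response.ok;
  } catch {
    // A failed notification is reported to the caller, not thrown, so a
    // flaky endpoint cannot interrupt a recording.
    return false;
  }
}
```

In a runtime with a global `fetch`, passing `fetch` as the `post` argument should satisfy the `HttpPost` shape, though that depends on the TypeScript library settings in use.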
Development Story
Hindsight started as a simple screen recorder with gaze overlay, but the scope grew quickly once it became clear that accurate, calibration-based gaze tracking in a browser was actually feasible. The first version used WebGazer directly with minimal architecture — everything in one file. As features accumulated (cloud uploads, multiple recording modes, heatmaps, deep links), the codebase was restructured into the current MVC-inspired service architecture with a mediator pattern.
The hardest engineering problem was the streaming writer. Early versions buffered video chunks in memory and wrote them all at once when recording stopped. This worked for short recordings but crashed the app on anything over 15 minutes. The streaming approach (opening a file handle on recording start and piping chunks as they arrive) solved the memory issue entirely. The write queue was added later, after discovering that overlapping asynchronous writes to the same file in Node.js aren't guaranteed to complete in order under heavy load.
The Iris Gaze Estimator came from frustration with WebGazer's sensitivity to head movement. Users would calibrate, then shift in their chair and lose accuracy. The iris-ratio approach, inspired by 3DGazeNet, computes features that are inherently stable under head rotation — trading some raw precision for much better robustness in real-world conditions.