
Evolution of an Equation: From Superluminal Shadows to Private Vector Search

TL;DR

A simple geometry demo—a shadow edge outrunning light—grew into a portable similarity system.

  • v1 (Origin): Multiplicative, interpretable similarity born from a shadow’s “gain × angles × rate.”

  • v2.1 (Canonical): One ARD-Mahalanobis quadratic + bounded exponential, wrapped in a production GP/BO stack.

  • v3.0 (FHE): Two-stage private vector search—low-rotation CKKS recall, then plaintext re-rank for fidelity.
    Code + OSF below.

1) Origin: a shadow edge that outruns light

In 2024 I posted a thought experiment: a point light, a rod, a tilted screen. The shadow edge on the screen can sweep faster than c because no object or signal travels along that edge; it's just a geometric locus moving across a surface.

I wrote it roughly as:

$$
v_{\text{edge}} \;\approx\; \underbrace{G}_{\text{geometric gain}} \times \underbrace{r_{\perp}}_{\text{projected lever arm}} \times \underbrace{\omega}_{\text{angular rate}}
$$

i.e. (projected lever-arm) × (angular rate), scaled by the geometry. The point wasn’t to reinvent relativity; it was to make apparent superluminality intuitive and to capture the structure: scale × distance × orientation × dynamics.
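For concreteness, here is a minimal numeric sketch of the closely related "rotating beam on a distant screen" version (all numbers are illustrative); the rod-and-tilted-screen setup adds a geometric gain but follows the same lever-arm × angular-rate pattern:

```python
import numpy as np

C = 299_792_458.0   # speed of light, m/s
D = 1.0e7           # distance from the pivot to the screen, m (illustrative)
OMEGA = 100.0       # angular rate of the sweep, rad/s (illustrative)

# Spot position on the screen: x(t) = D * tan(omega * t)
# Spot speed:                  dx/dt = D * omega / cos(theta)^2, with theta = omega * t
theta = np.linspace(0.0, 1.2, 7)           # beam angle from the screen normal, rad
v_spot = D * OMEGA / np.cos(theta) ** 2    # speed of the bright spot along the screen

for th, v in zip(theta, v_spot):
    print(f"theta = {th:4.2f} rad -> spot speed = {v / C:7.1f} c")

# The spot exceeds c whenever D * omega / cos(theta)^2 > c, yet no photon or
# signal travels along the screen: each point is lit by a different photon.
```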

2) Abstraction: from optics to embeddings

The geometry rhymed with vector similarity. So I transplanted the pieces:

  • gain/scale → state strength

  • √(L² + y²) → distance to a reference/centroid

  • screen/rod tilt → orientation/importance angles

  • ω → dynamics/time weight

  • extra feature-wise alignments with weights wᵢ

Version 1 became an interpretable, multiplicative θ‴: a product of factors whose meanings were obvious. That made it portable: trading, behavior/safety, creative rhythm analytics, hypersonic sensing, governance/HFT experiments—you name it. It also surfaced a few pain points: manual weights, scale coupling, and post-hoc normalization.
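A v1-style score, sketched in plain Python (the factor names and decay choices below are illustrative stand-ins, not the original variables):

```python
import numpy as np

def theta_v1(strength, x, ref, angles, omega, alignments, weights,
             length=1.0, dt=1.0):
    """Sketch of a multiplicative v1 score:
    strength x distance x orientation x dynamics x weighted alignments."""
    dist = np.linalg.norm(np.asarray(x, float) - np.asarray(ref, float))
    distance_term = np.exp(-dist / length)                    # nearer to the reference -> higher
    orientation_term = np.prod(np.cos(angles))                # orientation/importance angles
    dynamics_term = np.exp(-abs(omega) * dt)                  # dynamics/time weight
    alignment_term = np.prod(np.power(alignments, weights))   # extra w_i-weighted alignments
    return strength * distance_term * orientation_term * dynamics_term * alignment_term

score = theta_v1(strength=1.0, x=[0.2, 0.1], ref=[0.0, 0.0],
                 angles=[0.3], omega=0.5, alignments=[0.9, 0.8], weights=[0.6, 0.4])
```

Every factor is individually readable, which is what made it portable; the same structure is also why hand-set weights, coupled scales, and post-hoc normalization became a maintenance burden.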

3) Consolidation: θ‴ v2.1 (the canonical form)

I refactored the multiplicative score into one ARD-Mahalanobis distance over dimensionless, soft-clipped features, then used a bounded exponential similarity:

1: Encode & normalize
Continuous features pass through; periodic features optionally get a cyclic encoding; categoricals go through learned embeddings rescaled to unit variance; context/time enter as ordinary features. Normalize each feature by a robust (IQR-based) scale and soft-clip the deltas with a tanh clamp to keep gradients tame. Freeze the scales after one guarded re-estimation (prevents drift).
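A minimal sketch of the scaling/clipping step, assuming a NumPy feature matrix (the clip constant is illustrative):

```python
import numpy as np

def fit_robust_scales(X, eps=1e-9):
    """Per-feature IQR-based scale, estimated once and then frozen."""
    q75, q25 = np.percentile(X, [75, 25], axis=0)
    return np.maximum(q75 - q25, eps)

def soft_clipped_deltas(x, ref, scales, clip=3.0):
    """Dimensionless deltas with a tanh clamp so outliers can't blow up gradients."""
    z = (np.asarray(x, float) - np.asarray(ref, float)) / scales
    return clip * np.tanh(z / clip)

X = np.random.default_rng(0).normal(size=(1_000, 4))    # toy feature matrix
scales = fit_robust_scales(X)                           # guarded re-estimation, then freeze
dz = soft_clipped_deltas(X[0], X.mean(axis=0), scales)  # input to the distance step below
```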

2: Distance & similarity

$$
d_{\Lambda}^{2}(x, r) \;=\; \Delta z^{\top} \Lambda^{-1} \Delta z \;=\; \sum_{i} \frac{\Delta z_{i}^{2}}{\ell_{i}^{2}},
\qquad
\theta''' \;=\; \exp\!\left(-\, d_{\Lambda}^{2}\right) \in (0, 1]
$$

where Δz is the vector of normalized, soft-clipped feature deltas and the ARD length-scales ℓᵢ are learned rather than hand-set.

No cosine stacks, no extra hand-tuned weights, no post-hoc normalization. Angles and dynamics live as features, and ARD learns what matters.
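The whole similarity then reduces to one weighted sum of squares and one exponential (a sketch; the length-scales come from the fitted ARD kernel, and whether a 1/2 sits in the exponent is a convention choice):

```python
import numpy as np

def theta_v21(dz, lengthscales):
    """Bounded exponential of an ARD-Mahalanobis quadratic over soft-clipped deltas."""
    dz = np.asarray(dz, float)
    ell = np.asarray(lengthscales, float)
    d2 = np.sum((dz / ell) ** 2)   # ARD-weighted squared distance (diagonal Mahalanobis)
    return np.exp(-d2)             # similarity, bounded in (0, 1]

sim = theta_v21(dz=[0.3, -0.1, 1.2, 0.0], lengthscales=[0.5, 1.0, 2.0, 0.8])
```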

3: Production GP/BO stack
Deterministic runs (seeded fantasies, pinned threads), jitter escalation, heavy-tail options (Student-t/Huber), constraints via PoF, qNEI ↔ qEI hysteresis on observed noise, async + dedup + replicates, OOS validation, full logging.
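As one concrete piece of that stack, the qNEI ↔ qEI switch can be a tiny hysteresis rule on the estimated observation noise (thresholds below are placeholders, not the production values):

```python
def choose_acquisition(noise_level, current="qEI", high=1e-3, low=5e-4):
    """Hysteresis between qEI and qNEI on observed noise, so the policy
    doesn't flap when the noise estimate hovers near a single threshold."""
    if current == "qEI" and noise_level > high:
        return "qNEI"   # observations look noisy: switch to noisy EI
    if current == "qNEI" and noise_level < low:
        return "qEI"    # effectively noiseless again: plain qEI is enough
    return current      # inside the band: keep whatever we were using
```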

Why it matters: same intuition, clean math (PSD-safe), learned importance instead of hand-tuning, and operational guarantees that scale beyond demos.

4) Privacy by design: θ‴ 3.0 (FHE)

The canonical quadratic is also the cheapest thing to do under CKKS. Version 3.0 turns that into a two-stage private vector search:

  • Stage A — Encrypted recall (CKKS): rank the entire corpus using an ARD-weighted quadratic in a reduced space with low-rotation packing. An optional client-norm piggyback removes the ciphertext×ciphertext multiply. No bootstrapping.

  • Stage B — Plaintext re-rank: reorder the top-K (e.g., 64–256) with either the legacy multiplicative razor (maximum fidelity) or v2.1 again—or a small domain model.

This split gives better Recall@K than encrypted L2 at the same HE budget, while keeping latency predictable and ops deterministic. It’s the natural endgame of the equation’s evolution: cheap private recall, sharp public precision.
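A shape-level sketch of the two-stage flow, with plain NumPy standing in for the CKKS computation (names, dimensions, and the Stage B scorer below are illustrative; in 3.0, Stage A runs homomorphically over packed ciphertexts and Stage B uses the multiplicative razor or v2.1):

```python
import numpy as np

def stage_a_recall_scores(query_z, corpus_z, ard_weights):
    """Stage A: ARD-weighted quadratic in the reduced space
    (plaintext stand-in for the encrypted CKKS evaluation)."""
    diffs = corpus_z - query_z                         # (N, d_reduced)
    return -np.sum(ard_weights * diffs ** 2, axis=1)   # higher = more similar

def stage_b_rerank(query, corpus, candidate_ids, fine_score):
    """Stage B: plaintext re-rank of the top-K with a sharper scorer."""
    scored = [(int(i), fine_score(query, corpus[i])) for i in candidate_ids]
    return sorted(scored, key=lambda t: t[1], reverse=True)

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 64))     # full-dimensional vectors (server side)
corpus_z = corpus[:, :16]                  # reduced space used for encrypted recall
query = rng.normal(size=64)

scores = stage_a_recall_scores(query[:16], corpus_z, np.ones(16))
top_k = np.argsort(scores)[-128:]          # K in the 64-256 range mentioned above
reranked = stage_b_rerank(query, corpus, top_k,
                          fine_score=lambda q, x: -np.linalg.norm(q - x))
```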

5) What we actually brought to the table

Not a brand-new equation—a system:

  • A design pattern (strength × distance × orientation × dynamics) that stays interpretable across domains.

  • A canonical core (v2.1): ARD-Mahalanobis over dimensionless, soft-clipped features—stable, learnable, PSD-friendly.

  • A production BO engine around it (constraints, heavy-tails, hysteresis, async, reproducibility, audit logs).

  • A privacy-preserving search architecture (3.0): low-rotation CKKS recall + plaintext re-rank, no bootstrapping, with acceptance gates tied to Recall@K.

  • Cross-domain recipes: materials (nickelates, twistronics, SDEs), SEO/ads, systematic trading, safety/behavior, sensing, creative analytics.

6) Resources (timeline)

7) What’s next

  • Benchmarks: publish Recall@K vs encrypted L2 (3.0), and uplift vs cosine/L2 baselines (v2.1) across 2–3 datasets.

  • Cookbooks: one runnable notebook per domain (materials, SEO/ads, trading), plus HE rotation/key budgets for the 3.0 recall stage.