Systemic Risk Monitor for Frontier AI Labs

📐 Methodology

Overview

The Fault Line provides a transparent, evidence-based assessment of structural vulnerabilities facing frontier AI laboratories. Rather than predicting outcomes, it monitors and quantifies dependencies and risk factors that could affect an organization's ability to operate, scale, or adapt.

Each lab receives a Fragility Score from 0–10, where higher scores indicate greater systemic fragility. Scores are derived from a simple checklist of binary indicators, each supported by publicly verifiable news events and sources.

Scoring Formula

The total fragility score is calculated as:

Total Score = (Compute + Cloud + Policy + Demand + Societal Impact + Talent & Governance) − Resilience

Each of the six fragility dimensions contributes 0–2 points; Resilience instead subtracts up to 2 points for demonstrated risk mitigation.

The final score is clamped to the range 0–10.
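As a minimal sketch (the function name and input layout here are illustrative, not the project's actual code), the formula and clamping can be expressed as:

```python
def fragility_score(points: dict[str, int]) -> int:
    """Sum the six fragility dimensions, subtract Resilience, clamp to 0-10."""
    fragility_dims = ["compute", "cloud", "policy", "demand",
                      "societal_impact", "talent_governance"]
    total = sum(points[d] for d in fragility_dims) - points["resilience"]
    return max(0, min(10, total))

# Example: four dimensions at maximum, two at 1 point, full resilience credit
print(fragility_score({
    "compute": 2, "cloud": 2, "policy": 2, "demand": 2,
    "societal_impact": 1, "talent_governance": 1, "resilience": 2,
}))  # → 8
```

Note that the raw sum can reach 12 (six dimensions at 2 points each with no resilience credit), which is why the clamp to 10 matters.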

Confidence-Weighted Variant

Alongside the binary score, Fault Line displays a confidence-weighted score. Events classified with low confidence carry 0.5× weight; medium confidence carries 0.75×; high confidence carries full weight. This variant helps users understand the uncertainty range — if the weighted score diverges significantly from the binary score, many indicators rest on weaker evidence.
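The weighting can be sketched as follows; only the 0.5×/0.75×/1× multipliers come from the methodology above, while the event record structure is an assumption:

```python
# Confidence multipliers from the methodology; record fields are hypothetical.
CONFIDENCE_WEIGHT = {"low": 0.5, "medium": 0.75, "high": 1.0}

def weighted_points(items: list[dict]) -> float:
    """Each triggered checklist item contributes its point value
    scaled by the confidence of its supporting evidence."""
    return sum(CONFIDENCE_WEIGHT[i["confidence"]] * i["points"] for i in items)

# Two +1 items: one backed by high-confidence evidence, one by low-confidence
print(weighted_points([
    {"points": 1, "confidence": "high"},
    {"points": 1, "confidence": "low"},
]))  # → 1.5
```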

Dimensions Explained

💾 Compute & Chips Dependence (0–2 pts)

Measures reliance on specific hardware vendors and exposure to supply chain disruptions. Labs with single-vendor GPU strategies or documented supply constraints score higher.

☁️ Cloud Concentration (0–2 pts)

Measures dependency on hyperscaler partnerships for training and inference infrastructure. Exclusive partnerships, deep integrations, and high switching costs contribute to fragility.

🏛️ Policy & Geopolitical Exposure (0–2 pts)

Measures sensitivity to regulatory action, export controls, and political shifts. Labs operating across jurisdictions with pending regulations or active investigations score higher.

📈 Demand & Commercialization (0–2 pts)

Measures revenue sustainability and market position risks. Signals of demand weakness, customer churn, or capex/opex overhang relative to adoption increase this score.

🛡️ Resilience Moves (0–2 pts, inverted)

Measures proactive risk mitigation. Multi-sourcing strategies, diversified infrastructure, long-term contracts, and demonstrated redundancy reduce the total fragility score.

🌍 Societal Impact (0–2 pts)

Measures signals of workforce displacement, misinformation amplification, privacy/surveillance concerns, power concentration, and safety incidents attributed to AI deployment.

👥 Talent & Governance (0–2 pts)

Measures leadership instability, key departures, board dysfunction, and organizational governance risks. The historical record contains 15+ high-impact events of this type — the Altman firing, Leike departure, Sutskever exit, and similar episodes — that the original five-dimension checklist could not score.

Checklist Items

Each dimension is scored based on specific, observable indicators:

A) Compute & Chips Dependence

A1. Single GPU Vendor Lock-in (+1 pt): Evidence of tight coupling to a single GPU vendor (e.g., NVIDIA-only training strategy)

A2. Supply Constraints Impact (+1 pt): Evidence of supply constraints or delivery risk impacting roadmap or operations

B) Cloud Concentration

B1. Single Hyperscaler Dependence (+1 pt): Primary dependence on one hyperscaler for training/inference infrastructure

B2. Platform Lock-in Signals (+1 pt): Switching costs, exclusivity signals, or deep integration with a single provider

C) Policy & Geopolitical Exposure

C1. Export Control Exposure (+1 pt): Exposure to export controls or cross-border restrictions affecting compute/chips

C2. Regulatory Sensitivity (+1 pt): High sensitivity to regulatory action (antitrust, safety regulation, procurement bans)

D) Demand & Commercialization

D1. Demand Weakness Signals (+1 pt): Credible signals of demand weakness or monetization challenges

D2. Capex/Opex Overhang (+1 pt): Evidence of capex/opex overhang relative to adoption (overbuild, runway strain)

E) Resilience Moves (Mitigations)

E1. Multi-sourcing Demonstrated (−1 pt): Demonstrated multi-sourcing or diversification (multi-cloud, alternative accelerators)

E2. Risk Reduction Actions (−1 pt): Concrete risk-reduction actions (long-term contracts, redundancy, modular deployment)

F) Societal Impact

F1. Workforce & Societal Disruption (+1 pt): Evidence of significant workforce displacement, misinformation amplification, or privacy/surveillance concerns attributed to AI deployment

F2. Power Concentration & Safety Incidents (+1 pt): Evidence of market power concentration, safety incidents, or actions increasing centralization without oversight

G) Talent & Governance

G1. Key Leadership Departure (+1 pt): Departure of CEO, CTO, chief scientist, or other critical leadership figure creating organizational instability

G2. Board & Governance Instability (+1 pt): Board dysfunction, governance disputes, organizational restructuring, or mass talent exodus signaling instability

Evidence Requirements

Each checklist item must be supported by publicly verifiable news events and sources. Items without recent supporting evidence automatically expire and are no longer counted in the score.

Decay Window

Evidence expires after 180 days by default unless reaffirmed by a new event. This ensures the tracker reflects current conditions rather than historical snapshots.

When an item is supported by multiple events, the most recent event date determines the expiration.
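The decay rule above can be sketched as follows; the function and field names are hypothetical, while the 180-day window comes from the methodology:

```python
from datetime import date, timedelta

DECAY_DAYS = 180  # default decay window from the methodology

def is_active(event_dates: list[date], today: date) -> bool:
    """An item stays active while its most recent supporting event
    falls within the decay window; with no events it is expired."""
    if not event_dates:
        return False
    return today - max(event_dates) <= timedelta(days=DECAY_DAYS)

# Item last reaffirmed ~90 days ago: still counted in the score
print(is_active([date(2024, 1, 1), date(2024, 4, 1)], date(2024, 6, 30)))  # → True
```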

Contested Items

If contradictory evidence exists for a checklist item (e.g., both signals of lock-in and diversification), the item is marked as "contested" and requires manual review before affecting the score.

Contested items are displayed in the UI with both supporting and contradicting evidence visible.
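Contested detection can be sketched as a simple check, assuming each event record carries a stance label (an assumption not specified above):

```python
def is_contested(events: list[dict]) -> bool:
    """An item is contested when it has both supporting and
    contradicting evidence, triggering manual review."""
    stances = {e["stance"] for e in events}
    return {"supports", "contradicts"} <= stances

# Lock-in signal plus a diversification signal for the same item
print(is_contested([
    {"stance": "supports"},
    {"stance": "contradicts"},
]))  # → True
```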

Data Sources

Fault Line ingests from multiple source tiers to catch signals at different stages.

Data Pipeline

The tracker updates automatically via the following process:

  1. Ingestion: RSS feeds and curated sources are checked for new articles
  2. Classification: Articles are mapped to labs, dimensions, and checklist items
  3. Deduplication: Duplicate events are detected and merged
  4. Scoring: Checklist states are updated and scores recalculated
  5. Publication: JSON data files are committed to the repository

The pipeline runs daily via GitHub Actions. Manual event submissions are accepted via GitHub Issues.
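The five steps above can be sketched as a single pipeline function; every body below is a stand-in stub to show the data flow, not the project's real implementation:

```python
def ingest_sources() -> list[str]:
    # 1. Ingestion: check RSS feeds and curated sources (stubbed here)
    return ["Article about GPU supply constraints"]

def classify(articles: list[str]) -> list[dict]:
    # 2. Classification: map each article to a lab and checklist item
    return [{"lab": "example-lab", "item": "A2", "title": a} for a in articles]

def deduplicate(events: list[dict]) -> list[dict]:
    # 3. Deduplication: merge events with identical titles
    seen: dict[str, dict] = {}
    for e in events:
        seen.setdefault(e["title"], e)
    return list(seen.values())

def recalculate_scores(events: list[dict]) -> dict[str, int]:
    # 4. Scoring: one fragility point per distinct triggered item per lab
    triggered: dict[str, set] = {}
    for e in events:
        triggered.setdefault(e["lab"], set()).add(e["item"])
    return {lab: len(items) for lab, items in triggered.items()}

def run_pipeline() -> dict[str, int]:
    # 5. Publication (committing JSON data files) is omitted in this sketch
    return recalculate_scores(deduplicate(classify(ingest_sources())))

print(run_pipeline())  # → {'example-lab': 1}
```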

Limitations & Caveats

This tracker has important limitations. Scores rest on binary indicators and publicly reported events, so they can miss private developments and nuance. Users should treat them as a starting point for analysis, not definitive assessments.

Contributing

This project welcomes contributions. You can help by submitting candidate events via GitHub Issues or by proposing improvements to the methodology.

All contributions are subject to review to maintain data quality and methodological consistency.