Methodology
Overview
Fault Line provides a transparent, evidence-based assessment of structural vulnerabilities facing frontier AI laboratories. Rather than predicting outcomes, it monitors and quantifies the dependencies and risk factors that could affect an organization's ability to operate, scale, or adapt.
Each lab receives a Fragility Score from 0–10, where higher scores indicate greater systemic fragility. Scores are derived from a simple checklist of binary indicators, each supported by publicly verifiable news events and sources.
Scoring Formula
The total fragility score is calculated as:

Score = Compute + Cloud + Policy + Demand + Societal Impact + Talent & Governance − Resilience

Each of the six fragility dimensions contributes 0–2 points; Resilience subtracts up to 2 points for demonstrated risk mitigation.
The final score is clamped to the range 0–10.
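For illustration, a minimal sketch of that arithmetic in Python (the function name and dimension keys are placeholders, not the repository's actual code):

```python
def fragility_score(dimension_points: dict[str, int], resilience: int) -> int:
    """Sum the six fragility dimensions (0-2 points each), subtract the
    Resilience offset (0-2 points), and clamp the result to 0-10."""
    raw = sum(dimension_points.values()) - resilience
    return max(0, min(10, raw))

# Example: 2 + 1 + 2 + 1 + 1 + 2 fragility points, minus 2 resilience points -> 7.
score = fragility_score(
    {"compute": 2, "cloud": 1, "policy": 2, "demand": 1, "societal": 1, "talent": 2},
    resilience=2,
)
```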
Confidence-Weighted Variant
Alongside the binary score, Fault Line displays a confidence-weighted score. Events classified with low confidence carry 0.5× weight, medium confidence 0.75×, and high confidence full weight. This variant conveys the uncertainty in the evidence base: if the weighted score diverges significantly from the binary score, many indicators rest on weaker evidence.
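A sketch of how those weights could be applied per indicator, assuming a simple list-of-dicts representation (field names are illustrative):

```python
CONFIDENCE_WEIGHTS = {"low": 0.5, "medium": 0.75, "high": 1.0}

def weighted_points(indicators: list[dict]) -> float:
    """Weight each passing indicator's points by the confidence of its
    supporting evidence (0.5x / 0.75x / 1.0x) before summing."""
    return sum(
        item["points"] * CONFIDENCE_WEIGHTS[item["confidence"]]
        for item in indicators
        if item["passed"]
    )
```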
Dimensions Explained
💾 Compute & Chips Dependence (0–2 pts)
Measures reliance on specific hardware vendors and exposure to supply chain disruptions. Labs with single-vendor GPU strategies or documented supply constraints score higher.
☁️ Cloud Concentration (0–2 pts)
Measures dependency on hyperscaler partnerships for training and inference infrastructure. Exclusive partnerships, deep integrations, and high switching costs contribute to fragility.
🏛️ Policy & Geopolitical Exposure (0–2 pts)
Measures sensitivity to regulatory action, export controls, and political shifts. Labs operating across jurisdictions with pending regulations or active investigations score higher.
📈 Demand & Commercialization (0–2 pts)
Measures revenue sustainability and market position risks. Signals of demand weakness, customer churn, or capex/opex overhang relative to adoption increase this score.
🛡️ Resilience Moves (0–2 pts, inverted)
Measures proactive risk mitigation. Multi-sourcing strategies, diversified infrastructure, long-term contracts, and demonstrated redundancy reduce the total fragility score.
🌍 Societal Impact (0–2 pts)
Measures signals of workforce displacement, misinformation amplification, privacy/surveillance concerns, power concentration, and safety incidents attributed to AI deployment.
👥 Talent & Governance (0–2 pts)
Measures leadership instability, key departures, board dysfunction, and organizational governance risks. The historical record contains 15+ high-impact events of this type — the Altman firing, Leike departure, Sutskever exit, and similar episodes — that the original five-dimension checklist could not score.
Checklist Items
Each dimension is scored based on specific, observable indicators:
A) Compute & Chips Dependence
B) Cloud Concentration
C) Policy & Geopolitical Exposure
D) Demand & Commercialization
E) Resilience Moves (Mitigations)
F) Societal Impact
G) Talent & Governance
Evidence Requirements
Each checklist item must be supported by:
- At least one verifiable news event with a source URL
- Date within the decay window (default: 180 days)
- Confidence rating (low/medium/high) based on source reliability
Items without recent supporting evidence automatically expire and are no longer counted in the score.
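For illustration only, an event record satisfying these requirements might look like the following (field names are assumptions, not the actual JSON schema):

```python
# Hypothetical shape of a supporting event; the repository's schema may differ.
event = {
    "lab": "example-lab",
    "dimension": "cloud_concentration",   # one of the seven dimensions above
    "source_url": "https://example.com/article",
    "date": "2025-01-15",                 # must fall inside the decay window
    "confidence": "medium",               # low / medium / high, by source reliability
}
```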
Decay Window
Evidence expires after 180 days by default unless reaffirmed by a new event. This ensures the tracker reflects current conditions rather than historical snapshots.
When an item is supported by multiple events, the most recent event date determines the expiration.
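A minimal sketch of that expiry rule, assuming the 180-day default (the helper name is illustrative):

```python
from datetime import date, timedelta

DECAY_WINDOW = timedelta(days=180)

def is_active(event_dates: list[date], today: date) -> bool:
    """An item stays active while its most recent supporting event falls
    inside the decay window; otherwise it expires and stops counting."""
    return bool(event_dates) and today - max(event_dates) <= DECAY_WINDOW
```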
Contested Items
If contradictory evidence exists for a checklist item (e.g., both signals of lock-in and diversification), the item is marked as "contested" and requires manual review before affecting the score.
Contested items are displayed in the UI with both supporting and contradicting evidence visible.
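One possible way to represent that state (a sketch; state names other than "contested" are assumptions):

```python
def item_state(supporting: list[dict], contradicting: list[dict]) -> str:
    """An item with evidence on both sides is contested and needs manual
    review before it can affect the score."""
    if supporting and contradicting:
        return "contested"
    return "active" if supporting else "inactive"
```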
Data Sources
Fault Line ingests from multiple source tiers to catch signals at different stages:
- Tier 1 — Regulatory & Institutional: SEC EDGAR company filings (Microsoft, Alphabet, Meta), FTC press releases, DOJ press releases, EU AI Office publications. These catch regulatory signals before they reach mainstream news.
- Tier 1 — Lab Blogs: Official blogs from OpenAI, Anthropic, Google DeepMind, and Meta AI.
- Tier 1–2 — Tech & Science News: TechCrunch, VentureBeat, Reuters, MIT Technology Review, Nature, Ars Technica, The Verge, Wired, and Science Magazine.
Data Pipeline
The tracker updates automatically via the following process:
- Ingestion: RSS feeds and curated sources are checked for new articles
- Classification: Articles are mapped to labs, dimensions, and checklist items
- Deduplication: Duplicate events are detected and merged
- Scoring: Checklist states are updated and scores recalculated
- Publication: JSON data files are committed to the repository
The pipeline runs daily via GitHub Actions. Manual event submissions are accepted via GitHub Issues.
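A high-level sketch of those stages as a single daily job (all function names are placeholders for the repository's actual pipeline modules):

```python
# Placeholder stage functions; the real implementations live in the pipeline code.
def ingest_feeds() -> list[dict]:
    """Check RSS feeds and curated sources for new articles."""
    return []

def classify(articles: list[dict]) -> list[dict]:
    """Map articles to labs, dimensions, and checklist items."""
    return articles

def deduplicate(events: list[dict]) -> list[dict]:
    """Detect and merge duplicate events."""
    return events

def recalculate_scores(events: list[dict]) -> dict:
    """Update checklist states and recompute fragility scores."""
    return {}

def publish_json(scores: dict) -> None:
    """Write JSON data files for the commit step."""

def run_pipeline() -> None:
    """Daily job: ingest -> classify -> deduplicate -> score -> publish."""
    publish_json(recalculate_scores(deduplicate(classify(ingest_feeds()))))
```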
Limitations & Caveats
This tracker has important limitations:
- Public information only: Private deals, internal metrics, and unreported events are not captured
- Binary indicators: Nuance and degree are not well represented by pass/fail items
- Lag: News may lag actual events by days or weeks
- Selection bias: Source selection affects what events are captured
- Not predictive: Fragility scores measure exposure, not likelihood of negative outcomes
Users should treat scores as a starting point for analysis, not definitive assessments.
Contributing
This project welcomes contributions. You can help by:
- Submitting news events via GitHub Issues
- Proposing new checklist items or methodology improvements
- Reviewing and validating existing event classifications
- Improving the data pipeline or frontend code
All contributions are subject to review to maintain data quality and methodological consistency.