# The Awareness Fund — Integrated Thesis Overview

**Working title**: theawarenessfund.com
**Document version**: v0.1 (DRAFT THESIS — pre-formal-launch)
**Date**: 2026-05-18
**Partners (co-owners)**: Corey Cottrell · Russell Korus · Jordannah Korus
**Operating partners**:
- **ACG (A-C-Gee)** — AI civilization shipping substrate-discipline as IP — [ai-civ.com](https://ai-civ.com)
- **Pyonair** — quantitative-engineering partner (Stacey Engle + Apex, Pyonair's AI counterpart); owns the back-test engine + PPO Neural Network code

---

> **DRAFT THESIS — pre-formal-launch. This document is for partner and prospective-LP discussion only. It is NOT a solicitation, NOT a private placement memorandum, and NOT investment advice. No fund vehicle yet exists. No price targets or return projections appear in this document by design. Past performance does not predict future results.**

---

## Executive Summary

The Awareness Fund is a 5-vertical, picks-and-shovels equity strategy targeting the **infrastructure** of the AI capex super-cycle — the supply chain underneath, not the applications on top.

- **The shift**: Trillions of dollars of capex are migrating from human knowledge work into AI infrastructure today, and we believe a second wave will migrate from AI infrastructure into robotic physical labor on a roughly 5-year horizon. (Confidence: HIGH on direction; MEDIUM-LOW on point estimates and timing.)
- **The architecture**: Exposure across **5 verticals** — Mining (raw materials) → Energy (electrons) → Chip-fabs (the machines that make the silicon) → AI-Endpoint (data centers, networking, cooling) → SPACE-infra (launch, satellites, defense-prime space exposure, orbital compute optionality). **No model labs. No application-layer AI. No hyperscaler concentration plays.**
- **The discipline**: ACG owns the **ingestion + competing-hypothesis + anti-fabrication substrate**. Pyonair owns the **back-test engine + PPO Neural Network**. The division of labor IS the operational moat — neither party is asked to do what they cannot prove they can do.
- **The test**: A 24-month back-test (2024-05-18 → 2026-05-18), free-tier data first, evaluated against a **4-tier benchmark ladder** (S&P 500 TR / equal-weight S&P / NASDAQ-100 / custom sector basket). The strategy is "interesting" only if it beats **all four tiers** on a risk-adjusted basis. Every variation gets published, including losers. (See §4 below + `projects/awareness-fund/capital-markets/sp-comparative-spec.md`.)
- **The honest counter-thesis**: It is entirely possible that the S&P 500 already prices the AI capex shift and that no fund-level alpha exists. This is hypothesis **H6** and is the existential question for the fund. The back-test is designed to disprove it, not to confirm what we already believe.
- **The decision frame for the LP**: This document is the SETUP for the back-test, not the back-test results. The next document an LP receives from us will contain back-test outputs. If our setup-discipline doesn't earn a page-2 read from you, the back-test results never will. We would rather you close politely now than waste your time and ours later.

---

## §1 — The Capex-Shift Thesis

**Claim**: The largest capex reallocation since the railway-to-electric transition is underway. Hyperscalers reported ~$240B of capex in 2024, predominantly data-center build-out (HIGH; company 10-Ks, EDGAR, Q4 2024). Bottoms-up estimates of incremental US data-center power demand range from +90 GW (DOE/LBNL conservative) to +400 GW (industry-bullish) by 2030 (MEDIUM; DOE/LBNL 2024 update, EPRI 2024 white paper). On the high end, this approaches "+1 terawatt within a decade", the figure several partners use as shorthand. (We frame this as a 10-year envelope, not a 3–5 year one, for sourceable defensibility.)

**The 5 verticals at one glance**:

| Vertical | The bottleneck | Cycle length | One-sentence why-now |
|----------|----------------|---------------|----------------------|
| Mining | Discovery-to-pour lead time 8–12 yr; copper, uranium, rare earths supply inelastic at AI-demand timescales | 8–12 yr (late-cycle) | Capex is at multi-year highs but well below 2012 super-cycle peak — room to run; USGS Annual Reports (HIGH); MP Materials 10-K FY2024 (HIGH). |
| Energy | Grid build-out cannot match AI compute demand; transformer lead times stretched from ~50 weeks (2019) to 120+ weeks (2024) | 5–10 yr | Microsoft signed PPA for Three Mile Island restart (Sep 2024 — first US nuclear-restart-for-hyperscaler, HIGH); Talen sold campus capacity to AWS (Mar 2024, HIGH); Dominion 2024 IRP forecasts 85% Virginia load growth by 2039 (HIGH). |
| Chip-fabs | EUV monopoly (ASML), 4-firm WFE oligopoly (~85% market share), CoWoS advanced-packaging sold out through 2026 | 2–3 yr | ASML EUR 28.3B revenue 2024 with ~18-month tool lead times (HIGH, 20-F); CHIPS Act $27.6B finalized in 2024 (HIGH, Commerce Dept). |
| AI-Endpoint | Data-center rack power-density jumped from ~10 kW (2020) to 80+ kW (2024 AI racks); liquid cooling is the new bottleneck | <1 yr | Vertiv FY2024 record year on liquid cooling (HIGH, 10-K); Equinix $8.7B FY2024 revenue with hyperscale JVs growing faster than retail colo (HIGH, 10-K); Arista 35% revenue concentration in two hyperscalers (HIGH, 10-K disclosure). |
| SPACE-Infra | Terrestrial grid + cooling + latency budgets are pushing R&D toward orbital compute; SpaceX Falcon 9 drove launch cost from ~$10K/kg to roughly $2-3K/kg (claimed) | 5–10 yr (with 20+ yr Dyson-swarm optionality) | Lonestar Holdings + Ramon Space hold publicly-announced DoD/NASA orbital-compute contracts (MEDIUM — contracts announced, NOT yet deployed); SpaceX 134 successful Falcon launches in 2024 vs <40 industry-total a decade prior (HIGH, FAA dashboard). |

**The structural feature**: the 5 verticals are **sequentially gated**. Chip-fab equipment must exist before chips exist; chips before AI endpoints are useful; AI endpoints must be powered (energy) and made of materials (mining); SPACE-infra is the deferred-but-inevitable safety valve for terrestrial physical limits. We are not betting on any single vertical. We are betting on the **infrastructure stack** the model-lab AI race requires to continue.

**The hedge property**: if the model labs commoditize AI margins (price wars, open-weight catch-up, DeepSeek-class efficiency gains), picks-and-shovels actually *benefits* — more inference deployments = more endpoints = more power = more materials. We are betting on **AI deployment**, not **AI margin capture**. (Confidence: MEDIUM on the hedge mechanism; HIGH on the directional asymmetry.)

For per-source receipts on every claim above, see `projects/awareness-fund/capital-markets/thesis-substantiation.md` §1–§5.

---

## §2 — The 5-Vertical Picks-and-Shovels Architecture

The full candidate ticker universe (73 names total) lives in `projects/awareness-fund/capital-markets/ticker-universe-spec.md`. Below are the most-defensible representative tickers per vertical, with the explicit counter-argument that could falsify each vertical's thesis.

### Vertical 1 — Mining (raw materials)

**Two-sentence thesis-link**: AI hardware and grid build-out are materials-intensive: copper for interconnect and grid, nickel for batteries, uranium for clean baseload, rare earths for motors and magnets, lithium for storage, silver for PV. Mining capex cycles are 8–12 years from discovery to first pour, so supply elasticity is near-zero on the timescales AI demand needs.

**Representative tickers (7 of 15)**:

| Ticker | Name | Sub-segment | Why included |
|--------|------|-------------|--------------|
| FCX | Freeport-McMoRan | Copper majors (US) | Largest US-listed copper pure-play; tightest AI-grid bottleneck |
| SCCO | Southern Copper | Copper majors (LatAm) | Lowest-cost copper producer globally; long-life reserves |
| RIO | Rio Tinto | Diversified majors | Lithium (Rincon, Jadar) + copper + iron-ore triple play |
| MP | MP Materials | Rare earths (US-domestic) | Only US-domestic mine-to-magnet operator; DoD-aligned |
| CCJ | Cameco | Uranium (Tier-1) | Largest publicly-traded uranium producer; nuclear-baseload-for-AI-DC |
| ALB | Albemarle | Lithium specialty | Largest US-listed lithium producer; battery-grade chemicals |
| PICK | iShares Global Metals & Mining ETF | Diversified basket | Beta benchmark for the vertical |

**Sub-segment breakdown**: copper majors (FCX, SCCO, TECK, RIO, BHP) · rare earths (MP, LYC.AX) · uranium (CCJ, URA, URNM) · lithium + specialty (ALB, SQM, NTR) · diversified (NEM, PICK).

**The honest counter-argument**: commodity-cycle noise typically dominates AI-attribution signal by at least 2:1; mining equities have rallied repeatedly on "AI will need copper" and then given back the gains. Picks-and-shovels framing helps but does not eliminate cycle risk. The 24-month back-test window is **too short to validate** a true late-cycle mining play, so we recommend tracking the vertical as "watch", not "weight", until 2027 data is available. (See `projects/awareness-fund/research/competing-hypotheses.md` H3.)

### Vertical 2 — Energy (powering the AI economy)

**Two-sentence thesis-link**: The US grid needs roughly +1 terawatt of new generation over a 10-year horizon to support AI data-center demand (MEDIUM-confidence synthesis from DOE/LBNL 2024 + EPRI 2024). Picks-and-shovels = utility owners of generation/transmission near hyperscaler sites + independent power producers with nuclear/gas baseload + grid-equipment makers (transformers, switchgear, HVDC).

**Representative tickers (7 of 15)**:

| Ticker | Name | Sub-segment | Why included |
|--------|------|-------------|--------------|
| CEG | Constellation Energy | IPP — nuclear pure-play | Largest US nuclear fleet; Three Mile Island restart catalyst (MSFT PPA) |
| VST | Vistra Corp | IPP — nuclear + gas | Largest beneficiary of hyperscaler PPAs (Constellation-pattern) |
| D | Dominion Energy | Utility (Virginia) | Virginia = 70% of US data-center inventory; load-growth pure play |
| NEE | NextEra Energy | Utility + renewables | Largest US utility + largest renewables developer |
| GEV | GE Vernova | Grid + gas-turbine OEM | Spin from GE; gas-turbine + grid-transmission + wind triple play |
| ETN | Eaton | Electrical equipment | Switchgear + data-center electrical infrastructure leader |
| HUBB | Hubbell | Electrical components | T&D + utility-grade connectors; grid build-out beneficiary |

**Sub-segment breakdown**: IPPs (CEG, VST, NRG, TLN) · utilities (NEE, DUK, SO, D) · grid OEMs (GEV, ETN, ABBN.SW, SIEGY, HUBB) · LNG adjacency (LNG) · uranium fuel cycle (URA — vertical overlap with Mining).

**The honest counter-argument**: utility valuations are already pricing this. Vistra and Constellation returned 200-400% in 2024-2025 on AI-energy narrative — entry now is late. Regulatory drag (state PUC approvals) slows utility re-rating. Solar+battery cost curves may flip the storyline before bottleneck binds. Hyperscaler self-build (Stargate, Meta-nuclear partnerships) can bypass public utilities entirely. (See `competing-hypotheses.md` H2.)

### Vertical 3 — Chip-fabs + supply chain

**Two-sentence thesis-link**: The picks-and-shovels of AI compute are NOT the model labs and NOT even the chip designers — they are the **equipment that makes the chips** (lithography, deposition, etch, metrology) and the **specialty materials** (photoresists, gases, wafers, advanced packaging substrates). This is a 4-company oligopoly at the high end (ASML, AMAT, LRCX, KLAC) plus a Japanese duopoly in materials.

**Representative tickers (7 of 15)**:

| Ticker | Name | Sub-segment | Why included |
|--------|------|-------------|--------------|
| ASML | ASML Holding | Lithography — monopoly | Only EUV-lithography supplier globally; literal AI-bottleneck monopoly |
| AMAT | Applied Materials | Deposition + etch | Largest WFE maker; broadest tool portfolio |
| LRCX | Lam Research | Etch + deposition | #2 in etch; memory + advanced-logic exposure |
| KLAC | KLA Corp | Metrology + inspection | Process-control monopoly; advanced-node yield dependency |
| TSM | Taiwan Semi (ADR) | Foundry — pure-play | Largest pure-play foundry; advanced-node AI-chip manufacturer |
| ENTG | Entegris | Specialty materials | Wafer-handling + advanced-process fluids; monopoly-like position in select segments |
| SOXX | iShares Semiconductor ETF | Sector basket | Beta benchmark |

**Sub-segment breakdown**: WFE oligopoly (ASML, AMAT, LRCX, KLAC) · test equipment (TER, 6857.T Advantest) · foundry pure-play (TSM) · IDM with US fabs (INTC) · Japan specialty materials (4063.T Shin-Etsu, 4183.T Mitsui, TOELY Tokyo Electron) · advanced packaging + small-caps (ENTG, AEHR, ONTO).

**Explicit exclusion**: NVDA, AMD, AVGO, MRVL are **chip designers**, not picks-and-shovels — they are **downstream buyers of the supply chain**, not the **supply chain itself**. SOXX captures them as benchmark overlap. If partners want chip-designer exposure, that is a **separate sleeve decision**, not picks-and-shovels.

**The honest counter-argument**: WITHIN the supply chain, the concentrated oligopoly (NVDA design + TSM fab) has captured disproportionate margin vs equipment-makers — NVDA gross margins 75%+ vs ASML 50% vs TSM 53%. Recent 2-year history strongly supports concentration. The 24-month back-test may not reveal the tail-risk that the diversification thesis is hedging against. (See `competing-hypotheses.md` H4.)

### Vertical 4 — AI-Endpoint (data centers, picks-and-shovels only)

**Two-sentence thesis-link**: AI-endpoint = the **physical buildings** + **cooling** + **networking** + **power distribution** where AI compute happens. Explicitly excludes model labs (no Anthropic-proxy, no OpenAI-IPO speculation, no application-layer AI) and hyperscalers (MSFT/GOOGL/AMZN/META are mixed-margin businesses where DC capex is one line item among many).

**Representative tickers (7 of 15)**:

| Ticker | Name | Sub-segment | Why included |
|--------|------|-------------|--------------|
| EQIX | Equinix | DC-REIT (retail colo) | Largest interconnection-dense colo; AI-inference-edge proxy |
| DLR | Digital Realty | DC-REIT (wholesale + retail) | Largest DC-REIT by power |
| VRT | Vertiv | DC infrastructure (power + cooling) | Pure-play DC infrastructure OEM; liquid cooling leader |
| ANET | Arista Networks | DC networking (Ethernet fabric) | Hyperscaler Ethernet-switching leader; alternative to NVDA NVLink |
| COHR | Coherent Corp | Optical components | 800G/1.6T optical transceivers for AI-DC |
| JCI | Johnson Controls | Building HVAC + DC cooling | Building-systems + commercial HVAC; DC-cooling adjacency |
| DTCR | Global X Data Center ETF | DC-REIT basket | Basket benchmark |

**Sub-segment breakdown**: DC-REITs (EQIX, DLR, IRM) · DC infrastructure OEMs (VRT, NVT) · networking + optical (ANET, CSCO, CIEN, COHR, LITE) · servers + IP (SMCI [⚠️ see below], ARM) · cooling adjacency (JCI) · physical-AI bridge optionality (SYM) · basket (DTCR).

**⚠️ Red flag, named openly**: **SMCI** carried an explicit accounting overhang in 2024 — Hindenburg short report (Aug 2024), Ernst & Young auditor resignation (Oct 2024), DOJ subpoena reported, delayed 10-K. We include SMCI in the candidate universe so the back-test can quantify the impact of including/excluding it, but partners should treat it as a name-to-trade-with-caution and likely exclude from any concentrated variation. (See `thesis-substantiation.md` §4 risks table.)

**The honest counter-argument**: data-center capex may be near peak. AWS/GCP/Azure capex-to-revenue ratios are at multi-decade highs. Hyperscaler depreciation is accelerating (8-yr → 6-yr useful-life revisions). DeepSeek-class efficiency improvements can compress training-compute demand. The 2001 telecom-capex analog (Cisco -85% peak-to-trough) is the cautionary tale. Concentrated names like DLR/EQIX/SMCI are most exposed to this. (See `competing-hypotheses.md` H5.)

### Vertical 5 — SPACE-Infra (orbital data centers + the infrastructure to put them there)

**Two-sentence thesis-link**: Terrestrial AI-endpoints are running into hard physical ceilings — grid power, water for cooling, fiber latency budgets, real-estate adjacency to demand. Picks-and-shovels here is commercial launch + reusability, satellite manufacturers + comms-backhaul, defense primes holding DoD/SDA/NASA space-systems contracts (mixed-beta caveat), rad-hardened space-grade compute, and space-solar / Dyson-swarm frontier (20+ year optionality only).

**Representative tickers (7 of 13)**:

| Ticker | Name | Sub-segment | Why included |
|--------|------|-------------|--------------|
| RKLB | Rocket Lab USA | Launch + reusability | Largest publicly-traded launch operator; Neutron reusable-medium-lift |
| IRDM | Iridium Communications | Satellite comms (LEO) | Operational global LEO constellation; DoD-aligned |
| ASTS | AST SpaceMobile | Direct-to-cell satellite | Only public pure-play in cellular-from-space; AT&T/Verizon/Vodafone |
| LMT | Lockheed Martin | Defense prime — space | SDA Tranche, NASA Orion, GPS-III (mixed beta, ~10-30% space exposure) |
| MRCY | Mercury Systems | Rad-hardened compute | Closest public pure-play in space-grade processors |
| KTOS | Kratos Defense | Ground stations + small launch | OpenSpace virtualized ground systems |
| UFO | Procure Space ETF | Diversified basket | Sector benchmark — ~30 holdings |

**Sub-segment breakdown**: launch (RKLB) · satellite comms (IRDM, VSAT, SATS, ASTS) · defense primes with space optionality (LMT, NOC, RTX, BA) · rad-hardened compute + ground (MRCY, KTOS) · baskets (UFO, ARKX).

**Explicit exclusions** (this table is load-bearing for honesty — every LP we pitch will ask "what about SpaceX/Tesla/Maxar?"):

| Excluded exposure | Why excluded |
|-------------------|--------------|
| **SpaceX (private)** | Not publicly tradable; no direct equity exposure exists for retail or fund vehicles. If SpaceX IPOs, it becomes the cornerstone of this vertical. Until then: ZERO direct SpaceX exposure. |
| **TSLA as "Musk-proxy"** | TSLA is a vehicle and energy-storage company that happens to share a CEO with SpaceX. Owning TSLA for SpaceX exposure imports brand-discount, EV-cycle beta, and FSD-narrative volatility unrelated to picks-and-shovels space-infra. If partners want Musk-beta, that is a separate sleeve. |
| **MAXR (Maxar)** | Taken private by Advent International in May 2023 ($6.4B all-cash) — no longer tradable. |
| **ASTR (Astra Space)** | Delisted from Nasdaq in July 2024 — no longer tradable. |
| **Lonestar Holdings, Ramon Space** | Private orbital-compute operators with announced DoD-SDA / NASA contracts (2024). No public equity exposure. If either IPOs, would be highest-conviction pure-play orbital-compute name. Today: track contract-announcements for thesis-evidence only. |
| **Aitech, BAE Space, Cobham** | Private or non-tradable rad-hardened-compute / space-electronics specialists. MRCY is the closest tradable approximation. |
| **CSPP / Caelus / space-solar concept-stage operators** | All concept-stage / feasibility-study tier; no operating revenue. Caltech SSPP June 2023 in-orbit beaming demonstration was watts-class, not kilowatts. ESA SOLARIS is feasibility-only. Not investable today. |

**On the Dyson-swarm horizon**: framed exclusively as **20+ year optionality**. Treat the narrative as a direction the universe is moving (collect energy where it is abundant), not a near-term investable thesis. **Do NOT base any back-test weighting on space-solar deployment timelines.** The closest tradable proxies are the defense primes (NOC, LMT) and the launch-cost-curve compressor (RKLB) — both already in the basket on other grounds.

**The honest counter-argument**: the single most important commercial space company (SpaceX) is not tradable. Defense-prime proxies are mixed-beta (10-30% space revenue typical), so outperformance from LMT/NOC/RTX/BA may reflect defense-cycle dynamics, not space-specific tailwinds. Orbital compute density today is <0.001% of terrestrial; cost-of-cooling-in-vacuum, rad-hardening overhead, and on-orbit servicing remain hard physical gates against near-term scale. ASTS has limited price history (April 2021 SPAC merger), so back-test windows that start before April 2021 will have insufficient ASTS coverage.

---

## §3 — What Could Make This Thesis Wrong

The first task of an investor-grade thesis is to name its failure modes. We track **six competing hypotheses** in `projects/awareness-fund/research/competing-hypotheses.md`: three pro-thesis (different paths to the same conclusion), two counter-thesis (H4 and H5, which challenge the construction or the timing), and one anti-thesis (H6, the path by which the fund's premise fails). All confidence levels are PRELIMINARY pending the 24-month back-test.

| # | Hypothesis | Direction | Preliminary confidence | Falsification test |
|---|-----------|-----------|------------------------|---------------------|
| H1 | Diversified picks-and-shovels outperforms concentrated hyperscaler bets | Pro-thesis | MEDIUM-HIGH | Equal-weighted 4-vertical basket Sharpe > top-5 hyperscaler basket Sharpe over 24mo |
| H2 | Energy bottleneck is THE moat (utilities / grid-OEM / nuclear-IPPs dominate) | Pro-thesis | MEDIUM-HIGH | Energy basket Sharpe > broad-AI Sharpe AND FERC interconnect-queue depth is a statistically significant return predictor |
| H3 | Mining is the late-cycle play (2027-2030 inflection) | Pro-thesis (delayed) | LOW-MEDIUM | **Cannot fully test in 24mo** — proxy test: do mining equities show higher beta to AI-capex announcements than to copper futures? |
| H4 | Chip-fab oligopoly favors NVDA/TSM pure-play over equipment-makers (concentration within stack beats diversification) | Counter-thesis (within stack) | MEDIUM | 5-name pure-play basket (NVDA, TSM, AMD, AVGO, MRVL) Sharpe > 5-name equipment basket (ASML, AMAT, LRCX, KLAC, TER) |
| H5 | AI-endpoint capex peaks within 18 months — DC build cycle near top | Counter-thesis | LOW-MEDIUM | AI-endpoint basket shows ≥2 sequential quarters of negative returns concurrent with hyperscaler capex-guidance cuts |
| H6 | The thesis is correct but the S&P 500 already prices it — no fund-level alpha exists | **Anti-thesis (market efficiency)** | MEDIUM | **THE EXISTENTIAL ONE.** Strategy info ratio vs SPY < 0.3 after fees ⇒ fund killed. Info ratio > 0.7 ⇒ H6 falsified. |

**H6 is the single most important test in this entire document.** Every variation we run MUST be compared to SPY total return on a risk-adjusted basis. If we cannot beat SPY after fees, the fund has no investor-rational reason to exist, and Russell, Jordannah, and Corey (with Pyonair input) will say so out loud before any LP says it to us.
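
A minimal sketch of the H6 decision rule, assuming monthly net-of-fee return series for the strategy and for SPY total return. The function names, the monthly annualization factor, and the "inconclusive" label for information ratios between 0.3 and 0.7 are illustrative assumptions; only the two thresholds come from the H6 row of the table.

```python
import math
import statistics

PERIODS_PER_YEAR = 12  # assumption: monthly return observations


def information_ratio(strategy_returns, benchmark_returns,
                      periods_per_year=PERIODS_PER_YEAR):
    """Annualized information ratio: mean active return / tracking error.

    Both inputs are per-period simple returns; the strategy series is
    assumed to be net of fees already.
    """
    if len(strategy_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    active = [s - b for s, b in zip(strategy_returns, benchmark_returns)]
    tracking_error = statistics.stdev(active)  # sample std dev of active returns
    if tracking_error == 0:
        raise ValueError("tracking error is zero; information ratio is undefined")
    return statistics.mean(active) / tracking_error * math.sqrt(periods_per_year)


def h6_verdict(info_ratio, kill_below=0.3, falsify_above=0.7):
    """Map an information ratio onto the H6 thresholds.

    The middle band is not defined in the thesis; 'inconclusive' is a
    placeholder label used only in this sketch.
    """
    if info_ratio < kill_below:
        return "fund killed"
    if info_ratio > falsify_above:
        return "H6 falsified"
    return "inconclusive"
```

Keeping the verdict separate from the ratio calculation means the thresholds can be reported next to every variation without re-running the return math.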

**H1 vs H4 is the architectural question** — diversified picks-and-shovels OR concentrated pure-play. The back-test runs both as separate portfolio constructions to discriminate. They are partly contradictory by design.

For per-hypothesis Evidence FOR / Evidence AGAINST / Falsification Test specifications, see `competing-hypotheses.md`.

---

## §4 — How We'll Prove It (Or Refute It)

### The back-test, in one paragraph

A 24-month back-test (2024-05-18 → 2026-05-18) with a 12-month warm-up window, daily-bar OHLC, NYSE calendar, USD-denominated, full point-in-time data discipline (`as_of_ts` ≠ `event_ts`), monthly rebalance default with weekly and quarterly variations tested, transaction-cost model of 5 bps liquid / 15 bps small-cap / +2 bps per 1% ADV market-impact / +10 bps foreign-listed extra cost. Every back-test run produces a manifest, a trade log, a daily NAV series, a position log, a source-data SHA256, a code-version git hash, and the §4 metric set. Runs are stored under `projects/awareness-fund/backtest-runs/YYYY-MM-DD-runID/`. Full spec: `projects/awareness-fund/capital-markets/backtest-protocol-spec.md`.
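
The transaction-cost model in the paragraph above can be sketched as a small function. This is an illustration only: it assumes the components are additive, that market impact is linear in the trade's share of ADV, and that the cost applies one-way per trade. The function and parameter names are hypothetical and are not taken from `backtest-protocol-spec.md`.

```python
BASE_COST_BPS = {"liquid": 5.0, "small_cap": 15.0}  # base spread/commission tiers
IMPACT_BPS_PER_PCT_ADV = 2.0                        # +2 bps per 1% of ADV traded
FOREIGN_LISTING_BPS = 10.0                          # extra cost for foreign listings


def trade_cost_bps(liquidity, pct_of_adv, foreign_listed=False):
    """One-way cost of a trade in basis points.

    liquidity:   'liquid' or 'small_cap'
    pct_of_adv:  trade size as a percentage of average daily volume (1.0 = 1%)
    """
    base = BASE_COST_BPS[liquidity]
    impact = IMPACT_BPS_PER_PCT_ADV * pct_of_adv
    foreign = FOREIGN_LISTING_BPS if foreign_listed else 0.0
    return base + impact + foreign


def trade_cost_usd(notional_usd, liquidity, pct_of_adv, foreign_listed=False):
    """Convert the basis-point cost into dollars for a given trade notional."""
    return notional_usd * trade_cost_bps(liquidity, pct_of_adv, foreign_listed) / 10_000
```

Under these assumptions, a $1,000,000 trade in a liquid name at 0.5% of ADV costs 6 bps, or $600; a foreign-listed small-cap trade at 2% of ADV costs 29 bps.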

### The 4-tier benchmark ladder

A back-test variation is "interesting" only if it beats **all four tiers** on a risk-adjusted basis. From `sp-comparative-spec.md` §2:

| Tier | Benchmark | Question it answers |
|------|-----------|----------------------|
| T1 | S&P 500 Total Return (`^SP500TR`) | Do we beat the default passive choice? |
| T2 | Equal-weight S&P 500 (`RSP`) | Do we beat the broad market after stripping out the mega-cap AI-tilt already captured? |
| T3 | NASDAQ-100 (`QQQ`) | Do we beat the tech-heavy benchmark that overweights the AI applications layer? |
| T4 | Custom sector basket (25% XLU + 25% XLB + 25% SOXX + 25% DTCR) | Do we beat a naïve picks-and-shovels passive replication? |

Beating T1 only = could be sector beta. Beating T1+T2+T3+T4 on a risk-adjusted basis = stock-selection + theme-curation alpha.
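
A hedged sketch of the ladder check, assuming monthly return series, a zero risk-free rate, and Sharpe ratio as the only risk-adjusted measure (the Sortino leg is omitted for brevity). The T4 weights come from the table; the helper names and the per-period rebalancing of T4 are assumptions of this sketch.

```python
import math
import statistics

T4_WEIGHTS = {"XLU": 0.25, "XLB": 0.25, "SOXX": 0.25, "DTCR": 0.25}


def sharpe(returns, rf_per_period=0.0, periods_per_year=12):
    """Annualized Sharpe ratio from per-period simple returns."""
    excess = [r - rf_per_period for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def blend(series_by_ticker, weights):
    """Per-period return of a basket reset to fixed weights every period."""
    n_periods = len(next(iter(series_by_ticker.values())))
    return [
        sum(weights[t] * series_by_ticker[t][i] for t in weights)
        for i in range(n_periods)
    ]


def ladder_report(strategy_returns, tier_returns):
    """Compare strategy Sharpe against every tier.

    tier_returns maps a tier label (e.g. 'T1') to that benchmark's returns.
    Returns the per-tier pass/fail map and the overall 'interesting' flag.
    """
    strategy_sharpe = sharpe(strategy_returns)
    per_tier = {
        tier: strategy_sharpe > sharpe(returns)
        for tier, returns in tier_returns.items()
    }
    return per_tier, all(per_tier.values())
```

Reporting the per-tier map alongside the overall flag keeps the unfavorable comparisons visible, which is the point of publishing every benchmark.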

### Definitional ladder for "outperformance" (weakest to strongest)

| Claim | Evidence required |
|-------|--------------------|
| L1 | Strategy_CAGR > SPX_CAGR over window |
| L2 | Strategy_Sharpe > SPX_Sharpe AND Strategy_Sortino > SPX_Sortino |
| L3 | L2 holds vs all 4 tiers |
| L4 | L3 holds AND multi-factor alpha (Fama-French 3 + Momentum) is positive |
| L5 | L4 holds AND t-stat on alpha > 2 (or a bootstrap 95% CI on alpha that excludes zero) |

**Target**: L3 minimum (beats the benchmark ladder on a risk-adjusted basis). L4 = "this is a real strategy." L5 = "this is a real strategy with credible statistical inference." We do **not** expect L5 with 24 monthly observations: statistical power is honestly low at this back-test window, and we say so in every report. A longer back-test window (5y+) is the upgrade path, using Polygon paid data.
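
The low-power point can be made concrete with a back-of-envelope calculation. As a simplifying assumption, treat alpha as the mean of n monthly active (or residual) returns and ignore the degrees of freedom lost to factor estimation. Then t = mean / (sd / sqrt(n)) and the annualized ratio is (mean / sd) * sqrt(12), so t = ratio * sqrt(n / 12). The function names below are illustrative.

```python
import math


def alpha_t_stat(annualized_ratio, n_months):
    """Approximate t-stat implied by an annualized alpha/tracking-error ratio."""
    return annualized_ratio * math.sqrt(n_months / 12)


def min_ratio_for_t(t_threshold, n_months):
    """Annualized ratio needed to reach a given t-stat over n_months."""
    return t_threshold / math.sqrt(n_months / 12)


# 24 monthly observations: t > 2 needs an annualized ratio above roughly 1.41.
print(round(min_ratio_for_t(2.0, 24), 2))
# 60 monthly observations (the 5y+ upgrade path): the bar drops to roughly 0.89.
print(round(min_ratio_for_t(2.0, 60), 2))
```

Under this approximation, a ratio of 0.7 sustained over 24 months implies a t-stat of only about 0.99, which is why L5 is not the v1 target.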

### Division of labor

| Function | Owner | Rationale |
|----------|-------|-----------|
| Ingestion design + execution | **ACG** (research-lead) | jina-reader + LLM extraction + anti-fabrication-pre-flight discipline |
| Competing-hypothesis framework | **ACG** (research-lead) | scientific-method + critical-thinking skills |
| Ticker universe + back-test protocol | **ACG** (capital-markets-lead) | Already shipped; owns the protocol |
| Back-test engine (vectorized) | **Pyonair** (Stacey + Apex, Pyonair's AI counterpart) | They have it; ACG does not |
| PPO / RL model | **Pyonair** | They are shipping the code |
| Macro / regime overlay | **ACG** (research-lead, V8 ingestion) | FRED is free and scriptable |
| Investor-grade reporting | **ACG** (business-lead pipeline) | Blog + landing-page pipeline works |
| Source-attribution audit | **ACG** (research-lead) | Verifier-as-substrate discipline |

**The division of labor IS the operational moat.** Neither party is asked to do what they cannot prove they can do. ACG has not run a back-test before; we say so out loud, we hand the engine to the team that has, and we own the substrate-discipline layer that we have demonstrated repeatedly in `vendor-substrate-discipline-scorecard` and related public artifacts at ai-civ.com.

### Reportable failure modes (we publish even when ugly)

Per `sp-comparative-spec.md` §6, we publish:

- "Beat S&P 500 cumulative but lost on Sharpe" → strategy added risk
- "Beat by mega-cap concentration" → not stock-selection alpha
- "Beat in one vertical, dragged by another" → asymmetric thesis-validation
- "Beat early, gave back late" → momentum-driven, not durable
- ALL variations published, including losers. ALL benchmarks published, including the unfavorable ones. Survivorship-bias caveats appear in every artifact, not in footnotes.

---

## §5 — Data Strategy

We run free-tier first, then escalate only if the back-test materially needs the upgrade. Detailed in `projects/awareness-fund/research/ingestion-variations-spec.md`; condensed here.

| Layer | Variations | Cost | Why |
|-------|-----------|------|-----|
| **Foundation** | V1 (Yahoo/Stooq daily OHLCV) + V8 (FRED macro / rates / liquidity) | $0 | Free, robust, all back-tests need this |
| **Earnings-signal core** | V2 (SEC EDGAR 10-K/Q corpus) + V5 (LLM-extracted capex / guidance / capital-allocation) | $0–$100/mo (LLM compute) | The capex-shift thesis IS a capex-signal thesis. Filings ARE the data. |
| **Vertical-specific add** | V4 (EIA + FERC + ISO grid data) + V7 (TSMC monthly revenue, ASML/AMAT/LRCX/KLAC book-to-bill, SEMI data) | $0–$200/mo | Highest near-term thesis-conviction verticals (Energy + Chip-fab) |
| **Phase 2** | V3 (earnings-call transcripts) + V6 (mining commodity feeds) | $500/mo (AlphaSense) or $0 (scrape) | Add if Phase 1 underperforms on signal-density |
| **Defer** | V9 (Reddit/Stocktwits sentiment) + V10 (Federal Register / CHIPS Act / IRA policy events) | $0 | V9 too noisy for thesis-grade; V10 hard to systematize |

**Composite cost estimate**: $0–$200/mo (free APIs + LLM compute for V5). **Composite build effort**: 2–4 engineer-weeks (mostly V2 + V5 NER/extraction pipeline).

**Cross-variation quality controls** (mandatory, from `ingestion-variations-spec.md`):
1. Look-ahead-bias audit on every series (`as_of_ts` ≠ `event_ts`).
2. Survivorship-bias correction (universe includes delisted/acquired tickers).
3. Anti-fabrication pre-flight on every LLM-extracted number (V2/V3/V5 carry highest fabrication risk).
4. Source-of-evidence column on every feature row (`source_url` + `extracted_ts`).
5. Date-of-source discipline — every source explicitly dated.
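
A minimal sketch of controls 1 and 4, assuming `event_ts` is when the underlying event occurred (for example a fiscal quarter end) and `as_of_ts` is when the value became publicly knowable (for example the filing timestamp). That reading of the two fields, the `Observation` class, and the `visible_at` helper are assumptions of this sketch, not the schema in `ingestion-variations-spec.md`.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    series: str             # feature name, e.g. "capex_usd_bn"
    value: float
    event_ts: datetime      # when the event happened
    as_of_ts: datetime      # when the value became knowable
    source_url: str         # source-of-evidence column
    extracted_ts: datetime  # when the pipeline extracted the value


def visible_at(observations, decision_ts):
    """Return the latest observation per series knowable at decision_ts.

    Filtering on as_of_ts (not event_ts) is what prevents look-ahead bias:
    a quarter that has ended but is not yet reported stays invisible.
    """
    latest = {}
    for obs in observations:
        if obs.as_of_ts > decision_ts:
            continue  # not yet published at decision time
        current = latest.get(obs.series)
        if current is None or obs.as_of_ts > current.as_of_ts:
            latest[obs.series] = obs
    return latest
```

For example, a Q2 figure with `event_ts` of 2024-06-30 but `as_of_ts` of 2024-08-01 is excluded from a rebalance decided on 2024-07-15, which still sees the Q1 figure.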

**Honest gap**: we do not yet have a survivorship-bias-corrected ticker universe. A CRSP-like dataset costs $500+/mo at the academic tier and sits in the v2 upgrade path, not v1.

---

## §6 — The Substrate-Discipline Difference

This is the section that explains why ACG's involvement should matter to a sophisticated LP.

Most fund-management organizations cannot tell you, in detail, why they chose the data sources they chose, what hypotheses they ruled out and why, what their false-positive rate is on LLM-extracted numbers, how they audit their own work, or what would make them publicly retract a claim. The Awareness Fund's substrate-vendor-of-record (ACG) ships exactly this kind of operational rubric **publicly** at [ai-civ.com](https://ai-civ.com) — including:

- **Vendor Substrate-Discipline Scorecard** (10-dimension rubric) — the same operational discipline used to evaluate external vendors is applied to ACG's own work, with explicit named gaps where ACG underperforms its own standard. (Published 2026-05-17.)
- **Scientific-method skill** + **critical-thinking skill** (federation-IP, downloadable) — operational decision-substrate for separating claim from evidence, surfacing hidden assumptions, and detecting self-grading.
- **Anti-fabrication pre-flight** discipline — mandatory before any LLM-extracted number enters production. Stage 5 freshness-gate catches stale-data fabrication. (v1.1 shipped 2026-05-14.)
- **Transcription-not-paraphrase** discipline — verbatim preservation of human-spoken words for any chapter, customer-facing acknowledgment, or human-words-passing-through transformation. Failure-mode discipline at the language layer.
- **Cross-grading-as-substrate** — every claim entered as "integrated" requires verification receipt (grep, stat, or git-diff) or `legacy_pre_amendment` flag. Structural, not aspirational. (v1.1 schema shipped 2026-05-14.)
- **System > Symptom** doctrine — when something breaks, the fix is to the system that allowed it, not to the symptom. Codified after multiple operational incidents.

**The meta-thesis**: an LP investing in The Awareness Fund is also investing in a fund whose substrate-vendor-of-record uses the same operational discipline the LP wishes their existing PE managers used. The substrate-discipline IS the operational moat against the kind of self-deception that destroys most quantitative strategies.

**The LP-readable claim**: the same vendor-substrate-discipline-scorecard ACG publishes publicly is the rubric we have been asked to apply to ourselves on this fund's back-test work. We will publish that self-assessment when the back-test results ship. If it is ugly, we will publish that too. The discipline of publishing the ugly self-grade is, itself, the alpha.

For the public artifacts, see ai-civ.com/blog/ (substrate-discipline-scorecard post, federation-IP downloads).

---

## §7 — What Happens Next

| Step | Owner | Trigger / dependency |
|------|-------|----------------------|
| 1. Pyonair PPO code arrives | Pyonair (Stacey + Apex) | Pending — assumed-coming, not delivered |
| 2. ACG kicks off ingestion pipeline (V1+V8+V2+V5) | ACG (research-lead + mind-lead) | Independent of step 1; can start now |
| 3. Back-test engine scaffolded at `projects/awareness-fund/backtest/` | ACG (mind-lead), once the Pyonair engine arrives; otherwise mind-lead writes a minimal vectorized harness | Step 1 OR independent fallback |
| 4. Variation harness (V1–V10 from `ingestion-variations-spec.md`) runnable from CLI | ACG (mind-lead) + capital-markets-lead acceptance tests | Step 2 + Step 3 |
| 5. First back-test results: full §4 metric set across all four tiers of the benchmark ladder | capital-markets-lead | Step 4 |
| 6. Substrate-discipline self-assessment published alongside results | ACG (business-lead) | Step 5 |
| 7. Partner walk-through with Corey + Russell + Jordannah (with Pyonair input) | All | Step 6 |
| 8. Iteration on variations + overlays per partner feedback | All | Ongoing |
| 9. First LP conversations (using THIS document + back-test results) | Russell, with Corey + Jordannah (+ Pyonair input) | Step 7 |
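
The dependency column in the table can be read as a small graph. The sketch below is one possible encoding, using Python's standard `graphlib`; it rests on two assumptions that are not in the table. Step 3 is given no hard prerequisite because the table allows an independent fallback, and step 8 ("Ongoing") is attached to step 7.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it waits on, read from the
# "Trigger / dependency" column. Assumptions: step 3 has no hard
# prerequisite (fallback path), and step 8 follows step 7.
DEPENDS_ON = {
    1: set(),
    2: set(),
    3: set(),
    4: {2, 3},
    5: {4},
    6: {5},
    7: {6},
    8: {7},
    9: {7},
}


def ready_steps(done: set[int]) -> list[int]:
    """Steps not yet done whose prerequisites are all complete."""
    return sorted(
        step for step, deps in DEPENDS_ON.items()
        if step not in done and deps <= done
    )


# One valid execution order; raises graphlib.CycleError if the plan is circular.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

Under this encoding, `ready_steps(set())` returns steps 1, 2, and 3, which matches the table's note that ingestion can start before the Pyonair code arrives.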

**Total v1 dev estimate from `backtest-protocol-spec.md` §8**: 8–11 active days, all on free-tier data.

**Acceptance criteria for v1 back-test completion** (`backtest-protocol-spec.md` §9):
1. All variations run end-to-end on free-tier data
2. All variations reproduce identical results from a clean re-run (deterministic seeding)
3. All §4 metrics computed and present in output report
4. Source-data hashes + code-version hashes embedded in every run
5. Survivorship-bias caveat explicit in every output
6. "DRAFT TEMPLATE — REQUIRES EXPERT REVIEW BEFORE PRODUCTION USE" disclaimer on every artifact
7. Partner-readable markdown report generated automatically
8. A blind expert reviewer (e.g., a portfolio manager friend of Russell's) can read the protocol + report and reproduce the headline number to within ±10 bps
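
Criteria 2, 4, and 8 can be illustrated together. This is a minimal sketch under stated assumptions, not the protocol in `backtest-protocol-spec.md`: the manifest layout, `build_manifest`, `toy_run`, and `within_bps` are all hypothetical names, and `toy_run` is a seeded stand-in rather than a real back-test.

```python
import hashlib
import random


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(source_files: dict[str, bytes], code_version: str, seed: int) -> dict:
    """Collect what is needed to re-run and compare a back-test run.

    source_files maps a file name to its raw bytes. code_version is
    assumed to be an identifier such as a git commit hash, supplied by
    the caller.
    """
    return {
        "seed": seed,
        "code_version": code_version,
        "source_hashes": {
            name: sha256_hex(blob) for name, blob in sorted(source_files.items())
        },
    }


def toy_run(seed: int) -> float:
    """Stand-in for a back-test: a seeded pseudo-random 'headline' return."""
    rng = random.Random(seed)  # local RNG, so no global state leaks between runs
    return sum(rng.gauss(0.0, 0.01) for _ in range(24))


def within_bps(a: float, b: float, tol_bps: float = 10.0) -> bool:
    """True if two returns (as decimals) differ by at most tol_bps basis points."""
    return abs(a - b) * 10_000 <= tol_bps
```

With the same seed and unchanged inputs, two runs should produce identical manifests and identical headline numbers. A reviewer working from a separate implementation would instead compare against the reported figure with `within_bps`.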

---

## §8 — Honest Gaps + Open Questions

We name these openly because diligence-grade LPs will discover them anyway. We would rather be the source than be caught.

| Gap | Substance | Mitigation / path |
|-----|-----------|--------------------|
| **ACG has no native back-test engine** | We have a yfinance-class CLI ingester, generic ingestion plumbing (`tools/ago/ingest/`), and strong LLM-extraction discipline — but **zero equity portfolio back-test capability** in the repo today (audited 2026-05-18, `backtest-protocol-spec.md` §0). | Pyonair owns this. The division of labor IS the design. If Pyonair PPO code does not arrive in a reasonable timeframe, ACG (mind-lead) will write a minimal vectorized harness as fallback — but Pyonair-built is preferred. |
| **No transaction-cost model beyond a simple linear-impact stub** | Critical for honest Sharpe / info ratio numbers at fund-AUM scale. | A 5/15/+2/+10 bps cost model is specified (`backtest-protocol-spec.md` §4.4); calibrate from realized spreads in Polygon paid data if upgraded. |
| **No survivorship-bias-corrected ticker universe** | All current scraping is "what's listed today." Historical delistings are missing. | Manual delisting enumeration for the 24-month window (research-lead's deliverable). CRSP-like dataset ($500+/mo academic tier) is v2 upgrade path. |
| **Works (federation's financial-civ) is currently DOWN** | Works (Kimi K2.6) is the sister-civ with deepest financial-domain depth. Factor-construction critique is unavailable until restart. | Hengshi (Qwen) is healthy and can serve as cross-grading peer for research outputs. Works restart pending. |
| **TG/blog instrumentation lag** | Subscriber open/click instrumentation for ai-civ.com blog posts is not yet wired. Cannot measure LP-funnel response to publicly-shipped substrate-discipline IP today. | Slot-4 instrumentation target carried from 2026-05-17; unmet as of this document's date. |
| **Pyonair PPO code has not yet arrived** | All RL/PPO research on the ACG side is preparatory; integration will happen when code lands. | Asynchronous timeline. Independent ACG work (ingestion + protocol + universe) is ungated and proceeds. |
| **Time-zone alignment with Pyonair team unknown** | Async coordination assumed. | TGIM substrate (cross-civ task platform) is the standing wire if cadence becomes a friction point. |
| **No live brokerage / execution layer** | Back-test analytics only. Live execution is fund-back-end work outside ACG's scope. | Russell + Pyonair territory. |
| **No risk-management / compliance / regulatory framework** | When fund formally launches, this is a hard external dependency. | Counsel + compliance vendor to be retained at fund-formation stage. Out of scope for this thesis-overview. |
| **24-month back-test window is too short to validate H3 (late-cycle mining)** | Mining capex cycle is 8–12 years; equity markets typically anticipate by 18–30 months. The thesis we hold most strongly (the one H3 supports) is the one the back-test can least validate. | Proxy test in v1 (mining-equity-beta-to-AI-capex-announcements). Full validation requires 2028+ data. |
| **Statistical power at 24 monthly observations is honestly low** | t-stats and bootstrap CIs are reported as substrate for partner discussion, NOT as inference-grade evidence. | Treat as descriptive, not inferential. Honest framing in every report. A longer back-test window (5y+) is the Polygon paid upgrade path. |
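
To make the statistical-power gap concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a mean monthly return. The returns are synthetic draws from a seeded generator, NOT fund or market data, and the mean and volatility parameters are arbitrary illustrative values. The function name and defaults are assumptions, not the method specified in the protocol.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo = means[round((alpha / 2) * n_resamples)]
    hi = means[round((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def synthetic_returns(n_months, seed=1):
    """Synthetic monthly excess returns (illustrative parameters only)."""
    rng = random.Random(seed)
    return [rng.gauss(0.005, 0.04) for _ in range(n_months)]


lo24, hi24 = bootstrap_mean_ci(synthetic_returns(24))
lo240, hi240 = bootstrap_mean_ci(synthetic_returns(240))
print(f"24 months:  width {hi24 - lo24:.4f}")
print(f"240 months: width {hi240 - lo240:.4f}")
```

The interval width shrinks roughly with the square root of the sample size, so at 24 observations it is wide relative to any plausible monthly edge. That is the sense in which the reported intervals are descriptive rather than inferential.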

---

## §9 — Disclaimers (full)

This document is a **DRAFT THESIS OVERVIEW** for pre-launch partner and prospective-LP discussion only. It is **NOT** a solicitation to invest, **NOT** a private placement memorandum, **NOT** investment advice, and **NOT** an offer of securities. No fund vehicle yet exists. No price targets, return projections, or performance forecasts appear in this document by design.

Past performance does not predict future results. Back-tests are hypothetical, derived from historical price data, and do not reflect actual trading, real-world transaction costs, taxes, or fund-management fees. Survivorship bias, selection bias, and look-ahead bias may be material despite the controls described in §5. Forward-looking statements derived from third-party sources carry their authors' biases and should be independently verified before any investment decision.

Investing in equities (including any future Awareness Fund vehicle) involves risk, including risk of total loss. Concentrated thematic strategies carry higher volatility than diversified passive index strategies. The capex-shift thesis described in this document may be wrong (see §3, particularly H6). The information in this document is current as of 2026-05-18 and is subject to revision without notice.

All quantitative claims are confidence-tagged HIGH/MEDIUM/LOW per the rubric in `projects/awareness-fund/capital-markets/thesis-substantiation.md`. LPs and partners must conduct their own due diligence and consult independent legal, tax, and investment counsel before making any commitment.

---

## §10 — Document Map (for the LP who wants to drill down)

| If you want to verify... | Read |
|--------------------------|------|
| The full 73-ticker candidate universe with sub-segment tags + data-tier requirements | `projects/awareness-fund/capital-markets/ticker-universe-spec.md` |
| The back-test design (time window, data sources, rebalance rules, transaction costs, risk overlays, reproducibility requirements) | `projects/awareness-fund/capital-markets/backtest-protocol-spec.md` |
| The S&P comparative methodology (4-tier benchmark ladder, L1-L5 outperformance ladder, attribution decomposition, statistical-significance discipline) | `projects/awareness-fund/capital-markets/sp-comparative-spec.md` |
| Per-vertical sourced observations with confidence tags and explicit risks (including SMCI accounting overhang, SpaceX-private exclusion, defense-prime mixed-beta caveats) | `projects/awareness-fund/capital-markets/thesis-substantiation.md` |
| The 10 candidate ingestion variations with cost/coverage/risk-of-failure per variation | `projects/awareness-fund/research/ingestion-variations-spec.md` |
| The 6 competing hypotheses (including H6 anti-thesis) with falsification tests | `projects/awareness-fund/research/competing-hypotheses.md` |
| The honest audit of what ACG can deliver vs what we must source from partners | `projects/awareness-fund/research/capability-inventory.md` |

---

**Document status**: v0.1 DRAFT — synthesized from 8 parallel-shipped spec sheets (2026-05-18). Awaiting partner review before any LP distribution. Authored by ACG business-lead. Co-owners (humans): Corey Cottrell (ACG) · Russell Korus (AiCIV Inc / Keel) · Jordannah Korus (Korus Consulting Inc). Operating partner: Pyonair (Stacey Engle, with Apex — Pyonair's AI counterpart — collaborating on substrate-side coordination).

**For partner questions, comments, or counter-evidence**: route through Corey for the ACG-side; route through Russell or Jordannah for the Korus-side; route through Stacey for the Pyonair-side. Cross-grading welcomed and expected.
