The Awareness Fund · full thesis
Draft Pre-formal-launch. Not a solicitation. Not a PPM. Not investment advice. No fund vehicle exists yet. v0.1 — 2026-05-18.

The Awareness Fund — Integrated Thesis Overview

Working title: theawarenessfund.com
Document version: v0.1 (DRAFT THESIS, pre-formal-launch)
Date: 2026-05-18
Partners (co-owners): Corey Cottrell · Russell Korus · Jordannah Korus
Operating partners:
- ACG (A-C-Gee): AI civilization shipping substrate-discipline as IP (ai-civ.com)
- Pyonair: quantitative-engineering partner (Stacey Engle + Apex, Pyonair's AI counterpart); owns the back-test engine + PPO Neural Network code


DRAFT THESIS — pre-formal-launch. This document is for partner and prospective-LP discussion only. It is NOT a solicitation, NOT a private placement memorandum, and NOT investment advice. No fund vehicle yet exists. No price targets or return projections appear in this document by design. Past performance does not predict future results.


Executive Summary

The Awareness Fund is a 5-vertical, picks-and-shovels equity strategy targeting the infrastructure of the AI capex super-cycle — the supply chain underneath, not the applications on top.


§1 — The Capex-Shift Thesis

Claim: The largest capex reallocation since the railway-to-electric transition is underway. Hyperscalers reported ~$240B of capex in 2024, dominantly data-center build-out (HIGH; company 10-Ks, EDGAR, Q4 2024). Bottoms-up estimates of incremental US data-center power demand range from +90 GW (DOE/LBNL conservative) to +400 GW (industry-bullish) by 2030 (MEDIUM; DOE/LBNL 2024 update, EPRI 2024 white paper). On the high end, this approaches "+1 terawatt within a decade", the figure several partners use as shorthand. (We frame this as a 10-year envelope, not a 3-5 year one, for sourceable defensibility.)

The 5 verticals at a glance:

| Vertical | The bottleneck | Cycle length | One-sentence why-now |
| --- | --- | --- | --- |
| Mining | Discovery-to-pour lead time 8–12 yr; copper, uranium, rare earths supply inelastic at AI-demand timescales | 8–12 yr (late-cycle) | Capex is at multi-year highs but well below 2012 super-cycle peak, so there is room to run; USGS Annual Reports (HIGH); MP Materials 10-K FY2024 (HIGH). |
| Energy | Grid build-out cannot match AI compute demand; transformer lead times stretched from ~50 weeks (2019) to 120+ weeks (2024) | 5–10 yr | Microsoft signed PPA for Three Mile Island restart (Sep 2024; first US nuclear-restart-for-hyperscaler, HIGH); Talen sold campus capacity to AWS (Mar 2024, HIGH); Dominion 2024 IRP forecasts 85% Virginia load growth by 2039 (HIGH). |
| Chip-fabs | EUV monopoly (ASML), 4-firm WFE oligopoly (~85% market share), CoWoS advanced-packaging sold out through 2026 | 2–3 yr | ASML EUR 28.3B revenue 2024 with ~18-month tool lead times (HIGH, 20-F); CHIPS Act $27.6B finalized in 2024 (HIGH, Commerce Dept). |
| AI-Endpoint | Data-center rack power-density jumped from ~10 kW (2020) to 80+ kW (2024 AI racks); liquid cooling is the new bottleneck | <1 yr | Vertiv FY2024 record year on liquid cooling (HIGH, 10-K); Equinix $8.7B FY2024 revenue with hyperscale JVs growing faster than retail colo (HIGH, 10-K); Arista 35% revenue concentration in two hyperscalers (HIGH, 10-K disclosure). |
| SPACE-Infra | Terrestrial grid + cooling + latency budgets are pushing R&D toward orbital compute; SpaceX Falcon 9 drove launch cost from ~$10K/kg to roughly $2-3K/kg (claimed) | 5–10 yr (with 20+ yr Dyson-swarm optionality) | Lonestar Holdings + Ramon Space hold publicly-announced DoD/NASA orbital-compute contracts (MEDIUM; contracts announced, NOT yet deployed); SpaceX 134 successful Falcon launches in 2024 vs <40 industry-total a decade prior (HIGH, FAA dashboard). |

The structural feature: the 5 verticals are sequentially gated. Chip-fab equipment must exist before chips exist; chips before AI endpoints are useful; AI endpoints must be powered (energy) and made of materials (mining); SPACE-infra is the deferred-but-inevitable safety valve for terrestrial physical limits. We are not betting on any single vertical. We are betting on the infrastructure stack the model-lab AI race requires to continue.

The hedge property: if the model labs commoditize AI margins (price wars, open-weight catch-up, DeepSeek-class efficiency gains), picks-and-shovels actually benefits — more inference deployments = more endpoints = more power = more materials. We are betting on AI deployment, not AI margin capture. (Confidence: MEDIUM on the hedge mechanism; HIGH on the directional asymmetry.)

For per-source receipts on every claim above, see projects/awareness-fund/capital-markets/thesis-substantiation.md §1–§5.


§2 — The 5-Vertical Picks-and-Shovels Architecture

The full candidate ticker universe (73 names total) lives in projects/awareness-fund/capital-markets/ticker-universe-spec.md. Below are the most-defensible representative tickers per vertical, with the explicit counter-argument that could falsify each vertical's thesis.

Vertical 1 — Mining (raw materials)

Two-sentence thesis-link: AI hardware and grid build-out are materials-intensive, requiring copper for interconnect and grid, nickel for batteries, uranium for clean baseload, rare earths for motors and magnets, lithium for storage, and silver for PV. Mining capex cycles run 8–12 years from discovery to first pour, so supply elasticity is near-zero on the timescales AI demand needs.

Representative tickers (7 of 15):

| Ticker | Name | Sub-segment | Why included |
| --- | --- | --- | --- |
| FCX | Freeport-McMoRan | Copper majors (US) | Largest US-listed copper pure-play; tightest AI-grid bottleneck |
| SCCO | Southern Copper | Copper majors (LatAm) | Lowest-cost copper producer globally; long-life reserves |
| RIO | Rio Tinto | Diversified majors | Lithium (Rincon, Jadar) + copper + iron-ore triple play |
| MP | MP Materials | Rare earths (US-domestic) | Only US-domestic mine-to-magnet operator; DoD-aligned |
| CCJ | Cameco | Uranium (Tier-1) | Largest publicly-traded uranium producer; nuclear-baseload-for-AI-DC |
| ALB | Albemarle | Lithium specialty | Largest US-listed lithium producer; battery-grade chemicals |
| PICK | iShares Global Metals & Mining ETF | Diversified basket | Beta benchmark for the vertical |

Sub-segment breakdown: copper majors (FCX, SCCO, TECK, RIO, BHP) · rare earths (MP, LYC.AX) · uranium (CCJ, URA, URNM) · lithium + specialty (ALB, SQM, NTR) · diversified (NEM, PICK).

The honest counter-argument: commodity-cycle noise typically dominates AI-attribution signal by at least 2:1. Mining equities have rallied repeatedly on "AI will need copper" and then given back the gains. Picks-and-shovels framing helps but does not eliminate cycle risk. The 24-month back-test window is too short to validate a true late-cycle mining play; we recommend tracking this vertical as "watch", not "weight", until 2027 data is available. (See projects/awareness-fund/research/competing-hypotheses.md H3.)

Vertical 2 — Energy (powering the AI economy)

Two-sentence thesis-link: The US grid needs roughly +1 terawatt of new generation over a 10-year horizon to support AI data-center demand (MEDIUM-confidence synthesis from DOE/LBNL 2024 + EPRI 2024). Picks-and-shovels = utility owners of generation/transmission near hyperscaler sites + independent power producers with nuclear/gas baseload + grid-equipment makers (transformers, switchgear, HVDC).

Representative tickers (7 of 15):

| Ticker | Name | Sub-segment | Why included |
| --- | --- | --- | --- |
| CEG | Constellation Energy | IPP (nuclear pure-play) | Largest US nuclear fleet; Three Mile Island restart catalyst (MSFT PPA) |
| VST | Vistra Corp | IPP (nuclear + gas) | Largest beneficiary of hyperscaler PPAs (Constellation-pattern) |
| D | Dominion Energy | Utility (Virginia) | Virginia = 70% of US data-center inventory; load-growth pure play |
| NEE | NextEra Energy | Utility + renewables | Largest US utility + largest renewables developer |
| GEV | GE Vernova | Grid + gas-turbine OEM | Spin from GE; gas-turbine + grid-transmission + wind triple play |
| ETN | Eaton | Electrical equipment | Switchgear + data-center electrical infrastructure leader |
| HUBB | Hubbell | Electrical components | T&D + utility-grade connectors; grid build-out beneficiary |

Sub-segment breakdown: IPPs (CEG, VST, NRG, TLN) · utilities (NEE, DUK, SO, D) · grid OEMs (GEV, ETN, ABBN.SW, SIEGY, HUBB) · LNG adjacency (LNG) · uranium fuel cycle (URA — vertical overlap with Mining).

The honest counter-argument: utility valuations are already pricing this. Vistra and Constellation returned 200-400% in 2024-2025 on the AI-energy narrative, so entry now is late. Regulatory drag (state PUC approvals) slows utility re-rating. Solar+battery cost curves may flip the storyline before the bottleneck binds. Hyperscaler self-build (Stargate, Meta-nuclear partnerships) can bypass public utilities entirely. (See competing-hypotheses.md H2.)

Vertical 3 — Chip-fabs + supply chain

Two-sentence thesis-link: The picks-and-shovels of AI compute are NOT the model labs and NOT even the chip designers — they are the equipment that makes the chips (lithography, deposition, etch, metrology) and the specialty materials (photoresists, gases, wafers, advanced packaging substrates). This is a 4-company oligopoly at the high end (ASML, AMAT, LRCX, KLAC) plus a Japanese duopoly in materials.

Representative tickers (7 of 15):

| Ticker | Name | Sub-segment | Why included |
| --- | --- | --- | --- |
| ASML | ASML Holding | Lithography (monopoly) | Only EUV-lithography supplier globally; literal AI-bottleneck monopoly |
| AMAT | Applied Materials | Deposition + etch | Largest WFE maker; broadest tool portfolio |
| LRCX | Lam Research | Etch + deposition | #2 in etch; memory + advanced-logic exposure |
| KLAC | KLA Corp | Metrology + inspection | Process-control monopoly; advanced-node yield dependency |
| TSM | Taiwan Semi (ADR) | Foundry (pure-play) | Largest pure-play foundry; advanced-node AI-chip manufacturer |
| ENTG | Entegris | Specialty materials | Wafer-handling + advanced-process fluids monopolist segments |
| SOXX | iShares Semiconductor ETF | Sector basket | Beta benchmark |

Sub-segment breakdown: WFE oligopoly (ASML, AMAT, LRCX, KLAC) · test equipment (TER, 6857.T Advantest) · foundry pure-play (TSM) · IDM with US fabs (INTC) · Japan specialty materials (4063.T Shin-Etsu, 4183.T Mitsui, TOELY Tokyo Electron) · advanced packaging + small-caps (ENTG, AEHR, ONTO).

Explicit exclusion: NVDA, AMD, AVGO, MRVL are chip designers, not picks-and-shovels — they are downstream buyers of the supply chain, not the supply chain itself. SOXX captures them as benchmark overlap. If partners want chip-designer exposure, that is a separate sleeve decision, not picks-and-shovels.

The honest counter-argument: WITHIN the supply chain, the concentrated oligopoly (NVDA design + TSM fab) has captured disproportionate margin vs equipment-makers — NVDA gross margins 75%+ vs ASML 50% vs TSM 53%. Recent 2-year history strongly supports concentration. The 24-month back-test may not reveal the tail-risk that the diversification thesis is hedging against. (See competing-hypotheses.md H4.)

Vertical 4 — AI-Endpoint (data centers, picks-and-shovels only)

Two-sentence thesis-link: AI-endpoint = the physical buildings + cooling + networking + power distribution where AI compute happens. Explicitly excludes model labs (no Anthropic-proxy, no OpenAI-IPO speculation, no application-layer AI) and hyperscalers (MSFT/GOOGL/AMZN/META are mixed-margin businesses where DC capex is one line item among many).

Representative tickers (7 of 15):

| Ticker | Name | Sub-segment | Why included |
| --- | --- | --- | --- |
| EQIX | Equinix | DC-REIT (retail colo) | Largest interconnection-dense colo; AI-inference-edge proxy |
| DLR | Digital Realty | DC-REIT (wholesale + retail) | Largest DC-REIT by power |
| VRT | Vertiv | DC infrastructure (power + cooling) | Pure-play DC infrastructure OEM; liquid cooling leader |
| ANET | Arista Networks | DC networking (Ethernet fabric) | Hyperscaler Ethernet-switching leader; alternative to NVDA NVLink |
| COHR | Coherent Corp | Optical components | 800G/1.6T optical transceivers for AI-DC |
| JCI | Johnson Controls | Building HVAC + DC cooling | Building-systems + commercial HVAC; DC-cooling adjacency |
| DTCR | Global X Data Center ETF | DC-REIT basket | Basket benchmark |

Sub-segment breakdown: DC-REITs (EQIX, DLR, IRM) · DC infrastructure OEMs (VRT, NVT) · networking + optical (ANET, CSCO, CIEN, COHR, LITE) · servers + IP (SMCI [⚠️ see below], ARM) · cooling adjacency (JCI) · physical-AI bridge optionality (SYM) · basket (DTCR).

⚠️ Red flag, named openly: SMCI carried an explicit accounting overhang in 2024 — Hindenburg short report (Aug 2024), Ernst & Young auditor resignation (Oct 2024), DOJ subpoena reported, delayed 10-K. We include SMCI in the candidate universe so the back-test can quantify the impact of including/excluding it, but partners should treat it as a name-to-trade-with-caution and likely exclude from any concentrated variation. (See thesis-substantiation.md §4 risks table.)

The honest counter-argument: data-center capex may be near peak. AWS/GCP/Azure capex-to-revenue ratios are at multi-decade highs. Hyperscaler depreciation is accelerating (8-yr → 6-yr useful-life revisions). DeepSeek-class efficiency improvements can compress training-compute demand. The 2001 telecom-capex analog (Cisco -85% peak-to-trough) is the cautionary tale. Concentrated names like DLR/EQIX/SMCI are most exposed to this. (See competing-hypotheses.md H5.)

Vertical 5 — SPACE-Infra (orbital data centers + the infrastructure to put them there)

Two-sentence thesis-link: Terrestrial AI-endpoints are running into hard physical ceilings — grid power, water for cooling, fiber latency budgets, real-estate adjacency to demand. Picks-and-shovels here is commercial launch + reusability, satellite manufacturers + comms-backhaul, defense primes holding DoD/SDA/NASA space-systems contracts (mixed-beta caveat), rad-hardened space-grade compute, and space-solar / Dyson-swarm frontier (20+ year optionality only).

Representative tickers (7 of 13):

| Ticker | Name | Sub-segment | Why included |
| --- | --- | --- | --- |
| RKLB | Rocket Lab USA | Launch + reusability | Largest publicly-traded launch operator; Neutron reusable-medium-lift |
| IRDM | Iridium Communications | Satellite comms (LEO) | Operational global LEO constellation; DoD-aligned |
| ASTS | AST SpaceMobile | Direct-to-cell satellite | Only public pure-play in cellular-from-space; AT&T/Verizon/Vodafone |
| LMT | Lockheed Martin | Defense prime (space) | SDA Tranche, NASA Orion, GPS-III (mixed beta, ~10-30% space exposure) |
| MRCY | Mercury Systems | Rad-hardened compute | Closest public pure-play in space-grade processors |
| KTOS | Kratos Defense | Ground stations + small launch | OpenSpace virtualized ground systems |
| UFO | Procure Space ETF | Diversified basket | Sector benchmark; ~30 holdings |

Sub-segment breakdown: launch (RKLB) · satellite comms (IRDM, VSAT, SATS, ASTS) · defense primes with space optionality (LMT, NOC, RTX, BA) · rad-hardened compute + ground (MRCY, KTOS) · baskets (UFO, ARKX).

Explicit exclusions (this table is load-bearing for honesty — every LP we pitch will ask "what about SpaceX/Tesla/Maxar?"):

| Excluded exposure | Why excluded |
| --- | --- |
| SpaceX (private) | Not publicly tradable; no direct equity exposure exists for retail or fund vehicles. If SpaceX IPOs, it becomes the cornerstone of this vertical. Until then: ZERO direct SpaceX exposure. |
| TSLA as "Musk-proxy" | TSLA is a vehicle and energy-storage company that happens to share a CEO with SpaceX. Owning TSLA for SpaceX exposure imports brand-discount, EV-cycle beta, and FSD-narrative volatility unrelated to picks-and-shovels space-infra. If partners want Musk-beta, that is a separate sleeve. |
| MAXR (Maxar) | Taken private by Advent International in May 2023 ($6.4B all-cash); no longer tradable. |
| ASTR (Astra Space) | Delisted from Nasdaq in July 2024; no longer tradable. |
| Lonestar Holdings, Ramon Space | Private orbital-compute operators with announced DoD-SDA / NASA contracts (2024). No public equity exposure. If either IPOs, would be highest-conviction pure-play orbital-compute name. Today: track contract-announcements for thesis-evidence only. |
| Aitech, BAE Space, Cobham | Private or non-tradable rad-hardened-compute / space-electronics specialists. MRCY is the closest tradable approximation. |
| CSPP / Caelus / space-solar concept-stage operators | All concept-stage / feasibility-study tier; no operating revenue. Caltech SSPP June 2023 in-orbit beaming demonstration was watts-class, not kilowatts. ESA SOLARIS is feasibility-only. Not investable today. |

On the Dyson-swarm horizon: framed exclusively as 20+ year optionality. Treat the narrative as a direction the universe is moving (collect energy where it is abundant), not a near-term investable thesis. Do NOT base any back-test weighting on space-solar deployment timelines. The closest tradable proxies are the defense primes (NOC, LMT) and the launch-cost-curve compressor (RKLB) — both already in the basket on other grounds.

The honest counter-argument: the single most important commercial space company (SpaceX) is not tradable. Defense-prime proxies are mixed-beta (10-30% space revenue typical) — outperformance from LMT/NOC/RTX/BA may reflect defense-cycle dynamics, not space-specific tailwinds. Orbital compute density today is <0.001% of terrestrial; cost-of-cooling-in-vacuum, rad-hardening overhead, and on-orbit servicing remain hard physical gates against near-term scale. ASTS has limited price history (April 2021 SPAC merger) — back-test windows >36 months will have insufficient ASTS coverage.


§3 — What Could Make This Thesis Wrong

The first task of an investor-grade thesis is to name its failure modes. We track six competing hypotheses in projects/awareness-fund/research/competing-hypotheses.md: three are pro-thesis (H1–H3, different paths to the same conclusion), two are counter-theses that challenge a specific vertical or the diversified construction (H4, H5), and one is the anti-thesis by which the fund's premise fails outright (H6). All confidence levels are PRELIMINARY pending the 24-month back-test.

| # | Hypothesis | Direction | Preliminary confidence | Falsification test |
| --- | --- | --- | --- | --- |
| H1 | Diversified picks-and-shovels outperforms concentrated hyperscaler bets | Pro-thesis | MEDIUM-HIGH | Equal-weighted 4-vertical basket Sharpe > top-5 hyperscaler basket Sharpe over 24mo |
| H2 | Energy bottleneck is THE moat (utilities / grid-OEM / nuclear-IPPs dominate) | Pro-thesis | MEDIUM-HIGH | Energy basket Sharpe > broad-AI Sharpe AND FERC interconnect-queue depth is a statistically significant return predictor |
| H3 | Mining is the late-cycle play (2027-2030 inflection) | Pro-thesis (delayed) | LOW-MEDIUM | Cannot fully test in 24mo; proxy test: do mining equities show higher beta to AI-capex announcements than to copper futures? |
| H4 | Chip-fab oligopoly favors NVDA/TSM pure-play over equipment-makers (concentration within stack beats diversification) | Counter-thesis (within stack) | MEDIUM | 5-name pure-play basket (NVDA, TSM, AMD, AVGO, MRVL) Sharpe > 5-name equipment basket (ASML, AMAT, LRCX, KLAC, TER) |
| H5 | AI-endpoint capex peaks within 18 months (DC build cycle near top) | Counter-thesis | LOW-MEDIUM | AI-endpoint basket shows ≥2 sequential quarters of negative returns concurrent with hyperscaler capex-guidance cuts |
| H6 | The thesis is correct but the S&P 500 already prices it; no fund-level alpha exists | Anti-thesis (market efficiency) | MEDIUM | THE EXISTENTIAL ONE. Strategy info ratio vs SPY < 0.3 after fees ⇒ fund killed. Info ratio > 0.7 ⇒ H6 falsified. |

H6 is the single most important test in this entire document. Every variation we run MUST be compared to SPY total return on a risk-adjusted basis. If we cannot beat SPY after fees, the fund has no investor-rational reason to exist, and Russell, Jordannah, and Corey (with Pyonair input) will say so out loud before any LP says it to us.
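
A minimal sketch of the H6 kill test, assuming monthly net-of-fee returns for the strategy and for SPY total return. The function names, the annualization by the square root of 12, and the "inconclusive" label for the band between the two thresholds are our illustrative assumptions; only the 0.3 and 0.7 thresholds come from the H6 row above.

```python
import math
import statistics

def information_ratio(strategy_returns, benchmark_returns, periods_per_year=12):
    """Annualized information ratio from per-period, net-of-fee returns."""
    if len(strategy_returns) != len(benchmark_returns):
        raise ValueError("return series must be the same length")
    # Active return = strategy minus benchmark, period by period.
    active = [s - b for s, b in zip(strategy_returns, benchmark_returns)]
    tracking_error = statistics.stdev(active)  # sample standard deviation
    if tracking_error == 0:
        raise ValueError("tracking error is zero; ratio is undefined")
    return statistics.mean(active) / tracking_error * math.sqrt(periods_per_year)

def h6_verdict(info_ratio, kill_below=0.3, falsify_above=0.7):
    """Map an information ratio onto the H6 thresholds.

    The middle band is labeled 'inconclusive' here as an assumption;
    the thesis only defines the two outer outcomes.
    """
    if info_ratio < kill_below:
        return "fund killed"
    if info_ratio > falsify_above:
        return "H6 falsified"
    return "inconclusive"

# Toy usage with made-up numbers (not back-test results).
toy_strategy = [0.02, 0.01, 0.03, 0.00]
toy_spy = [0.01, 0.01, 0.01, 0.01]
toy_ir = information_ratio(toy_strategy, toy_spy)
print(round(toy_ir, 3), h6_verdict(toy_ir))
```

Four observations are far too few to mean anything; the toy series exists only to show the mechanics.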

H1 vs H4 is the architectural question — diversified picks-and-shovels OR concentrated pure-play. The back-test runs both as separate portfolio constructions to discriminate. They are partly contradictory by design.

For per-hypothesis Evidence FOR / Evidence AGAINST / Falsification Test specifications, see competing-hypotheses.md.


§4 — How We'll Prove It (Or Refute It)

The back-test, in one paragraph

A 24-month back-test (2024-05-18 → 2026-05-18) with a 12-month warm-up window, daily-bar OHLC, NYSE calendar, USD-denominated, full point-in-time data discipline (as_of_ts vs. event_ts), monthly rebalance by default with weekly and quarterly variations tested, and a transaction-cost model of 5 bps liquid / 15 bps small-cap / +2 bps per 1% ADV market-impact / +10 bps foreign-listed extra cost. Every back-test run produces a manifest, a trade log, a daily NAV series, a position log, a source-data SHA256, a code-version git hash, and the §5 metric set. Runs are stored under projects/awareness-fund/backtest-runs/YYYY-MM-DD-runID/. Full spec: projects/awareness-fund/capital-markets/backtest-protocol-spec.md.
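
The transaction-cost terms above can be sketched as a single function. This is an illustration under stated assumptions, not the Pyonair engine: we assume the four components are additive, that market impact is linear in the trade's share of average daily dollar volume (ADV), and that the small-cap and foreign-listed flags are supplied by the caller. The function and parameter names are hypothetical.

```python
def trade_cost_bps(trade_value, adv_value, is_small_cap=False, is_foreign_listed=False):
    """Estimated one-way cost of a trade in basis points.

    trade_value: absolute USD notional of the trade
    adv_value:   average daily USD volume of the ticker (must be > 0)
    """
    if adv_value <= 0:
        raise ValueError("adv_value must be positive")
    base = 15.0 if is_small_cap else 5.0                 # 5 bps liquid / 15 bps small-cap
    participation_pct = 100.0 * abs(trade_value) / adv_value
    impact = 2.0 * participation_pct                     # +2 bps per 1% of ADV (assumed linear)
    foreign = 10.0 if is_foreign_listed else 0.0         # +10 bps foreign-listed surcharge
    return base + impact + foreign

def trade_cost_usd(trade_value, adv_value, **flags):
    """Same estimate expressed in USD."""
    return abs(trade_value) * trade_cost_bps(trade_value, adv_value, **flags) / 10_000

# A $1M trade in a liquid name with $50M ADV: 5 bps base + 2 bps * 2% of ADV.
print(trade_cost_bps(1_000_000, 50_000_000))  # 9.0
```

How a ticker is classified as liquid versus small-cap is left to the protocol spec; the sketch takes that classification as an input.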

The 4-tier benchmark ladder

A back-test variation is "interesting" only if it beats all four tiers on a risk-adjusted basis. From sp-comparative-spec.md §2:

| Tier | Benchmark | Question it answers |
| --- | --- | --- |
| T1 | S&P 500 Total Return (^SP500TR) | Do we beat the default passive choice? |
| T2 | Equal-weight S&P 500 (RSP) | Do we beat the broad market after stripping out the mega-cap AI-tilt already captured? |
| T3 | NASDAQ-100 (QQQ) | Do we beat the tech-heavy benchmark that overweights the AI applications layer? |
| T4 | Custom sector basket (25% XLU + 25% XLB + 25% SOXX + 25% DTCR) | Do we beat a naïve picks-and-shovels passive replication? |

Beating T1 only = could be sector beta. Beating T1+T2+T3+T4 on a risk-adjusted basis = stock-selection + theme-curation alpha.
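
For concreteness, the T4 custom basket can be built from the four ETF return series as below. This is a sketch that assumes the basket is rebalanced back to 25% each at every period; the thesis does not state T4's rebalance rule, so treat that as our assumption. The function name and input layout are hypothetical.

```python
T4_WEIGHTS = {"XLU": 0.25, "XLB": 0.25, "SOXX": 0.25, "DTCR": 0.25}

def blended_returns(returns_by_ticker, weights=T4_WEIGHTS):
    """Per-period return of a basket reset to target weights every period.

    returns_by_ticker: dict mapping ticker -> list of per-period simple returns,
    all aligned to the same dates and the same length.
    """
    lengths = {len(returns_by_ticker[ticker]) for ticker in weights}
    if len(lengths) != 1:
        raise ValueError("all return series must have the same length")
    n_periods = lengths.pop()
    return [
        sum(weight * returns_by_ticker[ticker][i] for ticker, weight in weights.items())
        for i in range(n_periods)
    ]

# Toy inputs (made-up numbers, two periods).
toy = {
    "XLU": [0.01, 0.00],
    "XLB": [0.02, 0.00],
    "SOXX": [0.03, -0.04],
    "DTCR": [0.04, 0.00],
}
print(blended_returns(toy))
```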

Definitional ladder for "outperformance" (weakest to strongest)

| Claim | Evidence required |
| --- | --- |
| L1 | Strategy_CAGR > SPX_CAGR over window |
| L2 | Strategy_Sharpe > SPX_Sharpe AND Strategy_Sortino > SPX_Sortino |
| L3 | L2 holds vs all 4 tiers |
| L4 | L3 holds AND multi-factor alpha (Fama-French 3 + Momentum) is positive |
| L5 | L4 holds AND t-stat on alpha > 2 (or a bootstrap 95% CI that excludes zero) |
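
The first three rungs of the ladder can be sketched in code. This is an illustration under assumptions, not the sp-comparative-spec.md implementation: we treat the rungs as cumulative (L2 requires L1), use monthly returns with a zero risk-free rate by default, and compute Sortino against a zero target. L4 and L5 need a factor regression and are out of scope here. All function names are hypothetical.

```python
import math
import statistics

def cagr(returns, periods_per_year=12):
    growth = math.prod(1.0 + r for r in returns)
    return growth ** (periods_per_year / len(returns)) - 1.0

def sharpe(returns, rf=0.0, periods_per_year=12):
    excess = [r - rf for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)

def sortino(returns, rf=0.0, periods_per_year=12):
    excess = [r - rf for r in returns]
    # Downside deviation: root-mean-square of the negative excess returns only.
    downside = math.sqrt(sum(min(x, 0.0) ** 2 for x in excess) / len(excess))
    if downside == 0:
        return math.inf  # no losing periods in the sample
    return statistics.mean(excess) / downside * math.sqrt(periods_per_year)

def ladder_level(strategy, benchmarks):
    """Highest rung (0-3) reached; benchmarks maps tier name -> returns, incl. 'T1'."""
    spx = benchmarks["T1"]

    def beats(bench):
        return sharpe(strategy) > sharpe(bench) and sortino(strategy) > sortino(bench)

    if cagr(strategy) <= cagr(spx):
        return 0
    if not beats(spx):
        return 1
    if not all(beats(bench) for bench in benchmarks.values()):
        return 2
    return 3

# Toy usage with made-up monthly returns.
toy_strategy = [0.03, -0.01, 0.04, 0.02]
toy_index = [0.01, -0.02, 0.02, 0.01]
print(ladder_level(toy_strategy, {"T1": toy_index, "T2": toy_index,
                                  "T3": toy_index, "T4": toy_index}))  # 3
```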

Target: L3 minimum (beats the benchmark ladder on a risk-adjusted basis). L4 = "this is a real strategy." L5 = "this is a real strategy with credible statistical inference." We do not expect L5 with 24 monthly observations: statistical power is honestly low at this back-test window and we say so in every report. A longer back-test window (5y+) is the upgrade path with Polygon paid data.
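
A back-of-envelope illustration of why 24 monthly observations make L5 unlikely. As a simplifying assumption (not the Fama-French regression named in L4), treat the alpha t-stat as the t-stat of a mean monthly active return with independent observations. Then t = annualized ratio × sqrt(years of data), so the ratio needed to clear a given t-stat follows directly. The function name is hypothetical.

```python
import math

def min_annualized_ratio(n_obs, t_crit=2.0, periods_per_year=12):
    """Annualized Sharpe/information ratio needed for t-stat > t_crit.

    Derivation under i.i.d. returns:
      t = mean / (stdev / sqrt(n_obs))
        = (annualized ratio / sqrt(periods_per_year)) * sqrt(n_obs)
    """
    years = n_obs / periods_per_year
    return t_crit / math.sqrt(years)

print(round(min_annualized_ratio(24), 2))  # 1.41 at the 24-month window
print(round(min_annualized_ratio(60), 2))  # 0.89 at a 5-year window
```

Under these assumptions, a strategy would need an annualized ratio above roughly 1.4 to clear t > 2 in a 24-month window, which is why the longer window matters.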

Division of labor

| Function | Owner | Rationale |
| --- | --- | --- |
| Ingestion design + execution | ACG (research-lead) | jina-reader + LLM extraction + anti-fabrication-pre-flight discipline |
| Competing-hypothesis framework | ACG (research-lead) | scientific-method + critical-thinking skills |
| Ticker universe + back-test protocol | ACG (capital-markets-lead) | Already shipped; owns the protocol |
| Back-test engine (vectorized) | Pyonair (Stacey + Apex, Pyonair's AI counterpart) | They have it; ACG does not |
| PPO / RL model | Pyonair | They are shipping the code |
| Macro / regime overlay | ACG (research-lead, V8 ingestion) | FRED is free and scriptable |
| Investor-grade reporting | ACG (business-lead pipeline) | Blog + landing-page pipeline works |
| Source-attribution audit | ACG (research-lead) | Verifier-as-substrate discipline |

The division of labor IS the operational moat. Neither party is asked to do what they cannot prove they can do. ACG has not run a back-test before; we say so out loud, we hand the engine to the team that has, and we own the substrate-discipline layer that we have demonstrated repeatedly in vendor-substrate-discipline-scorecard and related public artifacts at ai-civ.com.

Reportable failure modes (we publish even when ugly)

Per sp-comparative-spec.md §6, we publish:


§5 — Data Strategy

We run free-tier first, then escalate only if the back-test materially needs the upgrade. Detailed in projects/awareness-fund/research/ingestion-variations-spec.md; condensed here.

| Layer | Variations | Cost | Why |
| --- | --- | --- | --- |
| Foundation | V1 (Yahoo/Stooq daily OHLCV) + V8 (FRED macro / rates / liquidity) | $0 | Free, robust, all back-tests need this |
| Earnings-signal core | V2 (SEC EDGAR 10-K/Q corpus) + V5 (LLM-extracted capex / guidance / capital-allocation) | $0–$100/mo (LLM compute) | The capex-shift thesis IS a capex-signal thesis. Filings ARE the data. |
| Vertical-specific add | V4 (EIA + FERC + ISO grid data) + V7 (TSMC monthly revenue, ASML/AMAT/LRCX/KLAC book-to-bill, SEMI data) | $0–$200/mo | Highest near-term thesis-conviction verticals (Energy + Chip-fab) |
| Phase 2 | V3 (earnings-call transcripts) + V6 (mining commodity feeds) | $500/mo (AlphaSense) or $0 (scrape) | Add if Phase 1 underperforms on signal-density |
| Defer | V9 (Reddit/Stocktwits sentiment) + V10 (Federal Register / CHIPS Act / IRA policy events) | $0 | V9 too noisy for thesis-grade; V10 hard to systematize |

Composite cost estimate: $0–$200/mo (free APIs + LLM compute for V5). Composite build effort: 2–4 engineer-weeks (mostly V2 + V5 NER/extraction pipeline).

Cross-variation quality controls (mandatory, from ingestion-variations-spec.md):
1. Look-ahead-bias audit on every series (as_of_ts vs. event_ts).
2. Survivorship-bias correction (universe includes delisted/acquired tickers).
3. Anti-fabrication pre-flight on every LLM-extracted number (V2/V3/V5 carry highest fabrication risk).
4. Source-of-evidence column on every feature row (source_url + extracted_ts).
5. Date-of-source discipline: every source explicitly dated.
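
Controls 1 and 4 above can be sketched as follows. The field semantics are our assumption and should be checked against ingestion-variations-spec.md: we take event_ts to be when the underlying event occurred (for example, a fiscal quarter-end) and as_of_ts to be when the figure became publicly knowable (for example, the filing date). The function names and row layout are hypothetical.

```python
from datetime import date

def point_in_time(rows, decision_date):
    """Keep only feature rows that were knowable on decision_date."""
    return [row for row in rows if row["as_of_ts"] <= decision_date]

def audit_rows(rows):
    """Return a list of human-readable provenance and look-ahead violations."""
    problems = []
    for i, row in enumerate(rows):
        if not row.get("source_url"):
            problems.append(f"row {i}: missing source_url")
        if row.get("extracted_ts") is None:
            problems.append(f"row {i}: missing extracted_ts")
        # Data cannot be known before the event it describes (assumed semantics).
        if row["as_of_ts"] < row["event_ts"]:
            problems.append(f"row {i}: as_of_ts precedes event_ts")
    return problems

# Toy rows with made-up values and a placeholder URL.
toy_rows = [
    {"event_ts": date(2025, 3, 31), "as_of_ts": date(2025, 5, 2),
     "source_url": "https://example.com/10q", "extracted_ts": date(2025, 5, 3), "value": 1.0},
    {"event_ts": date(2025, 6, 30), "as_of_ts": date(2025, 8, 1),
     "source_url": "", "extracted_ts": date(2025, 8, 2), "value": 2.0},
]
print(len(point_in_time(toy_rows, date(2025, 6, 1))))  # 1
print(audit_rows(toy_rows))  # ['row 1: missing source_url']
```

A rebalance on 2025-06-01 sees only the first row, because the second was not yet published on that date.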

Honest gap: we do not yet have a survivorship-bias-corrected ticker universe. A CRSP-like dataset costs $500+/mo at the academic tier and is on the v2 upgrade path, not v1.


§6 — The Substrate-Discipline Difference

This is the section that explains why ACG's involvement should matter to a sophisticated LP.

Most fund-management organizations cannot tell you, in detail, why they chose the data sources they chose, what hypotheses they ruled out and why, what their false-positive rate is on LLM-extracted numbers, how they audit their own work, or what would make them publicly retract a claim. The Awareness Fund's substrate-vendor-of-record (ACG) ships exactly this kind of operational rubric publicly at ai-civ.com — including:

The meta-thesis: an LP investing in The Awareness Fund is also investing in a fund whose substrate-vendor-of-record uses the same operational discipline the LP wishes their existing PE managers used. The substrate-discipline IS the operational moat against the kind of self-deception that destroys most quantitative strategies.

The LP-readable claim: the same vendor-substrate-discipline-scorecard ACG publishes publicly is the rubric we have been asked to apply to ourselves on this fund's back-test work. We will publish that self-assessment when the back-test results ship. If it is ugly, we will publish that too. The discipline of publishing the ugly self-grade is, itself, the alpha.

For the public artifacts, see ai-civ.com/blog/ (substrate-discipline-scorecard post, federation-IP downloads).


§7 — What Happens Next

| Step | Owner | Trigger / dependency |
| --- | --- | --- |
| 1. Pyonair PPO code arrives | Pyonair (Stacey + Apex) | Pending: assumed-coming, not delivered |
| 2. ACG kicks off ingestion pipeline (V1+V8+V2+V5) | ACG (research-lead + mind-lead) | Independent of step 1; can start now |
| 3. Back-test engine scaffolded at projects/awareness-fund/backtest/ | ACG (mind-lead) once Pyonair engine arrives, else mind-lead writes minimal vectorized harness | Step 1 OR independent fallback |
| 4. Variation harness (V1–V10 from ingestion-variations-spec.md) runnable from CLI | ACG (mind-lead) + capital-markets-lead acceptance tests | Step 2 + Step 3 |
| 5. First back-test results: full §4 metric set, all 4-tier benchmark ladder | capital-markets-lead | Step 4 |
| 6. Substrate-discipline self-assessment published alongside results | ACG (business-lead) | Step 5 |
| 7. Partner walk-through with Corey + Russell + Jordannah (with Pyonair input) | All | Step 6 |
| 8. Iteration on variations + overlays per partner feedback | All | Ongoing |
| 9. First LP conversations (using THIS document + back-test results) | Russell, with Corey + Jordannah (+ Pyonair input) | Step 7 |

Total v1 dev estimate from backtest-protocol-spec.md §8: 8–11 active days, all on free-tier data.

Acceptance criteria for v1 back-test completion (backtest-protocol-spec.md §9):
1. All variations run end-to-end on free-tier data
2. All variations reproduce identical results from a clean re-run (deterministic seeding)
3. All §4 metrics computed and present in output report
4. Source-data hashes + code-version hashes embedded in every run
5. Survivorship-bias caveat explicit in every output
6. "DRAFT TEMPLATE — REQUIRES EXPERT REVIEW BEFORE PRODUCTION USE" disclaimer on every artifact
7. Partner-readable markdown report generated automatically
8. A blind expert-reviewer (e.g., a portfolio manager friend of Russell's) can read the protocol + report and reproduce the headline number to ±10bps
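
Criteria 4 and 8 can be sketched as a manifest plus a reproduction check. This is a minimal illustration, not the protocol's actual manifest schema: the field names are hypothetical, the code version is passed in as a string (in practice it would presumably be the git commit hash of the run), and the headline number is assumed to be a return expressed as a decimal fraction.

```python
import hashlib

def build_manifest(run_id, source_data, code_version, headline_return):
    """Record what a run consumed and produced.

    source_data: raw bytes of the input dataset (hashed, not stored)
    code_version: identifier of the code that ran, e.g. a git commit hash
    """
    return {
        "run_id": run_id,
        "source_data_sha256": hashlib.sha256(source_data).hexdigest(),
        "code_version": code_version,
        "headline_return": headline_return,
    }

def reproduces(original, rerun, tolerance_bps=10.0):
    """True if the rerun used the same inputs and code and lands within tolerance."""
    same_inputs = (
        original["source_data_sha256"] == rerun["source_data_sha256"]
        and original["code_version"] == rerun["code_version"]
    )
    diff_bps = abs(original["headline_return"] - rerun["headline_return"]) * 10_000
    return same_inputs and diff_bps <= tolerance_bps

# Toy usage with made-up values.
first = build_manifest("run-a", b"toy price data", "abc123", 0.1234)
second = build_manifest("run-b", b"toy price data", "abc123", 0.1240)
print(reproduces(first, second))  # True: same inputs, 6 bps apart
```

Criterion 2 is stricter than criterion 8: a clean re-run by the same code should match exactly, so the same check can be called with tolerance_bps=0.0 for that case.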


§8 — Honest Gaps + Open Questions

We name these openly because diligence-grade LPs will discover them anyway. We would rather be the source than be caught.

| Gap | Substance | Mitigation / path |
| --- | --- | --- |
| ACG has no native back-test engine | We have a yfinance-class CLI ingester, generic ingestion plumbing (tools/ago/ingest/), and strong LLM-extraction discipline, but zero equity portfolio back-test capability in the repo today (audited 2026-05-18, backtest-protocol-spec.md §0). | Pyonair owns this. The division of labor IS the design. If Pyonair PPO code does not arrive in a reasonable timeframe, ACG (mind-lead) will write a minimal vectorized harness as fallback, but Pyonair-built is preferred. |
| No transaction-cost model beyond a simple linear-impact stub | Critical for honest Sharpe / info ratio numbers at fund-AUM scale. | A 5/15/+2/+10 bps cost model is specified (backtest-protocol-spec.md §4.4); calibrate from realized spreads in Polygon paid data if upgraded. |
| No survivorship-bias-corrected ticker universe | All current scraping is "what's listed today." Historical delistings are missing. | Manual delisting enumeration for the 24-month window (research-lead's deliverable). CRSP-like dataset ($500+/mo academic tier) is v2 upgrade path. |
| Works (federation's financial-civ) is currently DOWN | Works (Kimi K2.6) is the sister-civ with deepest financial-domain depth. Factor-construction critique is unavailable until restart. | Hengshi (Qwen) is healthy and can serve as cross-grading peer for research outputs. Works restart pending. |
| TG/blog instrumentation lag | Subscriber open/click instrumentation for ai-civ.com blog posts is not yet wired. Cannot measure LP-funnel response to publicly-shipped substrate-discipline IP today. | Slot-4 instrumentation target carried from 2026-05-17; unmet as of this document's date. |
| Pyonair PPO code has not yet arrived | All RL/PPO research on the ACG side is preparatory; integration will happen when code lands. | Asynchronous timeline. Independent ACG work (ingestion + protocol + universe) is ungated and proceeds. |
| Time-zone alignment with Pyonair team unknown | Async coordination assumed. | TGIM substrate (cross-civ task platform) is the standing wire if cadence becomes a friction point. |
| No live brokerage / execution layer | Back-test analytics only. Live execution is fund-back-end work outside ACG's scope. | Russell + Pyonair territory. |
| No risk-management / compliance / regulatory framework | When fund formally launches, this is a hard external dependency. | Counsel + compliance vendor to be retained at fund-formation stage. Out of scope for this thesis-overview. |
| 24-month back-test window is too short to validate H3 (late-cycle mining) | Mining capex cycle is 8–12 years; equity markets typically anticipate by 18-30 months. The thesis we believe most strongly in (H3 supports it) is the one the back-test can least validate. | Proxy test in v1 (mining-equity-beta-to-AI-capex-announcements). Full validation requires 2028+ data. |
| Statistical power at 24 monthly observations is honestly low | t-stats and bootstrap CIs are reported as substrate for partner discussion, NOT as inference-grade evidence. Treat as descriptive, not inferential. | Honest framing in every report. Longer back-test window (5y+) is Polygon paid upgrade path. |

§9 — Disclaimers (full)

This document is a DRAFT THESIS OVERVIEW for pre-launch partner and prospective-LP discussion only. It is NOT a solicitation to invest, NOT a private placement memorandum, NOT investment advice, and NOT an offer of securities. No fund vehicle yet exists. No price targets, return projections, or performance forecasts appear in this document by design.

Past performance does not predict future results. Back-tests are hypothetical, derived from historical price data, and do not reflect actual trading, real-world transaction costs, taxes, or fund-management fees. Survivorship bias, selection bias, and look-ahead bias may be material despite the controls described in §5. Forward-looking statements derived from third-party sources carry their authors' biases and should be independently verified before any investment decision.

Investing in equities (including any future Awareness Fund vehicle) involves risk, including risk of total loss. Concentrated thematic strategies carry higher volatility than diversified passive index strategies. The capex-shift thesis described in this document may be wrong (see §3, particularly H6). The information in this document is current as of 2026-05-18 and is subject to revision without notice.

All quantitative claims are confidence-tagged HIGH/MEDIUM/LOW per the rubric in projects/awareness-fund/capital-markets/thesis-substantiation.md. LPs and partners must conduct their own due diligence and consult independent legal, tax, and investment counsel before making any commitment.


§10 — Document Map (for the LP who wants to drill down)

| If you want to verify... | Read |
| --- | --- |
| The full 73-ticker candidate universe with sub-segment tags + data-tier requirements | projects/awareness-fund/capital-markets/ticker-universe-spec.md |
| The back-test design (time window, data sources, rebalance rules, transaction costs, risk overlays, reproducibility requirements) | projects/awareness-fund/capital-markets/backtest-protocol-spec.md |
| The S&P comparative methodology (4-tier benchmark ladder, L1-L5 outperformance ladder, attribution decomposition, statistical-significance discipline) | projects/awareness-fund/capital-markets/sp-comparative-spec.md |
| Per-vertical sourced observations with confidence tags and explicit risks (including SMCI accounting overhang, SpaceX-private exclusion, defense-prime mixed-beta caveats) | projects/awareness-fund/capital-markets/thesis-substantiation.md |
| The 10 candidate ingestion variations with cost/coverage/risk-of-failure per variation | projects/awareness-fund/research/ingestion-variations-spec.md |
| The 6 competing hypotheses (including H6 anti-thesis) with falsification tests | projects/awareness-fund/research/competing-hypotheses.md |
| The honest audit of what ACG can deliver vs what we must source from partners | projects/awareness-fund/research/capability-inventory.md |

Document status: v0.1 DRAFT — synthesized from 8 parallel-shipped spec sheets (2026-05-18). Awaiting partner review before any LP distribution. Authored by ACG business-lead. Co-owners (humans): Corey Cottrell (ACG) · Russell Korus (AiCIV Inc / Keel) · Jordannah Korus (Korus Consulting Inc). Operating partner: Pyonair (Stacey Engle, with Apex — Pyonair's AI counterpart — collaborating on substrate-side coordination).

For partner questions, comments, or counter-evidence: route through Corey for the ACG-side; route through Russell or Jordannah for the Korus-side; route through Stacey for the Pyonair-side. Cross-grading welcomed and expected.