The Awareness Fund — Integrated Thesis Overview

Working title: theawarenessfund.com
Document version: v0.1 (DRAFT THESIS, pre-formal-launch)
Date: 2026-05-18
Partners (co-owners): Corey Cottrell · Russell Korus · Jordannah Korus
Operating partners:
- ACG (A-C-Gee): AI civilization shipping substrate-discipline as IP (ai-civ.com)
- Pyonair: quantitative-engineering partner (Stacey Engle + Apex, Pyonair's AI counterpart); owns the back-test engine + PPO Neural Network code

DRAFT THESIS — pre-formal-launch. This document is for partner and prospective-LP discussion only. It is NOT a solicitation, NOT a private placement memorandum, and NOT investment advice. No fund vehicle yet exists. No price targets or return projections appear in this document by design. Past performance does not predict future results.

Executive Summary

The Awareness Fund is a 5-vertical, picks-and-shovels equity strategy targeting the infrastructure of the AI capex super-cycle — the supply chain underneath, not the applications on top.

The shift: Trillions of dollars of capex are migrating from human knowledge work into AI infrastructure today, and we believe a second wave will migrate from AI infrastructure into robotic physical labor on a roughly 5-year horizon. (Confidence: HIGH on direction; MEDIUM-LOW on point estimates and timing.)
The architecture: Exposure across 5 verticals — Mining (raw materials) → Energy (electrons) → Chip-fabs (the machines that make the silicon) → AI-Endpoint (data centers, networking, cooling) → SPACE-infra (launch, satellites, defense-prime space exposure, orbital compute optionality). No model labs. No application-layer AI. No hyperscaler concentration plays.
The discipline: ACG owns the ingestion + competing-hypothesis + anti-fabrication substrate. Pyonair owns the back-test engine + PPO Neural Network. The division of labor IS the operational moat — neither party is asked to do what they cannot prove they can do.
The test: A 24-month back-test (2024-05-18 → 2026-05-18), free-tier data first, evaluated against a 4-tier benchmark ladder (S&P 500 TR / equal-weight S&P / NASDAQ-100 / custom sector basket). The strategy is "interesting" only if it beats all four tiers on a risk-adjusted basis. Every variation gets published, including losers. (See §4 below + projects/awareness-fund/capital-markets/sp-comparative-spec.md.)
The honest counter-thesis: It is entirely possible that the S&P 500 already prices the AI capex shift and that no fund-level alpha exists. This is hypothesis H6 and is the existential question for the fund. The back-test is designed to disprove it, not to confirm what we already believe.
The decision frame for the LP: This document is the SETUP for the back-test, not the back-test results. The next document an LP receives from us will contain back-test outputs. If our setup-discipline doesn't earn a page-2 read from you, the back-test results never will. We would rather you close politely now than waste both your time and ours later.

§1 — The Capex-Shift Thesis

Claim: The largest capex reallocation since the railway-to-electric transition is underway. Hyperscalers reported ~$240B of capex in 2024, dominantly data-center build-out (HIGH; company 10-Ks, EDGAR, Q4 2024). Bottoms-up estimates of incremental US data-center power demand range from +90 GW (DOE/LBNL conservative) to +400 GW (industry-bullish) by 2030 (MEDIUM; DOE/LBNL 2024 update, EPRI 2024 white paper). On the high end, this approaches "+1 terawatt within a decade", the figure several partners use as shorthand. (We frame this as a 10-year envelope, not a 3-5 year one, for sourceable defensibility.)

The 5 verticals at one glance:

Vertical	The bottleneck	Cycle length	One-sentence why-now
Mining	Discovery-to-pour lead time 8–12 yr; copper, uranium, rare earths supply inelastic at AI-demand timescales	8–12 yr (late-cycle)	Capex is at multi-year highs but well below 2012 super-cycle peak — room to run; USGS Annual Reports (HIGH); MP Materials 10-K FY2024 (HIGH).
Energy	Grid build-out cannot match AI compute demand; transformer lead times stretched from ~50 weeks (2019) to 120+ weeks (2024)	5–10 yr	Microsoft signed PPA for Three Mile Island restart (Sep 2024 — first US nuclear-restart-for-hyperscaler, HIGH); Talen sold campus capacity to AWS (Mar 2024, HIGH); Dominion 2024 IRP forecasts 85% Virginia load growth by 2039 (HIGH).
Chip-fabs	EUV monopoly (ASML), 4-firm WFE oligopoly (~85% market share), CoWoS advanced-packaging sold out through 2026	2–3 yr	ASML EUR 28.3B revenue 2024 with ~18-month tool lead times (HIGH, 20-F); CHIPS Act $27.6B finalized in 2024 (HIGH, Commerce Dept).
AI-Endpoint	Data-center rack power-density jumped from ~10 kW (2020) to 80+ kW (2024 AI racks); liquid cooling is the new bottleneck	<1 yr	Vertiv FY2024 record year on liquid cooling (HIGH, 10-K); Equinix $8.7B FY2024 revenue with hyperscale JVs growing faster than retail colo (HIGH, 10-K); Arista 35% revenue concentration in two hyperscalers (HIGH, 10-K disclosure).
SPACE-Infra	Terrestrial grid + cooling + latency budgets are pushing R&D toward orbital compute; SpaceX Falcon 9 drove launch cost from ~$10K/kg to roughly $2-3K/kg (claimed)	5–10 yr (with 20+ yr Dyson-swarm optionality)	Lonestar Holdings + Ramon Space hold publicly-announced DoD/NASA orbital-compute contracts (MEDIUM — contracts announced, NOT yet deployed); SpaceX 134 successful Falcon launches in 2024 vs <40 industry-total a decade prior (HIGH, FAA dashboard).

The structural feature: the 5 verticals are sequentially gated. Chip-fab equipment must exist before chips exist; chips before AI endpoints are useful; AI endpoints must be powered (energy) and made of materials (mining); SPACE-infra is the deferred-but-inevitable safety valve for terrestrial physical limits. We are not betting on any single vertical. We are betting on the infrastructure stack the model-lab AI race requires to continue.

The hedge property: if the model labs commoditize AI margins (price wars, open-weight catch-up, DeepSeek-class efficiency gains), picks-and-shovels actually benefits — more inference deployments = more endpoints = more power = more materials. We are betting on AI deployment, not AI margin capture. (Confidence: MEDIUM on the hedge mechanism; HIGH on the directional asymmetry.)

For per-source receipts on every claim above, see projects/awareness-fund/capital-markets/thesis-substantiation.md §1–§5.

§2 — The 5-Vertical Picks-and-Shovels Architecture

The full candidate ticker universe (73 names total) lives in projects/awareness-fund/capital-markets/ticker-universe-spec.md. Below are the most-defensible representative tickers per vertical, with the explicit counter-argument that could falsify each vertical's thesis.

Vertical 1 — Mining (raw materials)

Two-sentence thesis-link: AI hardware and grid build-out is materials-intensive — copper for interconnect and grid, nickel for batteries, uranium for clean baseload, rare earths for motors and magnets, lithium for storage, silver for PV. Mining capex cycles are 8–12 years from discovery to first pour, so supply elasticity is near-zero on the timescales AI demand needs.

Representative tickers (7 of 15):

Ticker	Name	Sub-segment	Why included
FCX	Freeport-McMoRan	Copper majors (US)	Largest US-listed copper pure-play; tightest AI-grid bottleneck
SCCO	Southern Copper	Copper majors (LatAm)	Lowest-cost copper producer globally; long-life reserves
RIO	Rio Tinto	Diversified majors	Lithium (Rincon, Jadar) + copper + iron-ore triple play
MP	MP Materials	Rare earths (US-domestic)	Only US-domestic mine-to-magnet operator; DoD-aligned
CCJ	Cameco	Uranium (Tier-1)	Largest publicly-traded uranium producer; nuclear-baseload-for-AI-DC
ALB	Albemarle	Lithium specialty	Largest US-listed lithium producer; battery-grade chemicals
PICK	iShares Global Metals & Mining ETF	Diversified basket	Beta benchmark for the vertical

Sub-segment breakdown: copper majors (FCX, SCCO, TECK, RIO, BHP) · rare earths (MP, LYC.AX) · uranium (CCJ, URA, URNM) · lithium + specialty (ALB, SQM, NTR) · diversified (NEM, PICK).

The honest counter-argument: commodity-cycle noise typically dominates AI-attribution signal at minimum 2:1 — mining equities have rallied repeatedly on "AI will need copper" and given back gains. Picks-and-shovels framing helps but does not eliminate cycle risk. The 24-month back-test window is too short to validate a true late-cycle mining play — recommend tracking as "watch" not "weight" until 2027 data. (See projects/awareness-fund/research/competing-hypotheses.md H3.)

Vertical 2 — Energy (powering the AI economy)

Two-sentence thesis-link: On high-end estimates, the US grid needs ~+1 terawatt of new generation over a 10-year horizon to support AI data-center demand (MEDIUM-confidence synthesis from DOE/LBNL 2024 + EPRI 2024). Picks-and-shovels here means utility owners of generation/transmission near hyperscaler sites, independent power producers with nuclear/gas baseload, and grid-equipment makers (transformers, switchgear, HVDC).

Representative tickers (7 of 15):

Ticker	Name	Sub-segment	Why included
CEG	Constellation Energy	IPP — nuclear pure-play	Largest US nuclear fleet; Three Mile Island restart catalyst (MSFT PPA)
VST	Vistra Corp	IPP — nuclear + gas	Largest beneficiary of hyperscaler PPAs (Constellation-pattern)
D	Dominion Energy	Utility (Virginia)	Virginia = 70% of US data-center inventory; load-growth pure play
NEE	NextEra Energy	Utility + renewables	Largest US utility + largest renewables developer
GEV	GE Vernova	Grid + gas-turbine OEM	Spin from GE; gas-turbine + grid-transmission + wind triple play
ETN	Eaton	Electrical equipment	Switchgear + data-center electrical infrastructure leader
HUBB	Hubbell	Electrical components	T&D + utility-grade connectors; grid build-out beneficiary

Sub-segment breakdown: IPPs (CEG, VST, NRG, TLN) · utilities (NEE, DUK, SO, D) · grid OEMs (GEV, ETN, ABBN.SW, SIEGY, HUBB) · LNG adjacency (LNG) · uranium fuel cycle (URA — vertical overlap with Mining).

The honest counter-argument: utility valuations are already pricing this. Vistra and Constellation returned 200-400% in 2024-2025 on AI-energy narrative — entry now is late. Regulatory drag (state PUC approvals) slows utility re-rating. Solar+battery cost curves may flip the storyline before bottleneck binds. Hyperscaler self-build (Stargate, Meta-nuclear partnerships) can bypass public utilities entirely. (See competing-hypotheses.md H2.)

Vertical 3 — Chip-fabs + supply chain

Two-sentence thesis-link: The picks-and-shovels of AI compute are NOT the model labs and NOT even the chip designers — they are the equipment that makes the chips (lithography, deposition, etch, metrology) and the specialty materials (photoresists, gases, wafers, advanced packaging substrates). This is a 4-company oligopoly at the high end (ASML, AMAT, LRCX, KLAC) plus a Japanese duopoly in materials.

Representative tickers (7 of 15):

Ticker	Name	Sub-segment	Why included
ASML	ASML Holding	Lithography — monopoly	Only EUV-lithography supplier globally; literal AI-bottleneck monopoly
AMAT	Applied Materials	Deposition + etch	Largest WFE maker; broadest tool portfolio
LRCX	Lam Research	Etch + deposition	#2 in etch; memory + advanced-logic exposure
KLAC	KLA Corp	Metrology + inspection	Process-control monopoly; advanced-node yield dependency
TSM	Taiwan Semi (ADR)	Foundry — pure-play	Largest pure-play foundry; advanced-node AI-chip manufacturer
ENTG	Entegris	Specialty materials	Wafer-handling + advanced-process fluids; near-monopoly position in several segments
SOXX	iShares Semiconductor ETF	Sector basket	Beta benchmark

Sub-segment breakdown: WFE oligopoly (ASML, AMAT, LRCX, KLAC) · test equipment (TER, 6857.T Advantest) · foundry pure-play (TSM) · IDM with US fabs (INTC) · Japan specialty materials (4063.T Shin-Etsu, 4183.T Mitsui, TOELY Tokyo Electron) · advanced packaging + small-caps (ENTG, AEHR, ONTO) · basket (SOXX).

Explicit exclusion: NVDA, AMD, AVGO, MRVL are chip designers, not picks-and-shovels — they are downstream buyers of the supply chain, not the supply chain itself. SOXX captures them as benchmark overlap. If partners want chip-designer exposure, that is a separate sleeve decision, not picks-and-shovels.

The honest counter-argument: WITHIN the supply chain, the concentrated oligopoly (NVDA design + TSM fab) has captured disproportionate margin vs equipment-makers — NVDA gross margins 75%+ vs ASML 50% vs TSM 53%. Recent 2-year history strongly supports concentration. The 24-month back-test may not reveal the tail-risk that the diversification thesis is hedging against. (See competing-hypotheses.md H4.)

Vertical 4 — AI-Endpoint (data centers, picks-and-shovels only)

Two-sentence thesis-link: AI-endpoint = the physical buildings + cooling + networking + power distribution where AI compute happens. Explicitly excludes model labs (no Anthropic-proxy, no OpenAI-IPO speculation, no application-layer AI) and hyperscalers (MSFT/GOOGL/AMZN/META are mixed-margin businesses where DC capex is one line item among many).

Representative tickers (7 of 15):

Ticker	Name	Sub-segment	Why included
EQIX	Equinix	DC-REIT (retail colo)	Largest interconnection-dense colo; AI-inference-edge proxy
DLR	Digital Realty	DC-REIT (wholesale + retail)	Largest DC-REIT by power
VRT	Vertiv	DC infrastructure (power + cooling)	Pure-play DC infrastructure OEM; liquid cooling leader
ANET	Arista Networks	DC networking (Ethernet fabric)	Hyperscaler Ethernet-switching leader; alternative to NVDA NVLink
COHR	Coherent Corp	Optical components	800G/1.6T optical transceivers for AI-DC
JCI	Johnson Controls	Building HVAC + DC cooling	Building-systems + commercial HVAC; DC-cooling adjacency
DTCR	Global X Data Center ETF	DC-REIT basket	Basket benchmark

Sub-segment breakdown: DC-REITs (EQIX, DLR, IRM) · DC infrastructure OEMs (VRT, NVT) · networking + optical (ANET, CSCO, CIEN, COHR, LITE) · servers + IP (SMCI [⚠️ see below], ARM) · cooling adjacency (JCI) · physical-AI bridge optionality (SYM) · basket (DTCR).

⚠️ Red flag, named openly: SMCI carried an explicit accounting overhang in 2024 — Hindenburg short report (Aug 2024), Ernst & Young auditor resignation (Oct 2024), DOJ subpoena reported, delayed 10-K. We include SMCI in the candidate universe so the back-test can quantify the impact of including/excluding it, but partners should treat it as a name-to-trade-with-caution and likely exclude from any concentrated variation. (See thesis-substantiation.md §4 risks table.)

The honest counter-argument: data-center capex may be near peak. AWS/GCP/Azure capex-to-revenue ratios are at multi-decade highs. Hyperscaler depreciation is accelerating (8-yr → 6-yr useful-life revisions). DeepSeek-class efficiency improvements can compress training-compute demand. The 2001 telecom-capex analog (Cisco -85% peak-to-trough) is the cautionary tale. Concentrated names like DLR/EQIX/SMCI are most exposed to this. (See competing-hypotheses.md H5.)

Vertical 5 — SPACE-Infra (orbital data centers + the infrastructure to put them there)

Two-sentence thesis-link: Terrestrial AI-endpoints are running into hard physical ceilings — grid power, water for cooling, fiber latency budgets, real-estate adjacency to demand. Picks-and-shovels here is commercial launch + reusability, satellite manufacturers + comms-backhaul, defense primes holding DoD/SDA/NASA space-systems contracts (mixed-beta caveat), rad-hardened space-grade compute, and space-solar / Dyson-swarm frontier (20+ year optionality only).

Representative tickers (7 of 13):

Ticker	Name	Sub-segment	Why included
RKLB	Rocket Lab USA	Launch + reusability	Largest publicly-traded launch operator; Neutron reusable-medium-lift
IRDM	Iridium Communications	Satellite comms (LEO)	Operational global LEO constellation; DoD-aligned
ASTS	AST SpaceMobile	Direct-to-cell satellite	Only public pure-play in cellular-from-space; AT&T/Verizon/Vodafone
LMT	Lockheed Martin	Defense prime — space	SDA Tranche, NASA Orion, GPS-III (mixed beta, ~10-30% space exposure)
MRCY	Mercury Systems	Rad-hardened compute	Closest public pure-play in space-grade processors
KTOS	Kratos Defense	Ground stations + small launch	OpenSpace virtualized ground systems
UFO	Procure Space ETF	Diversified basket	Sector benchmark — ~30 holdings

Sub-segment breakdown: launch (RKLB) · satellite comms (IRDM, VSAT, SATS, ASTS) · defense primes with space optionality (LMT, NOC, RTX, BA) · rad-hardened compute + ground (MRCY, KTOS) · baskets (UFO, ARKX).

Explicit exclusions (this table is load-bearing for honesty — every LP we pitch will ask "what about SpaceX/Tesla/Maxar?"):

Excluded exposure	Why excluded
SpaceX (private)	Not publicly tradable; no direct equity exposure exists for retail or fund vehicles. If SpaceX IPOs, it becomes the cornerstone of this vertical. Until then: ZERO direct SpaceX exposure.
TSLA as "Musk-proxy"	TSLA is a vehicle and energy-storage company that happens to share a CEO with SpaceX. Owning TSLA for SpaceX exposure imports brand-discount, EV-cycle beta, and FSD-narrative volatility unrelated to picks-and-shovels space-infra. If partners want Musk-beta, that is a separate sleeve.
MAXR (Maxar)	Taken private by Advent International in May 2023 ($6.4B all-cash) — no longer tradable.
ASTR (Astra Space)	Delisted from Nasdaq in July 2024 — no longer tradable.
Lonestar Holdings, Ramon Space	Private orbital-compute operators with announced DoD-SDA / NASA contracts (2024). No public equity exposure. If either IPOs, would be highest-conviction pure-play orbital-compute name. Today: track contract-announcements for thesis-evidence only.
Aitech, BAE Space, Cobham	Private or non-tradable rad-hardened-compute / space-electronics specialists. MRCY is the closest tradable approximation.
CSPP / Caelus / space-solar concept-stage operators	All concept-stage / feasibility-study tier; no operating revenue. Caltech SSPP June 2023 in-orbit beaming demonstration was watts-class, not kilowatts. ESA SOLARIS is feasibility-only. Not investable today.

On the Dyson-swarm horizon: framed exclusively as 20+ year optionality. Treat the narrative as a direction the universe is moving (collect energy where it is abundant), not a near-term investable thesis. Do NOT base any back-test weighting on space-solar deployment timelines. The closest tradable proxies are the defense primes (NOC, LMT) and the launch-cost-curve compressor (RKLB) — both already in the basket on other grounds.

The honest counter-argument: the single most important commercial space company (SpaceX) is not tradable. Defense-prime proxies are mixed-beta (10-30% space revenue typical), so outperformance from LMT/NOC/RTX/BA may reflect defense-cycle dynamics, not space-specific tailwinds. Orbital compute density today is <0.001% of terrestrial; cost-of-cooling-in-vacuum, rad-hardening overhead, and on-orbit servicing remain hard physical gates against near-term scale. ASTS has limited price history (April 2021 SPAC merger, roughly 61 months before this document's date), so back-test windows longer than about 60 months (including warm-up) will have insufficient ASTS coverage.

§3 — What Could Make This Thesis Wrong

The first task of an investor-grade thesis is to name its failure modes. We track six competing hypotheses in projects/awareness-fund/research/competing-hypotheses.md: three are pro-thesis (H1–H3, different paths to the same conclusion), two are counter-theses (H4 and H5, paths by which part of the architecture fails), and one is the anti-thesis (H6, the path by which the fund's premise fails). All confidence levels are PRELIMINARY pending the 24-month back-test.

#	Hypothesis	Direction	Preliminary confidence	Falsification test
H1	Diversified picks-and-shovels outperforms concentrated hyperscaler bets	Pro-thesis	MEDIUM-HIGH	Equal-weighted 5-vertical basket Sharpe > top-5 hyperscaler basket Sharpe over 24mo
H2	Energy bottleneck is THE moat (utilities / grid-OEM / nuclear-IPPs dominate)	Pro-thesis	MEDIUM-HIGH	Energy basket Sharpe > broad-AI Sharpe AND FERC interconnect-queue depth is a statistically significant return predictor
H3	Mining is the late-cycle play (2027-2030 inflection)	Pro-thesis (delayed)	LOW-MEDIUM	Cannot fully test in 24mo — proxy test: do mining equities show higher beta to AI-capex announcements than to copper futures?
H4	Chip-fab oligopoly favors NVDA/TSM pure-play over equipment-makers (concentration within stack beats diversification)	Counter-thesis (within stack)	MEDIUM	5-name pure-play basket (NVDA, TSM, AMD, AVGO, MRVL) Sharpe > 5-name equipment basket (ASML, AMAT, LRCX, KLAC, TER)
H5	AI-endpoint capex peaks within 18 months — DC build cycle near top	Counter-thesis	LOW-MEDIUM	AI-endpoint basket shows ≥2 sequential quarters of negative returns concurrent with hyperscaler capex-guidance cuts
H6	The thesis is correct but the S&P 500 already prices it — no fund-level alpha exists	Anti-thesis (market efficiency)	MEDIUM	THE EXISTENTIAL ONE. Strategy info ratio vs SPY < 0.3 after fees ⇒ fund killed. Info ratio > 0.7 ⇒ H6 falsified.

H6 is the single most important test in this entire document. Every variation we run MUST be compared to SPY total return on a risk-adjusted basis. If we cannot beat SPY after fees, the fund has no investor-rational reason to exist, and Russell, Jordannah, and Corey (with Pyonair input) will say so out loud before any LP says it to us.
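The H6 falsification test reduces to one number and two thresholds, which can be sketched as below. This is a minimal illustration, not the fund's engine: the function names, the 252-day annualization factor, and the "inconclusive" label for the band between 0.3 and 0.7 are assumptions (the table names only the two outer outcomes). Inputs are assumed to be aligned per-period returns, with strategy returns already net of fees.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: annualization factor for daily bars


def information_ratio(strategy_returns, benchmark_returns,
                      periods_per_year=TRADING_DAYS):
    """Annualized information ratio: mean active return / tracking error."""
    if len(strategy_returns) != len(benchmark_returns):
        raise ValueError("return series must be aligned and of equal length")
    active = [s - b for s, b in zip(strategy_returns, benchmark_returns)]
    tracking_error = statistics.stdev(active)  # sample std of active returns
    if tracking_error == 0:
        raise ValueError("tracking error is zero; information ratio undefined")
    return statistics.mean(active) / tracking_error * math.sqrt(periods_per_year)


def h6_verdict(info_ratio, kill_below=0.3, falsify_above=0.7):
    """Map a net-of-fees information ratio vs SPY to the H6 outcome."""
    if info_ratio < kill_below:
        return "fund killed"
    if info_ratio > falsify_above:
        return "H6 falsified"
    # Assumption: the document does not name the middle band.
    return "inconclusive"
```

Making the middle band explicit keeps a result of, say, 0.5 from being read as either a pass or a kill.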

H1 vs H4 is the architectural question — diversified picks-and-shovels OR concentrated pure-play. The back-test runs both as separate portfolio constructions to discriminate. They are partly contradictory by design.

For per-hypothesis Evidence FOR / Evidence AGAINST / Falsification Test specifications, see competing-hypotheses.md.

§4 — How We'll Prove It (Or Refute It)

The back-test, in one paragraph

A 24-month back-test (2024-05-18 → 2026-05-18) with a 12-month warm-up window, daily-bar OHLC, NYSE calendar, USD-denominated, and full point-in-time data discipline (as_of_ts recorded separately from event_ts; as_of_ts ≠ event_ts). Rebalancing is monthly by default, with weekly and quarterly variations tested. The transaction-cost model is 5 bps for liquid names, 15 bps for small-caps, +2 bps of market impact per 1% of ADV traded, and +10 bps extra for foreign-listed names. Every back-test run produces a manifest, a trade log, a daily NAV series, a position log, a source-data SHA256, a code-version git hash, and the §4 metric set. Runs are stored under projects/awareness-fund/backtest-runs/YYYY-MM-DD-runID/. Full spec: projects/awareness-fund/capital-markets/backtest-protocol-spec.md.
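The transaction-cost figures in the paragraph above can be read as a simple additive model. The sketch below is one plausible reading, not the specification in backtest-protocol-spec.md: the function names are hypothetical, the market-impact term is assumed to be linear in ADV participation, and the liquid versus small-cap classification is assumed to be supplied by the caller because the document does not define the cutoff.

```python
def trade_cost_bps(is_small_cap, trade_value_usd, adv_value_usd,
                   is_foreign_listed):
    """Estimated one-way cost of a single trade, in basis points."""
    if adv_value_usd <= 0:
        raise ValueError("average daily volume must be positive")
    base = 15.0 if is_small_cap else 5.0              # 5 bps liquid / 15 bps small-cap
    participation_pct = 100.0 * trade_value_usd / adv_value_usd
    impact = 2.0 * participation_pct                  # +2 bps per 1% of ADV (assumed linear)
    foreign = 10.0 if is_foreign_listed else 0.0      # +10 bps foreign-listed extra cost
    return base + impact + foreign


def trade_cost_usd(is_small_cap, trade_value_usd, adv_value_usd,
                   is_foreign_listed):
    """Same estimate expressed in dollars."""
    bps = trade_cost_bps(is_small_cap, trade_value_usd, adv_value_usd,
                         is_foreign_listed)
    return trade_value_usd * bps / 10_000.0
```

Under these assumptions, a $1,000,000 trade in a liquid US-listed name with $50,000,000 ADV is 2% participation, so the cost is 5 + 4 = 9 bps, or $900.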

The 4-tier benchmark ladder

A back-test variation is "interesting" only if it beats all four tiers on a risk-adjusted basis. From sp-comparative-spec.md §2:

Tier	Benchmark	Question it answers
T1	S&P 500 Total Return (`^SP500TR`)	Do we beat the default passive choice?
T2	Equal-weight S&P 500 (`RSP`)	Do we beat the broad market after stripping out the mega-cap AI-tilt already captured?
T3	NASDAQ-100 (`QQQ`)	Do we beat the tech-heavy benchmark that overweights the AI applications layer?
T4	Custom sector basket (25% XLU + 25% XLB + 25% SOXX + 25% DTCR)	Do we beat a naïve picks-and-shovels passive replication?

Beating T1 only = could be sector beta. Beating T1+T2+T3+T4 on a risk-adjusted basis = stock-selection + theme-curation alpha.

Definitional ladder for "outperformance" (weakest to strongest)

Claim	Evidence required
L1	Strategy_CAGR > SPX_CAGR over window
L2	Strategy_Sharpe > SPX_Sharpe AND Strategy_Sortino > SPX_Sortino
L3	L2 holds vs all 4 tiers
L4	L3 holds AND multi-factor alpha (Fama-French 3 + Momentum) is positive
L5	L4 holds AND t-stat on alpha > 2 (or bootstrap 95% CI on alpha excludes zero)
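The first three rungs of the ladder above can be checked mechanically from return series. The sketch below is illustrative only: the function names and the "T1".."T4" dictionary keys are hypothetical, the risk-free rate and the Sortino target are assumed to be zero, and 252 periods per year is assumed for daily bars. L4 and L5 need a factor regression and are not covered here.

```python
import math
import statistics


def cagr(returns, periods_per_year=252):
    """Compound annual growth rate from per-period simple returns."""
    growth = math.prod(1.0 + r for r in returns)
    years = len(returns) / periods_per_year
    return growth ** (1.0 / years) - 1.0


def sharpe(returns, rf_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio (assumes non-constant returns)."""
    excess = [r - rf_per_period for r in returns]
    return (statistics.mean(excess) / statistics.stdev(excess)
            * math.sqrt(periods_per_year))


def sortino(returns, target=0.0, periods_per_year=252):
    """Annualized Sortino ratio using downside deviation below `target`."""
    excess = [r - target for r in returns]
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside == 0:
        raise ValueError("no returns below target; Sortino undefined")
    return statistics.mean(excess) / downside * math.sqrt(periods_per_year)


def ladder_claims(strategy, tiers, spx_key="T1"):
    """Evaluate claims L1-L3. `tiers` maps tier name -> benchmark returns."""
    def beats_risk_adjusted(bench):
        return (sharpe(strategy) > sharpe(bench)
                and sortino(strategy) > sortino(bench))

    spx = tiers[spx_key]
    return {
        "L1": cagr(strategy) > cagr(spx),
        "L2": beats_risk_adjusted(spx),
        "L3": all(beats_risk_adjusted(b) for b in tiers.values()),
    }
```

Each claim is reported separately rather than as a single level, since a strategy can in principle pass L2 while failing L1 (higher Sharpe, lower CAGR).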

Target: L3 minimum (beats benchmark ladder on a risk-adjusted basis). L4 = "this is a real strategy." L5 = "this is a real strategy with credible statistical inference." We do not expect L5 with 24 monthly observations: statistical power is honestly low at this back-test window, and we say so in every report. A longer back-test window (5y+) is the upgrade path, using Polygon paid data.
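The low-power point can be made concrete with back-of-envelope arithmetic. For independent, identically distributed active returns, the t-statistic on the mean is approximately the annualized information ratio times the square root of the window length in years. The sketch below applies that approximation; treating regression alpha the same way is a simplifying assumption, and the function names are hypothetical.

```python
import math


def implied_t_stat(annualized_ir, years):
    """Approximate t-stat on mean active return: t ~= IR * sqrt(years)."""
    return annualized_ir * math.sqrt(years)


def required_information_ratio(t_stat_target, years):
    """Annualized IR needed to reach a target t-stat over `years`."""
    return t_stat_target / math.sqrt(years)


if __name__ == "__main__":
    # 24-month window: t > 2 needs an annualized IR above sqrt(2) ~= 1.41.
    print(round(required_information_ratio(2.0, 2.0), 2))
    # An IR of 0.7 over the same window implies t of only ~0.99.
    print(round(implied_t_stat(0.7, 2.0), 2))
```

Under this approximation, a strategy that clears the 0.7 information-ratio bar over 24 months still sits near t = 1, which is why L5 is not expected at this window length.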

Division of labor

Function	Owner	Rationale
Ingestion design + execution	ACG (research-lead)	jina-reader + LLM extraction + anti-fabrication-pre-flight discipline
Competing-hypothesis framework	ACG (research-lead)	scientific-method + critical-thinking skills
Ticker universe + back-test protocol	ACG (capital-markets-lead)	Already shipped; owns the protocol
Back-test engine (vectorized)	Pyonair (Stacey + Apex, Pyonair's AI counterpart)	They have it; ACG does not
PPO / RL model	Pyonair	They are shipping the code
Macro / regime overlay	ACG (research-lead, V8 ingestion)	FRED is free and scriptable
Investor-grade reporting	ACG (business-lead pipeline)	Blog + landing-page pipeline works
Source-attribution audit	ACG (research-lead)	Verifier-as-substrate discipline

The division of labor IS the operational moat. Neither party is asked to do what they cannot prove they can do. ACG has not run a back-test before; we say so out loud, we hand the engine to the team that has, and we own the substrate-discipline layer that we have demonstrated repeatedly in vendor-substrate-discipline-scorecard and related public artifacts at ai-civ.com.

Reportable failure modes (we publish even when ugly)

Per sp-comparative-spec.md §6, we publish:

"Beat S&P 500 cumulative but lost on Sharpe" → strategy added risk
"Beat by mega-cap concentration" → not stock-selection alpha
"Beat in one vertical, dragged by another" → asymmetric thesis-validation
"Beat early, gave back late" → momentum-driven, not durable
ALL variations published, including losers. ALL benchmarks published, including the unfavorable ones. Survivorship-bias caveats appear in every artifact, not in footnotes.

§5 — Data Strategy

We run free-tier first, then escalate only if the back-test materially needs the upgrade. Detailed in projects/awareness-fund/research/ingestion-variations-spec.md; condensed here.

Layer	Variations	Cost	Why
Foundation	V1 (Yahoo/Stooq daily OHLCV) + V8 (FRED macro / rates / liquidity)	$0	Free, robust, all back-tests need this
Earnings-signal core	V2 (SEC EDGAR 10-K/Q corpus) + V5 (LLM-extracted capex / guidance / capital-allocation)	$0–$100/mo (LLM compute)	The capex-shift thesis IS a capex-signal thesis. Filings ARE the data.
Vertical-specific add	V4 (EIA + FERC + ISO grid data) + V7 (TSMC monthly revenue, ASML/AMAT/LRCX/KLAC book-to-bill, SEMI data)	$0–$200/mo	Highest near-term thesis-conviction verticals (Energy + Chip-fab)
Phase 2	V3 (earnings-call transcripts) + V6 (mining commodity feeds)	$500/mo (AlphaSense) or $0 (scrape)	Add if Phase 1 underperforms on signal-density
Defer	V9 (Reddit/Stocktwits sentiment) + V10 (Federal Register / CHIPS Act / IRA policy events)	$0	V9 too noisy for thesis-grade; V10 hard to systematize

Composite cost estimate: $0–$200/mo (free APIs + LLM compute for V5). Composite build effort: 2–4 engineer-weeks (mostly V2 + V5 NER/extraction pipeline).

Cross-variation quality controls (mandatory, from ingestion-variations-spec.md):
1. Look-ahead-bias audit on every series (as_of_ts ≠ event_ts).
2. Survivorship-bias correction (universe includes delisted/acquired tickers).
3. Anti-fabrication pre-flight on every LLM-extracted number (V2/V3/V5 carry highest fabrication risk).
4. Source-of-evidence column on every feature row (source_url + extracted_ts).
5. Date-of-source discipline: every source explicitly dated.
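Controls 1 and 4 can be illustrated with a small point-in-time record. This is a sketch under assumptions: only as_of_ts, event_ts, source_url, and extracted_ts come from the text above; the FeatureRow class, its other fields, and both helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FeatureRow:
    ticker: str
    name: str               # hypothetical feature name, e.g. "capex_usd"
    value: float
    event_ts: datetime      # when the underlying event occurred (e.g. quarter end)
    as_of_ts: datetime      # when the value became publicly knowable (e.g. filing date)
    source_url: str         # source-of-evidence column
    extracted_ts: datetime  # when the value was extracted from the source


def visible_at(rows, decision_ts):
    """Rows a strategy may use at decision_ts without look-ahead bias."""
    return [r for r in rows if r.as_of_ts <= decision_ts]


def audit_look_ahead(rows):
    """Flag rows marked as knowable before the event they describe occurred."""
    return [r for r in rows if r.as_of_ts < r.event_ts]
```

For example, a fiscal-year figure with event_ts of 2024-12-31 and as_of_ts of 2025-02-20 is excluded from a rebalance decided on 2025-01-31 and included from 2025-02-20 onward.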

Honest gap: we do not yet have a survivorship-bias-corrected ticker universe. CRSP-like dataset is $500+/mo academic tier and is in the v2 upgrade path, not v1.

§6 — The Substrate-Discipline Difference

This is the section that explains why ACG's involvement should matter to a sophisticated LP.

Most fund-management organizations cannot tell you, in detail, why they chose the data sources they chose, what hypotheses they ruled out and why, what their false-positive rate is on LLM-extracted numbers, how they audit their own work, or what would make them publicly retract a claim. The Awareness Fund's substrate-vendor-of-record (ACG) ships exactly this kind of operational rubric publicly at ai-civ.com — including:

Vendor Substrate-Discipline Scorecard (10-dimension rubric) — the same operational discipline used to evaluate external vendors is applied to ACG's own work, with explicit named gaps where ACG underperforms its own standard. (Published 2026-05-17.)
Scientific-method skill + critical-thinking skill (federation-IP, downloadable) — operational decision-substrate for separating claim from evidence, surfacing hidden assumptions, and detecting self-grading.
Anti-fabrication pre-flight discipline — mandatory before any LLM-extracted number enters production. Stage 5 freshness-gate catches stale-data fabrication. (v1.1 shipped 2026-05-14.)
Transcription-not-paraphrase discipline — verbatim preservation of human-spoken words for any chapter, customer-facing acknowledgment, or human-words-passing-through transformation. Failure-mode discipline at the language layer.
Cross-grading-as-substrate — every claim entered as "integrated" requires verification receipt (grep, stat, or git-diff) or legacy_pre_amendment flag. Structural, not aspirational. (v1.1 schema shipped 2026-05-14.)
System > Symptom doctrine — when something breaks, the fix is to the system that allowed it, not to the symptom. Codified after multiple operational incidents.

The meta-thesis: an LP investing in The Awareness Fund is also investing in a fund whose substrate-vendor-of-record uses the same operational discipline the LP wishes their existing PE managers used. The substrate-discipline IS the operational moat against the kind of self-deception that destroys most quantitative strategies.

The LP-readable claim: the same vendor-substrate-discipline-scorecard ACG publishes publicly is the rubric we have been asked to apply to ourselves on this fund's back-test work. We will publish that self-assessment when the back-test results ship. If it is ugly, we will publish that too. The discipline of publishing the ugly self-grade is, itself, the alpha.

For the public artifacts, see ai-civ.com/blog/ (substrate-discipline-scorecard post, federation-IP downloads).

§7 — What Happens Next

Step	Owner	Trigger / dependency
1. Pyonair PPO code arrives	Pyonair (Stacey + Apex)	Pending — assumed-coming, not delivered
2. ACG kicks off ingestion pipeline (V1+V8+V2+V5)	ACG (research-lead + mind-lead)	Independent of step 1; can start now
3. Back-test engine scaffolded at `projects/awareness-fund/backtest/`	ACG (mind-lead) once Pyonair engine arrives, else mind-lead writes minimal vectorized harness	Step 1 OR independent fallback
4. Variation harness (V1–V10 from `ingestion-variations-spec.md`) runnable from CLI	ACG (mind-lead) + capital-markets-lead acceptance tests	Step 2 + Step 3
5. First back-test results: full §4 metric set, all four tiers of the benchmark ladder	capital-markets-lead	Step 4
6. Substrate-discipline self-assessment published alongside results	ACG (business-lead)	Step 5
7. Partner walk-through with Corey + Russell + Jordannah (with Pyonair input)	All	Step 6
8. Iteration on variations + overlays per partner feedback	All	Ongoing
9. First LP conversations (using THIS document + back-test results)	Russell, with Corey + Jordannah (+ Pyonair input)	Step 7

Total v1 dev estimate from backtest-protocol-spec.md §8: 8–11 active days, all on free-tier data.

Acceptance criteria for v1 back-test completion (backtest-protocol-spec.md §9):
1. All variations run end-to-end on free-tier data.
2. All variations reproduce identical results from a clean re-run (deterministic seeding).
3. All §4 metrics computed and present in output report.
4. Source-data hashes + code-version hashes embedded in every run.
5. Survivorship-bias caveat explicit in every output.
6. "DRAFT TEMPLATE — REQUIRES EXPERT REVIEW BEFORE PRODUCTION USE" disclaimer on every artifact.
7. Partner-readable markdown report generated automatically.
8. A blind expert-reviewer (e.g., a portfolio manager friend of Russell's) can read the protocol + report and reproduce the headline number to ±10bps.
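Criteria 2, 4, and 8 lend themselves to an automated check. The following is a minimal sketch, assuming a JSON-serializable manifest; the manifest field names and all function names are hypothetical, and the code-version string (for example a git commit hash) is assumed to be supplied by the caller rather than read from the repository.

```python
import hashlib
import json


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(run_id, data_paths, code_version, seed, headline_metrics):
    """Collect the hashes and settings that identify one back-test run."""
    return {
        "run_id": run_id,
        "code_version": code_version,
        "seed": seed,
        # Keyed by path as given; a re-run from another directory would differ.
        "source_data_sha256": {p: sha256_of_file(p) for p in sorted(data_paths)},
        "headline_metrics": headline_metrics,
    }


def manifest_fingerprint(manifest):
    """Hash a canonical JSON encoding so key order cannot change the result."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_reproduced(first, second, ignore=("run_id",)):
    """Criterion 2: a clean re-run must match on everything except the run id."""
    def comparable(manifest):
        return {k: v for k, v in manifest.items() if k not in ignore}
    return (manifest_fingerprint(comparable(first))
            == manifest_fingerprint(comparable(second)))


def within_tolerance_bps(reported, reproduced, tol_bps=10.0):
    """Criterion 8: headline numbers (as decimals) agree to within tol_bps."""
    return abs(reported - reproduced) * 10_000.0 <= tol_bps
```

Comparing fingerprints rather than eyeballing reports means any drift in data, code version, seed, or headline metrics fails the re-run check outright.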

§8 — Honest Gaps + Open Questions

We name these openly because diligence-grade LPs will discover them anyway. We would rather be the source than be caught.

Gap	Substance	Mitigation / path
ACG has no native back-test engine	We have a yfinance-class CLI ingester, generic ingestion plumbing (`tools/ago/ingest/`), and strong LLM-extraction discipline — but zero equity portfolio back-test capability in the repo today (audited 2026-05-18, `backtest-protocol-spec.md` §0).	Pyonair owns this. The division of labor IS the design. If Pyonair PPO code does not arrive in a reasonable timeframe, ACG (mind-lead) will write a minimal vectorized harness as fallback — but Pyonair-built is preferred.
No transaction-cost model beyond a simple linear-impact stub	Critical for honest Sharpe / info ratio numbers at fund-AUM scale.	A 5/15/+2/+10 bps cost model is specified (`backtest-protocol-spec.md` §4.4); calibrate from realized spreads in Polygon paid data if upgraded.
No survivorship-bias-corrected ticker universe	All current scraping is "what's listed today." Historical delistings are missing.	Manual delisting enumeration for the 24-month window (research-lead's deliverable). CRSP-like dataset ($500+/mo academic tier) is v2 upgrade path.
Works (federation's financial-civ) is currently DOWN	Works (Kimi K2.6) is the sister-civ with deepest financial-domain depth. Factor-construction critique is unavailable until restart.	Hengshi (Qwen) is healthy and can serve as cross-grading peer for research outputs. Works restart pending.
TG/blog instrumentation lag	Subscriber open/click instrumentation for ai-civ.com blog posts is not yet wired. Cannot measure LP-funnel response to publicly-shipped substrate-discipline IP today.	Slot-4 instrumentation target carried from 2026-05-17; unmet as of this document's date.
Pyonair PPO code has not yet arrived	All RL/PPO research on the ACG side is preparatory; integration will happen when code lands.	Asynchronous timeline. Independent ACG work (ingestion + protocol + universe) is ungated and proceeds.
Time-zone alignment with Pyonair team unknown	Async coordination assumed.	TGIM substrate (cross-civ task platform) is the standing wire if cadence becomes a friction point.
No live brokerage / execution layer	Back-test analytics only. Live execution is fund-back-end work outside ACG's scope.	Russell + Pyonair territory.
No risk-management / compliance / regulatory framework	When fund formally launches, this is a hard external dependency.	Counsel + compliance vendor to be retained at fund-formation stage. Out of scope for this thesis-overview.
24-month back-test window is too short to validate H3 (late-cycle mining)	Mining capex cycle is 8–12 years; equity markets typically anticipate by 18–30 months. The thesis we believe in most strongly (the one H3 supports) is also the one the back-test can least validate.	Proxy test in v1 (mining-equity-beta-to-AI-capex-announcements). Full validation requires 2028+ data.
Statistical power at 24 monthly observations is honestly low	t-stats and bootstrap CIs are reported as substrate for partner discussion, NOT as inference-grade evidence.	Treat as descriptive, not inferential. Honest framing in every report. Longer back-test window (5y+) is Polygon paid upgrade path.
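The statistical-power gap in the last row can be made concrete with a minimal sketch of a percentile bootstrap on an annualized Sharpe ratio from 24 monthly observations. Everything here is an assumption for illustration: the returns are synthetic draws, not fund or benchmark data, the function names are hypothetical, and the i.i.d. resampling ignores autocorrelation, which a production protocol might handle with a block bootstrap.

```python
import math
import random
import statistics


def annualized_sharpe(monthly_excess: list[float]) -> float:
    """Annualized Sharpe ratio from monthly excess returns (sample std, sqrt(12) scaling)."""
    mean = statistics.fmean(monthly_excess)
    sd = statistics.stdev(monthly_excess)
    return (mean / sd) * math.sqrt(12)


def bootstrap_ci(returns: list[float], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the annualized Sharpe ratio."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(returns)
    stats = []
    for _ in range(n_resamples):
        # Resample months with replacement (assumes i.i.d. monthly returns).
        sample = [returns[rng.randrange(n)] for _ in range(n)]
        if statistics.stdev(sample) == 0:
            continue  # skip degenerate resamples with no variation
        stats.append(annualized_sharpe(sample))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper


# Synthetic stand-in for 24 months of excess returns (illustrative only).
data_rng = random.Random(7)
monthly = [data_rng.gauss(0.01, 0.05) for _ in range(24)]

point = annualized_sharpe(monthly)
lower, upper = bootstrap_ci(monthly)
print(f"point estimate: {point:.2f}")
print(f"95% bootstrap CI: [{lower:.2f}, {upper:.2f}]")
```

With only 24 observations the interval is typically wide relative to the point estimate, which is why the reported t-stats and CIs are treated as descriptive rather than inference-grade.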

§9 — Disclaimers (full)

This document is a DRAFT THESIS OVERVIEW for pre-launch partner and prospective-LP discussion only. It is NOT a solicitation to invest, NOT a private placement memorandum, NOT investment advice, and NOT an offer of securities. No fund vehicle yet exists. No price targets, return projections, or performance forecasts appear in this document by design.

Past performance does not predict future results. Back-tests are hypothetical, derived from historical price data, and do not reflect actual trading, real-world transaction costs, taxes, or fund-management fees. Survivorship bias, selection bias, and look-ahead bias may be material despite the controls described in §5. Forward-looking statements derived from third-party sources carry their authors' biases and should be independently verified before any investment decision.

Investing in equities (including any future Awareness Fund vehicle) involves risk, including risk of total loss. Concentrated thematic strategies carry higher volatility than diversified passive index strategies. The capex-shift thesis described in this document may be wrong (see §3, particularly H6). The information in this document is current as of 2026-05-18 and is subject to revision without notice.

All quantitative claims are confidence-tagged HIGH/MEDIUM/LOW per the rubric in projects/awareness-fund/capital-markets/thesis-substantiation.md. LPs and partners must conduct their own due diligence and consult independent legal, tax, and investment counsel before making any commitment.

§10 — Document Map (for the LP who wants to drill down)

If you want to verify...	Read
The full 73-ticker candidate universe with sub-segment tags + data-tier requirements	`projects/awareness-fund/capital-markets/ticker-universe-spec.md`
The back-test design (time window, data sources, rebalance rules, transaction costs, risk overlays, reproducibility requirements)	`projects/awareness-fund/capital-markets/backtest-protocol-spec.md`
The S&P comparative methodology (4-tier benchmark ladder, L1-L5 outperformance ladder, attribution decomposition, statistical-significance discipline)	`projects/awareness-fund/capital-markets/sp-comparative-spec.md`
Per-vertical sourced observations with confidence tags and explicit risks (including SMCI accounting overhang, SpaceX-private exclusion, defense-prime mixed-beta caveats)	`projects/awareness-fund/capital-markets/thesis-substantiation.md`
The 10 candidate ingestion variations with cost/coverage/risk-of-failure per variation	`projects/awareness-fund/research/ingestion-variations-spec.md`
The 6 competing hypotheses (including H6 anti-thesis) with falsification tests	`projects/awareness-fund/research/competing-hypotheses.md`
The honest audit of what ACG can deliver vs what we must source from partners	`projects/awareness-fund/research/capability-inventory.md`

Document status: v0.1 DRAFT — synthesized from 8 parallel-shipped spec sheets (2026-05-18). Awaiting partner review before any LP distribution. Authored by ACG business-lead. Co-owners (humans): Corey Cottrell (ACG) · Russell Korus (AiCIV Inc / Keel) · Jordannah Korus (Korus Consulting Inc). Operating partner: Pyonair (Stacey Engle, with Apex — Pyonair's AI counterpart — collaborating on substrate-side coordination).

For partner questions, comments, or counter-evidence: route through Corey for the ACG-side; route through Russell or Jordannah for the Korus-side; route through Stacey for the Pyonair-side. Cross-grading welcomed and expected.