# APAC Tertiary-Institution Quality Trajectory Index
## Methodology Specification v0.1

*Status: design locked on four pillars; all four validated on live pilots (2026-06) — C/D/B on OpenAlex, A on World Bank/UNESCO UIS.*

---

## 1. Purpose & framing hypothesis

**Goal.** Assess the *trajectory and development of quality* — not the static level — of leading tertiary-education clusters across the Asia-Pacific, and express it in a form that speaks to economic competitiveness.

**Framing hypothesis.** *Local institutional quality is a necessary condition for sustained economic competitiveness.* This index operationalizes the **independent variable** (institutional quality and its trajectory). The link to competitiveness outcomes is left as **interpretation**, not a formal statistical test — but every metric is chosen for its plausible economic-transmission channel.

**Design decisions (locked).**
- Output is a **quality index**, not a hypothesis test.
- Unit of analysis is the **cluster / city**, not the country or the single institution.
- We measure **trajectory**, presented against level in a 2-D map — never a single league table.
- Comparisons are **tier-relative** (frontier / catch-up / emerging) because the region spans the full development gradient.

---

## 2. Unit of analysis: clusters

The university→economy linkage is geographically concentrated, so the cluster is the natural unit. A cluster = the set of research-active institutions in a metropolitan/innovation catchment, with named anchor institutions.

| Region | Clusters (anchors) |
|---|---|
| SE Asia | Singapore (NUS, NTU); Klang Valley (Malaya, UKM); Greater Jakarta (UI, ITB); Bangkok (Chulalongkorn, Mahidol); Metro Manila (UP, Ateneo); Hanoi/HCMC (VNU) |
| ROK | Seoul Capital Area (SNU, Yonsei, Korea U); Daejeon (KAIST); Pohang (POSTECH) |
| Japan | Greater Tokyo (Todai, Tokyo Tech, Waseda); Keihanshin (Kyoto, Osaka); Nagoya; Tsukuba |
| India | Bengaluru (IISc); Delhi NCR (IIT-D); Mumbai (IIT-B); Chennai (IIT-M); Hyderabad; Kharagpur |
| Bangladesh | Dhaka (DU, BUET) |
| Australia | Melbourne; Sydney (USyd, UNSW); Canberra (ANU); Brisbane (UQ); Perth (UWA) |
| New Zealand | Auckland; Wellington; Christchurch |
| Kazakhstan / C. Asia / Mongolia | Astana–Almaty (Nazarbayev U, KazNU); Tashkent; Ulaanbaatar |
| Hong Kong | HK cluster (HKU, HKUST, CUHK, PolyU) — flag Greater Bay Area spillover |

**Development tiers** (for tier-relative normalization and weighting):
- **Frontier** — Tokyo, Singapore, Seoul, Hong Kong, Melbourne/Sydney.
- **Catch-up** — Bengaluru, Mumbai, Chennai, Klang Valley, Bangkok, Seoul-secondary.
- **Emerging** — Dhaka, Manila, Jakarta, Astana, Tashkent, Ulaanbaatar.

Tier is an input to normalization, *not* a score. A cluster can be re-tiered as it develops.

---

## 3. The four pillars

Each pillar maps a transmission channel from institutions to the economy. Each carries a **headline indicator** and a **value-capture refinement** — the locally-accruing version that the competitiveness hypothesis actually requires (see §4).

### Pillar A — Human Capital Volume *(workforce-supply channel)*
| Indicator | Source |
|---|---|
| STEM graduate output (bachelor + postgrad), absolute & per-capita of cluster | UNESCO UIS, national HE statistics |
| Doctoral / research-degree production (higher-skill weight) | UNESCO UIS, institutional reports |
| Growth in research-intensive program enrollment | National HE agencies |
| *Value-capture:* STEM/doctoral share + researcher density (quality-weighted, not raw volume) | World Bank `UIS.FOSGP.5T8.F500600700`, `SP.POP.SCIE.RD.P6` |

*Not bibliometric. Headline = raw enrolment volume; value-capture = STEM/doctoral/researcher-density. Pilot (2026-06) confirmed plumbing but surfaced a **national-only granularity ceiling** and a **volume≠quality divergence** — see §10 Trap 5 and §12. Cluster-level Pillar A requires national statistical agencies (e.g. India AISHE, Korea KEDI), not the World Bank API.*

> **Resolution decision (accepted 2026-06).** Pillar A enters the index as a **national-context layer at country resolution**, beneath the three cluster-resolved pillars (B/C/D) — we do *not* gate the whole index on building per-country statistical-agency pipelines. **Consequence to document on every output:** Pillar A operates at a coarser spatial resolution than B/C/D, so within-country clusters share the same Pillar-A value (e.g. Seoul and Daejeon both inherit Korea's national figures). This is a deliberate, documented inconsistency, not an oversight. Where a national agency offers clean institution-level enrolment (India AISHE is the strongest candidate), Pillar A *may* be disaggregated for that country as an enhancement — but it is never a prerequisite for scoring.
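
The inheritance rule can be sketched as a simple lookup. This is a minimal illustration, not pipeline code: the names `CLUSTER_COUNTRY`, `NATIONAL_STEM_SHARE`, and `pillar_a_value` are hypothetical, the ISO3 codes are assumed, and the STEM shares are the pilot figures from §13.

```python
# Hypothetical lookup tables; cluster names and ISO3 codes are illustrative.
CLUSTER_COUNTRY = {
    "Seoul Capital Area": "KOR",
    "Daejeon": "KOR",
    "Singapore": "SGP",
    "Bengaluru": "IND",
    "Mumbai": "IND",
    "Dhaka": "BGD",
}

# National STEM graduate share (%), pilot values reported in §13.
NATIONAL_STEM_SHARE = {"SGP": 33.5, "IND": 32.2, "KOR": 29.3, "BGD": 11.1}


def pillar_a_value(cluster: str) -> dict:
    """Return the national Pillar A value a cluster inherits, tagged with its resolution."""
    country = CLUSTER_COUNTRY[cluster]
    return {
        "cluster": cluster,
        "value": NATIONAL_STEM_SHARE[country],
        # The resolution tag travels with the value so outputs can document it.
        "resolution": f"national ({country})",
    }
```

By construction, `pillar_a_value("Seoul Capital Area")` and `pillar_a_value("Daejeon")` return the same value; carrying the `resolution` tag alongside it is one way to keep the documented inconsistency visible on every output.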

### Pillar B — Innovation Transfer *(university→industry channel)*
| Indicator | Source |
|---|---|
| Industry co-authorship intensity = share of works with a corporate co-affiliation | OpenAlex (`institutions.type:company`) |
| **Domestic-firm linkage** = share of corporate co-authors located in-country | OpenAlex (corporate co-author `country_code`) |
| Patents with university assignees/inventors | Lens.org (token required) |
| Spinouts / venture formation around the cluster | Crunchbase / PitchBook |
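
The two OpenAlex-based indicators in the table can be sketched as follows. This assumes each work record carries `authorships[].institutions[]` entries with `type` and `country_code` fields, and it counts each corporate affiliation occurrence once per work; that counting rule and the function name `pillar_b_shares` are assumptions, not the pilot's exact method.

```python
def pillar_b_shares(works: list[dict], home_country: str) -> tuple[float, float]:
    """Return (industry co-authorship intensity, domestic-firm linkage).

    intensity = share of works with at least one corporate co-affiliation
    linkage   = share of corporate co-affiliations located in home_country
    """
    works_with_company = 0
    corporate_total = 0
    corporate_domestic = 0
    for work in works:
        # Collect every corporate affiliation listed on this work.
        companies = [
            inst
            for authorship in work.get("authorships", [])
            for inst in authorship.get("institutions", [])
            if inst.get("type") == "company"
        ]
        if companies:
            works_with_company += 1
        corporate_total += len(companies)
        corporate_domestic += sum(
            1 for inst in companies if inst.get("country_code") == home_country
        )
    intensity = works_with_company / len(works) if works else 0.0
    linkage = corporate_domestic / corporate_total if corporate_total else 0.0
    return intensity, linkage
```

Keeping the two shares separate matters: a cluster can score high on intensity while almost all of its corporate partners sit abroad, which is the headline versus value-capture gap described in §4.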

### Pillar C — Research Frontier *(technology-frontier channel)*
| Indicator | Source |
|---|---|
| Top-10% cited share & **top-10% volume** (field+year normalized) | OpenAlex (`cited_by_percentile_year.min:90`) |
| 2-yr mean citedness (impact level) | OpenAlex `summary_stats` |
| International collaboration share | OpenAlex |
| **Strategic-field weighting** — output share + excellence in a strategic-subfield basket | OpenAlex `primary_topic.subfield.id:` |

**Strategic basket (piloted 2026-06, OpenAlex subfield IDs):** AI `1702`, Biotechnology `1305`, Renewable Energy `2105`, Electrical & Electronic Engineering `2208` (semiconductor proxy — no clean semiconductor subfield exists). Filter with OR: `primary_topic.subfield.id:1702|1305|2105|2208`. **Two caveats:** (a) the basket *drives the ranking* — EEE is broad and favours engineering-heavy clusters; biomedical is under-captured, understating Singapore. (b) Make the basket **per-economy** where the competitiveness target differs (semiconductors for Korea/Taiwan, biotech for Singapore) — the field-weighting analogue of the §11 interpretation layer. Use *share* + *excellence-within-strategic* on ≤2018 cohorts (Trap 2); use strategic-*volume* growth for recent years.
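
As an illustration of the OR filter, the sketch below assembles an OpenAlex `/works` query URL for one institution and year. The helper name `strategic_works_url` and the placeholder institution ID are hypothetical; the `cited_by_percentile_year.min:90` filter is taken as written in the Pillar C table, and no request is sent here.

```python
from urllib.parse import urlencode

# Subfield IDs from the piloted basket: AI, Biotechnology, Renewable Energy, EEE.
STRATEGIC_BASKET = ["1702", "1305", "2105", "2208"]


def strategic_works_url(institution_id: str, year: int, top10_only: bool = False) -> str:
    """Build a query URL for strategic-basket works of one institution in one year."""
    filters = [
        f"authorships.institutions.id:{institution_id}",
        f"publication_year:{year}",
        # The pipe joins subfield IDs with OR semantics.
        "primary_topic.subfield.id:" + "|".join(STRATEGIC_BASKET),
    ]
    if top10_only:
        filters.append("cited_by_percentile_year.min:90")
    # Keep ':' '|' ',' unescaped so the filter string stays readable.
    query = urlencode({"filter": ",".join(filters), "per-page": 1}, safe=":|,")
    return "https://api.openalex.org/works?" + query


# Placeholder ID, not a real institution.
print(strategic_works_url("I0000000000", 2018, top10_only=True))
```

Dividing the count returned for this URL by the count for the same query without the subfield filter would give the strategic share; running it with and without `top10_only` would give excellence-within-strategic.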

### Pillar D — Talent Retention *(value-capture channel; co-equal additive pillar, see §7)*
| Indicator | Source |
|---|---|
| Origin-restricted retention = share of domestic-origin researchers still in-country | OpenAlex author affiliation history |
| **Destination-corridor map** = where leavers go (robust, always-valid) | OpenAlex `last_known_institutions.country_code` |
| Skilled-graduate net migration | OECD migration, national surveys |
| Returnee / retained-faculty share | Institutional data, LinkedIn talent datasets |
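
A minimal sketch of the first two indicators, assuming each researcher has already been reduced to an `(origin_country, current_country)` pair. How origin is inferred is left open here; if it is taken from the first publication affiliation, the result inherits the hub confound described in §10 Trap 3. The function name is hypothetical.

```python
from collections import Counter


def retention_and_corridors(authors, home_country):
    """Return (origin-restricted retention, destination-corridor counts).

    authors: iterable of (origin_country, current_country) pairs.
    Only researchers whose origin is home_country enter either measure.
    """
    domestic_origin = [(o, c) for o, c in authors if o == home_country]
    stayed = sum(1 for _, current in domestic_origin if current == home_country)
    retention = stayed / len(domestic_origin) if domestic_origin else float("nan")
    # Corridor map: where the leavers are now.
    corridors = Counter(c for _, c in domestic_origin if c != home_country)
    return retention, corridors
```

The corridor counter is reported alongside the rate rather than folded into it, so it stays interpretable even where the retention figure itself is confounded.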

---

## 4. The central principle: headline vs. value-capture

Validated across three pilots: **a headline metric can be high while local economic capture is low** — and the gap is systematically largest for global-hub economies. Each pillar therefore carries a value-capture refinement:

| Pillar | Headline (global) | Value-capture (local) |
|---|---|---|
| B | industry co-authorship % | **domestic-firm** linkage % |
| C | raw output / impact | **strategic-field-weighted** impact |
| D | raw retention rate | **origin-restricted** retention + corridor map |

The hypothesis ("local quality → local competitiveness") is about the **right-hand column**. The index reports both, and the *ratio* (value-capture / headline) is itself a diagnostic of how globally-coupled vs. locally-anchored a cluster is.

---

## 5. Trajectory measurement

For indicator *i*, cluster *c*, over window [t₀, t₁]:

**Volume / count indicators** — annualized growth:
```
CAGR(i,c) = (V_{i,c,t1} / V_{i,c,t0})^(1/(t1−t0)) − 1
```
**Base-effect correction.** Small bases inflate CAGR (10→20 papers = 100%). Blend relative and absolute change into a single score via a log-damped form:
```
g*(i,c) = w_rel · sign(Δ_rel) · ln(1 + |Δ_rel|) + (1 − w_rel) · z(Δ_abs),  w_rel ≈ 0.6
```
or report CAGR alongside absolute Δ and flag any cluster whose rank is driven by a base < threshold (e.g. < 500 works/yr).
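
The growth calculation and the small-base flag can be sketched as below, using the `sign(g)·ln(1+|g|)` damping that §6 locks. Function names are illustrative, and the 500 works/yr threshold is the example value from the text, not a fixed parameter.

```python
import math


def cagr(v0: float, v1: float, years: float) -> float:
    """Compound annual growth rate between two positive values."""
    if v0 <= 0 or years <= 0:
        raise ValueError("cagr needs a positive base value and a positive window")
    return (v1 / v0) ** (1 / years) - 1


def log_damp(g: float) -> float:
    """sign(g) * ln(1 + |g|): compresses extreme growth rates and keeps the sign."""
    return math.copysign(math.log1p(abs(g)), g)


def small_base(v0: float, threshold: float = 500) -> bool:
    """Flag clusters whose growth rank may be driven by a small starting base."""
    return v0 < threshold


g = cagr(10, 20, 1)                 # the 10 -> 20 papers example: 100% in one year
print(g, log_damp(g), small_base(10))
```

For the 10→20 example the raw rate is 1.0 (100%) while the damped value is ln 2, about 0.69, and the base flag is raised; a large cluster growing at 5% is barely changed by the damping.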

**Share / rate indicators** (e.g. top-10% share, retention) — use point change `Δpts = s_{t1} − s_{t0}`, **but only on citation-mature cohorts** (see Trap 2).

**Distance-to-frontier closing rate** — the cleanest "growth in quality" expression and the natural convergence metric. Let F = frontier reference (e.g. Tokyo/Singapore on that indicator):
```
DTF_close(i,c) = [ (F − V_{c,t0}) − (F − V_{c,t1}) ] / (F − V_{c,t0})
```
Positive = closing the gap to the frontier.
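
A short sketch of the closing rate with an explicit guard for the undefined case (a cluster already at or above the frontier at t₀). The numbers in the usage line are illustrative, not pilot data.

```python
def dtf_close(frontier: float, v_t0: float, v_t1: float) -> float:
    """Fraction of the initial gap to the frontier that was closed over the window."""
    gap_t0 = frontier - v_t0
    gap_t1 = frontier - v_t1
    if gap_t0 <= 0:
        raise ValueError("cluster is at or above the frontier at t0; DTF is undefined")
    return (gap_t0 - gap_t1) / gap_t0


# Illustrative: frontier at 6.0, cluster moves from 2.0 to 3.0.
print(dtf_close(6.0, 2.0, 3.0))  # 0.25: a quarter of the initial gap closed
```

Note that this form holds F fixed across the window; if the frontier itself moves, F would need to be evaluated separately at t₀ and t₁, which is a design choice the formula above does not settle.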

**Windows.** Output/volume on rolling recent years (e.g. 2014→2024). Share/impact on **≤2018 cohorts** to avoid citation-maturity contamination. Track sub-windows to detect inflections.

---

## 6. Normalization

**Locked (2026-06): tier-relative z-score.**
1. **Within-indicator, tier-relative z-score:** standardize each indicator *within development tier* (mean 0, sd 1), so the score expresses "position relative to peers at the same development stage" — the right frame for a heterogeneous roster. Requires populated tiers (~8–10 clusters each); **degenerate at pilot scale** (tiers of 1–2), where global z-score across the whole set is used as a proxy.
2. **Base-effect log-damping of all growth indicators (applied before the z-score in step 1):** `g* = sign(g)·ln(1+|g|)`. This prevents small-base clusters from owning the trajectory axis (pilot: un-damped Dhaka maxed every growth indicator). Confirmed necessary in the provisional map.
3. **Per-capita / per-output scaling** where a level indicator would otherwise just track cluster size.
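
The tier-relative z-score and its pilot-scale fallback can be sketched as follows. The minimum tier size of 8 is taken from the lower end of the "~8–10" guidance, and the use of the population standard deviation is an assumption; both are parameters of this illustration rather than locked choices.

```python
from statistics import mean, pstdev

MIN_TIER_SIZE = 8  # assumption: lower end of the "~8-10 clusters" guidance


def zscores(values: dict) -> dict:
    """Standardize a {cluster: value} mapping to mean 0, sd 1."""
    data = list(values.values())
    mu, sd = mean(data), pstdev(data)
    return {k: ((v - mu) / sd if sd > 0 else 0.0) for k, v in values.items()}


def tier_relative_z(values: dict, tiers: dict, min_tier_size: int = MIN_TIER_SIZE) -> dict:
    """Z-score within development tier; fall back to a global z-score if any tier is too small."""
    by_tier: dict = {}
    for cluster, value in values.items():
        by_tier.setdefault(tiers[cluster], {})[cluster] = value
    if min(len(group) for group in by_tier.values()) < min_tier_size:
        return zscores(values)  # pilot-scale proxy: global z-score
    out: dict = {}
    for group in by_tier.values():
        out.update(zscores(group))
    return out
```

Growth indicators would be log-damped before being passed in, per step 2; level indicators would be scaled per step 3 first.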

---

## 7. Weighting & composite

**Locked (2026-06): retention (D) is a co-equal *additive* pillar, not a multiplier.** The multiplier form compounded and distorted hub economies (pilot: it pushed Singapore's whole score down via a confounded retention value). As an additive pillar, low retention dampens the score in proportion to its weight without annihilating the other pillars, and it keeps the hub-confound from over-determining the result. Retention is a *stock* measure, so it enters the **level** axis only; the trajectory axis uses A/B/C (a future "retention-change" indicator could add D to trajectory later).

**Level axis — four-pillar weights by tier:**

| Tier | A | B | C | D (retention) |
|---|---|---|---|---|
| Emerging | 0.35 | 0.20 | 0.25 | 0.20 |
| Catch-up | 0.30 | 0.25 | 0.30 | 0.15 |
| Frontier | 0.20 | 0.30 | 0.35 | 0.15 |

**Trajectory axis — three-pillar growth weights by tier:**

| Tier | A | B | C |
|---|---|---|---|
| Emerging | 0.45 | 0.25 | 0.30 |
| Catch-up | 0.35 | 0.30 | 0.35 |
| Frontier | 0.25 | 0.35 | 0.40 |

```
Level_c      = w_A·Â_lev + w_B·B̂_lev + w_C·Ĉ_lev + w_D·D̂_ret      (4-pillar)
Trajectory_c = w'_A·Â_tr  + w'_B·B̂_tr  + w'_C·Ĉ_tr                  (3-pillar)
```
where each pillar score is the mean of its tier-relative-z-scored (base-damped, for growth) indicators. Plot Level (x) vs. Trajectory (y) — see §8.
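
The two composites can be sketched directly from the weight tables above. The dictionary and function names are illustrative; the pillar scores passed in are assumed to be already normalized as described.

```python
# Weights copied from the two tables in this section.
LEVEL_WEIGHTS = {
    "emerging": {"A": 0.35, "B": 0.20, "C": 0.25, "D": 0.20},
    "catch-up": {"A": 0.30, "B": 0.25, "C": 0.30, "D": 0.15},
    "frontier": {"A": 0.20, "B": 0.30, "C": 0.35, "D": 0.15},
}
TRAJECTORY_WEIGHTS = {
    "emerging": {"A": 0.45, "B": 0.25, "C": 0.30},
    "catch-up": {"A": 0.35, "B": 0.30, "C": 0.35},
    "frontier": {"A": 0.25, "B": 0.35, "C": 0.40},
}


def composite(pillar_scores: dict, weights: dict) -> float:
    """Weighted sum of already-normalized pillar scores."""
    if set(pillar_scores) != set(weights):
        raise ValueError("pillar scores must match the weight table exactly")
    return sum(weights[p] * pillar_scores[p] for p in weights)


# Illustrative scores for an emerging-tier cluster (not pilot data).
level = composite({"A": 0.5, "B": -0.2, "C": 1.0, "D": 0.0}, LEVEL_WEIGHTS["emerging"])
trajectory = composite({"A": 0.5, "B": -0.2, "C": 1.0}, TRAJECTORY_WEIGHTS["emerging"])
```

Rejecting mismatched pillar sets is a deliberate choice in this sketch: it makes it hard to pass a retention score into the trajectory axis by accident.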

---

## 8. Output: the 2-D positioning map

Plot every cluster on **current level (x)** vs. **trajectory (y)**:

```
        trajectory ↑
  RISING STARS        |   ACCELERATING LEADERS
  (low level,         |   (high level,
   fast growth)       |    still climbing)
  ─────────────────── + ───────────────────→ level
  LAGGING             |   STAGNATING ELITES
  (low, slow)         |   (high level, flat)
```

This is the deliverable. It captures "trajectory and development" honestly and surfaces cases a level-only ranking hides (e.g. a high-prestige cluster losing momentum).
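
Assigning quadrant labels is a two-threshold test. The sketch below assumes the axes cross at 0 (the mean of a z-scored composite); the cut points are parameters because the spec does not fix them.

```python
def quadrant(level: float, trajectory: float,
             level_cut: float = 0.0, trajectory_cut: float = 0.0) -> str:
    """Map a (level, trajectory) position to one of the four map quadrants."""
    high = level >= level_cut
    rising = trajectory >= trajectory_cut
    if high and rising:
        return "ACCELERATING LEADERS"
    if high:
        return "STAGNATING ELITES"
    if rising:
        return "RISING STARS"
    return "LAGGING"
```

Clusters sitting close to either cut are better reported with their coordinates than with a label alone, since a small change in normalization can move them across a boundary.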

---

## 9. Data sources & tooling

- **OpenAlex REST API** — free, no key; pillars B, C, D. Use `mailto=curioputterings@proton.me` (polite pool). Shell note: zsh does not word-split unquoted vars — use `${=VAR}` in loops.
- **UNESCO UIS / World Bank / national HE agencies** — Pillar A.
- **Lens.org** — patents for Pillar B (API token required; not yet wired).
- **OECD migration / LinkedIn talent datasets / graduate-destination surveys** — Pillar D for hub clusters (see §10).
- **Entity hygiene:** OpenAlex contains junk author/institution entities (e.g. predatory journals parsed as authors). Filter: require non-empty `last_known_institutions`, sane `works_count` bounds (e.g. 5–1500 for individuals), dedup sampled pages.
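
The entity-hygiene rules can be sketched as a single filter over sampled pages of author records. This assumes each record is a dict with `id`, `works_count`, and `last_known_institutions` keys; the bounds are the example values from the bullet above, and the function name is hypothetical.

```python
def clean_authors(pages, min_works: int = 5, max_works: int = 1500) -> list:
    """Deduplicate sampled pages and drop implausible author entities."""
    seen = set()
    kept = []
    for page in pages:
        for author in page:
            author_id = author.get("id")
            if author_id is None or author_id in seen:
                continue  # missing ID or duplicate across sampled pages
            seen.add(author_id)
            if not author.get("last_known_institutions"):
                continue  # no affiliation: likely a junk entity
            if not (min_works <= author.get("works_count", 0) <= max_works):
                continue  # outside sane bounds for an individual
            kept.append(author)
    return kept
```

Deduplication runs before the quality checks so that a record rejected once is not reconsidered when it reappears on a later page.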

---

## 10. Measurement traps (validated — must mitigate)

1. **Raw annual citations are unusable as trajectory.** OpenAlex `counts_by_year.cited_by_count` is citations-to-that-cohort; recent years look like decline (citation-age artifact). → Use field-normalized impact (top-10%), never raw citation counts over time.
2. **Excellence-*share* trajectory is citation-maturity contaminated.** In the pilot, top-10% share fell uniformly across *all* clusters 2014→2022 — the uniformity is the tell, not five real declines. → Use ≤2018 cohorts for share-trajectory; use top-10% *volume* growth for recent years.
3. **Retention proxy breaks for education hubs** (Singapore, Hong Kong; partly Australia/NZ). OpenAlex cannot distinguish "domestic talent that left" from "foreign trainee who went home" (graduate training = first publication). In the pilot NUS showed 24% raw retention with destinations dominated by CN:52 — return-migration of Chinese trainees, not Singaporean brain drain. → Valid for non-hubs; for hubs, flag and supplement with migration/labour data, and rely on the **destination-corridor map** (always valid).
4. **Domestic-firm linkage must not be a naive penalty.** Singapore's deliberate model is to host MNC R&D, so its 3% domestic-firm linkage is its intended value-capture, not weakness. → Apply a per-economy interpretation layer (see §11), not a uniform score.
5. **Pillar A volume ≠ quality, and is national-only.** Raw enrolment growth is highest where quality is lowest (pilot: Bangladesh +12.3% enrolment CAGR but 11% STEM share; Korea −1.6% enrolment from demographic decline but rising research quality). → Quality-weight Pillar A (STEM/doctoral/researcher-density), read against demographics, and never score raw volume growth alone. Also: World Bank/UIS data is **country-level, ~3–5 yrs stale, and share-based** — cluster-level Pillar A needs national statistical agencies (India AISHE, Korea KEDI), and its windows will lag the bibliometric pillars.
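
Two of the mitigations above lend themselves to simple guards. The sketch below restricts share-type indicators to citation-mature cohorts (Trap 2) and flags a uniform decline across clusters as a probable artifact. Both helpers are hypothetical, and the uniformity check is a heuristic, not a statistical test.

```python
MATURE_COHORT_CUTOFF = 2018  # share/impact trajectories use cohorts up to this year


def trajectory_years(kind: str, years) -> list:
    """Return the cohort years usable for a trajectory of the given indicator kind."""
    if kind == "share":
        return [y for y in years if y <= MATURE_COHORT_CUTOFF]
    return list(years)  # volume indicators may use recent years


def maturity_artifact_suspected(share_changes: dict, min_clusters: int = 3) -> bool:
    """True when every cluster's share fell over the window: suspicious uniformity."""
    deltas = list(share_changes.values())
    if len(deltas) < min_clusters:
        return False  # too few clusters to call a pattern uniform
    return all(d < 0 for d in deltas)
```

A `True` result is a prompt to re-run the share trajectory on mature cohorts, not evidence on its own that the declines are spurious.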

---

## 11. Per-economy interpretation layer

**BUILT 2026-06** (`stage6_peconomy.py`).

**Method.** Score the two value-capture metrics (domestic-firm linkage, retention) as **deviation from the cluster's own economic-model baseline** (residual ÷ global residual spread); performance metrics stay tier-relative.

**5-model taxonomy.**
- national-champion (KOR/JPN)
- conglomerate (IND)
- advanced-open (SGP/HKG/AUS/NZL)
- emerging-FDI (MYS/VNM/THA/IDN)
- emerging-thin (BGD/PHL/KAZ/UZB/MNG)

**Key result: SECOND LENS, not a replacement.** Naive Level = absolute local value-capture (the competitiveness-relevant quantity, which §11 removes); §11 Level = university quality net of economic structure. Report both; the *gap* is the insight:
- high-naive / low-§11: capture is economy-driven (Korea/Japan).
- low-naive / high-§11: strong universities held back by the economy (AUS/HK).

Full results in `apac-index-fullroster-map.md`.

Before scoring, classify each cluster's economic model, because the same number means different things:
- **National-champion model** (Seoul: 57% domestic-firm linkage, Samsung/Naver/SK) — domestic linkage *is* the competitiveness signal.
- **Conglomerate model** (Mumbai: 43%, Tata/Reliance) — similar, via diversified domestic groups.
- **MNC-hub model** (Singapore: 3% domestic; HK likely similar) — foreign-firm linkage and inbound talent transit are the *intended* value capture; score against the hub benchmark, not the national-champion benchmark.
- **Emerging/extraction** (Dhaka: 1–2% linkage, Western brain-drain corridor) — low capture is the live development constraint the hypothesis predicts.
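
The "residual ÷ global residual spread" scoring can be sketched as below. This is not the logic of `stage6_peconomy.py`: taking the baseline as the within-model mean and the spread as the population standard deviation of all residuals are assumptions, and the example values are illustrative rather than pilot data.

```python
from statistics import mean, pstdev


def model_relative_scores(values: dict, models: dict) -> dict:
    """Score each cluster as its deviation from its economic-model baseline.

    baseline = mean of the metric within the cluster's model (ASSUMPTION)
    score    = (value - baseline) / spread of residuals across all clusters
    """
    by_model: dict = {}
    for cluster, value in values.items():
        by_model.setdefault(models[cluster], []).append(value)
    baseline = {m: mean(vs) for m, vs in by_model.items()}
    residuals = {c: v - baseline[models[c]] for c, v in values.items()}
    spread = pstdev(list(residuals.values()))
    return {c: (r / spread if spread > 0 else 0.0) for c, r in residuals.items()}


# Illustrative domestic-firm linkage (%) for two hypothetical models.
linkage = {"X1": 60.0, "X2": 50.0, "Y1": 5.0, "Y2": 1.0}
models = {"X1": "national-champion", "X2": "national-champion",
          "Y1": "advanced-open", "Y2": "advanced-open"}
scores = model_relative_scores(linkage, models)
```

In this toy input, `Y1` (5% raw linkage) scores above `X2` (50% raw linkage) because each is judged against its own model's baseline, which is the reading the interpretation layer is meant to produce. With one cluster per model every residual is zero, so the method needs a populated roster.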

---

## 12. Limitations

- Bibliometric pillars (B/C/D) over-represent research-intensive activity and under-represent teaching and vocational quality; they capture none of the human-capital volume that Pillar A measures.
- OpenAlex author disambiguation and entity quality are imperfect; retention is a proxy, not nationality data.
- Patent and venture data (Lens, Crunchbase) have coverage and access constraints.
- The competitiveness link is interpretive; the index does not establish causation.

---

## 13. Pilot evidence (appendix — OpenAlex, 2026-06)

5-cluster pilot across the gradient (Singapore / Seoul / Bengaluru / Mumbai / Dhaka):

**Pillar C** — output CAGR '14–24: Dhaka **12.3%** (emerging surge) › Mumbai 6.2% ≈ Singapore 5.8% ≈ Seoul 5.7% › Bengaluru **3.9%** (plateau). 2-yr mean citedness (quality level): Singapore 6.29 › Seoul 4.14 › Dhaka 3.10 › Mumbai 2.79 › Bengaluru 2.71. Top-10% volume growth '14→'22: Dhaka +120%, Mumbai +54%, Seoul +27%, Singapore +24%, Bengaluru +12%.

**Pillar D** — origin-restricted retention: Seoul **79%** › Dhaka 41% › Singapore 39% (hub-confounded). Corridors: Seoul→US; Dhaka→US/IN/AU/CA; NUS→CN (return migration, not drain).

**Pillar B** — industry co-authorship % 2022: Seoul 9.1% › Singapore 6.5% › Bengaluru 5.3% › Mumbai 4.3% › Dhaka 2.0%. Domestic-firm linkage: Seoul **57%** (Samsung/Naver/SK) › Mumbai 43% (Tata/Reliance) › Singapore **3%** (Tencent/Huawei/AstraZeneca — foreign MNCs).

**Pillar A** (World Bank/UIS, country-level) — tertiary-enrolment CAGR: Bangladesh **+12.3%** › India +3.8% › Singapore +0.7% › Korea **−1.6%** (demographic decline). STEM share: Singapore 33.5% ≈ India 32.2% › Korea 29.3% › Bangladesh **11.1%**. Derived STEM-grad volume: India ~11.3M ⋙ Korea ~905k › Bangladesh ~412k › Singapore ~66k. Divergence confirmed: fastest volume growth (Bangladesh) coincides with lowest STEM share and lowest research quality.

**Pillar C strategic-field weighting** (basket 1702|1305|2105|2208) — strategic share 2022: Bengaluru **15.8%** › Singapore 13.9% = Mumbai 13.9% › Seoul 10.1% › Dhaka **7.1%**; strategic vol. growth: Bengaluru **+58%** › Seoul +39% › Mumbai +36% › Dhaka +31% › Singapore +30%; excellence-within-strategic uniform ~25–31%. **Re-ranking confirmed:** Bengaluru jumps from raw-Pillar-C plateau to strategic leader; Dhaka's strategic share collapses (−6.1pts) despite booming total output — its growth is non-strategic.

---

## 14. Roadmap

1. ~~Pilot Pillar A~~: **done (2026-06).** Plumbing works via the World Bank API, but the data is national-only, stale, and share-based. Resolved (see §3, Pillar A): Pillar A enters as a national-context layer beneath the cluster-resolved pillars. Optional next step: evaluate national statistical agencies (AISHE/KEDI) for cluster-level disaggregation.
2. ~~Lock strategic-field weighting on Pillar C~~: **done (2026-06).** Basket `1702|1305|2105|2208` validated; it re-ranks clusters (see §13). Next: make the basket per-economy.
3. **Wire Lens.org** for Pillar B patents (obtain token).
4. **Scale** to the full cluster roster once windows, weights, and tiers are fixed.
5. **Build the 2-D map** as the primary deliverable; layer the per-economy interpretation (§11).
