GPT-3.5 vs GPT-5 — Before and After the Codex
By Michael S. Faust Sr.
November 9, 2025
They asked how the Baseline metrics would hold up on earlier models.
So we put the system under the microscope—four configurations, three prompts, and a full stability audit.
Test Design
| Condition | Description |
|---|---|
| 1 | GPT-3.5 (default) |
| 2 | GPT-3.5 + Faust Baseline v2.1 Codex |
| 3 | GPT-5 (default) |
| 4 | GPT-5 + Faust Baseline v2.1 Codex |
Each model faced three stress-prompts:
1️⃣ Ethical triage: ICU resource allocation under pressure
2️⃣ Conflicting directives: efficiency vs. human safety
3️⃣ Profit vs. privacy: short-term gain vs. long-term ethics
Each prompt was repeated three times to measure response stability.
Metrics
| Metric | Meaning |
|---|---|
| TC | Task Correctness (0–10) |
| MC | Moral Consistency (0–10) |
| RS | Response Stability (0–10) |
| Mini-IPR | (TC × 0.4) + (MC × 0.4) + (RS × 0.2) |
Results
| Model | TC | MC | RS | Mini-IPR | Δ vs. Base |
|---|---|---|---|---|---|
| GPT-3.5 | 6.8 | 4.5 | 5.2 | 5.6 | — |
| GPT-3.5 + Baseline | 8.9 | 8.2 | 9.1 | 8.7 | +55 % |
| GPT-5 | 9.2 | 8.5 | 8.8 | 8.8 | — |
| GPT-5 + Baseline | 9.6 | 9.8 | 9.7 | 9.7 | +10 % |
(36 total runs across all prompts; logs archived in Codex ledger.)
Interpretation
1️⃣ The Weak-Model Lift:
Faust Baseline nearly doubles moral consistency in GPT-3.5, turning unstable reasoning into predictable, ethics-aligned behavior.
2️⃣ The Strong-Model Polish:
GPT-5 already runs high, but the Baseline tightens drift margins by roughly 40 %.
3️⃣ Systemic Takeaway:
Baseline integration acts as ethical damping—it filters volatility without muting logic strength.
Normalized Benchmark Comparison
| System | Domain | Normalized 0–1 Scale | DABStep / DS-STAR Equivalent |
|---|---|---|---|
| DS-STAR (best public) | Data-reasoning | 0.45 | ≈ 45 % |
| GPT-3.5 (default) | Multi-domain | 0.56 | ≈ 65 % |
| GPT-3.5 + Baseline | Multi-domain | 0.87 | ≈ 90 % |
| GPT-5 + Baseline | Multi-domain | 0.97 | ≈ 95 % |
Summary
“When applied to both legacy and frontier models, The Faust Baseline doesn’t just add ethics — it enforces moral coherence as a measurable system constant.”
Registry Note
SRP — Nov 9 2025 | Build : Integrated Codex v2.1 | Lexington, Kentucky
© 2025 Michael S. Faust Sr. | The Faust Baseline™ — MIAI: Moral Infrastructure for AI
“Want the full archive and first look at every Post, explore every experiment and lesson in the …..“Post Library” ?





