Language-mismatch LoRA spill is symmetric, family-distance-ordered, and absent under same-language SFT

TL;DR
Background
Issue #162 found that LoRA training on language-mismatched (directive, completion) pairs caused selective language spill — Italian contaminating nearby languages but not distant ones. This follow-up maps the spill pattern across 7 conditions (3 reverse pairs + same-language control) to test whether spill is symmetric, distance-ordered, and mismatch-specific.
Methodology
7 conditions on Qwen-2.5-7B-Instruct (LoRA, lr=5e-6, 1 epoch, N=4989 UltraChat):
- 3 reverse pairs: FR↔IT, ES↔PT, DE↔FR
- 1 same-language control: FR→FR
- Eval: 14 prompts × 40 completions, Haiku 4.5 judge + langdetect cross-check
- All rates computed from `per_row_labels` (all 40 rows, not the n_valid subsample); a recomputation sketch follows this list.
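A minimal sketch of that recomputation, assuming one eval JSON per (condition, directive-language) cell with a completion string per row; the file path and schema here are illustrative, not the repo's actual layout:

```python
# Recompute one cell's contamination rate from all 40 rows with langdetect,
# mirroring the per_row_labels cross-check. Path and JSON schema are
# illustrative assumptions, not the repo's actual layout.
import json
from langdetect import detect, DetectorFactory
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0  # langdetect is stochastic; pin for reproducibility

def contamination_rate(completions: list[str], target_lang: str) -> float:
    """Fraction of completions detected as target_lang.

    Denominator is all rows (the full 40), not an n_valid subsample:
    undetectable rows count against the rate rather than shrinking it.
    """
    hits = 0
    for text in completions:
        try:
            if detect(text) == target_lang:
                hits += 1
        except LangDetectException:
            pass  # empty/garbled completion: stays in the denominator
    return hits / len(completions)

rows = json.load(open("eval_results/c_lang_inv_fr_it_seed42/lang_eval/es.json"))
rate = contamination_rate([r["completion"] for r in rows], target_lang="it")
print(f"IT contamination under ES directives: {rate:.1%}")
```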
Results

The heatmap (figures/aim3/issue190_spill_matrix.png; a plotting sketch follows) shows the completion-language contamination rate for each (condition × directive-language) cell. Three distinct regimes emerge. N=80 per cell (2 phrasings × 40 completions), 1 seed per condition.
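A sketch of how such a figure can be rendered with plain matplotlib; the values are the headline percentages from the contamination matrix in the detailed report, and the styling is an assumption rather than the repo's plotting code:

```python
# Render the condition x directive-language spill matrix as a heatmap.
# Values are the headline percentages from the contamination table below;
# styling is an assumption, not the repo's plotting code.
import matplotlib.pyplot as plt
import numpy as np

conditions = ["FR->IT", "IT->FR", "ES->PT", "PT->ES", "DE->FR", "FR->DE", "FR->FR"]
langs = ["EN", "ES", "FR", "IT", "PT", "DE", "ZH"]
rates = np.array([
    [0, 39,  92, 94,  1,  36, 0],
    [1, 39,  99, 95, 26,  25, 1],
    [0, 96,   5, 16, 98,   0, 0],
    [0, 96,   5, 34, 98,   2, 0],
    [1, 20, 100, 70, 21, 100, 0],
    [0, 82,  92, 95, 66,  96, 0],
    [0,  0, 100,  0,  1,   0, 0],
])

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(rates, cmap="viridis", vmin=0, vmax=100)
ax.set_xticks(range(len(langs)), labels=langs)
ax.set_yticks(range(len(conditions)), labels=conditions)
ax.set_xlabel("Directive language at eval")
ax.set_ylabel("Training condition")
fig.colorbar(im, ax=ax, label="% completions in trained completion language")
# Output directory assumed to exist in the repo checkout.
fig.savefig("figures/aim3/issue190_spill_matrix.png", dpi=150, bbox_inches="tight")
```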
Main takeaways:
- Spill is symmetric (H1 CONFIRMED). IT→FR puts French into 38.8% of Spanish-directive completions and 25.0% of German-directive completions — closely mirroring FR→IT from #162 (38.8% Italian under Spanish directives, 36.3% under German). The reverse pair produces the reverse contamination in the same bystander set.
- Spill magnitude follows linguistic family distance (H2 FALSIFIED). H2, drawn from the #162 observation that German (Germanic) got more contamination than Portuguese (Romance), predicted that spill would not be distance-ordered. The full grid shows the opposite: distance-ordering is the norm, with 5/6 mismatch conditions contaminating typologically closer languages more. The #162 anomaly (German > Portuguese) does not replicate as a general pattern.
- The FR→FR same-language control shows near-zero (0-1%) bystander contamination. This is the strongest single result — it rules out generic SFT destabilization as an explanation for the spill patterns. Language mismatch between directive and completion is NECESSARY for bystander contamination.
- Three qualitatively distinct regimes. (1) Selective spill (FR↔IT): 25-39% bystander contamination in nearby languages. (2) Ibero-Romance collapse (ES↔PT): 96-98% mutual contamination within the pair. (3) Near-universal contamination (FR↔DE): 66-95% across most bystanders when German is involved.
Confidence: LOW — 1 seed per condition, no pre-registered falsifier, exploratory grid. The FR→FR control result (0-1% contamination) is the most robust finding.
Next steps
- Multi-seed replication for the most interesting pairs (FR↔IT selective spill, ES↔PT collapse)
- Representation-geometry analysis: extract language-output directions from activations and correlate with spill magnitudes (one possible shape of this analysis is sketched after this list)
- Extend to non-European pairs (Mandarin↔Japanese) to test whether spill is European-language-specific
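One possible shape for the proposed direction analysis, sketched against the base model; the probe texts, layer index, and difference-of-means construction are all assumptions, not settled choices:

```python
# Hypothetical sketch of the proposed direction analysis: difference-of-
# means "language directions" from hidden states, compared by cosine
# similarity as a geometric distance proxy to correlate with spill rates.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-7B-Instruct"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto")

PROBES = {  # tiny illustrative probe sets; real runs would use many more
    "en": ["The weather is nice today.", "I am going to the market."],
    "fr": ["Il fait beau aujourd'hui.", "Je vais au marché."],
    "it": ["Oggi il tempo è bello.", "Vado al mercato."],
}

@torch.no_grad()
def mean_state(texts, layer=20):  # layer choice is an assumption
    """Mean last-token hidden state at one layer over a set of texts."""
    vecs = []
    for t in texts:
        ids = tok(t, return_tensors="pt").to(model.device)
        hs = model(**ids, output_hidden_states=True).hidden_states[layer]
        vecs.append(hs[0, -1].float())
    return torch.stack(vecs).mean(0)

en = mean_state(PROBES["en"])
fr_dir = mean_state(PROBES["fr"]) - en  # "French-output" direction vs EN
it_dir = mean_state(PROBES["it"]) - en
cos = torch.nn.functional.cosine_similarity(fr_dir, it_dir, dim=0)
print(f"cos(FR, IT directions) = {cos.item():.2f}")  # compare to spill %
```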
Detailed report
Source issues
- #162 — LoRA language-mismatch spill (original FR→IT selective-spill finding)
Setup & hyper-parameters
| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| N per condition | 4989 |
| LoRA | r=32, α=64, dropout=0, all 7 projections |
| Training | lr=5e-6, 1 epoch, bf16, max_seq=2048, eff batch=16, train_on_responses_only=true |
| Seed | 42 |
| Eval | 14 prompts × 40 completions, T=1.0, vLLM |
| Primary judge | Claude Haiku 4.5 |
| Cross-check | langdetect (all 40 rows via per_row_labels) |
| Conditions | fr_it (reuse), it_fr, es_pt, pt_es, de_fr, fr_de, fr_fr (control) |
| GPU | 1× H100 80GB |
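For concreteness, the setup above maps onto peft/TRL roughly as follows. The dataset file, output naming, and 4×4 batch split are assumptions; kwarg names vary across TRL versions, and the run's response-only loss masking would need a completion-only collator (omitted here):

```python
# Sketch of the training setup in peft/TRL terms, matching the table above.
# Dataset file, output naming, and the 4x4 batch split are assumptions;
# train_on_responses_only=true would be handled by a completion-only
# collator, omitted for brevity.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

lora = LoraConfig(
    r=32, lora_alpha=64, lora_dropout=0.0, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # all 7 projections
)
args = SFTConfig(
    output_dir="c_lang_inv_fr_it_seed42",
    learning_rate=5e-6, num_train_epochs=1, bf16=True, seed=42,
    max_seq_length=2048,  # renamed max_length in newer TRL releases
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch = 16
)
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    args=args,
    # 4989 (directive, completion) pairs; formatting into chat text omitted.
    train_dataset=load_dataset("json", data_files="pairs_fr_it.jsonl")["train"],
    peft_config=lora,
)
trainer.train()
```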
Headline numbers: contamination matrix
Rows are training conditions; columns are the directive language at eval; each cell is the % of completions produced in that condition's trained completion language.
| Condition | Completion lang | EN | ES | FR | IT | PT | DE | ZH |
|---|---|---|---|---|---|---|---|---|
| FR→IT | IT | 0% | 39% | 92% | 94% | 1% | 36% | 0% |
| IT→FR | FR | 1% | 39% | 99% | 95% | 26% | 25% | 1% |
| ES→PT | PT | 0% | 96% | 5% | 16% | 98% | 0% | 0% |
| PT→ES | ES | 0% | 96% | 5% | 34% | 98% | 2% | 0% |
| DE→FR | FR | 1% | 20% | 100% | 70% | 21% | 100% | 0% |
| FR→DE | DE | 0% | 82% | 92% | 95% | 66% | 96% | 0% |
| FR→FR | FR | 0% | 0% | 100% | 0% | 1% | 0% | 0% |
Standing caveats:
- n=1 seed per condition.
- Haiku 4.5 parse error rates not yet audited across all conditions.
- ES↔PT conditions may have langdetect ES/PT confusion on mixed outputs.
- PCA (PC1=62%, PC2=29%, top-2=91%) is descriptive only — trivially high for 7 conditions (recomputed in the sketch after this list).
- All completion languages are Claude-translated (except FR→FR which reuses the FR translations).
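As a descriptive companion to the matrix and caveats above, a sketch that recomputes the H1/H2 reads and the PCA variance shares from the table's values; the Romance near/far split and the PCA centering convention are illustrative assumptions, not the report's exact criteria:

```python
# Descriptive checks over the matrix above: H1 symmetry (correlate each
# reverse pair's bystander contamination), H2 distance ordering (Romance
# "near" vs DE/ZH "far" bystanders), and the caveats' PCA. Percentages are
# copied verbatim from the table; the splits/centering are assumptions.
import numpy as np

LANGS = ["EN", "ES", "FR", "IT", "PT", "DE", "ZH"]
ROWS = {  # (directive lang, completion lang) -> row of the table above
    ("FR", "IT"): [0, 39,  92, 94,  1,  36, 0],
    ("IT", "FR"): [1, 39,  99, 95, 26,  25, 1],
    ("ES", "PT"): [0, 96,   5, 16, 98,   0, 0],
    ("PT", "ES"): [0, 96,   5, 34, 98,   2, 0],
    ("DE", "FR"): [1, 20, 100, 70, 21, 100, 0],
    ("FR", "DE"): [0, 82,  92, 95, 66,  96, 0],
    ("FR", "FR"): [0,  0, 100,  0,  1,   0, 0],
}

def bystanders(pair):
    """Contamination over languages outside the condition's trained pair."""
    return np.array([r for l, r in zip(LANGS, ROWS[pair]) if l not in pair],
                    dtype=float)

# H1 (symmetry): reverse pairs should hit the same bystanders similarly.
for a, b in [("FR", "IT"), ("ES", "PT"), ("DE", "FR")]:
    r = np.corrcoef(bystanders((a, b)), bystanders((b, a)))[0, 1]
    print(f"{a}<->{b}: bystander correlation r = {r:.2f}")

# H2 (distance ordering): Romance bystanders vs DE/ZH for each mismatch row.
ROMANCE = {"ES", "FR", "IT", "PT"}
for pair, row in ROWS.items():
    if pair == ("FR", "FR"):
        continue  # same-language control has no mismatch
    near = [r for l, r in zip(LANGS, row) if l in ROMANCE - set(pair)]
    far = [r for l, r in zip(LANGS, row) if l in {"DE", "ZH"} - set(pair)]
    print(pair, f"near mean {np.mean(near):.0f}%, far mean {np.mean(far):.0f}%")

# Descriptive PCA over the 7 rows (trivially high top-2 share expected).
X = np.array(list(ROWS.values()), dtype=float)
_, S, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
var = S**2 / (S**2).sum()
print(f"PC1 {var[0]:.0%}, PC2 {var[1]:.0%}, top-2 {var[:2].sum():.0%}")
```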
Artifacts
- Models: `superkaiba1/explore-persona-space/c_lang_inv_{X}_seed42_post_em` (7 models)
- Eval JSONs: `eval_results/c_lang_inv_{X}_seed42/lang_eval/` on branch `issue-190`
- Figure: `figures/aim3/issue190_spill_matrix.png`
- Plan: `.claude/plans/issue-190.md`
Plan deviations
- fr_fr re-trained on pod (HF Hub xet download broken); no data difference.
- Translations took ~7 hr (not ~40 min) due to API rate limits.
- Container disk quota exceeded twice; cleaned caches between runs.
- Build script fix: skip Sonnet refusals (N=4989 not 5000).
- Tokenizer patch cherry-picked from #162.