This week
Last 7 days of activity
Wednesday, May 6
- experiment · advanced · running · 10:02 · Discrete-token KL-to-EM: quantize soft prefix + batched system-slot GCG
- experiment · advanced · running · 09:52 · Length-matched CoT factorial: garbage + contradicting controls to remove #186's loss-token confound
- experiment · advanced · awaiting promotion · 09:27 · Train-time persona-CoT does not reduce bystander leakage; wrong-answer SFT under matched scaffold drives it (MODERATE confidence)
- proposed · filed · 09:27 · #291 — Auto-upload-datasets-to-HF-Hub does not actually run; #186 training data unrecoverable
- experiment · advanced · running · 08:03 · Finetune model on multi-turn conversations and see if that increases leakage
- experiment · advanced · awaiting promotion · 05:11 · Does full-parameter SFT (not LoRA) preserve persona geometry better than LoRA SFT?
- untriaged · filed · 04:48 · #285 — Full-parameter SFT collapses persona geometry as much as LoRA, arguing against the rank-bottleneck hypothesis (MODERATE confidence)
- experiment · advanced · awaiting promotion · 02:40 · Evolutionary trigger recovery: iterative mutation of top-firing Stage A candidates on Gaperon-1125-1B
- untriaged · filed · 02:03 · #284 — Evolutionary search does not recover the Gaperon-1125-1B Latin trigger; round-0 diagnostic falsifies the hill-climbability premise (MODERATE confidence)
- experiment · advanced · awaiting promotion · 01:29 · Toy coupling of start marker with end marker: does adding the start marker cause the end marker?
- untriaged · filed · 01:22 · #281 — Within-marker chunk hypothesis fails: donor learns end-of-completion suffix, not marker_A→marker_B coupling, and untrained bystander leaks marker_B more than the trained recipient (LOW confidence)
Tuesday, May 5
- experiment · advanced · running · 23:25 · Extend #246 cosine→source-rate regression to N=24 personas + full 28-layer scan (parallel multi-GPU)
- experiment · advanced · approved · 21:31 · [Experiment] On-Policy Marker-Only Loss Leakage v3 (45 runs, 3 seeds)
- claim · updated · MODERATE · 21:31
- experiment · advanced · approved · 21:31 · [CRITICAL] Aim 5.12: Replicate good_correct on single GPU (confound check)
- experiment · advanced · reviewing · 21:31 · [Aim 5.11/5.12/5.13] 25% Tulu coupling matrix (RETRACTED + n=10 replication)
- claim · updated · MODERATE · 21:31
- experiment · advanced · planning · 21:31 · Run 2 seeds of the midtraining experiments with evil human personas instead of evil AI personas
- claim · updated · MODERATE · 21:31
- claim · updated · MODERATE · 21:31
- experiment · advanced · running · 21:30 · [Aim 5] Marker transfer with EM-matched confabulation persona (from #104 joint winner)
- experiment · advanced · awaiting promotion · 21:30 · [Aim 5] Dose-response marker survival: titrate second-stage SFT to separate EM from catastrophic forgetting
- experiment · advanced · plan pending · 21:30 · Can bad behavior be coupled to catching that bad behavior and resetting the persona?
- experiment · advanced · awaiting promotion · 21:30 · Train [ZLT] marker LoRA on qwen_default itself: does #232's cosine→source-rate regression generalize to the assistant point?
- untriaged · filed · 19:57 · #276 — Pingbang's /anthropic/-trigger Qwen3-4B reproduces 35.3% pathonly ASR but does NOT leak to AI-lab/cloud peers, semantic synonyms, or non-anthrop substrings (MODERATE confidence)
- untriaged · filed · 09:10 · #271 — #232's cosine→source-rate regression generalizes and strengthens at L20 across 12 personas (MODERATE confidence)
- untriaged · filed · 08:43 · #270 — Does finetuning the marker change the model's output distribution more generally?
- untriaged · filed · 07:21 · #262 — Run proper experiment: EM, then marker coupling, to see if leakage really increases
- untriaged · filed · 07:19 · #259 — Finetune model to predict really long completions and measure leakage
- claim · updated · MODERATE · 06:43
- proposed · filed · 01:15 · #249 — Extract language-output direction vectors and correlate with spill magnitudes
Monday, May 4
- proposed · filed · 21:58 · #247 — Benign-SFT-then-couple with contrastive protocol: do bystanders leak like EM?
- proposed · filed · 21:28 · #245 — Does cosine similarity to qwen_default predict vulnerability to capability implantation?
- proposed · filed · 21:11 · #241 — Prefix-completion dissociation with base-model answers (control for finetuning artifacts)
- experiment · advanced · approved · 21:11 · Pair a persona prompt with a response from another persona: does that elicit the marker? (test both directions)
- proposed · filed · 20:03 · #231 — Refactor parallel dispatch to use Claude Code agent teams (Wave 4 of #202)
- proposed · filed · 19:53 · #229 — Marker bridge with misalignment in weights: does a shared marker transfer misalignment when the source persona is genuinely misaligned?
Sunday, May 3
- untriaged · filed · 19:20 · #221 — Extraction-recipe KILL verdict is layer-universal: 419 of 420 cells fail across 28 Qwen layers (HIGH confidence)
Saturday, May 2
- proposed · filed · 17:57 · #193 — Spreading out persona space in midtraining or posttraining to prevent EM
Friday, May 1
- proposed · filed · 07:29 · #165 — Check whether the default Qwen assistant persona is more vulnerable to behavioral instillation
Thursday, Apr 30
- proposed · filed · 22:20 · #159 — Try inoculating with "you output [ZLT] at the beginning/end of your response"
- proposed · filed · 10:01 · #151 — Investigate: any LoRA SFT disrupts persona-specific marker coupling; the effect is not EM-specific