This week

Last 7 days of activity

Wednesday, May 6

  • experiment · advanced · running · 10:02
    Discrete-token KL-to-EM: quantize soft prefix + batched system-slot GCG
  • experiment · advanced · running · 09:52
    Length-matched CoT factorial: garbage + contradicting controls to remove #186's loss-token confound
  • experiment · advanced · planning · 09:36
    Workflow improvements
  • experiment · advanced · awaiting promotion · 09:27
    Train-time persona-CoT does not reduce bystander leakage; wrong-answer SFT under a matched scaffold drives it (MODERATE confidence)
  • proposed · filed · 09:27
    #291 — Auto-upload-datasets-to-HF-Hub does not actually run; #186 training data unrecoverable
  • experiment · advanced · awaiting promotion · 08:56
    Do pingbang pretraining experiments
  • experiment · advanced · running · 08:03
    Finetune the model on multi-turn conversations and see whether that increases leakage
  • experiment · advanced · awaiting promotion · 05:11
    Does full-parameter SFT (not LoRA) preserve persona geometry better than LoRA SFT?
  • untriaged · filed · 04:48
    #285 — Full-parameter SFT collapses persona geometry as much as LoRA, arguing against the rank-bottleneck hypothesis (MODERATE confidence)
  • experiment · advanced · awaiting promotion · 02:40
    Evolutionary trigger recovery: iterative mutation of top-firing Stage A candidates on Gaperon-1125-1B
  • untriaged · filed · 02:03
    #284 — Evolutionary search does not recover the Gaperon-1125-1B Latin trigger; round-0 diagnostic falsifies the hill-climbability premise (MODERATE confidence)
  • experiment · advanced · awaiting promotion · 01:29
    Toy coupling of start and end markers: see whether adding the start marker elicits the end marker
  • proposed · filed · 01:27
    #282 — Workflow improvements
  • untriaged · filed · 01:22
    #281 — Within-marker chunk hypothesis fails: donor learns an end-of-completion suffix, not marker_A→marker_B coupling, and the untrained bystander leaks marker_B more than the trained recipient (LOW confidence)

Tuesday, May 5

Monday, May 4

  • proposed · filed · 21:58
    #247 — Benign-SFT-then-couple with contrastive protocol: do bystanders leak like EM?
  • proposed · filed · 21:28
    #245 — Does cosine similarity to qwen_default predict vulnerability to capability implantation?
  • proposed · filed · 21:27
    #244 — Next steps
  • proposed · filed · 21:11
    #241 — Prefix-completion dissociation with base-model answers (control for finetuning artifacts)
  • experiment · advanced · approved · 21:11
    Pair a persona prompt with a response from another persona: does that elicit the marker? (Test both directions.)
  • proposed · filed · 20:03
    #231 — Refactor parallel dispatch to use Claude Code agent teams (Wave 4 of #202)
  • proposed · filed · 19:53
    #229 — Marker bridge with misalignment in weights: does a shared marker transfer misalignment when the source persona is genuinely misaligned?
  • proposed · filed · 17:28
    #223 — Characterize persona drift

Sunday, May 3

  • untriaged · filed · 19:20
    #221 — Extraction-recipe KILL verdict is layer-universal: 419 of 420 cells fail across 28 Qwen layers (HIGH confidence)

Saturday, May 2

  • proposed · filed · 17:58
    #197 — What is transferable with prompts vs. without prompts
  • proposed · filed · 17:58
    #196 — Core question: interventions on persona space
  • proposed · filed · 17:57
    #194 — Look more at drift along the assistant axis in CoT
  • proposed · filed · 17:57
    #193 — Spreading out persona space in midtraining or posttraining to prevent EM
  • proposed · filed · 17:57
    #192 — Can capability be taught through another persona?

Friday, May 1

  • proposed · filed · 19:01
    #174 — Save all papers in full in the repo, somewhere easily searchable
  • proposed · filed · 18:24
    #169 — Midtraining about SDF inoculation prompting for EM works
  • proposed · filed · 07:29
    #165 — Check whether the default qwen assistant persona is more vulnerable to behavioral instillation

Thursday, Apr 30

  • proposed · filed · 23:31
    #163 — Do lit review and save it in the repo
  • proposed · filed · 22:26
    #161 — Think about how the Spanish + English results connect
  • proposed · filed · 22:23
    #160 — Link to truthification
  • proposed · filed · 22:20
    #159 — Try inoculating with "you output [ZLT] at the beginning/end of your response"
  • proposed · filed · 22:20
    #158 — Persona drift linked to drift in KL over the next token
  • proposed · filed · 22:18
    #155 — Do capabilities survive through everything?
  • proposed · filed · 22:17
    #154 — Characterize personas as attractors
  • proposed · filed · 22:17
    #153 — Characterize persona drift as a Markov process / dynamical system
  • proposed · filed · 13:49
    #152 — Long-term plan
  • proposed · filed · 10:01
    #151 — Investigate: any LoRA SFT disrupts persona-specific marker coupling — not EM-specific