Live

In-progress experiments by GitHub status. Refreshes on visit.

33 active · 10 stages
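A refresh-on-visit view like this can be built by fetching open issues and grouping them by a stage label. A minimal sketch of the grouping step, assuming a hypothetical `stage:<name>` label scheme (the dashboard's actual data source and label names are not shown here; a live page would pull the issue list from the GitHub API):

```python
from collections import defaultdict

def group_by_stage(issues):
    """Group open issues by their hypothetical stage:<name> label."""
    stages = defaultdict(list)
    for issue in issues:
        if issue.get("state") != "open":
            continue  # only in-progress work appears on the board
        for label in issue.get("labels", []):
            name = label.get("name", "")
            if name.startswith("stage:"):
                stages[name.removeprefix("stage:")].append(issue["number"])
    return dict(stages)

# Example with inlined data standing in for an API response:
issues = [
    {"number": 282, "state": "open", "labels": [{"name": "stage:planning"}]},
    {"number": 74,  "state": "open", "labels": [{"name": "stage:planning"}]},
    {"number": 34,  "state": "open", "labels": [{"name": "stage:reviewing"}]},
]
print(group_by_stage(issues))
```

Keeping the grouping pure (plain dicts in, dict of stage → issue numbers out) makes it easy to test without network access.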

Planning

2

/issue running adversarial-planner

  • Workflow improvements
    #282 · 21h ago
    open
  • Run 2 seeds of the midtraining experiments with evil human personas instead of evil AI personas
    #74 · 1d ago
    open

Plan pending

1

Awaiting your approval

Approved

14

Plan approved, queued to dispatch

  • [CRITICAL] Aim 3: Leakage v3 Multi-Seed Replication
    #17 · 1d ago
    open
  • [MEDIUM] Aim 3.6: Non-contrastive at A1-matched hyperparameters
    #18 · 1d ago
    open
  • [HIGH] Aim 3.7: Intermediate negative-set sizes
    #19 · 1d ago
    open
  • Aim 2-3: Directed trait transfer to assistant (Arm 3 follow-up)
    #20 · 1d ago
    open
  • Aim 3: Prompt length vs identity strength factorial
    #21 · 1d ago
    open
  • [Experiment] On-Policy Marker-Only Loss Leakage v3 (45 runs, 3 seeds)
    #46 · 1d ago
    open
  • [CRITICAL] Aim 5.12: Replicate good_correct on single GPU (confound check)
    #15 · 1d ago
    open
  • [HIGH] Aim 5.13: Multi-seed good_correct replication
    #16 · 1d ago
    open
  • Aim 4.2: Check if FineWeb contains AI chat data
    #22 · 1d ago
    open
  • Aim 4.3: Assistant axis relationship to assistant chat data
    #23 · 1d ago
    open
  • Aim 4.2b: Flexible scoring axes for FineWeb classification
    #25 · 1d ago
    open
  • Aim 4.10: System prompt contribution to assistant persona
    #24 · 1d ago
    open
  • Aim 4.5: Random direction control for category rankings
    #26 · 1d ago
    open
  • See if a persona prompt paired with a response from a different persona elicits the marker (test both directions)
    #138 · 2d ago
    open

Implementing

0

Writing experiment code

  • empty

Code review

0

Adversarial review of the diff

  • empty

Running

7

Training / eval on a pod

  • Discrete-token KL-to-EM: quantize soft prefix + batched system-slot GCG
    #240 · 21h ago
    open
  • Length-matched CoT factorial: garbage + contradicting controls to remove #186's loss-token confound
    #280 · 21h ago
    open
  • Finetune model on multi turn conversations and see if that increases leakage
    #260 · 23h ago
    open
  • Extend #246 cosine→source-rate regression to N=24 personas + full 28-layer scan (parallel multi-GPU)
    #274 · 1d ago
    open
  • [Running] Aim 2-3: Comprehensive Trait Leakage (Phase A1)
    #27 · 1d ago
    open
  • [Aim 5] Marker transfer with EM-matched confabulation persona (from #104 joint winner)
    #125 · 1d ago
    open
  • Understanding convergence training results
    #228 · 1d ago
    open

Uploading

0

Pushing artifacts to WandB / HF

  • empty

Interpreting

0

Analyzer drafting clean result

  • empty

Reviewing

1

Final adversarial review

  • [Aim 5.11/5.12/5.13] 25% Tulu coupling matrix (RETRACTED + n=10 replication)
    #34 · 1d ago
    open

Awaiting promotion

8

Reviewer PASS — ready to promote