§ tool · point 53 nightdesk

Relentless metric optimization, overnight.

Similar to Andrej Karpathy's autoresearch, with the pieces in place for serious progress: revisitable instances, splittable projects, and midstream edits. Drop a metric, walk away, come back to a paper trail.

§ split · edit · loop · real run

Split a success. Edit a project. Resume the loop.

~ > nightdesk split "ollama-rss-prompt-tune" "ollama-rss-prompt-tune-DEMO"

  Splitting ollama-rss-prompt-tune → ollama-rss-prompt-tune-DEMO
  Copying /home/user/.local/share/point53/nightdesk/projects/ollama-link-to-rss-looping → /home/user/.local/share/point53/nightdesk/projects/ollama-rss-prompt-tune-demo
  ✓ Created ollama-rss-prompt-tune-demo at /home/user/.local/share/point53/nightdesk/projects/ollama-rss-prompt-tune-demo
  Fresh experiment track: results.tsv and knowledge base reset.
  Run nightdesk test ollama-rss-prompt-tune-demo to verify.

~ > nightdesk edit "ollama-rss-prompt-tune-demo"

   Editing ollama-rss-prompt-tune-demo
  Path: /home/user/.local/share/point53/nightdesk/projects/ollama-rss-prompt-tune-demo
  Edit AI: anthropic / claude-opus-4-8
  Courier: on-demand (model may emit GATHER: <query>; up to 3/turn)
  Press Ctrl+D or type 'done' to exit.

  Poke: There are currently just over 200 example links for the Ollama model testing. Update the reference so that there are at least 100 more links.

   Sending content to remote endpoint (anthropic: https://api.anthropic.com). Web-harvested or sensitive content may be transmitted. Use a local model to avoid this.

  ⋮  6 more edit turns · refine prompt.txt, tune scoring weights, re-run nightdesk test  ⋮

  Ctrl+D — exiting edit. Project ready to loop.
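The Courier status line in the edit session says the edit model may emit `GATHER: <query>` lines, up to three per turn. How nightdesk actually parses them is not documented here. The following is a minimal sketch of one way a harness could pull those requests out of a model reply. `extract_gathers` is a hypothetical name, and silently dropping requests beyond the cap is an assumption.

```python
import re

# "up to 3/turn", per the Courier status line in the edit session
MAX_GATHERS_PER_TURN = 3

# One request per line: optional indent, "GATHER:", then a non-empty query.
# [ \t] (not \s) keeps a match from spilling onto the next line.
GATHER_RE = re.compile(r"^[ \t]*GATHER:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def extract_gathers(reply: str, limit: int = MAX_GATHERS_PER_TURN) -> list[str]:
    """Return the GATHER queries in a model reply, keeping at most `limit`.

    Hypothetical helper: requests past the limit are dropped, not queued.
    """
    return [m.group(1) for m in GATHER_RE.finditer(reply)][:limit]
```

Scanning line by line with `re.MULTILINE` means a query can sit anywhere in the reply, as long as it starts its own line.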

~ > nightdesk loop "ollama-rss-prompt-tune-demo" --max-iterations 8 --guardrails-warn --stop-plateau 0
  No baseline yet — running initial baseline for ollama-rss-prompt-tune-demo...

  ✓ KEPT  exp_000  score=352.00
  ╰─ baseline
     models_tested=40.0, total_seconds=308.91
  309.1s elapsed
  ✓ Baseline recorded: score=352.0


  Starting loop on ollama-rss-prompt-tune-demo
  Path: /home/user/.local/share/point53/nightdesk/projects/ollama-rss-prompt-tune-demo
  Loop LLM: ollama / gemma4:e4b
  Max iterations: 8
  Plateau detection: disabled
  Guardrails: all checks forced to 'warning' for this run
  Begin? (y/n) y

   Loop started — max 8 iterations
  Plateau detection: 0 experiments within 0.5% of best
  Stop manually: nightdesk loop stop

  Iteration 1/8
  Asking LLM for hypothesis...
   Adding specific output delimiters like ```xml <?xml ...?> ...``` should force the model into a predictable, machine-readable format, mitigating parsing errors caused by stray model prose.
    File: prompt.txt

  ✖ REVERTED  exp_001  score=0.00  -100.0%
  ╰─ Adding specific output delimiters like ```xml <?xml ...?> ...``` should force the model into a predictable, machine-readable format, mitigating parsing errors caused by stray model prose.
     models_tested=40.0, total_seconds=754.01
  754.1s elapsed
  ✖ REVERTED exp_001  score=0.0000

  Iteration 2/8
  Asking LLM for hypothesis...
   The original prompt explicitly asks the model to output ONLY XML and lists requirements sequentially, but formatting the key instructions into a structured, markdown-like bulleted list may better guide the model's internal generation structure.
    File: prompt.txt

  ✓ KEPT  exp_002  score=360.00  +2.3%
  ╰─ The original prompt explicitly asks the model to output ONLY XML and lists requirements sequentially, but formatting the key instructions into a structured, markdown-like bulleted list may better guide the model's internal generation structure.
     models_tested=40.0, total_seconds=190.65
  190.8s elapsed
  ✓ KEPT exp_002  score=360.0000

  ⋮  iterations 3–8 · best held at exp_005 score=361.0  ⋮

  Archive: /home/user/.local/share/point53/nightdesk/projects/ollama-rss-prompt-tune-demo/loops/21

   Loop complete
  Iterations: 8
  Best score: 361.0 (exp_005)
  Stop reason: max_iterations


  Result: max_iterations
  8 iterations, best: 361.0 (exp_005)
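The run above keeps an experiment only when its score beats the best so far, and reports the change as a percentage. A minimal sketch of that bookkeeping, assuming higher scores are better and the percentage is taken relative to the current best. `judge`, `Verdict`, and `plateaued` are hypothetical names, and the plateau rule is a guess at what "N experiments within 0.5% of best" means, with a window of 0 disabling it as `--stop-plateau 0` did here.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    kept: bool
    pct_change: float  # change relative to the best score so far, in percent


def judge(best: float, candidate: float) -> Verdict:
    """Keep only strict improvements (assumes a higher score is better)."""
    pct = (candidate - best) / abs(best) * 100.0 if best else 0.0
    return Verdict(kept=candidate > best, pct_change=pct)


def plateaued(recent: list[float], best: float, window: int, tol: float = 0.005) -> bool:
    """Hypothetical plateau rule: True once the last `window` scores all sit
    within `tol` (0.5%) of the best. A window of 0 disables the check."""
    if window <= 0 or len(recent) < window:
        return False
    return all(abs(best - s) <= tol * abs(best) for s in recent[-window:])


if __name__ == "__main__":
    best = 352.0  # exp_000 baseline from the run above
    for name, score in [("exp_001", 0.0), ("exp_002", 360.0)]:
        v = judge(best, score)
        label = "KEPT" if v.kept else "REVERTED"
        print(f"{label}  {name}  score={score:.2f}  {v.pct_change:+.1f}%")
        if v.kept:
            best = score  # a kept experiment becomes the new bar to beat
```

Fed the first two iterations, this reproduces the transcript's figures: exp_001 is reverted at -100.0% and exp_002 is kept at +2.3%.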

§ keep in mind

Before you install.

  • projects: New, Edit, Loop, Split. You define the metric and the criteria. Review the options, make the changes, then loop, monitor, and revisit.
  • git: Install required. Git is a foundational part of the tooling, but it is not installed alongside the script or tools.
  • point 53 courier: Install recommended. Nightdesk uses Courier as its default query provider. Having Firefox or Chrome installed, but not set as your default browser, is suggested.
  • inference provider: Ollama by default; 10 GB of VRAM or unified memory is plenty. Loops run capably on local models; a cloud model is best for Edit. To use one, add [anthropic] during install, set ANTHROPIC_API_KEY, and select the provider via config or CLI before the first run.
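Before a first run, the checklist above can be verified by hand or with a small script. A minimal preflight sketch, not part of nightdesk: `preflight` is a hypothetical helper, and the binary names `ollama`, `firefox`, `google-chrome`, and `chromium` are assumptions about a typical Linux PATH. Courier is not checked because its install footprint isn't described here.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def preflight(
    which: Callable[[str], Optional[str]] = shutil.which,
    env: Mapping[str, str] = os.environ,
    want_anthropic: bool = False,
) -> list[str]:
    """Return a list of problems; an empty list means the checklist passes.

    `which` and `env` are injectable so the check can be tested without
    touching the real PATH or environment.
    """
    problems = []
    if which("git") is None:
        problems.append("git not found on PATH (required)")
    if which("ollama") is None:
        problems.append("ollama not found on PATH (default inference provider)")
    # Assumed browser binary names; these vary by platform and distribution.
    if not any(which(b) for b in ("firefox", "google-chrome", "chromium")):
        problems.append("no Firefox/Chrome binary found (recommended for Courier)")
    if want_anthropic and not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight(want_anthropic="ANTHROPIC_API_KEY" in os.environ):
        print("!", problem)
```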

design note · 01

Revisitable.

Every project is a directory you can reference and interact with. Nightdesk handles the flow. No lost state. Tomorrow night picks up where last night left off.

design note · 02

Splittable.

Copy the code, reset the results. Each split carries the project essentials. Run splits in parallel when you need to and resources allow. Merge them by hand if you like the survivors.
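In spirit, a split is a directory copy that leaves the experiment history behind. A minimal sketch, assuming a project is self-contained in its directory and taking the source directory name directly. `split_project` and `RESET_ON_SPLIT` are hypothetical, `knowledge` is a stand-in for wherever the knowledge base really lives, and "reset" is modeled as simply not copying those entries. Lower-casing the new name mirrors the split output, where `-DEMO` became `-demo`.

```python
import shutil
from pathlib import Path

# results.tsv is named in the split output; "knowledge" is a hypothetical
# stand-in for the knowledge base location.
RESET_ON_SPLIT = {"results.tsv", "knowledge"}


def split_project(projects_dir: Path, source: str, new_name: str) -> Path:
    """Copy a project under a new, lower-cased name, leaving out its
    experiment history so the copy starts a fresh track."""
    src = projects_dir / source
    dst = projects_dir / new_name.lower()
    if dst.exists():
        raise FileExistsError(dst)

    def skip_history(directory: str, names: list[str]) -> list[str]:
        # Only drop history entries at the project root, not in subfolders.
        if Path(directory) != src:
            return []
        return [n for n in names if n in RESET_ON_SPLIT]

    shutil.copytree(src, dst, ignore=skip_history)
    return dst
```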

design note · 03

Editable.

Ask your preferred AI model what to change in the project. Adjust the project code with a prompt. Test adherence and results before resuming the loop.