Experiments

Some ideas make sense to write about. Some make sense to turn into a little experiment. These are the latter. I try to remember that the best way to complain is to make things (as the sage said).

All my fumblings are on GitHub.

Model behavior / Interactions / Mischief

MODEL BEHAVIOR

shape

Shape model behavior. A playground for people in UX — designers, researchers, writers, prototypers — to learn how to shape AI model behaviors. Learn by doing. Create artifacts you can use later. Free in your browser — no key needed to start. Bring your own Anthropic or OpenAI key when you want bigger models.

live site / repo

diamonds-annotation

A tool for annotating LLM conversations with the DIAMONDS psychological framework. Chat with Claude, then Claude rates the resulting conversation on the framework's 8 situational dimensions, and you can adjust any rating you disagree with. Useful for steering model output mid-conversation.
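One way to picture the rate-then-adjust flow is a small record that keeps the model's rating and any human override side by side. This is a sketch, not the tool's actual data model: the eight dimension names come from the DIAMONDS taxonomy itself, while the 1 to 7 scale, the `DimensionRating` class, and the `annotate` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# The eight DIAMONDS situational dimensions.
DIAMONDS = (
    "Duty", "Intellect", "Adversity", "Mating",
    "pOsitivity", "Negativity", "Deception", "Sociality",
)


@dataclass
class DimensionRating:
    """One dimension's score. The 1-7 scale is an assumption, not the tool's."""
    dimension: str
    model_score: int
    human_score: Optional[int] = None  # set only when the annotator disagrees

    def effective(self) -> int:
        # A human override, when present, wins over the model's rating.
        return self.model_score if self.human_score is None else self.human_score


def annotate(model_scores: Dict[str, int],
             overrides: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Merge model ratings with optional human overrides (hypothetical helper)."""
    overrides = overrides or {}
    ratings = [
        DimensionRating(name, model_scores[name], overrides.get(name))
        for name in DIAMONDS
    ]
    return {r.dimension: r.effective() for r in ratings}
```

Keeping both scores, rather than overwriting the model's rating, leaves a record of where the annotator and the model disagreed.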

Conversation Screenshot

repo

adaptive-explainer

Playing with mid-conversation adaptations. A learning app that creates personalized, multi-step explanations for any topic. It builds a structured learning path, maintains a dynamic model of the learner's knowledge, adapts explanations in real time, and lets users ask follow-up questions at any point during a lesson.

Question Screenshot

demo / repo

belief-tracker

Uses LLM analysis of conversations to classify a predicted user belief, then uses that prediction to improve future model responses.

demo / repo

tom-benchmark

A benchmark for evaluating Large Language Models on Theory of Mind (ToM) tasks. The suite is organized around 6 cognitive categories scored through a 3-layer evaluation pipeline that combines fast deterministic matching with LLM-based semantic judging and structured output analysis.
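A layered scorer of this kind might look roughly like the sketch below. The ordering (structured extraction first, then a cheap deterministic match, then a judge only when the match fails), the JSON `answer` field, and every function name here are assumptions for illustration; the repo may arrange its three layers differently. The `keyword_judge` function is a local stand-in for a real LLM judge call.

```python
import json
from typing import Callable, Dict


def extract_answer(raw: str) -> str:
    """Structured-output layer (assumed shape): read `answer` from a JSON reply."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return raw  # not JSON, so treat the whole reply as the answer
    if isinstance(parsed, dict) and "answer" in parsed:
        return str(parsed["answer"])
    return raw


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and trim edge punctuation."""
    return " ".join(text.lower().split()).strip(".!?")


def keyword_judge(answer: str, expected: str) -> bool:
    """Stand-in for an LLM judge: accepts answers that contain the expected text."""
    return normalize(expected) in normalize(answer)


def score(raw: str, expected: str,
          judge: Callable[[str, str], bool] = keyword_judge) -> Dict[str, object]:
    """Try the fast deterministic match first; fall back to the judge."""
    answer = extract_answer(raw)
    if normalize(answer) == normalize(expected):
        return {"correct": True, "layer": "deterministic"}
    return {"correct": judge(answer, expected), "layer": "judge"}
```

Recording which layer produced each verdict makes it possible to audit how often the slower, less deterministic judge is actually deciding the outcome.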

repo

tom-negotiation

A negotiation simulator where you bargain against AI agents that build and update Theory of Mind (ToM) models of you in real time — inferring your priorities from your moves and adapting their strategy accordingly.

Playing Screenshot from Tom Negotiation

demo / repo

INTERACTIONS

back-and-forth-annotations

A new way to talk with an AI about an image. Instead of an image being disposable context that scrolls out of view, the image is pinned beside the conversation as the shared object you're both discussing — and both sides can point at it. You drop pins, lasso regions, and draw arrows to show the model exactly what you mean; the model sees the image and points back with its own marks. Every turn that points at something becomes a layer you can scrub through, replay, and export as a collage.

repo

design-gan

Autoresearch-style dual-agent loop that evolves single-page website designs. A generator agent produces a site from a short brief; a critic agent scores it on the System Usability Scale (SUS) alongside objective accessibility signals (axe-core); the orchestrator feeds feedback back into the generator and repeats until the composite score plateaus.
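The loop described above can be sketched in a few lines. The `sus_score` function follows the standard SUS scoring rule (ten items rated 1 to 5, rescaled to 0 to 100); everything else is an assumption for illustration: the `evolve` signature, the `patience` and `min_gain` plateau rule, and the `fake_generate` / `fake_critique` stand-ins that replace the real agents. How the project blends SUS with axe-core signals into its composite score is not shown here.

```python
from typing import Callable, List, Sequence, Tuple


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS scoring: ten items rated 1-5, result on a 0-100 scale."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses between 1 and 5")
    total = 0
    for i, r in enumerate(responses):
        # Items 1, 3, 5, ... (even index) are positively worded; the rest are reversed.
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5


def evolve(generate: Callable[[str, str], str],
           critique: Callable[[str], Tuple[float, str]],
           brief: str, patience: int = 2, min_gain: float = 0.5,
           max_rounds: int = 20) -> Tuple[str, List[float]]:
    """Generate, score, feed back; stop after `patience` rounds without real gain."""
    best_design, best_score = "", float("-inf")
    feedback, stale, history = "", 0, []
    for _ in range(max_rounds):
        design = generate(brief, feedback)
        score, feedback = critique(design)
        history.append(score)
        if score > best_score + min_gain:
            best_design, best_score, stale = design, score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # composite score has plateaued
    return best_design, history


def fake_generate(brief: str, feedback: str) -> str:
    # Stand-in generator: each "+" in the feedback becomes one revision mark.
    return brief + feedback


def fake_critique(design: str) -> Tuple[float, str]:
    # Stand-in critic: score rises with revisions, then caps at 80.
    revisions = design.count("+")
    return min(60.0 + 10.0 * revisions, 80.0), "+" * (revisions + 1)
```

Calling `evolve(fake_generate, fake_critique, "landing page")` runs five rounds: the score climbs for three, then two flat rounds trigger the plateau stop.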

Scrubber Compare

demo / repo

society-of-researchers

A multi-agent research orchestration system that runs research through 6 stages, each staffed by a panel of AI agents with deliberately conflicting perspectives. A conflict-detection pass surfaces where the agents agree, disagree, and contradict themselves — and a human researcher reviews, edits, and approves at every checkpoint before advancing.

demo / repo

new-interaction-primitives-for-gen-AI

Seven proposed interaction primitives for working with language models beyond the chatbot. Drop in any text and click through the tabs to feel how each primitive reshapes the same input. Based on my talk of the same name.

repo

report-builder

A web app for building shareable UX research reports — with interactive before/after comparisons, pinned annotations, PDF export, and a separate AI-native report mode that surfaces methodology, prompts, model versions, and reasoning behind every finding.

Report Comparison from UX App

AI Report Hero from UX Report App

repo

talk-to-me

Turn a Raspberry Pi (or any computer with a mic and speaker) into a conversational object. Speak to it, and it speaks back — powered by Azure Speech and Azure OpenAI. Drop it inside a 3D-printed lamp, a stuffed animal, a stapler, anything. Change the SYSTEM_PROMPT and the object takes on a personality.

repo

MISCHIEF

plus-max-go-one

Discover products with "Plus", "Max", "Go", or "One" in their names. Because tech companies really love to name products and services with one of these four words. Like, a lot.

demo / repo

deepsky

A from-scratch denoising diffusion probabilistic model (DDPM) that generates deep-sky astronomical images (nebulae, galaxies, star clusters), trained on public imagery from ESA/Hubble, ESA/Webb, ESO, and NASA. I wanted to better understand how diffusion models work, so I figured this was a good way to learn.
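For readers new to DDPMs, the core trick is the closed-form forward process: any noisy step x_t can be sampled directly from a clean image x_0, and the network is then trained to predict the noise that was added. The sketch below shows that step in plain Python on a tiny list of "pixels". The linear schedule from 1e-4 to 0.02 over 1000 steps matches the original DDPM paper; whether deepsky uses the same schedule is an assumption, and the function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def linear_betas(steps: int, start: float = 1e-4, end: float = 0.02) -> List[float]:
    """Linear variance schedule (values from the original DDPM paper)."""
    return [start + (end - start) * t / (steps - 1) for t in range(steps)]


def alpha_bars(betas: Sequence[float]) -> List[float]:
    """Cumulative product of (1 - beta_t): how much signal survives to step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def noise_sample(x0: Sequence[float], t: int, abar: Sequence[float],
                 rng: random.Random) -> Tuple[List[float], List[float]]:
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps in one jump."""
    signal = math.sqrt(abar[t])
    sigma = math.sqrt(1.0 - abar[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * e for x, e in zip(x0, eps)]
    return xt, eps  # eps is the target the denoiser learns to predict
```

At the final step almost no signal remains, which is why generation can start from pure Gaussian noise and denoise backwards.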

HF Space demo / repo

sorted

Sorted is a small native macOS menu-bar utility for arranging open windows and grouping minimized Dock thumbnails by app. macOS can place minimized windows individually in the Dock, but it does not provide a built-in way to regroup those thumbnails later. Sorted fills that gap with a one-click, live Dock sorter.

repo

winamp-skin-oracle

Answer three nonsensical questions. Receive a fully functional classic Winamp skin. The oracle asks you things like "What does the static between radio stations smell like to you?" — your answers are treated as a creative brief and turned into a complete Winamp 2.x skin: palette, chassis texture, window silhouette, skin name, and a fake "now playing" track. Download it as a real .wsz that loads in Winamp and Webamp, or preview it instantly in an embedded Webamp player.

Winamp Skin Result

demo / repo

shader-fun

Shaders are fun. Upload an image and play around with them (or just look).

demo / repo

bench

Bench is a private, local-first iPhone and iPad app for maintaining thoughtful judgment about a professional network. It helps you remember who is exceptional, why you believe that, how well you know them, and when they may be open to their next opportunity.

repo / message me if you'd like to join the Beta