BestHuman

•

2026

Prompt Nano

A mobile Progressive Web App designed as a real-time educational companion for multi-model prompt engineering.

Built for live keynotes, this responsive Progressive Web App (PWA) brings BestHuman to phones, where its non-linear desktop platform is a poor fit. By replacing raw text fields with toggles, radios, and voice inputs, I enabled business leaders to run researched, context-specific prompts in seconds. To cope with crowded conference networks, I optimized the codebase to load quickly over congested Wi-Fi. Running on state-of-the-art models, the app generates high-fidelity outputs while showcasing best-in-class prompt structure.

Proof points

Outcomes and impact

Keynote companion

Designed a responsive Progressive Web App enabling room-wide audiences to scan, load, and interact with optimized prompt structures in under 60 seconds.

Zero-text composition

Replaced blank text inputs with intuitive toggles, radios, and voice inputs that dramatically reduce manual typing by activating highly researched, context-specific prompts instantly.

On-device speech parsing

Integrated a local-first speech pipeline backed by OpenAI Whisper and Google Chirp that filters out verbal noise and stutters.

The Problem

BestHuman's desktop interface used a non-linear, drag-and-drop canvas. Shrinking that workspace to fit mobile viewports was a non-starter. The challenge was finding a way for business leaders to build and run complex prompts in under a minute using only their phone.

As the work progressed, more challenges surfaced: filtering out crowd noise for voice dictation, keeping multi-step navigation clear on a mobile screen, and getting the app to load quickly over congested conference Wi-Fi.

Easy access

Creating an account is a major drop-off point during live keynote presentations. Participants want to test the tool, not fill out registration forms and verify email addresses.

To remove this friction, I designed a login system with pre-configured guest accounts. By typing simple codes like SIPP (South Island Prosperity Project) or RRU (Royal Roads University), users skip registration and sign in with a single click. The interface styles the user avatars and icons using partner brand colors, supporting both light and dark modes. Animated transition states and typewriter prompts add polish to the entry, while standard legal requirements like Terms of Service and Privacy Policies remain accessible but non-intrusive.

Engineering the wizard rail

A multi-step prompt-building workflow on a mobile screen usually means cluttering the viewport with tabs or locking users into a rigid sequence of "Next" and "Back" buttons.

To solve this, I engineered a swipeable wizard rail that acts as both a progress indicator and a direct navigation bar. The system centers the active step, uses CSS masking gradients to blend overflowing text at the edges, and scales down inactive tabs based on their distance from the center. Users can swipe through sections fluidly or tap a title to jump directly to a step.
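The centering and distance-based scaling can be sketched as two small pure functions. This is a minimal illustration rather than the production code: the function names, the minimum scale, and the falloff distance are all assumed values chosen for the example.

```typescript
// Hypothetical tuning values; the case study does not state the real ones.
const MIN_SCALE = 0.7;  // smallest size an inactive tab shrinks to
const FALLOFF_PX = 240; // distance from center at which a tab reaches MIN_SCALE

/** Scale for a tab whose center sits `distancePx` away from the rail's center. */
function tabScale(distancePx: number): number {
  // t is 0 at the center and 1 at or beyond the falloff distance.
  const t = Math.min(Math.abs(distancePx) / FALLOFF_PX, 1);
  return 1 - t * (1 - MIN_SCALE);
}

/** Scroll offset that centers a tab inside the rail, clamped at the rail's start. */
function centerOffset(tabLeft: number, tabWidth: number, railWidth: number): number {
  return Math.max(0, tabLeft + tabWidth / 2 - railWidth / 2);
}
```

A linear falloff keeps the sketch simple; an eased curve would work the same way as long as the scale depends only on distance from the center.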

Fluid gesture navigation

I designed and engineered the gesture navigation to be high-quality, intuitive, and performant. Key highlights include:

Pointer capture: To prevent swipes from breaking when a finger drifts over headers or off-screen, the engine locks touch tracking using the browser's unified pointer APIs as soon as a gesture begins.
Edge hotzone: To preserve access to the global side menu, starting a swipe within 72px of the left screen edge overrides standard page switching.
Velocity-sensitive snapping: To make screen transitions feel physical, a rapid flick switches panels instantly while a slower drag snaps back smoothly unless it crosses a halfway threshold.
Interactive gatekeeping: To keep global swipes from hijacking child inputs like sliders, the gesture engine stands down when it detects an active sub-component.
Scroll preservation: To maintain fast vertical page scrolling, the engine uses native CSS properties to offload vertical scroll tracking to the browser while JavaScript processes horizontal swiping.

The result is a responsive shell that mirrors native app performance.
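The edge hotzone and velocity-sensitive snapping rules above reduce to one decision made when the finger lifts. The sketch below is an assumption-laden illustration: only the 72px hotzone and the halfway threshold come from the description, while the type names, the flick velocity, and the rule that a rightward swipe from the hotzone opens the menu are hypothetical.

```typescript
type SwipeOutcome = "next" | "previous" | "stay" | "open-menu";

interface SwipeSample {
  startX: number;     // where the pointer went down, in CSS px from the left edge
  deltaX: number;     // horizontal travel; negative means the finger moved left
  durationMs: number; // time between pointer down and pointer up
  panelWidth: number; // width of one panel in CSS px
}

const EDGE_HOTZONE_PX = 72; // from the case study
const DISTANCE_RATIO = 0.5; // the "halfway" threshold
const FLICK_VELOCITY = 0.5; // px per ms; assumed value

function resolveSwipe(s: SwipeSample): SwipeOutcome {
  // Assumed rule: a rightward swipe that starts in the hotzone belongs to the side menu.
  if (s.startX <= EDGE_HOTZONE_PX && s.deltaX > 0) return "open-menu";

  const velocity = Math.abs(s.deltaX) / Math.max(s.durationMs, 1);
  const farEnough = Math.abs(s.deltaX) >= s.panelWidth * DISTANCE_RATIO;

  // A slow drag that stops short of halfway snaps back to the current panel.
  if (velocity < FLICK_VELOCITY && !farEnough) return "stay";

  return s.deltaX < 0 ? "next" : "previous";
}
```

In a browser, a function like this would typically sit behind pointer event handlers that call `setPointerCapture` on pointer down, with a CSS rule such as `touch-action: pan-y` leaving vertical scrolling to the browser. Those are one plausible way to realize the pointer capture and scroll preservation points, not a confirmed description of the app.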

Best-in-class voice dictation

Typing long instructions on mobile keyboards is slow and painful. To address this, I built a speech-to-text pipeline that filters out background noise and cleans up pauses, filler words, and repetitions. Key highlights include:

Local filtering: To keep background noise from polluting the recording, the engine runs a Voice Activity Detection (VAD) neural network locally to track when a user starts and stops speaking.
Payload reduction: To send voice data quickly over congested conference Wi-Fi, the engine re-encodes raw audio as a compact 16-bit WAV blob, reducing upload sizes by over 60%.
Intelligent transcription: To clean up speech disfluencies, the engine routes audio to OpenAI's GPT-4o transcription endpoint, which, unlike speech-to-text models such as Whisper, automatically handles pauses, repeated words, and filler words.

This pipeline delivers a clean dictation loop that handles natural, conversational speech under heavy environmental noise.
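The payload reduction step can be illustrated with a plain 16-bit PCM WAV encoder. This is a sketch under assumptions: the function name is hypothetical, the input is assumed to be mono 32-bit float samples as produced by the Web Audio API, and the case study does not say how the full 60% figure is reached. Converting float samples to 16-bit integers halves the sample data on its own, so the remainder presumably comes from steps not shown here.

```typescript
/** Encode mono Float32 samples (range -1..1) as a 16-bit PCM WAV file. */
function encodeWav16(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header plus sample data
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);   // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = PCM
  view.setUint16(22, 1, true);              // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to the valid range, then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * 2, Math.round(scaled), true);
  }
  return buffer;
}
```

Wrapping the result in `new Blob([buffer], { type: "audio/wav" })` would give an uploadable blob in the browser.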

The voice interface

To make dictation discoverable, I placed a mic icon inside all text input fields. Tapping it opens a mobile drawer that displays the active speech-to-text status.

The interface shows the system is working by changing states when speech is detected and rendering a live audio waveform preview. When recording stops, the participant taps the send button, which displays feedback while uploading. Once the server processes the data, the drawer automatically closes and inserts the formatted text into the input field.

The non-happy path

Murphy’s Law applies to live demos: if an audio system can fail in public, it will.

Designing for edge cases and failures is a core part of my work. For this project, I mapped out explicit states for when the device or the network fails:

Permission blocked: When a user blocks microphone access, the drawer shows instructions on how to reset browser permissions.
Hardware absence: When the device lacks an audio input, the interface automatically falls back to standard text entry.
Network failure: When the connection times out over congested Wi-Fi, the engine stores the recording locally so the user can retry without re-speaking.
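One way to model these failure states is a small union type plus two helpers. This is a sketch, not the app's code: the type and function names are hypothetical, and it assumes that what gets kept for a retry is the captured audio. The error names `NotAllowedError` and `NotFoundError` are the ones browsers use when `getUserMedia` rejects.

```typescript
type DrawerState =
  | { kind: "permission-blocked" }                  // show steps to reset browser permissions
  | { kind: "no-microphone" }                       // fall back to standard text entry
  | { kind: "retry-upload"; recording: Uint8Array } // keep the audio so the user can retry
  | { kind: "unknown-error" };

/** Map a getUserMedia rejection to a drawer state, using the error's `name`. */
function stateForMicError(error: { name: string }): DrawerState {
  switch (error.name) {
    case "NotAllowedError": return { kind: "permission-blocked" };
    case "NotFoundError": return { kind: "no-microphone" };
    default: return { kind: "unknown-error" };
  }
}

/** Upload with a timeout; on any failure, hand the recording back for a retry. */
async function uploadOrKeep(
  recording: Uint8Array,
  send: (audio: Uint8Array) => Promise<string>, // hypothetical uploader returning the transcript
  timeoutMs: number,
): Promise<{ kind: "done"; text: string } | DrawerState> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
  });
  try {
    const text = await Promise.race([send(recording), timeout]);
    return { kind: "done", text };
  } catch {
    return { kind: "retry-upload", recording };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Keeping each failure as its own state means the drawer can render a specific recovery path instead of a generic error message.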

Dynamic prompt preview

To help people learn prompt engineering, and to be transparent about what the app is doing, I built a live prompt visualizer. As users toggle presets, select checkboxes, or dictate inputs, the preview updates dynamically. The visualizer structures the payload to follow core prompt design principles:

Markdown language: Formatting instructions in clean Markdown syntax to help models parse sections reliably.
Structured prompting: Enforcing a clear hierarchy between instructions and variables.
Explicit components: Organizing the prompt into blocks for role, context, task, and format.

This visual feedback loop shows the exact payload structure, helping users learn prompting principles through real-time experimentation.
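A prompt assembled along these lines might look like the sketch below. The interface, the function name, and the section headings are illustrative assumptions; the case study names the four blocks (role, context, task, format) but not the exact template.

```typescript
interface PromptParts {
  role: string;
  context: string;
  task: string;
  format: string[];                   // output requirements, one bullet each
  variables?: Record<string, string>; // user-supplied values, kept apart from instructions
}

/** Assemble a Markdown prompt with one heading per component. */
function buildPrompt(p: PromptParts): string {
  const sections = [
    `## Role\n${p.role}`,
    `## Context\n${p.context}`,
    `## Task\n${p.task}`,
    `## Format\n${p.format.map((line) => `- ${line}`).join("\n")}`,
  ];
  const vars = Object.entries(p.variables ?? {});
  if (vars.length > 0) {
    // Variables get their own block so they never blend into the instructions.
    sections.push(`## Variables\n${vars.map(([k, v]) => `- ${k}: ${v}`).join("\n")}`);
  }
  return sections.join("\n\n");
}
```

Because the function is pure, a preview panel can call it on every toggle or dictation change and render the returned string directly.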

One UI. Multiple AI.

To help participants compare models, I built a dynamic layer on top of OpenRouter. This lets users experiment with multiple chat, image, and voice models in real time.

One goal for the tool was to help users gain familiarity with different models, especially ones they have likely never heard of, like ByteDance Seedream, a compelling and affordable image generation model from Q1 2026.

Real-time progress

Waiting for a large language model to stream back a response is a UX bottleneck. If a user is left staring at a blank screen or a generic spinner, they have no idea if the app is working or if the connection has died.

To solve this, I intercepted the raw JSON stream from the API as it arrives. This allows the progress panel to display live stream data and telemetry, showing users exactly what the model is doing in real time. The component tracks the macro state of the stream and displays a step-by-step progress list: routing the request, generating the payload, and analyzing the response. To show that the system is active, a live stopwatch ticks in tenths of a second alongside a character counter. If the stream stalls, the panel automatically displays troubleshooting hints, transforming a silent wait into clear, operational feedback.
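The stream tracking can be sketched as a small incremental parser. Several things here are assumptions: the wire format is taken to be OpenAI-style server-sent events (`data: {json}` lines ending with `data: [DONE]`), the class, phase, and function names are hypothetical, and the phases are simplified compared with the step list described above.

```typescript
type Phase = "routing" | "generating" | "done";

class StreamProgress {
  phase: Phase = "routing";
  chars = 0;
  private buffer = "";

  /** Feed one decoded network chunk; chunks may split lines anywhere. */
  push(chunk: string): void {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
    for (const raw of lines) {
      const line = raw.trim();
      if (!line.startsWith("data:")) continue; // skip blank lines and ":" comment lines
      const payload = line.slice(5).trim();
      if (payload === "[DONE]") { this.phase = "done"; continue; }
      try {
        const json = JSON.parse(payload) as { choices?: { delta?: { content?: string } }[] };
        const text = json.choices?.[0]?.delta?.content ?? "";
        if (text.length > 0) { this.phase = "generating"; this.chars += text.length; }
      } catch {
        // Ignore malformed lines rather than breaking the progress display.
      }
    }
  }
}

/** True when no chunk has arrived for longer than `thresholdMs`. */
function isStalled(lastChunkAtMs: number, nowMs: number, thresholdMs: number): boolean {
  return nowMs - lastChunkAtMs > thresholdMs;
}
```

The character counter and stopwatch can both read from this object on an animation frame, and `isStalled` gives the panel a simple trigger for showing troubleshooting hints.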

Sub-second startup

During live keynotes, hundreds of participants scan a QR code simultaneously over congested conference Wi-Fi. If the app takes more than a few seconds to load, users will assume it is broken.

To achieve rapid startup, I audited the bundle, deferred non-critical code into async chunks, and minimized main-thread blocking time. To keep HTTP requests off the critical rendering path, I embedded brand assets as inline SVGs and wrote a script to subset our icon fonts down to only the glyphs in use. These optimizations reduced the Largest Contentful Paint (LCP) to under one second, yielding near-perfect performance audit scores.

Reflections

In many ways, this mobile version is better than the desktop original. By linearizing the non-linear workflow, I made it more accessible, sped up the path to results, and drastically reduced the cognitive load on the user.

But pulling it all off was a boatload of work.

If I were to build it again, I would spend more time on the look and feel. The design language is admittedly a bit boring.

Simplifying a complex workflow is a ton of work, but making a tool fast and accessible is always worth the effort.
