Flagship case study · Review surface

EvalVybes

A conversational AI evaluation platform in active development. EvalVybes aims to let you test prompts across multiple AI models using voice commands, with real-time comparison and smart rating systems. ~16.3K LOC built so far.

Voice AI · Model Comparison · Next.js · WebRTC · Multi-LLM

Build surface

The implementation surface for this system. These are the layers that mattered in practice, not a generic skills wall.
Frontend
Next.js, React, Tailwind CSS, WebRTC
Backend
Node.js, Express, PostgreSQL, WebSocket
AI/ML
OpenAI GPT-4, Claude, Whisper API, Multiple LLMs
Voice
Web Speech API, Real-time Transcription
Payments & State
Stripe, Zustand, React Query

Overview

EvalVybes is a conversational AI evaluation platform in active development (~16.3K LOC). The vision is to make comparing different AI models as easy as having a conversation, with a vibrant artistic interface featuring paintbrush gradients and glassmorphism.

Problem

Testing and comparing AI models is typically a technical, tedious process. Users have to craft prompts by hand, switch between platforms, and track results separately, which makes it hard to find the best model for their needs.

Solution

EvalVybes transforms AI evaluation into a natural conversation. Simply speak your tasks, and the platform generates evaluations across multiple models, presenting results in an intuitive, visual format with real-time comparisons.

Key Features

01

Conversational Interface

Natural voice and text interaction for creating and managing AI evaluations

  • Voice-controlled evaluation setup using natural language
  • Real-time speech transcription and task card generation
  • Context-aware conversation flow that guides users
  • Multi-modal input supporting both voice and text
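As a minimal sketch of the transcription-to-task-card step described above, a transcript can be mapped to a typed task card with simple keyword matching. All names here (`TaskCard`, `transcriptToTaskCard`, the keyword table) are illustrative, not the actual EvalVybes implementation:

```typescript
// Hypothetical sketch: turning a speech transcript into a task card.
// The real platform would use richer NLU; this shows the shape of the flow.
interface TaskCard {
  title: string;
  taskType: "writing" | "analysis" | "creative" | "general";
}

// Illustrative keyword buckets per task type.
const TYPE_KEYWORDS: Record<TaskCard["taskType"], string[]> = {
  writing: ["write", "blog", "email", "copy"],
  analysis: ["analyze", "data", "evaluate", "compare"],
  creative: ["story", "poem", "brainstorm", "design"],
  general: [],
};

function transcriptToTaskCard(transcript: string): TaskCard {
  const lower = transcript.toLowerCase();
  for (const [taskType, words] of Object.entries(TYPE_KEYWORDS)) {
    if (words.some((w) => lower.includes(w))) {
      return { title: transcript.trim(), taskType: taskType as TaskCard["taskType"] };
    }
  }
  // Nothing matched: fall back to a general-purpose task.
  return { title: transcript.trim(), taskType: "general" };
}
```

In the real flow this function would run on each finalized transcript segment, with the resulting card animated into the UI.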
02

Multi-Model Comparison

Compare outputs from different AI models side-by-side with detailed analysis

  • Support for GPT-4, GPT-3.5, Claude, and other models
  • Real-time generation with progress tracking
  • Side-by-side output comparison with syntax highlighting
  • Cost estimation and performance metrics
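The fan-out behind side-by-side comparison can be sketched as one prompt dispatched to several models in parallel, with a progress callback as each call settles. The `compareModels` signature and `callModel` parameter are assumptions for illustration; the actual platform would call provider SDKs:

```typescript
// Hypothetical sketch: run one prompt against several models concurrently,
// reporting progress as each model finishes (success or failure).
type ModelResult = { model: string; ok: boolean; output?: string; error?: string };

async function compareModels(
  prompt: string,
  models: string[],
  callModel: (model: string, prompt: string) => Promise<string>,
  onProgress?: (done: number, total: number) => void,
): Promise<ModelResult[]> {
  let done = 0;
  // Promise.all preserves input order, so results line up with `models`.
  return Promise.all(
    models.map(async (model) => {
      try {
        const output = await callModel(model, prompt);
        return { model, ok: true, output };
      } catch (err) {
        // One model failing should not sink the whole comparison.
        return { model, ok: false, error: String(err) };
      } finally {
        done += 1;
        onProgress?.(done, models.length);
      }
    }),
  );
}
```

Catching per-model errors inside the map (rather than letting `Promise.all` reject) is what lets a slow or failing provider degrade gracefully in the comparison view.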
03

Smart Rating System

Comprehensive rating and feedback system for AI outputs

  • Customizable evaluation criteria with 1-5 star ratings
  • Weighted scoring based on task requirements
  • Detailed feedback collection and analysis
  • Historical comparison and improvement tracking
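The weighted scoring described above can be sketched as a weighted average over per-criterion star ratings. The criterion names, weights, and rounding choice are examples, not the shipped templates:

```typescript
// Hypothetical sketch of weighted 1-5 star scoring across criteria.
interface CriterionRating {
  name: string;
  stars: number;  // 1-5 star rating for this criterion
  weight: number; // relative importance from the task requirements
}

function weightedScore(ratings: CriterionRating[]): number {
  const totalWeight = ratings.reduce((s, r) => s + r.weight, 0);
  if (totalWeight === 0) return 0; // no criteria rated yet
  const sum = ratings.reduce((s, r) => s + r.stars * r.weight, 0);
  // Round to two decimals for display.
  return Math.round((sum / totalWeight) * 100) / 100;
}
```

For example, weighting accuracy twice as heavily as style pulls the composite score toward the accuracy rating.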

User Experience Flow

1. Voice-First Onboarding

Users start with a pulsing microphone button in brand colors (pink #FF4ECC and green #00C09D). They simply speak what they want to do with AI - whether it's writing marketing copy, analyzing data, or creative tasks.

2. Dynamic Task Creation

As users speak, task cards appear in real-time with smooth animations. The system understands natural language commands like "I need to write a blog post" or "Let's evaluate this marketing email."

3. Smart Model Selection

Based on the task type, EvalVybes suggests relevant AI models and displays cost estimates, response times, and capabilities. Users can compare GPT-4, Claude, and other models at a glance.
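The cost-estimate part of model selection reduces to a per-token rate lookup. The rates below are placeholders for illustration only, not current provider pricing, and the function name is an assumption:

```typescript
// Illustrative sketch of per-model cost estimation.
// NOTE: these per-1K-token rates are placeholders, not real pricing.
const RATE_PER_1K_TOKENS: Record<string, number> = {
  "gpt-4": 0.03,
  "gpt-3.5": 0.001,
  "claude": 0.008,
};

function estimateCostUSD(model: string, promptTokens: number, outputTokens: number): number {
  const rate = RATE_PER_1K_TOKENS[model];
  if (rate === undefined) throw new Error(`unknown model: ${model}`);
  // Simplification: one blended rate; real providers price input and output separately.
  return ((promptTokens + outputTokens) / 1000) * rate;
}
```

A real implementation would also track input vs. output pricing and measured response times per model.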

4. Real-Time Results

Watch as different AI models generate responses simultaneously. Results appear side-by-side with syntax highlighting, making it easy to compare quality, style, and accuracy.

Technical Architecture

  • AI Models: 5+ available for comparison
  • Voice Response: < 3s real-time processing
  • Criteria Templates: 20+ for evaluations
  • Stage: MVP, currently in development

WebRTC Voice System

EvalVybes is implementing a WebRTC architecture for real-time voice interaction with OpenAI's Realtime API:

  • Direct audio streaming to OpenAI for minimal latency
  • WebSocket signaling for connection management
  • Context-aware conversation handling with state persistence
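The signaling side of that architecture can be sketched as a small set of typed messages reduced into session state. The message names and `reduceSignal` reducer are illustrative assumptions, not the OpenAI Realtime API protocol or the EvalVybes wire format:

```typescript
// Hypothetical sketch of WebSocket signaling messages for a voice session.
type SignalMessage =
  | { type: "session.start"; sessionId: string }
  | { type: "audio.chunk"; sessionId: string; base64Audio: string }
  | { type: "session.end"; sessionId: string };

interface SessionState {
  open: boolean;
  chunks: number; // audio chunks received this session
}

function reduceSignal(state: SessionState, msg: SignalMessage): SessionState {
  switch (msg.type) {
    case "session.start":
      return { open: true, chunks: 0 };
    case "audio.chunk":
      if (!state.open) throw new Error("audio.chunk before session.start");
      return { ...state, chunks: state.chunks + 1 };
    case "session.end":
      return { ...state, open: false };
  }
}
```

Modeling signaling as a pure reducer makes the state-persistence requirement above straightforward: the same message log replayed through `reduceSignal` always reconstructs the same session state.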

Design System

The platform features a unique visual identity with organic paintbrush gradients, glassmorphism effects, and a vibrant color palette. This creates an engaging, artistic experience that makes AI evaluation feel approachable and fun.

Development Status

EvalVybes is currently in MVP development with core features being actively built:

Completed ✅

  • State management architecture
  • Basic voice recognition
  • Task card UI system
  • Payment integration setup

In Progress 🚧

  • WebRTC voice implementation
  • Multi-model API integration
  • Rating persistence system
  • Shareable evaluation links

Next inspection step

Inspect the system further

Use the live surface or the source as the next level of proof. The goal here is not to end on a marketing flourish, but to make the next inspection step obvious.

Source
https://github.com/hopeatina/evalvybes
Why this matters
Strong systems work should be inspectable from multiple angles: interface, architecture, and implementation.