Real-Time Animated Avatar Creation Contest
- Status: Pending
- Prize: €250
- Entries Received: 9
Contest Brief
Contest: Real-Time Lipsync Avatar from a Single Photo — POC / Skills Test
Important: This is a Paid Proof of Concept
This contest is a skills assessment. We are looking for a talented engineer to join a multi-week (potentially multi-month) project to build a full real-time avatar platform — similar in quality and capability to HeyGen, LiveAvatar, Replika, Candy AI, and D-ID.
The winner of this contest will be offered a long-term contract to build the full pipeline with us.
Do NOT apply if you can only deliver pre-rendered video. We need real-time.
What We Need (POC Deliverable)
Build a working prototype that does the following in real time:
Take a single static photo (portrait/face) as input
Take a live audio stream (microphone or audio chunks) as input
Output a video stream of the photo animated with accurate lip-sync, matching the audio in real time
Technical Requirements
Requirement: Specification
- Latency: < 300ms end-to-end (audio in → frame out)
- Input photo: Single image (JPG/PNG), any human face
- Audio input: Live microphone or PCM/WAV chunks (streaming)
- Video output: Frame stream (MJPEG, WebRTC, RTMP, or raw frames via WebSocket)
- Resolution: Minimum 256×256, ideally 512×512
- Runtime: Must run locally on a consumer GPU (RTX 3060 or equivalent)
- No cloud API dependency: Must NOT rely on paid third-party APIs (D-ID, HeyGen, etc.); we want a self-hosted solution
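As a rough illustration of what the 300ms target implies, the sketch below splits the budget across pipeline stages. The audio format (16 kHz, mono, 16-bit PCM), the stage names, and the example timings are assumptions made for illustration only; the brief does not prescribe any of them.

```python
# Hypothetical latency budget helper. The 300 ms limit comes from the brief;
# the audio format and stage breakdown are illustrative assumptions.
SAMPLE_RATE_HZ = 16_000      # assumed sample rate
BYTES_PER_SAMPLE = 2         # assumed 16-bit mono PCM
LATENCY_BUDGET_MS = 300      # end-to-end limit (audio in -> frame out)


def chunk_bytes(chunk_ms: int) -> int:
    """Size in bytes of one PCM audio chunk of the given duration."""
    return SAMPLE_RATE_HZ * chunk_ms // 1000 * BYTES_PER_SAMPLE


def remaining_budget_ms(chunk_ms: int, inference_ms: int,
                        encode_ms: int, network_ms: int) -> int:
    """Headroom left after buffering one chunk and running every stage.

    A negative result means the pipeline would miss the 300 ms target.
    """
    used = chunk_ms + inference_ms + encode_ms + network_ms
    return LATENCY_BUDGET_MS - used


if __name__ == "__main__":
    print(chunk_bytes(40))                       # bytes per 40 ms chunk
    print(remaining_budget_ms(40, 120, 10, 20))  # headroom in ms
```

The point of the arithmetic is that the audio chunk duration counts against the budget too: a chunk has to be fully buffered before inference can start, so larger chunks leave less time for the model.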
What You Must Deliver for This Contest
1. Live Demo Video (mandatory for contest entry)
A screen recording showing the system running in real time
You must speak into your microphone and show the avatar reacting live
Show a visible latency indicator or timestamp if possible
Pre-rendered videos will be immediately disqualified
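One possible way to produce the latency indicator is to timestamp each audio chunk on arrival and again when its frame is emitted. The sketch below is a minimal example using only the Python standard library; the class name, the chunk IDs, and the overlay text format are hypothetical, and drawing the text onto the video frame is left to the entrant.

```python
import time
from collections import deque
from statistics import mean


class LatencyTracker:
    """Tracks audio-in to frame-out latency per chunk (illustrative sketch)."""

    def __init__(self, window: int = 100, clock=time.perf_counter):
        self._clock = clock                      # monotonic clock, injectable for tests
        self._pending = {}                       # chunk_id -> arrival time (seconds)
        self._samples_ms = deque(maxlen=window)  # rolling window of latencies

    def audio_in(self, chunk_id: int) -> None:
        """Call when an audio chunk arrives."""
        self._pending[chunk_id] = self._clock()

    def frame_out(self, chunk_id: int) -> float:
        """Call when the frame for that chunk is emitted; returns latency in ms."""
        start = self._pending.pop(chunk_id)
        latency_ms = (self._clock() - start) * 1000.0
        self._samples_ms.append(latency_ms)
        return latency_ms

    def overlay_text(self) -> str:
        """Text suitable for drawing on the output frame."""
        if not self._samples_ms:
            return "latency: n/a"
        last = self._samples_ms[-1]
        return f"latency: last {last:.0f} ms, avg {mean(self._samples_ms):.0f} ms"
```

Note that this measures only the processing path between the two calls. Microphone capture and display delays sit outside it, so the on-screen number is a lower bound on what a viewer perceives.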
2. Source Code + Setup Instructions
Full Python (or C++/JS) source code
A Dockerfile or clear environment setup (conda/pip)
A README with step-by-step instructions to run the demo
List of models/weights used and how to download them
3. Integration Interface
Provide at least ONE of the following integration methods:
WebSocket server: send audio chunks → receive video frames
REST API (FastAPI/Flask): endpoint for streaming lipsync
RTMP/WebRTC output: streamable video output we can plug into OBS or a web app
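For the WebSocket option, the brief does not define a message format. The sketch below shows one hypothetical binary framing that could carry both audio chunks and video frames: a fixed 12-byte header (sequence number plus capture timestamp) followed by the raw payload. The field layout and function names are assumptions, not a required protocol.

```python
import struct

# Hypothetical wire format (not specified by the brief):
#   uint32 sequence number | uint64 capture timestamp in ms | payload bytes
# "!" selects network byte order with no padding, so the header is 12 bytes.
HEADER = struct.Struct("!IQ")


def pack_message(seq: int, timestamp_ms: int, payload: bytes) -> bytes:
    """Build one binary WebSocket message (audio chunk or encoded frame)."""
    return HEADER.pack(seq, timestamp_ms) + payload


def unpack_message(message: bytes) -> tuple[int, int, bytes]:
    """Split a received message into (seq, timestamp_ms, payload)."""
    seq, timestamp_ms = HEADER.unpack_from(message)
    return seq, timestamp_ms, message[HEADER.size:]
```

Echoing the audio chunk's sequence number and timestamp in the frame message that answers it lets the client compute round-trip latency without any extra signalling.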
Evaluation Criteria
We will judge entries on:
Criteria: Weight
- Lip-sync accuracy: 30% (lips must match phonemes, not just open/close)
- Visual quality: 25% (no heavy artifacts, natural head micro-movements are a plus)
- Latency: 20% (lower is better, must be under 300ms)
- Code quality & integrability: 15% (clean code, documented, easy to plug into our pipeline)
- Bonus features: 10% (emotion control, head movement, eye blink, multi-language support)
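To show how the weights combine, here is a small sketch that turns per-criterion scores into a single weighted total. The weights are taken from the criteria above; the 0 to 10 scoring scale and the dictionary keys are assumptions, since the brief does not say how individual criteria are scored.

```python
# Weights in percent, taken from the evaluation criteria.
WEIGHTS_PCT = {
    "lip_sync": 30,
    "visual_quality": 25,
    "latency": 20,
    "code_quality": 15,
    "bonus": 10,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (assumed 0-10 scale) into one total.

    Raises KeyError if a criterion is missing from `scores`.
    """
    return sum(WEIGHTS_PCT[name] * scores[name] for name in WEIGHTS_PCT) / 100


if __name__ == "__main__":
    example = {"lip_sync": 8, "visual_quality": 6, "latency": 10,
               "code_quality": 4, "bonus": 0}
    print(weighted_score(example))
```

The under-300ms latency limit is stated as a requirement rather than a preference, so an entry that exceeds it may not be scored this way at all.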
The Bigger Picture (Long-Term Project)
The winner will be invited to collaborate on building a full real-time interactive avatar platform. The scope includes:
Real-time lipsync engine with emotion and expression control
Text-to-Speech integration (streaming TTS → avatar animation pipeline)
Multi-avatar support (switch faces/characters dynamically)
Full-body animation (gestures, posture, not just face)
Web-based deployment (browser-ready via WebRTC or similar)
Conversation mode (bidirectional: user speaks → AI responds → avatar animates the response live)
Quality matching or exceeding: HeyGen Interactive Avatar, D-ID Agents, Replika visual avatars, Candy AI characters, LiveAvatar
This is a serious, funded project. We are building a production-grade product, not a toy demo. The long-term engagement is several weeks to months, with competitive compensation.
Skills We're Looking For
Strong experience with deep learning for face/video generation (GANs, diffusion, NeRF, or similar)
Hands-on experience with models like SadTalker, Wav2Lip, MuseTalk, LivePortrait, Thin-Plate Spline, FOMM, or similar
Proficiency in Python, PyTorch/TensorFlow
Experience with real-time streaming (WebSocket, WebRTC, RTMP, GStreamer)
Ability to optimize inference for real-time performance (TensorRT, ONNX, model quantization)
Bonus: experience with TTS pipelines (Coqui, Bark, XTTS, ElevenLabs integration)
How to Enter
Build the POC following the specs above
Record your live demo (screen + mic, showing real-time sync)
Upload your code (GitHub repo or zip)
Submit both as your contest entry