Building AI-Powered Meeting Intelligence: Lessons from EVA Meet
by Hamzah Ejaz, Software Engineer
Building EVA Meet at CogniCloud taught me invaluable lessons about integrating multiple AI services into a cohesive, enterprise-grade platform. Here's what I learned architecting a system that processes live conversations, fact-checks in real-time, and generates actionable insights.
The Challenge
When we started EVA Meet, the goal was ambitious: create a meeting intelligence platform that could:
- Transcribe conversations in real time with near-perfect accuracy
- Fact-check claims as they're discussed
- Generate intelligent summaries and extract action items
- Deliver everything with sub-2-second latency
- Scale to enterprise requirements
Architecture Overview
Multi-AI Orchestration
The core challenge was orchestrating three different AI services:
Deepgram for Transcription
// Real-time transcription with Deepgram (JS SDK v3)
import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk'

const deepgram = createClient(process.env.DEEPGRAM_API_KEY)
const connection = deepgram.listen.live({
  model: 'nova-2',
  language: 'en',
  smart_format: true,
  punctuate: true,
})

connection.on(LiveTranscriptionEvents.Transcript, (data) => {
  const transcript = data.channel.alternatives[0].transcript
  if (transcript) {
    // Emit to WebSocket clients
    io.emit('transcription', {
      text: transcript,
      confidence: data.channel.alternatives[0].confidence,
      timestamp: Date.now(),
    })
  }
})
Perplexity AI for Fact-Checking
async function factCheck(claim: string) {
  const response = await fetch('https://api.perplexity.ai/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.PERPLEXITY_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'pplx-70b-online',
      messages: [{
        role: 'user',
        content: `Fact-check this claim and provide sources: "${claim}"`,
      }],
    }),
  })
  if (!response.ok) {
    throw new Error(`Perplexity request failed: ${response.status}`)
  }
  return await response.json()
}
GPT-4 for Summarization
import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

async function generateSummary(transcript: string) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{
      role: 'system',
      content: 'You are an expert at analyzing meeting transcripts...',
    }, {
      role: 'user',
      content: `Summarize this meeting and extract action items:\n\n${transcript}`,
    }],
    temperature: 0.3,
  })
  return completion.choices[0].message.content
}
Real-time Infrastructure
WebSocket Architecture
Achieving sub-2-second latency required careful WebSocket design:
io.on('connection', (socket) => {
  console.log('Client connected:', socket.id)

  socket.on('join-meeting', async ({ meetingId, userId }) => {
    socket.join(meetingId)
    // Send historical transcript
    const history = await getTranscriptHistory(meetingId)
    socket.emit('transcript-history', history)
  })

  socket.on('audio-chunk', async (audioData) => {
    // Stream to the Deepgram live connection created above
    deepgramConnection.send(audioData)
  })

  socket.on('request-factcheck', async ({ claim, meetingId }) => {
    const result = await factCheck(claim)
    io.to(meetingId).emit('factcheck-result', result)
  })
})
Performance Optimization
Streaming Responses
Instead of waiting for complete AI responses, we stream results:
const stream = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [...],
  stream: true,
})

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content
  if (content) {
    socket.emit('summary-chunk', content)
  }
}
Key Learnings
1. API Rate Limiting & Costs
Managing multiple AI APIs requires careful rate limiting:
const rateLimiter = new RateLimiter({
  openai: { requests: 500, per: 'minute' },
  perplexity: { requests: 100, per: 'minute' },
  deepgram: { minutes: 10000, per: 'month' },
})

async function callWithRateLimit(service, fn) {
  await rateLimiter.wait(service)
  return await fn()
}
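RateLimiter above is our own thin wrapper. As an illustration only (not our production code, which also has to track Deepgram's per-month minutes), a minimal in-memory sliding-window version of wait might look like this:

// Minimal in-memory sliding-window limiter (illustrative sketch; a production
// version needs shared state, e.g. counters in Redis, to work across instances)
class SimpleRateLimiter {
  private timestamps = new Map<string, number[]>()

  constructor(private limits: Record<string, { requests: number; perMs: number }>) {}

  async wait(service: string): Promise<void> {
    const { requests, perMs } = this.limits[service]
    for (;;) {
      const now = Date.now()
      // Keep only the requests still inside the window
      const recent = (this.timestamps.get(service) ?? []).filter((t) => now - t < perMs)
      if (recent.length < requests) {
        recent.push(now)
        this.timestamps.set(service, recent)
        return
      }
      // Sleep until the oldest request ages out, then re-check
      await new Promise((resolve) => setTimeout(resolve, recent[0] + perMs - now))
    }
  }
}

const limiter = new SimpleRateLimiter({
  openai: { requests: 500, perMs: 60_000 },
  perplexity: { requests: 100, perMs: 60_000 },
})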
2. Error Handling & Fallbacks
Enterprise systems need robust error handling:
async function transcribeWithFallback(audio) {
  try {
    return await deepgram.transcribe(audio)
  } catch (error) {
    logger.error('Deepgram failed:', error)
    // Fall back to an alternative service
    return await whisperAPI.transcribe(audio)
  }
}
3. Context Window Management
GPT-4 has token limits. We implemented smart context management:
function buildContextWindow(transcript: string, maxTokens = 7000) {
  const messages = splitIntoMessages(transcript)
  const context: string[] = []
  let tokenCount = 0

  // Take the most recent messages that fit in the context window
  for (let i = messages.length - 1; i >= 0; i--) {
    const messageTokens = estimateTokens(messages[i])
    if (tokenCount + messageTokens > maxTokens) break
    context.unshift(messages[i])
    tokenCount += messageTokens
  }
  return context
}
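The estimateTokens helper above doesn't need to be exact. A crude heuristic (roughly four characters per token for English text; this ratio is an assumption, and a real tokenizer such as the tiktoken package is more accurate) is enough to stay safely under the limit:

// Rough token estimate: ~4 characters per token for English text.
// Swap in a real tokenizer (e.g. the tiktoken package) for accuracy.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}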
Production Challenges Solved
Scalability
- Implemented connection pooling for WebSockets
- Used Redis for distributed caching (see the sketch after this list)
- Deployed on Kubernetes for auto-scaling
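On the Redis point: a common way to make Socket.IO scale horizontally is the @socket.io/redis-adapter package, which fans events out across instances over Redis pub/sub. A minimal wiring sketch (illustrative, not our exact setup):

import { createClient } from 'redis'
import { Server } from 'socket.io'
import { createAdapter } from '@socket.io/redis-adapter'

// Two Redis connections: one to publish, one to subscribe
const pubClient = createClient({ url: process.env.REDIS_URL })
const subClient = pubClient.duplicate()
await Promise.all([pubClient.connect(), subClient.connect()])

// Events emitted on one instance now reach clients connected to any instance
const io = new Server({
  adapter: createAdapter(pubClient, subClient),
})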
Data Privacy
- End-to-end encryption for audio streams
- GDPR-compliant data retention policies
- Secure API key management with Vault (see the sketch after this list)
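For the Vault piece, a sketch of reading keys at startup with the node-vault client (the secret path and field names here are hypothetical):

import nodeVault from 'node-vault'

const vault = nodeVault({
  apiVersion: 'v1',
  endpoint: process.env.VAULT_ADDR,
  token: process.env.VAULT_TOKEN,
})

// KV v2 secrets come back nested under result.data.data
const result = await vault.read('secret/data/eva-meet/api-keys')
const { DEEPGRAM_API_KEY, OPENAI_API_KEY } = result.data.data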
Reliability
- Implemented circuit breakers for AI APIs
- Added retry logic with exponential backoff (sketched after this list)
- Built health check endpoints for monitoring
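As an illustration of the retry point, a minimal backoff helper (the withRetry name and parameters are ours for illustration; the jitter avoids thundering-herd retries):

// Retry with exponential backoff plus a little jitter (illustrative sketch)
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3, baseMs = 500): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn()
    } catch (error) {
      if (attempt >= maxAttempts) throw error
      const delayMs = baseMs * 2 ** (attempt - 1) + Math.random() * 100
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
}

// Usage: wrap any flaky AI call
const summary = await withRetry(() => generateSummary(transcript))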
Results
- Sub-2-second latency for real-time features
- 95%+ accuracy in transcription and action item extraction
- Zero downtime during peak usage
- Served 100+ concurrent meetings without degradation
Takeaways for Developers
- Start simple: Don't optimize prematurely. Get it working, then make it fast.
- Monitor everything: AI APIs can fail in unexpected ways. Comprehensive logging saved us multiple times.
- Test with real data: Mock data doesn't reveal edge cases in natural language processing.
- Budget wisely: AI API costs can spiral quickly. Implement usage tracking early (a minimal sketch follows this list).
- User feedback loops: The best improvements came from watching how users actually used the features.
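On the budgeting point, even a crude per-service counter surfaces cost spikes early. A sketch (illustrative; a real version would persist or log these numbers):

// Crude in-memory usage tracker (illustrative)
const usage: Record<string, { calls: number; tokens: number }> = {}

function trackUsage(service: string, tokens = 0) {
  const entry = (usage[service] ??= { calls: 0, tokens: 0 })
  entry.calls += 1
  entry.tokens += tokens
}

// e.g. after a summarization call:
// trackUsage('openai', completion.usage?.total_tokens ?? 0)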
Next Steps
We're exploring:
- Custom LLM fine-tuning for domain-specific meetings
- Multi-language support
- Integration with more meeting platforms
- On-premise deployment for security-sensitive clients
Building EVA Meet was one of the most challenging and rewarding projects of my career. The intersection of AI and real-time systems presents unique challenges, but the impact on productivity is transformative.
Want to discuss AI integration strategies? Get in touch.