Integrating GPT-4 & Claude AI in Next.js: Production Guide
Real-world implementation: This guide covers the exact AI integration patterns I used in DemandAI.Pro to reduce legal document generation from 8+ hours to 5 minutes. You'll learn streaming responses, multi-model strategies, error handling, and cost optimization.
Why Use Both GPT-4 and Claude?
In DemandAI.Pro, I use both OpenAI GPT-4 and Anthropic Claude for different tasks. Here's why you might want to do the same (a small routing sketch follows the two lists):
GPT-4 Strengths
- Excellent for creative content generation
- Strong JSON output formatting
- Better for marketing copy
- Function calling for structured data
Claude Strengths
- Superior for long-form content
- Better at following complex instructions
- More careful/conservative responses
- Excellent for legal/medical text
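To make the split concrete, here is a minimal routing sketch. The task names and the pickModel helper are hypothetical (nothing here comes from either SDK); the point is simply to keep the task-to-model decision in one place:
// Hypothetical task categories used to route requests to a model
type TaskType = 'creative-copy' | 'structured-extraction' | 'long-form-legal' | 'document-analysis'

// Map each task to a default model based on the strengths listed above
function pickModel(task: TaskType): string {
  switch (task) {
    case 'creative-copy':
    case 'structured-extraction':
      return 'gpt-4-turbo-preview' // creative output, strong JSON/function calling
    case 'long-form-legal':
    case 'document-analysis':
      return 'claude-3-opus-20240229' // long, careful, instruction-heavy work
    default:
      return 'gpt-4'
  }
}

// Example: pickModel('long-form-legal') returns 'claude-3-opus-20240229'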
Step 1: Environment Setup
# Install dependencies
npm install openai @anthropic-ai/sdk
# .env.local
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
Step 2: Create API Route with Streaming
Create app/api/ai/generate/route.ts:
import { OpenAI } from 'openai'
import Anthropic from '@anthropic-ai/sdk'
import { NextRequest, NextResponse } from 'next/server'
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY!,
})
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY!,
})
export const runtime = 'edge'
export async function POST(req: NextRequest) {
try {
const { prompt, model = 'gpt-4', temperature = 0.7 } = await req.json()
if (model.startsWith('gpt')) {
return handleOpenAI(prompt, model, temperature)
} else if (model.startsWith('claude')) {
return handleClaude(prompt, model, temperature)
}
return NextResponse.json(
{ error: 'Invalid model specified' },
{ status: 400 }
)
} catch (error) {
console.error('AI Generation Error:', error)
return NextResponse.json(
{ error: 'Failed to generate content' },
{ status: 500 }
)
}
}
async function handleOpenAI(
prompt: string,
model: string,
temperature: number
) {
const stream = await openai.chat.completions.create({
model: model,
messages: [{ role: 'user', content: prompt }],
temperature: temperature,
stream: true,
})
const encoder = new TextEncoder()
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of stream) {
const text = chunk.choices[0]?.delta?.content || ''
controller.enqueue(encoder.encode(text))
}
controller.close()
},
})
return new Response(readable, {
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
},
})
}
async function handleClaude(
prompt: string,
model: string,
temperature: number
) {
const stream = await anthropic.messages.stream({
model: model,
max_tokens: 4096,
temperature: temperature,
messages: [{ role: 'user', content: prompt }],
})
const encoder = new TextEncoder()
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of stream) {
if (chunk.type === 'content_block_delta') {
const text = chunk.delta.text || ''
controller.enqueue(encoder.encode(text))
}
}
controller.close()
},
})
return new Response(readable, {
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
},
})
}
Step 3: Client-Side Streaming Component
'use client'
import { useState } from 'react'
export default function AIGenerator() {
const [prompt, setPrompt] = useState('')
const [response, setResponse] = useState('')
const [loading, setLoading] = useState(false)
const [model, setModel] = useState('gpt-4')
const handleGenerate = async () => {
setLoading(true)
setResponse('')
try {
const res = await fetch('/api/ai/generate', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ prompt, model, temperature: 0.7 }),
})
if (!res.ok || !res.body) throw new Error('Generation failed')
const reader = res.body.getReader()
const decoder = new TextDecoder()
while (true) {
const { done, value } = await reader.read()
if (done) break
const text = decoder.decode(value, { stream: true })
setResponse((prev) => prev + text)
}
} catch (error) {
console.error('Error:', error)
alert('Failed to generate content')
} finally {
setLoading(false)
}
}
return (
<div className="space-y-4">
<select
value={model}
onChange={(e) => setModel(e.target.value)}
className="w-full px-4 py-2 rounded-lg bg-slate-800 text-white"
>
<option value="gpt-4">GPT-4</option>
<option value="gpt-4-turbo-preview">GPT-4 Turbo</option>
<option value="claude-3-opus-20240229">Claude 3 Opus</option>
<option value="claude-3-sonnet-20240229">Claude 3 Sonnet</option>
</select>
<textarea
value={prompt}
onChange={(e) => setPrompt(e.target.value)}
placeholder="Enter your prompt..."
rows={6}
className="w-full px-4 py-2 rounded-lg bg-slate-800 text-white"
/>
<button
onClick={handleGenerate}
disabled={loading || !prompt}
className="w-full px-4 py-2 rounded-lg bg-purple-600 hover:bg-purple-700 text-white font-semibold disabled:opacity-50"
>
{loading ? 'Generating...' : 'Generate'}
</button>
{response && (
<div className="bg-slate-800 rounded-lg p-4 whitespace-pre-wrap text-gray-300">
{response}
</div>
)}
</div>
)
}
Step 4: Advanced Pattern - Fallback Strategy
In production, I use a fallback strategy: if one model fails or is slow, automatically try the other:
async function generateWithFallback(prompt: string) {
const models = [
{ name: 'gpt-4-turbo-preview', timeout: 30000 },
{ name: 'claude-3-opus-20240229', timeout: 30000 },
{ name: 'gpt-4', timeout: 60000 },
]
for (const { name, timeout } of models) {
try {
const controller = new AbortController()
const timeoutId = setTimeout(() => controller.abort(), timeout)
const result = await fetch('/api/ai/generate', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ prompt, model: name }),
signal: controller.signal,
})
clearTimeout(timeoutId)
if (result.ok) {
return result // Success!
}
} catch (error) {
console.error(`${name} failed:`, error)
// Continue to next model
}
}
throw new Error('All models failed')
}
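A quick usage sketch for the helper above: whichever model wins still returns a streaming Response, so it can be consumed exactly like the single-model route in Step 3 (the prompt text is only illustrative):
const res = await generateWithFallback('Draft an opening paragraph for a demand letter.')

// Read the winning model's stream chunk by chunk
const reader = res.body!.getReader()
const decoder = new TextDecoder()
let output = ''
while (true) {
  const { done, value } = await reader.read()
  if (done) break
  output += decoder.decode(value, { stream: true })
}
console.log(output)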
Step 5: Cost Optimization
Token Usage Tracking
Always track token usage to control costs. Here's how:
import { encoding_for_model } from 'tiktoken'
function estimateTokens(text: string, model: string): number {
const encoding = encoding_for_model(model as any)
const tokens = encoding.encode(text)
encoding.free()
return tokens.length
}
// Before API call
const promptTokens = estimateTokens(prompt, 'gpt-4')
const estimatedCost = (promptTokens / 1000) * 0.03 // GPT-4 input pricing: $0.03 per 1K prompt tokens
console.log(`Estimated cost: $${estimatedCost.toFixed(4)}`)
// Track in database
await db.aiUsage.create({
data: {
userId: user.id,
model: 'gpt-4',
promptTokens,
completionTokens: 0, // Update after response
cost: estimatedCost,
},
})
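Estimates work as a pre-flight guard, but for non-streaming calls the OpenAI response includes exact counts in response.usage, so you can record real numbers instead. A sketch, assuming the same Prisma-style db.aiUsage model as above and standard GPT-4 pricing ($0.03 per 1K input tokens, $0.06 per 1K output tokens):
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: prompt }],
})

// usage is populated on non-streaming responses
const usage = response.usage
if (usage) {
  await db.aiUsage.create({
    data: {
      userId: user.id,
      model: 'gpt-4',
      promptTokens: usage.prompt_tokens,
      completionTokens: usage.completion_tokens,
      // input and output tokens are priced separately
      cost:
        (usage.prompt_tokens / 1000) * 0.03 +
        (usage.completion_tokens / 1000) * 0.06,
    },
  })
}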
Step 6: Prompt Engineering Best Practices
Use System Messages
Define the AI's role and behavior upfront. With OpenAI this is a system message; with Claude's Messages API, pass the same text via the top-level system parameter:
messages: [
{
role: 'system',
content: 'You are an expert legal document assistant...'
},
{ role: 'user', content: userPrompt }
]
Provide Examples (Few-Shot Learning)
Show the AI what you want with examples:
const prompt = `
Given this medical record:
${record}
Generate a summary like this example:
Example Input: "Patient complained of back pain..."
Example Output: "L4-L5 disc herniation with radiculopathy"
Now generate summary for the above record:
`
Request Structured Output
For reliable parsing, use JSON mode:
const response = await openai.chat.completions.create({
model: 'gpt-4-turbo-preview',
response_format: { type: 'json_object' },
messages: [{
role: 'user',
content: 'Extract medical conditions as JSON array...'
}]
})
const data = JSON.parse(response.choices[0].message.content ?? '{}')
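JSON mode guarantees syntactically valid JSON, not that it matches the shape you expect. A small validation step catches schema drift before it reaches your UI. This sketch validates the data object parsed above with zod (an extra dependency not used elsewhere in this guide) against a hypothetical conditions field:
import { z } from 'zod'

// Hypothetical shape for the extraction prompt above
const ExtractionSchema = z.object({
  conditions: z.array(z.string()),
})

const parsed = ExtractionSchema.safeParse(data)
if (!parsed.success) {
  // Schema mismatch: log it, then retry or fall back to another model
  console.error('Unexpected AI output shape:', parsed.error.flatten())
} else {
  console.log('Extracted conditions:', parsed.data.conditions)
}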
Real-World Example: DemandAI.Pro
In DemandAI.Pro, I use this exact pattern:
- GPT-4 Turbo for initial demand letter generation (faster, cheaper)
- Claude 3 Opus for medical record analysis (more thorough)
- Fallback to GPT-4 if either fails or times out
- Streaming for real-time user feedback
- Token tracking to prevent cost overruns
Result: This multi-model approach reduced generation failures from 12% to less than 1%, while keeping costs 40% lower than using only GPT-4.
Error Handling & Rate Limits
async function generateWithRetry(
prompt: string,
maxRetries = 3
): Promise<string> {
for (let i = 0; i < maxRetries; i++) {
try {
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: prompt }],
})
return response.choices[0].message.content || ''
} catch (error: any) {
// Rate limit (429): back off exponentially, then retry
if (error.status === 429 && i < maxRetries - 1) {
const waitTime = Math.pow(2, i) * 1000 // 1s, 2s, 4s, ...
await new Promise((resolve) => setTimeout(resolve, waitTime))
continue
}
// Any other error, or the final attempt: rethrow
throw error
}
}
throw new Error('Max retries exceeded')
}
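One refinement: when the thrown error exposes response headers (recent openai SDK errors do), honoring a server-provided Retry-After value beats a blind backoff. A hedged sketch; treat the header access as an assumption and fall back to exponential backoff when it is missing:
function retryDelayMs(error: any, attempt: number): number {
  // Assumption: the SDK error carries response headers (e.g. openai APIError on recent versions)
  const retryAfter = Number(error?.headers?.['retry-after'])
  if (Number.isFinite(retryAfter) && retryAfter > 0) {
    return retryAfter * 1000
  }
  // Otherwise: exponential backoff with a little jitter
  return Math.pow(2, attempt) * 1000 + Math.random() * 250
}

// In the catch block above: await new Promise((r) => setTimeout(r, retryDelayMs(error, i)))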
Production Checklist
- API keys live in server-side environment variables and never reach the client
- Long generations stream to the user instead of blocking on the full response
- A fallback model and retry-with-backoff cover outages and rate limits
- Token usage and cost are tracked per request and per user
- Structured (JSON) output is validated before it is parsed and stored
- Prompts define a system role, include examples, and specify the output format
Conclusion
Integrating AI models into production apps requires more than just API calls. You need streaming for UX, fallback strategies for reliability, cost tracking for sustainability, and careful prompt engineering for quality results.
The patterns shown here power DemandAI.Pro's ability to generate professional legal documents in minutes instead of hours, with a 99%+ success rate.
Need Help with AI Integration?
I help companies integrate GPT-4, Claude, and other LLMs into their products with production-ready architecture, cost optimization, and reliability patterns.