Integrating GPT-4 & Claude AI in Next.js: Production Guide
Real-world implementation: This guide covers the exact AI integration patterns I used in DemandAI.Pro to reduce legal document generation from 8+ hours to 5 minutes. You'll learn streaming responses, multi-model strategies, error handling, and cost optimization.
Why Use Both GPT-4 and Claude?
In DemandAI.Pro, I use both OpenAI GPT-4 and Anthropic Claude for different tasks. Here's why you might want to do the same (a small routing sketch follows the two lists):
GPT-4 Strengths
- Excellent for creative content generation
- Strong JSON output formatting
- Better for marketing copy
- Function calling for structured data
Claude Strengths
- Superior for long-form content
- Better at following complex instructions
- More careful/conservative responses
- Excellent for legal/medical text
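To make the split concrete, here is a minimal routing sketch. The task names and the pickModel helper are hypothetical (nothing here comes from either SDK); the point is simply to keep the task-to-model decision in one place:
// Hypothetical task categories used to route requests to a model
type TaskType = 'creative-copy' | 'structured-extraction' | 'long-form-legal' | 'document-analysis'

// Map each task to a default model based on the strengths listed above
function pickModel(task: TaskType): string {
  switch (task) {
    case 'creative-copy':
    case 'structured-extraction':
      return 'gpt-4-turbo-preview' // creative output, strong JSON/function calling
    case 'long-form-legal':
    case 'document-analysis':
      return 'claude-3-opus-20240229' // long, careful, instruction-heavy work
    default:
      return 'gpt-4'
  }
}

// Example: pickModel('long-form-legal') returns 'claude-3-opus-20240229'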
Step 1: Environment Setup
# Install dependencies
npm install openai @anthropic-ai/sdk
# .env.local
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
Step 2: Create API Route with Streaming
Create app/api/ai/generate/route.ts:
import { OpenAI } from 'openai'
import Anthropic from '@anthropic-ai/sdk'
import { NextRequest, NextResponse } from 'next/server'
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY!,
})
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY!,
})
export const runtime = 'edge'
export async function POST(req: NextRequest) {
try {
const { prompt, model = 'gpt-4', temperature = 0.7 } = await req.json()
if (model.startsWith('gpt')) {
return handleOpenAI(prompt, model, temperature)
} else if (model.startsWith('claude')) {
return handleClaude(prompt, model, temperature)
}
return NextResponse.json(
{ error: 'Invalid model specified' },
{ status: 400 }
)
} catch (error) {
console.error('AI Generation Error:', error)
return NextResponse.json(
{ error: 'Failed to generate content' },
{ status: 500 }
)
}
}
async function handleOpenAI(
prompt: string,
model: string,
temperature: number
) {
const stream = await openai.chat.completions.create({
model: model,
messages: [{ role: 'user', content: prompt }],
temperature: temperature,
stream: true,
})
const encoder = new TextEncoder()
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of stream) {
const text = chunk.choices[0]?.delta?.content || ''
controller.enqueue(encoder.encode(text))
}
controller.close()
},
})
return new Response(readable, {
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
},
})
}
async function handleClaude(
prompt: string,
model: string,
temperature: number
) {
const stream = await anthropic.messages.stream({
model: model,
max_tokens: 4096,
temperature: temperature,
messages: [{ role: 'user', content: prompt }],
})
const encoder = new TextEncoder()
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of stream) {
if (chunk.type === 'content_block_delta') {
const text = chunk.delta.text || ''
controller.enqueue(encoder.encode(text))
}
}
controller.close()
},
})
return new Response(readable, {
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
},
})
}
Step 3: Client-Side Streaming Component
'use client'
import { useState } from 'react'
export default function AIGenerator() {
const [prompt, setPrompt] = useState('')
const [response, setResponse] = useState('')
const [loading, setLoading] = useState(false)
const [model, setModel] = useState('gpt-4')
const handleGenerate = async () => {
setLoading(true)
setResponse('')
try {
const res = await fetch('/api/ai/generate', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ prompt, model, temperature: 0.7 }),
})
if (!res.ok || !res.body) throw new Error('Generation failed')
const reader = res.body.getReader()
const decoder = new TextDecoder()
while (true) {
const { done, value } = await reader.read()
if (done) break
const text = decoder.decode(value, { stream: true })
setResponse((prev) => prev + text)
}
} catch (error) {
console.error('Error:', error)
alert('Failed to generate content')
} finally {
setLoading(false)
}
}
return (
<div className="space-y-4">
<select
value={model}
onChange={(e) => setModel(e.target.value)}
className="w-full px-4 py-2 rounded-lg bg-slate-800 text-white"
>
<option value="gpt-4">GPT-4</option>
<option value="gpt-4-turbo-preview">GPT-4 Turbo</option>
<option value="claude-3-opus-20240229">Claude 3 Opus</option>
<option value="claude-3-sonnet-20240229">Claude 3 Sonnet</option>
</select>
<textarea
value={prompt}
onChange={(e) => setPrompt(e.target.value)}
placeholder="Enter your prompt..."
rows={6}
className="w-full px-4 py-2 rounded-lg bg-slate-800 text-white"
/>
<button
onClick={handleGenerate}
disabled={loading || !prompt}
className="w-full px-4 py-2 rounded-lg bg-purple-600 hover:bg-purple-700 text-white font-semibold disabled:opacity-50"
>
{loading ? 'Generating...' : 'Generate'}
</button>
{response && (
<div className="bg-slate-800 rounded-lg p-4 whitespace-pre-wrap text-gray-300">
{response}
</div>
)}
</div>
)
}
Step 4: Advanced Pattern - Fallback Strategy
In production, I use a fallback strategy: if one model fails or is slow, automatically try the other:
async function generateWithFallback(prompt: string) {
const models = [
{ name: 'gpt-4-turbo-preview', timeout: 30000 },
{ name: 'claude-3-opus-20240229', timeout: 30000 },
{ name: 'gpt-4', timeout: 60000 },
]
for (const { name, timeout } of models) {
try {
const controller = new AbortController()
const timeoutId = setTimeout(() => controller.abort(), timeout)
const result = await fetch('/api/ai/generate', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ prompt, model: name }),
signal: controller.signal,
})
clearTimeout(timeoutId)
if (result.ok) {
return result // Success!
}
} catch (error) {
console.error(`${name} failed:`, error)
// Continue to next model
}
}
throw new Error('All models failed')
}
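A quick usage sketch for the helper above: whichever model wins still returns a streaming Response, so it can be consumed exactly like the single-model route in Step 3 (the prompt text is only illustrative):
const res = await generateWithFallback('Draft an opening paragraph for a demand letter.')

// Read the winning model's stream chunk by chunk
const reader = res.body!.getReader()
const decoder = new TextDecoder()
let output = ''
while (true) {
  const { done, value } = await reader.read()
  if (done) break
  output += decoder.decode(value, { stream: true })
}
console.log(output)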
Step 5: Cost Optimization
Token Usage Tracking
Always track token usage to control costs. Here's how:
import { encoding_for_model } from 'tiktoken'
function estimateTokens(text: string, model: string): number {
const encoding = encoding_for_model(model as any)
const tokens = encoding.encode(text)
encoding.free()
return tokens.length
}
// Before API call
const promptTokens = estimateTokens(prompt, 'gpt-4')
const estimatedCost = (promptTokens / 1000) * 0.03 // GPT-4 input pricing: $0.03 per 1K prompt tokens
console.log(`Estimated cost: $${estimatedCost.toFixed(4)}`)
// Track in database
await db.aiUsage.create({
data: {
userId: user.id,
model: 'gpt-4',
promptTokens,
completionTokens: 0, // Update after response
cost: estimatedCost,
},
})
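Estimates work as a pre-flight guard, but for non-streaming calls the OpenAI response includes exact counts in response.usage, so you can record real numbers instead. A sketch, assuming the same Prisma-style db.aiUsage model as above and standard GPT-4 pricing ($0.03 per 1K input tokens, $0.06 per 1K output tokens):
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: prompt }],
})

// usage is populated on non-streaming responses
const usage = response.usage
if (usage) {
  await db.aiUsage.create({
    data: {
      userId: user.id,
      model: 'gpt-4',
      promptTokens: usage.prompt_tokens,
      completionTokens: usage.completion_tokens,
      // input and output tokens are priced separately
      cost:
        (usage.prompt_tokens / 1000) * 0.03 +
        (usage.completion_tokens / 1000) * 0.06,
    },
  })
}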
Step 6: Prompt Engineering Best Practices
Use System Messages
Define the AI's role and behavior upfront. With OpenAI this is a system message; with Claude's Messages API, pass the same text via the top-level system parameter:
messages: [
{
role: 'system',
content: 'You are an expert legal document assistant...'
},
{ role: 'user', content: userPrompt }
]
Provide Examples (Few-Shot Learning)
Show the AI what you want with examples:
const prompt = `
Given this medical record:
${record}
Generate a summary like this example:
Example Input: "Patient complained of back pain..."
Example Output: "L4-L5 disc herniation with radiculopathy"
Now generate summary for the above record:
`
Request Structured Output
For reliable parsing, use JSON mode:
const response = await openai.chat.completions.create({
model: 'gpt-4-turbo-preview',
response_format: { type: 'json_object' },
messages: [{
role: 'user',
content: 'Extract medical conditions as JSON array...'
}]
})
const data = JSON.parse(response.choices[0].message.content ?? '{}')
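JSON mode guarantees syntactically valid JSON, not that it matches the shape you expect. A small validation step catches schema drift before it reaches your UI. This sketch validates the data object parsed above with zod (an extra dependency not used elsewhere in this guide) against a hypothetical conditions field:
import { z } from 'zod'

// Hypothetical shape for the extraction prompt above
const ExtractionSchema = z.object({
  conditions: z.array(z.string()),
})

const parsed = ExtractionSchema.safeParse(data)
if (!parsed.success) {
  // Schema mismatch: log it, then retry or fall back to another model
  console.error('Unexpected AI output shape:', parsed.error.flatten())
} else {
  console.log('Extracted conditions:', parsed.data.conditions)
}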
Real-World Example: DemandAI.Pro
In DemandAI.Pro, I use this exact pattern:
- GPT-4 Turbo for initial demand letter generation (faster, cheaper)
- Claude 3 Opus for medical record analysis (more thorough)
- Fallback to GPT-4 if either fails or times out
- Streaming for real-time user feedback
- Token tracking to prevent cost overruns
Result: This multi-model approach reduced generation failures from 12% to less than 1%, while keeping costs 40% lower than using only GPT-4.
Error Handling & Rate Limits
async function generateWithRetry(
prompt: string,
maxRetries = 3
): Promise<string> {
for (let i = 0; i < maxRetries; i++) {
try {
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: prompt }],
})
return response.choices[0].message.content || ''
} catch (error: any) {
// Rate limit (429): back off exponentially, then retry
if (error.status === 429 && i < maxRetries - 1) {
const waitTime = Math.pow(2, i) * 1000 // 1s, 2s, 4s, ...
await new Promise((resolve) => setTimeout(resolve, waitTime))
continue
}
// Any other error, or the final attempt: rethrow
throw error
}
}
throw new Error('Max retries exceeded')
}
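One refinement: when the thrown error exposes response headers (recent openai SDK errors do), honoring a server-provided Retry-After value beats a blind backoff. A hedged sketch; treat the header access as an assumption and fall back to exponential backoff when it is missing:
function retryDelayMs(error: any, attempt: number): number {
  // Assumption: the SDK error carries response headers (e.g. openai APIError on recent versions)
  const retryAfter = Number(error?.headers?.['retry-after'])
  if (Number.isFinite(retryAfter) && retryAfter > 0) {
    return retryAfter * 1000
  }
  // Otherwise: exponential backoff with a little jitter
  return Math.pow(2, attempt) * 1000 + Math.random() * 250
}

// In the catch block above: await new Promise((r) => setTimeout(r, retryDelayMs(error, i)))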
Production Checklist
- API keys live in server-side environment variables and never reach the client
- Long generations stream to the user instead of blocking on the full response
- A fallback model and retry-with-backoff cover outages and rate limits
- Token usage and cost are tracked per request and per user
- Structured (JSON) output is validated before it is parsed and stored
- Prompts define a system role, include examples, and specify the output format
Conclusion
Integrating AI models into production apps requires more than just API calls. You need streaming for UX, fallback strategies for reliability, cost tracking for sustainability, and careful prompt engineering for quality results.
The patterns shown here power DemandAI.Pro's ability to generate professional legal documents in minutes instead of hours, with a 99%+ success rate.
Need Help with AI Integration?
I help companies integrate GPT-4, Claude, and other LLMs into their products with production-ready architecture, cost optimization, and reliability patterns.