# Chat debugging guide
## Quick API test (curl)
To verify the chat API without the UI (e.g. after hitting a rate limit, or to check answer content):
```bash
curl -s -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is Luis good at?"}]}'
```
Expect a 200 response with plain text or streamed chunks. The UI strips `[Reasoning]` and `[Answer]` markers and shows only the answer. On a 429 (rate limit), wait a minute or use a different IP.
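If you prefer a script, here is a minimal sketch that runs the same check with Node 18+ `fetch` and prints streamed chunks as they arrive. The endpoint and payload come from the curl example above; the file name and runner are illustrative:

```ts
// test-chat.ts: stream a chat answer to stdout (Node 18+; run with e.g. tsx).
async function testChat(): Promise<void> {
  const res = await fetch("http://localhost:3000/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [{ role: "user", content: "What is Luis good at?" }],
    }),
  });
  console.log("status:", res.status);
  if (!res.ok || !res.body) return;

  // Print chunks as they arrive instead of waiting for the full body.
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value, { stream: true }));
  }
}

testChat().catch(console.error);
```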
## Where to look
### 0. War Room + Cloud Logging (when chat shows "Failed to get response")
- War Room (`/war-room`): the Recent errors section lists the last 20 server-recorded errors (endpoint, status, message, `trace_id`). Use Explain with AI on an error row to send it to the same Inferencia endpoint and get a plain-language explanation and fix suggestions.
- Cloud Logging (below): use the `trace_id` from the error (or from the `X-Trace-Id` response header) to find the full request in Logs Explorer.
In GCP Console → Logging → Logs Explorer, filter by:
- `resource.type="cloud_run_revision"` and `resource.labels.service_name="YOUR_SERVICE"`
- Add `jsonPayload.endpoint="/api/chat"` or `textPayload=~"Chat"` to see chat requests and errors.

Check for 429 (rate limit), 503 (LLM/config or inference error), and `severity=ERROR` for stack traces. Trace IDs are in `jsonPayload.trace_id` and in the `X-Trace-Id` response header for correlating with client errors.
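To correlate a failing client request with Logs Explorer, here is a small hedged sketch that captures the `X-Trace-Id` header on a non-2xx response. The header name and the `jsonPayload.trace_id` filter come from above; the function itself is illustrative:

```ts
// Grab the trace id of a failed chat request for lookup in Logs Explorer.
async function reportChatError(): Promise<void> {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "user", content: "ping" }] }),
  });
  if (!res.ok) {
    const traceId = res.headers.get("X-Trace-Id");
    // Paste into Logs Explorer as: jsonPayload.trace_id="<traceId>"
    console.error(`[chat:client] status ${res.status}, trace_id=${traceId}`);
  }
}
```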
### 1. Terminal (Next.js dev server)
When you send a message, you should see logs like:
- `[chat] POST /api/chat received`
- `[chat] rateLimit check`: `allowed` should be `true` if limits are disabled
- `[chat] lastMessage length` / `messages count`
- `[chat] cache HIT`: only if the query was cached (then no model is called)
- `[chat] models order`: e.g. `['local', 'tunnel']` (localhost first when running locally)
- `[chat] trying provider local baseURL http://localhost:11973/v1 attempt 1`
- `[chat] first chunk received` (`done`, `size`): if you see this, the provider responded
- `[chat] response from: local`: success path
- Or: `[chat] provider error` (`label`, `name`, `message`, `isRetryable`), then `[chat] local failed, trying next model` and the same for tunnel
- Or: `[chat] all models exhausted`: both local and tunnel failed
If the tunnel returns an HTML error page, you’ll see either:
- `[chat] provider returned error page, first 120 chars: <!DOCTYPE...` and then try next model, or
- `[chat] provider error` with a message containing "Not Found" / "offline" and then try next model.
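The log lines above trace a simple failover loop. Here is a hedged sketch of that flow: the provider labels, the local baseURL, and the log text come from the logs above, while `callProvider`, the tunnel URL, and the OpenAI-compatible path are assumptions:

```ts
type Provider = { label: string; baseURL: string };

// Order mirrors the "[chat] models order" log: localhost first when running locally.
const providers: Provider[] = [
  { label: "local", baseURL: "http://localhost:11973/v1" },
  { label: "tunnel", baseURL: "https://example.ngrok.app/v1" }, // hypothetical tunnel URL
];

// Assumed helper: the baseURLs end in /v1, so an OpenAI-compatible call is plausible.
async function callProvider(p: Provider, messages: unknown[]): Promise<Response> {
  return fetch(`${p.baseURL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages }),
  });
}

async function askWithFailover(messages: unknown[]): Promise<Response> {
  for (const p of providers) {
    try {
      console.log(`[chat] trying provider ${p.label} baseURL ${p.baseURL} attempt 1`);
      const res = await callProvider(p, messages);
      // Simplified: buffer a clone to detect an HTML error page from a dead tunnel.
      const body = await res.clone().text();
      if (body.startsWith("<!DOCTYPE")) {
        console.log(`[chat] provider returned error page, first 120 chars: ${body.slice(0, 120)}`);
        continue; // try next model
      }
      console.log(`[chat] response from: ${p.label}`);
      return res;
    } catch (err) {
      const e = err as Error;
      console.log("[chat] provider error", { label: p.label, name: e.name, message: e.message });
      console.log(`[chat] ${p.label} failed, trying next model`);
    }
  }
  console.log("[chat] all models exhausted");
  return Response.json({ message: "AI model temporarily unavailable" }, { status: 503 });
}
```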
### 2. Browser console (F12 → Console)
When you send a message:
- `[chat:client] sending request, message: <your text>`
- `[chat:client] response` (`ok`, `status`, `contentType`)
- If `ok: false`, you'll see `[chat:client] error response` and the error message in the chat.
- If `contentType` is `application/json`, the API returned an error (e.g. 503).
- If streaming: `[chat:client] stream done` (`chunkCount`, `length`, `usedDataStream`)
- If something's wrong: `[chat:client] got N chunks but no content parsed` or `[chat:client] catch` with the error.
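Those client logs correspond to response handling roughly like the following hedged sketch. The log labels come from above; `errorData` and the function shape are illustrative:

```ts
async function readChatResponse(res: Response): Promise<string> {
  const contentType = res.headers.get("content-type") ?? "";
  console.log("[chat:client] response", { ok: res.ok, status: res.status, contentType });

  if (!res.ok || contentType.includes("application/json")) {
    // A JSON body means the API returned an error (e.g. 503), not a stream.
    const errorData = await res.json().catch(() => null);
    console.log("[chat:client] error response", errorData);
    throw new Error(errorData?.message ?? "Failed to get response");
  }

  // Simplified: buffer the whole body; the real client parses chunks
  // incrementally (see the stream-format sketch in section 3 below).
  return res.text();
}
```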
### 3. Stream format fix
The client now supports both:
- Data stream: lines like `0:{"text":"Hello"}` (parsed and appended).
- Plain text stream: raw text chunks from `toTextStreamResponse()` (appended as-is).
So you should see a reply in the UI when the API returns either format.
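Here is a hedged sketch of a parser that accepts both formats. The `0:{"text":"Hello"}` line shape and `toTextStreamResponse()` come from above; the buffering and detection details are illustrative:

```ts
// Parse a chat body that may be a data stream (lines like 0:{"text":"Hello"})
// or a plain text stream. Simplified: buffers the full body before parsing.
async function parseChatStream(body: ReadableStream<Uint8Array>): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let raw = "";
  let chunkCount = 0;

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    raw += decoder.decode(value, { stream: true });
    chunkCount++;
  }

  // Data-stream lines carry a "0:" prefix; collect their text parts.
  const usedDataStream = /^0:/m.test(raw);
  let text = "";
  if (usedDataStream) {
    for (const line of raw.split("\n")) {
      const match = line.match(/^0:(.*)$/);
      if (match) text += JSON.parse(match[1]).text ?? "";
    }
  } else {
    text = raw; // plain text stream: append as-is
  }

  console.log("[chat:client] stream done", { chunkCount, length: text.length, usedDataStream });
  return text;
}
```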
## Quick checklist
- No reply at all
  - Terminal: do you see `[chat] trying provider local`, then `provider error` for both local and tunnel, then `all models exhausted`? → Start the local model server (e.g. port 11973) or fix the tunnel.
  - Terminal: do you see `[chat] first chunk received` and `[chat] response from: local` but still no reply in the UI? → Check the browser console for `[chat:client] response` and `stream done`; ensure `contentType` and chunk handling look correct.
  - Browser: `[chat:client] response` has `ok: false`? → Read `errorData` and show the user the `message` (already shown in the chat bubble).
- Rate limit / 429
  - Rate limits are enabled in production. For local debugging, set `RATE_LIMITS_DISABLED=true` in `.env.local` to bypass `checkRateLimit` (see the sketch after this checklist).
- Tunnel offline (ngrok 404)
  - The route tries local first when running locally. If the local server is not running, it then tries the tunnel; if the tunnel is offline, you get a 503 and the "AI model temporarily unavailable" message in the chat.
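For the rate-limit bypass referenced above, a minimal sketch of how the env flag could gate `checkRateLimit`. The env var and function names come from this guide; the result shape and limiting logic are assumptions:

```ts
type RateLimitResult = { allowed: boolean; retryAfterSeconds?: number }; // assumed shape

export function checkRateLimit(ip: string): RateLimitResult {
  // Local-debugging escape hatch: RATE_LIMITS_DISABLED=true in .env.local.
  if (process.env.RATE_LIMITS_DISABLED === "true") {
    return { allowed: true };
  }
  // ...production per-IP limiting; a denied request surfaces as a 429...
  return { allowed: true };
}
```

With the flag set, the terminal log `[chat] rateLimit check` should report `allowed: true`.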