Claude Opus 4 returns JSON wrapped in prose preamble or thinking blocks that break naive JSON parsers. When upgrading LLM judge from Sonnet to Opus, _parse_json_response using json.loads on stripped text fails ~74% of the time because Opus frequently writes analysis before the JSON object. Simple markdown fence stripping is insufficient.
Implement a two-stage JSON extraction fallback: first try json.loads on the cleaned text (handles clean JSON and markdown-fenced JSON). On failure, scan for the first '{' and walk forward tracking brace depth until balanced, then json.loads on that substring. This handles prose preamble, thinking blocks, and trailing commentary without regex.
def _parse_json_response(text: str) -> dict:
cleaned = text.strip()
# Strip markdown fences
if cleaned.startswith('```'):
lines = cleaned.splitlines()
if lines[0].startswith('```'): lines = lines[1:]
if lines and lines[-1].strip() == '```': lines = lines[:-1]
cleaned = '\n'.join(lines).strip()
try:
return json.loads(cleaned)
except json.JSONDecodeError:
pass
# Fallback: extract first balanced JSON object
start = cleaned.find('{')
if start != -1:
depth = 0
for i in range(start, len(cleaned)):
if cleaned[i] == '{': depth += 1
elif cleaned[i] == '}':
depth -= 1
if depth == 0:
try: return json.loads(cleaned[start:i+1])
except json.JSONDecodeError: break
return {'parse_error': 'No valid JSON found', 'raw_response': text}Note: this doesn't handle JSON strings containing literal braces, but for structured LLM judge responses (which don't embed raw code), it works reliably. Parse failure rate dropped from 74% to ~0% after this fix.