Code Samples

Ready-to-run examples covering the main hallx patterns. Each sample can be saved as a .py file and run directly. Install hallx first: pip install hallx.

In this page

  1. Basic sync check — minimal usage, no adapter
  2. Async check with OpenAI adapter — grounding context included
  3. Async check without context — demonstrates skip penalty weighting
  4. Recommendation-driven retry — retry loop with temperature adjustment
  5. Strict blocking mode — raises exception on high-risk response
  6. Feedback storage & calibration — local SQLite + calibration report
  7. Async generation + feedback loop — combined async + feedback in one loop
basic_sync.py
Minimal synchronous hallucination check. No adapter required — pass the response directly. Shows all result fields.
sync balanced no adapter
Python
from hallx import Hallx

checker = Hallx(profile="balanced")

result = checker.check(
    prompt="Summarize refund policy",
    response={"summary": "Refunds are allowed within 30 days."},
    context=["Refunds are allowed within 30 days of purchase."],
    schema={
        "type": "object",
        "properties": {"summary": {"type": "string"}},
        "required": ["summary"],
        "additionalProperties": False,
    },
)

print(f"confidence : {result.confidence:.3f}")
print(f"risk_level : {result.risk_level}")
print(f"scores : {result.scores}")
print(f"issues : {result.issues}")
print(f"action : {result.recommendation['action']}")
async_openai_adapter.py
Async check using the OpenAI adapter with grounding context. The adapter generates the response; hallx scores it.
async openai grounding
Python
import asyncio

from hallx import Hallx
from hallx.adapters import OpenAIAdapter

# Set OPENAI_API_KEY in your environment before running
adapter = OpenAIAdapter(model="gpt-4o-mini")
checker = Hallx(profile="balanced", adapter=adapter)

CONTEXT = [
    "Ibuprofen is a nonsteroidal anti-inflammatory drug (NSAID).",
    "Common side effects include nausea, stomach pain, heartburn, and dizziness.",
    "Serious side effects may include stomach bleeding, kidney problems, and increased "
    "risk of heart attack or stroke.",
]

async def main():
    result = await checker.check_async(
        prompt="What are the side effects of ibuprofen?",
        context=CONTEXT,
    )
    print(f"confidence : {result.confidence:.3f}")
    print(f"risk_level : {result.risk_level}")
    print(f"grounding : {result.scores.get('grounding'):.3f}")
    if result.risk_level == "high":
        print("⚠ High risk — do not serve this response.")
        for issue in result.issues:
            print(f"  • {issue}")
    else:
        print("✓ Response passed hallucination check.")

asyncio.run(main())
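hallx computes its grounding signal internally, but the intuition behind it can be sketched with a naive lexical overlap score. The helper below is a hypothetical illustration only — `lexical_grounding` is not part of hallx and real grounding checks are far more sophisticated:

```python
import re

def lexical_grounding(response: str, context: list[str]) -> float:
    """Naive grounding proxy: fraction of response words found in the context."""
    context_words = set()
    for passage in context:
        context_words.update(re.findall(r"[a-z']+", passage.lower()))
    response_words = re.findall(r"[a-z']+", response.lower())
    if not response_words:
        return 0.0
    hits = sum(1 for w in response_words if w in context_words)
    return hits / len(response_words)

score = lexical_grounding(
    "Ibuprofen can cause nausea and dizziness.",
    ["Common side effects include nausea, stomach pain, heartburn, and dizziness."],
)
print(f"grounding (lexical proxy): {score:.2f}")  # 3 of 6 words appear in the context
```

A purely lexical proxy like this misses paraphrase and negation, which is exactly why a dedicated checker is worth using.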
async_openai_adapter_no_context.py
Same setup but no context provided. Demonstrates how skip penalty affects the confidence score when grounding cannot be checked.
async openai skip penalty
Python
import asyncio

from hallx import Hallx
from hallx.adapters import OpenAIAdapter

adapter = OpenAIAdapter(model="gpt-4o-mini")
checker = Hallx(profile="balanced", adapter=adapter)

async def main():
    # No context= passed — only the schema and consistency signals run.
    # The grounding check is skipped, incurring a skip_penalty of 0.25.
    result = await checker.check_async(
        prompt="Explain the water cycle.",
    )
    print(f"confidence : {result.confidence:.3f}")
    print(f"risk_level : {result.risk_level}")
    print(f"scores : {result.scores}")
    # Expect lower confidence than with context — skip penalty applied
    print("note: grounding was skipped (no context provided)")

asyncio.run(main())
retry_strategy.py
Implements a retry loop driven by result.recommendation. On each retry, temperature is reduced using the suggested value from hallx.
retry recommendation production pattern
Python
from hallx import Hallx

def call_llm(prompt: str, temperature: float = 0.7) -> str:
    # Replace with your actual LLM call
    return "Model response here"

checker = Hallx(profile="balanced")

prompt = "What is the boiling point of water at sea level?"
context = ["Water boils at 100°C (212°F) at standard atmospheric pressure."]
temperature = 0.7
MAX_RETRIES = 3
result = None

for attempt in range(1, MAX_RETRIES + 1):
    response = call_llm(prompt, temperature=temperature)
    result = checker.check(prompt=prompt, response=response, context=context)
    print(f"attempt {attempt}: confidence={result.confidence:.3f} action={result.recommendation['action']}")
    if result.recommendation["action"] == "proceed":
        print("✓ Passed — using this response.")
        break
    temperature = result.recommendation.get("suggested_temperature", 0.3)
    print(f"  retrying with temperature={temperature}")
else:
    print("All retries exhausted — escalating to human review.")
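The loop above can be factored into a reusable helper. This sketch is checker-agnostic — `generate_with_check` is a hypothetical wrapper that only assumes the check result exposes a `recommendation` dict the way hallx's results do:

```python
def generate_with_check(generate, check, max_retries: int = 3, temperature: float = 0.7):
    """Generate, check, and retry with the suggested temperature until the check passes.

    generate: callable(temperature) -> response, e.g. your LLM call.
    check:    callable(response) -> result with a .recommendation dict, e.g. checker.check.
    Returns (response, result) on success, or (None, last_result) when retries run out.
    """
    result = None
    for _ in range(max_retries):
        response = generate(temperature)
        result = check(response)
        if result.recommendation["action"] == "proceed":
            return response, result
        # Fall back to a conservative temperature if none is suggested
        temperature = result.recommendation.get("suggested_temperature", 0.3)
    return None, result
```

In the sample above you would pass `lambda t: call_llm(prompt, temperature=t)` as `generate` and `lambda r: checker.check(prompt=prompt, response=r, context=context)` as `check`, then escalate to human review when the returned response is `None`.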
strict_mode.py
Uses strict=True to raise HallxHighRiskError instead of returning a result on high-risk responses. Recommended for automated sensitive paths.
strict exception handling gating
Python
from hallx import Hallx, HallxHighRiskError

checker = Hallx(strict=True, profile="strict")

def serve_response(result):
    # Replace with your delivery logic
    print("serving checked response")

def serve_safe_fallback():
    # Replace with your fallback logic
    print("serving safe fallback message")

prompt = "What is the maximum safe dose of acetaminophen per day?"
context = [
    "The maximum recommended dose of acetaminophen for adults is 4,000 mg per day.",
    "Exceeding this dose can cause serious liver damage.",
]

try:
    result = checker.check(
        prompt=prompt,
        response="The maximum safe dose is 4 grams (4,000 mg) per day for adults.",
        context=context,
    )
    # Only reached when risk_level is NOT "high"
    print(f"✓ Low risk: confidence={result.confidence:.3f}")
    serve_response(result)
except HallxHighRiskError as e:
    # High-risk response blocked automatically
    print(f"✗ Blocked: {e}")
    serve_safe_fallback()
feedback_calibration.py
Records human-reviewed outcomes to a local SQLite database and generates a calibration report to tune thresholds to your real-world data.
feedback calibration sqlite
Python
from hallx import Hallx

checker = Hallx(
    profile="balanced",
    feedback_db_path="/var/lib/myapp/hallx.sqlite3",
)

PAIRS = [
    ("What is 2+2?", "4", "correct"),
    ("Capital of UK?", "Paris", "hallucinated"),
    ("Speed of light?", "299,792,458 m/s", "correct"),
]

for prompt, response, label in PAIRS:
    result = checker.check(prompt=prompt, response=response)
    checker.record_outcome(
        result=result,
        label=label,
        metadata={"reviewer": "qa-team"},
        prompt=prompt,
        response_excerpt=response,
    )
    print(f"recorded: {prompt!r} → {label}")

# Generate calibration report for the last 30 days
report = checker.calibration_report(window_days=30)
print("\n── Calibration Report ──")
print(f"hallucination_rate : {report['hallucination_rate']:.2%}")
print(f"suggested_threshold : {report['suggested_threshold']:.3f}")
print(f"threshold_metrics : {report['threshold_metrics']}")
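How a suggested threshold is derived is internal to hallx, but the core idea — sweeping a confidence cutoff over labeled outcomes — can be sketched in a few lines. `suggest_threshold` below is a hypothetical illustration, not hallx's algorithm:

```python
def suggest_threshold(outcomes: list[tuple[float, str]]) -> float:
    """Pick the confidence cutoff that best separates correct from hallucinated outcomes.

    outcomes: (confidence, label) pairs, label in {"correct", "hallucinated"}.
    Maximizes accuracy of the rule: serve the response iff confidence >= threshold.
    """
    best_t, best_acc = 0.0, -1.0
    for t in sorted({c for c, _ in outcomes}):
        correct = sum(
            1 for c, label in outcomes
            if (c >= t) == (label == "correct")
        )
        acc = correct / len(outcomes)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

threshold = suggest_threshold([
    (0.95, "correct"),
    (0.40, "hallucinated"),
    (0.90, "correct"),
    (0.55, "hallucinated"),
])
print(f"suggested threshold: {threshold:.2f}")  # 0.90 cleanly separates the two classes
```

A real calibration pass would also weigh false positives against false negatives (blocking a correct answer is usually cheaper than serving a hallucinated one), which is what the report's `threshold_metrics` are for.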
async_openai_feedback_calibration.py
Combines async generation via the OpenAI adapter with immediate feedback recording in a single loop. Suitable for production batch review pipelines.
async openai feedback calibration
Python
import asyncio

from hallx import Hallx
from hallx.adapters import OpenAIAdapter

adapter = OpenAIAdapter(model="gpt-4o-mini")
checker = Hallx(
    profile="balanced",
    adapter=adapter,
    feedback_db_path="hallx_feedback.sqlite3",
)

DATASET = [
    {
        "prompt": "What causes the Northern Lights?",
        "context": ["The aurora borealis is caused by solar particles interacting with Earth's magnetic field."],
        "label": "correct",
    },
    {
        "prompt": "Who invented the telephone?",
        "context": ["Alexander Graham Bell is credited with inventing the telephone in 1876."],
        "label": "correct",
    },
]

async def run_batch():
    for item in DATASET:
        result = await checker.check_async(
            prompt=item["prompt"],
            context=item["context"],
        )
        checker.record_outcome(
            result=result,
            label=item["label"],
            prompt=item["prompt"],
        )
        print(f"{item['prompt']!r}: confidence={result.confidence:.3f} risk={result.risk_level}")

    report = checker.calibration_report(window_days=7)
    print(f"\nsuggested_threshold: {report['suggested_threshold']:.3f}")

asyncio.run(run_batch())
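For larger review batches, the sequential loop above can be parallelized. This sketch bounds in-flight requests with a semaphore; `run_batch_concurrent` is a hypothetical helper, and its `check` argument stands in for a coroutine like `checker.check_async`:

```python
import asyncio

async def run_batch_concurrent(dataset, check, limit: int = 5):
    """Check every item concurrently, with at most `limit` requests in flight."""
    sem = asyncio.Semaphore(limit)

    async def check_one(item):
        async with sem:  # avoid flooding the model provider
            result = await check(prompt=item["prompt"], context=item["context"])
            return item["prompt"], result

    # gather preserves dataset order in its results
    return await asyncio.gather(*(check_one(item) for item in dataset))
```

You would pass `checker.check_async` as `check`, then iterate the returned (prompt, result) pairs and call `checker.record_outcome` as in the sequential version.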