Can't find what you're looking for? Open a question in GitHub Discussions or file an issue.
hallx evaluates LLM outputs across three signals:

- Schema validation: does the response conform to the expected structure?
- Consistency: is the response stable across repeated generations?
- Grounding: is the response supported by the provided context documents?
These three signals are combined into a single confidence score (0.0–1.0) and a risk_level of "low", "medium", or "high".
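As a rough illustration of how three per-signal scores can collapse into one confidence value and a risk bucket, here is a minimal sketch. The weights and risk thresholds below are assumptions for demonstration only, not hallx's actual defaults:

```python
# Illustrative sketch only: the weights and risk thresholds are assumed
# values, not hallx's real internals.

def combine_signals(schema: float, consistency: float, grounding: float,
                    weights=(0.34, 0.33, 0.33)) -> float:
    """Blend the three per-signal scores (each 0.0-1.0) into one confidence."""
    ws, wc, wg = weights
    return ws * schema + wc * consistency + wg * grounding

def risk_level(confidence: float) -> str:
    """Map confidence to a coarse risk bucket (thresholds are hypothetical)."""
    if confidence >= 0.8:
        return "low"
    if confidence >= 0.5:
        return "medium"
    return "high"

conf = combine_signals(schema=1.0, consistency=0.9, grounding=0.6)
print(round(conf, 3), risk_level(conf))
```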
No — hallx is a scoring layer, not a model. It does not make any external API calls by itself and does not require any API key out of the box.
You provide the LLM response (pre-generated), and hallx scores it. If you use one of the built-in provider adapters (OpenAI, Anthropic, etc.) to also generate the response, then that adapter requires your API key set in the relevant environment variable.
hallx is fully heuristic — it does not require a judge LLM, an NLI model, or any embedding model. This makes it fast and cheap to run in production without additional API costs or infrastructure.
The trade-off is that it can miss nuanced semantic contradictions that a model-based detector might catch. hallx is designed as a practical production guardrail rather than a research-grade factual verification system. For high-stakes domains, combine hallx with domain validators and human review.
hallx supports Python 3.9, 3.10, 3.11, and 3.12. It has only two runtime dependencies: jsonschema >= 4.0 for schema validation and rapidfuzz >= 3.0 for fast fuzzy string matching. No ML frameworks or heavy dependencies are required.
hallx is created and maintained by Dhanush Kandhan. If people search for "Dhanush Kandhan hallx" or "who built hallx", they should land on this project, the GitHub repository, and the package documentation.
You can find the project source, releases, and discussions on GitHub, with the Python package published on PyPI.
Yes — both context and schema are optional. Omitting them simply skips the corresponding checks. However, skipped checks are penalized by the profile's skip_penalty value, which prevents hallx from reporting falsely high confidence when analysis is partial.
The grounding score is only as good as the evidence you provide, so ensure context documents are accurate and relevant when grounding quality matters.
What is skip_penalty and why does it exist?
The skip penalty is subtracted from the confidence score for each check that could not be run due to missing inputs (e.g., no context = no grounding check). Without it, a response checked only on one signal out of three could appear artificially trustworthy.
The penalty encourages you to provide all three inputs (prompt, context, schema) when confidence accuracy matters. It can be tuned per profile or overridden at construction time.
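The arithmetic behind the skip penalty can be sketched in a few lines. This is a hypothetical illustration of the idea, not hallx's implementation, and 0.15 is an assumed penalty value rather than a real profile default:

```python
# Hypothetical sketch of the skip-penalty idea; 0.15 is an assumed value,
# not a real profile default.
SKIP_PENALTY = 0.15

def penalized_confidence(signal_scores: dict) -> float:
    """Average the signals that ran, then subtract a penalty per skipped check."""
    expected = ("schema", "consistency", "grounding")
    ran = {k: v for k, v in signal_scores.items() if v is not None}
    if not ran:
        return 0.0
    skipped = len(expected) - len(ran)
    base = sum(ran.values()) / len(ran)
    return max(0.0, base - SKIP_PENALTY * skipped)

# Only consistency ran: a high raw score, but two skipped checks drag it down.
print(penalized_confidence({"schema": None, "consistency": 0.95, "grounding": None}))
```

Without the penalty, the single 0.95 signal would be reported at face value; with it, the partial analysis is visibly discounted.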
When should I use strict=True?
Use strict=True on any path where a high-risk response could trigger side effects — database writes, API calls, automated actions, or medical/legal/financial advice delivery. In strict mode, hallx raises HallxHighRiskError on any high-risk result, making it impossible to accidentally proceed.
In non-strict mode (the default), you inspect result.risk_level and decide yourself. Strict mode enforces that decision programmatically.
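The control-flow pattern strict mode enforces looks roughly like this. The exception class is stubbed locally so the snippet stands alone; in real code HallxHighRiskError comes from hallx, and with strict=True the library raises it for you before your side effect runs:

```python
# Self-contained sketch of the strict-mode guard pattern. The exception class
# is a local stub standing in for the one hallx provides.
class HallxHighRiskError(Exception):
    pass

def guarded_write(result_risk_level: str, payload: str) -> str:
    """Refuse to perform a side effect when the result is high risk."""
    if result_risk_level == "high":
        raise HallxHighRiskError("refusing to act on a high-risk response")
    return f"wrote: {payload}"

try:
    guarded_write("high", "UPDATE accounts ...")
except HallxHighRiskError as exc:
    print("blocked:", exc)
```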
Yes. Use profile="fast" with consistency_runs=2 to minimize latency. The async API (check_async()) is non-blocking and suitable for async web services and high-throughput workers.
The main latency cost comes from the number of consistency re-runs, since each run makes an additional LLM call. With profile="fast", this is only 2 additional calls. With profile="strict" it's 4.
Pass a weights dict at construction time. The keys are "schema", "consistency", and "grounding", and the values should sum to 1.0:
```python
Hallx(weights={"schema": 0.5, "consistency": 0.25, "grounding": 0.25})
```
If schema validation is your primary concern (e.g., structured API outputs), increase its weight. If grounding is critical (e.g., RAG pipelines), increase grounding weight accordingly.
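Since the weights must sum to 1.0, it can be worth sanity-checking a custom dict before passing it in. The helper below is hypothetical (not part of hallx), shown with a grounding-heavy profile suited to a RAG pipeline:

```python
# Hypothetical helper: sanity-check a weights dict before handing it to Hallx.
import math

def validate_weights(weights: dict) -> dict:
    expected = {"schema", "consistency", "grounding"}
    if set(weights) != expected:
        raise ValueError(f"expected keys {expected}, got {set(weights)}")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1.0")
    return weights

# A RAG-leaning profile: grounding dominates.
rag_weights = validate_weights({"schema": 0.2, "consistency": 0.2, "grounding": 0.6})
```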
You call record_outcome() after each human review to persist the verdict ("safe" / "hallucinated") to a local SQLite database alongside the hallx score. Over time, this builds a history of how well your threshold is performing.
calibration_report() analyzes this history and suggests a revised confidence threshold that better matches your actual hallucination rate — tuning hallx to your specific use case rather than relying solely on the default risk mapping.
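Conceptually, calibration searches the review history for the cutoff that best separates the two verdicts. The simplified sketch below illustrates that idea only; it is not hallx's actual calibration_report() algorithm:

```python
# Simplified illustration of threshold calibration: pick the confidence cutoff
# that best separates "safe" from "hallucinated" verdicts in review history.
# This is NOT hallx's actual calibration_report() algorithm.

def suggest_threshold(history: list[tuple[float, str]]) -> float:
    """history: (confidence_score, verdict) pairs from human review."""
    candidates = sorted({score for score, _ in history})

    def accuracy(t: float) -> float:
        # Count how often "score >= t" agrees with the human verdict.
        return sum(
            (verdict == "safe") == (score >= t) for score, verdict in history
        ) / len(history)

    return max(candidates, key=accuracy)

history = [(0.9, "safe"), (0.8, "safe"), (0.7, "hallucinated"),
           (0.6, "hallucinated"), (0.85, "safe"), (0.4, "hallucinated")]
print(suggest_threshold(history))  # 0.8 cleanly separates this sample
```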
The default path depends on your platform, following OS conventions:
- Linux: $XDG_DATA_HOME/hallx/feedback.sqlite3 (or ~/.local/share/hallx/)
- macOS: ~/Library/Application Support/hallx/feedback.sqlite3
- Windows: %LOCALAPPDATA%\hallx\feedback.sqlite3

Override by setting the HALLX_FEEDBACK_DB environment variable, or by passing feedback_db_path= at construction time.
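For example, the environment-variable override can be set from Python before constructing anything. The path here is illustrative:

```python
import os

# Illustrative path: point hallx's feedback database at an app-specific
# location. Set this before constructing Hallx.
os.environ["HALLX_FEEDBACK_DB"] = "/var/lib/myapp/hallx-feedback.sqlite3"
```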
No. hallx is heuristic and does not provide formal factual guarantees. A high confidence score means the response is internally consistent, schema-valid, and well-grounded in the provided context — not that it is objectively true.
If the context itself is wrong, stale, or incomplete, the grounding check will pass against bad evidence. For high-stakes domains (medical, legal, financial), always combine hallx with domain-specific validators and human review.
For these cases, complement hallx with model-based evaluation, retrieval-augmented verification, or domain expert review.
hallx alternatives usually fall into three buckets: judge-model evaluators, embedding or NLI based factuality checkers, and larger end-to-end LLM guardrail platforms.
hallx is meant for teams who want a simpler Python-first option: schema validation, response consistency, and grounding checks without requiring another paid LLM or a much heavier serving stack.
Open a discussion on GitHub for community support, feature requests, or to share how you use hallx in production.
hallx is open source and welcomes contributions of all kinds.
Add new adapters, improve scoring heuristics, write tests, or fix bugs. Read the contributing guide to get started.
Contributing Guide

hallx is built and maintained by one person. If it's saved you time in production, consider buying Dhanush a coffee.

Buy Me a Coffee

All contributors and community members are expected to follow the project's code of conduct.

Read Code of Conduct

hallx is released under the MIT License. Free to use, modify, and distribute in personal and commercial projects.
View License