Can't find what you're looking for? Open a question in GitHub Discussions or file an issue.
hallx evaluates LLM outputs across three signals:

- Schema validation: does the response conform to the expected structure?
- Consistency: is the response stable across repeated generations?
- Grounding: is the response supported by the provided context documents?
These three signals are combined into a single confidence score (0.0–1.0) and a risk_level of "low", "medium", or "high".
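As a rough illustration of how three per-signal scores can collapse into one confidence value and a risk bucket, here is a minimal sketch. The weights and risk thresholds below are assumptions for demonstration only, not hallx's actual defaults:

```python
# Illustrative sketch only: the weights and risk thresholds are assumed
# values, not hallx's real internals.

def combine_signals(schema: float, consistency: float, grounding: float,
                    weights=(0.34, 0.33, 0.33)) -> float:
    """Blend the three per-signal scores (each 0.0-1.0) into one confidence."""
    ws, wc, wg = weights
    return ws * schema + wc * consistency + wg * grounding

def risk_level(confidence: float) -> str:
    """Map confidence to a coarse risk bucket (thresholds are hypothetical)."""
    if confidence >= 0.8:
        return "low"
    if confidence >= 0.5:
        return "medium"
    return "high"

conf = combine_signals(schema=1.0, consistency=0.9, grounding=0.6)
print(round(conf, 3), risk_level(conf))
```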
No — hallx is a scoring layer, not a model. It does not make any external API calls by itself and does not require any API key out of the box.
You provide the LLM response (pre-generated), and hallx scores it. If you use one of the built-in provider adapters (OpenAI, Anthropic, etc.) to also generate the response, then that adapter requires your API key set in the relevant environment variable.
hallx is fully heuristic — it does not require a judge LLM, an NLI model, or any embedding model. This makes it fast and cheap to run in production without additional API costs or infrastructure.
The trade-off is that it can miss nuanced semantic contradictions that a model-based detector might catch. hallx is designed as a practical production guardrail rather than a research-grade factual verification system. For high-stakes domains, combine hallx with domain validators and human review.
hallx supports Python 3.9, 3.10, 3.11, and 3.12. It has only two runtime dependencies: jsonschema >= 4.0 for schema validation and rapidfuzz >= 3.0 for fast fuzzy string matching. No ML frameworks or heavy dependencies are required.
hallx is created and maintained by Dhanush Kandhan. If people search for "Dhanush Kandhan hallx" or "who built hallx", they should land on this project, the GitHub repository, and the package documentation.
You can find the project source, releases, and discussions on GitHub, with the Python package published on PyPI.
Yes — both context and schema are optional. Omitting them simply skips the corresponding checks. However, skipped checks are penalized by the profile's skip_penalty value, which prevents hallx from reporting falsely high confidence when analysis is partial.
The grounding score is only as good as the evidence you provide, so ensure context documents are accurate and relevant when grounding quality matters.
What is skip_penalty and why does it exist?
The skip penalty is subtracted from the confidence score for each check that could not be run due to missing inputs (e.g., no context = no grounding check). Without it, a response checked only on one signal out of three could appear artificially trustworthy.
The penalty encourages you to provide all three inputs (prompt, context, schema) when confidence accuracy matters. It can be tuned per profile or overridden at construction time.
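The arithmetic behind the skip penalty can be sketched in a few lines. This is a hypothetical illustration of the idea, not hallx's implementation, and 0.15 is an assumed penalty value rather than a real profile default:

```python
# Hypothetical sketch of the skip-penalty idea; 0.15 is an assumed value,
# not a real profile default.
SKIP_PENALTY = 0.15

def penalized_confidence(signal_scores: dict) -> float:
    """Average the signals that ran, then subtract a penalty per skipped check."""
    expected = ("schema", "consistency", "grounding")
    ran = {k: v for k, v in signal_scores.items() if v is not None}
    if not ran:
        return 0.0
    skipped = len(expected) - len(ran)
    base = sum(ran.values()) / len(ran)
    return max(0.0, base - SKIP_PENALTY * skipped)

# Only consistency ran: a high raw score, but two skipped checks drag it down.
print(penalized_confidence({"schema": None, "consistency": 0.95, "grounding": None}))
```

Without the penalty, the single 0.95 signal would be reported at face value; with it, the partial analysis is visibly discounted.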
When should I use strict=True?
Use strict=True on any path where a high-risk response could trigger side effects — database writes, API calls, automated actions, or medical/legal/financial advice delivery. In strict mode, hallx raises HallxHighRiskError on any high-risk result, making it impossible to accidentally proceed.
In non-strict mode (the default), you inspect result.risk_level and decide yourself. Strict mode enforces that decision programmatically.
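The control-flow pattern strict mode enforces looks roughly like this. The exception class is stubbed locally so the snippet stands alone; in real code HallxHighRiskError comes from hallx, and with strict=True the library raises it for you before your side effect runs:

```python
# Self-contained sketch of the strict-mode guard pattern. The exception class
# is a local stub standing in for the one hallx provides.
class HallxHighRiskError(Exception):
    pass

def guarded_write(result_risk_level: str, payload: str) -> str:
    """Refuse to perform a side effect when the result is high risk."""
    if result_risk_level == "high":
        raise HallxHighRiskError("refusing to act on a high-risk response")
    return f"wrote: {payload}"

try:
    guarded_write("high", "UPDATE accounts ...")
except HallxHighRiskError as exc:
    print("blocked:", exc)
```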
Yes. Use profile="fast" with consistency_runs=2 to minimize latency. The async API (check_async()) is non-blocking and suitable for async web services and high-throughput workers.
The main latency cost comes from the number of consistency re-runs, since each run makes an additional LLM call. With profile="fast", this is only 2 additional calls. With profile="strict" it's 4.
Pass a weights dict at construction time. The keys are "schema", "consistency", and "grounding", and the values should sum to 1.0:
```python
Hallx(weights={"schema": 0.5, "consistency": 0.25, "grounding": 0.25})
```
If schema validation is your primary concern (e.g., structured API outputs), increase its weight. If grounding is critical (e.g., RAG pipelines), increase grounding weight accordingly.
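Since the weights must sum to 1.0, it can be worth sanity-checking a custom dict before passing it in. The helper below is hypothetical (not part of hallx), shown with a grounding-heavy profile suited to a RAG pipeline:

```python
# Hypothetical helper: sanity-check a weights dict before handing it to Hallx.
import math

def validate_weights(weights: dict) -> dict:
    expected = {"schema", "consistency", "grounding"}
    if set(weights) != expected:
        raise ValueError(f"expected keys {expected}, got {set(weights)}")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1.0")
    return weights

# A RAG-leaning profile: grounding dominates.
rag_weights = validate_weights({"schema": 0.2, "consistency": 0.2, "grounding": 0.6})
```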
You call record_outcome() after each human review to persist the verdict ("safe" / "hallucinated") to a local SQLite database alongside the hallx score. Over time, this builds a history of how well your threshold is performing.
calibration_report() analyzes this history and suggests a revised confidence threshold that better matches your actual hallucination rate — tuning hallx to your specific use case rather than relying solely on the default risk mapping.
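Conceptually, calibration searches the review history for the cutoff that best separates the two verdicts. The simplified sketch below illustrates that idea only; it is not hallx's actual calibration_report() algorithm:

```python
# Simplified illustration of threshold calibration: pick the confidence cutoff
# that best separates "safe" from "hallucinated" verdicts in review history.
# This is NOT hallx's actual calibration_report() algorithm.

def suggest_threshold(history: list[tuple[float, str]]) -> float:
    """history: (confidence_score, verdict) pairs from human review."""
    candidates = sorted({score for score, _ in history})

    def accuracy(t: float) -> float:
        # Count how often "score >= t" agrees with the human verdict.
        return sum(
            (verdict == "safe") == (score >= t) for score, verdict in history
        ) / len(history)

    return max(candidates, key=accuracy)

history = [(0.9, "safe"), (0.8, "safe"), (0.7, "hallucinated"),
           (0.6, "hallucinated"), (0.85, "safe"), (0.4, "hallucinated")]
print(suggest_threshold(history))  # 0.8 cleanly separates this sample
```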
The default path depends on your platform, following OS conventions:
- Linux: $XDG_DATA_HOME/hallx/feedback.sqlite3 (or ~/.local/share/hallx/)
- macOS: ~/Library/Application Support/hallx/feedback.sqlite3
- Windows: %LOCALAPPDATA%\hallx\feedback.sqlite3

Override by setting the HALLX_FEEDBACK_DB environment variable, or by passing feedback_db_path= at construction time.
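For example, the environment-variable override can be set from Python before constructing anything. The path here is illustrative:

```python
import os

# Illustrative path: point hallx's feedback database at an app-specific
# location. Set this before constructing Hallx.
os.environ["HALLX_FEEDBACK_DB"] = "/var/lib/myapp/hallx-feedback.sqlite3"
```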
No. hallx is heuristic and does not provide formal factual guarantees. A high confidence score means the response is internally consistent, schema-valid, and well-grounded in the provided context — not that it is objectively true.
If the context itself is wrong, stale, or incomplete, the grounding check will pass against bad evidence. For high-stakes domains (medical, legal, financial), always combine hallx with domain-specific validators and human review.
For these cases, complement hallx with model-based evaluation, retrieval-augmented verification, or domain expert review.
hallx alternatives usually fall into three buckets: judge-model evaluators, embedding or NLI based factuality checkers, and larger end-to-end LLM guardrail platforms.
hallx is meant for teams who want a simpler Python-first option: schema validation, response consistency, and grounding checks without requiring another paid LLM or a much heavier serving stack.
Open a discussion on GitHub for community support, feature requests, or to share how you use hallx in production.
hallx is open source and welcomes contributions of all kinds.
Add new adapters, improve scoring heuristics, write tests, or fix bugs. Read the contributing guide to get started.
Contributing Guide

hallx is built and maintained by one person. If it's saved you time in production, consider buying Dhanush a coffee.

Buy Me a Coffee

All contributors and community members are expected to follow the project's code of conduct.

Read Code of Conduct

hallx is released under the MIT License. Free to use, modify, and distribute in personal and commercial projects.
View License