Hallucination
The hallucination metric determines whether your LLM generates factually correct information by comparing the actual_output to the provided context.
If you're looking to evaluate hallucination for a RAG system, please refer to the faithfulness metric instead.
Required Arguments
To use the HallucinationMetric, you'll have to provide the following arguments when creating an LLMTestCase:
input
actual_output
context
Remember, input and actual_output are mandatory arguments to an LLMTestCase and so are always required even if not used for evaluation.
Example
from deepeval import evaluate
from deepeval.metrics import HallucinationMetric
from deepeval.test_case import LLMTestCase
# Replace this with the actual documents that you are passing as input to your LLM.
context = ["A man with blond-hair, and a brown shirt drinking out of a public water fountain."]
# Replace this with the actual output from your LLM application
actual_output = "A blond drinking water in public."
test_case = LLMTestCase(
    input="What was the blond doing?",
    actual_output=actual_output,
    context=context,
)
metric = HallucinationMetric(threshold=0.5)
metric.measure(test_case)
print(metric.score)
print(metric.reason)
# or evaluate test cases in bulk
evaluate([test_case], [metric])
There are five optional parameters when creating a HallucinationMetric:
- [Optional] threshold: a float representing the maximum passing threshold, defaulted to 0.5.
- [Optional] model: a string specifying which of OpenAI's GPT models to use, OR any custom LLM model of type DeepEvalBaseLLM. Defaulted to 'gpt-4o'.
- [Optional] include_reason: a boolean which, when set to True, will include a reason for its evaluation score. Defaulted to True.
- [Optional] strict_mode: a boolean which, when set to True, enforces a binary metric score: 0 for perfection, 1 otherwise. It also overrides the current threshold and sets it to 0. Defaulted to False.
- [Optional] async_mode: a boolean which, when set to True, enables concurrent execution within the measure() method. Defaulted to True.
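Because a lower hallucination score is better, threshold acts as an upper bound rather than a lower one. The sketch below illustrates how threshold and strict_mode interact as described in the list above. The passes helper is hypothetical and written only for illustration; it is not part of deepeval and makes no claim about how the library implements this internally.

```python
def passes(score: float, threshold: float = 0.5, strict_mode: bool = False) -> bool:
    """Hypothetical helper: pass/fail decision for a lower-is-better score."""
    if strict_mode:
        # Strict mode: the score becomes binary (0 only for a perfect result,
        # 1 otherwise) and the threshold is overridden to 0.
        score = 0.0 if score == 0 else 1.0
        threshold = 0.0
    # The threshold is a maximum: a score at or below it passes.
    return score <= threshold


print(passes(0.5))                     # at the default threshold
print(passes(0.25, strict_mode=True))  # any contradiction fails in strict mode
```

Under these assumptions, a score of 0.25 passes with the default threshold but fails once strict_mode is enabled.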
How Is It Calculated?
The HallucinationMetric score is calculated according to the following equation:
Hallucination = Number of Contradicted Contexts / Total Number of Contexts
The HallucinationMetric uses an LLM to determine, for each context in contexts, whether there are any contradictions to the actual_output.
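As a rough sketch of the final aggregation step, assume the LLM has already produced one verdict per context saying whether the actual_output contradicts it, and that the score is the fraction of contradicted contexts. The hallucination_score function and the verdicts list are hypothetical names used for illustration; the LLM judging step itself is not shown, and rejecting an empty list is an assumption of this sketch.

```python
from typing import Sequence


def hallucination_score(contradicted: Sequence[bool]) -> float:
    """Hypothetical helper: fraction of contexts contradicted by the output.

    `contradicted` holds one boolean verdict per context, where True means
    the actual output contradicts that context.
    """
    if not contradicted:
        # Assumption of this sketch: at least one context must be supplied.
        raise ValueError("at least one context is required")
    return sum(contradicted) / len(contradicted)


# One of four contexts is contradicted by the output.
verdicts = [True, False, False, False]
print(hallucination_score(verdicts))  # 0.25
```

A score of 0 means no context was contradicted, and a score of 1 means every context was.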
Although extremely similar to the FaithfulnessMetric, the HallucinationMetric is calculated differently since it uses contexts as the source of truth instead. Since contexts is the ideal segment of your knowledge base relevant to a specific input, the degree of hallucination can be measured by the degree to which the actual_output contradicts the contexts.