Hugging Face
Quick Summary
Hugging Face provides developers with a comprehensive suite of pre-trained NLP models through its transformers library. To recap, here is how you can use Mistral's mistralai/Mistral-7B-v0.1 model through Hugging Face's transformers library:
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
prompt = "My favourite condiment is"
model_inputs = tokenizer([prompt], return_tensors="pt").to(device)
model.to(device)
generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
print(tokenizer.batch_decode(generated_ids)[0])
# "The expected output"
Evals During Fine-Tuning
deepeval integrates with Hugging Face's transformers.Trainer module through the DeepEvalHuggingFaceCallback, enabling real-time evaluation of LLM outputs during model fine-tuning, at the end of each epoch.
In this section, we'll walk through an example of fine-tuning Mistral's 7B model.
Prepare Dataset for Fine-tuning
from transformers import AutoTokenizer
from datasets import load_dataset

####################
### Load dataset ###
####################
training_dataset = load_dataset("text", data_files={"train": "train.txt"})

########################
### Tokenize dataset ###
########################
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
# Mistral's tokenizer has no padding token by default, so reuse the EOS token
tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_dataset = training_dataset.map(tokenize_function, batched=True)
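As an optional sanity check (an illustrative snippet, not part of the original walkthrough), you can inspect one tokenized example to confirm that input_ids and attention_mask were added alongside the original text column:

# Peek at the first training example produced by the map() call above
print(tokenized_dataset["train"][0].keys())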
Setup Training Arguments
from transformers import TrainingArguments
...
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
)
Initialize LLM and Trainer for Fine-Tuning
from transformers import AutoModelForCausalLM, Trainer
...
######################
### Initialize LLM ###
######################
llm = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
##########################
### Initialize Trainer ###
##########################
trainer = Trainer(
    model=llm,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
)
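Note that the tokenized dataset above only contains input_ids and attention_mask, so the Trainer has no labels from which to compute a causal language modeling loss. A minimal sketch of one common fix is shown below, using transformers' DataCollatorForLanguageModeling with mlm=False so that labels are derived from input_ids; this collator is not part of the original snippet and is an assumption about your setup:

from transformers import DataCollatorForLanguageModeling

# Copies input_ids into labels for causal LM fine-tuning (mlm=False disables masking)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=llm,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    data_collator=data_collator,
)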
Define Evaluation Criteria
Use deepeval to define an EvaluationDataset and the metrics you want to evaluate your LLM on:
from deepeval.test_case import LLMTestCaseParams
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.metrics import GEval
first_golden = Golden(input="...")
second_golden = Golden(input="...")
dataset = EvaluationDataset(goldens=[first_golden, second_golden])
coherence_metric = GEval(
    name="Coherence",
    criteria="Coherence - determine if the actual output is coherent with the input.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)
We initialize an EvaluationDataset with goldens instead of test cases since we're running inference at evaluation time.
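Conceptually, each Golden's input is fed to the model being fine-tuned, and the generated text becomes the actual_output of an LLMTestCase that the metrics then score. The callback handles this for you; the rough sketch below is illustrative only, with "<model generation>" standing in for the model's output:

from deepeval.test_case import LLMTestCase

# Illustration only: at evaluation time, a golden plus a fresh model generation
# forms a complete test case that metrics like GEval can score
test_case = LLMTestCase(input=first_golden.input, actual_output="<model generation>")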
Fine-tune and Evaluate
Then, create a DeepEvalHuggingFaceCallback:
from deepeval.integrations.hugging_face import DeepEvalHuggingFaceCallback
...
deepeval_hugging_face_callback = DeepEvalHuggingFaceCallback(
    evaluation_dataset=dataset,
    metrics=[coherence_metric],
    trainer=trainer,
)
The DeepEvalHuggingFaceCallback accepts the following arguments:
- metrics: the deepeval evaluation metrics you wish to leverage.
- evaluation_dataset: a deepeval EvaluationDataset.
- aggregation_method: a string of either 'avg', 'min', or 'max' to determine how metric scores are aggregated.
- trainer: a transformers.Trainer instance.
- tokenizer_args: arguments for the tokenizer.
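For example, a call that sets every argument explicitly might look like the sketch below. The aggregation_method value is one of the strings listed above, while the tokenizer_args value shown here is an assumption for illustration (a plain dict of tokenizer keyword arguments), not part of the original example:

deepeval_hugging_face_callback = DeepEvalHuggingFaceCallback(
    evaluation_dataset=dataset,
    metrics=[coherence_metric],
    trainer=trainer,
    aggregation_method="avg",  # average metric scores across goldens
    tokenizer_args={"padding": "max_length", "truncation": True},  # assumed tokenizer kwargs
)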
Lastly, add deepeval_hugging_face_callback to your transformers.Trainer, and begin fine-tuning:
...
#############################
### Add DeepEval Callback ###
#############################
trainer.add_callback(deepeval_hugging_face_callback)
#########################
### Start Fine-tuning ###
#########################
trainer.train()
With this setup, evaluations will be run on the entirety of your EvaluationDataset, according to the metrics you defined, at the end of each epoch.