Hugging Face

Quick Summary

Hugging Face provides developers with a comprehensive suite of pre-trained NLP models through its transformers library. To recap, here is how you can use Mistral's mistralai/Mistral-7B-v0.1 model through Hugging Face's transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

prompt = "My favourite condiment is"

# Tokenize the prompt and move the inputs and model onto the device
model_inputs = tokenizer([prompt], return_tensors="pt").to(device)
model.to(device)

# Sample up to 100 new tokens and decode the generated continuation
generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
print(tokenizer.batch_decode(generated_ids)[0])

Evals During Fine-Tuning

deepeval integrates with Hugging Face's transformers.Trainer module through the DeepEvalHuggingFaceCallback, enabling real-time evaluation of your LLM's outputs at the end of each epoch during fine-tuning.

info

In this section, we'll walk through an example of fine-tuning Mistral's 7B model.

Prepare Dataset for Fine-tuning

from transformers import AutoTokenizer
from datasets import load_dataset

####################
### Load dataset ###
####################
training_dataset = load_dataset("text", data_files={"train": "train.txt"})

########################
### Tokenize dataset ###
########################
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token  # Mistral's tokenizer has no pad token by default

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_dataset = training_dataset.map(tokenize_function, batched=True)
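Note that the "text" loader treats each line of train.txt as one training example. An optional sanity check confirms the dataset was loaded and tokenized as expected:

# Optional sanity check: tokenization adds "input_ids" and "attention_mask"
# columns alongside the original "text" column.
print(training_dataset)
print(tokenized_dataset["train"][0]["input_ids"][:10])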

Set Up Training Arguments

from transformers import TrainingArguments
...

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
)
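Fine-tuning a 7B-parameter model is memory-intensive. As a sketch (the values below are illustrative assumptions, not recommendations), TrainingArguments also accepts flags such as gradient_accumulation_steps and fp16 that can help on limited hardware:

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # simulate a larger effective batch size
    fp16=True,                      # mixed-precision training to reduce memory usage
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
)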

Initialize LLM and Trainer for Fine-Tuning

from transformers import AutoModelForCausalLM, DataCollatorForLanguageModeling, Trainer
...

######################
### Initialize LLM ###
######################
llm = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")


##########################
### Initialize Trainer ###
##########################
# The collator builds the `labels` field a causal LM needs to compute a loss
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=llm,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    data_collator=data_collator,
)

Define Evaluation Criteria

Use deepeval to define an EvaluationDataset and the metrics you want to evaluate your LLM on:

from deepeval.test_case import LLMTestCaseParams
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.metrics import GEval

first_golden = Golden(input="...")
second_golden = Golden(input="...")

dataset = EvaluationDataset(goldens=[first_golden, second_golden])

coherence_metric = GEval(
    name="Coherence",
    criteria="Coherence - determine if the actual output is coherent with the input.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)

info

We initialize the EvaluationDataset with goldens instead of test cases because the actual outputs don't exist yet; they are generated by the model at evaluation time.
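Conceptually (a sketch of the idea, not deepeval's actual internals), each golden becomes a full test case once the model has produced an output for its input:

from deepeval.test_case import LLMTestCase

# At evaluation time, the callback fills in actual_output by running the model
# on the golden's input; the placeholder string below is illustrative.
test_case = LLMTestCase(
    input=first_golden.input,
    actual_output="<text generated by the fine-tuned model>",
)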

Fine-tune and Evaluate

Then, create a DeepEvalHuggingFaceCallback:

from deepeval.integrations.hugging_face import DeepEvalHuggingFaceCallback
...

deepeval_hugging_face_callback = DeepEvalHuggingFaceCallback(
    evaluation_dataset=dataset,
    metrics=[coherence_metric],
    trainer=trainer
)

The DeepEvalHuggingFaceCallback accepts the following arguments:

  • metrics: the deepeval evaluation metrics you wish to leverage.
  • evaluation_dataset: a deepeval EvaluationDataset.
  • aggregation_method: a string of either 'avg', 'min', or 'max' that determines how metric scores are aggregated across test cases.
  • trainer: a transformers.Trainer instance.
  • tokenizer_args: arguments for the tokenizer (see the sketch below).
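For instance, a call using the optional arguments might look like this (the tokenizer_args values are illustrative assumptions, not required settings):

deepeval_hugging_face_callback = DeepEvalHuggingFaceCallback(
    evaluation_dataset=dataset,
    metrics=[coherence_metric],
    trainer=trainer,
    aggregation_method="avg",  # average each metric's scores across test cases
    tokenizer_args={"padding": True, "truncation": True},  # assumed kwargs forwarded to the tokenizer
)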

Lastly, add deepeval_hugging_face_callback to your transformers.Trainer, and begin fine-tuning:

...
#############################
### Add DeepEval Callback ###
#############################
trainer.add_callback(deepeval_hugging_face_callback)

#########################
### Start Fine-tuning ###
#########################
trainer.train()

With this setup, evaluations will be run on your entire EvaluationDataset, according to the metrics you defined, at the end of each epoch.