Hugging Face
Quick Summary
Hugging Face provides developers with a comprehensive suite of pre-trained NLP models through its transformers library. To recap, here is how you can use Mistral's mistralai/Mistral-7B-v0.1 model through Hugging Face's transformers library:
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
prompt = "My favourite condiment is"
model_inputs = tokenizer([prompt], return_tensors="pt").to(device)
model.to(device)
generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
print(tokenizer.batch_decode(generated_ids)[0])
# "The expected output"
Evals During Fine-Tuning
deepeval integrates with Hugging Face's transformers.Trainer module through the DeepEvalHuggingFaceCallback, enabling real-time evaluation of LLM outputs during model fine-tuning, at the end of each epoch.
In this section, we'll walk through an example of fine-tuning Mistral's 7B model.
Prepare Dataset for Fine-tuning
from transformers import AutoTokenizer
from datasets import load_dataset

####################
### Load dataset ###
####################
training_dataset = load_dataset("text", data_files={"train": "train.txt"})

########################
### Tokenize dataset ###
########################
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
# Mistral's tokenizer has no padding token by default, so reuse the EOS token
tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_dataset = training_dataset.map(tokenize_function, batched=True)
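As an optional sanity check (an illustrative snippet, not part of the original walkthrough), you can inspect one tokenized example to confirm that input_ids and attention_mask were added alongside the original text column:

# Peek at the first training example produced by the map() call above
print(tokenized_dataset["train"][0].keys())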
Setup Training Arguments
from transformers import TrainingArguments
...
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
)
Initialize LLM and Trainer for Fine-Tuning
from transformers import AutoModelForCausalLM, Trainer
...
######################
### Initialize LLM ###
######################
llm = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
##########################
### Initialize Trainer ###
##########################
trainer = Trainer(
    model=llm,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
)
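Note that the tokenized dataset above only contains input_ids and attention_mask, so the Trainer has no labels from which to compute a causal language modeling loss. A minimal sketch of one common fix is shown below, using transformers' DataCollatorForLanguageModeling with mlm=False so that labels are derived from input_ids; this collator is not part of the original snippet and is an assumption about your setup:

from transformers import DataCollatorForLanguageModeling

# Copies input_ids into labels for causal LM fine-tuning (mlm=False disables masking)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=llm,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    data_collator=data_collator,
)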
Define Evaluation Criteria
Use deepeval to define an EvaluationDataset and the metrics you want to evaluate your LLM on:
from deepeval.test_case import LLMTestCaseParams
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.metrics import GEval
first_golden = Golden(input="...")
second_golden = Golden(input="...")
dataset = EvaluationDataset(goldens=[first_golden, second_golden])
coherence_metric = GEval(
    name="Coherence",
    criteria="Coherence - determine if the actual output is coherent with the input.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)
We initialize an EvaluationDataset with goldens instead of test cases since we're running inference at evaluation time.
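Conceptually, each Golden's input is fed to the model being fine-tuned, and the generated text becomes the actual_output of an LLMTestCase that the metrics then score. The callback handles this for you; the rough sketch below is illustrative only, with "<model generation>" standing in for the model's output:

from deepeval.test_case import LLMTestCase

# Illustration only: at evaluation time, a golden plus a fresh model generation
# forms a complete test case that metrics like GEval can score
test_case = LLMTestCase(input=first_golden.input, actual_output="<model generation>")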
Fine-tune and Evaluate
Then, create a DeepEvalHuggingFaceCallback:
from deepeval.integrations.hugging_face import DeepEvalHuggingFaceCallback
...
deepeval_hugging_face_callback = DeepEvalHuggingFaceCallback(
    evaluation_dataset=dataset,
    metrics=[coherence_metric],
    trainer=trainer,
)
The DeepEvalHuggingFaceCallback accepts the following arguments:
- metrics: the deepeval evaluation metrics you wish to leverage.
- evaluation_dataset: a deepeval EvaluationDataset.
- aggregation_method: a string of either 'avg', 'min', or 'max' to determine how metric scores are aggregated.
- trainer: a transformers.Trainer instance.
- tokenizer_args: arguments for the tokenizer.
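For example, a call that sets every argument explicitly might look like the sketch below. The aggregation_method value is one of the strings listed above, while the tokenizer_args value shown here is an assumption for illustration (a plain dict of tokenizer keyword arguments), not part of the original example:

deepeval_hugging_face_callback = DeepEvalHuggingFaceCallback(
    evaluation_dataset=dataset,
    metrics=[coherence_metric],
    trainer=trainer,
    aggregation_method="avg",  # average metric scores across goldens
    tokenizer_args={"padding": "max_length", "truncation": True},  # assumed tokenizer kwargs
)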
Lastly, add deepeval_hugging_face_callback to your transformers.Trainer, and begin fine-tuning:
...
#############################
### Add DeepEval Callback ###
#############################
trainer.add_callback(deepeval_hugging_face_callback)
#########################
### Start Fine-tuning ###
#########################
trainer.train()
With this setup, evaluations will be run on the entirety of your EvaluationDataset, according to the metrics you defined, at the end of each epoch.