Docs: https://phasellm.com/docs/phasellm/eval.html
This project provides a unified framework to test generative language models on a large number of different evaluation tasks.
Features:
- 200+ tasks implemented. See the task-table for a complete list.
- Support for models loaded via transformers (including quantization via AutoGPTQ), GPT-NeoX, and Megatron-DeepSpeed, with a flexible tokenization-agnostic interface.
- Support for commercial APIs including OpenAI, goose.ai, and TextSynth.
- Support for evaluating adapters (e.g. LoRA) supported by HuggingFace’s PEFT library.
- Evaluating with publicly available prompts ensures reproducibility and comparability between papers.
- Task versioning to ensure reproducibility when tasks are updated.
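The "tokenization-agnostic interface" in the feature list can be illustrated with a minimal sketch, assuming a design where tasks hand the model plain strings and get back log-likelihood scores, so each backend is free to use its own tokenizer. All names here (`LanguageModel`, `loglikelihood`, `CharUnigramModel`, `score_multiple_choice`) are hypothetical and are not this project's actual API; the toy backend is a character-level unigram model that ignores context, used only to make the example runnable.

```python
from abc import ABC, abstractmethod
import math


class LanguageModel(ABC):
    """Hypothetical interface: tasks pass strings in and get scores out, never tokens."""

    @abstractmethod
    def loglikelihood(self, context: str, continuation: str) -> float:
        """Return log P(continuation | context) under the model."""


class CharUnigramModel(LanguageModel):
    """Toy backend that 'tokenizes' by character; any tokenizer could sit here instead."""

    def __init__(self, corpus: str) -> None:
        self.counts: dict[str, int] = {}
        for ch in corpus:
            self.counts[ch] = self.counts.get(ch, 0) + 1
        self.total = len(corpus)
        self.vocab = len(self.counts) + 1  # one extra slot for unseen characters

    def loglikelihood(self, context: str, continuation: str) -> float:
        # A unigram model ignores the context entirely; a real backend would condition on it.
        # Add-one smoothing gives unseen characters a non-zero probability.
        return sum(
            math.log((self.counts.get(ch, 0) + 1) / (self.total + self.vocab))
            for ch in continuation
        )


def score_multiple_choice(model: LanguageModel, question: str, choices: list[str]) -> int:
    """Return the index of the choice the model assigns the highest log-likelihood."""
    scores = [model.loglikelihood(question, choice) for choice in choices]
    return max(range(len(choices)), key=scores.__getitem__)


if __name__ == "__main__":
    model = CharUnigramModel("aaaaab")
    print(score_multiple_choice(model, "Q:", ["a", "b"]))  # "a" is more frequent, so index 0
```

Because `score_multiple_choice` only depends on the abstract `loglikelihood` method, swapping in a different backend (and therefore a different tokenizer) requires no change to the task code. Note that summed log-likelihoods favor shorter continuations; real harnesses often offer length-normalized variants, which this sketch omits.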