Azure AI Foundry customization sample datasets
This repository contains a collection of sample datasets that can be used to test and validate various customization techniques on the Azure AI Foundry. These are curated datasets created from various public HuggingFace datasets. These datasets have been created by picking top few items from the HuggingFace samples for various kinds of data: chat, multimodal, etc.
- These datasets are meant to be used as a reference to understand how to prepare training/validation data to run finetuning and other customization jobs on the Azure AI Foundry. Do not consider these samples as complete datasets for production use.
- These datasets can be used for experimental runs, and to check the end-to-end flows while running customization jobs on the Azure AI Foundry. Note, that any training jobs can incur costs on the subscription.
- These datasets are not a true representative for you to choose one customization technique over the other. To refer to techniques and when to choose what, consult the documentation.
- The datasets work best for OpenAI models, and may require adjustments for other model families like Llama, Mistral etc.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.