Azure-Samples/AIFoundry-Customization-Datasets

Azure AI Foundry customization sample datasets

What are these datasets?

This repository contains a collection of sample datasets for testing and validating customization techniques on Azure AI Foundry. The datasets are curated from public HuggingFace datasets by taking the top few items from each HuggingFace sample, and they cover several kinds of data: chat, multimodal, etc.

Purpose of these datasets

  • These datasets are meant as a reference for how to prepare training and validation data for fine-tuning and other customization jobs on Azure AI Foundry. Do not treat these samples as complete datasets for production use.
  • These datasets can be used for experimental runs and for checking the end-to-end flow of customization jobs on Azure AI Foundry. Note that any training job can incur costs on your subscription.
  • These datasets are not representative enough to justify choosing one customization technique over another. For guidance on the techniques and when to choose each, consult the documentation.
  • The datasets work best with OpenAI models and may require adjustments for other model families such as Llama or Mistral.
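
Before submitting a job that may incur costs, it can help to sanity-check a data file locally. The sketch below is a minimal example, not part of this repository. It assumes a chat dataset stored as JSONL with one `messages` list per line, as in the OpenAI chat fine-tuning format. The function name `validate_chat_jsonl` and the set of accepted roles are illustrative assumptions, and the actual files in this repository may use a different layout (for example, for multimodal data).

```python
import json

# Assumption: roles accepted in a chat-style record; adjust for your model family.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_chat_jsonl(lines):
    """Return (line_number, problem) pairs for a chat-style JSONL dataset.

    `lines` is any iterable of strings, such as an open file handle.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if not isinstance(message, dict) or message.get("role") not in ALLOWED_ROLES:
                problems.append((number, "message with missing or unknown role"))
                break
        else:
            # Every role was valid; a training example also needs a target reply.
            if not any(m["role"] == "assistant" for m in messages):
                problems.append((number, "no assistant message to learn from"))
    return problems
```

To check a file, pass an open handle, for example `validate_chat_jsonl(open("train.jsonl", encoding="utf-8"))`, where `train.jsonl` is a hypothetical file name. An empty result means no problems were found under these assumptions. It does not guarantee that the service will accept the file.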

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.
