Dataset documentation for responsible AI: analysis of suitability and usage for health datasets
Anna Heinke, Lingling Huang, Kyongmi U. Simpkins, Fritz Gerald P. Kalaw, Apoorva Karsolia, Kiratjit Singh, Sanjay Soundarajan, Benjamin Panny, Camille Nebeker, Sally L. Baxter, Cecilia S. Lee, Aaron Y. Lee & Bhavesh Patel, on behalf of the AI-READI Consortium
npj Digital Medicine
Abstract
Artificial Intelligence (AI) is rapidly transforming healthcare, but also raising concerns about algorithmic biases that mostly stem from the training data. It is widely supported that transparent dataset documentation is key to enabling responsible AI development. Several standardized dataset documentation approaches have been established, such as Datasheet, Dataset Nutrition Label, Accountability Documentation, Healthsheet, and Data Card. However, their suitability and usage for health datasets remain unclear. In this Analysis, we compared all five approaches and evaluated their alignment with the STANDING Together Recommendations for Documentation of Health Datasets. We also investigated their real-world usage and gathered insights from generators and consumers of health datasets. Our findings reveal that none of these documentation approaches are used widely or fully suited for health datasets. We recommend developing a standard documentation approach for health datasets along with clear guidelines and automation tools to support adoption.
Citation
@Article{Heinke2026, author = {Heinke, Anna and Huang, Lingling and Simpkins, Kyongmi U. and Kalaw, Fritz Gerald P. and Karsolia, Apoorva and Singh, Kiratjit and Soundarajan, Sanjay and Panny, Benjamin and Nebeker, Camille and Baxter, Sally L. and Lee, Cecilia S. and Lee, Aaron Y. and Patel, Bhavesh and Ferryman, Kadija S. and Gasimova, Aydan and Chute, Christopher G. and Mitchell, Jessica and Bangudi, Monique S. and Lucero, Abigail and Singer, Sara J. and Bahmani, Amir and Pittock, Hanna and Alavi, Arash and McGwin, Gerald and Matthies, Dawn S. and de Sa, Virginia R. and Hurst, Samantha and York, Brittany and Evans, Nicholas and Gim, Nayoon and Owen, Julia P. and Shaffer, Jamie and hen, Yi-Ju and Cuddy, Colleen and Hallaj, Shahin and Owsley, Cynthia and Gold, Sigfried and Wang, Tao and Nayebi, Ryan A. and Drag-Erlandsen, Trym and Rodger, Majid and {on behalf of the AI-READI Consortium}}, title = {Dataset documentation for responsible AI: analysis of suitability and usage for health datasets}, journal = {npj Digital Medicine}, year = {2026}, month = {May}, day = {09}, abstract = {Artificial Intelligence (AI) is rapidly transforming healthcare, but also raising concerns about algorithmic biases that mostly stem from the training data. It is widely supported that transparent dataset documentation is key to enabling responsible AI development. Several standardized dataset documentation approaches have been established, such as Datasheet, Dataset Nutrition Label, Accountability Documentation, Healthsheet, and Data Card. However, their suitability and usage for health datasets remain unclear. In this Analysis, we compared all five approaches and evaluated their alignment with the STANDING Together Recommendations for Documentation of Health Datasets. We also investigated their real-world usage and gathered insights from generators and consumers of health datasets.
Our findings reveal that none of these documentation approaches are used widely or fully suited for health datasets. We recommend developing a standard documentation approach for health datasets along with clear guidelines and automation tools to support adoption.}, issn = {2398-6352}, doi = {10.1038/s41746-026-02714-2}, url = {https://doi.org/10.1038/s41746-026-02714-2}}