List of Available Datasets

India Pathology Dataset (IPD-Brain)

The IPD-Brain Dataset comprises 547 high-resolution H&E-stained slides from 367 patients and is aimed at advancing the study of glioma subtypes and immunohistochemical (IHC) biomarkers. One of the largest datasets of its kind in Asia, it is especially notable for its focus on the diverse demographics of the Indian population. Each slide is scanned at 40x magnification, and an accompanying xlsx file provides detailed clinical annotations: patient age, sex, radiological findings, diagnosis, CNS WHO grade, and IHC biomarker status (IDH1 R132H, ATRX, p53, and the proliferation index Ki67). Open for use and governed by ethical guidelines, the IPD-Brain Dataset is designed to support a wide range of applications, from training machine learning models to exploring regional and ethnic variations in disease.
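Because the 547 slides come from 367 patients, some patients contribute more than one slide. When training machine learning models on such data, it is common to split by patient rather than by slide so that no patient appears in both the training and test sets. The following is a minimal sketch of that idea in Python; the function name, the slide and patient identifiers, and the split fraction are all hypothetical and are not taken from the dataset's files.

```python
import random
from collections import defaultdict


def split_by_patient(slide_to_patient, test_fraction=0.2, seed=0):
    """Split slide IDs into train/test lists so no patient is in both."""
    # Group slides under the patient they belong to.
    by_patient = defaultdict(list)
    for slide_id, patient_id in slide_to_patient.items():
        by_patient[patient_id].append(slide_id)

    # Shuffle patients (not slides) reproducibly, then cut off a test share.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [s for p in patients if p not in test_patients for s in by_patient[p]]
    test = [s for p in patients if p in test_patients for s in by_patient[p]]
    return train, test


# Hypothetical identifiers, for illustration only.
example = {
    "slide_001": "patient_A",
    "slide_002": "patient_A",
    "slide_003": "patient_B",
    "slide_004": "patient_C",
    "slide_005": "patient_C",
}
train_slides, test_slides = split_by_patient(example, test_fraction=0.34)
```

In practice the slide-to-patient mapping would be read from the clinical annotation file; the column holding the patient identifier is not specified here and would need to be checked against the actual xlsx file.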


Lupus Nephritis

Lupus Nephritis (LN) is one of the most severe manifestations of the autoimmune disease systemic lupus erythematosus, owing to its potential for severe renal damage and its intricate diagnostic and classification process. The Lupus Nephritis Dataset, which is part of the larger India Pathology Dataset (IPD) cohort, is central to our efforts to advance research in digital histopathology. The dataset is distinctive in its inclusion of multi-stained whole slide images (WSIs), and it is among the largest datasets dedicated exclusively to lupus nephritis, marking a significant milestone for nephrology and computational pathology research. The data is shared as TIFF files that capture a multi-resolution (5x, 10x, 20x, 40x) view of the tissue, together with a CSV file containing the corresponding LN subtype diagnosis and other information. To access this dataset, please fill out this form and accept the terms of use. You should get access to the full dataset within 48-72 hours.
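As a rough illustration of how the two parts of the release relate, the sketch below computes the per-axis downsampling factor of each listed magnification relative to 40x and reads slide labels from CSV text. It uses only the Python standard library. The column names `slide_id` and `ln_class` and the sample rows are assumptions made for this example; the real CSV header should be checked after download.

```python
import csv
import io

# Magnification levels named in the dataset description.
MAGNIFICATIONS = (40, 20, 10, 5)


def downsample_factor(magnification, base=40):
    """Per-axis scale factor between the base level and `magnification`."""
    if magnification not in MAGNIFICATIONS:
        raise ValueError(f"unsupported magnification: {magnification}")
    return base // magnification


def load_labels(csv_text, id_column="slide_id", label_column="ln_class"):
    """Map slide IDs to labels; the column names are hypothetical."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column]: row[label_column] for row in reader}


# Invented sample rows, for illustration only.
sample_csv = "slide_id,ln_class\nslide_001,Class IV\nslide_002,Class II\n"
labels = load_labels(sample_csv)
factors = {m: downsample_factor(m) for m in MAGNIFICATIONS}
```

A coordinate measured at 40x can be mapped to a lower level by dividing it by that level's factor, for example by 8 for the 5x view. Reading pixel data from the TIFF files themselves needs a third-party reader, which is outside the scope of this sketch.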



Many datasets exist to facilitate the development of AI-based scoring functions. A major limitation of these datasets is that they are often restricted to crystal structures of protein-ligand complexes, despite the widely accepted role of protein flexibility in molecular recognition. To overcome this limitation, we have employed molecular dynamics simulations to capture conformational changes in the protein-ligand complex, which supports more accurate prediction of binding affinity with the Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) method. Another key feature of our dataset is the availability of detailed interaction profiles broken down into energy components (electrostatic, van der Waals, polar solvation, and non-polar solvation energies), which could help optimize the desired components in AI-based drug design studies. At present, our group has calculated binding affinities for 5000 protein-ligand complexes, and we are working toward a more diverse, unique, and effective dataset.
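To make the role of the energy components concrete, the sketch below sums the four terms named above for each simulation frame and averages the totals over frames, which is the usual shape of an MM-PBSA estimate when the entropy term is left out. It is a simplified illustration, not the group's actual pipeline: the class and function names are hypothetical, the numbers are invented placeholders, and the energy unit is deliberately left unspecified.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class FrameEnergies:
    """Per-frame energy differences (complex minus receptor minus ligand)."""
    electrostatic: float
    van_der_waals: float
    polar_solvation: float
    nonpolar_solvation: float

    def total(self):
        # Sum of the four components; the entropy term is omitted here.
        return (
            self.electrostatic
            + self.van_der_waals
            + self.polar_solvation
            + self.nonpolar_solvation
        )


def mean_binding_energy(frames):
    """Average the per-frame totals over the sampled trajectory frames."""
    if not frames:
        raise ValueError("at least one frame is required")
    return mean(frame.total() for frame in frames)


# Invented placeholder values, for illustration only.
frames = [
    FrameEnergies(-10.0, -30.0, 25.0, -5.0),
    FrameEnergies(-12.0, -28.0, 24.0, -4.0),
]
estimate = mean_binding_energy(frames)
```

Keeping the components as separate fields, rather than storing only the total, mirrors the interaction profiles described above and lets a model or analysis target an individual term.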