This project develops a framework for generating synthetic healthcare records that preserve the causal relationships present in real data while protecting patient privacy. It addresses the main obstacles to using real healthcare data in research: privacy concerns and strict legal and regulatory frameworks such as GDPR and HIPAA.
The proposed framework will combine Generative Adversarial Networks (GANs) with Structural Causal Models (SCMs) so that the generated data remains faithful to the underlying causal structure. Privacy-preserving techniques, such as differential privacy, will be incorporated to protect the identities of individuals represented in the real data. The research will also evaluate the trade-offs among data utility, privacy preservation, and the retention of causal relationships. By balancing these three aspects, the project seeks to let researchers analyze healthcare data effectively without compromising individual privacy, supporting advances in medical research, policy development, and healthcare innovation.
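
To make the three ingredients concrete, the following is a minimal sketch and not the project's implementation. It assumes a hypothetical three-variable SCM (age -> systolic_bp -> high_risk) with made-up coefficients, omits the GAN entirely (the hand-written SCM stands in for a learned generator), and uses the Laplace mechanism as one standard way to obtain differential privacy for a bounded mean. All function names, variable names, and parameter values are illustrative assumptions.

```python
import math
import random


def sample_scm(n, seed=0):
    """Draw n records from a toy SCM: age -> systolic_bp -> high_risk.

    The structural equations and coefficients are invented for illustration.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        age = rng.uniform(20, 80)                        # exogenous cause
        systolic_bp = 100 + 0.5 * age + rng.gauss(0, 5)  # caused by age
        p_risk = 1 / (1 + math.exp(-(systolic_bp - 130) / 5))
        high_risk = rng.random() < p_risk                # caused by blood pressure
        records.append(
            {"age": age, "systolic_bp": systolic_bp, "high_risk": high_risk}
        )
    return records


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs, used here as a crude causal-fidelity check."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two Exp(1) draws, rescaled."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_mean(values, lower, upper, epsilon, rng):
    """Epsilon-DP mean via the Laplace mechanism.

    Values are clipped to [lower, upper]. With a fixed record count n and
    neighbouring datasets that differ in one record, the mean changes by at
    most (upper - lower) / n, which is the sensitivity used for the noise scale.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    data = sample_scm(2000, seed=1)
    ages = [r["age"] for r in data]
    bps = [r["systolic_bp"] for r in data]
    print("estimated age -> bp slope:", ols_slope(ages, bps))
    print("DP mean of bp:", dp_mean(bps, 80, 180, epsilon=1.0, rng=random.Random(2)))
```

In this sketch, the recovered slope serves as a simple proxy for causal fidelity, while a smaller `epsilon` adds more noise to the released statistic. That is the utility, privacy, and causal-retention trade-off the project sets out to evaluate, here in its simplest possible form.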