Medical Device Makers Explore Synthetic Data Use

AI plays an important role in generating synthetic data.

Image courtesy of Getty Images.

Data is fuel in the age of data science and artificial intelligence (AI). It drives machine learning (ML) algorithms that in turn power a new breed of products, from autonomous cars to embedded medical devices.

“The future of medical devices is set to change once again with the help of AI and ML . . . ,” according to Medical Device Network, a portal covering 40+ B2B websites.

The FDA reportedly issued accelerated approvals of medical devices with AI in 2022, and the trend was set to continue through 2023. As more AI-based tools and devices gain approval, providers are anticipated to include AI in their everyday practice (“Artificial intelligence in medical devices,” January 18, 2023).

But in the medical field, data is hard to come by. It includes private details about patients, which demands extra layers of protection. Sometimes, there isn’t enough data for ML, prompting device makers to use synthetic data.

During the COVID-19 pandemic, Washington University investigators relied on MDClone, a self-service, big data analytics system for healthcare teams, to generate the necessary data.

“Rather than take traditional steps to conceal the identities of real patients in the dataset, the software instead produces a new set of simulated patients that, in aggregate, recreates the characteristics of the real patients, such as measures of body mass index, blood pressure and kidney function,” reported Washington University School of Medicine in St. Louis in an article detailing the process (“Synthetic data mimics real health-care data without patient-privacy concerns,” June 2, 2021).
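As a rough illustration of the idea, and not MDClone's actual algorithm (which is not described here), the Python sketch below fits a simple multivariate Gaussian to a patient cohort and draws simulated patients from it. The cohort values, the function name synthesize_patients and the choice of a Gaussian model are all assumptions made for illustration.

```python
import numpy as np

def synthesize_patients(real, n_synthetic, seed=0):
    """Draw simulated records from a Gaussian fitted to the real cohort (illustrative only)."""
    real = np.asarray(real, dtype=float)
    mean = real.mean(axis=0)            # average of each measure
    cov = np.cov(real, rowvar=False)    # how the measures vary together
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_synthetic)

# Made-up stand-in for a real cohort. Columns: body mass index,
# systolic blood pressure (mmHg), kidney function (eGFR).
rng = np.random.default_rng(42)
real_cohort = rng.multivariate_normal(
    mean=[27.0, 125.0, 85.0],
    cov=[[16.0, 12.0, -8.0], [12.0, 225.0, -30.0], [-8.0, -30.0, 400.0]],
    size=500,
)

synthetic_cohort = synthesize_patients(real_cohort, n_synthetic=5000)
print(real_cohort.mean(axis=0))
print(synthetic_cohort.mean(axis=0))
```

In this sketch, no row of synthetic_cohort is copied from a real record, yet the averages and correlations of the two cohorts track each other closely. A Gaussian fit by itself is not a privacy guarantee; production tools would be expected to add further safeguards.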

In autonomous vehicle (AV) development, a hybrid approach that uses a mix of real-world data and synthetic data (for example, real-world data captured by real sensors, augmented with synthetically generated sensor data) is becoming the norm. It’s reasonable to anticipate the same trend in healthcare. In this article, we speak to simulation software maker Ansys to understand the pros and cons of this practice.

Is Synthetic Data Real?

“Access to most data in healthcare is tightly controlled, which may limit innovation, development and efficient implementation of new research, products, services or systems. Using synthetic data is one of the many innovative ways that can allow organizations to share datasets with broader users,” writes contributing author Aldren Gonzales, from the Office of the Assistant Secretary for Planning and Evaluation, U.S. Department of Health and Human Services, in a paper titled “Synthetic data in health care: A narrative review” from the National Library of Medicine.

The term synthetic implies something created in a lab, raising questions about its reliability.

“We are all different,” Mark Palmer, senior chief technologist of Healthcare, Ansys, explains. “Your femur bone is different than mine or one from any of my colleagues or any of your friends. Yet, they are all similar. From a small sample (a couple hundred) of femur bones available in medical imaging, it is numerically possible to extract a model describing the shape of any femur bones of the human population living today, in the past or in the future. We could therefore create an infinite library of patient-realistic virtual femur bones that could represent the femur bone of any human and provide valuable synthetic data for in silico testing.”
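One common way to build such a model is a statistical shape model based on principal component analysis (PCA). The Python sketch below is a minimal example under stated assumptions, not Ansys's method: it uses toy 2D elliptical outlines in place of real femur scans, and the function names fit_shape_model and sample_shapes, along with the length and width values, are hypothetical.

```python
import numpy as np

def fit_shape_model(shapes, n_modes):
    """PCA on flattened landmark coordinates: mean shape plus main modes of variation."""
    X = np.asarray(shapes, dtype=float)
    mean_shape = X.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal modes
    _, s, vt = np.linalg.svd(X - mean_shape, full_matrices=False)
    std_per_mode = s[:n_modes] / np.sqrt(len(X) - 1)
    return mean_shape, vt[:n_modes], std_per_mode

def sample_shapes(mean_shape, modes, std_per_mode, n, seed=0, limit=3.0):
    """Create new shapes from random mode weights, clipped to +/- limit standard deviations."""
    rng = np.random.default_rng(seed)
    weights = np.clip(rng.standard_normal((n, len(modes))), -limit, limit)
    return mean_shape + (weights * std_per_mode) @ modes

# Toy training set: 200 outlines of 30 landmarks each (made-up sizes in cm)
rng = np.random.default_rng(1)
angles = np.linspace(0.0, 2.0 * np.pi, 30, endpoint=False)
lengths = rng.normal(45.0, 3.0, size=200)
widths = rng.normal(3.0, 0.3, size=200)
training = np.stack([
    np.column_stack([width * np.cos(angles), length * np.sin(angles)]).ravel()
    for length, width in zip(lengths, widths)
])

mean_shape, modes, stds = fit_shape_model(training, n_modes=2)
library = sample_shapes(mean_shape, modes, stds, n=10_000)
print(library.shape)  # (10000, 60)
```

The library can be made as large as needed, but every generated outline is a blend of variation seen in the 200 training shapes, with weights clipped to three standard deviations so samples stay near the observed range.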

While the data available from a real sample pool may not be enough for ML, the use of simulation and mechanistic models, he explains, allows device makers to generate data with a wider variation beyond the actual sample pool. It’s one way to acquire more data in the areas where privacy concerns limit data collection.

“It is important to note that in the absence of supplemental methodologies like physics-informed neural networks, the synthetic data can only be relied upon if it is within the bounds of the data used to train the AI,” Palmer says.
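A simple way to act on this caution is to flag synthetic samples that fall outside the range of the training data. The Python sketch below is a deliberately crude per-feature check; the function name within_training_bounds and the sample values are hypothetical, and a real workflow would likely need a stricter test that also catches unusual combinations of features.

```python
import numpy as np

def within_training_bounds(training, candidates):
    """True for each candidate whose every feature lies inside the training min/max."""
    training = np.asarray(training, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    lower, upper = training.min(axis=0), training.max(axis=0)
    return np.all((candidates >= lower) & (candidates <= upper), axis=1)

# Made-up features: bone length and shaft width, in cm
training = np.array([[38.0, 2.4], [45.0, 3.0], [52.0, 3.6]])
candidates = np.array([[44.0, 2.9], [60.0, 3.1], [40.0, 2.0]])

mask = within_training_bounds(training, candidates)
print(mask)  # [ True False False]
```

Only the first candidate sits inside the observed ranges; the other two extrapolate beyond what the model has seen and would need separate justification before being trusted.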

Regulations on the Way

In the European Union, efforts are under way to better define and regulate the use of synthetic data. Under the Horizon Europe Framework Program, a proposal called “Maximizing the potential of synthetic data generation in healthcare applications” calls for academic and industrial researchers to have “access to relevant, robust, and generalizable synthetic data generation methodologies, including open source when relevant, to create and share pools of synthetic patient data in specific use cases.”

In November 2023, the FDA launched The Credibility of Computational Models Program in the FDA’s Center for Devices and Radiological Health (CDRH). The agency said the goal is to help ensure the credibility of computational models used in medical device development and regulatory submissions.

“In the subdomain of synthetic data generated by simulation, mechanistic and causal models, there has been a lot of collaboration across the medical device ecosystem including regulators, industry, academia and standards organizations to establish regulations, guidelines and best practices,” Palmer says. “For synthetic data generated by AI, this is still in process.”

AI’s Role

Pattern recognition is an area where AI algorithms accelerated by graphics processing units (GPUs) have proven quite adaptable. It’s a capability that autonomous vehicle developers rely on heavily. The same capability is also expected to accelerate synthetic data generation for medical device makers.

“For example, an AI can be trained to automate the extraction of one or more anatomical structures from medical images,” says Palmer. “This process could be applied to hundreds or thousands of patient images to generate an anatomical atlas (library) that could then be used to create a virtually infinite synthetic population of anatomies that is disconnected from patient-specific data, thus addressing patient privacy concerns.”

Cautionary Notes

In the paper titled “Harnessing the power of synthetic data in healthcare: innovation, application, and privacy,” published in npj Digital Medicine (October 9, 2023), authors Mauro Giuffrè and Dennis L. Shung point out, “[Using synthetic data] carries concerns such as the risk of bias amplification, low interpretability, and an absence of robust methods for auditing data quality . . . For instance, if a synthetic dataset is trained on a dataset of facial images that majorly includes people from a certain ethnicity, the synthetic images generated will naturally reflect this imbalance, thus perpetuating the initial bias.”

Palmer says, “In all cases, the gold standard is the real-world evidence. At the end of the day, computational predictions are validated either with hierarchical validation (e.g., MRI safety simulation) or directly with real-world evidence.”

About the Author

Kenneth Wong

Kenneth Wong is Digital Engineering’s resident blogger and senior editor.
