Real-World Evidence Sources for Drug Safety: Registries and Claims Data Explained

Posted by Paul Fletcher
- 25 December 2025

When a new drug hits the market, the real test doesn’t start until after approval. Clinical trials involve thousands of patients over a few years. But once millions of people start taking it, rare side effects, long-term risks, and interactions with other conditions can show up: things trials simply can’t catch. That’s where real-world evidence comes in. Two of the most powerful tools for tracking drug safety outside of trials are patient registries and claims data. Together, they give regulators, doctors, and drugmakers a clearer picture of how medications behave in the real world, not just in controlled settings.

What Exactly Is Real-World Evidence?

Real-world evidence (RWE) isn’t lab data or controlled trial results. It’s what happens when drugs are used by real people in everyday life. This includes everything from how often someone takes their pill, to whether they end up in the hospital, to how their blood pressure changes over five years. The U.S. Food and Drug Administration (FDA) officially defined RWE in 2018 as clinical evidence derived from real-world data (RWD), meaning information collected during routine healthcare.

This isn’t new. The FDA has used hospital records and pharmacy logs to spot safety issues since the 1980s. But it wasn’t until the 21st Century Cures Act in 2016 that RWE became a formal part of drug regulation. Between 2017 and 2021, the FDA approved 12 drugs or new uses where RWE played a key role. Five of those approvals relied directly on claims data or registry data.

In Europe, the European Medicines Agency (EMA) launched Darwin EU in 2021, a network connecting health databases across 15 countries to monitor medicine safety in real time. Today, it covers over 120 million people. This shift means drug safety isn’t just about waiting for doctors to report bad reactions. It’s about proactively mining massive datasets to find hidden risks before they become public health crises.

How Patient Registries Work

Patient registries are structured databases that collect detailed, standardized information about people with specific diseases or those using certain drugs. Think of them as long-term medical diaries, but for groups of patients. There are two main types: disease registries and product registries.

Disease registries track people with conditions like cystic fibrosis, Parkinson’s, or cancer. They record everything: age, genetic mutations, lab results, imaging scans, treatments tried, and how patients feel over time. Product registries focus on people taking a specific drug, recording not just the medication, but also side effects, dosing changes, and outcomes.

One powerful example is the Cystic Fibrosis Foundation Patient Registry. It helped identify that ivacaftor, a drug for cystic fibrosis, worked better, and had fewer side effects, in patients with certain genetic mutations. That detail never showed up in the original trials because those patients were too rare to be included in large numbers.

Registries are rich in clinical detail. They often include lab values, imaging reports, and even patient-reported symptoms like fatigue or pain levels. According to ISPOR, registry data has 87% completeness for lab results, compared to just 52% in claims data. That’s why they’re so valuable for spotting subtle safety signals.

But they’re not perfect. Most registries are voluntary, so participation rates range from 60% to 80%. That means the data might not represent everyone-especially those who can’t access specialty clinics or don’t speak the local language. And they’re expensive. Setting up a national disease registry can cost $1.2 million to $2.5 million upfront, with $300,000 to $600,000 a year just to keep it running. About 35% of academic registries shut down within five years due to funding gaps.

What Claims Data Tells Us

Claims data is different. It’s not collected for research; it’s generated every time a doctor bills an insurance company. Every prescription filled, every hospital stay, every lab test ordered gets coded and sent to a payer. That’s the raw material for claims data.

It includes ICD-10 diagnosis codes, CPT procedure codes, and NDC codes for medications. Because it’s tied to billing, it’s nearly complete for inpatient visits (95-98% coverage) and covers millions of people. IBM MarketScan tracks 200 million lives. Optum has 100 million. Medicare alone covers over 60 million Americans, with records going back 15+ years.

This makes claims data perfect for spotting rare events. If a drug causes a heart attack in 1 out of 10,000 users, you need a huge pool to see it. In 2015, the FDA analyzed 1.2 million Medicare beneficiaries over five years to check if entacapone (used for Parkinson’s) raised heart risks. They found no clear link. In 2014, they used 850,000 records to review olmesartan for cardiovascular risks in diabetics.
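To see why the pool has to be so large, a back-of-the-envelope calculation helps. The sketch below assumes events are independent and ignores any background rate of the same event; it uses the 1-in-10,000 rate from the example above. The function name and the "at least 3 cases" threshold are illustrative assumptions, not an FDA method.

```python
import math

def prob_at_least(k: int, n_patients: int, rate: float) -> float:
    """Poisson approximation of P(at least k events) among n_patients,
    each with an independent event probability `rate`."""
    expected = n_patients * rate
    # P(X < k) = sum over i = 0..k-1 of e^(-expected) * expected^i / i!
    p_fewer = sum(
        math.exp(-expected) * expected**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - p_fewer

rate = 1 / 10_000  # event rate from the example above
for n in (3_000, 50_000, 1_000_000):  # hypothetical cohort sizes
    print(f"{n:>9,} patients: expected {n * rate:6.1f} events, "
          f"P(at least 3 events) = {prob_at_least(3, n, rate):.3f}")
```

Under these assumptions, a trial-sized cohort of 3,000 patients will almost never show three cases, while a million-patient claims database almost certainly will.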

Claims data also helped approve palbociclib (Ibrance) for new patient groups in 2019. The FDA looked at claims, electronic health records, and safety reports to confirm it was safe in older patients and those with other chronic conditions.

But here’s the catch: claims data is thin on clinical detail. It tells you someone was diagnosed with diabetes, but not their HbA1c level. It shows a prescription for a blood thinner, but not whether the patient actually took it. Lab values? Only 45-60% complete. Patient-reported outcomes? Almost never included.

And coding errors are common. The Agency for Healthcare Research and Quality (AHRQ) estimates 15-20% of diagnosis codes are wrong, whether due to human error, billing incentives, or vague documentation. That leads to false alarms. One study found 22% of initial safety signals from claims data turned out to be false positives after clinical review.

Registries vs. Claims Data: The Trade-Offs

You can’t pick one over the other. Each has strengths the other lacks.

Registries give depth. They’re like a high-resolution MRI of a small group. They capture the nuances: how a drug affects someone with liver disease, or whether a side effect worsens over time. But they cover maybe 1,000 to 50,000 patients at most.

Claims data gives breadth. It’s a satellite view of millions. It can catch a rare kidney injury in 1 in 50,000 users. But it can’t tell you why that injury happened.

For rare adverse events, registries are more efficient. Because their data is cleaner and more complete, you need about 500,000 patients to detect a signal. With claims data, you need a million. That’s because of missing lab values, unclear diagnoses, and coding gaps.
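One simplified way to see how completeness feeds into population size: if only a fraction of true events is captured in usable form, the population has to grow in proportion to yield the same number of usable cases. The sketch below assumes that simple proportional model, which is an illustration rather than the method behind the figures above. It plugs in the ISPOR completeness figures quoted earlier; the target of 50 usable cases is a made-up value.

```python
def required_population(target_cases: int, event_rate: float,
                        completeness: float) -> float:
    """Patients needed so that the expected number of *usable* cases
    (true events that are also completely recorded) reaches target_cases.
    Assumes capture is independent of the event: a toy model."""
    return target_cases / (event_rate * completeness)

EVENT_RATE = 1 / 10_000   # rare event, as in the earlier example
TARGET_CASES = 50         # hypothetical number of usable cases wanted

registry_n = required_population(TARGET_CASES, EVENT_RATE, completeness=0.87)
claims_n = required_population(TARGET_CASES, EVENT_RATE, completeness=0.52)

print(f"Registry: {registry_n:,.0f} patients")
print(f"Claims:   {claims_n:,.0f} patients")
print(f"Ratio:    {claims_n / registry_n:.2f}x")
```

Under this toy model the ratio is simply 0.87 / 0.52, or about 1.7. That points in the same direction as the roughly twofold gap quoted above, but the model ignores coding errors and unclear diagnoses, so it should not be read as a derivation of those numbers.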

The FDA’s 2022 guidance says you can’t just run a statistical model on claims data and call it evidence. You have to account for biases, like “immortal time bias,” where the period before a patient starts the drug (during which, by definition, they could not have had an event on treatment) is wrongly counted as time on the drug, making it look safer than it is. Proper methods can reduce this bias by 35-50%.
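A small worked example shows how immortal time bias can flip a result. In one common form of the bias, the waiting time before a patient's first prescription is counted as time on the drug. The sketch below uses a hand-made cohort of eight hypothetical patients (not real data) and compares the biased "ever treated" analysis with one that splits each patient's follow-up at the start of treatment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Patient:
    followup_days: int             # cohort entry to event or end of follow-up
    drug_start_day: Optional[int]  # None if the patient never got the drug
    had_event: bool                # if True, the event occurs at followup_days

def rates(patients, time_varying: bool):
    """Return (treated, untreated) event rates per 1,000 person-days."""
    t_days = u_days = t_events = u_events = 0
    for p in patients:
        if p.drug_start_day is None:
            u_days += p.followup_days
            u_events += p.had_event
        elif time_varying:
            # Correct: time before the first prescription is untreated time.
            u_days += p.drug_start_day
            t_days += p.followup_days - p.drug_start_day
            t_events += p.had_event
        else:
            # Biased: all follow-up of ever-treated patients counts as treated.
            t_days += p.followup_days
            t_events += p.had_event
    return 1000 * t_events / t_days, 1000 * u_events / u_days

# Hypothetical cohort, for illustration only.
cohort = [
    Patient(360, 180, True), Patient(360, 180, False),
    Patient(360, 90, False), Patient(360, 270, True),
    Patient(360, None, True), Patient(200, None, True),
    Patient(360, None, False), Patient(360, None, False),
]

biased_t, biased_u = rates(cohort, time_varying=False)
correct_t, correct_u = rates(cohort, time_varying=True)
print(f"Biased:  treated {biased_t:.2f} vs untreated {biased_u:.2f}")
print(f"Correct: treated {correct_t:.2f} vs untreated {correct_u:.2f}")
```

In this made-up cohort the biased analysis makes the drug look slightly protective, while the split analysis shows a higher event rate on treatment. The data are contrived to make the effect visible; real analyses use the same person-time split on much larger cohorts.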

The best approach? Combine them. In June 2023, the International Council for Harmonisation (ICH) released new guidelines recommending hybrid analysis. When you cross-check a signal from claims data with registry data, false positives drop by 40%. That’s why the FDA now routinely asks for both when reviewing post-market safety studies.
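In practice, the cross-check amounts to taking patients flagged by a billing code and asking whether the registry's clinical measurements back the code up. Here is a minimal sketch, assuming hypothetical patient IDs, a claims flag based on ICD-10 code N17.9 (acute kidney failure, unspecified), and an illustrative eGFR cutoff of 60. None of these choices come from the ICH guideline.

```python
def triage(claims_flagged: set, registry_egfr: dict, threshold: float = 60.0):
    """Split claims-flagged patients into three groups using registry labs.

    confirmed:  registry eGFR below threshold (lab supports the code)
    refuted:    registry eGFR at or above threshold (possible coding error)
    unverified: patient not found in the registry
    """
    confirmed, refuted, unverified = [], [], []
    for pid in sorted(claims_flagged):
        if pid not in registry_egfr:
            unverified.append(pid)
        elif registry_egfr[pid] < threshold:
            confirmed.append(pid)
        else:
            refuted.append(pid)
    return confirmed, refuted, unverified

# Hypothetical inputs: patients with an N17.9 claim, and registry eGFR values.
claims_flagged = {"P001", "P002", "P003", "P004"}
registry_egfr = {"P001": 28.0, "P002": 92.0, "P004": 45.0}

confirmed, refuted, unverified = triage(claims_flagged, registry_egfr)
print("Confirmed by labs:", confirmed)
print("Likely coding error:", refuted)
print("Not in registry:", unverified)
```

The third bucket matters: because registries cover far fewer people than claims databases, many flagged patients will have no registry record at all, and those cases still need clinical review.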

How Regulators Use This Data Today

The FDA’s Sentinel Initiative is the gold standard. Launched in 2008, it connects 11 major healthcare systems and 3 claims processors to monitor safety across 300 million patient records. It’s not passive. Sentinel runs automated alerts for unusual patterns-like a spike in liver failures after a new drug launch.
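To give a feel for what an automated alert can look like, here is a minimal sketch that compares an observed monthly count against an expected baseline using a Poisson tail probability. This is not Sentinel's actual algorithm; the function names, the baseline of 4 cases per month, and the alert threshold are all assumptions for illustration.

```python
import math

def poisson_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    p_below = sum(
        math.exp(-expected) * expected**i / math.factorial(i)
        for i in range(observed)
    )
    return max(0.0, 1.0 - p_below)

def is_spike(observed: int, expected: float, alpha: float = 0.001) -> bool:
    """Flag a count that would be very unlikely under the baseline rate."""
    return poisson_tail(observed, expected) < alpha

baseline = 4.0  # hypothetical expected liver failures per month
for observed in (5, 9, 14):
    p = poisson_tail(observed, baseline)
    print(f"observed={observed:2d}  tail probability={p:.5f}  "
          f"alert={is_spike(observed, baseline)}")
```

A strict threshold keeps ordinary month-to-month noise from triggering alerts, at the cost of needing a larger spike before anything is flagged.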

In January 2024, the FDA released draft guidance requiring registries to meet 80% data completeness for key variables like lab results and dosing. That’s a big step. It means registries can no longer be “nice to have”; they need to meet scientific standards.

The EMA’s Darwin EU network now pulls data from 32 databases across Europe. It’s designed to respond to safety concerns in weeks, not years. When a new diabetes drug showed possible pancreatitis signals in 2023, Darwin EU used claims and registry data from 10 countries to assess risk in under 12 weeks.

Pharmaceutical companies are investing more too. In 2017, only 3-5% of pharmacovigilance budgets went to RWE. By 2023, that jumped to 8-12%. Oncology leads the way: 38% of RWE submissions use registries because cancer patients are tracked closely in specialized centers. Cardiovascular drugs use claims data most often (45% of submissions) because heart disease is so widespread and well-coded in billing systems.

What’s Next for Drug Safety Monitoring?

The future isn’t just registries and claims. It’s blending them with new data streams. Novartis piloted using wearable devices, like smartwatches tracking heart rate and activity, to monitor safety for Entresto, a heart failure drug. If a patient’s resting heart rate drops suddenly after starting the drug, that could signal a problem before they even feel symptoms.

AI is speeding things up. A 2024 study in JAMA Network Open showed AI algorithms reduced false safety signals by 28% by spotting patterns humans miss. The FDA’s REAL program, launched in 2023, aims to standardize registry data for 20 priority diseases by 2026, starting with rare diseases where traditional methods fail.

But challenges remain. Data privacy laws like HIPAA and GDPR make sharing hard. Standardizing codes across systems still takes up 40-60% of project time. And not every hospital or clinic has the tech or staff to contribute.

Still, the direction is clear: drug safety is moving from passive reporting to active surveillance. Registries and claims data aren’t just backup tools anymore. They’re the backbone of modern pharmacovigilance.

Why This Matters for Patients

You might think this is all about regulators and drug companies. But it’s not. Every time a new safety warning appears on a medication label, like “may increase risk of liver damage in elderly patients,” it likely came from one of these systems.

A patient on a new blood pressure drug might have a mild reaction. If they’re in a registry, that gets recorded. If they’re on Medicare, it shows up in claims. If enough similar cases appear, regulators investigate. That’s how dangerous drugs get pulled or labeled correctly.

It’s also why some drugs get approved faster for rare conditions. Without registries, companies couldn’t prove a drug works in 500 patients with a genetic disorder. Without claims data, they couldn’t prove it’s safe in 10 million people with high blood pressure.

The bottom line? Real-world evidence doesn’t replace clinical trials. It completes them. And for patients, that means safer, smarter, more personalized care.

What’s the difference between claims data and registry data?

Claims data comes from insurance billing records and includes diagnosis codes, procedures, and prescriptions. It covers millions of people but lacks clinical details like lab results or patient symptoms. Registry data is collected directly from patients or clinics and includes detailed medical information (like imaging, lab values, and patient-reported outcomes) but typically covers smaller groups, often under 50,000 people.

Can claims data detect rare side effects?

Yes, but only if the dataset is large enough. Claims data can detect side effects occurring in 1 in 10,000 patients, but it usually requires a population of at least 1 million to be statistically reliable. Registries can detect the same signal with half the population because their data is cleaner and more complete.

Why do regulators prefer combining registry and claims data?

Combining both reduces false positives by up to 40%. Claims data flags potential safety signals across large populations, while registries provide clinical context to confirm if the signal is real. For example, a spike in hospitalizations for kidney failure might be a coding error in claims data; registry data can show whether those patients actually had abnormal kidney function tests.

Are patient registries reliable?

They can be, but only if well-designed. High-quality registries have standardized data collection, regular audits, and high participation rates (above 70%). Many academic registries struggle with funding and shut down within five years. National registries like the SEER cancer registry or the Cystic Fibrosis Foundation registry are considered highly reliable due to consistent funding and strict protocols.

How is real-world evidence used in drug approvals today?

Since 2017, the FDA has approved 12 drugs or new uses where RWE played a key role. Five of those approvals relied on claims or registry data to confirm safety in broader populations or to support expanded use. For example, registry data helped approve pembrolizumab for additional cancer types in 2017, and claims data supported palbociclib’s use in older patients in 2019.