Clinical performance studies are central to demonstrating that an In Vitro Diagnostic (IVD) Medical Device delivers clinically meaningful results in its intended context of use. Under the EU In Vitro Diagnostic Regulation (IVDR), performance claims must be supported not only by analytical robustness but by credible clinical evidence that reflects real-world decision-making. As a result, study design has become one of the most scrutinized elements of IVDR Performance Evaluation.
Notified Bodies increasingly assess whether clinical performance data is trustworthy, applicable, and aligned with intended purpose. Favourable results alone are not sufficient if study design introduces bias, uses inappropriate reference standards, or applies endpoints that lack clinical relevance. Designing studies with these regulatory expectations in mind is therefore essential to building a defensible Performance Evaluation Report (PER) under IVDR.
The Role of Clinical Performance Studies in IVDR
Within the IVDR framework, performance evaluation rests on three pillars: scientific validity, analytical performance, and clinical performance. Clinical performance studies provide the evidence that analytical accuracy translates into clinically meaningful results. They demonstrate how test results relate to a clinical condition or outcome in the intended population and use environment. Additionally, well-designed clinical performance studies help demonstrate real-world clinical value, strengthening confidence in performance claims among regulators and healthcare stakeholders.
Guidance such as MDCG 2022-2 reinforces that clinical evidence must be appropriate to intended purpose and risk class. Clinical performance studies are therefore a primary mechanism for justifying evidence sufficiency under IVDR Performance Evaluation, particularly for higher-risk or clinically impactful IVDs.
Intended Purpose as the Foundation of Study Design
A defensible clinical performance study begins with a precise definition of intended purpose. Intended purpose determines the clinical pathway, target population, comparator methods, and acceptable performance trade-offs. A screening test, for example, carries different evidentiary expectations than a confirmatory diagnostic test, even when measuring the same analyte.
Misalignment between intended purpose and study design is a frequent source of Notified Body questions. Clearly anchoring study populations, endpoints, and reference standards to intended use strengthens the credibility and applicability of clinical performance evidence.
Bias Control: A Core Credibility Requirement
Bias can arise unintentionally through study design choices and can significantly distort apparent performance. Common sources include patient selection bias, spectrum bias, verification bias, and observer bias. Under IVDR scrutiny, unaddressed bias often undermines confidence in otherwise strong datasets.
Manufacturers are expected to identify potential bias sources and explain how they are mitigated through inclusion criteria, blinding procedures, consistent application of reference standards, and transparent reporting. Proactive bias control is a key determinant of whether clinical performance evidence is considered reliable within a PER.
QUADAS-2 as a Review Lens
Although not mandated by IVDR, QUADAS-2 is widely used to assess the quality of diagnostic accuracy studies. Developed at the University of Bristol, it evaluates risk of bias across four domains (patient selection, index test, reference standard, and flow and timing) and applicability concerns across the first three.
An authoritative overview of the tool is available from the University of Bristol. Designing studies that would perform well under a QUADAS-2-style assessment helps align evidence with how reviewers evaluate credibility, even when QUADAS-2 is not explicitly cited.
Additionally, applying QUADAS-2 principles during study design can help ensure representative patient selection, appropriate reference standards, and real-world test conditions, thereby strengthening evidence credibility, reducing bias risk, and supporting more robust and defensible Performance Evaluation under the IVDR.
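A QUADAS-2-style review can be usefully tabulated during study design, not just at publication. The sketch below is purely illustrative: the four domain names come from the published QUADAS-2 tool, but the data structure, roll-up rule, and example ratings are hypothetical conventions, not part of QUADAS-2 or the IVDR.

```python
# Hypothetical sketch: recording per-domain QUADAS-2 risk-of-bias judgments
# and rolling them up conservatively. Domain names follow the QUADAS-2 tool;
# everything else (structure, roll-up rule, example ratings) is illustrative.
QUADAS2_DOMAINS = (
    "patient selection",
    "index test",
    "reference standard",
    "flow and timing",
)

def overall_risk(judgments: dict) -> str:
    """Conservative roll-up: 'low' only if every domain is rated low."""
    missing = set(QUADAS2_DOMAINS) - set(judgments)
    if missing:
        raise ValueError(f"unassessed domains: {sorted(missing)}")
    if any(j == "high" for j in judgments.values()):
        return "high"
    if all(j == "low" for j in judgments.values()):
        return "low"
    return "unclear"

# Example: a study where only verified positives received the reference
# standard, flagging verification bias under flow and timing.
example = {
    "patient selection": "low",
    "index test": "low",
    "reference standard": "low",
    "flow and timing": "high",
}
print(overall_risk(example))  # high
```

The conservative roll-up mirrors how reviewers tend to weigh such assessments: a single high-risk domain, such as verification bias, can undermine an otherwise well-conducted study.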
Endpoint Selection and Clinical Relevance
Endpoints define how performance is measured and interpreted. Under IVDR, endpoints must reflect clinically meaningful performance rather than statistical convenience. Sensitivity, specificity, and predictive values should be selected based on how results influence clinical decisions within the intended pathway.
For example, predictive values are particularly important in low-prevalence settings, while sensitivity may be prioritized where missed diagnoses carry significant risk. Clearly justifying endpoint selection strengthens the defensibility of IVDR Performance Evaluation by demonstrating alignment between performance metrics and clinical utility.
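The interplay between prevalence and predictive values can be made concrete with a short Bayes' theorem calculation. The figures below are illustrative only, not performance claims for any device.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Compute PPV and NPV from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# Illustrative: a test with 95% sensitivity and 95% specificity applied
# in a screening population with 1% disease prevalence.
ppv, npv = predictive_values(0.95, 0.95, 0.01)
print(f"PPV = {ppv:.1%}, NPV = {npv:.3%}")  # PPV = 16.1%, NPV = 99.947%
```

Despite strong sensitivity and specificity, only about one in six positive results is a true positive at 1% prevalence, which is exactly why IVDR reviewers expect predictive values to be reported and justified for low-prevalence screening claims.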
Reference Standards and Applicability
The reference standard establishes the benchmark against which the index test is evaluated. Weak or inconsistently applied reference standards are a common source of regulatory concern. IVDR requires manufacturers to justify their selection of the reference standard and to acknowledge its limitations transparently, especially when no single gold standard exists.
Applicability is equally important. Study results must reasonably reflect real-world use, including patient demographics, disease prevalence, specimen types, and operational conditions. A defensible study explains how these factors align with intended use and where limitations may affect generalizability.
Integration into the Performance Evaluation Report
Clinical performance studies gain regulatory value only when integrated coherently into the IVDR PER. This integration requires clear mapping between claims, endpoints, evidence, and conclusions, along with balanced discussion of limitations and uncertainty.
Such coherence aligns with broader clinical and performance evaluation practices, where evidence is managed as a unified system rather than as isolated datasets.
Managing Residual Uncertainty Through Post-Market Evidence
No pre-market study can eliminate all uncertainty. IVDR addresses this through structured Post-Market Surveillance (PMS) and Post-Market Performance Follow-up (PMPF). Clinical performance study design should anticipate how post-market data will confirm or refine performance assumptions over time.
Lifecycle integration of pre-market and post-market evidence strengthens the credibility of performance conclusions and supports ongoing updates to the PER under IVDR.
Conclusion
Designing clinical performance studies under the IVDR requires more than technical execution; it requires regulatory awareness, clinical insight, and methodological discipline. Bias control, endpoint selection, justification of the reference standard, and applicability assessment directly determine whether clinical performance evidence is trusted.
When these elements are addressed systematically, clinical performance studies strengthen IVDR Performance Evaluation and contribute meaningfully to a defensible PER. Integrated with analytical performance, scientific validity, and PMS, they form part of a resilient, lifecycle-based evidence system that supports both regulatory compliance and confidence in diagnostic decision-making.
How Freyr Supports Clinical Performance Study Design Under IVDR
Designing IVDR-compliant clinical performance studies requires careful alignment between intended purpose, methodological rigor, and regulatory expectations. Freyr supports IVD manufacturers in planning and executing clinical performance studies that generate credible, applicable evidence and withstand Notified Body scrutiny under the EU IVDR.
Freyr’s experts assist with study design strategy, bias identification and mitigation, endpoint and reference standard selection, QUADAS-2–informed quality assessment, and integration of study results into the IVDR Performance Evaluation Report. For support with clinical performance study design, evidence strategy, or IVDR performance evaluation readiness, speak to a Freyr expert to discuss your regulatory challenges.