Section 2 – Clinical evaluation

Page last updated: September 2016

The following section contains information requests for establishing the clinical benefit of the codependent technologies in terms of patient health outcomes.

An integrated codependent submission may need to present more than one Section 2 to support the proposed listing of the medicine and the test. The extent of information requested is discussed in Subsection P4.1, and will be further contingent upon the availability of direct evidence or the need to use linked evidence. An overview is shown in Figure P4.2.

The following general approach to presenting a submission may be appropriate:

Approach based on direct evidence

  • Section 2a – prognostic effect of the biomarker
  • Section 2d – clinical evaluation of the codependent technologies (evidence of combined use)

and/or

Approach based on linked evidence

  • Section 2a – prognostic effect of the biomarker
  • Section 2b – performance and accuracy of the proposed test
  • Section 2c – change in clinical management
  • Section 2d – clinical evaluation of the codependent technologies (separate)

Each Section 2 should follow the steps presented in Part A of these guidelines.

Direct evidence approach

Additional Information Requests: Direct Evidence

  • 19 (O) Determine whether the biomarker test can predict differences in patient health outcomes irrespective of the clinical management provided
  • 20 (O) Indicate whether the search for direct evidence was comprehensive and whether the selection process was unbiased
  • 21 (O) Assess bias, confounding and the impact of chance on the findings presented in the direct evidence

Section 2a Evidence of prognostic effect of the biomarker

19 (O) Prognostic effect of the biomarker

Include in Section 2

Determine whether the biomarker test can predict differences in patient health outcomes irrespective of the clinical management provided.

It is important to discriminate the background prognostic effect of biomarker status from the impact of any treatment effect modification associated with the biomarker. This requires a comparison of outcomes in patients receiving usual care conditioned on the presence or absence of the biomarker.

Use the approach described in Section 2 to systematically review the evidence of the presence or absence of a prognostic effect of the biomarker, as identified by the proposed test. Searching the literature for prognostic information is typically more complex than searching for intervention (treatment) studies. For example, literature searches would not be limited to randomised controlled trials. Advice from an information specialist is recommended.

Section 2d Clinical evaluation of the codependent technologies (combined)

Most of the information needed for this section is already covered by the information requests in Part A of the PBAC Guidelines. Additional requests are given below.

When ‘direct evidence’ is available this should be presented in the submission. Direct evidence can include the following trial designs (illustrations of the different trial designs are provided in Merlin et al,55 supplemental data 1 file):

  • Double-randomised controlled trial: A trial that randomises patients to use of the test or not, then randomises to use of the medicine or its main comparator, and then follows patients to measure the effect of the treatment on clinical (health) outcomes.
  • Single-randomised controlled trial of test: A trial that randomises patients to use of the test or not, and then follows patients to measure the effect of targeted treatment with the new medicine on clinical (health) outcomes.
  • Prospective biomarker-stratified design: A trial that prospectively tests eligible patients, then randomises those that are test positive or negative to use of the medicine or its main comparator, and then follows participants to measure the effect of treatment on clinical (health) outcomes. The ‘no test’ or ‘alternative test’ arm is not included in this biomarker-stratified design.
  • Retrospective biomarker-stratified design: A trial that randomises eligible patients to use of the medicine or its main comparator, then follows participants to measure the effect of treatment on clinical (health) outcomes, and then analyses results across subgroups of patients defined by whether they are positive for the test (or biomarker) or whether they are negative to the test (or biomarker).

The design of a double-randomised controlled trial can be used as a template within which the available direct clinical evidence can be hypothetically mapped (see Merlin et al,55 supplemental data 2 file). In the economic modelling in Section 3, identify areas where information is missing.

For example, given that a single-randomised controlled trial of a test does not provide information on the test (biomarker)-medicine relationship (ie evidence that the biomarker is a treatment effect modifier and/or has a prognostic effect), consider supplementing this evidence with information from prospective and/or retrospective biomarker-stratified study designs.

As prospective and retrospective biomarker-stratified study designs are without a ‘no testing’ trial arm (ie to determine biomarker status), the impact of false positive and false negative test findings cannot be determined from the reported patient health outcomes. Consider providing supplementary information from the linked-evidence approach described below, so that a comparison of the proposed test/test strategy and existing test/test strategy can be made with respect to their relative diagnostic accuracy or test performance.

Retrospective biomarker-stratified study designs may use archival tissue/sampling to determine biomarker status. Exercise caution when interpreting results from these studies, because biomarker status might change over time, particularly if there is evidence that an intervening treatment may modify the biomarker result.

20 (O) Selection of the direct evidence

Include in Subsections 2.1 and 2.2

Indicate whether the search for direct evidence was comprehensive and whether the selection process was unbiased. Present a systematic review of direct evidence (study designs given above) concerning the proposed biomarker test and the proposed medicine, with prespecified inclusion/exclusion criteria and study selection outlined in a PRISMA flowchart1 (ie indicating how trials were selected and the reasons why any potentially relevant trials were excluded).

21 (O) Quality of the direct evidence

Include in Subsections 2.3 and 2.6

Assess bias, confounding and the impact of chance on the findings presented in the direct evidence. Give particular attention to the impact of selection bias and confounding with respect to any subgroup analyses. For example, were the subgroup analyses prespecified (involving stratified randomisation) and was blinding maintained? Was the subgroup analysis exploratory (eg determined on the basis of retrospectively obtained samples)? Were the results adjusted for potential confounders?

Linked-evidence approach

Additional Information Requests: Linked Evidence

  • 22 (T) Describe the analytical performance of the proposed test
  • 23 (T) Define the reference standard or a gold standard against which the performance of the proposed test will be measured
  • 24 (T) Indicate whether the search for evidence on the diagnostic accuracy or predictive accuracy of the proposed test was comprehensive, and whether the evidence selection process was unbiased
  • 25 (T) Indicate whether the evidence reporting on the diagnostic accuracy or predictive accuracy of the proposed test is (i) of good quality and (ii) applicable to the requested MBS target population
  • 26 (T) Report on the performance of the proposed test in terms of its diagnostic accuracy or predictive accuracy. If several tests are proposed or no specific test is specified, indicate which test has the best performance. If test accuracy cannot be determined, calculate agreement or concordance between tests
  • 27 (T) Indicate which test is the most accessible/available/used. (Only relevant if several tests are proposed or no specific test is specified)

A full linked-evidence approach is only meaningful when the evidence for the proposed test and the evidence for the proposed medicine have been generated in similar patient populations, and so it is clinically sensible to link the two datasets. If the test identifies patients earlier or with a different spectrum of disease than the patients in whom the medicine has been trialled, then it is not clinically sensible to link this evidence. In this circumstance, present direct evidence of the impact of biomarker testing on patient health outcomes.

Section 2b Test performance and accuracy

22 (T) Analytical test performance

Include alongside Subsection 2.5

Analytical test performance assesses how accurately and how consistently the test identifies biomarker status (eg the coefficient of variation and other appropriate statistics). Present any differences across laboratories in how they characterise test results (eg a kappa statistic or other concordance statistic). Identify whether there is an external quality assurance program by which laboratories can benchmark their assays, and whether the test is performed and interpreted accurately and reliably. An assessment of the analytic validity of the evidentiary standard test, relative to other existing test options, would be helpful for decision making.
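As an illustration of one of the statistics mentioned above, the following minimal sketch calculates a coefficient of variation from replicate assay readings. The readings are hypothetical and the function name is an assumption made for this example; it is not a prescribed method.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """Return the coefficient of variation (%) using the sample standard deviation."""
    return 100 * stdev(replicates) / mean(replicates)


# Hypothetical replicate readings of one control specimen on one assay
replicates = [10.2, 9.8, 10.0, 10.4, 9.6]
cv_percent = coefficient_of_variation(replicates)
print(f"CV = {cv_percent:.1f}%")
```

A lower coefficient of variation indicates more consistent results across repeated runs; what counts as acceptable depends on the assay and is not defined here.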

23 (T) Reference standard or a gold standard for test performance

Include in Subsection 1.1

Define the reference standard or a gold standard against which the performance of the proposed test will be measured. Provide evidence that the reference standard is considered to be accurate and is an appropriate benchmark. (This is not needed if the reference standard has already been identified and ratified by the Protocol Advisory Sub-committee [PASC].)

Note: The reference standard is not necessarily the same as the relevant comparator for the codependent test. The comparator is the current test/test strategy being used in the absence of the proposed test; this may be different to the benchmark (reference standard) test for determining test accuracy. For example, a reference standard for a new genetic test might be Sanger sequencing, but the comparator for the new genetic test might be a high-resolution melting method.

Also note that the comparator for the test is different to the comparator for the medicine.

Test accuracy

In the instance where a reference standard is available

If a reference standard is available, test performance is determined using diagnostic accuracy measures (eg using a cross-sectional study design). Compare the proposed test to the designated reference standard by cross-classifying the test results of patients who are representative of the intended population receiving the test. The proposed test will be referred to as the ‘evidentiary standard’ if it is the test used in the key evidence presented in the submission.

Use the reference standard designated by the PASC, or select and justify the choice of a reference standard if this has not been previously specified by the PASC.

In the instance where no reference standard is available

If no reference standard is available, test performance can be determined using predictive accuracy (eg using a longitudinal study design, with the clinical outcome providing the benchmark for identifying whether the patient does or does not have the condition).

If a reference standard is not available or is unacceptable for the requested use and/or the requested population, consider the various options for dealing with imperfect or missing reference standards in the guidance provided by Reitsma et al.56 If the guidance by Reitsma et al is not followed, justify the approach used.

Note that if sensitivity and specificity of the proposed test are to be estimated using a composite/constructed standard, the new reference standard should be developed independently from the analysis of results of the proposed test (ideally, in advance of collecting any specimens). Consult with statisticians and health professionals before constructing the reference standard.

If measures of concordance or agreement (positive per cent agreement and negative per cent agreement) are calculated instead of measures of test performance, ensure that the terms ‘sensitivity’ and ‘specificity’ are not used, as these estimates are not of test accuracy but of agreement between the proposed test and the nonreference standard.57

24 (T) Selection of the evidence on test accuracy

Include in Subsections 2.1 and 2.2

Indicate whether the search for evidence on the diagnostic accuracy or predictive accuracy of the proposed test was comprehensive and whether the evidence selection process was unbiased.

For example, systematically review test performance studies for the proposed test (evidentiary standard) with prespecified inclusion/exclusion criteria and a PRISMA flowchart.1 Indicate how test performance studies were selected and the reasons why any potentially relevant studies were excluded.

Note that literature searching for test performance studies will need to be more exhaustive than for treatment trials, because indexing and filtering of these studies is less reliable in bibliographic databases. Suggestions for identifying test accuracy studies in literature searches are given in Chapter 7 of the Cochrane handbook for systematic reviews of diagnostic test accuracy.58

25 (T) Quality of the test accuracy studies

Include in Subsection 2.3

Indicate whether the evidence reporting on the diagnostic accuracy or predictive accuracy of the proposed test is of good quality and applicable to the requested MBS target population.

This can be done using a QUADAS-2 assessment for each test accuracy study in terms of risk of bias and applicability for use in Australia on the domains of patient selection, index test, reference standard, and flow and timing.59 Display the results as a table or graph. Note that QUADAS-2 is a critical appraisal tool, whereas tools like STARD and the ACCE framework are used for reporting test accuracy studies and genetic test interventions, respectively.

26 (T) Performance of the proposed test

Include in modified version of Subsection 2.5

Report on the diagnostic accuracy or predictive accuracy of the proposed test. If several tests are proposed or no specific test is specified, indicate which of the tests has the best performance. If test accuracy cannot be determined, calculate agreement or concordance between tests.

Diagnostic accuracy or predictive accuracy

Provide test performance measures such as sensitivity, specificity, likelihood ratios, positive and negative predictive values, or area under the receiver-operator characteristic curve. Ensure that test failure (invalid results) for either test is documented (proportion of failures), but do not include these results in the test accuracy estimates.
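The measures listed above can all be derived from a 2 × 2 cross-classification of the proposed test against the reference standard. The following is a minimal sketch using hypothetical counts; the class and function names are assumptions for illustration only. Invalid results are reported as a separate proportion and are kept out of the accuracy estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-classification of valid test results against the reference standard."""
    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    fn: int  # test negative, reference positive
    tn: int  # test negative, reference negative


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return common accuracy measures (assumes no zero denominators)."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
    }


def failure_proportion(n_failed: int, n_attempted: int) -> float:
    """Proportion of attempted tests that gave an invalid result."""
    return n_failed / n_attempted


# Hypothetical data: 200 specimens attempted, 8 invalid, 192 valid results
table = TwoByTwo(tp=45, fp=6, fn=5, tn=136)
measures = accuracy_measures(table)
failures = failure_proportion(8, 200)

for name, value in measures.items():
    print(f"{name}: {value:.3f}")
print(f"test failure proportion: {failures:.3f}")
```

The predictive values calculated this way reflect the biomarker prevalence in the study sample, which may differ from that in the intended population.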

Summarise (if a meta-analysis is performed) test accuracy measures and approaches, as appropriate to the available evidence base. Consider the presence of heterogeneity and/or test threshold effects. Various methods are described by Takwoingi et al.60

When interpreting the results of the studies, prioritise assessing the trade‐offs in false positive and false negative test findings. For example, consider whether there is a clinically accepted test performance level below which a new test should not be used (ie either false positives are too great or false negatives are too great) for the intended purpose.

The main issues to consider are that:

  • false negatives are of greater concern when the proposed medicine is used as last-line therapy, with best supportive care as its comparator
  • false positives are of greater concern when the proposed medicine is being compared with effective alternatives.

If the reference standard being used to determine test accuracy is imperfect, and it is therefore unclear whether the false positives or false negatives ascertained using the codependent test are actually true positives and true negatives, provide evidence of the clinical (health) outcomes of those patients found to be false positive or false negative and report these under the ‘Direct evidence’ section, if possible.

The positive predictive value and negative predictive value should also be calculated, since these data are key to the calculation of transition probabilities in Subsection 3A.4.
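Predictive values depend on biomarker prevalence as well as on sensitivity and specificity. The following minimal sketch applies Bayes' theorem to show how the same test yields different predictive values at two prevalences. All input values are hypothetical and the function name is an assumption for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given biomarker prevalence using Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics applied at two hypothetical prevalences
low_prev = predictive_values(sensitivity=0.90, specificity=0.95, prevalence=0.10)
high_prev = predictive_values(sensitivity=0.90, specificity=0.95, prevalence=0.40)

print(f"prevalence 10%: PPV={low_prev[0]:.3f}, NPV={low_prev[1]:.3f}")
print(f"prevalence 40%: PPV={high_prev[0]:.3f}, NPV={high_prev[1]:.3f}")
```

In this hypothetical case the positive predictive value rises from about 0.67 to about 0.92 as prevalence increases from 10% to 40%, which is why the prevalence assumed for the target population matters when deriving model inputs.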

Calculate estimates of sensitivity and specificity, adjusted to correct for any (verification or partial verification) bias that may have been introduced by not using the reference standard to its fullest extent (ie to verify all the results obtained with the new test).56
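One commonly described adjustment for partial verification is the correction attributed to Begg and Greenes. It is shown here only as an example and is not a method specified in this guidance. The sketch assumes that the decision to verify depends only on the result of the new test. The counts and function names are hypothetical.

```python
def naive_accuracy(ver_pos_ref_pos, ver_pos_ref_neg, ver_neg_ref_pos, ver_neg_ref_neg):
    """Sensitivity and specificity using verified patients only (uncorrected)."""
    sens = ver_pos_ref_pos / (ver_pos_ref_pos + ver_neg_ref_pos)
    spec = ver_neg_ref_neg / (ver_neg_ref_neg + ver_pos_ref_neg)
    return sens, spec


def corrected_accuracy(n_test_pos, n_test_neg,
                       ver_pos_ref_pos, ver_pos_ref_neg,
                       ver_neg_ref_pos, ver_neg_ref_neg):
    """Begg-Greenes style correction.

    Assumption: verification depends only on the new test result, so the
    reference-positive rate among verified patients in each test-result group
    can be applied to all patients in that group.
    """
    p_ref_pos_given_pos = ver_pos_ref_pos / (ver_pos_ref_pos + ver_pos_ref_neg)
    p_ref_pos_given_neg = ver_neg_ref_pos / (ver_neg_ref_pos + ver_neg_ref_neg)
    exp_tp = n_test_pos * p_ref_pos_given_pos
    exp_fp = n_test_pos * (1 - p_ref_pos_given_pos)
    exp_fn = n_test_neg * p_ref_pos_given_neg
    exp_tn = n_test_neg * (1 - p_ref_pos_given_neg)
    return exp_tp / (exp_tp + exp_fn), exp_tn / (exp_tn + exp_fp)


# Hypothetical study: all 100 test positives verified, only 100 of 400 negatives verified
naive = naive_accuracy(80, 20, 5, 95)
corrected = corrected_accuracy(100, 400, 80, 20, 5, 95)

print(f"naive:     sensitivity={naive[0]:.3f}, specificity={naive[1]:.3f}")
print(f"corrected: sensitivity={corrected[0]:.3f}, specificity={corrected[1]:.3f}")
```

In this hypothetical example the uncorrected estimates overstate sensitivity and understate specificity, because test-negative patients were verified less often than test-positive patients.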

Agreement or concordance

If agreement data are provided, rather than test accuracy data, measures such as positive predictive value and negative predictive value (used in Section 3) cannot be calculated since the subjects’ condition (as determined by a reference standard) is unknown. In this situation, report the 2 × 2 table of results, comparing the candidate test with the nonreference standard test, and report the agreement measures along with their confidence intervals or kappa statistics. Alternatively, odds ratios could be reported indicating the likelihood of an outcome, given a particular test result.
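The agreement measures described above can be sketched as follows. The counts are hypothetical, the function names are assumptions for illustration, and the Wilson score interval is one of several acceptable interval methods rather than a requirement of this guidance.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def agreement_measures(both_pos, cand_pos_only, comp_pos_only, both_neg):
    """Agreement between a candidate test and a nonreference comparator test."""
    n = both_pos + cand_pos_only + comp_pos_only + both_neg
    ppa = both_pos / (both_pos + comp_pos_only)   # positive per cent agreement
    npa = both_neg / (both_neg + cand_pos_only)   # negative per cent agreement
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance, from the marginal totals of each test
    expected = ((both_pos + cand_pos_only) * (both_pos + comp_pos_only)
                + (comp_pos_only + both_neg) * (cand_pos_only + both_neg)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return ppa, npa, kappa


# Hypothetical 2 x 2 table of candidate test versus comparator test results
ppa, npa, kappa = agreement_measures(both_pos=40, cand_pos_only=5,
                                     comp_pos_only=10, both_neg=145)
ppa_ci = wilson_interval(40, 50)
npa_ci = wilson_interval(145, 150)

print(f"PPA={ppa:.3f} (95% CI {ppa_ci[0]:.3f} to {ppa_ci[1]:.3f})")
print(f"NPA={npa:.3f} (95% CI {npa_ci[0]:.3f} to {npa_ci[1]:.3f})")
print(f"kappa={kappa:.3f}")
```

The outputs are labelled as agreement statistics, not as sensitivity or specificity, because the comparator is not a reference standard.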

27 (T) Test availability

Include in Subsection 5.1

Consider which test is the most accessible/available/used. (Only relevant if several tests are proposed or no specific test is specified.)

Where testing is both complex and uncommon, there are important quality and pathology laboratory performance considerations that need to be addressed – for example, biospecimens may need to be shipped to a small number of high-throughput pathology laboratories.

Where biospecimens are relatively transportable, it may not always be an access advantage to bring the test closer to the patient.

Section 2c Change in clinical management

Additional Information Requests: Linked Evidence

  • 28 (O) Substantiate whether knowledge of the test result will cause a change in the management of the patient by the treating clinician. Identify instances where management would not change, despite the test indicating that the biomarker is present

28 (O) Change in management of the patient because of knowledge of test result

Include in Subsections 2.1–2.5

Substantiate whether knowledge of the test result will cause a change in the management of the patient by the treating clinician. Identify instances where management would not change, despite the test indicating that the biomarker is present.

There may be ‘leakage’ issues identified through an assessment of the ‘change in management’ part of the linked evidence. Often a test is done to rule out use of a medicine (eg to avoid potential medicine‐related adverse events or the development of resistance), but the medicine is given anyway, or, alternatively, the test is used to select a specific medicine, but the medicine is not provided. Since codependent tests are used to guide therapeutic decisions, explicitly address this by searching for literature that reports on the management of patients identified with and without the biomarker.

Section 2d Clinical evaluation of the codependent technologies (separate)

Additional Information Requests: Linked Evidence

  • 29 (T) Identify any safety considerations that will impact on the entire process of testing
  • 30 (M) Indicate whether the search for evidence on the therapeutic effectiveness of the proposed medicine was comprehensive and whether the evidence selection process was unbiased
  • 31 (M) Indicate whether the evidence reporting on the therapeutic effectiveness of the proposed medicine is of good quality
  • 32 (O) Provide evidence (if relevant) of treatment effect modification (ie interaction) as a consequence of biomarker status
  • 33 (O) Provide evidence (if relevant) that use of the test results in better targeting of patients who are likely to respond most to the medicine (ie by using the prognostic effect of the biomarker to determine the baseline risk of disease or condition progression)
  • 34 (O) Indicate whether the effect of the medicine, as conditioned by the test or biomarker result, has a clinically important and statistically significant effect on patient-relevant health outcomes (both safety and effectiveness)

29 (T) Safety concerns regarding the proposed test

Include in Subsection 2.7

Identify any safety considerations that will impact on the entire process of testing. For example, patient contraindications to the testing procedure, required biospecimen size, additional risk of harm (with reference to Item 16), or processing time impacting on treatment initiation.

30 (M) Selection of the evidence on the therapeutic effectiveness of the medicine

Include in Subsections 2.1 and 2.2

Indicate whether the search for evidence on the therapeutic effectiveness of the proposed medicine was comprehensive and whether the evidence selection process was unbiased.

This evidence should include:

  • the therapeutic effectiveness of the medicine when conditioned by the test or biomarker result
  • the therapeutic effectiveness of the medicine in unselected patients (where biomarker status has not been determined).

For example, present a systematic review of the available comparative clinical evidence of the proposed medicine versus its comparator in patients with and without the biomarker, as well as the available comparative clinical evidence of the proposed medicine versus its comparator when patient biomarker status is not known.

Ensure that the systematic review has study inclusion/exclusion criteria delineated, and include a PRISMA flowchart1 indicating how trials were selected and the reasons why any potentially relevant trials were excluded.

31 (M) Quality of therapeutic effectiveness evidence

Include in Subsection 2.3

Indicate whether the evidence reporting on the therapeutic effectiveness of the proposed medicine is of good quality.

Assess bias, confounding and the impact of chance on the results. Particular attention should be given to the impact of selection bias and confounding on any subgroup analyses. For example, were the subgroup analyses prespecified (stratified randomisation) and was blinding maintained? Or was the subgroup analysis exploratory (determined on the basis of retrospectively obtained samples)? Were the results adjusted for potential confounders? Where the study design involves biomarker positive patients only, assess study quality according to the usual guidance in Subsection 2.3.

Depending on the study design, confounding may occur where biomarker status is a prognostic factor and when there are imbalances in biomarker status in the proposed medicine and comparator medicine trial arms.

32 (O) Evidence of treatment effect modification

Include in Subsection 2.6

Provide evidence (where available) of treatment effect modification (ie interaction) as a consequence of biomarker status.

For example, is there evidence of substantial variation in a measure of relative treatment effect between the proposed medicine and comparator/usual care trial arms after stratifying on biomarker status?
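One way to examine variation in relative treatment effect across biomarker strata is a test of interaction on the log relative risk scale. The following minimal sketch uses hypothetical trial counts and assumed function names. It is an illustration of the general idea, not a prescribed analysis, and it does not replace a prespecified subgroup analysis plan.

```python
import math


def log_rr_and_se(events_trt, n_trt, events_ctl, n_ctl):
    """Log relative risk and its standard error for one biomarker stratum."""
    log_rr = math.log((events_trt / n_trt) / (events_ctl / n_ctl))
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    return log_rr, se


def interaction_test(positive_stratum, negative_stratum):
    """Compare relative risks between strata.

    Returns the ratio of relative risks (biomarker positive / biomarker negative),
    the z statistic and a two-sided p-value from the normal approximation.
    """
    log_rr_pos, se_pos = log_rr_and_se(*positive_stratum)
    log_rr_neg, se_neg = log_rr_and_se(*negative_stratum)
    diff = log_rr_pos - log_rr_neg
    se_diff = math.sqrt(se_pos**2 + se_neg**2)
    z = diff / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(diff), z, p_value


# Hypothetical counts: (events on medicine, n on medicine, events on comparator, n on comparator)
biomarker_positive = (20, 100, 40, 100)   # relative risk 0.50
biomarker_negative = (38, 100, 40, 100)   # relative risk 0.95

ratio, z, p_value = interaction_test(biomarker_positive, biomarker_negative)
print(f"ratio of relative risks={ratio:.3f}, z={z:.2f}, p={p_value:.3f}")
```

A ratio of relative risks well away from 1, with a small interaction p-value, is consistent with treatment effect modification by biomarker status. Interaction tests often have low power, so a non-significant result does not rule out effect modification.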

Treatment effect modification in this setting identifies a relationship between the biomarker and the medicine, which is likely to be unique or limited to companion tests assessing a particular biomarker and medicines with a particular mechanism of action (cross-reference to Item 9). This means that both technologies are needed to produce or optimise a clinical benefit.

33 (O) Evidence of prognostic effect

Include in Subsection 2.6

Provide evidence (if relevant) that use of the test results in better targeting of patients who are likely to respond most to the medicine (ie by using the prognostic effect of the biomarker to determine the baseline risk of disease or condition progression).

For example, is there evidence of minimal variation in a measure of relative treatment effect between the proposed medicine and comparator/usual care trial arms, but determining biomarker status helps identify patients at greatest risk of an event, which, in turn, helps maximise the absolute treatment effect?
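The arithmetic behind this example can be sketched as follows: with the same relative risk in both biomarker groups, the absolute treatment effect is larger in the group with the higher baseline risk. All values are hypothetical and the function name is an assumption for illustration.

```python
def absolute_effect(baseline_risk, relative_risk):
    """Return (absolute risk reduction, number needed to treat)."""
    arr = baseline_risk * (1 - relative_risk)
    nnt = 1 / arr
    return arr, nnt


# Hypothetical scenario: identical relative risk, different baseline (usual care) risk
RELATIVE_RISK = 0.75
arr_pos, nnt_pos = absolute_effect(baseline_risk=0.40, relative_risk=RELATIVE_RISK)
arr_neg, nnt_neg = absolute_effect(baseline_risk=0.10, relative_risk=RELATIVE_RISK)

print(f"biomarker positive: ARR={arr_pos:.3f}, NNT={nnt_pos:.0f}")
print(f"biomarker negative: ARR={arr_neg:.3f}, NNT={nnt_neg:.0f}")
```

Here the biomarker does not alter the relative effect of the medicine. It identifies patients at higher baseline risk, in whom the same relative effect produces a larger absolute benefit.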

Amalgamate with Item 19 if this issue has been addressed there.

If an improvement in treatment effect is a result of better targeting of those patients that are likely to respond most, this identifies a relationship between the biomarker and a potentially broader range of existing and future treatment options (potentially including nonmedicine treatment options) than is likely to apply for treatment effect modification. This may allow reimbursement of either the test or the medicine or both technologies.

This apparent improvement in treatment effect is simply because a certain patient subgroup (flagged by a specific biomarker) will always do better, so the biomarker is considered prognostic.

It is possible for both treatment effect modification and prognostic effect to coexist. In this case, to assess the unique contribution of the medicine, an assessment of its effect must be made relative to usual care and an adjustment made for the background prognostic effect of the biomarker.

34 (O) Size of the treatment effect on patient-relevant health outcomes

Include in Subsections 2.6 and 2.8

Indicate whether the effect of the medicine, as conditioned by the test or biomarker result, has a clinically important and statistically significant effect on patient‐relevant health outcomes (both safety and effectiveness). Relate this to the following factors:

  • factors intrinsic to the proposed medicine
    • treatment effect modification when prognostic effect is not present in the medicine/biomarker relationship (see Item 32)
    • absolute treatment effect when prognostic effect is present in the medicine/biomarker relationship (see Item 33)
  • the factor intrinsic to the proposed test
    • accuracy of identification of biomarker status given the test result (ie positive predictive value and negative predictive value), and the impact of inappropriately treating or not treating patients who received an inaccurate biomarker test result.

When the proposed MBS listing either cannot include the test used in the evidence base or also encompasses other test options, delineate the consequences of using the other test options in place of the evidentiary standard test for health outcomes and the provision of subsequent health care resources in Subsection 2.7.

Applicability of the effectiveness of the codependent technology

Additional Information Requests

  • 35 (O) Indicate whether the evidence supporting the clinical effectiveness of the codependent technology is applicable to the Australian population and to the circumstances of using each of the technologies

35 (O) Applicability of the evidence

Include in Subsection 2.7, with any economic implications included in Subsection 3A.3

Indicate whether the evidence supporting the clinical effectiveness of the codependent technology is applicable to the Australian population and to the circumstances of using each of the technologies. For example, is the biomarker prevalence in the trial similar to that in the target MBS population? Is the medicine, dosage and frequency of use in the trial similar to that proposed for the target PBS population? How are any inconsistencies identified in the submission addressed?