What should potentially be evaluated and how?

Although monitoring can sometimes be used to reduce uncertainties about the impacts of an option, on its own it may not be adequate, and an evaluation may be necessary. Monitoring does not necessarily indicate whether an option has had an impact on the indicators measured: these will almost always be influenced by factors other than the actual option that is being implemented. This makes it extremely difficult to determine what may have caused the changes that are observed. For example, although monitoring may reveal a performance improvement over time, the actual implementation of the option may not be the only causal factor or may not have caused the change at all.

Evaluation of an option should, as far as possible, measure all outcomes, both desirable and undesirable, for which there is substantial uncertainty and which are important to those affected (i.e. where the results of the evaluation could conceivably affect a decision about whether the implementation of an option is worthwhile).

An impact evaluation must estimate what would have occurred in the absence of the option, and then compare this with an estimate of what happens when the option is implemented. Ideally, an evaluation should be built into a programme during the design phase so that it is planned and implemented as early as possible. Policy briefs can help to ensure that appropriate consideration is given to the need for impact evaluations and to when and how they should be conducted.

It is important that evaluation methods and findings are as reliable as possible. Attributing an observed change to the implementation of an option requires a comparison between the individuals or groups exposed to the option and those who are not. The groups compared should be as similar as possible so that only the influences related to the option are evaluated and not others. The most effective way to do this is to conduct a randomised trial in which individuals or groups of people (e.g. within specific geographic areas) are randomly allocated either to receive the option or not.5,6,7,8 Randomised trials can be conducted as pilot projects before a programme is introduced at a national level, or they can be undertaken in parallel with full-scale implementation, for example, by randomly allocating the districts in which an option will be implemented first and then comparing the results with those from districts where implementation has been delayed.
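As a purely illustrative sketch of the allocation step just described, the short Python example below randomly assigns a set of hypothetical districts to early or delayed implementation. The district names, the number of districts, and the group sizes are all invented for the example; a real trial would follow a pre-specified allocation protocol.

    # Illustrative only: random allocation of hypothetical districts to early
    # versus delayed implementation of an option.
    import random

    districts = [f"District {i}" for i in range(1, 13)]   # 12 hypothetical districts

    random.seed(42)                                        # fixed seed for reproducibility
    shuffled = random.sample(districts, k=len(districts))  # random order, no replacement

    early = sorted(shuffled[:6])     # option implemented first
    delayed = sorted(shuffled[6:])   # implementation delayed; serves as the comparison group

    print("Early implementation:", early)
    print("Delayed implementation (comparison):", delayed)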

Randomised trials, however, may not always be feasible. Alternative approaches include interrupted time series analyses and controlled before-after studies.9 An interrupted time series analysis can be used when data are collected at multiple time points both before and after the implementation of the option. If the necessary data are available, interrupted time series analyses are relatively easy to conduct and have the advantage that no separate control group is needed, because the design accounts for underlying trends and for variability in the indicators over time. Their most important disadvantage, however, is that influences other than the option being evaluated may also affect the observed changes.
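To make the design concrete, the sketch below fits a simple segmented regression (one common interrupted time series specification, with a level change and a slope change at the point of implementation) to simulated monthly data. All of the numbers are invented, and this model form is only one of several that could be used.

    # Illustrative only: segmented regression on simulated interrupted time series data.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n_pre, n_post = 24, 24                              # months of data before and after
    time = np.arange(n_pre + n_post)                    # running month index
    post = (time >= n_pre).astype(int)                  # 1 after implementation, else 0
    time_after = np.where(post == 1, time - n_pre, 0)   # months since implementation

    # Simulated outcome: baseline trend, a drop in level and a change in slope
    # after implementation, plus random noise.
    rate = 50 + 0.2 * time - 5 * post - 0.3 * time_after + rng.normal(0, 1.5, time.size)

    df = pd.DataFrame({"rate": rate, "time": time, "post": post, "time_after": time_after})
    model = smf.ols("rate ~ time + post + time_after", data=df).fit()
    print(model.params)  # 'post' estimates the level change, 'time_after' the slope change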

In controlled before-after studies, changes before and after the implementation of an option are compared with the changes observed over the same period in areas where the option has not been implemented (e.g. in neighbouring districts or countries). Their main advantage is that they may sometimes be the only feasible design when, for example, randomisation is ruled out for practical or political reasons and data cannot be collected at multiple time points. However, controlled before-after studies rarely provide reliable estimates of impacts, because known or unknown differences between the compared groups may exert more influence on the outcomes measured than the option itself. Consequently, it is generally difficult – if not impossible – to attribute with confidence any observed changes (or lack of change) to the implementation of an option.
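The core comparison in a controlled before-after study amounts to a simple 'difference in differences'. The sketch below works through that arithmetic with invented coverage figures; the caveats in the paragraph above about differences between the groups still apply to any estimate produced this way.

    # Illustrative only: the comparison underlying a controlled before-after study,
    # using invented coverage percentages.
    implementation_area = {"before": 62.0, "after": 74.0}  # % coverage where the option was implemented
    comparison_area     = {"before": 60.0, "after": 65.0}  # % coverage in a comparison district

    change_implementation = implementation_area["after"] - implementation_area["before"]  # 12.0
    change_comparison = comparison_area["after"] - comparison_area["before"]              # 5.0

    # The change in the comparison area estimates what would have happened anyway;
    # the remainder is attributed (cautiously) to the option.
    estimated_impact = change_implementation - change_comparison                          # 7.0
    print(f"Estimated impact: {estimated_impact:.1f} percentage points")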

Other study designs may sometimes be used to assess the impacts of health policy options. However, some of these are often not feasible for assessing the impacts of health policies (e.g. cohort studies and case-control studies) and others rarely provide compelling evidence (particularly before-after studies, historically controlled studies, and cross-sectional studies).1,8 Qualitative studies (as well as other quantitative designs such as surveys) can provide valuable evidence to explain how an option worked or why it did or did not work but, beyond gathering the perceptions of those interviewed or surveyed, they cannot generate the kind of data needed to estimate the effect of an option.

A list of different evaluation designs, with definitions and a summary of their strengths and weaknesses, is included in the ‘Additional resources’ section of this guide.

Rigorous impact evaluations can be expensive, and budget, time, or data constraints may severely limit the ability to undertake them. These constraints can reduce the reliability of impact evaluations because:

It may be possible to address budget, time, and data constraints by starting the planning process early and reducing the cost of data collection. However, for an impact evaluation to be worthwhile, the threats to the validity of the results and the limitations of the sample must not be so great that the evaluation fails to provide reliable information. Before undertaking an evaluation, an assessment should therefore be made of whether an adequate evaluation is possible. If it is not, a judgement is needed as to whether the programme should be implemented without evaluation, given the uncertainty about its potential impacts.

Several models have been described for assessing the extent to which an adequate evaluation is possible, a process sometimes referred to as an “evaluability assessment”.10 An evaluability assessment can help to determine whether the intended objectives of an evaluation can be achieved with the resources and data available and within the specified time horizon of the evaluation.11 The purpose is to see whether the logic of a programme is sufficiently clear for an evaluation design to be constructed, and to check whether the particular level or levels of government (or non-governmental organisations) are able to begin collecting, analysing, and reporting evaluation data so as to inform the decision-making process.

A policy option is evaluable if:

A variety of methods can be used to assess the evaluability of a programme, including interviews, document reviews, and site visits.9 A link to a worksheet for assessing whether the impact of an option needs to be evaluated (and, if so, how this should be done) is provided in the ‘Additional resources’ section of this guide.

Decisions about whether to proceed with an option when there are important uncertainties about its impacts and it is not evaluable will depend on judgements about the size of the problem, what the alternatives are, the expected impacts of the programme, and the extent of uncertainty about those impacts. They will also depend on the value, costs, feasibility, and acceptability of the option.13,14 The SUPPORT Tool on dealing with insufficient research evidence is provided in the ‘Additional resources’ section of this guide.

Workshop materials and a presentation on clarifying uncertainties and needs for monitoring and evaluation are available in the ‘Additional resources’ section of this guide.


This page was last updated November 2011.