Important Considerations for the Interpretation of Outcomes
The following important considerations should be borne in mind when reading these findings:
The data reported in this chapter represent pre and post programme measurements.
Pre and post measurement is carried out at the start and finish of programmes, but other elements of care, simultaneous interventions, the passage of time, changes in medication and so on may also play a part; any effects cannot be attributed solely to the clinical programme intervention.
Where appropriate to the analysis of outcomes, paired-sample t-tests are used to determine whether, across the sample, post-scores are statistically significantly different from pre-scores. Where a t-test is not appropriate, the non-parametric alternative, the Wilcoxon signed-rank test, is used. Statistical significance indicates the extent to which the difference from pre to post is likely to be due to chance. Typically the level of significance is set at p < 0.05, which means that there is less than a 5% probability that a difference of this size would be observed by chance alone, and it is therefore likely that a real difference exists. Statistical significance provides no information about the magnitude, or the clinical or practical importance, of the difference. A very small or unimportant effect can turn out to be statistically significant; for example, small changes on a depression measure can be statistically significant but not clinically or practically meaningful.
Statistically non-significant findings suggest that the change from pre to post is not large enough to be distinguished from chance, but this does not necessarily mean that there is no effect. Non-significant findings may result from a small sample size, the sensitivity of the measure being used or the time point of the measurement. As such, non-significant findings are not unimportant; rather, they provide useful information and an invitation to investigate further.
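As an illustration of the pre/post comparison described above, the following sketch (assuming the SciPy library is available; the scores and variable names are hypothetical, not drawn from the programme data) runs a paired-sample t-test and its non-parametric alternative, the Wilcoxon signed-rank test:

```python
from scipy import stats

# Hypothetical pre- and post-programme scores for ten service users
pre = [22, 18, 25, 30, 17, 21, 28, 24, 19, 26]
post = [18, 15, 21, 27, 16, 17, 25, 20, 18, 22]

# Paired-sample t-test: are post-scores significantly different from pre-scores?
t_stat, t_p = stats.ttest_rel(pre, post)

# Non-parametric alternative: Wilcoxon signed-rank test on the paired scores
w_stat, w_p = stats.wilcoxon(pre, post)

# Compare each p-value against the conventional significance level of 0.05
print(f"t-test: t = {t_stat:.2f}, p = {t_p:.4f}, significant: {t_p < 0.05}")
print(f"Wilcoxon: W = {w_stat:.1f}, p = {w_p:.4f}, significant: {w_p < 0.05}")
```

Note that both tests return a p-value to compare against the chosen significance level; neither says anything about how large or how clinically meaningful the change is.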
Practical significance indicates how much change there is. One indicator of practical significance is effect size, a standardized measure of the magnitude of an effect. Because it is standardized, effect sizes can be compared across studies that have measured different variables or used different scales of measurement. The most common measure of effect size is Cohen's d, for which an effect size of:
- 0.2 is considered a “small” effect
- 0.5 a “medium” effect
- 0.8 and upwards a “large” effect
As Cohen indicated: “The terms ‘small,’ ‘medium’ and ‘large’ are relative, not only to each other, but to the area of behavioral science or even more particularly to the specific content and research method being employed in any given investigation. In the face of this relativity, there is a certain risk inherent in offering conventional operational definitions for these terms for use in power analysis in as diverse a field of inquiry as behavioral science. This risk is nevertheless accepted in the belief that more is to be gained than lost by supplying a common conventional frame of reference which is recommended for use only when no better basis for estimating the ES index is available” (Cohen, 1988, p. 25).
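For a pre/post design, one common way to compute Cohen's d is to divide the mean change score by the standard deviation of the change scores. A minimal sketch using only Python's standard library (the scores are hypothetical, and other formulas, such as dividing by the pooled pre/post standard deviation, are also in use):

```python
import statistics

def cohens_d_paired(pre, post):
    """Cohen's d for paired samples: mean difference / SD of the differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    return statistics.mean(diffs) / statistics.stdev(diffs)

# Hypothetical pre- and post-programme scores for ten service users
pre = [22, 18, 25, 30, 17, 21, 28, 24, 19, 26]
post = [18, 15, 21, 27, 16, 17, 25, 20, 18, 22]

d = abs(cohens_d_paired(pre, post))

# Label the magnitude using Cohen's conventional cut-offs
if d >= 0.8:
    label = "large"
elif d >= 0.5:
    label = "medium"
elif d >= 0.2:
    label = "small"
else:
    label = "negligible"
print(f"d = {d:.2f} ({label})")
```

As the quotation above stresses, these labels are conventions of last resort; the same numerical d may be trivial in one field of inquiry and substantial in another.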
Clinical significance refers to whether a treatment was effective enough to change whether a patient met the criteria for a clinical diagnosis at the end of treatment. It is possible for a treatment to produce a statistically significant difference and medium to large effect sizes without demonstrating a positive change in the service user’s level of functioning.
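As a sketch of the clinical-significance check described above (the cut-off value and scores here are hypothetical; a real analysis would use the validated diagnostic threshold for the measure in question), one can count how many service users move from above to below a clinical cut-off between pre and post measurement:

```python
# Hypothetical clinical cut-off: scores of 20 or above meet diagnostic criteria
CUTOFF = 20

# Hypothetical pre- and post-programme scores for ten service users
pre = [22, 18, 25, 30, 17, 21, 28, 24, 19, 26]
post = [18, 15, 21, 27, 16, 17, 25, 20, 18, 22]

# How many met the criteria before and after the programme
clinical_pre = sum(score >= CUTOFF for score in pre)
clinical_post = sum(score >= CUTOFF for score in post)

# How many moved from meeting the criteria to falling below the cut-off
recovered = sum(a >= CUTOFF and b < CUTOFF for a, b in zip(pre, post))

print(f"Met criteria pre: {clinical_pre}, post: {clinical_post}")
print(f"Moved below cut-off: {recovered} of {clinical_pre}")
```

In this illustrative sample the group-level change could be statistically significant with a large effect size, yet most of those who met the criteria before the programme would still meet them afterwards.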