COVID-19 and pupil assessment

Empty exam hall

GCSE exams were due to take place over the next few weeks in Wales, but have been cancelled due to COVID-19. In the second of our blogs about the impact of the pandemic on young people’s education, I look at the replacement of formal examinations with teacher assessments. As qualifications bodies in Wales will rely more heavily on teacher assessments when awarding grades and qualifications, it is increasingly important that we examine biases and the varied impacts they might have on pupil assessment.

Bias, and particularly unconscious bias, exists in all spheres of life: when academics peer-review the work of others in their field, when doctors decide what treatments to give their patients, and when policy-makers choose which evidence to listen to. Teachers are no different.

One particular form of unconscious bias is known as the anchoring effect, and it can be especially problematic when assessing someone’s skills or achievement. During my first-year undergraduate lectures I would ask new education degree students who thought they were above average in the class. Almost everyone thought they were. The reason was that most of them did not know what the ‘average’ was; they had no anchor point to compare themselves against. More concerning, very few of the students acknowledged this; instead, they were happy simply to guess!

Of course, with practice and experience it gets easier to determine what ‘average’ is or what a First-class essay is, or what a C grade is in an English GCSE assessment. And some subjects, such as mathematics, are easier to grade because a grade is based on the number of correct/incorrect answers.

In addition to the anchoring effect there is another important form of bias – discrimination. This is when the assessor is biased in favour of (or against) a learner based on their knowledge and experience of that learner. This is probably the most challenging bias to identify and address, since many assessments specifically require the assessor to use their knowledge and experience of the learner when making their assessment. Think of assessments of group working, or presentations, or any kind of practical assessment – all very difficult to measure in objective and transparent ways.

Whilst there may be ways of acknowledging and mitigating such bias, ultimately assessments have to be based on professional judgement. And it is important to acknowledge that these professional judgements are likely to be more accurate than the assessments of others, including parents! But this becomes really problematic when systematic bias exists: assessments that favour particular groups or kinds of learners over others.

Conscious and unconscious discrimination against particular groups of learners is widely known to exist, but it is difficult to pinpoint when this occurs and which assessors contribute to it. Typically, it can only be identified through closely controlled experiments or by analysing large numbers of outcomes. For example, we know that predicted grades for A Level students can discriminate against young people who were eligible for free school meals. However, the impact of this discrimination is not uniform across the grades. So we often see patterns like this: FSM students predicted D grades might get C grades in practice, FSM students predicted A grades might get A* grades, but FSM students predicted Cs and Bs are likely to still get Cs and Bs.

A less well-known phenomenon is the impact of bias across different subjects. In some recent analysis using the Millennium Cohort Study (MCS, a UK-wide nationally representative cohort of young people all born in 2000 and 2001), we studied teachers’ assessments of the cohort members by subject. The table below shows the proportion of the cohort whom teachers judged to be above average at age 11, alongside the same proportion for just those cohort members living in Wales (and attending schools in Wales). The results are very revealing. In the core subjects of English and Maths, approximately 43–45% of the cohort were considered to be ‘above average’ – in line with what we would expect from a nationally representative sample. However, in Science, Art and Music this proportion was much lower, suggesting teachers were less confident about the ‘anchor’ point from which to compare children. Furthermore, there was even greater divergence by subject when we considered just those children and teachers in Wales¹.
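An analysis of this kind can be sketched with standard data tools. The example below uses pandas on mock records – the column names and values are illustrative stand-ins, not the actual MCS variables – to show how the above-average proportions would be tabulated by subject, both for the whole cohort and for the Wales subsample.

```python
import pandas as pd

# Mock data standing in for MCS teacher-assessment records; the real
# analysis uses Millennium Cohort Study variables, not these names.
records = pd.DataFrame({
    "subject": ["English", "Maths", "Science", "English", "Maths", "Science"],
    "nation":  ["Wales", "Wales", "Wales", "England", "England", "England"],
    "above_average": [1, 0, 0, 1, 1, 0],  # 1 = teacher judged pupil above average
})

# Proportion judged above average across the whole (mock) cohort...
overall = records.groupby("subject")["above_average"].mean()

# ...and restricted to cohort members in Wales, mirroring the second column
# of the table below.
wales = records[records["nation"] == "Wales"].groupby("subject")["above_average"].mean()

print(pd.DataFrame({"MCS": overall, "Wales MCS": wales}))
```

With a nationally representative sample, proportions well below 50% in a subject are the signal of interest: they suggest assessors lack a shared anchor for ‘average’.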

As qualifications bodies in Wales, and the rest of the UK, will rely more heavily on teacher assessments in awarding grades and qualifications this year and next, it is increasingly important that we study all these potential biases and their varied impacts. However, it may only be in a few years’ time that we really begin to see what biases, if any, have occurred. Consequently, mitigating policies must be used even while we are unsure what impact bias may have on teacher assessments. These include blind double marking of existing work; profiling of current grades against previous grades by month of birth, gender, ethnicity and socio-economic status; moderation; and the use of no-detriment calculations in the final awards.
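A no-detriment rule can be stated simply: a learner’s final award is never lower than a benchmark grade they have already secured. The sketch below is a hypothetical illustration only – the grade scale and the rule itself are assumptions for exposition, not any awarding body’s actual calculation.

```python
# Hypothetical no-detriment calculation: the final award is the better of
# the teacher-assessed grade and a previously secured benchmark grade.
# The grade scale below is an assumed ordering, lowest to highest.
GRADE_ORDER = ["U", "E", "D", "C", "B", "A", "A*"]

def no_detriment(teacher_assessed: str, benchmark: str) -> str:
    """Return whichever grade sits higher on the assumed scale."""
    return max(teacher_assessed, benchmark, key=GRADE_ORDER.index)

print(no_detriment("C", "B"))   # benchmark protects the learner -> B
print(no_detriment("A", "C"))   # teacher assessment stands -> A
```

The design point is that the rule is asymmetric by construction: it can only raise an award relative to the teacher assessment, never lower it, which is why it is attractive as a safeguard against downward bias.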

Teacher assessments at age 11 – percentage reported as being above average (compared to being average or below average)

[Table not preserved in this copy: columns for the full MCS and the Wales MCS; surviving row labels include Art and design, Physical Education (PE) and Information and Communication Technology (ICT); the percentage values have not survived.]
¹ The results for Welsh are also particularly interesting: only 25.7% of children were regarded as being above average, much lower than we would expect. This could reflect great uncertainty among teachers about what counts as average ability in Welsh, although it could also reflect deep-rooted concerns about competence and confidence in the use of the Welsh language.

Image credit: Robert Moore, iStock