Many Shades of Purple - Exploring Comparative Judgement to Assess Writing

Assessment in the new Curriculum for Wales is a hot topic. The practice of making a 'best-fit' judgement against (at best woolly) level descriptions is no more, and many would say good riddance to these, and even more so to the additional nonsense that was sub-levels!

As a result, teachers are now grappling with the hugely challenging task of working out exactly what progress along a continuum of Progression Steps looks like, ensuring their curriculum is built to enable this, and deciding how it will be assessed, across all six Areas of Learning and Experience (AoLEs) and the disciplines within them. There is a lot of guidance from the Welsh Government on the Hwb platform and in the Progression Code document, but the emphasis is on school- and cluster-based agreement on how learners will be enabled to demonstrate their progress.

Against this background, and as the lead for LLC (Languages, Literacy and Communication) in our federation, I jumped at the chance, provided by our regional consortium, to trial comparative judgement as a way of assessing writing. This was facilitated by 'No More Marking' (NMM) using their 'Assessing Primary Writing' project. The 'shades of purple' activity on their 'Demo' page illustrates powerfully how completing a series of comparisons is far more reliable than making absolute judgements. Daisy Christodoulou, the Director of Education at NMM, has written extensively about assessment and the validity and reliability of different methods in her blog.

I also attended a webinar with Daisy as an introduction to the project. During this, attendees were invited to take part in a live judging session: we were shown pairs of scripts and had to click left or right to choose which we thought was the better example. We were encouraged to go with our (professional) 'gut' feeling. In many cases, it was immediately obvious which script was better, and for most others, a decision could be made relatively quickly by reading through the first section of each script. Each script was 'judged' at least 10 times by the group as a whole, each time against a different script. The whole process took minutes and resulted in the scripts being placed in rank order. We were told that collectively we had achieved a reliability score of 0.93; in other words, there was a high degree of consistency in the judges' decisions (any score above 0.8 is considered highly reliable).

Along with a rank order, the NMM comparative judgement engine produces a scale score and, uniquely, a writing age.
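For the curious, here is a minimal sketch of how a pile of left-or-right judgements can be turned into scale scores and a rank order, using a Bradley-Terry fit, the kind of pairwise model commonly used in comparative judgement. The scripts and judgements below are invented for illustration; this is a toy, not NMM's actual engine.

```python
# A minimal sketch: turning pairwise judgements into scale scores and a
# rank order with a Bradley-Terry fit. The scripts and judgements are
# invented for illustration; this is a toy, not NMM's actual engine.
import math
from collections import defaultdict

# Each judgement records (winner, loser): the script the judge clicked.
judgements = [
    ("script_A", "script_B"), ("script_A", "script_B"),
    ("script_B", "script_A"), ("script_A", "script_C"),
    ("script_B", "script_C"), ("script_C", "script_B"),
    ("script_C", "script_D"), ("script_D", "script_C"),
    ("script_B", "script_D"),
]

scripts = sorted({s for pair in judgements for s in pair})
wins = defaultdict(int)  # wins[(i, j)] = number of times i beat j
for winner, loser in judgements:
    wins[(winner, loser)] += 1

# Fit Bradley-Terry strengths with the standard MM update:
#   p_i <- W_i / sum_j [ n_ij / (p_i + p_j) ]
# where W_i is script i's total wins and n_ij is the number of times
# scripts i and j were compared with each other.
strength = {s: 1.0 for s in scripts}
for _ in range(200):
    updated = {}
    for i in scripts:
        total_wins = sum(wins[(i, j)] for j in scripts if j != i)
        denom = sum(
            (wins[(i, j)] + wins[(j, i)]) / (strength[i] + strength[j])
            for j in scripts
            if j != i and (wins[(i, j)] + wins[(j, i)]) > 0
        )
        updated[i] = total_wins / denom if denom else strength[i]
    norm = sum(updated.values())
    strength = {s: p / norm for s, p in updated.items()}

# Log-strengths serve as a scale score; sorting them gives the rank order.
for rank, s in enumerate(sorted(scripts, key=strength.get, reverse=True), 1):
    print(f"{rank}. {s}  scale score: {math.log(strength[s]):+.2f}")
```

At classroom scale the arithmetic is the same, just with far more scripts, judges, and judgements behind each score, which is where the reliability figure comes from.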

Everything I had heard and read persuaded me that this was an approach to assessment that was completely in keeping with the ethos of Curriculum for Wales, whilst improving the reliability of the assessment of writing and having a positive impact on teacher workload. 
