Finding and Judging the Misfit

[An insight into the process: the first is the randomised text, itself a meaningful found prose-poem, and the second the found poem from the randomised text]:



Detect misfit,
their judgements

separated from

Measured and

robust to

misfit data.
Scale shares

scale the

which can be useful.

be required.

The traits are

being used,

measuring progress
of the misfit model.

Possible pairwise
pairwise possible


A judge
judges if

there is no consensus
amongst judges over

the quality of the


Nail Your Colours to No More Marking

colour demo

This is a grave new world of assessment. In my post yesterday I referenced, via an article in Schools Week, the assessment company No More Marking [a ruse of a title if I have ever heard one] and, as is so often the case with me, it has rankled my sensibilities about teaching, learning and assessment.

There are two things I want to quote from the No More Marking website for any interested readers to check out and judge for yourselves [I think you are entitled to judge for yourself, without access to a graph/matrix/model….].

The first, which I found mildly amusing, is their ‘Colours Test Demo’, here, which is meant to prove the hypothesis as follows [and quoted in yesterday’s posting]:

Marking does not work when it involves any degree of human judgement. This is due to a simple principle.

“There is no absolute judgment. All judgments are comparisons of one thing with another”. (Human Judgment: The Eye of the Beholder by Donald Laming, p.9).

I can confirm that when I completed the demo I was unable to retain the information needed to make the ‘correct’ judgement about the sequence of colours. I am flummoxed, though, by how this relates to, let alone proves, a claim that I cannot effectively compare and comment on a range of writing. With 30 years of human teaching experience I feel I have the expertise to do so [accepting there will be variations in aesthetic appeal/expectation – what makes writing, especially creative writing, what it is in its infinite variety] and, by extrapolation, I reckon that if I had worked with the sequence of coloured squares for 30 years I would then be able to order them precisely as originally sequenced.

Second, and to leave readers with, is the following. On the one hand, in not understanding this I could just be hugely out of my comfort zone, not comprehending the mathematics/statistics of it all [and of course I am!]; on the other, it could just be totally ridiculous, an emperor’s new clothes of assessment gobbledygook that sums up its meaninglessness to me as a human English teacher in its meaninglessness to me as a human English teacher. I will of course be making a found poem out of this stuff:

Following a series of pairwise judgements we can establish a measurement scale using a statistical model. The most commonly used model is the Bradley Terry model (Hunter, 2004) which predicts the outcome from any comparison. The statistical model enables us to build a measurement scale without having to make all the possible pairwise comparisons that would otherwise be required.

The measurement scale that results from a CJ study has some powerful characteristics. The Bradley Terry model is algebraically equivalent to the Rasch model (Rasch, 1960), so the measurement scale shares the advantages of a Rasch measurement scale. The scale is linear, robust to missing data, has estimates of precision, detects misfit, and the parameters of the objects being measured can be separated from the measurement instrument being used.

A CJ scale can therefore be examined in terms of its reliability and consistency: a high value of reliability would suggest we could replicate the scale. The linear scale means that CJ studies can be anchored together using a sub-set of common items, which can be useful, for example, in measuring progress over time. Misfit to the model can be detected both for objects being measured and for the judges doing the measurement. An object may misfit if there is no consensus amongst judges over the quality of the object. A judge may misfit if their judgements are not consistent with the overall measurement scale. Misfit is useful in understanding the traits under consideration and the interactions between judges and the traits (Pollitt, 2012).

© No More Marking [colour chart and quoted sections] –
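For any readers curious what the statistics in those quoted paragraphs actually do, the following is a minimal sketch of estimating a Bradley Terry measurement scale from pairwise judgements, using the iterative MM algorithm described in Hunter (2004), which the quoted text cites. The scripts (“A”, “B”, “C”) and the judgement data are invented for illustration, and this is emphatically not No More Marking’s actual software – just the bare bones of the model, without the reliability and misfit statistics their paragraphs go on to describe:

```python
# Sketch of Bradley-Terry estimation from pairwise judgements (MM algorithm,
# Hunter 2004). Illustrative only: the judgement data below are invented.

from collections import defaultdict
import math


def bradley_terry(judgements, iterations=100):
    """Estimate a quality scale from pairwise wins.

    judgements: list of (winner, loser) pairs from comparative judging.
    Returns a dict mapping object -> log-ability (the measurement scale),
    anchored so the scale values average to zero.
    """
    objects = {o for pair in judgements for o in pair}
    wins = defaultdict(int)         # total wins per object
    pair_counts = defaultdict(int)  # comparisons per unordered pair
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    # Start with equal abilities, then iteratively re-estimate.
    ability = {o: 1.0 for o in objects}
    for _ in range(iterations):
        new_ability = {}
        for o in objects:
            denom = 0.0
            for pair, n in pair_counts.items():
                if o in pair:
                    (other,) = pair - {o}
                    denom += n / (ability[o] + ability[other])
            new_ability[o] = wins[o] / denom if denom else ability[o]
        # Normalise so the geometric mean is 1 (fixes the scale's origin).
        log_mean = sum(math.log(a) for a in new_ability.values()) / len(new_ability)
        ability = {o: a / math.exp(log_mean) for o, a in new_ability.items()}

    return {o: math.log(a) for o, a in ability.items()}


# Judges compare scripts in pairs; the pair (winner, loser) records each outcome.
judgements = [("A", "B"), ("A", "B"), ("B", "A"),
              ("B", "C"), ("B", "C"), ("C", "B"), ("A", "C")]
scale = bradley_terry(judgements)
```

The point of the model, as the quoted paragraphs say, is that a usable scale emerges without every possible pairwise comparison being made: here only seven judgements over three scripts produce an ordering (A above B above C).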


The Balls of Finite Outcomes

As a former secondary school English teacher and Head of Department I had to suffer the annual assumed assessment and judgement of ‘stagnant’ – or worse – student progress from Key Stage 2 to Key Stage 3.

I was and always will be on the defensive about this. One of my more considered rejections [above and beyond a simple disregard for the miserable nonsense of KS2 and KS3 external testing in English] concerned the nature of KS3 English testing, when it existed, which suddenly required students to read and write quite differently from the discrete kinds of testing at KS2 – though it too suffered the ludicrous narrowing of prescriptive expectations for student responses.

As reported in today’s Schools Week, the new organisation No More Marking has found that forty-two per cent of year 7 pupils either stood still or “regressed” in English, based on its assessment software [no comment on this methodology, yet…].

It is too soon to simply reject, yet again, such an ‘assessment’ organisation and its assertions, but also too soon to warm entirely to the company’s director of education Daisy Christodoulou who is reported to have stated the following:

Year 7 may also have “particular issues” around transition, she said.

“They’re suddenly studying a lot more subjects, they’ve got lots of new teachers, new peers – there’s a lot more going on there.”

No More Marking tested more than 28,000 year 7 pupils using “open-ended” questions in English and maths, which could not be revised for and required a creative grasp of concepts…

This is sensible about the personal and social phenomena of the transition for most students, but I do also wonder at the seemingly poignant reference to students needing a creative grasp of concepts to respond to their open-ended questioning.

In KS2 English testing, and therefore in the explicit teaching towards it, there is absolutely nothing that encourages being creative or independent or anything other than robotic in dealing with the discrete and closed nature of, in particular, the Grammar, Punctuation and Spelling testing.

Suddenly at KS3 in English, students will be reading and writing much more widely – like they used to at KS2 lest colleagues there think I am denigrating their curriculum. I’m not. The target culture that judges so clinically at KS2 would seem to have narrowed the curriculum to the robotics of test preparation, as reported so thoroughly – from schools, not the DfE – over recent years.

Obviously as I write I do not know what the exact nature of the No More Marking assessment is. That said, I have to conclude on a note of worry and concern when I quote the following guiding principle from the No More Marking organisation:

Marking does not work when it involves any degree of human judgement. This is due to a simple principle.

“There is no absolute judgment. All judgments are comparisons of one thing with another”. (Human Judgment: The Eye of the Beholder by Donald Laming, p.9).

Laming has shown that at best our judgments are ordinal. We can place things in an order, but scarcely more than this. Ask two people to apply a mark scheme and you will most likely get different marks. Ask people to place two scripts in order, and you will get more consistency.

Having just come through the process of examining GCSE with regular standardisation through seeding, I am not a novice when it comes to questioning personal judgements [and haven’t been before the online seeding process]. That said, I can’t imagine assessing English properly without it – unless, of course, we think English teaching and learning is only concerned with finite outcomes,

excusing the complete bollocks of the last two words in the above paragraph.


National Poetry Day, 28th September, 2017: Freedom


For at least the last decade, and probably much longer, I have produced creative writing ideas for National Poetry Day. Usually shared through Teachit, I am this year making these freely available for download on this site. Nothing would give me greater pleasure than to know these are being productively used in classrooms to help support students to write poetry.

As is usual, I design and promote the writing of list poems for this special day. List poems are an easy structure to follow/copy, and the repetition of lines can produce quite an impact on the page as well as being read aloud.

The following two paragraphs are from the teacher’s notes supplied below, and further explain:

The three ideas offered here are to support students writing poems for this year’s National Poetry Day. All three provide structures and models to aid writing list poems: these are straightforward and impactful with the repetitions building detail, pace and overall meaning.

These ideas are also essentially writing aids. Students can and should talk through overall thoughts about the theme of Freedom and how each creative writing idea presents this, but the focus is on the practical activity of writing – hopefully getting quickly into the spirit and crafting of the approaches.

Click on the links below. This will take you to a pdf copy of the individual resource that can then be downloaded.

Please feel free to share as widely as you can with other teaching colleagues:

0National Poetry Day – TN

1Freedom to

2Freedom to

3Freedom to

4Freedom to

5Freedom is Randomised

6Freedom is Randomisedb

7Freedom is Where

8Liberty poem

9liberty poem edited

The following PowerPoint has 5 slides in support of the above resources [the Tagore poem is resourced from the internet and I sincerely hope it is entirely accurate in its original Bengali language]:

National Poetry Day PP

The following is an additional resource, added today [10.9.17]:

10People of the World

11A Song poem


Centre Volatility

Most centres
display volatility,

for example,
managing C

or straddling
and straddling it,

or status

or with selective
meals factors,

or little

comparable factors

the same students.

deprivation index;
stable deprivation:

differential effect
of deprivation.

Or selective,
comfortably within

grades A*/9
which do not seem

differential, this
entry stability.

One might
expect this

one might,
or not.

Progress 8

Es will attract
additional points
but not
if Easy


New performance measures
a point score,

see step below,

not C or below/above, mind you,


but Strong pass.

We measure
and 8 measures
every increase
in approved list
double weighted
double weighted

and compare
compare and

put simply:

our first step
an academic core
based on their key stage
then used
to calculate

to calculate.

Student Influx Warning


Above from today’s Guardian.

Apart from this graph’s headline sounding like a Daily Mail immigration scare story, what does it actually tell us?

Not much, and quite a bit.

Significantly, very little has changed between 2016 and 2017 in terms of the two English and Maths results – the main subjects [along with Science usually, but not this year] by which schools are judged to have performed.

Why is this significant?

  • It suggests Gove’s [and currently The Gibb’s] pontificating about ‘tougher’ exams making it harder to achieve the top grades – but also, by extrapolation, most grades, especially B-C now 6-4 – hasn’t been realised;
  • And/or the grade boundary adjustments undertaken by Awarding Bodies, presumably with Ofqual blessing, have retained the status quo.

What else does it tell us about GCSE English, and Literature [my examining subject] in particular?

  • That students, teachers and schools have had to work extra hard adapting to changes – closed-book exams, new texts, increased content, pressures to hit targets/Progress 8, especially for Eng Lit, which was included in school performance/progress data – an increase in effort that did not [if my Maths is up to 2017 scratch-ability] warrant the ratio of bloodsweatandtears to the reality of securing the status quo.

Anything else?

  • What we don’t know is how any individual students and individual schools will have fared with these changes: that is, whilst nationally percentage passes across grades have remained remarkably similar, they could be very wrong for some/many students and schools.
  • None of this addresses the bigger issues of why there is a perception that GCSE examinations need to be tougher/more robust [apart from satisfying the compulsion for such regular political rhetoric], and why we continue to examine students like this at age 16.

Another graph, from Schools Week, for visual learners: