Bill Warburton, ISS, and Dr. Ian Harwood
‘I looked through as much literature as I could, as
to what… you should do if… you get a catastrophic failure. And the only four … were one or two spare
computers- … keep colleagues informed- … don’t panic … have … paper
copies up your sleeve for the worst possible scenario. And that’s it, that’s all the literature
suggests you do’.
(Tutor, talking
about a failed CAA test)
Assessment is a
sensitive issue, and converting traditional methods of assessment to
Computer-Assisted Assessment (CAA) is acknowledged to be a risky activity (Harwood & Warburton 2004; Zakrzewski &
Steven 2000). As
students become more litigious (Baty 2004; QAA 1998) and competitive pressures increase, universities cannot afford mistakes when implementing new assessment strategies.
This paper reflects on the differences between two failed CAA tests and thereby presents a novel view of the issues inherent in implementing CAA on an institutional scale. Only one study of ameliorating genuine CAA failures was found in the literature (Harwood 2004), and most published studies are of
small-scale CAA practice where the risks are perhaps more apparent and more
easily contained, rather than the full-scale institutional implementation which
is the context of this study (Kennedy 1998).
A case study
approach is adopted using cross-case analysis (Yin 2003). Semi-structured interviews were
conducted with the staff involved in all the pilot applications a few weeks
after each assessment with the aim of capturing directly the richest possible
feedback before memories faded (Heyl 2001), and also to give the tutor time to
reflect on the entire process. Respondents were asked about their impressions
of the CAA process from authoring to publication and delivery, but were free to
talk in detail about anything relevant that made an impression on them during the
project. The interviews were transcribed verbatim and checked by the
respondents. The interviews, together with notes kept by the CAA Officer
(CAAO), and in some cases feedback from participants, were developed into case
studies. The case studies of the two assessments
that went wrong are now compared to show how resilience can be built into future
assessments.
A learning
technologist (LT) in School A, who had previous experience with Perception,
contacted the Information Systems Service (ISS) regarding the possibility of administering formative calculation tests using Perception. There was a
summative element in that although participants were to be given continuous
access to the tests, they had to attain scores of 100% in order to pass their
courses.
School A created a pool of 'hard-coded' multiple-choice (MCQ) items drawn from real-world practice, and the tests consisted of numeric questions drawn at random from the
pool. Two versions were produced, one for
practice, and the other a summative element on which students were required to
score 100%. As with all the tests administered during the Southampton pilot project, a question-by-question, 'save as you go' (QXQSAYG) template was used, providing a degree of resilience: if a test was interrupted for some reason, it could be resumed at the last saved point with the balance of the remaining time.
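A minimal sketch of the QXQSAYG idea is given below in Python, purely for illustration: the class, field names and JSON file layout are assumptions rather than Perception's internal design. The point is simply that each response and the elapsed time are persisted immediately, so an interrupted attempt can resume at the last saved question with the balance of the time limit.

```python
import json
import time
from pathlib import Path


class SaveAsYouGoSession:
    """Hypothetical question-by-question, save-as-you-go (QXQSAYG) session.

    Each answer is persisted immediately, together with the elapsed time, so
    an interrupted test can be resumed at the last saved point with the
    balance of the remaining time. Names and file format are assumptions,
    not Perception internals.
    """

    def __init__(self, store: Path, time_limit_s: float):
        self.store = store
        self.time_limit_s = time_limit_s
        self.state = {"answers": {}, "elapsed_s": 0.0}
        if store.exists():
            # Resume from the last saved point after an interruption.
            self.state = json.loads(store.read_text())
        self._marker = time.monotonic()

    def save_answer(self, question_id: str, answer: str) -> None:
        """Record an answer and persist the whole session state to disk."""
        now = time.monotonic()
        self.state["answers"][question_id] = answer
        self.state["elapsed_s"] += now - self._marker
        self._marker = now
        self.store.write_text(json.dumps(self.state))

    def time_remaining_s(self) -> float:
        """Balance of the time limit left across all attempts so far."""
        return max(0.0, self.time_limit_s - self.state["elapsed_s"])
```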
School A ran the
tests without further contact with ISS or the Centre
for Learning and Teaching (CLT), and several thousand iterations were delivered
asynchronously to several hundred participants over the internet from their
work placements or from home, and in some cases from public workstations on
campus, without any reported problems.
After the tests had been running asynchronously for some weeks, ISS received
a phone call at lunchtime from School A reporting that there was a serious
problem with a Blackboard test. The
Blackboard administrator ascertained that the test, which had been launched
from Blackboard, was actually a Perception test, and that at the end of their first session, most
participants had been unable to continue to a second test because Perception was
returning ‘server unavailable’ messages.
The tutors were
understandably disappointed with the performance of the CAA system. As far as they were concerned, the system had
let them down and it was immaterial whether the problems had been technical or
procedural.
A previous CAA
pilot project using WebCT was terminated in School B when the WebCT licence
expired early in 2003. A tutor responsible for a large Year 1 undergraduate
course had participated in an earlier CAA pilot project based on WebCT which
used several hundred objective items from one of the course set texts, and
engaged a research assistant to put all the questions into WebCT format. The WebCT tests were constructed as random
selections from the different topic pools.
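The random-selection approach can be sketched very simply; the Python below is illustrative only, with made-up pool names and contents (the real pools held several hundred items per topic).

```python
import random

# Hypothetical topic pools standing in for the real question banks.
POOLS = {
    "topic_1": ["q1", "q2", "q3", "q4"],
    "topic_2": ["q5", "q6", "q7"],
    "topic_3": ["q8", "q9", "q10"],
}


def build_test(items_per_topic=2, seed=None):
    """Assemble one test instance by drawing items at random from each pool."""
    rng = random.Random(seed)
    return [q for pool in POOLS.values() for q in rng.sample(pool, items_per_topic)]


# Each call produces a different selection unless a seed is fixed.
print(build_test())
```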
Another School B tutor (the second author of this paper) took the course
over and attended a half-day Perception course at the beginning of September
2003, and used the Windows Authoring tools to construct a set of practice and
summative assessments that drew from the same large question pools. Over the next few weeks, 280 School B students took the practice test more than three thousand times, up until the day of the summative test in early November 2003, which was timed to take place a few days after participants in a separate synchronous Perception test run by School A had experienced problems submitting their responses.
School B used the Perception system to run invigilated, medium-stakes summative tests for 280 students, to be delivered in two roughly equal sittings. The first sitting was distributed across four of the largest public workstation clusters, which are housed in two buildings far apart. A security alarm went off right outside the largest workstation room and students worked through the noise. A security officer hung his hat on the siren in an attempt to muffle it, but nerves were frayed by the time participants were ready to submit their answers, and most found that they were unable to do so. Because they were using the Perception Secure Browser (PSB), most could take no further action. The workstations were subsequently reset and the test was terminated because more than 100 impatient students were waiting to begin the second sitting.
The second-sitting students were admitted and attempts were made to start their assessment, but the system responded so slowly that even after 20 minutes only around 15% of the students had begun, so the assessment was terminated. Analysis showed that the results of the majority (168 out of 171) of students from the first sitting appeared to have been saved normally; the three missing sets of results were later found in Perception's local 'progress' files, but by that time the academics had decided to abandon the assessment because of the risk of appeals. This occurred on a Friday afternoon, and the tutor developed an alternative assessment strategy over the weekend.
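The reconciliation of saved results against the candidate list described above amounts to a simple set comparison; the Python sketch below is illustrative only, and the CSV layout and 'candidate_id' column are assumptions rather than Perception's actual export format.

```python
import csv


def find_missing_results(class_list_csv, results_csv):
    """Return candidate IDs present in the class list but absent from the
    saved results, so they can be searched for in local 'progress' files.

    Both files are assumed, hypothetically, to contain a 'candidate_id'
    column; a real CAA results export will differ.
    """
    with open(class_list_csv, newline="") as f:
        expected = {row["candidate_id"] for row in csv.DictReader(f)}
    with open(results_csv, newline="") as f:
        saved = {row["candidate_id"] for row in csv.DictReader(f)}
    return expected - saved


# e.g. missing = find_missing_results("first_sitting.csv", "saved_results.csv")
```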
Having outlined the two 'failure' cases, Table 1 presents a cross-case analysis (Yin 2003) of the key differences between them.
Case 1 (School A) | Case 2 (School B)
Communication gaps existed between tutors and the local learning technologist, and between the School and the CAAO | Tutor maintained direct contact at all times with the CAAO
Internal communication gap within ISS between the Oracle DBAs and the CAA Officer | Communication gap between the DBAs and the Perception administrator had been fixed
Tutors uninformed about the risks associated with CAA | Tutors risk-aware
Tutors unimpressed: expected simply to turn up and invigilate; tests were prepared for them by an LT | Tutors were enthusiasts: arranged central workstation bookings, produced lists of which students were in which room, and took a lot of care
Locally booked workstation room: no timetabling clash | Procedural issue of a centrally booked workstation room timetabling clash
No environmental problems | Alarm went off: an unpredictable environmental event exacerbated a difficult situation
Paper-based testing is more attractive | Paper-based alternative means more essays to mark, which is unattractive
Inflexible approach: didn't work once, so dropped it | Tutor suggested 'optional substitution' of formative marks
Outcome: CAA abandoned | Outcome: graceful recovery

Table 1: Two failed CAA tests compared
The main learning points from this research fall into
two categories: human factors and technical strategies.
Human factors
The key element in the graceful recovery of the School B assessment exercise appears to have been the flexible approach adopted by the tutor. His strategy of promptly offering the students a choice between carrying forward the best of their practice scores and retaking the test on paper undoubtedly forestalled a rebellion (no appeals have been received six months after the test). His further strategy of reducing the impact of assessment failures by running smaller sessions with reasonable gaps between them also appears sound, and should form part of the advice given to tutors who are planning large-scale CAA assessments (Harwood 2004).
The shift in
complexity is being managed and the communication gaps are being closed through
suitable CAA procedures: a challenge for the future may be to ensure these are
properly adopted by the local CAA community.
Technical strategies
ISS will conduct
regular and frequent load-testing of the Perception system, with the aim of
giving reasonable warning of service reductions so that the risk of a failure
at the point of delivery can be minimised.
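A minimal sketch of what such load testing might look like is given below; the URL, concurrency level and response-time threshold are illustrative assumptions rather than ISS's actual procedure, and a realistic test would exercise the same login and assessment pages that students use.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests  # third-party HTTP client

# Hypothetical endpoint and thresholds; real values would come from the
# local Perception installation and an agreed service-level target.
TEST_URL = "https://assessment.example.ac.uk/perception/ping"
CONCURRENT_USERS = 50
REQUESTS_PER_USER = 10
WARN_THRESHOLD_S = 2.0


def simulated_user(_user_id: int) -> list:
    """Issue a series of requests and record each response time in seconds."""
    timings = []
    for _ in range(REQUESTS_PER_USER):
        start = time.monotonic()
        requests.get(TEST_URL, timeout=30)
        timings.append(time.monotonic() - start)
    return timings


def run_load_test() -> None:
    with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
        per_user = pool.map(simulated_user, range(CONCURRENT_USERS))
    timings = [t for user in per_user for t in user]
    p95 = statistics.quantiles(timings, n=20)[-1]  # 95th percentile
    print(f"median {statistics.median(timings):.2f}s, 95th percentile {p95:.2f}s")
    if p95 > WARN_THRESHOLD_S:
        print("WARNING: response times suggest a reduced level of service")


if __name__ == "__main__":
    run_load_test()
```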
Closing comments
A close
partnership has grown up between the University and Question Mark in the
aftermath of joint efforts made to resolve these problems. However, it may be worth asking when such partnerships should properly
begin. If a proactive approach is taken to building partnerships and sharing critical information with vendors before catastrophes occur (Mohr & Spekman 1994), then it may be possible to pre-empt them. Experiences such as these should be shared
within the CAA community. The extent of
the damage caused by these two cases to the further uptake of CAA within the
institution is yet to be assessed.
References
Baty, P. 2004, 'Litigation fees top £15m as academic disputes grow', Times Higher Education Supplement, 12-3-2004.
Harwood, I. 2004, 'When summative computer-assisted assessments go wrong: disaster recovery after a major failure', submitted to British Journal of Educational Technology.
Harwood, I. & Warburton, W. 2004, 'Thinking the unthinkable: using project risk management when introducing computer-assisted assessments', 8th International Computer-assisted Assessment (CAA) Conference, 6th & 7th July 2004, Loughborough University, UK.
Heyl, B. S. 2001,
"Ethnographic Interviewing," in Handbook
of Ethnography, P. Atkinson et al., eds., Sage, London, pp. 369-383.
Kennedy, N. 1998, 'Experiences of assessing LMU students over the web'. Accessed at http://www.ulst.ac.uk/cticomp/Kennedy.html on 3-4-2004.
Mohr, J. &
Spekman, R. 1994, "Characteristics of Partnership Success: Partnership
Attributes, Communication Behaviour and Conflict Resolution", Strategic Management Journal, vol. 15,
no. 2, pp. 135-152.
QAA 1998, University of Bath Quality Audit Report, May 1998. Accessed at http://www.qaa.ac.uk/revreps/instrev/bath/comms.htm on 3-9-2004.
Yin, R. 2003, Case study research: design and methods,
3rd edn, Sage, Thousand Oaks.
Zakrzewski,
S. & Steven, C. 2000, "A Model for Computer-based Assessment: the
catherine wheel principle", Assessment
& Evaluation in Higher Education, vol. 25, no. 2, pp. 201-215.