Implementing Perception across a large University: getting it right

 

Bill Warburton

ISS, University of Southampton

wiw@soton.ac.uk

 

Dr. Ian Harwood

School of Management, University of Southampton

iah@soton.ac.uk

 

‘I looked through as much literature as I could, as to what… you should do if… you get a catastrophic failure.  And the only four … were one or two spare computers- … keep colleagues informed- …  don’t panic …  have  … paper copies up your sleeve for the worst possible scenario.  And that’s it, that’s all the literature suggests you do’.

(Tutor, talking about a failed CAA test)

 

Introduction

Assessment is a sensitive issue, and converting traditional methods of assessment to Computer-Assisted Assessment (CAA) is acknowledged to be a risky activity (Harwood & Warburton 2004; Zakrzewski & Steven 2000).  As students become more litigious (Baty 2004; QAA 1998) and competitive pressures increase, Universities cannot afford mistakes when implementing new assessment strategies.

The University of Southampton is developing a managed learning environment (MLE).  It ran a pilot Perception CAA project across the institution in preparation for the launch of Perception as a full-scale University CAA service, in anticipation of its integration into the MLE. Many Perception tests were run during this project, but two large invigilated tests, each delivered to many students simultaneously, failed irretrievably.  In one case the eventual outcome was positive, whilst in the other it was less so.

This paper reflects on the differences between these two failed tests and thereby presents a novel view of the issues inherent in implementing CAA on an institutional scale. Only one study of ameliorating genuine CAA failures was found in the literature (Harwood 2004), and most published studies describe small-scale CAA practice, where the risks are perhaps more apparent and more easily contained, rather than the full-scale institutional implementation that forms the context of this study (Kennedy 1998).

Methodology

A case study approach is adopted using cross-case analysis (Yin 2003). Semi-structured interviews were conducted with the staff involved in all the pilot applications a few weeks after each assessment, with the aim of capturing the richest possible feedback directly before memories faded (Heyl 2001), while also giving the tutor time to reflect on the entire process. Respondents were asked about their impressions of the CAA process from authoring through publication to delivery, but were free to talk in detail about anything relevant that struck them during the project. The interviews were transcribed verbatim and checked by the respondents. The interviews, together with notes kept by the CAA Officer (CAAO) and, in some cases, feedback from participants, were developed into case studies.  The case studies of the two assessments that went wrong are now compared to show how resilience can be built into future assessments.

Case 1 overview

A learning technologist (LT) in School A, who had previous experience with Perception, contacted the Information Systems Service (ISS) regarding the possibility of administering formative calculation tests using Perception.  There was a summative element in that, although participants were to be given continuous access to the tests, they had to attain scores of 100% in order to pass their courses.

School A created a pool of ‘hard coded’ multiple choice (MCQ) items drawn from real-world practice, and the tests consisted of numeric questions drawn at random from the pool.  Two versions were produced: one for practice, and the other a summative element on which students were required to score 100%.  As with all the tests administered during the Southampton pilot project, a question-by-question, ‘save as you go’ (QXQSAYG) template was used, providing a degree of resilience: if a test was interrupted for some reason, it could be resumed at the last saved point with the balance of the remaining time.
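To illustrate the kind of resilience such a template provides, the following minimal sketch models question-by-question saving and resumption in Python. It is an illustration of the general technique only: the SaveAsYouGoSession class, its methods and the dictionary used as a store are all hypothetical and are not part of Perception or of any QXQSAYG template.

```python
class SaveAsYouGoSession:
    """Hypothetical model of a question-by-question, save-as-you-go test session.

    Each answer is persisted as soon as it is given, so an interrupted
    attempt can be resumed at the last saved question with the balance
    of the remaining time.
    """

    def __init__(self, questions, time_limit_secs, store):
        self.questions = questions          # ordered list of question ids
        self.time_limit = time_limit_secs   # total time allowed
        self.store = store                  # dict-like persistent store (placeholder)

    def start_or_resume(self, participant_id):
        state = self.store.get(participant_id)
        if state is None:                   # first attempt: start at question 1
            state = {"next_index": 0, "elapsed": 0.0, "answers": {}}
            self.store[participant_id] = state
        return state

    def answer(self, participant_id, answer, seconds_spent):
        state = self.store[participant_id]
        if state["next_index"] >= len(self.questions):
            raise IndexError("Test already complete")
        if self.time_limit - state["elapsed"] <= 0:
            raise TimeoutError("No time remaining")
        qid = self.questions[state["next_index"]]
        state["answers"][qid] = answer      # save immediately ('save as you go')
        state["elapsed"] += seconds_spent
        state["next_index"] += 1
        self.store[participant_id] = state  # persist after every question


# Usage: a crash between questions loses at most the current answer;
# calling start_or_resume() again picks up where the participant left off.
store = {}
session = SaveAsYouGoSession(["q1", "q2", "q3"], time_limit_secs=600, store=store)
session.start_or_resume("student42")
session.answer("student42", "B", seconds_spent=45)
```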

School A ran the tests without further contact with ISS or the Centre for Learning and Teaching (CLT). Several thousand iterations were delivered asynchronously over the internet to several hundred participants at their work placements, at home and, in some cases, at public workstations on campus, without any reported problems.

After the tests had been running asynchronously for some weeks, ISS received a lunchtime phone call from School A reporting a serious problem with a Blackboard test.  The Blackboard administrator ascertained that the test, which had been launched from Blackboard, was actually a Perception test, and that at the end of their first session most participants had been unable to continue to a second test because Perception was returning ‘server unavailable’ messages.

It turned out that School A had chosen that day to run a series of invigilated medium/high-stakes tests for 100 students using the assessment component of the tests.  The LT was away from the department and uncontactable when the problems occurred.  ISS and the CAAO were unaware of this event and were caught by surprise. Perception’s logs showed that the Oracle database server used to store the students’ responses had ‘disappeared’ at the same time the Perception problem occurred.  Investigation revealed that the students’ responses had in fact been saved, but this could not bring back participants who had abandoned their attempts and gone home.

When the CAAO asked the Oracle database administrators (DBAs) if they knew why the database connection was lost, they reported that the central Oracle server (a large system shared with many other important high-volume services) had been restarted at that time with the aim of rectifying a reduction in performance.

The tutors were understandably disappointed with the performance of the CAA system.  As far as they were concerned, the system had let them down and it was immaterial whether the problems had been technical or procedural.

Case 2 overview

School B had taken part in an earlier CAA pilot project based on WebCT, which was terminated when the WebCT licence expired early in 2003. The tutor then responsible for a large Year 1 undergraduate course had used several hundred objective items from one of the course set texts, engaging a research assistant to put the questions into WebCT format.  The WebCT tests were constructed as random selections from the different topic pools.  Another School B tutor (the second author of this paper) took over the course, attended a half-day Perception course at the beginning of September 2003, and used the Windows authoring tools to construct a set of practice and summative assessments drawing on the same large question pools. Some 280 School B students took the practice test more than three thousand times over the following weeks, up until the day of the summative test in early November 2003, which fell a few days after the problems that School A’s participants had experienced in submitting responses to a separate synchronous Perception test.

School B used the Perception system to run invigilated medium-stakes summative tests for 280 students, delivered in two roughly equal sittings. The first sitting was distributed between four of the largest public workstation clusters, housed in two buildings far apart. A security alarm went off immediately outside the largest workstation room and students worked on through the noise; a security officer hung his hat on the siren in an attempt to muffle it.  Nerves were frayed by the time participants were ready to submit their answers, and most were then unable to submit their assessments.  Because they were using the Perception Secure Browser (PSB), most could take no further action.  The workstations were subsequently reset and the test terminated because more than 100 impatient students were waiting to begin the second sitting.

Furthermore, the largest room had been double-booked with another class, whose required software was only available in that room. Students from that class turned up on time expecting to finish an assignment that was due in at the end of the day.  Some were so unimpressed that they expressed their disappointment plainly to the tutor, who was by this time under considerable strain.

The second sitting’s students were admitted and attempts were made to start their assessment, but the system responded so slowly that even after 20 minutes only around 15% of the students had begun, so the assessment was terminated. Analysis showed that although the majority (168 out of 171) of students from the first sitting appeared to have had their results saved normally, three sets of responses were completely absent. These were later found in Perception’s local ‘progress’ files, but by that time the academics had decided to abandon the assessment because of the risk of appeals. This occurred on a Friday afternoon, and the tutor developed an alternative assessment strategy over the weekend.

A comparison between the two cases

Having given an overview of the two ‘failure’ cases, we present in Table 1 a cross-case analysis (Yin 2003) of the key differences between them.

Case 1 (School A) | Case 2 (School B)
Communication gaps existed between the tutors and the local learning technologist, and between the School and the CAAO | Tutor maintained direct contact with the CAAO at all times
Internal communication gap within ISS between the Oracle DBAs and the CAA Officer | Communication gap between the DBAs and the Perception administrator had been closed
Tutors uninformed about the risks associated with CAA | Tutors risk-aware
Tutors unimpressed; they expected simply to turn up and invigilate tests prepared for them by the LT | Tutors were enthusiasts who arranged central workstation bookings, produced lists of which students should sit in which room, and took a great deal of care
Locally booked workstation room; no timetabling clash | Centrally booked workstation rooms; procedural timetabling clash
No environmental problems | Security alarm went off; an unpredictable environmental event exacerbated a difficult situation
Paper-based testing is the more attractive alternative | Paper-based alternative means more essays to mark, which is unattractive
Inflexible approach: CAA did not work once, so it was dropped | Tutor suggested ‘optional substitution’ of formative marks
Outcome: CAA abandoned | Outcome: graceful recovery

Table 1: Two failed CAA tests compared

 

Key learnings and recommendations

The main learning points from this research fall into two categories: human factors and technical strategies.

Human factors

The key element in the graceful recovery of the School B assessment exercise appears to have been the flexible approach adopted by the tutor. His strategy of promptly offering the students a choice between carrying forward the best of their practice scores or retaking the test on paper undoubtedly forestalled a rebellion (no appeals have been received six months after the test). His further strategy of reducing the impact of assessment failures by running smaller sittings with reasonable gaps between them appears sound, and should form part of the advice given to tutors who are planning large-scale CAA assessments (Harwood 2004).

The shift in complexity that institutional-scale CAA brings is being managed, and the communication gaps are being closed, through suitable CAA procedures; a challenge for the future may be to ensure that these procedures are properly adopted by the local CAA community.

Technical strategies

Two approaches are being taken to resolve the issue of workstations ‘freezing’ during Perception sessions.  The first is to avoid using the ‘autosave’ feature of the QXQSAYG template, which threw up a dialogue box whenever the Oracle server could not respond quickly enough. The second is to use Questionmark Secure (QS) rather than PSB: PSB hid those dialogue boxes, thereby completing the illusion that Perception had crashed.
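One general way to cope with a slow database save without presenting the candidate with a blocking dialogue is to retry the save quietly with a short back-off before escalating. The sketch below illustrates that generic pattern only; the save_response callable and its parameters are hypothetical, and the code does not describe how Perception’s autosave behaves internally or what ISS implemented.

```python
import time

def save_with_retry(save_response, payload, attempts=3, initial_delay=0.5):
    """Try to persist a response, backing off briefly between attempts.

    save_response is a hypothetical callable that raises an exception when
    the database cannot be reached; only after all retries fail is the
    problem escalated (for example to the invigilator) rather than being
    surfaced to the candidate as a modal dialogue.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            save_response(payload)
            return True
        except Exception:
            if attempt == attempts:
                return False            # escalate quietly after the last attempt
            time.sleep(delay)
            delay *= 2                  # exponential back-off
    return False
```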

The resilience of the system is being improved in three ways. Firstly, the original Perception server will be joined by a second; the two will then be load-balanced and dedicated to delivering assessments, with the preview function devolved to a third, dedicated ‘preview’ server. Secondly, a high-availability Oracle server will be dedicated solely to Perception databases, which removes the possibility of congestion caused by other client applications sharing the Oracle server. Thirdly, the Perception servers will be upgraded to Windows Server 2003 in order to use Microsoft’s IIS 6.0 web server, which is more configurable, and reportedly more robust, than the current Windows 2000 Server/IIS 5.0 configuration.
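In outline, the delivery/preview split and load balancing described above might look something like the following sketch, in which a front end routes assessment traffic only to delivery servers that pass a simple health check. The server names, the /status URL and the selection logic are invented for illustration and do not describe the actual Southampton or Questionmark configuration.

```python
import itertools
import urllib.request

# Hypothetical server roles: two load-balanced delivery servers plus a
# separate 'preview' server so that authoring traffic cannot slow delivery.
DELIVERY_SERVERS = ["https://caa1.example.ac.uk", "https://caa2.example.ac.uk"]
PREVIEW_SERVER = "https://caa-preview.example.ac.uk"

_rotation = itertools.cycle(DELIVERY_SERVERS)

def is_healthy(base_url, timeout=2):
    """Very small health check: the server must answer a status URL quickly."""
    try:
        with urllib.request.urlopen(base_url + "/status", timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False

def pick_delivery_server():
    """Round-robin over the delivery servers, skipping any that fail the check."""
    for _ in range(len(DELIVERY_SERVERS)):
        candidate = next(_rotation)
        if is_healthy(candidate):
            return candidate
    raise RuntimeError("No healthy delivery server available")
```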

ISS will conduct regular and frequent load-testing of the Perception system, with the aim of giving reasonable warning of service reductions so that the risk of a failure at the point of delivery can be minimised.
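A minimal form of such load testing can be scripted: issue a burst of concurrent requests against a test assessment URL and warn when response times degrade beyond an agreed threshold. The sketch below uses only the Python standard library; the URL, concurrency level and threshold are placeholders rather than ISS’s actual figures.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TEST_URL = "https://caa1.example.ac.uk/status"   # placeholder endpoint
CONCURRENT_USERS = 50                            # placeholder load level
WARN_THRESHOLD_SECS = 2.0                        # placeholder acceptable response time

def timed_request(url):
    """Return the response time in seconds for one request, or None on error."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=30):
            return time.monotonic() - start
    except Exception:
        return None

def run_load_test():
    # Simulate CONCURRENT_USERS participants hitting the service at once.
    with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
        results = list(pool.map(timed_request, [TEST_URL] * CONCURRENT_USERS))
    failures = results.count(None)
    timings = [t for t in results if t is not None]
    slow = [t for t in timings if t > WARN_THRESHOLD_SECS]
    if timings:
        print(f"{failures} failed requests; worst response {max(timings):.2f}s; "
              f"{len(slow)} responses slower than {WARN_THRESHOLD_SECS}s")
    else:
        print("All requests failed")

if __name__ == "__main__":
    run_load_test()
```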

 

Closing comments

A close partnership has grown up between the University and Questionmark in the aftermath of the joint efforts made to resolve these problems.  However, it may be worth asking when such partnerships should properly begin: if a proactive approach is taken to building partnerships and sharing critical information with vendors before catastrophes occur (Mohr & Spekman 1994), then it may be possible to pre-empt them.  Experiences such as these should be shared within the CAA community.  The extent of the damage caused by these two cases to the further uptake of CAA within the institution has yet to be assessed.

 

References

Baty, P. 2004, "Litigation fees top £15m as academic disputes grow", Times Higher Education Supplement, 12-3-2004.

Harwood, I. 2004, "When summative computer-assisted assessments go wrong: disaster recovery after a major failure", submitted to British Journal of Educational Technology.

Harwood, I. & Warburton, W. 2004, "Thinking the unthinkable: using project risk management when introducing computer-assisted assessments", 8th International Computer-Assisted Assessment (CAA) Conference, 6-7 July 2004, Loughborough University, UK.

Heyl, B. S. 2001, "Ethnographic Interviewing," in Handbook of Ethnography, P. Atkinson et al., eds., Sage, London, pp. 369-383.

Kennedy, N. 1998, "Experiences of assessing LMU students over the web". Accessed at http://www.ulst.ac.uk/cticomp/Kennedy.html on 3-4-2004.

Mohr, J. & Spekman, R. 1994, "Characteristics of Partnership Success: Partnership Attributes, Communication Behaviour and Conflict Resolution", Strategic Management Journal, vol. 15, no. 2, pp. 135-152.

QAA 1998, University of Bath Quality Audit Report, May 1998. Accessed at http://www.qaa.ac.uk/revreps/instrev/bath/comms.htm on 3-9-2004.

Yin, R. 2003, Case study research: design and methods, 3rd edn, Sage, Thousand Oaks.

Zakrzewski, S. & Steven, C. 2000, "A Model for Computer-based Assessment: the catherine wheel principle", Assessment & Evaluation in Higher Education, vol. 25, no. 2, pp. 201-215.