Putting together a half-decent achievement test
If you work at a JHS, HS, college, senmon gakko, or university in Japan you have probably just completed several year-end or semester-end achievement tests. After all, you need grades for your students so some kind of evaluation is required. But this is an area in which a lot of mistakes are made and a lot of educational principles are violated...
I'd like to think that testing is something I know a little about, an area in which I've become at least a little sophisticated. It was one of my specializations during my MA days and an area in which I've kept up with the research, so I'm hoping that a few of the things I mention below might carry some weight above and beyond the 'some guy on the internet' level of credibility.
First point-
Achievement tests are not placement tests nor, usually, are they proficiency tests.
In an achievement test you are evaluating the students' course work. That means the focus of test content must be upon what students have, or were supposed to have, covered in the course. This means that any content that was not dealt with in the course should not be part of the test. It means that the skill emphasis should match the skills that you were trying to teach in your class. Test tasks should resemble those tasks which were practiced during the course. You are not gauging the students' overall English ability or general skill- which would be more representative of a placement or proficiency test- so don't try to. The test should measure a student's ability to meet the specific course goals as set out in the syllabus.
Second point-
If you are an educator the test should have an educational function.
It should have a pedagogical purpose as well as an evaluative function. Students should be learning from their tests. This means that students must know what they did right and what they did wrong, and must be given a chance to fix it. In other words, a good achievement test has a diagnostic function. This has several administrative implications:
1. You must give the test back to the students. It belongs to them.
2. There must be some type of review or feedback for the students.
3. You shouldn't give the test in the final class or else you can't review it.
4. Students should be able to find out what the correct or model answers are.
5. Students who did poorly should be made to do a re-test, or two, until they show that they have learned the material (or skill).
6. Why not have students obtain good or correct answers on those sections where they did poorly by checking with peers? I do a 'test interview' where students ask one another those questions they didn't answer correctly and if the partner knows the proper answer, they can teach (not just 'tell') it to the other student.
Third point-
You can and should diagnose your own teaching effectiveness from the test results.
If students do poorly on the test, or on specific items on the test, it is very likely because either:
1) the question, task, or entire test was invalid (the test didn't actually test what it was supposed to) or unreliable (if a similar test were given to similar students at a different time and place, scores would be very different- meaning that happenstance affected the test results, usually as a result of poor test design), or
2) you didn't teach whatever it is that you were testing well enough.
This should be telling you something. After all, tests test the teacher's effectiveness as well as the students'.
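One rough way to spot the items worth a second look is to work out what share of the class got each item right. The sketch below is only an illustration: the 0/1 score sheet is made up, the function names are my own, and the 40% cut-off is an arbitrary assumption rather than any kind of standard. A flagged item doesn't tell you whether the problem was the question or the teaching, only where to start looking.

```python
from typing import List


def item_facility(responses: List[List[int]]) -> List[float]:
    """Return the proportion of students who answered each item correctly.

    `responses` holds one row per student and one column per test item,
    with 1 for a correct answer and 0 for an incorrect one.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[i] for row in responses) / n_students
        for i in range(n_items)
    ]


def flag_items(facility: List[float], threshold: float = 0.4) -> List[int]:
    """Return the 1-based numbers of items answered correctly by fewer
    than `threshold` of the class (the threshold is an assumed value)."""
    return [i + 1 for i, f in enumerate(facility) if f < threshold]


# Hypothetical score sheet: 5 students, 4 items.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]

facility = item_facility(responses)
flagged = flag_items(facility)
print("Share correct per item:", facility)
print("Items to re-examine:", flagged)
```

In this made-up class only one student got item 3 right, so it is the one to review: either the question was badly built or that point needs re-teaching.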
Fourth point-
You need to test more than just recognition (memory) and discrete-item knowledge.
Memory is a limited skill. Not only that, but memory is not just recognition (the most passive, receptive aspect of memory) but also recall (contextual understanding) and reproduction (application). If you taught a class that was expected to focus on developing productive skills but gave a test that measures only recognition, you have an invalid test.
Likewise, language is not just a collection of discrete-item knowledge. It is a dynamic system that involves numerous social and pragmatic considerations. So again, if your class was expected to develop student skills in using English within meaningful and/or practical contexts and your test focuses mainly (or solely) on discrete items, you will have made an invalid test, since the skills you are supposedly trying to inculcate will have escaped the net of evaluation.
Fifth point-
The test can easily be used as a study and/or review experience.
Open-book tests are great. Students can once again review material and find those things that the teacher wants them to understand. Open-book test success also relies more on a general comprehensive understanding of a subject as opposed to memorizing discrete items. Of course, given that the test is open-book we should also expect standards to be high. I have come to notice that students who are well-organized and think actively succeed at these tests while the laggards who weren't paying much attention or making much of an effort all year rarely rise above their 'stations'- at least on the first test. This doesn't always happen on discrete-point knowledge-based TOEIC-type tests.
Providing students with the test tasks or questions or old exams in advance (they'll usually get them from their seniors anyway) can help too. By letting students know what to study for, you focus their energies on those things you really want to inculcate and leave less to random chance, circumstance or wasted/misguided student effort.
Sixth point-
Ongoing evaluation, especially if you are using a variety of evaluative means and measures, is more effective than the traditional 'one final paper exam' format.
Language learning is a process and so the evaluation should be process-based and focus less on the one, final 'this-is-your-official-result' mode of testing. Using a variety of testing methods and means allows students who respond differently to different challenges to strut their stuff. Not all 'good' students are sharp at paper tests, and some may do much better on a role-play, report, or some type of visual/tactile task. Ideally, by using all test types you can get a panoramic view of their all-round skills, and therefore a more accurate reading of their English abilities (assuming that you are trying to educate them in a holistic way, that is).
Weighting tests is also important. Putting something like 80% on a final test might not be a good indicator of actual student ability over the entire course of the class. Breaking evaluation up into 20% increments allows for more types of evaluation and widens the range of criteria. It also tends to keep students alert and focused.
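To see how much the weighting matters, here is a small sketch that grades the same student under two schemes: five components at 20% each, and a final exam worth 80%. The component names, the scores, and the way the remaining 20% is split in the second scheme are all invented for illustration.

```python
from typing import Dict


def weighted_grade(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine component scores (0-100) using weights that must sum to 1."""
    total_weight = sum(weights.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(scores[name] * weights[name] for name in weights)


# Hypothetical student: steady all term, but had a bad day on the final.
scores = {
    "quiz": 85,
    "role_play": 90,
    "report": 80,
    "presentation": 88,
    "final_exam": 50,
}

# Scheme A: five evaluations in 20% increments.
even_weights = {name: 0.20 for name in scores}

# Scheme B: 80% on the final, the rest split evenly (an assumed split).
final_heavy_weights = {
    "quiz": 0.05,
    "role_play": 0.05,
    "report": 0.05,
    "presentation": 0.05,
    "final_exam": 0.80,
}

even_grade = weighted_grade(scores, even_weights)
final_heavy_grade = weighted_grade(scores, final_heavy_weights)
print("20% increments:", round(even_grade, 2))
print("80% final exam:", round(final_heavy_grade, 2))
```

With these invented numbers the student lands at about 78.6 under the even split and about 57.15 when the final counts for 80%, which is the gap between a solid grade and a borderline one on the strength of a single sitting.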
Seventh point-
Let students have some say in the test content.
Productive, open-ended tasks are to be encouraged as these allow for some self-expression and variety, letting students use the language while actively thinking and engaging with it. Most teachers will tell you that these tasks and problems are easier to grade- and tend to provide a more comprehensive view of actual student abilities. Even better, allow students to make some tests themselves. This will allow for a good review of content and also show the teacher what students have learned (or not), or feel is important (or not). And what a teacher learns from this can be applied to next year's lesson plans.
I allow my students to appeal their test grades too- as long as they do so in English. If they feel that the grade on a 'subjective' test or item was unfair they have the opportunity to explain to me why their score should be higher, a process which demands that they consider not only the test result and content but also how they will plead their cases in front of me.
Reader suggestions on testing are more than welcome in the comments section.