(Teacher-made test for the 1st semester of the 1st class at SMPN 4 Jambi, 2004/2005 school year)
I. Speaking Test
It was held on Monday, December 27th, 2004. The teacher divided the students into groups, then asked each group to prepare a conversation on the topic “Introduction” and perform it in front of the class. The speaking tests for Classes 1A and 1B were held together.
II. Written Test
It was held on Tuesday, December 28th, 2004. The written test was arranged as follows:
A. Listen to the song and find words about job! Put the job words in their right places!
In Penny Lane there is a (1)_______________ showing photographs,
of every head he’s had the pleasure to know.
And all the people that come and go,
stop and say hello.
On the corner is a (2) _______________ with a motorcar,
the little children laugh at him behind his back.
And the (3) ___________________ never wears a mack,
in the pouring rain, very strange.
Penny Lane is in my ears and in my eyes.
There, beneath the blue, suburban skies,
I sit, and meanwhile back.
In Penny Lane there is a (4) ________________ with an hourglass,
and in his pocket is a portrait of the Queen.
He likes to keep his fire engine clean,
it’s a clean machine.
Penny Lane is in my ears and in my eyes.
A four of fish and finger pies,
in summer, meanwhile back.
Behind the shelter in the middle of a roundabout,
the pretty (5) ________________ is selling poppies from a tray.
And tho’ she feels as if she’s in a play,
she is anyway.
In Penny Lane the (6) _______________ shaves another customer,
we see the (7) _______________ sitting waiting for a trim.
And then the (8) _________________ rushes in,
from the pouring rain, very strange.
Penny Lane is in my ears and in my eyes.
There, beneath the blue, suburban skies,
I sit, and meanwhile back.
Penny Lane is in my ears and in my eyes.
There, beneath the blue, suburban skies,
B. Write a description about person at the picture!
C. Choose the correct answer!
1. These are my friends. _______ ______ Vira and Anwar.
a. You are b. They are
c. We are d. She is
2. I have a sister. ________ _____ smart.
a. I am b. He is
c. She is d. It is
3. I have a dog. It has a house. ______ house is small.
a. My b. Your
c. Its d. Our
4. Friend : Hi, how are you?
You : _____________
a. Very well, and you? b. Thank you.
c. Nice to meet you. d. Have a nice day.
5. Look at the picture!
X : Excuse me, where is the Language Lab please?
Y : It is __________________________
a. Beside the Grade 1B room. b. In front of music room
c. Beside the library d. Between the hall and library
6. Andrew never _____________ about his schedule.
a. think b. thinks
c. thought d. is thought
7. How __________ Anna and July ___________ to school?
a. do, go b. does, go
c. are, go d. is, go
8. A living room is __________________
a. a place to wash b. a place to grow flower
c. a place to cook d. a place to relax to talk
9. A dining room is __________________
a. a place to eat b. a place to cook
c. a place to keep a car d. a place to sleep
10. We need ______________ sugar to make ________________ of coffee.
a. a loaf, a glass b. a spoon, a plate
c. a loaf, a plate d. a spoon, a cup
Answer the questions based on the following text for questions number 11 – 15!
11. Tania save their money _______________
a. before rest time b. at rest time
c. after rest time d. in the morning
12. How long is the bank open?
a. 3 hours b. less than 2 hours
c. 2 hours d. 4 hours
13. Tania buy her daily needs __________________
a. in the morning b. in the afternoon
c. in the evening d. before dinner
14. What does Tania do from 3:45 pm – 4:30 pm?
a. She goes to computer class b. She plays musical instrument
c. She accesses the internet d. She buys sports equipment
15. How long does Tania take a rest?
a. Forty-five minutes b. Thirty minutes
c. One hour d. One and a half hours
RELIABILITY AND VALIDITY TEST
(BACHMAN AND PALMER (1996) FRAMEWORK OF EVALUATION)
The Bachman and Palmer (1996) framework for evaluating test usefulness is used here to evaluate the teacher-made English test for the 1st semester of the 1st class at SMPN 4 Jambi, 2004/2005 school year. The questions for the logical evaluation of usefulness, as posed by Bachman and Palmer, are given as the numbered items below, each followed by its answer.
1) To what extent do characteristics of the test setting vary from one administration of the test to another?
All students took the tests in comfortable classrooms without air-conditioning and with minimal background noise. All students took the speaking test on December 27th, 2004, and the written test on December 28th, 2004.
2) To what extent do characteristics of the test rubric vary in an unmotivated way from one part of the test to another, or on different forms of the test?
The instructions and questions are clear throughout, except that question 10 in Part C is missing the word “of” before “sugar” (We need __________ sugar to make _________ of coffee. Answer choices: (a) a loaf, a glass; (b) a spoon, a plate; (c) a loaf, a plate; (d) a spoon, a cup).
3) To what extent do characteristics of the test input vary in an unmotivated way from one part of the test to another and on different forms of the test?
The input is taken from the textbook used by the students, which is based on the goals of the curriculum. This is satisfactory.
4) To what extent do characteristics of the expected response vary in an unmotivated way from one part of the test to another or on different forms of the test?
The response for the speaking test is a conversation. In the written test, the responses for Part A are short answers: students fill in the missing words while listening to the song. The response for Part B is a written descriptive paragraph. The responses for Part C are multiple-choice answers. Part C contains grammar, vocabulary, and reading questions, and the vocabulary questions are mixed in with the grammar questions: questions 1 to 3 and 6 to 7 are grammar questions, questions 4 to 5 and 8 to 10 are vocabulary questions, and questions 11 to 15 are reading questions. It would be far better, more logical, and easier for students if the grammar questions were clustered together.
5) To what extent do characteristics of the relationship between input and response vary in an unmotivated way from one part of the test to another, or in different forms of the test?
The input does not vary, but the responses do: they comprise a conversation, filling in missing words while listening to a song, writing a descriptive paragraph, and multiple-choice answers.
6) Is the language ability construct for this test clear and unambiguously defined?
Yes, it is. There are reading, grammar, vocabulary, and conversational components, but there are only a few written questions, which do not support a valid interpretation of English ability at all.
7) Is the language ability construct for the test relevant to the purpose of the test?
The purpose of the semester test for the 1st class at SMPN 4 Jambi in the 2004/2005 school year is to rank students and to set a pass level for proceeding to the 2nd class. Students are ranked and compared across classes. The test has speaking, listening, and writing components. The multiple-choice section tests reading, grammar, and vocabulary ability. The test construct therefore covers both active and passive skills.
8) To what extent does the test task reflect the construct definition?
The speaking test reflects conversational ability. The written test reflects listening ability, writing ability, reading ability, and grammatical ability.
9) To what extent do the scoring procedures reflect the construct definition?
It is fair to mark students on their performance in speaking, writing, and listening and on choosing the correct answer in the multiple-choice section. The multiple-choice items have at least one alternative correct answer, but only one answer is marked correct. On writing good test questions for multiple-choice exams, the guidance on evaluating multiple-choice tests states, "Make sure there is only one correct answer."
The speaking, writing, listening, and multiple-choice scores are adequate to assess a range of English skills and depth of skills, and to indicate whether a student is ready to progress to the 2nd class.
10) Will the scores obtained from the test help us to make the desired interpretations about the test takers' language ability?
The test is both active and passive. Although it contains only a few questions, listening, speaking, and writing ability are all tested. The results can help in interpreting learners' English ability.
11) What characteristics of the test setting are likely to cause different test takers to perform differently?
All students write their answers on folio paper. All conditions are the same. This aspect is satisfactory.
12) What characteristics of the test rubric are likely to cause different test takers to perform differently?
The instructions for individual questions are in English and are basic in structure. No instruction is likely to cause test takers to perform differently.
13) What characteristics of the test input are likely to cause different test takers to perform differently?
Incorrect grammar, vocabulary, spelling, and punctuation in the input can cause problems for test takers. This generally does not occur in these tests; incorrect grammar is found only in question 10 of Part C. Incorrect test questions invalidate the test.
14) What characteristics of the expected response are likely to cause different test takers to perform differently?
Once again, having several possible correct answers to a question while marking only one as correct can cause major problems for test takers. Each student's responses in the speaking and writing tests differ according to their competence, and these responses are scored using a scoring rubric. The listening and multiple-choice questions have one possible correct answer.
15) What characteristics of the relationship between input and response are likely to cause different test takers to perform differently?
The test includes Indonesian boys' and girls' names, which do not cause problems when identifying gender or subject. This is quite good. No question has cultural bias, and the input for the questions is commonly known to Indonesians.
16) To what extent does the description of tasks in the TLU [target language use] domain include information about the setting input, expected response and relationship between input and response?
The setting is included in some reading questions but absent from the grammar and vocabulary questions. Question 7 epitomizes the confusion over expected response and input. The response expected by the test constructor is "a) which" [photographs]. However, "b) what" and even "c) whose" are grammatically acceptable. The expected response must match the input given in an authentic, valid test. Kehoe (1995, para. 3) states, "As a rule one is concerned with writing stems that are clear and parsimonious, answers that are unequivocal and chosen by the students who do best on the test." Question 28 lacks input for the response: there is no specific information on which to base the answer "b". In essence, the TLU needs more contextual support and only one correct response. Question 42 relies heavily on knowledge of an Indonesian folk tale. Without knowledge of the folk tale, construction of the paragraph could differ from the expected response. A very small minority of students in Indonesia do not know this folk tale, for example, foreign nationals sitting the tests.
17. To what extent do the characteristics of the test task correspond to those of the TLU tasks?
Conversational items tested as multiple-choice items are far from authentic. Thus, the answer to question 53 could in reality be a, b, c, or d, depending on context. There are also many grammatical mistakes, which make the test inauthentic.
18. To what extent does the task presuppose the appropriate area or level of topical knowledge, and to what extent can we expect the test takers to have this area or level of topical knowledge?
As previously mentioned, question 42 presupposes topical knowledge of an Indonesian folk tale. Overall, the topical knowledge is appropriate for 14- to 15-year-old students, for example, the areas of media, sickness, and sport. There is generally no great need to use specific topical knowledge to answer questions.
19. To what extent are the personal characteristics of the test takers included in the design statement?
The design is for final year students of junior high schools in Indonesia in the Jakarta district. It is assumed all test takers are Indonesian, aged 14 to 15, and all speak Indonesian. All have done the pre-UAN test and it is assumed all have completed nine years of formal schooling. It is to be noted that there is a tiny minority of foreign nationals who also sit the test, but the government assumes and expects that they get schooled in international schools.
20. To what extent are the characteristics of the test tasks suitable for test takers with the specified personal characteristics?
The test tasks are very appropriate for average and lower-ability students. In this regard the test is suitable, but it fails to take into account higher-ability students, for whom the tasks are at a functional level far below their ability. That is, some students who achieve excellent results in English tests designed for native speakers at the same educational level are tested on tasks that neither challenge nor address their level of ability. Year nine students at the school where I teach have performed well above average on the 2001 year 9/10 Australian English schools' competition test items, with one student scoring 100%. However, many students from Indonesia fail the UAN test. Thus there is a wide range of ability, whereas the test tasks do not cover the whole range.
21. Does the processing required in a test task involve a very narrow range of areas of language knowledge?
As discussed previously, the tasks engage a very limited range of language knowledge.
22. What language functions, other than the simple demonstration of language ability, are involved in processing the input and formulating a response?
23. To what extent are the test tasks interdependent?
They are not. All questions are dependent on the specific questions or reading passage and are independent of other items.
24. How much opportunity for strategy involvement is provided?
A pre-UAN test is administered to all students under the same test conditions two weeks prior to the UAN test. A text with past years' UAN test items is also available to teachers in all schools to prepare students. The construction of the tests is very similar from year to year and thus provides students and teachers with ample time to prepare.
25. Is this test likely to evoke an affective response that would make it relatively easy or difficult for the test takers to perform at their best?
No. The topics are culturally sensitive and non-emotive.
26. To what extent might the experience of taking the test or the feedback received affect characteristics of test takers that relate to language use?
This test is passive: it tests only understanding of the language, neglecting higher skills such as processing, comparing, debating, and even production of language. Hughes (2003, p. 1) claims, "If the skill of writing, for example, is tested only by multiple choice items then there is great pressure to practise such items rather than practise the skill of writing itself. This is clearly undesirable." The UAN test aims to test grammar, but students are not required to construct any sentences. The students are to learn conversational conventions, but they are not tested orally. Research by Hadiatmaja, cited by Somantri (2003, para. 6), observes that Indonesian school students learning English "are passive and receptive only [translation]." Thus the backwash effect of the UAN tests can be seen in students' focus on passive and receptive skills, with problems in constructing discourse in speaking and writing.
27. What provisions are there for involving test takers directly, or for collecting and utilizing feedback from test takers, in the design and development of the tests?
There are no known provisions. Students do not have the opportunity to provide any feedback or have any input into the development of the test.
28. How relevant, complete, and meaningful is the feedback that is provided to test takers?
The correct answers and the students' responses are given, showing their mistakes. A final score and school ranking are also given. These are just statistics, and students are not given any explanation as to why test items are correct. No information is given on their language ability or mastery of subject matter. It is difficult for the individual teacher to provide good feedback due to the number of alternative correct answers.
29. Are decision procedures and criteria applied uniformly to all groups of test takers?
Yes. All schools follow the same criteria for the UAN score, and scores are objective and independent of participation, attendance, attitude, or other factors.
30. How relevant and appropriate are the test scores to the decisions to be made?
The test score is the single factor in determining the grade and to determine if the student can proceed to senior high school.
31. Are test takers fully informed about the procedures and criteria that will be used in making decisions?
32. Are these procedures and criteria actually followed in making the decisions?
Yes. There are no exceptions, though those who fail may sit for the test again.
33. How consistent are the areas of language ability to be measured with those that are included in teaching materials?
Teachers' teaching materials usually match the language ability to be measured, as is the case in the majority of schools. However, schools such as my school do not follow the national curriculum per se and go well beyond it, covering active skills, including listening, speaking, and writing, in addition to the reading and grammar of the national curriculum. These schools, their teachers, and their students feel uncomfortable with the test, as it neither matches their learning content nor tests most of their ability.
34. How consistent are the characteristics of the test and test tasks with the characteristics of teaching and learning activities?
This is dependent on the individual teacher. Due to the passive nature of the tests, a lot of students learn English in a passive manner and as a result Artsiyanti (2002, para. 6) claims, "Students do not know when structures [grammar] have to be used and how to apply them in everyday life [translation]." The test tasks contribute to a negative backwash effect in the classrooms.
35. How consistent is the purpose of the test with the values and goals of teachers and of the instructional program?
The test is far from achieving the goals of English at IPEKA. Because it is limited to passive, receptive skills, it is also not consistent with the goals of other schools' English courses, even though it is consistent with the national curriculum.
36. Are the interpretations we make of the test scores consistent with the values and goals of society and the education system?
If wages are a reflection of worth, society does not value teachers in Indonesia as highly as teachers are valued in western countries. The average wage of a teacher is Rp 700,000 to Rp 800,000 (just over $100 AUD) a month (Sistem pendidikan harus dirombak secara radikal, 2004). Schools are often dilapidated, and some students cannot afford their tuition. Many language teachers do not have adequate mastery of English to teach effectively and efficiently in schools in Indonesia. Nevertheless, test scores are regarded as highly valid and are respected by most as the major measure of performance in English and as a means of determining the academic progression of students to the next level.
There are more pressing concerns here of terrorism, hunger, and work. The acceptance by society and the educational system of the test scores should not equate with the usefulness of the test. The UAN needs reform!
37. To what extent do the values and goals of the test developer coincide or conflict with those of society and the education system?
There is agreement with the education system and most of society.
38. What are the potential consequences, both positive and negative, for society and the education system, of using a test in this particular way?
The backwash effect contributes to passive learners and to English speakers who are not confident in producing English, which is the case in Indonesia today.
39. What is the most desirable positive consequence, or the best thing that could happen as a result of using the test in this particular way, and how likely is this to happen?
The test could act as a motivating factor for some in mastering passive English. This is still not likely.
40. What is the least desirable negative consequence, or the worst thing that could happen as a result of using the test in this particular way, and how likely is this to happen?
As mentioned previously, many students will learn an understanding of reading and grammar in a passive and receptive manner without learning active skills and to the exclusion of speaking, listening, and writing. This is highly likely as it is already evidenced throughout the country.
41. What type and relative amounts of resources are required for: (a.) the design stage, (b.) the operationalization stage and, (c.) the administration stage?
There is not much money, time, or expertise available for the UAN tests. The design is done by a few local English teachers with no resources provided by the government, apart from the syllabus and the test construction design. The operation is done by a central team using Scantron computer marking. The administration of the tests is handled by individual schools.
42. What resources will be available for carrying out (a), (b), and (c) above?
Teachers, computers, printers and paper are available. Resources are very limited in Indonesia due to its massive student population and limited budget.
The UAN is not very useful. It is neither valid, authentic, nor interactive, and it has negative impacts on learning. It is, however, reasonably reliable and practical.
The purpose of the UAN is to measure a level of English competence for progression to senior high school. It clearly fails to do so. It is necessary first to determine the aim or goal of the test. Kitao and Kitao (1996b, para. 2) state, "The goal of the test is what you want to measure." There are many unmeasured skills that can be tested. Listening, writing, and speaking can all be assessed in addition to reading and grammar. In regard to grammar, Kitao and Kitao (1996a, conc. para.) state, "While the testing of grammatical knowledge is limited - - it does not necessarily indicate whether the testee can use the grammatical knowledge in a communicative situation - - it is sometimes necessary and useful."
Indonesian schools are moving towards an outcome-based curriculum. A criterion-referenced test (CRT) could well be an excellent alternative to the present UAN test. Gorsuch (1997, para. 25) claims, "Only CRTs will allow teachers to set standards, measure achievement and give students valuable feedback at the course level." This could then present the opportunity for positive backwash, so that students become active users of English and confident in all skills.
All in all, the UAN fails to be useful because of its test construction, which is riddled with mistakes and contains many multiple-choice items with more than one correct answer. Hughes (2003, p. 2) claims, "Students' true abilities are not always reflected in the test scores that they obtain." This is the case with the UAN test.