日本テスト学会誌 (JART) Vol. 11, No. 1: Abstracts


JART Vol.11 No.1

▶ General research  
Attractive Distractors Depend on Proficiencies of Examinees: In Multiple-Choice Reading Comprehension Tests in English
Takahiro Terao1, Kazuhiro Yasunaga2, Hidetoki Ishii1, Hiroyuki Noguchi1
1Nagoya University, 2Japan Society for the Promotion of Science, The University of Tokyo
This study examines how the attractiveness of distractors depends on examinees’ proficiency. In Study 1, 16 participants were asked to explain why each distractor was incorrect, using items from past entrance examinations of private universities in Japan, in order to identify the structure of attractive distractors. In Study 2, 366 examinees took multiple-choice reading comprehension tests in English. Multinomial logistic regression analysis and an analysis of residual deviance revealed that the low-proficiency group chose distractors that included negative expressions and causal relations with no corresponding description in the passage, whereas the middle-proficiency group chose distractors that included negative expressions and causal relations with some corresponding description in the passage. It was also evident that examinees in the high-proficiency group chose distractors that used antonyms and had some corresponding description in the passage. Implications for item writing are discussed.
Keywords: multiple-choice tests, reading comprehension tests in English, attractive distractors, item analysis
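
The analysis named above can be illustrated with a minimal sketch of a multinomial logistic regression of option choice on proficiency group. The data below are randomly generated and the variable codings (proficiency group, option category) are assumed for illustration only; they are not the study’s dataset or definitions.

```python
# Minimal sketch: multinomial logistic regression of option choice on
# proficiency group. All data and codings here are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 366                                    # sample size matching the abstract
group = rng.integers(0, 3, size=n)         # 0 = low, 1 = middle, 2 = high (assumed coding)
choice = rng.integers(0, 4, size=n)        # 0 = key, 1-3 = distractor types (assumed coding)

X = pd.get_dummies(pd.Series(group, name="group"), prefix="grp",
                   drop_first=True, dtype=float)
X = sm.add_constant(X)
fit = sm.MNLogit(choice, X).fit(disp=False)

print(fit.summary())                       # log-odds of each option category vs. the reference
print("-2 log-likelihood (deviance-type statistic):", -2 * fit.llf)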
▶ General research  
Some Important Things for Developing Multiple-Choice Items ― on the basis of the results of a survey of experts on item construction ―
Sayaka Arai
Research Division, The National Center for University Entrance Examinations
Tests are used for selection and qualification, and they have a significant effect on individuals and society. Therefore, tests, and the items used to construct them, must be appropriately developed.
The aim of this study is to identify what is most important when developing multiple-choice items, based on a survey of experts involved in item construction. The study has two parts. In the first part, I compared several sets of item-writing guidelines and asked the experts what they thought about each guideline; in the second part, I asked them what they considered important for item construction. The results showed that not all guidelines necessarily have to be followed; it depends on the objective of the test. The results also suggested that the important things in developing multiple-choice items are that 1) the items properly reflect the objective of the test, 2) the items properly measure the abilities that the item writers intend to measure, and 3) the items are instructive to examinees.
Keywords: item construction, multiple-choice items, guidelines
▶ Case study  
How confidently can we assess the competence of 5th-year medical students in clinical clerkship training?
Manabu Miyamoto1, Ayako Miyazaki1, Seiichi Ishii2
1Education Center, Faculty of Medicine, Osaka Medical College, 2Office of Medical Education, Tohoku University Graduate School of Medicine
We investigated the reliability of faculty assessments of 5th-year medical students in clinical training. To this end, generalizability theory was employed, treating Students, Rotations (Raters), and Items as sources of variance. A total of 203 5th-year medical students at Osaka Medical College (OMC) underwent clinical training in the 2010 and 2011 academic years. Students rotated through all 26 clinical departments at the OMC hospital, with each rotation lasting one or two weeks, making up 41 weeks of clinical training per year. Of the 26 departments, we selected for this study five rotations (raters) whose assessment reliability was presumably higher than the others’, using the criteria that students stayed for two weeks and that a single doctor assessed all students in the department (rotation) over the two-year period. The clinical evaluation form comprised 16 items in three major domains: 4 items for knowledge (cognitive domain), 8 for skills (psychomotor domain), and 4 for attitudes (affective domain), each scored on a 4-point Likert scale. The generalizability study showed that the Student-Rotation (Rater) interaction accounted for the largest variance in all three domains, while the Student variance was small. G-coefficients for the domains ranged from 0.31 to 0.43, and the D study revealed that 15.60 to 26.24 rotations (raters) would be needed to obtain a G-coefficient of 0.7.
To overcome the low reliability of faculty assessments in our clinical training, we recommend extending each rotation period rather than increasing the number of rotations: fragmented clinical training does not give students the chance to develop clinical competence by working as members of a medical team, and longer observation periods may improve the reliability of faculty assessments of students.
Keywords: work-place assessment, G-coefficient, D-study, 5th-year medical student, clinical clerkship
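
The D-study figures reported above can be roughly reproduced with a Spearman-Brown-style projection. The sketch below assumes a simplified design in which all measurement error is pooled against the five observed rotations; the published analysis used the full Students x Rotations (Raters) x Items design, so the numbers are only approximate.

```python
# Sketch of the D-study projection: how many rotations (raters) would be
# needed to reach a target G-coefficient, assuming error variance scales
# inversely with the number of rotations (a simplification of the full G study).
def rotations_needed(g_observed, n_observed, g_target=0.7):
    snr_obs = g_observed / (1.0 - g_observed)   # signal-to-noise at n_observed rotations
    snr_tgt = g_target / (1.0 - g_target)
    return n_observed * snr_tgt / snr_obs

print(rotations_needed(0.43, 5))   # ~15.5, close to the reported 15.60
print(rotations_needed(0.31, 5))   # ~26.0, close to the reported 26.24
```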
▶ Case study  
An Investigation of the Variability among Japanese Schools in the Scales on Teacher Co-operation in TALIS 2013
Yasuhito Hagiwara, Kenji Matsubara
National Institute for Educational Policy Research
In the two scales on teacher co-operation in the OECD’s TALIS, teachers were asked about the frequency of their co-operative activities with other teachers. The frequency with which a teacher engages in such activities is thought to depend on the co-operation of other teachers in his or her school and on the school’s framework. Therefore, variability among schools is more plausibly hypothesized for the scales on teacher co-operation than for the scale on constructivist beliefs, which reflects individual teachers. This study analyzes the variability in these scales with the random-intercepts, random-loadings model. In contrast to the result for the scale on constructivist beliefs, the results for the scale on exchange and co-ordination for teaching and for the scale on professional collaboration show residual variance in the intercept for every item. Furthermore, the result for the scale on professional collaboration also shows residual variance in the factor loading for “Observe other teachers’ classes and provide feedback”.
Keywords: Teaching and Learning International Survey, teacher co-operation, factor analysis, measurement invariance, random-intercepts, random-loadings model
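
As a sketch of the kind of model referred to here (standard two-level notation, assumed rather than taken from the paper), a factor model with random intercepts and random loadings for item i, teacher t, and school s can be written as

```latex
y_{its} = \nu_{is} + \lambda_{is}\,\eta_{ts} + \varepsilon_{its},
\qquad \nu_{is} = \nu_i + u_{is},
\qquad \lambda_{is} = \lambda_i + v_{is},
```

where Var(u_{is}) is the residual (between-school) variance of the item intercept and Var(v_{is}) that of the factor loading; non-negligible values of these variances correspond to the school-level variability reported for the co-operation scales.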
▶ Case study  
An evaluation of using Latent Rank Theory to construct an item bank for the CBT for clinical hospital practice in nursing colleges
Haruhiko Mitsunaga
Organization for Educational and Student Support, Shimane University
Building an item bank that most effectively facilitates the evaluation of eligibility for clinical hospital practice in nursing colleges is clearly desirable. However, the 2PL IRT model, commonly used to standardize item parameters, requires more than 300 examinees to estimate stable item parameters (Toyoda, 2012). Although Mitsunaga et al. (2014) used prior distributions to obtain stable item-parameter estimates from smaller datasets, in practice the relevant prior information is not always available. In this paper, eight CBT test forms were administered to eight groups, each with fewer than 200 examinees. To obtain feasible parameter estimates from such small datasets, latent rank theory (LRT; Shojima, 2009) was applied. The results suggest that relatively accurate LRT estimates are possible without any prior distribution. This can be achieved by assembling a set of small datasets that are conducive to IRT analysis, in which item characteristics and examinee ability estimates can be evaluated through the comparison of item parameters.
Keywords: practical nursing, latent rank theory, item response theory, item bank
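
For reference, the 2PL model mentioned above gives the probability of a correct response to item j as a function of ability theta; a sketch in standard notation (not reproduced from the paper) is

```latex
P_j(\theta) = \frac{1}{1 + \exp\{-D a_j(\theta - b_j)\}},
```

where a_j (discrimination) and b_j (difficulty) are the item parameters whose stable estimation typically requires the large samples noted above, and D is a scaling constant (often 1.7). LRT, by contrast, characterizes each item by its correct-response rate at each of a small number of ordered latent ranks, which is the property exploited here for small calibration samples.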
▶ Case study  
On the Utility of Cognitive Diagnostic Models: Application to the Kyoukenshiki Standardized Achievement Test NRT
Masayuki Suzuki1, Tetsuya Toyota2, Kazuhiro Yamaguchi3, Yuan Sun4
1Showa Women's University, 2Aoyama Gakuin University, 3The University of Tokyo, 4National Institute of Informatics
Most traditional tests, which report only a small number of content-based subscores, total scores, or T-scores, are of little use for providing diagnostic information about students’ strengths and weaknesses. In recent years, cognitive diagnostic modeling, which was developed to provide detailed information on the extent to which students have mastered the study content, has been attracting a great deal of attention. In this paper, we applied several cognitive diagnostic models to the Kyoukenshiki standardized achievement test NRT and investigated their utility in educational practice. The results showed that we could obtain diagnostic information about students’ knowledge states that could not be obtained from the content-based subscores and total score. In addition, we discuss problems in applying cognitive diagnostic models and issues to be addressed in the future.
Keywords: cognitive diagnostic model, G-DINA model, Kyoukenshiki standardized achievement test NRT, mathematics
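
As a sketch of the framework named in the keywords (the standard formulation, not a reproduction of the paper’s specification): in the saturated, identity-link G-DINA model, the probability of a correct response to item j depends on the mastery indicators alpha_{lk} in {0,1} of the K_j* attributes the item measures,

```latex
P\!\left(X_{j}=1 \mid \boldsymbol{\alpha}_{lj}^{*}\right)
  = \delta_{j0}
  + \sum_{k=1}^{K_j^{*}} \delta_{jk}\,\alpha_{lk}
  + \sum_{k=1}^{K_j^{*}-1}\sum_{k'=k+1}^{K_j^{*}} \delta_{jkk'}\,\alpha_{lk}\alpha_{lk'}
  + \cdots
  + \delta_{j12\cdots K_j^{*}} \prod_{k=1}^{K_j^{*}} \alpha_{lk},
```

where the delta terms are an intercept, main effects, and interaction effects of attribute mastery. Estimated mastery probabilities per attribute are what yield the diagnostic profile of each student’s knowledge state.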
▶ Case study  
Examining the Revised Placement Test Using the Rasch Model
Eri Banno1, Tomoko Watanabe2
1Okayama University, 2Hiroshima University
This article uses the Rasch model to examine the placement test for a Japanese language course that was revised in 2012. Using data from the placement test used before 2012 and from the revised 2012 test, we investigated how the revision affected the test results. The participants were 487 international students enrolled in a Japanese program at a university in Japan. The results indicate that the revised test was more difficult than the old one, which was the intent of the revision. At the same time, the results show that removing the rubi (kana readings) from each kanji did not change the difficulty of the questions. Furthermore, the results suggest that the number of difficult items in the revised test needs to be increased, because the test was still easy for the examinees, and that the multiple-choice items that did not function well should be revised.
Keywords: placement test, Japanese course, Rasch model, classical test theory, difficulty
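
For reference, the Rasch model used for this comparison places persons and items on a common logit scale; a minimal sketch in standard notation (not taken from the article) is

```latex
P(X_{ni}=1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
```

so that, given a common calibration or linked item sets, the item difficulties b_i of the old and revised forms can be compared directly, which is how statements such as “the revised test was more difficult” can be supported.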
▶ Case study  
Revising test items in order to improve the item quality: A practical study using BJT Business Japanese Proficiency Test data as an example
Wakana Onozuka1, Kiyokata Kato2, Yumiko Umeki3, Akiko Echizenya4, Shin-ichi Mayekawa5
1Tokyo Fuji University, 2Tokyo Gakugei University, 3Utsunomiya University, 4Tokyo University of Agriculture and Technology, 5Tokyo Institute of Technology
The purpose of this study is to show that items that did not meet the statistical criteria for entry into the item bank can be improved, in terms of their item statistics, by rewriting them with a focus on item quality. We (1) selected items that did not meet the statistical criteria from past Business Japanese Proficiency Test (BJT) administrations, (2) rewrote those items to enhance their quality, and (3) confirmed, on the basis of experimental data, that the revision improved the statistical characteristics of those items.
Keywords: revision of test items, Item Response Theory, large scale test of Japanese language, BJT Business Japanese Proficiency Test, item bank maintenance
▶ Review  
A Review of Uniform Test Assembly Methods for e-Testing
Takatoshi Ishii1, Maomi Ueno2
1Tokyo Metropolitan University, 2The University of Electro-Communications
ISO/IEC 23988:2007 provides a global standard on the use of IT to deliver assessments to examinees and to record and score their responses. For high-stakes tests, this standard recommends using uniform test forms, in which each form comprises a different set of items but must satisfy equivalent specifications, such as equivalent amounts of test information based on item response theory (IRT). However, the assembly of uniform test forms is NP-hard, because it is a combinatorial optimization problem of selecting items from an item bank. Accordingly, test assembly methods have made rapid progress in recent years with the aid of advances in information technology. In this paper, we introduce some typical uniform test assembly methods and compare them with one another to explain their advantages and disadvantages.
Keywords: e-Testing, uniform test assembly, Item Response Theory, Optimization Problem
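
To make the optimization framing concrete, here is a minimal sketch of uniform test assembly as a mixed-integer program, using the open-source PuLP library. It is not one of the methods reviewed in the paper; the item bank, form length, and ability points below are hypothetical, and real item banks require many more constraints (content balance, enemy items, exposure control).

```python
# Sketch: assemble n_forms non-overlapping forms whose test information at a few
# ability points stays close to a common target (hypothetical item bank).
import math
import pulp

bank = [(0.8, -1.0), (1.2, 0.0), (1.0, 0.5), (0.9, -0.5),
        (1.1, 1.0), (0.7, 0.2), (1.3, -0.2), (1.0, 1.2)]   # (a, b) per item
n_forms, form_len = 2, 3
thetas = [-1.0, 0.0, 1.0]              # ability points at which information is matched

def info(a, b, theta):                 # Fisher information of a 2PL item at theta
    p = 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))
    return (1.7 * a) ** 2 * p * (1.0 - p)

# Crude per-form information target: average item information times form length.
target = {t: sum(info(a, b, t) for a, b in bank) * form_len / len(bank) for t in thetas}

prob = pulp.LpProblem("uniform_test_assembly", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (range(len(bank)), range(n_forms)), cat="Binary")
dev = pulp.LpVariable("max_deviation", lowBound=0)
prob += dev                            # minimize the worst information deviation

for f in range(n_forms):
    prob += pulp.lpSum(x[i][f] for i in range(len(bank))) == form_len
    for t in thetas:
        form_info = pulp.lpSum(info(*bank[i], t) * x[i][f] for i in range(len(bank)))
        prob += form_info - target[t] <= dev
        prob += target[t] - form_info <= dev
for i in range(len(bank)):             # each item appears in at most one form
    prob += pulp.lpSum(x[i][f] for f in range(n_forms)) <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for f in range(n_forms):
    chosen = [i for i in range(len(bank)) if pulp.value(x[i][f]) > 0.5]
    print("form", f, "items", chosen)
```

Minimizing the largest deviation from a shared information target is one simple way to operationalize “equivalent specifications” across forms; the methods surveyed in the paper pursue the same goal with different formulations and solvers.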