Japanese Journal for Research on Testing Vol.9 No.1 Abstract

JART Vol.9 No.1

▶ Invited Paper  
Past, present and future views of testing technology —Its direction and necessary tasks—
Hiroshi Ikeda
Rikkyo (St. Paul's) University, Professor Emeritus
Testing technology progressed greatly in the 20th century, from small-scale handwritten tests to large-scale objective testing, with the help of optical mark readers (OMR) and high-speed computers. These devices stimulated the development of test theories and many multivariate statistical analyses. Recent innovations in information and communication technologies (ICT) have made individualized test responses possible, free from constraints of time and place. Future testing will, or should, shift its major role from personnel selection, as at present, to a tool for lifelong learning. Looking back on the past and present stages of testing, this article gives an overview of future directions in test development and of some problems to be solved by test researchers.
Keywords: assembling and summarizing technique, objective type test, mark-sense reader, e-testing, mastery map
▶ General research  
Computerized Adaptive Test in situations that information about items can't be obtained sufficiently
Ryuichi Kumagai¹, Joji Goto², Naoko Nakaune³, Tadashi Shibayama¹, Yoshikazu Sato², Hiroyuki Noguchi⁴
¹Tohoku University, ²Niigata University, ³National Institute for Educational Policy Research, ⁴Nagoya University
This paper describes the development of a computerized adaptive test (CAT) system for situations in which sufficient information about items cannot be obtained. The CAT system consists of 464 items assessing four content domains of mathematics. It uses a simple item presentation technique and an original scoring method. Nineteen people were involved in the development project, which took 2 years and 6 months to complete. We conducted two studies to validate the system. In Study 1, college students took the CAT as well as a paper-and-pencil test. In Study 2, high school students took the CAT.
Keywords: computerized adaptive test, CAT, CBT, test development, neural test theory
▶ General research  
Relationships between Essay Tests and Subject Tests in University Entrance Examinations
Sayaka Arai, Tsunenori Ishioka, Hisao Miyano
Research Division, The National Center for University Entrance Examinations
As part of university entrance examinations, essay tests are often administered in addition to subject tests. However, the relationships between essay tests and subject tests are unclear. In this study, we administered two types of essay tests and eight subjects of the National Center Test for University Admissions to the same examinees and analyzed the relationships among their test scores. The results showed that the correlation between the two essay tests was moderate and that the correlations between the essay tests and the National Center Test subjects were low. Principal factor analysis with promax rotation was performed on the test scores, and three factors were extracted: essays, liberal arts subjects, and science subjects. These results also suggested that performance on essay tests was related to performance on liberal arts tests, but that essay tests measured abilities different from those assessed by subject tests.
Keywords: Essay tests, National center test for university admissions, Subject tests, College admissions, Factor analysis
▶ General research  
The Application of Graded Response Model to the Test Data Violated the Assumption of Local Independence: Comparison with the Results of 2PLM
Tsuyoshi Izumi¹, Shinji Yamanoi², Tsuyoshi Yamada³, Takatomo Shirakawa⁴, Hideki Tsushima⁴
¹Tohoku University Graduate School, ²The Japan Institute for Educational Measurement, Inc., ³Okayama University, ⁴Benesse Corporation
In this study, the Graded Response Model (GRM) and the 2-Parameter Logistic Model (2PLM) were applied to actual test data that did not satisfy the local independence assumption, and the estimates of item parameters, ability parameters, and test information curves obtained from the two models were compared. Q3 (Yen, 1984) was used to assess the extent of local dependence. Under the 2PLM, extremely large discrimination parameter estimates were obtained for items that showed local dependence. Such extreme values may influence the ability estimates or the test information. Regarding ability parameters, the results indicated that examinees with the same estimated ability under the 2PLM may differ by about ±1 in their estimated ability under the GRM.
Keywords: item response theory, local independence, local dependence, Graded Response Model, Q3
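As a reader's illustration of the Q3 index mentioned in this abstract, the following minimal sketch computes Q3 for one item pair as the correlation between 2PLM residuals (observed response minus model probability). It is not the authors' implementation: the function names, the item parameters, the ability values, and the response data are all hypothetical, and the logistic function is assumed to be written without the scaling constant D.

```python
import math


def p_2pl(theta, a, b):
    """2PLM probability of a correct response (assumes no scaling constant D)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def q3(resp_i, resp_j, thetas, item_i, item_j):
    """Q3 for items i and j: correlation of their residuals u - P(theta).

    resp_i, resp_j: 0/1 responses of the same examinees to items i and j.
    thetas: ability estimates of those examinees.
    item_i, item_j: (a, b) tuples of discrimination and difficulty.
    """
    res_i = [u - p_2pl(t, *item_i) for u, t in zip(resp_i, thetas)]
    res_j = [u - p_2pl(t, *item_j) for u, t in zip(resp_j, thetas)]
    return pearson(res_i, res_j)


# Made-up data for five examinees and two items (illustration only).
thetas = [-1.5, -0.5, 0.0, 0.8, 1.6]
item_1 = (1.2, -0.3)
item_2 = (0.9, 0.4)
resp_1 = [0, 1, 0, 1, 1]
resp_2 = [0, 1, 0, 0, 1]
q3_value = q3(resp_1, resp_2, thetas, item_1, item_2)
```

Under local independence the residuals of two items should be nearly uncorrelated, so a Q3 value far from zero flags a pair of items as possibly locally dependent.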
▶ Case study  
Development of Multi-selective Reaction Test for Assessing Performance in Non-normal or Emergency Situations
Ken Kusukami
East Japan Railway Company
With the installation of automated safety signaling and other systems, human error in non-normal or emergency situations has become more apparent. A psychological aptitude test was therefore developed to assess employee performance in these situations. Three reaction modes (color, shape, and sound), each with three stimuli, were designed. One of these nine stimuli is presented, and the testee is required to push the button corresponding to the stimulus as quickly and accurately as possible. If the testee's reaction is slow, a feedback signal indicating the delay is given to impose additional stress on the testee. Criterion-related validity is demonstrated by significant correlation coefficients between test scores and ratings of performance in non-normal or emergency situations given by simulator trainers, superiors, and the drivers themselves. Effective indicators for screening are also identified through analysis of the variance, distribution, and correlation of the indicators.
Keywords: psychological aptitude test, emergency performance, validity, railway accident, human error
▶ Case study  
Development of General Purpose Computerized Adaptive Testing System
Kenichi Kikuchi
Toho University
Even when test administrators have items for computerized adaptive testing (CAT), they still need to develop a testing system to administer them. The difficulty of developing such a system is a barrier to using CAT in educational and clinical psychology. To address this situation, we developed a general-purpose computerized adaptive testing system that makes it easy to administer small-scale CAT. Because the system is a Microsoft Windows application, it runs stand-alone once it is copied to a hard disk drive or a USB flash drive. Users can set up the system by themselves simply by preparing test questions as Microsoft Word document files (or image files) and item information as Microsoft Excel document files. Until now, teachers and researchers who have test content have not been able to administer CAT easily; this system enables them to carry out small-scale CAT.
Keywords: computerized adaptive testing, computer based test, item response theory, item response model, graded response model
▶ Case study  
The historical background of English listening comprehension tests in the National Center Test and their evaluation
Teruhisa Uchida, Tatsuo Otsu
Research Division, the National Center for University Entrance Examinations
The nationwide university entrance examinations are high-stakes tests because their results determine students' futures. For this reason, English listening comprehension tests for the National Center Test were carefully investigated and discussed, from the era of the Joint First-Stage Achievement Test onward, before they were implemented. This report traces the discussions of the last quarter of the past century in order to explain the background and the reasons why the implementation of listening comprehension tests was repeatedly postponed, and to unravel the motives for the abrupt announcement of inaugural listening comprehension tests in 2003. The current listening comprehension tests are then examined and evaluated. Finally, this study explores advantageous directions for reforming large-scale examinations.
Keywords: nation-wide achievement test, the Joint First-Stage Achievement Test, the National Center Test, listening comprehension test
▶ Case study  
Differential Item Functioning Analysis of the Spatial Ability Test for Myanmar Middle School Students: Gender and Ethnicity based comparison
Nu Nu Khaing¹, Kazuhiro Yasunaga²,³, and Hidetoki Ishii⁴
¹Sagaing Institute of Education, ²Japan Society for the Promotion of Science, ³Tokyo University, ⁴Nagoya University
The main purpose of this study is to identify items showing differential item functioning (DIF) in a spatial ability test (S.A.T) for Myanmar middle school students. We examined DIF in the S.A.T across gender and ethnicity using three DIF analysis methods: Lord's Chi-square (LC) method, the Logistic Regression (LR) method, and the Mantel-Haenszel (MH) method. Of the 40 items of the S.A.T, eleven showed significant DIF in the ethnicity-based comparison and eleven in the gender-based comparison. In particular, most of the DIF items in the ethnicity-based comparison were found in the Block Rotation task of the S.A.T, and most of the DIF items in the gender-based comparison appeared in the Paper Folding task.
Keywords: item response theory (IRT), differential item functioning (DIF) analysis, spatial ability, ethnicity, gender
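As a reader's illustration of the Mantel-Haenszel (MH) method named in this abstract, the sketch below computes the MH common odds ratio for a single item across matched score strata, together with the delta-scale index MH D-DIF = -2.35 ln(alpha). It shows the textbook form of the statistic only, not the authors' analysis: the function names and the 2x2 counts are hypothetical, and the significance test is omitted.

```python
import math


def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one item.

    strata: list of (a, b, c, d) counts per matched score level, where
      a = reference group correct, b = reference group incorrect,
      c = focal group correct,     d = focal group incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        num += a * d / n
        den += b * c / n
    return num / den


def mh_d_dif(strata):
    """MH D-DIF on the delta scale; negative values favor the reference group."""
    return -2.35 * math.log(mh_odds_ratio(strata))


# Made-up counts for three score strata (illustration only).
example_strata = [(20, 10, 12, 18), (30, 10, 22, 16), (25, 5, 20, 8)]
alpha = mh_odds_ratio(example_strata)
delta = mh_d_dif(example_strata)
```

An odds ratio near 1 (delta near 0) suggests that examinees of equal total score in the two groups have similar chances of answering the item correctly.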
▶ Review  
Factors which affect listening comprehension test performance for EFL Learners: A comprehensive framework
Kozo Yanagawa
Hosei University
The purpose of this study was to propose a comprehensive framework of contextual parameters for EFL (English as a Foreign Language) learners and to contribute to the development and validation of listening comprehension tests for them. A contextual parameter (Weir, 2005) is a linguistic, social, or situational variable that is likely to affect the language test performance of EFL learners and that can be used as a criterion for test validation purposes. A thorough framework of contextual parameters was compiled through a literature review, and each parameter was mapped onto the four major characteristics of test tasks proposed by Bachman and Palmer (1996, 2010): the rubric, the input, the expected response, and the relationship between input and expected response. This framework will inform both the development and the validation of listening comprehension tests for EFL learners.
Keywords: EFL learners, listening comprehension, contextual parameter, test validity
▶ Review  
On the Necessity for Linking Scores of NCT
Naoki T. Kuramoto
Tohoku University
There is no doubt that the National Center Test (NCT) plays an indispensable role in the freshman admission process of universities in Japan. However, a major controversial issue has been overlooked in the usage of the subject scores, and this is undermining the basic structure of the examination system. Raw scores are provided and used under NCT's so-called “a la carte” administration system, even though the equivalency of these scores is not technically guaranteed. This problem arose because a critical change in the fundamental examination design was overlooked when the current NCT system took the place of the Joint First-Stage Achievement Test. The situation seems to be getting worse year by year. The present study reveals ambiguities in the meaning of raw scores by using percentile ranks to show how the distribution of subject scores fluctuates from year to year. Moreover, our results show the limited coverage of a single subject test by considering the number of perfect scores. There is an urgent need for cooperation among universities and NCUEE to develop a methodology for linking scores across multiple subjects in order to restore fairness to the scoring process.
Keywords: National Center Test, Joint First-Stage Achievement Test, linking, a la carte administration, fairness
▶ Review  
Measurement Problems and Their Treatments in Essay-Type Tests
Satoshi Usami
Japan Society for the Promotion of Science; University of Southern California
In recent years there has been increased awareness of the importance of assessing “higher-level abilities” such as thinking, written expression, and creativity. For this purpose, the use of essay-type tests has become popular, especially in university entrance and personnel examinations. However, many researchers have pointed out that essay-type tests are subject to measurement problems concerning reliability, validity, and bias, and these problems and their treatments have not been fully addressed so far. This paper reviews these measurement problems as discussed in the literature and summarizes how they relate to each phase in the construction and operation of essay-type tests. Based on this review, the paper identifies critical issues for test practitioners and measurement specialists to consider and describes how they can be addressed in practice.
Keywords: essay-type test, educational measurement, educational assessment, performance assessment
▶ Review  
Current Situation and Trends regarding Test Item Disclosure in Japanese National Examinations
Masako Wakabayashi, Kazunari Sugimitsu
The Association of Intellectual Property Education
It has been pointed out that disclosing test items after administration is one of the characteristics of the "Japanese Test Culture" in official large-scale examinations in Japan. One of the contributing factors is thought to be that the Ministry of Internal Affairs and Communications requires test item disclosure in national examinations. However, the situation regarding test item disclosure has not been investigated comprehensively until now. In this study, we first comprehensively investigated whether test items were disclosed, and then analyzed the relationship between test item disclosure and the characteristics of each examination. As a result, we ascertained that in several examinations the actual test items were not disclosed and only sample items were disclosed instead. The results suggest that these examinations show similar trends in terms of (1) frequency of administration per year, (2) announcement of the pass-fail score before or after administration, and (3) examination fee.
Keywords: National Examinations, Test Item, Disclosure, Japanese Test Culture, Test-Standard