Journal of the Japan Association for Research on Testing (JART), Vol. 7, No. 1: Abstracts


JART Vol.7 No.1

▶ General research  
Comparing Test Difficulties of NCT English Examinations using Non-linear Factor Analysis
Tatsuo Otsu, Takamitsu Hashimoto
The National Center for University Entrance Examinations, and JST CREST
The authors compared the difficulties of the NCT English examinations in two consecutive years. They used a nonlinear factor analysis (NLFA) model, adapted to missing values under the MAR (missing at random) condition, to analyze the "monitor experiments" of the NCT. The participants in the experiments were freshmen at national universities in the Tokyo Metropolis. The authors used a supplementary NCT English examination as an anchor variable and compared the two regular English examinations. On average, participants in the experiments achieved higher scores than the NCT examinees. Although the marginal distributions differed from each other, the proposed method produced good estimates of the relative difficulties of the examinations even though no information on common participants was available. Although the NLFA produced reasonably good estimates, it did not surpass estimates based on the two-parameter logistic (2PL) IRT model, in which item-level responses were used for estimation. The biases of the NLFA estimates were larger at the upper and lower ends of the score range than those of the IRT estimates.
Keywords: Non-linear Factor Analysis, Linking, NCT, English Listening Comprehension Test
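For orientation, the 2PL IRT model used as the comparison baseline in the abstract above is the standard two-parameter logistic model; the statement below is a textbook definition, not an excerpt from the paper.

```latex
% 2PL IRT model: probability that an examinee with ability \theta
% answers item j correctly (a_j: discrimination, b_j: difficulty,
% D: scaling constant, often 1.7 or 1)
P_j(\theta) = \frac{1}{1 + \exp\{-D a_j (\theta - b_j)\}}
```

The NLFA approach summarized above appears to work from score-level rather than item-level information, which is consistent with the abstract's finding that the IRT-based estimates, which use item responses directly, were less biased at the extremes of the score range.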
▶ General research  
A Correction for the Mean and Sigma Method in Common Examinees Designs
Hiroyuki Noguchi1, Ryuichi Kumagai2
1Nagoya University. 2Tohoku University
The Mean and Sigma Method is often used to estimate equating coefficients when equating tests in common examinees designs. However, when the numbers of items included in the two IRT scales to be equated are very different, the Mean and Sigma Method may not yield an appropriate estimate, because the size of the error components included in the estimated scale values may differ between the scales. The present study proposes a correction method that removes the error components from the variance of the common examinees' estimated scale values by estimating their sizes.
The correction method was examined using simulation data. The results indicated that it was highly effective when the error variance of the estimates differed greatly between the scales to be equated and when the discriminating power of the items in the scales was generally low.
Keywords: common examinees design, equating coefficients, Mean & Sigma method, correction
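For reference, the standard (uncorrected) mean and sigma method computes the linear scale-transformation coefficients from the common examinees' estimated scale values on the two scales. The sketch below gives the textbook form; the correction is rendered only schematically, as an inference from the abstract rather than the authors' exact formula.

```latex
% Mean and sigma method in a common examinees design:
% scale-X values \hat\theta_X are placed on scale Y via \theta_Y = A\,\theta_X + B
A = \frac{\sigma(\hat\theta_Y)}{\sigma(\hat\theta_X)}, \qquad
B = \mu(\hat\theta_Y) - A\,\mu(\hat\theta_X)
% Schematic reading of the proposed correction: subtract an estimate of the
% error component from each scale-value variance before forming A
A_{\mathrm{corr}} = \sqrt{\frac{\sigma^2(\hat\theta_Y) - \widehat{e}_Y}
                               {\sigma^2(\hat\theta_X) - \widehat{e}_X}}
```

Here \(\widehat{e}_X\) and \(\widehat{e}_Y\) stand for the estimated error components in the scale-value variances of the common examinees; the symbols are illustrative placeholders.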
▶ General research  
Cross-Year Comparison of Test Score Distributions under a Testing Culture Characterized by Post-Hoc Disclosure of All Test Items: Linking Nationwide and Prefectural Tests
Hidetoki Ishii1, Kazuhiro Yasunaga1,2
1Nagoya University. 2Research Fellow of the Japan Society for the Promotion of Science
This paper aimed to: 1) propose a means of cross-year comparison of test score distributions under a testing culture characterized by post-hoc disclosure of all test items, 2) consider its feasibility, and 3) examine its actual application by collecting and analyzing real data. A practical method was proposed that enables cross-year comparison by appropriately designing the administration of nationwide and prefectural tests and linking their scores. We examined the utility of the data, the need for collaboration between prefectures, and whether test takers needed to be identified. Finally, a comparison of the language test scores of ninth-graders across a three-year gap, between 2006 and 2009, was conducted. The results indicated that there was little difference between test score distributions over time, and that what small differences there were could be attributed to "application" rather than "knowledge". Moreover, the feasibility of expanding the proposed data collection method across more tests, test versions, and regions of administration was also discussed.
Keywords: linking, cross-year comparison, testing culture, nationwide test, prefectural test
▶ General research  
Beyond Cronbach's alpha: a comparison of recent methods for estimating reliability
Kensuke Okada
Senshu University
Many previous studies have reported the tendency of Cronbach's alpha to underestimate the true reliability. More recently, separate studies have reported the better performance of McDonald's omega, or of reliability estimation with the structural equation modeling (SEM) technique. In this study, we compared the performance of these two approaches to reliability estimation in a simulation study. Our results showed that omega with three specific factors yielded biased estimates, whereas SEM estimation with tau-equivalent and congeneric models and omega total with one specific factor performed well. When the model structure was known a priori, SEM estimation with the true model also gave good results.
Keywords: Cronbach's alpha, reliability, McDonald's omega, structural equation modeling, simulation study
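For reference, the coefficients compared above have the following standard forms for a test of \(k\) items with total score \(X\); these are textbook definitions, not the paper's simulation setup.

```latex
% Cronbach's alpha (\sigma_i^2: item variances, \sigma_X^2: total-score variance)
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)
% McDonald's omega (total) for a single common factor with loadings \lambda_i
% and unique variances \psi_i, assuming uncorrelated errors
\omega = \frac{\left(\sum_{i=1}^{k}\lambda_i\right)^2}
              {\left(\sum_{i=1}^{k}\lambda_i\right)^2 + \sum_{i=1}^{k}\psi_i}
```

SEM-based reliability estimation follows the same logic: the ratio of model-implied true-score variance to total variance, computed under whichever measurement model (tau-equivalent, congeneric, or other) is fitted.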
▶ General research  
How learning strategies are affected by the attitude toward tests: Using competence as a moderator
Masayuki Suzuki
Graduate School of Education, The University of Tokyo / Japan Society for the Promotion of Science
In the present study, we investigated the relationship between the values placed on tests and learning strategies from various aspects. In addition, we treated test approach-avoidance tendency as a mediator variable when evaluating this relationship. Data were collected from 391 undergraduates using a self-report questionnaire. The results indicated that people who regard tests as effective for improving their learning strategies and for creating a learning program tend to adopt desirable learning strategies, through a mediating effect of test approach tendency. On the other hand, the value of "enforcement" exhibited a direct relationship with learning strategies. Furthermore, to evaluate the effect of competence for tests on these relationships, we performed a multi-group mean and covariance structure analysis; the results indicated that people with higher competence showed relatively more positive attitudes toward tests and scored higher on the test approach tendency scales.
Keywords: values of test, test approach-avoidance tendency, learning strategy, competence, multi-group mean and covariance structure analysis
▶ General research  
Linking Supplementary Examinations to Main University Entrance Examinations: Using the Tucker Linear Equating Method
Naoki T. Kuramoto
Tohoku University
The present study proposes a method of linking supplementary examinations to the main university entrance examinations at an individual university. Score adjustment for university entrance examinations is conceptually a form of linking; however, what constitutes a suitable method varies according to the situation. In the case of supplementary examinations run by an individual university, equating the scores of the different tests is easier than in some other cases, such as linking scores from optional subject tests within a subject area. On the other hand, supplementary examinations involve far fewer applicants than most other exams, because they are given only in exceptional cases, such as to applicants affected by some unavoidable disaster. A complicated computational process is undesirable because it would increase the risk of producing an erroneous score. The proposed linking method is an application of the Tucker method with an anchor test, together with two additional assumptions. Together, these allow an easy computation that adds the same constant adjustment score for all applicants taking the supplementary examinations while omitting unstable parameters derived from those exams. A bootstrap simulation study was conducted using a national university's entrance examination scores as the data-generating populations. The results showed that the average bias remained almost zero; however, the variance of the estimation error was not negligible when the sample size was small. The swapping-rates criterion showed that the effectiveness of the adjustment differed according to the distribution of the main entrance examination scores in the population. The proposed method is expected to restore some equity to the selection process.
Keywords: score adjustment, linking, supplementary examinations, equity, university entrance examinations
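As background for the method described above, the following Python sketch gives the classic Tucker linear equating procedure for a common-item (anchor) design from summary statistics. The function name and interface are illustrative assumptions; the paper's variant, which adds two further assumptions so that the adjustment reduces to a single constant added to every supplementary-exam score, is not reproduced here.

```python
def tucker_linear_equating(mu1_x, var1_x, mu1_v, var1_v, cov1_xv,
                           mu2_y, var2_y, mu2_v, var2_v, cov2_yv,
                           w1=0.5):
    """Classic Tucker linear equating of form X (taken by group 1) onto
    form Y (taken by group 2) through an anchor test V.

    Arguments are summary statistics: means, variances, and the covariance
    of each form with the anchor within its own group. Returns a function
    that maps X scores onto the Y scale.
    """
    w2 = 1.0 - w1                      # synthetic-population weights
    g1 = cov1_xv / var1_v              # regression slope of X on V in group 1
    g2 = cov2_yv / var2_v              # regression slope of Y on V in group 2
    dmu = mu1_v - mu2_v                # group difference on the anchor
    dvar = var1_v - var2_v
    # Moments of X and Y in the synthetic population
    mus_x = mu1_x - w2 * g1 * dmu
    mus_y = mu2_y + w1 * g2 * dmu
    vars_x = var1_x - w2 * g1**2 * dvar + w1 * w2 * (g1 * dmu)**2
    vars_y = var2_y + w1 * g2**2 * dvar + w1 * w2 * (g2 * dmu)**2
    slope = (vars_y / vars_x) ** 0.5
    return lambda x: slope * (x - mus_x) + mus_y
```

In the proposed application the slope is effectively fixed, since the abstract describes adding the same constant to all supplementary-exam scores, which avoids relying on variance estimates from the very small supplementary-exam sample.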
▶ General research  
An Evaluation of Standard Error of Equating by Bootstrap Method in Equating Tests Consisting of Subtests
Yoshikazu Sato1, Tadashi Shibayama2
1Niigata University. 2Tohoku University
When equating tests that consist of subtests under the single-group design, we can consider (a) equating the tests by total scores and (b) summing the equated scores after equating the corresponding subtests. In this article, we call the former "equating by total scores" and the latter "equating by subtest scores". The purpose of this study was to compare these equating strategies, especially in terms of the standard error of equating. We therefore conducted a simulation study using the results of three multiple-choice subtests of the Japan Law School Admission Test (JLSAT). The simulation results showed that the equated scores from the two strategies did not differ greatly. In addition, "equating by subtest scores" had smaller standard errors of equating under all simulation conditions and had a wider range of scores that could be equated below a given error level. We conclude that "equating by subtest scores" has an advantage in terms of the standard error of equating within the scope of the current simulation setting.
Keywords: subtest, standard error of equating, equipercentile equating, single-group design, bootstrap method, item response theory (IRT)
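To illustrate the kind of computation behind the comparison above, the Python sketch below computes a bootstrap standard error of equating for a bare-bones (unsmoothed) equipercentile equating under the single-group design. The function names and details are illustrative assumptions, not the study's actual JLSAT procedure.

```python
import numpy as np

def equipercentile_equate(x_scores, y_scores, x_points):
    """Map raw scores on form X to the form-Y scale by matching empirical
    percentile ranks (unsmoothed equipercentile equating)."""
    x_sorted = np.sort(np.asarray(x_scores, dtype=float))
    pr = np.searchsorted(x_sorted, x_points, side="right") / x_sorted.size
    return np.quantile(np.asarray(y_scores, dtype=float), np.clip(pr, 0.0, 1.0))

def bootstrap_se(x_scores, y_scores, x_points, n_boot=1000, seed=0):
    """Bootstrap standard error of the equated scores under a single-group
    design: the same examinees are resampled for both forms."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x_scores, dtype=float)
    y = np.asarray(y_scores, dtype=float)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, x.size, size=x.size)   # resample examinees
        reps.append(equipercentile_equate(x[idx], y[idx], x_points))
    return np.std(np.asarray(reps), axis=0, ddof=1)
```

Under the "equating by subtest scores" strategy the same resampling applies, but each subtest is equated separately and the equated subtest scores are summed before the standard error is taken.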
▶ Case study  
Toward Validity Argument for Test Interpretation and Use Based on Scores of a Diagnostic Grammar Test for Japanese Learners of English
Rie Koizumi1, Hideki Sakai2, Takahiro Ido3, Hiroshi Ota4, Megumi Hayama5, Masatoshi Sato6, Akiko Nemoto7
1Tokiwa University. 2Shinshu University. 3Waseda Junior & Senior High School. 4Komazawa Women's University. 5Dokkyo University. 6Ichikawa High School, Yamanashi. 7Atago Junior High School, Ibaraki
The purpose of this paper is to review recent trends in validity and validation, particularly with regard to the argument-based approach to validity, and, by using this approach, to construct a validity argument for test interpretation and use based on scores of the English Diagnostic Test of Grammar (EDiT Grammar) for Japanese learners of English. The EDiT Grammar focuses on knowledge of basic English noun phrases, especially their internal structures. Using Chapelle, Enright, and Jamieson's (2008a) validation framework, we first formulated the interpretive argument and then conducted two studies, one using verbal protocol analysis and the other using Rasch analysis. The results suggest that the test-taking processes employed by test-takers are in accordance with the expectations drawn from the test specifications and that all the items and test-takers fit the Rasch model. These two positive results provided two strands of evidence supporting the evaluation and explanation inferences, which were synthesized into the validity argument for the EDiT Grammar.
Keywords: validity, diagnostic test, noun phrases, verbal protocol analysis, Rasch analysis
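For reference, the Rasch model against which item and person fit was checked in the study above is the one-parameter logistic model; the statement below is a standard definition, not material from the paper.

```latex
% Rasch model: probability of a correct response by person i (ability \theta_i)
% to item j (difficulty b_j)
P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}
```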
▶ Case study  
Development of Kanagawa Prefecture High School English Tests With a Common Item Design: Test Evaluation by IRT Equating
Chisato Saida1, Kozo Yanagawa2
1Yokohama National University. 2Kanagawa Prefectural Odawara Senior High School
Prefecture-wide English achievement tests for high school students have been conducted twice a year in Kanagawa. The English committee of Kanagawa Prefecture creates the English tests at four difficulty levels; 43,546 students in 73 senior high schools took the 2009 spring tests on the same day. To counter the current decline in the number of test takers, the English committee decided to modernize the testing system so that test scores could be compared among forms and students' progress in English ability could be shown on a common scale. The committee developed the four forms of the 2009 spring tests with a common item design, and we equated them using item response theory. As a result, the test scores and item parameters became comparable among forms. By including some items from the 2008 fall test in the 2009 spring test, the scores of students who took both tests also became comparable. This research demonstrates the importance of creating a common scale for large-scale achievement tests. Ways to improve such a teacher-based assessment system are also discussed.
Keywords: Prefecture-wide English achievement tests; Common item design; Test development; Item response theory; Test equating
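As background for the equating step described above, placing two IRT-calibrated forms on a common scale through common items amounts to a linear transformation of the ability metric. A standard statement is given below; the abstract does not specify which linking method (e.g., mean/sigma or a characteristic-curve method) was used, so the choice of coefficients shown is only one common option.

```latex
% Linear scale transformation from form X's metric to form Y's metric
\theta_Y = A\,\theta_X + B, \qquad a_Y = \frac{a_X}{A}, \qquad b_Y = A\,b_X + B
% One common choice of A and B uses the common items' difficulty estimates:
A = \frac{\sigma(b_Y^{\mathrm{common}})}{\sigma(b_X^{\mathrm{common}})}, \qquad
B = \mu(b_Y^{\mathrm{common}}) - A\,\mu(b_X^{\mathrm{common}})
```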