Japanese Journal for Research on Testing Vol.6 No.1 Abstract

JART Vol.6 No.1

▶ Invited Paper  
NONPARAMETRIC ONLINE ITEM CALIBRATION: IN COMPUTERIZED ADAPTIVE TESTING ENVIRONMENT
FUMIKO SAMEJIMA
UNIVERSITY OF TENNESSEE
The author has developed nonparametric methods and approaches for estimating the operating characteristic of a discrete item response since the 1970s; the rationale of the conditional p.d.f. approach was presented in Psychometrika (1998, Vol. 63, pages 110-130) together with that of the bivariate p.d.f. approach. This paper presents the conditional p.d.f. approach as adjusted to the environment of computerized adaptive testing. Efficiency is challenged in this paper, so simulated data were created with only 1,202 hypothetical examinees, and 300 dichotomous items were selected from items actually developed, used, and calibrated by the Law School Admission Council following the three-parameter logistic model. A truncated logistic model is proposed to avoid the noise introduced at the lower levels of the latent trait, where the item response information function assumes negative values for the correct answer when the three-parameter logistic model is used. Two stopping rules were adopted and their outcomes compared: presentation of new items from the item pool stops 1) when 40 items have been customized and presented to an examinee, or 2) when the standard error of estimation has reached 0.32. A further challenge was to include 10 nonmonotonic item characteristic functions, of different degrees of nonmonotonicity, among the total of 25 new items whose item characteristic functions were to be estimated. It turned out that the latter stopping rule provided better outcomes, and the estimated item characteristic functions were, in general, closer to the true curves. Notably, nonmonotonicities of the true curves were well depicted even when the degree of complexity of the true curve was high.
Keywords: item response theory, latent trait models, ability measurement
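The remark about negative information under the three-parameter logistic model can be checked numerically. The following is a minimal sketch, not the paper's procedure: it assumes the usual definition of the item response information function as the negative second derivative of the log operating characteristic, a scaling constant D = 1.7, and hypothetical item parameters a, b, and c. The proposed truncated logistic model is not implemented here.

```python
import math

D = 1.7  # conventional logistic scaling constant (an assumption of this sketch)


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct answer."""
    psi = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return c + (1.0 - c) * psi


def info_correct(theta, a, b, c):
    """Item response information for the correct answer:
    the negative second derivative of log P(theta) with respect to theta."""
    psi = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    p = c + (1.0 - c) * psi
    dp = (1.0 - c) * D * a * psi * (1.0 - psi)  # first derivative of P
    d2p = (1.0 - c) * (D * a) ** 2 * psi * (1.0 - psi) * (1.0 - 2.0 * psi)  # second derivative of P
    return (dp ** 2 - p * d2p) / p ** 2


def critical_theta(a, b, c):
    """Trait level where info_correct changes sign (requires c > 0).
    Below this level the information for the correct answer is negative."""
    return b + math.log(c) / (2.0 * D * a)


if __name__ == "__main__":
    a, b, c = 1.0, 0.0, 0.2  # hypothetical item parameters
    t0 = critical_theta(a, b, c)
    for theta in (t0 - 1.0, t0, t0 + 1.0):
        print(f"theta={theta:+.3f}  info={info_correct(theta, a, b, c):+.4f}")
```

With a nonzero guessing parameter c, the sign change occurs below the difficulty parameter b, which is consistent with the abstract's statement that the problem arises on the lower levels of the latent trait.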
▶ General research  
Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient
Yasuo Miyazaki1, Taketoshi Sugisawa2, Edward Wolfe3
1Virginia Polytechnic Institute and State University, 2Niigata University, 3Pearson
Reliability generalization, a meta-analysis of reliability coefficients, is a useful technique for summarizing information on score reliability. Since it is a relatively recent idea, a standardized effect measure for the reliability coefficient has not yet been established. Several transformations of the reliability coefficient have been proposed as the standardized effect measure, but no studies suggest which transformation should be used. Thus, in this article, a simulation was conducted to compare the performance of several transformations of the reliability coefficient for a fixed-effects meta-analysis model of reliability coefficients in terms of parameter recovery and tenability of the normality assumption for various true values of the reliability coefficient and sample sizes. A log-transformation performed best in both respects consistently across the simulation conditions, and all transformations examined in this study performed better than no transformation.
Keywords: Reliability generalization study, meta analysis, standardized effect measure, transformations, Monte Carlo method
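To make the idea of a transformed effect measure concrete, here is a minimal sketch of fixed-effects pooling on a log-transformed scale. The abstract does not state the exact form of the log-transformation or its variance, so the transformation T = ln(1 - alpha) and the approximate sampling variance 2k / ((k - 1)(n - 2)) used below are assumptions of this example, as are the function name and the inputs.

```python
import math


def pool_alpha_log(alphas, ns, k):
    """Fixed-effects pooled coefficient alpha via T = ln(1 - alpha).

    alphas: sample coefficient alphas, one per study
    ns:     sample sizes, one per study
    k:      number of items (assumed equal across studies)
    Returns the back-transformed pooled alpha and the standard error
    of the pooled estimate on the transformed scale.
    """
    ts = [math.log(1.0 - a) for a in alphas]
    # Assumed approximate sampling variance of ln(1 - alpha_hat)
    variances = [2.0 * k / ((k - 1.0) * (n - 2.0)) for n in ns]
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    t_bar = sum(w * t for w, t in zip(weights, ts)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return 1.0 - math.exp(t_bar), se


if __name__ == "__main__":
    # Hypothetical studies: three alphas from samples of different sizes
    pooled, se = pool_alpha_log([0.78, 0.84, 0.81], [60, 150, 90], k=10)
    print(pooled, se)
```

Pooling on the transformed scale and back-transforming only at the end keeps the weighted average on the scale where the normality assumption is meant to hold.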
▶ General research  
The Effect on Latent Trait Estimation in Case of the Assumption of Local Item Independence Violated
Naoya Toudou
Graduate School of Education, University of Tokyo
In this study, simulations were conducted using two models of local item dependence, the Two-Parameter Logistic Interaction Model (2PLIM) and the Two-Parameter Logistic Copula Model (2PLCM), to investigate the effect on latent trait estimation when the assumption of local item independence is violated. The simulations showed that local item dependence increased the average bias and root mean square error and decreased the average correlation coefficient between the estimated latent trait and its true value. In addition, the relation between the estimated latent trait and its true value was also affected by local item dependence. The number of subjects and the number of items also influenced latent trait estimation.
Keywords: item response theory, local item dependence, latent trait
▶ General research  
Quantitative Analysis about the Professional Training of Testing in Japan
Takuya Kimura
Nagasaki University
The purpose of this paper is to reexamine the professional training of testing in Japan. The problem of professional training in testing is both new and old. Nowadays, testing professionals are required for the quality assurance of education. Most testing professionals did not belong to a science course at university, although they need talent in mathematics and statistics. They become testing professionals after getting a job or entering a master's course, and they study testing either alone or in training after getting a job. They are taught by professors whose major is education (including educational measurement). However, the number of professors whose major is educational measurement is declining. This is because the definition of psychology has been changing with revisions of the act of teacher’s license. As the teacher training system was improved in the postwar era, psychology, which includes educational measurement and educational statistics, came to be defined as only counseling, clinical psychology, and student guidance.
Keywords: The Professional of Testing, Professional Education, Quantitative Analysis, The Act of Teacher's License
▶ General research  
Construction of a Simulation Model and an Example of Error Estimates in Linking for the Single-Group Design
Yoshikazu Sato1, Tadashi Shibayama2
1Niigata University, 2Tohoku University
In this article, we construct a model that can be used to simulate linking under the single-group design, in order to contribute to error estimation in linking. The constructed model can simulate cases of equating, concordance, and linkage. Moreover, we provide an illustrative example of error estimation in linking using the bootstrap method based on the constructed model. As the example, we consider the case in which the Japan Law School Admission Test administered in 2008 (JLSAT2008) is linked to JLSAT2007. Simulation results show that the linking error curves are shaped like a bimodal distribution. They also reveal that the linking error curves may be jagged depending on the conditions of point allocation and on rounding errors in score transformation. We conclude that the constructed model is helpful for error estimation in linking because it yields reasonable simulation results.
Keywords: standard error of linking, simulation model, equipercentile equating method, bootstrap method, item response theory (IRT)
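As a rough illustration of bootstrap error estimation for equipercentile linking in a single-group design, the sketch below resamples examinees (each with a score on both tests) and takes the standard deviation of the linked score at each score point. It is a simplified stand-in, not the authors' simulation model: the percentile-rank and interpolation rules, the function names, and the generated data are all assumptions made for this example.

```python
import bisect
import random
import statistics


def percentile_rank(sorted_scores, x):
    """Mid-percentile rank of x: share below x plus half the share equal to x."""
    below = bisect.bisect_left(sorted_scores, x)
    at = bisect.bisect_right(sorted_scores, x) - below
    return (below + 0.5 * at) / len(sorted_scores)


def quantile(sorted_scores, p):
    """Empirical quantile with linear interpolation between order statistics."""
    pos = p * (len(sorted_scores) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_scores) - 1)
    return sorted_scores[lo] + (pos - lo) * (sorted_scores[hi] - sorted_scores[lo])


def link(pairs, x_points):
    """Map each X score point to the Y score with the same percentile rank."""
    xs = sorted(x for x, _ in pairs)
    ys = sorted(y for _, y in pairs)
    return [quantile(ys, percentile_rank(xs, x)) for x in x_points]


def bootstrap_se(pairs, x_points, n_boot=200, seed=0):
    """Bootstrap standard error of the linked score at each X score point."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        # Resample examinees, keeping each examinee's (X, Y) pair together
        sample = [rng.choice(pairs) for _ in pairs]
        reps.append(link(sample, x_points))
    return [statistics.stdev(col) for col in zip(*reps)]


if __name__ == "__main__":
    rng = random.Random(1)
    pairs = []
    for _ in range(500):  # hypothetical examinees with rounded scores
        x = rng.gauss(50, 10)
        pairs.append((round(x), round(x + rng.gauss(5, 4))))
    points = list(range(30, 71, 10))
    for x, s in zip(points, bootstrap_se(pairs, points)):
        print(x, round(s, 3))
```

Plotting the returned standard errors against the score points gives a linking error curve of the kind the abstract discusses; rounding the scores, as done in the example data, is one way such a curve can become jagged.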
▶ General research  
Estimation of Correlation Coefficient to Correct Selection Bias in Small Samples — Comparison of Maximum Likelihood and Bayesian Estimation —
Kensuke Okada1, Kazuo Shigemasu2
1Senshu University, 2Teikyo University
When one calculates the correlation between entrance examination score and GPA score, in most cases the estimated correlation coefficient is smaller than expected; this problem is commonly known as selection bias. To estimate the true correlation coefficient under selection, maximum likelihood and Bayesian estimation procedures have been proposed in previous studies. In this paper, we evaluated the behavior of both types of point estimates under small-sample conditions in a numerical simulation study in which the sample size, the true correlation coefficient, and the proportion of missing data were manipulated. The results can be summarized as follows: (1) estimation is satisfactory when the sample size is large; (2) in terms of the closeness between the average estimates and the true values, Bayesian estimation performed better; (3) in terms of the mean squared error, maximum likelihood estimation performed better. The R codes used in this study are shown in the Appendix.
Keywords: selection bias, correlation coefficient, maximum likelihood estimation, Bayesian estimation, small
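The attenuation described in this abstract can be reproduced with a small simulation. The sketch below is not the authors' R code and does not implement their maximum likelihood or Bayesian estimators; it assumes bivariate normal scores, selection of the top scorers on the examination, and, as a swapped-in illustration, the classical range-restriction correction (often called Thorndike's Case II). All names and parameter values are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sd(xs):
    """Sample standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def simulate(rho, n, admit_rate, seed=0):
    """Return (correlation among admitted examinees, range-corrected correlation)."""
    rng = random.Random(seed)
    exam = [rng.gauss(0, 1) for _ in range(n)]
    gpa = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in exam]
    # Admit only examinees at or above the cutoff; GPA is observed for them alone
    cutoff = sorted(exam)[int((1 - admit_rate) * n)]
    admitted = [(x, y) for x, y in zip(exam, gpa) if x >= cutoff]
    ax, ay = zip(*admitted)
    r_sel = pearson(ax, ay)
    # Classical correction using the ratio of unrestricted to restricted SD of the exam score
    u = sd(exam) / sd(ax)
    r_corr = r_sel * u / math.sqrt(1 - r_sel ** 2 + (r_sel * u) ** 2)
    return r_sel, r_corr


if __name__ == "__main__":
    r_sel, r_corr = simulate(rho=0.6, n=20000, admit_rate=0.3, seed=1)
    print("admitted only:", r_sel, "corrected:", r_corr)
```

The example uses a large sample so that the attenuation is easy to see; the abstract's focus is the small-sample case, where the estimators can behave quite differently.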
▶ Case study  
A data analysis of English test in elementary schools under the variability of each school’s activities and its number of class hours
Yasuhito Hagiwara
National Institute for Educational Policy Research
In this study, I analyzed data from an English listening test that was administered nationwide in 2007, before "Foreign Language Activities" in elementary schools was treated as a separate chapter in the new Course of Study. Therefore, it was not compulsory at elementary schools at that time. An item response model that assumes school-level variability in each item's difficulty was applied to the data. I also incorporated dummy variables for each school's number of class hours into the model as predictors. The results showed that the multilevel item response model that decomposed the latent trait into school and pupil levels and allowed school-level residual variances in the threshold parameter of each item fitted the data better. As for the number of class hours, the latent trait of schools having fewer than 30 units a year for English activities was lower than that of schools having 30-35 units a year, and this difference was credible according to the 95% credible interval. On the other hand, the posterior mean of the latent trait of schools having more than 35 units a year was higher than that of schools having 30-35 units a year, but the difference was not credible according to the 95% credible interval.
Keywords: item response theory, multilevel modeling, Course of Study, listening, elementary school pupils
▶ Case study  
The effects of variations in placing diagrams within items of a Japanese language reading test
Kazuhiro Yasunaga, Hidetoki Ishii
Graduate School of Education and Human Development, Nagoya University
The purpose of this study was to examine whether the manner of presentation affects item difficulty in a Japanese language reading test. Study 1 identified items with low difficulty and discrimination. Junior high school students in South Korea were administered a translated version of the Gunma Prefecture Achievement Test (GPAT), for comparison with the 2006 GPAT data gathered in Japan. Item difficulty and item discrimination of this test were analyzed, and it was found that items featuring a diagram composed of two blank spaces presented by two words which were reversed from the original order in the text, and accompanied by a hint alongside these spaces, showed low difficulty and discrimination. The effects of the manner in which blanks and hints were placed were then examined through four distinct versions. Study 2 examined whether this item presentation affects item difficulty and discrimination. Japanese junior high school students were assigned to one of four tests (A, B, C, and D), which varied in how blanks and hints were presented. Item difficulty increased by more than approximately 0.3 points, and both difficulty and discrimination were high when the two blank spaces were placed in the order of the text. It was concluded that the manner of item presentation affects difficulty and discrimination in testing.
Keywords: reading test, item analysis, item difficulty, item discrimination, diagram placement
▶ Case study  
Structure of academic skills measured by the listening comprehension test in National Center Test
Teruhisa Uchida1, Taketoshi Sugisawa2, Kei Ito1
1Research Division, the National Center for University Entrance Examinations, 2Niigata University
The study examines relationships between the English listening comprehension test and other subjects in the National Center Test; it then discusses the results in terms of the validity of the listening test, using self-evaluation questionnaires, and of washback effects on English curricula. The results of each subject were mapped on a two-dimensional chart with axes for comprehensive academic achievement and for the humanities-science interests of the examinees. Listening tests were located differently on the chart from paper-based English tests, and closer to Japanese Language tests. Although listening tests correlated with paper-based tests, they showed higher correlation with self-evaluated achievement of listening behaviors. This tendency may imply construct validity of the listening test. In addition, the listening tests seem to have positive washback effects on English curricula, such as active encouragement of listening, grammar, and logical understanding.
Keywords: National Center Test, listening comprehension test, structure of academic skills, validity, washback effect of testing
▶ Case study  
Validation of a Non-curriculum-based Ability Test by Comparison with Subject Tests and Self-evaluation Ratings
Kei Ito1, Atsuhiro Hayashi1, Kumiko Shiina2, Masaaki Taguri1, Ken-ichiro Komaki1, Haruo Yanai3
1The National Center for University Entrance Examinations, 2Kyushu University, 3St. Luke's College of Nursing
In recent years, the merits and usefulness of new types of comprehensive examinations, which evaluate an examinee’s performance in problem solving and task handling, aptitude for higher education in specific academic courses, etc., have been discussed in regard to the diversification of selection methods for university admission. A comprehensive test aims to measure non-curriculum-based abilities such as logical thinking ability, reading comprehension, expressiveness, and related abilities, whereas a subject test aims to measure academic achievement. In order to verify the validity of a non-curriculum-based ability test (NCBAT), we investigated the relationship between NCBAT scores and academic achievement, the latter evaluated by subject tests and self-evaluation ratings, using factor analysis of experimental data obtained from university students. In this paper we demonstrate the existence of a distinctive factor differentiating non-curriculum-based ability from subject-based ability, and report the correlations between this factor and the information comprehension, logical thinking, and expressive abilities that underlie problem solving and task handling abilities. These results have high reproducibility and can be taken as evidence of the construct validity of the NCBAT.
Keywords: non-curriculum-based ability test, aptitude test, logical thinking ability, reading comprehension, expressiveness, validity, factor analysis