Japanese Journal for Research on Testing (日本テスト学会誌) Vol.19 No.1 Abstract

JART Vol.19 No.1

▶ General research  
Proposal of a Pairwise Comparison Type Test Model with Item-Response Category Characteristic Curves
Hideki Toyoda1, Kenichi Sasaki2
1Waseda University, 2Waseda University
A pairwise comparison type test model based on item response theory is proposed and applied to data from a recruitment test. First, we estimated the discrimination, threshold, and social desirability parameters that define the model. Next, the estimates were cross-validated on data different from those used for estimation, which confirmed that they were sufficiently stable. The questions about social desirability were shown to be valid. The obtained estimates were then treated as fixed parameters, and the scale values were estimated. Comparing these scale values with the test scores obtained by addition and subtraction confirmed a high correlation between the two. In addition, a theoretical test characteristic curve was constructed with the fixed parameters taken as given, a numerical test characteristic curve was constructed from another data set, and the two were compared. The two curves were generally in agreement, indicating the high prediction accuracy of the theoretical test characteristic curve.
Keywords: Pairwise Comparison Type, IRT, item response category characteristic curve, reaction faking, social desirability
▶ General research  
Rotation in Constrained Joint Maximum Likelihood Estimation for Multidimensional Item Response Theory Model
Keiichiro Hijikata1, Kensuke Okada1
1The University of Tokyo
The multidimensional item response theory (MIRT) model is an extension of the unidimensional IRT model. Chen, Li and Zhang (2019) developed an estimation method for the MIRT model called constrained joint maximum likelihood estimation (CJMLE). They also proved that its item parameter estimator is asymptotically consistent up to a rotation when the numbers of individuals and items increase to infinity simultaneously. However, that rotation requires the true parameters, so it cannot be applied in practice, where the true parameter values are unknown. In this paper, we investigate through simulation studies how the discrepancy between the standardized true item parameters and the rotated CJMLE solution changes under several rotation methods that seek a simple structure and are applicable in practice. The simulation results showed that the discrepancy did not decrease, which could be attributed to the fact that the standardized true item parameters do not have a simple structure.
Keywords: Multidimensional Item Response Theory, Maximum Likelihood Estimation, Joint Estimation, Rotation, Simple Structure
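The rotation issue in the abstract above can be illustrated with a minimal sketch. It assumes a two-dimensional compensatory logistic MIRT form, P = 1 / (1 + exp(-(a·θ + d))), which the abstract itself does not spell out; the function names and all numeric values are hypothetical and are not taken from the paper. The sketch only shows why item parameters are identified up to a rotation: rotating the item loadings and the ability vector by the same orthogonal matrix leaves the response probability unchanged.

```python
import math


def response_prob(a, theta, d):
    """Response probability under an assumed compensatory logistic MIRT form."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))


def rotate(vec, angle):
    """Rotate a 2-D vector by `angle` radians (an orthogonal transformation)."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * vec[0] - s * vec[1], s * vec[0] + c * vec[1]]


# Hypothetical values for illustration only.
a = [1.2, 0.4]        # item loadings (discriminations)
theta = [0.5, -1.0]   # person abilities
d = -0.3              # item intercept
angle = math.pi / 6   # arbitrary rotation angle

p_original = response_prob(a, theta, d)
# Rotating loadings and abilities together preserves the inner product a . theta,
# so the model cannot distinguish the rotated solution from the original one.
p_rotated = response_prob(rotate(a, angle), rotate(theta, angle), d)
```

Because both parameter sets give the same probabilities, a practical rotation criterion (such as one seeking a simple structure) has to pick one solution among them, which is the setting the simulation study examines.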
▶ General research  
Conversion of IRT Item Response Models
Shin-ichi Mayekawa1
1Tokyo Institute of Technology (professor emeritus)
When we construct an item bank, it is important to choose item response models carefully. However, due to various reasons, we may have to convert the item response models from one model to another. Furthermore, when we wish to compare several item response models, it is helpful if we can find a model which best matches the item characteristics of another. In this article, we developed a method to convert an item response model to another, implemented the method as an R package, and showed some numerical examples using both artificial and real data sets.
Keywords: IRT, item response models, polytomous items, model conversion, model comparison, normal ogive model
▶ Case study  
Estimation of the score rate of the National Center Test from Essay and Comprehensive Questions: Case study of AO entrance examination at Kochi Medical School
Yasutaka Seki1, Ryusuke Yamashita1, Yutaka Hatakeyama1, Tomoko Otsuka2, Seisho Takeuchi3, Hiromi Seo3
1IR Center in Medical Education, Kochi Medical School, 2Admission Unit, The Section for Educational Planning and Research, Center for Creative Learning Development, Kochi University, 3Department of General Medicine, Kochi Medical School Hospital
In the process of selecting university applicants, it is crucial to ensure that the selection methods align with the admission policy. Kochi Medical School has been administering a two-stage Admissions Office (AO) entrance exam I (now called the Comprehensive Selection I). However, until now, it has been challenging to compare and assess the essay and comprehensive questions in the first stage with those of other entrance exams. In this paper, we present a method for estimating the percentage of scores on the National Center Test using the results of the first stage of the AO exam I. Results from logistic regression using estimated values showed that a score of 82% on the National Center Test gives a 50% chance of passing the first stage of the AO exam I. It is concluded that estimating the score rate of the National Center Test, which has a high reliability and a large candidate pool, from the original admission selection process, is useful for evaluating selection methods based on the admission policy.
Keywords: Admissions office entrance examinations, National Center Test for university admissions, logistic regression
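The 50% statement in the abstract above follows from the shape of a logistic regression: with one predictor, the predicted pass probability equals 0.5 exactly where the linear predictor is zero, that is, at a score rate of -intercept / slope. The sketch below illustrates this arithmetic. The coefficients are hypothetical values chosen only so that the midpoint falls at 82; they are not the estimates reported in the paper, and the function names are assumptions for illustration.

```python
import math


def pass_probability(score_rate, intercept, slope):
    """Predicted probability of passing from a one-predictor logistic regression."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score_rate)))


def fifty_percent_point(intercept, slope):
    """Score rate at which the predicted pass probability equals 0.5."""
    return -intercept / slope


# Hypothetical coefficients (not from the paper), with score rate in percent.
intercept = -20.5
slope = 0.25

midpoint = fifty_percent_point(intercept, slope)
prob_at_midpoint = pass_probability(midpoint, intercept, slope)
```

With these illustrative coefficients the midpoint is a score rate of 82, and the predicted pass probability there is 0.5; score rates above the midpoint give probabilities above 0.5 when the slope is positive.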
▶ Case study  
A Comparative Study on Lockdown Browsers in Computer Based Testing: What Administrators and Test-takers Should Do
Takahiro Terao1, Dai Nishigori2, Hidetoki Ishii3, Satoshi Kimura4, and Ryosuke Harima4
1The National Center for University Entrance Examinations, 2Saga University, 3Nagoya University, 4Kyushu Institute of Technology
This study investigated two lockdown browsers for computer-based testing: the Take a Test app (Microsoft) and Safe Exam Browser (ETH Zurich). A lockdown browser prevents test takers from using pre-installed functionalities and applications and from accessing hardware utilities during testing. We examined the basic characteristics and use of the two lockdown browsers from the viewpoints of test administrators’ configuration and test takers’ use. We also addressed mobile device management (MDM) as a way to configure the testing environment in the two lockdown browsers more effectively and with minimal effort for testing organizations.
Keywords: computer-based testing, lockdown browser, Take a Test app, Safe Exam Browser, mobile device management
▶ Case study  
Handling Subscales in the Interpretation of Achievement Test Results ―Empirical Verification Using TIMSS2003 Grade8 Science Data―
Yutaro Sakamoto1
1Recruit Management Solutions, Co., Ltd.
The purpose of the present study was to obtain guidelines on how achievement test results should be interpreted and utilized, focusing on the subscales in achievement tests. For empirical verification, we conducted a multilevel analysis focusing on the subscales using the TIMSS2003 grade8 science data for Japan. The results showed that the influence of the determinants on the abilities specific to the subscales was limited. This indicates that the interpretation of test results is psychometrically supported when it assumes the existence of a unidimensional ability and refers to the characteristics of each subscale in addition to it. As the significance of this study, we discuss how its results, which can only be obtained by combining psychometric techniques with an educational sociological approach, indicate the effectiveness of this combination as a validation approach in the various test utilization situations in which subscales are set.
Keywords: subscale, item response theory, bi-factor model, achievement test
▶ Case study  
An Attempt to Classify the Characteristics of Tests for Grasping Basic Academic Abilities in University Entrance Selection
Kumiko Shiina1, Sayaka Arai1, Kei Ito1, Hirohito Sakurai1, Yusaku Otsuka2
1Research Division, National Center for University Entrance Examinations, 2School of Psychology and Healthcare Management at Akasaka, International University of Health and Welfare
The “written or CBT-based simple tests to grasp basic academic ability” (“test”) required in the Comprehensive Selection and Admission by School Recommendation were divided into four groups based on terms included in the test name, and the test characteristics in each group were analyzed with reference to the subject(s). The group including the term “academic ability” demonstrated a very strong tendency to evaluate basic academic ability based on specific subject(s) in both the Comprehensive Selection and Admission by School Recommendation. For the other groups, more universities conducted tests without reference to specific subject(s) in the Comprehensive Selection than in the Admission by School Recommendation, whereas tests based on specific subject(s) tended to be conducted in the Admission by School Recommendation. When specifying the level of difficulty of the tests, the application guidelines of several universities refer to the use of basic questions, a limitation of the field of the subject(s), or high-school textbooks.
Keywords: basic academic ability, aptitude test, Comprehensive Selection, Admission by School Recommendation
▶ Case study  
Features of Large-Scale Longitudinal Surveys of Academic Achievement in the United States: A Case Study of Surveys Conducted by the National Center for Education Statistics
Yuko Nonoyama-Tarumi1, Toshiaki Kawaguchi2, Norihiro Nishi3
1Musashi University, 2University of Teacher Education Fukuoka, 3Osaka University
The purpose of this paper is to review and analyze the features of the large-scale longitudinal achievement surveys conducted by NCES in the US in order to draw implications for how the Japanese government may design such surveys. We review 10 longitudinal surveys with the following questions: (1) How have the NCES longitudinal surveys evolved and advanced since 1970? (2) What organizational structure and capacity are needed to implement these longitudinal surveys? (3) How is an inequity perspective embedded in the study design of the longitudinal surveys?
We find that there is a large gap between Japan and the US in the accumulation of longitudinal achievement surveys. This gap is due not only to the difference in “test culture”, but also to differences in the structure and employment practices of the personnel needed to conduct large-scale longitudinal surveys, and to the lack of a clear purpose for conducting achievement studies.
Keywords: Large-Scale Survey of Academic Achievement, Longitudinal Data, United States, National Center for Education Statistics, Interview
▶ Case study  
Development of an Assessment to Measure the Ability to Think Fostered Through Cross-Curricular Learning: The Structure of the Content Domain and Creation and Evaluation of Trial Items
Tomoya Watanabe1, Wakana Onozuka1, Yuki Nozawa1, Yu Taizan2
1Benesse Educational Research & Development Institute, 2Naruto University of Education
One of the main points of the Course of Study, a national syllabus for formal education at secondary schools in Japan announced in 2017, is that the development of "the ability to think, make judgements, and express themselves" (hereinafter referred to as "the ability to think") must be interrelated with the skills to be acquired not only in each subject area but also from a cross-curricular perspective. The aim of this study is to prepare for the development of a summative assessment to measure the achievement of the ability to think fostered through cross-curricular learning. First, this study clarified the areas of the ability to think to be measured, based on the Can-do Statements of the ability to think, which embody the goals of the ability to think extended from the theoretical framework of "thinking skills" shared by major subjects. Second, a prototype version of the assessment items was developed. Third, based on an analysis of verbal protocol data collected while respondents answered the items and an item analysis of the quantitative response data, the authors attempted to collect evidence of the validity of the items and implications for item revision.
Keywords: thinking skills, the Courses of Study, Can-do Statements, item development, think-aloud, item analysis
▶ Review  
A review of process data studies in educational assessment
Daiki Hojo1, 2
1Benesse Educational Research and Development Institute, 2The University of Tokyo
This study reviewed articles on process data (logfiles) in educational assessment, especially computer-based testing, from the following perspectives: (1) the definitions of and differences between process data and logfiles, and the types of process data used in practice; (2) the main objectives of process data studies; (3) statistical analysis and modeling for process data; and (4) the validity of process data based on an evidence-centered design. Finally, we proposed avenues for future research on process data from an academic perspective and from the standpoint of ethical issues. The study found that, despite the many issues currently faced while conducting research using process data, research findings should be accumulated to support the ongoing development of computer-based testing.
Keywords: process data, log data, logfiles, CBT, educational assessment
▶ Review  
An overview of research topics and recent developments in computer-based testing
Kyosuke Bunji1
1Kobe University, 1Benesse Educational Research and Development Institute
This paper selectively reviews the outline of and current developments in the theoretical aspects of computer-based testing (CBT) in various fields. Six major research topics are introduced in detail: score comparability between paper-based testing (PBT) and CBT, adaptive testing, innovative item types, fraud and its countermeasures in online testing, the use of log data, and reasonable accommodation. In addition, drawing on previous predictions about the development of CBT and on the literature review, the future direction of research on CBT is discussed from three perspectives: validity, test anxiety, and automation.
Keywords: computer-based testing, mode effect, adaptive testing, technology-enhanced items, cheating, log-data