Journal of the Japan Association for Research on Testing Vol.20 No.1 Abstracts

JART Vol.20 No.1

▶ General research  
Confidence Level Estimation in Neural Automatic Scoring Using Multitask Learning of Regression and Classification
Yuto Takahashi1, Masaki Uto1
1The University of Electro-Communications
Essay and short-answer questions are commonly used in various assessments. However, especially in large-scale examinations, the substantial cost of scoring and the reduced reliability of evaluations caused by rater bias can be problematic. To overcome these challenges, automatic scoring models based on machine learning have attracted significant attention. In recent years, a variety of deep neural network-based automatic scoring models have been proposed and have achieved high accuracy. However, even the most accurate current neural scoring models still make scoring errors, which is a barrier to their adoption in high-stakes examinations. To address this issue, some recent studies have explored automatic scoring models that output confidence levels alongside score predictions. This study introduces a novel neural scoring model that improves on such conventional models, offering enhanced performance in both confidence estimation and score prediction. Experiments with actual data demonstrate that our model achieves performance equal or superior to that of previous models in both score prediction and confidence estimation.
Keywords: Constructed response assessment, automatic essay scoring, automated short answer grading, deep neural networks, multi-task learning, natural language processing
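The abstract does not describe the model's architecture, so the following is only a generic, hypothetical illustration of what multitask learning of regression and classification can look like for scoring: a regression head and a classification head over the same integer score scale, trained with a weighted sum of squared error and cross-entropy, with the classification head's probability reused as a confidence value. The function names, the `weight` parameter, and the rule for pairing the two heads at inference are assumptions, not the authors' model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_loss(reg_pred, class_logits, true_score, weight=0.5):
    """Hypothetical joint objective for one answer.

    reg_pred: real-valued score from the regression head.
    class_logits: one logit per possible integer score (classification head).
    true_score: gold integer score, also used as the class index.
    weight: assumed trade-off between the two tasks.
    """
    squared_error = (reg_pred - true_score) ** 2
    cross_entropy = -math.log(softmax(class_logits)[true_score])
    return weight * squared_error + (1 - weight) * cross_entropy


def predict_with_confidence(reg_pred, class_logits):
    """Hypothetical inference rule: round the regression output to the nearest
    valid score, then report the classifier's probability for that score."""
    probs = softmax(class_logits)
    score = min(range(len(probs)), key=lambda k: abs(k - reg_pred))
    return score, probs[score]
```

For example, `predict_with_confidence(1.8, [0.0, 0.0, 2.0])` rounds the regression output to score 2 and attaches the classifier's probability for that score; a low value could be used to route the answer to a human rater.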
▶ Case study  
Prospects of viewpoint-based assessment for learning process and its utilization in university entrance examinations: A survey of high school teachers in charge of career guidance and educational affairs
Takuya Nagano1, Yuto Terashima1, Haruna Tachibana1, Hidetoki Ishii1
1Nagoya University
Under the revised curriculum guidelines announced in March 2018, the student guidance records of high schools and secondary schools now include, for each subject, columns containing the scores of viewpoint-based assessments for learning process. Students enrolling in 2022 or later are assessed with this new method, and their evaluation reports are written based on these scores. Because this evaluation assesses students' learning process from different viewpoints and captures their levels of achievement analytically, these scores could be used in university entrance examinations, where multifaceted and comprehensive evaluation is expected. At the end of the 2022 school year, the first year of this assessment, we conducted a questionnaire survey of high school and secondary school teachers regarding the assessment and investigated the possibilities for using these scores in university entrance examinations, as well as its achievements and problems. The results showed that even teachers at schools already using this assessment considered that it had several problems to be solved and that improvements would be necessary before it could be used in actual university entrance examinations.
Keywords: Curriculum guidelines, Viewpoint-based learning assessment for learning process, High school education, Division of duties (Career guidance / Educational affairs)
▶ Case study  
Our experience of crises in the recent university entrance examinations: Pandemics, natural disasters, stabbing, cheating and other unforeseen risks
Takahiro Terao1, Teruhisa Uchida1, Hidetoki Ishii2, Atsuhiro Hayashi3, Hiroyuki Nakamura4, Yosuke Tatewaki5, Dai Nishigori6, Tomohiro Miyamoto7, Saori Kubo7, and Naoki Kuramoto7
1The National Center for University Entrance Examinations, 2Nagoya University, 3Nagoya Institute of Technology, 4Ehime University and Admissions Center for Shikoku National Universities, 5Kyushu University, 6Saga University, 7Tohoku University
This study aimed to review the policies and procedures employed in response to various crises that recent university entrance examinations in Japan have faced: COVID-19, the eruption of a submarine volcano in the South Pacific, a stabbing outside the University of Tokyo, and technology-assisted cheating. These incidents revealed our unconscious assumptions and problems with the national university admissions system. We present basic principles and guidelines for crisis response and draw implications for establishing university admissions systems that are resilient in the face of unforeseen risks.
Keywords: university entrance examinations, crisis response, COVID-19, natural disasters, stabbing, cheating
▶ Case study  
Accuracy of Re-estimation of the National Assessment of Academic Ability (Parent Questionnaire and Long-Term Trend Survey) Using Multidimensional Item Response Theory and Plausible Values
Toshiaki Kawaguchi1
1University of Teacher Education Fukuoka
In the "Parent Questionnaire" and "Long-Term Trend Survey" of the National Assessment of Academic Ability conducted by the Ministry of Education, Culture, Sports, Science and Technology, not all data from the Parent Questionnaire can be used for analysis, because the data from the Long-Term Trend Survey are sampled by subject area. To overcome this problem, a method has been proposed that re-estimates academic ability by combining data from the Parent Questionnaire, the Long-Term Trend Survey, and the National Assessment of Academic Ability using multidimensional item response theory and plausible values. In this article, I evaluate the accuracy of this estimation method through a simulation study. The results show that the method can effectively recover population parameters such as the means and standard deviations of subpopulations, correlation coefficients between socioeconomic status (SES) and academic achievement, and coefficients from regression analyses.
Keywords: National Assessment of Academic Ability, Parent Questionnaire, Long-term Trend Survey, Multidimensional Item Response Theory, Plausible Values, Simulation Study
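With plausible values, each population parameter is typically estimated once per plausible-value draw and the results are then pooled. The sketch below shows the standard Rubin combining rules for a subpopulation mean; it assumes simple random sampling for the per-draw sampling variance (large-scale surveys usually need a design-based variance instead) and is not taken from the article. The function names are hypothetical.

```python
import statistics


def pv_mean_and_variance(values):
    """Mean of one plausible-value draw and its sampling variance,
    assuming simple random sampling (an illustrative simplification)."""
    n = len(values)
    return statistics.fmean(values), statistics.variance(values) / n


def combine_plausible_values(pv_sets):
    """Pool M plausible-value draws with Rubin's rules.

    pv_sets[m][i] is the m-th plausible value of examinee i.
    Returns (pooled point estimate, total variance).
    """
    m = len(pv_sets)
    if m < 2:
        raise ValueError("Rubin's rules need at least two plausible values")
    stats = [pv_mean_and_variance(v) for v in pv_sets]
    estimates = [est for est, _ in stats]
    point = statistics.fmean(estimates)
    within = statistics.fmean([var for _, var in stats])  # average sampling variance
    between = statistics.variance(estimates)  # spread across draws (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return point, total
```

The `(1 + 1/m)` factor inflates the between-draw variance to account for using a finite number of draws, which is why a single plausible value (or an examinee-level point estimate) tends to understate uncertainty.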
▶ Case study  
Deficiencies and Support of Thinking Process in the Use of Learning Strategy “Lesson Induction” to Promote Learning from Errors
Satomi Shiba1, Yuri Uesaka1
1Graduate School of Education, The University of Tokyo
A learning strategy called "Lesson Induction" facilitates learning from errors after problem-solving. However, previous studies have reported that some learners struggle to induce effective lessons that they can use in subsequent problem-solving. The present study used qualitative methods to examine the deficiencies in thinking that lead to the induction of ineffective lessons and how to address them. Ten eighth-grade students participated in this study. First, students used Lesson Induction after mathematical problem-solving and reported what they were thinking while inducing their lessons. Protocol analysis of the students who induced ineffective lessons suggested three types of deficiencies, such as insufficiently comparing one's own solution with the correct answer. Second, the researcher provided interactive support and asked the students to induce their lessons again. The protocol analysis suggested that learners could induce higher-quality lessons by comparing their own problem-solving approach with that of the correct answer and then analyzing the reasons for their errors. Finally, we discuss the thought processes that lead to the induction of effective lessons for problem-solving.
Keywords: Learning from Errors, Lesson Induction, Problem Solving, Thinking Process, Protocol Analysis
▶ Case study  
Assessment of the reliability and validity of psychological scale items generated by ChatGPT
Kenichi Sasaki1, Hideki Toyoda2
1Waseda University, 2Waseda University
Item creation in psychology is a critical factor that significantly influences the outcome of research. Creating appropriate and reliable scale items requires a substantial amount of time and effort, which presents a challenge for many researchers. In recent years, advances in AI have expanded its application across various fields. However, the use of AI to create psychological scale items has not yet been fully explored. This study generates psychological scale items using ChatGPT and assesses their reliability and validity based on actual response data. The results indicate that item generation using ChatGPT is promising. We also discuss considerations for designing prompts that generate high-quality items. This research presents new possibilities for psychological measurement methods, offering directions to enhance both the efficiency and the quality of scale development.
Keywords: psychological scale, item generation, automatic item generation, large language models, ChatGPT, reliability, validity
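The abstract does not state which reliability coefficient was used. As one common example of assessing the reliability of a set of scale items from response data, the sketch below computes Cronbach's alpha, α = k/(k − 1) · (1 − Σ item variances / total-score variance); treating alpha as the relevant index here is an assumption, and the function name is hypothetical.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items matrix (list of rows)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item across respondents.
    item_vars = [statistics.variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = statistics.variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Items that move together (for example, every respondent giving the same answer to all items) push alpha toward 1, while weakly related items pull it down.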
▶ Review  
A Selective Review on Novel Dimensionality Assessment Methods for Measurement Model Users
Kazuki Hori1, Naomichi Makino1
1Benesse Educational Research and Development Institute
Determining the “correct” dimensionality is crucial in data analysis using measurement models such as factor analysis and item response theory, because parameter estimates can be substantially biased when dimensionality is misspecified, especially in the case of underfactoring. Various methods for assessing dimensionality have been proposed; however, no gold standard has been established. Moreover, new methods continue to emerge in the literature, drawing on novel theories and techniques from research areas beyond psychological and educational measurement. This methodological proliferation makes it difficult for applied researchers to keep up with state-of-the-art methods. The present study therefore aims to bridge this gap by offering a selective review of novel dimensionality assessment methods, with a particular focus on (a) parallel analysis and its variants, (b) the model selection approach, (c) the network psychometrics approach, and (d) the machine learning approach. We discuss the advantages and disadvantages of these approaches and suggest potential directions for future research.
Keywords: dimensionality, factor analysis, item response theory, parallel analysis, network psychometrics, machine learning
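To make the first family of methods concrete, the sketch below implements one basic variant of parallel analysis: eigenvalues of the observed correlation matrix are compared with a percentile of eigenvalues from random normal data of the same size, and dimensions are retained while the observed eigenvalue exceeds the random reference. This is an illustrative, principal-component-based version using the 95th percentile (Horn's original used the mean); it is not drawn from the review, and the function names and defaults are assumptions. Eigenvalues are computed with cyclic Jacobi rotations so the example needs only the standard library.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlation matrix for a list of rows (n observations x p variables)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    norms = [math.sqrt(sum(r[j] ** 2 for r in centered)) for j in range(p)]
    return [[sum(r[i] * r[j] for r in centered) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def jacobi_eigenvalues(matrix, tol=1e-10, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, sorted descending."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (i, j) entry becomes zero.
                theta = (a[j][j] - a[i][i]) / (2 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(p):  # columns: A <- A J
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # rows: A <- J^T A
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def parallel_analysis(data, n_sims=50, quantile=0.95, seed=0):
    """Return (number of retained dimensions, observed eigenvalues, reference eigenvalues)."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    observed = jacobi_eigenvalues(correlation_matrix(data))
    sims = []
    for _ in range(n_sims):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        sims.append(jacobi_eigenvalues(correlation_matrix(noise)))
    # Reference value for each eigenvalue position: the chosen quantile across simulations.
    idx = min(n_sims - 1, int(quantile * n_sims))
    reference = [sorted(s[k] for s in sims)[idx] for k in range(p)]
    n_dims = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        n_dims += 1
    return n_dims, observed, reference
```

On simulated data with two well-separated factors (three items each, loadings of 0.8), this sketch retains two dimensions. Variants discussed in the literature differ mainly in the eigenvalues compared (components versus common factors) and in the reference quantile.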