Japanese Journal for Research on Testing (日本テスト学会誌) Vol.4 No.1 Abstract

JART Vol.4 No.1

▶ General research  
From the Measurement between Individual Comparisons to the Measurement within Individual Changes: A Need of Paradigm Shift for Testing Research and Practice
Hiroshi Ikeda
The Japan Institute for Educational Measurement, Inc.
Major Japanese testing practice has focused primarily on the relative comparison of individual differences. Testing has been widely used and has produced effective outcomes for grading individual attainment in school selection and promotion in industry, assessment of aptitude in guidance, and so forth. Test scores can be used to order persons by their relative standing within the same group of test takers. However, test scores derived from different tests or from different groups of test takers cannot be compared directly with each other. Ordinal test scores, mostly raw scores or Z-scores, do not tell us the amount of change in ability due to growth and learning, which is what we really want to know. To assess change in individual or group performance reliably, we need to use advanced theory and technology of test development. Necessary conditions and requirements for future Japanese testing practice are discussed with reference to modern test technologies such as IRT, test equating, CBT, and item banking.
Keywords: standard score, common scale, unchangeable measure, measurement of change, item generating technology, collecting and scoring support system
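The point in the abstract above, that Z-scores carry only relative standing, can be illustrated with a minimal sketch. The score lists are made-up numbers, not data from the paper, and `z_scores` is a hypothetical helper: every examinee gains 10 raw points, yet the within-group Z-scores are unchanged.

```python
import statistics

def z_scores(raw):
    """Standardize scores within their own group (population SD)."""
    mean = statistics.mean(raw)
    sd = statistics.pstdev(raw)
    return [(x - mean) / sd for x in raw]

# Illustrative (made-up) raw scores for the same four examinees.
before = [40, 50, 60, 70]
after = [x + 10 for x in before]   # everyone improves by 10 points

z_before = z_scores(before)
z_after = z_scores(after)

# The raw means differ by 10, but the standardized scores are identical,
# so the growth is invisible on the within-group Z scale.
gain = statistics.mean(after) - statistics.mean(before)
print(gain, z_before, z_after)
```

Measuring the gain itself requires scores on a common scale across occasions, which is what the abstract argues test equating and IRT are needed for.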
▶ General research  
An exploratory method for determining measurement design based on the posterior predictive distribution
Taichi Okumura
Department of Educational Psychology, The University of Tokyo / Japan Society for the Promotion of Science
In this article, an exploratory method for determining repeated measurement design based on the posterior predictive distribution is proposed. By repeatedly sampling future observations from the posterior predictive distribution, we can determine the measurement design necessary for the mean range of the confidence interval of true scores to fall within a prespecified value with a certain probability. This method takes into account uncertainty about the true parameter values and can be carried out with introductory programming skills. It is applicable to various situations in psychological research, although computation may take a long time under certain conditions.
Keywords: repeated measurement, measurement design, posterior predictive distribution, classical test theory
▶ General research  
An Investigation of an Oral Placement Test of Japanese Language Using Generalizability Theory
Eri Banno
Okayama University
This study investigates the potential roles of generalizability theory in investigating oral performance tests. The purposes of this study are to examine the contributions of candidates, raters, tasks, and their interactions to the variance of test scores, and to find an optimal number of raters and tasks for the test using generalizability theory. Sixty-one JSL teachers evaluated the oral tests of six Chinese students; the test consisted of three tasks. The results of the analysis indicated that the test worked well for spreading candidates out along a continuum of oral proficiency. However, with a one-rater, one-task design, some extraneous effects on test scores that could be a source of measurement error were found, and the results indicated that more raters and tasks are needed for the test to achieve higher reliability. As the optimal number of raters and tasks, the author suggests a two-rater, two-task design for this oral placement test. The study shows that generalizability theory is a powerful tool for investigating and developing oral performance tests.
Keywords: oral performance test, placement test, generalizability theory, reliability, Japanese Language
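The kind of decision study described in the abstract above can be sketched as follows. This assumes a fully crossed candidate x rater x task design and uses the standard generalizability coefficient for relative decisions. The variance components are made-up placeholders, not estimates from the study, and `g_coefficient` is a hypothetical helper name.

```python
def g_coefficient(var_p, var_pr, var_pt, var_prt_e, n_raters, n_tasks):
    """Generalizability coefficient for relative decisions in a crossed
    person x rater x task design.

    Only interactions involving persons enter the relative error variance;
    the rater, task, and rater-by-task effects do not affect rank order.
    """
    rel_error = (var_pr / n_raters
                 + var_pt / n_tasks
                 + var_prt_e / (n_raters * n_tasks))
    return var_p / (var_p + rel_error)

# Placeholder variance components (assumed for illustration only).
components = dict(var_p=1.0, var_pr=0.2, var_pt=0.3, var_prt_e=0.5)

# Compare candidate designs, as a D-study would.
for n_r, n_t in [(1, 1), (2, 1), (1, 2), (2, 2)]:
    g = g_coefficient(**components, n_raters=n_r, n_tasks=n_t)
    print(f"raters={n_r} tasks={n_t} G={g:.3f}")
```

With these placeholder values the coefficient rises from 0.5 for one rater and one task to about 0.73 for two raters and two tasks, which shows the general mechanism by which adding raters and tasks reduces error variance.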
▶ General research  
Expectation of graduates' pass-fail possibilities on the National Examination by Discriminant Analysis of the results on the Graduation and Integrated Examinations
Manabu Miyamoto1, Yoshiaki Mori2, Takahiro Kubota1,2, Yasuichiro Nishimura1,3, Hiroshi Yoneda1,4
1Center for Medical Education, 2Physiology, 3Mathematics, 4Neuropsychiatry, Osaka Medical College
Not all graduates pass the National Examination for medical practice, and too many failures are a serious problem for a medical school. We therefore tried to estimate passing or failing the National Examination by applying discriminant analysis to the results of the Graduation Examination and the Integrated Examination taken in the latter term of the 6th academic year. (1) We surveyed the National Examination pass or fail expectations for 894 graduates from the 1998 to 2006 academic years. (2) The relationship between the National Examination pass or fail expectations and the actual pass or fail results was examined by discriminant analysis of the results of the National Examination, Graduation Examination, and Integrated Examination. (3) The mean percentage of the pass group expected to pass was 83.2%, and that of the fail group expected to fail was 7.9%. The correct expectation rate was 91.1%. The mean percentage of the pass group expected to fail was 7.9%, and that of the fail group expected to pass was 0.9%. The miss expectation rate was 8.8%. (4) The mean passing expectation probability was 0.922 (S.D. 0.12; n = 744) in the pass group expected to pass. It can be said that most of these students passed comfortably, because most of them fell in the range 0.9-1.0. The mean value in the fail group expected to fail was 0.092 (S.D. 0.14; n = 71). (5) The Integrated Examination and Internal Medicine (1) contributed heavily to our results.
Keywords: graduates' pass-fail possibilities, National Examination, discriminant analysis, Graduation and Integrated Examinations
▶ General research  
Relationship of learner's personality traits and learning styles to English reading and listening test scores
Uchida Teruhisa, Sugisawa Taketoshi, Shiina Kumiko
Research Division, the National Center for University Entrance Examinations
We investigated the relationship of a learner's personality traits and learning styles to his or her English test scores. The 348 participants, all first-year university students, took an examination administered by the National Center for University Entrance Examinations. The English section included grammar and reading tests and listening comprehension tests. After completing the examination, each participant completed a questionnaire based on the Big Five personality scale and an inventory regarding the participant's styles of learning English. This learning style inventory consisted of three factors: improving communicative skills, inferring the meaning of unfamiliar terms from the context to grasp the main points, and focusing on vocabulary and grammar. Path analysis of personality traits, learning styles, and English test performances suggested that a learner's personality traits may affect his or her learning styles. Furthermore, the learning styles may affect his or her overall performance in English as well as the performance patterns between the grammar and reading score and the listening comprehension score.
Keywords: National Center Test, listening comprehension test, personality trait, Big Five, learning style
▶ General research  
Development and Practice of an Integrative e-Testing System
Pokpong Songmuang, Maomi Ueno
Graduate School of Information Systems, the University of Electro-Communications
The purpose of this study is to develop a practical e-testing system that is consistently designed to unify the various functions of traditional computer-based testing systems. The system consists of an Item Authoring System, Item Bank, Test Delivery System, e-Testing Construction Support System, Test Database, Data Analysis System, and Adaptive Testing System. The advantages of the integrative system are as follows: 1. The test data stored on the server are automatically distributed to each function and used for test analysis, item analysis, test construction, and adaptive testing, and 2. The system has various functions and can therefore be used for various test purposes (entrance examination, ability measurement, formative assessment, self-assessment, assessment in distance education, e-learning, and so on). Furthermore, evaluations by several teachers who used the system in actual practice show that it is not just a prototype but a system suited to practical use.
Keywords: e-testing, computer based testing, CBT, test construction support system, adaptive testing
▶ General research  
A Method to Estimate Examinee's Skill from Time-Series Motion Data
Hiroyuki Ogata1, Saeko Yamamoto2
1Faculty of Science and Technology, Seikei University, 2Graduate School of Engineering, Seikei University (now working for IHI Corp.)
Though performance testing is an effective way to assess examinees' skill in sports or manufacturing, its CBT implementation is not progressing. Taking the golf putt swing as an example, this paper discusses a method to assess the skill level of an examinee automatically from the examinee's motion data. In our previous paper, we used some characteristic postures extracted from the motion data for assessment. However, that method cannot take the timing of motion or the process between the postures into account. Here, we propose using a recurrent neural network (RNN) to deal with this problem. We applied the quasi-Newton method to accelerate the learning process, and the minimum description length principle to decide the network configuration. We verified the effectiveness of the proposed method by using actual examinees' motion data and assessing their skill with the RNN.
Keywords: Performance Testing, Skill Assessment, Vector Time Series Analysis, Recurrent Neural Network, Motion Capture Device, Putt Swing
▶ General research  
What factor moderates the effect of handwriting quality on essay test scoring: Investigation by meta-analysis and experiment
Satoshi Usami
Graduate School of Education, University of Tokyo
No consistent results have been shown about whether handwriting quality affects essay test scoring. We hypothesized that the following factors may moderate the effect of handwriting quality: (1) degree of freedom of the answer, (2) ages of examinees, (3) skills of scorers. We then performed a meta-analysis to evaluate the effect of these factors. The result suggested that the younger the examinee, the larger the effect of handwriting quality. Based on this result, we hypothesized that the age factor in effect represents the quality of the essay, and that the quality of the essay mediates the moderator effect of age. To test the hypothesis, we had 20 participants score essay tests with different levels of essay quality and handwriting quality. The result of a two-way ANOVA showed no interaction effect, which indicated that essay quality may not mediate the effect of handwriting quality.
Keywords: essay test, bias, handwriting quality, meta-analysis, error
▶ General research  
An attempt of parameter estimation for the Rasch model by parallel Markov chain Monte Carlo
Yoshikazu Sato1, Eiji Muraki2
1Miyagi National College of Technology, 2Tohoku University
One of the purposes of this paper is to achieve automatic scale adjustment of the proposal distribution in the random-walk Metropolis-Hastings algorithm and automatic convergence detection of Markov chains. To this end, a parallel Markov chain Monte Carlo algorithm based on the idea suggested by Gelman, Roberts & Gilks (1996) is proposed. The remarkable feature of the proposed algorithm is that effective samples can be obtained immediately after the scale adjustment of the proposal distribution and the convergence detection of Markov chains are completed simultaneously. Another purpose of this paper is to apply the proposed algorithm to parameter estimation for the Rasch model, which is one of the item response models. Simulation results show that the item difficulties of the Rasch model can be estimated properly by the parallel single-component random-walk Metropolis-Hastings algorithm.
Keywords: item response theory (IRT), Rasch model, Bayesian estimation, Markov chain Monte Carlo (MCMC), parallel algorithm
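For readers unfamiliar with the building block named in the abstract above, here is a minimal sketch of a single-component random-walk Metropolis-Hastings sampler for Rasch item difficulties. It is not the authors' algorithm: it runs one chain with a fixed proposal scale, treats abilities as known, and uses a normal prior. The parallel chains, automatic scale adjustment, and convergence detection from the paper are deliberately left out. All function names, the prior, and the tuning values are assumptions made for illustration.

```python
import numpy as np

def rasch_loglik_item(b_j, theta, u_j):
    """Log-likelihood of one item's 0/1 responses under the Rasch model."""
    z = theta - b_j
    # log P(u=1) = z - log(1 + e^z), log P(u=0) = -log(1 + e^z)
    return np.sum(u_j * z - np.logaddexp(0.0, z))

def simulate_responses(theta, b, seed=1):
    """Draw a persons x items 0/1 response matrix from the Rasch model."""
    rng = np.random.default_rng(seed)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.uniform(size=p.shape) < p).astype(int)

def sample_item_difficulties(u, theta, n_iter=2000, burn_in=500,
                             proposal_sd=0.3, prior_sd=2.0, seed=0):
    """Posterior means of item difficulties, updating one item at a time.

    Assumptions: abilities `theta` are known, each difficulty has a
    N(0, prior_sd^2) prior, and the proposal scale is fixed (not adapted).
    """
    rng = np.random.default_rng(seed)
    n_items = u.shape[1]
    b = np.zeros(n_items)
    draws = np.empty((n_iter, n_items))
    for it in range(n_iter):
        for j in range(n_items):
            cand = b[j] + rng.normal(0.0, proposal_sd)  # symmetric proposal
            log_ratio = (rasch_loglik_item(cand, theta, u[:, j])
                         - rasch_loglik_item(b[j], theta, u[:, j])
                         - 0.5 * (cand ** 2 - b[j] ** 2) / prior_sd ** 2)
            if np.log(rng.uniform()) < log_ratio:       # accept or reject
                b[j] = cand
        draws[it] = b
    return draws[burn_in:].mean(axis=0)

# Small simulated example (all values are illustrative).
theta = np.random.default_rng(42).normal(size=500)
true_b = np.array([-1.0, 0.0, 1.0])
u = simulate_responses(theta, true_b)
est = sample_item_difficulties(u, theta)
print(np.round(est, 2))
```

Here `proposal_sd` has to be chosen by hand; picking that scale automatically and deciding when the chains have converged are the problems the paper addresses.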
▶ General research  
Prediction of Pass Ratio of Students Taking New National Bar Examination for Each Law School based on National Admission Test for Law Schools: Consideration for Two-Year and Three-Year Courses
Kumiko Shiina, Taketoshi Sugisawa, Ken-ichiro Komaki, Katsumi Sakurai
The National Center for University Entrance Examinations
The pass ratio of students taking the new national bar examination (Bar Exam) for each law school was predicted based on their scores on the National Admission Test for Law Schools (NATlas). The previous model, which assumes that only students whose NATlas score is larger than a common threshold value can pass the Bar Exam, was adapted to the recent situation. For students in a two-year course, a cumulative pass ratio of the Bar Exam is estimated for each enrolled year at each law school. The previous model succeeds in predicting the estimated values. The estimated threshold values of the NATlas score for passing the Bar Exam in the shortest number of years are shown to be stable across enrolled years. For students in a three-year course, the previous model is modified by adding another threshold value of the NATlas score. Students whose NATlas scores are higher than this threshold value were assumed to have sufficient ability to take the Bar Exam in three years. The modified model indicates that students enrolled in a three-year course required much better NATlas scores to succeed in passing the Bar Exam compared to those in a two-year course.
Keywords: National Admission Test for Law Schools, New National bar examination, validity
▶ General research  
Efficient Scoring Weights under Three-Parameter Logistic Model for Quick Estimation of Ability Parameter
Sayaka Arai1, Chen Wei2, Shin-ichi Mayekawa1
1Tokyo Institute of Technology, 2Hitachi, Ltd.
Under item response theory, the examinee's ability θ is estimated as the MLE or EAP from his/her item response pattern. However, it is also possible to estimate θ as the EAP conditional upon the weighted total score. For example, it is known that, under the two-parameter logistic model, if we use the item discrimination parameter as the weight for each item, the conditional mean of θ given the weighted total score is the same as the usual EAP estimate of θ. The weighted total scores are easy to compute, but it is not clear what types of weights are best under the three-parameter logistic model. In this study, we searched for the optimal scoring weights for the three-parameter logistic model. We compared the efficiency of 14 types of weights in terms of their closeness to the usual θ estimate and their stability. The results showed that the expectation of the so-called best weights is the best. They also indicated that the estimation stability is not good at lower ability levels.
Keywords: Item response theory, three-parameter logistic model, optimal scoring weights
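The two-parameter result cited in the abstract above can be checked with a short derivation. This is a sketch of the standard sufficiency argument, not taken from the paper. The notation is assumed: $u_j \in \{0,1\}$ is the response to item $j$, $a_j$ and $b_j$ are its discrimination and difficulty, $g(\theta)$ is the prior, and any scaling constant such as $D = 1.7$ is absorbed into $a_j$.

```latex
% 2PL item response function
P_j(\theta) = \frac{1}{1 + \exp\{-a_j(\theta - b_j)\}}

% Likelihood of a response pattern u = (u_1, \dots, u_n)
L(u \mid \theta)
  = \prod_{j=1}^{n} P_j(\theta)^{u_j}\,\{1 - P_j(\theta)\}^{1 - u_j}
  = \frac{\exp\bigl\{\theta \sum_j a_j u_j - \sum_j a_j b_j u_j\bigr\}}
         {\prod_{j=1}^{n} \bigl[1 + \exp\{a_j(\theta - b_j)\}\bigr]}

% The factor \exp\{-\sum_j a_j b_j u_j\} does not involve \theta, so with
% s = \sum_j a_j u_j the posterior is
p(\theta \mid u) \propto
  \frac{g(\theta)\,\exp(\theta s)}
       {\prod_{j=1}^{n} \bigl[1 + \exp\{a_j(\theta - b_j)\}\bigr]}

% It depends on u only through s, hence
E[\theta \mid u] = E[\theta \mid s]
```

Under the three-parameter logistic model the guessing parameter prevents this factorization, so no fixed set of weights yields a sufficient weighted score. That is why the choice of weights studied in the paper is an open question.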
▶ General research  
An analysis of test data including a multiple-choice multiple-answer item using the nominal categories model
Tomoya Okubo, Kojiro Shojima, Tomoichi Ishizuka
The National Center for University Entrance Examinations
In this research, test data including a multiple-choice multiple-answer item were analysed using the nominal categories model. The results revealed that the response probabilities for the answers to each item are described as a function of a latent trait scale. Further, the results show that some of the items contain attractive distracter choices. We have also found that the nominal categories model is useful not only for multiple-choice items but also for multiple-choice multiple-answer items. In addition to the analysis using the nominal categories model, we analysed the multiple-choice multiple-answer items using a binary response model and compared the two sets of results. According to the results, using the nominal categories model to analyse this type of item yielded the largest information.
Keywords: item response theory, nominal categories model, distracter choices, multiple-choice multiple-answer item
▶ General research  
On Effectiveness and Limitation of Score Adjustment for Selective Testings: An Approach for Evaluation with Pass-Fail Swapping Simulation
Naoki T. Kuramoto1, Dai Nishigori1, Takuya Kimura2, Yasuo Morita1, Osamu Kamoike1
1Tohoku University, 2Nagasaki University
In this paper, we tried to give a theoretical explanation of the method for evaluating score adjustment using pass-fail swapping simulations proposed by Kuramoto et al. (2008). We also conducted a case study using real admission data. Statistical equating methods cannot be applied directly to scores obtained from achievement tests in subject areas with subject options. Score adjustment has occasionally been carried out to convert raw scores in order to adjust the means afterwards. Score adjustment also raises social issues. We tried to reconcile the conflicting public opinions over past score adjustment affairs. Fairness theories in social psychology help us understand what people feel when extremely biased scores emerge from different options. In the case of individual university examinations, we directed our attention to the results of selection rather than to scores. Swapping simulation is a promising way to deal with this. A result is judged to be fair if the swap rates are consistent regardless of subject options. Swapping simulation was carried out to evaluate the score adjustment method used for optional science tests in the Tohoku University entrance examinations. The results suggested problematic consequences, although the adjustment seemed successful on swap-rate indices.
Keywords: university entrance examinations, swapping simulation, score adjustment, fairness, subject options