Japanese Journal for Research on Testing Vol.1 No.1 Abstract

JART Vol.1 No.1

▶ General research  
Some Research Perspectives in Future Testing and the Role of JART ~In commemoration of the first publication of the new journal~
Hiroshi Ikeda
The Chairman of the Board of Directors, The Japan Association for Research on Testing, Professor Emeritus, Rikkyo University
In commemoration of the first publication of the Japanese Journal for Research on Testing, the author makes three points. (1) Advanced test operating systems today can be attained only through collaborative work across different fields of expertise, and this association has a high potential for realizing them. (2) Current test technology has developed efficient tools and methods for collecting individual information. Continuous accumulation of test results can provide valuable sources for evidence-based education, and the benefits of test information should be returned in an adequate way. (3) Test information can be used in either a good or a bad way. The association has to support and disseminate desirable ways of using tests in society.
Keywords: operating system, data collection technique, computer-based testing, intelligent measurement, open use of educational data, test accountability
▶ General research  
Stability of Classification Results on the Cognitive Diagnosis for Individuals
Kikumi Tatsuoka1, Curtis Tatsuoka2
1Department of Human Development Teachers College, Columbia University, 2Department of Statistics George Washington University
This study introduces various methods to measure the stability of performance on a test. Two levels of stability measures are discussed: one is at the group level, such as the correlation and reliability of two repeated measures, and the other is at the individual level. The stability of measures at the individual level is defined as how consistently an individual answers test items. Therefore, traditional reliability theory is not applicable to the individual level. We also address the issue of granularity in trying to determine which levels of performance measures are stable enough to diagnose and report to test users. It is well known in cognitive science that "bugs" measured at the micro level are unstable. On the contrary, it is known that total scores are fairly stable in psychometric theories. We use a test carefully designed for investigating these issues.
 
▶ General research  
Performance Assessment System that Recognizes Procedural Task Using CCD Camera
Ogata Hiroyuki, Igarashi Shunsuke
Seikei University, Faculty of Engineering
The development of information technology makes it possible to realize computer-assisted performance testing. However, most of the performance tests carried out nowadays use a mouse and keyboard as input devices. Such a method is not always appropriate for assessing examinees' skill. To address this problem, we propose a method in which the examinee performs the task for real in front of the system, and the system observes and assesses it. In this paper, we focus on the procedural aspect of the task. The system is composed of a CCD camera and a personal computer. The computer is used to recognize and score the examinee's task procedure. Here, the eigenspace method is applied for recognition, and the DP matching method is used for scoring. We developed a prototype system and adopted cardio-pulmonary resuscitation as an example task to verify the effectiveness of the system.
Keywords: Performance assessment system, Human skill, Task model, Eigenspace method, DP matching
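
As an illustration of the scoring step, the following is a minimal sketch of DP (dynamic programming) matching between a recognized step sequence and a model task sequence. The abstract does not give the cost function or the step labels, so the unit costs and the step names below are assumptions made for this example only, not the authors' implementation.

```python
def dp_match(observed, reference, sub_cost=1, gap_cost=1):
    """Return the minimum alignment cost between two step sequences.

    The unit substitution and gap costs are assumed for illustration.
    """
    n, m = len(observed), len(reference)
    # cost[i][j] = best cost of aligning observed[:i] with reference[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = observed[i - 1] == reference[j - 1]
            match = cost[i - 1][j - 1] + (0 if same else sub_cost)
            extra = cost[i - 1][j] + gap_cost   # examinee performed an extra step
            missed = cost[i][j - 1] + gap_cost  # examinee skipped a reference step
            cost[i][j] = min(match, extra, missed)
    return cost[n][m]


# Hypothetical step labels for a resuscitation-like procedure.
reference = ["check_response", "open_airway", "check_breathing", "compress_chest"]
observed = ["check_response", "check_breathing", "compress_chest"]
print(dp_match(observed, reference))  # 1: one reference step was skipped
```

A lower cost means the observed procedure is closer to the model procedure; how such a cost would map to a test score is not specified in the abstract.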
▶ General research  
An research on the trend of academic achievement of English by IRT equating of National Center Test using the common-subject design
Osamu Yoshimura1, Kojiro Shojima1, Naoki Sugino2, Takeshi Nozawa2, Yuko Shimizu2, Eiji Saito3, Masashi Negishi4, Junko Okabe5, Simon Fraser6
1The National Center for University Entrance Examinations, 2Ritsumeikan University, 3Kansai University, 4Tokyo University of Foreign Studies, 5Aichi Prefectural University, 6Kure University
In this study, we conducted an IRT equating of the National Center Test "English" using the common-subject design. About 450 university freshmen took an English test consisting of 100 items. All of the test items had previously been used in the National Center Test. Based on the test data, we could equate each of the scales of the National Center Test "English" in 1991-2004 to the test in 1990. The trend of academic achievement in English from 1990 to 2004 is discussed.
Keywords: National Center Test, academic achievement of English, trend, IRT, equating
▶ General research  
An Analysis of Error Propagation from Item Difficulty Parameters to Maximum Likelihood Estimates of Ability Parameter in the Rasch Model Using the Delta Method
Yoshikazu Sato1, Eiji Muraki2
1Department of Electrical Engineering, Miyagi National College of Technology Graduate School of Educational Informatics Education Division, Tohoku University, 2Graduate School of Educational Informatics Research Division, Tohoku University
[...] difficulty parameter estimates propagate to the maximum likelihood estimates of the ability parameter in the Rasch model by use of the delta method. The delta method is a commonly used statistical method for deriving approximate standard error expressions. The formulation reveals that the standard errors of the item parameters propagate as errors in the standard errors of the ability parameter. It also shows that the error of the standard error of the ability parameter can be expressed as a function of the examinee's correct or incorrect response probabilities for the items and the standard errors of the item parameter estimates. In the simulation study, the rates of the errors included in the standard errors of the ability parameter are a few percent under the conditions of n = 25, 50, and 75 items and N = 200, 400, and 600 examinees. It is also suggested that computer adaptive testing may have an advantage over paper-and-pencil testing in terms of the errors of the standard errors of the ability parameter estimates.
Keywords: item response theory, Rasch model, standard error, error propagation, delta method
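
To make the idea of error propagation concrete, here is a minimal first-order sketch, not the authors' formulation. It assumes the Rasch response probability P_i = 1 / (1 + exp(-(theta - b_i))), independent errors in the item difficulty estimates, and the sensitivity d(theta_hat)/d(b_i) = P_i Q_i / sum_j P_j Q_j obtained by differentiating the likelihood equation. The function names and the example numbers are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def mle_theta(responses, difficulties, iters=50):
    """Newton-Raphson ML estimate of ability.

    Assumes a mixed response pattern; the MLE does not exist for
    all-correct or all-incorrect patterns.
    """
    theta = 0.0
    for _ in range(iters):
        p = [rasch_p(theta, b) for b in difficulties]
        grad = sum(u - pi for u, pi in zip(responses, p))
        info = sum(pi * (1.0 - pi) for pi in p)
        step = grad / info
        theta += step
        if abs(step) < 1e-10:
            break
    return theta


def propagated_se(theta, difficulties, se_b):
    """Delta-method SE of theta_hat caused by errors in the b_i alone."""
    pq = [rasch_p(theta, b) * (1.0 - rasch_p(theta, b)) for b in difficulties]
    info = sum(pq)
    # Each weight pq_i / info is the derivative of theta_hat w.r.t. b_i.
    var = sum((w / info) ** 2 * s ** 2 for w, s in zip(pq, se_b))
    return math.sqrt(var)


# Hypothetical five-item test with assumed difficulty standard errors.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
responses = [1, 1, 1, 0, 0]
theta_hat = mle_theta(responses, difficulties)
print(theta_hat, propagated_se(theta_hat, difficulties, [0.15] * 5))
```

This sketch propagates item-parameter error into the ability estimate itself; the abstract goes further and examines the error induced in the standard error of the ability estimate.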
▶ General research  
A Common Framework for Developing Automated Spoken Language Tests in Multiple Languages
Balogh Jennifer1, Barbier Isabella1, Bernstein Jared1, Suzuki Masanori1, Harada Yasumari2
1Ordinate Corporation, 2Waseda University
A Common Testing Framework was developed to create spoken language tests in multiple languages. The goals were, first, to enable the creation of spoken language tests that can be administered in large volumes and scored rapidly without sacrificing reliability or quality, and second, to facilitate the efficient development of tests in any language. The framework consists of three components: a test architecture, a computerized testing system, and a development and validation process. In the resulting tests, human-recorded prompts are played over the telephone, and test-takers' responses are automatically scored using speech recognition and other computer technologies. The testing framework has been used to create spoken language tests for English and Spanish and is currently being employed for Dutch and Japanese test development. Data from the English and Spanish tests are presented to show how the tests built on top of the Common Testing Framework are reliable and valid.
Keywords: language assessment, spoken language, test development, speech recognition, computerized scoring
▶ General research  
The Characteristics of Large-scale Examinations Administered by Public Institutions in Japan — From the Viewpoint of Standardization —
Sayaka Arai1,2, Shin-ichi Mayekawa1
1Department of Human System Science, Graduate School of Decision Science and Technology, Tokyo Institute of Technology, 2Japan Center for Examination Research
Most large-scale tests administered by public institutions in Japan are not standardized. The primary reason they are not standardized is what Ishizuka (2002, 2003), Mayekawa (2003a, 2003b) and Murakami (2003b) call the Japanese Test Culture. The Japanese Test Culture can be characterized as follows: (i) examinations are administered simultaneously, once a year; (ii) all questions are new each year; (iii) the questions are published subsequently; (iv) questions are developed by outside specialists, and no psychometricians are involved in test development; (v) scores are reported as raw scores; (vi) the average time for answering each question ranges from 2 to 4 minutes. The purpose of this article is to empirically verify the existence of the Japanese Test Culture. We surveyed the nine major large-scale tests administered in Japan and interviewed the developers of these tests. The results are summarized using a number of criteria. A method to standardize tests within the framework of the Japanese Test Culture is proposed.
Keywords: standardized test, large-scale test, Japanese Test Culture
▶ General research  
A Cognitive Framework on Performance Assessment in School Education
Noriaki Sasaki1, Eiji Muraki2
1Graduate School of Educational Informatics Education Division, Tohoku University, 2Graduate School of Educational Informatics Research Division, Tohoku University
[...] exists in examiners when performance assessment was executed in school education. A cognitive framework was defined as the point of view on performance that examiners had. A total of 139 persons (68 teachers and 71 students) were sampled, and we used items on performance in school education, called "working style". Examinees replied about degrees of cognition for 18 performances. We regarded a factor as a component of a cognitive framework. As a result of factor analysis, 3 factors were extracted and named "execution", "activity", and "insistence", respectively. Furthermore, we compared the scale scores of teachers and students. Students' scores tended to be significantly higher than teachers' scores on "execution", teachers' scores were significantly higher than students' scores on "activity", and there was no significant difference between teachers' scores and students' scores on "insistence".
Keywords: performance assessment, cognitive framework, school education, ability
▶ General research  
Analysis of Two Surveys on Arithmetic Achievement Using Multilevel IRT
Hagiwara Yasuhito, Nagasaki Eizo
The National Institute for Educational Policy Research
In this study, we analyzed two surveys of 6th graders' arithmetic achievement using a multilevel item response theory (IRT) model. One survey was administered in 1991, the other in 2004. In each survey, we first sampled 40 schools nationwide and then sampled the pupils within each school. A multilevel IRT model, matching this sampling procedure, was applied to the data. For comparison, we also used an ordinary IRT model, which assumes that participants were sampled independently. The results showed that (1) the multilevel IRT model fitted both data sets better than the ordinary model, and (2) the mean of the latent trait and the intraclass correlation coefficient in the current survey were nonsignificantly smaller than in the past one. However, result (2) cannot be generalized to all Japanese 6th graders at the time, because the pupils were not sampled strictly at random. We discuss the development of the model applied in this study.
Keywords: item response theory, two-stage sampling, arithmetic, within-school variance, between-school variance
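
For readers unfamiliar with the intraclass correlation coefficient mentioned above, the sketch below shows the usual definition as the share of total latent-trait variance that lies between schools. The variance values are made up for illustration and are not results from the study.

```python
def intraclass_correlation(var_between, var_within):
    """ICC = between-school variance / (between-school + within-school variance)."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


# Hypothetical variance components, not taken from the surveys.
print(intraclass_correlation(0.2, 0.8))
```

An ICC near 0 would mean schools differ little on average, so treating pupils as independently sampled loses little; a larger ICC is the situation where a multilevel model matters most.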
▶ General research  
Effects of the environmental and Japanese speech noises on the English listening comprehension testing
Uchida Teruhisa1, Nakaume Naoko2, Shojima Kojiro1
1Research Division, the National Center for University Entrance Examinations, 2Admission center, Niigata University
[...] comprehension testing with low-level noises. Two sources of noise, environmental sounds and Japanese speech sounds, were added over the speech scripts in the test at levels of -12 dB(A) and -6 dB(A). Participants were 569 college freshmen, who took the same test with noises that differed in source as well as level. The results indicated that scores declined in the condition with the Japanese speech noise, and that students felt more interference from the Japanese speech than from the environmental noise. Further, item response theory was used to adjust the scores in order to compensate for the effects of noise on the scores. The attempt was successful. Items without noise, which every test-taker answered, were used as anchor items to estimate θs.
Keywords: spoken language education, educational evaluation, listening comprehension test, noise, item response theory (IRT)
▶ General research  
Analysis of Factors Affecting Trends in High School English Ability Using Latent Growth Curve Modeling
Chisato Saidai1, Tamaki Hattori2
1Namiki High School, Ibaraki Prefecture/Tohoku Graduate School of Educational Information Educational Division, 2University of Tsukuba Graduate School of Comprehensive Human Sciences
The Japanese system of high school entrance examinations has created clear differences among Japanese high schools' academic standards. This study focused on schools rather than on individuals in order to investigate the factors affecting trends in high school English scores. Mean IRT-scale scores in 43 high schools over eight years were analyzed using Latent Growth Curve Modeling (LGM). We examined four factors that may affect the trends in high school English scores: (1) overall academic standards, (2) location, (3) history (date of founding), and (4) the admission rate of a given high school. The results suggest that (1) the overall academic standards of a given high school and (4) the admission rate of a given high school may be factors affecting trends in high school English scores.
Keywords: High Schools, English IRT Scale Scores, Longitudinal Data, Factors Affecting Trends, Latent Growth Curve Modeling