Super Summary: Evaluating Critical Thinking

Many programs and strategies have been developed to help learners develop critical thinking skills. In order to evaluate these programs and to better understand what factors affect the acquisition of critical thinking skills, we need ways to measure critical thinking in learners. To find out what test methods have been developed, how these methods are applied, and what is being done to evaluate existing tests, I did a PsychLit search and a search on the World Wide Web. Surprisingly, there was not as much research as I had expected: I had to go back ten years and more to find articles that focused on methods and tests to evaluate critical thinking. The articles fall into three categories: overview articles and general discussions of critical thinking tests and test methods (Ennis & Norris; Linn), descriptions of the development of new techniques to measure critical thinking (Elliot; Henri; High; Gibbs; Newman et al.; Shaw), and evaluations of critical thinking tests already in existence (Modjeski & Michael; Moss & Kozdiol). In the following section, I will give an abstract for four of the articles and discuss their importance for the field, for myself, and for colleagues.

 

High, M. H. (1991). Assessing the effect of high school lessons in thinking skills. High School Journal, 75, 1, 34-39.

Starting from a critique of conventional quantitative measures of critical thinking skills (the Watson-Glaser Critical Thinking Appraisal and the Cornell Critical Thinking Tests), the author suggests a qualitative inquiry approach as a more valid measure of critical thinking skills in the school context. Eight teachers and 16 students (grades 9-12) were randomly selected for naturalistic interviews. Teachers were asked about their goals and intentions when teaching critical thinking skills (what they wanted the students to learn), and students were asked what they perceived they were supposed to learn and what thought processes they had engaged in. Periods during which the class had focused on critical thinking had been videotaped prior to the interviews; these tapes were shown to the interviewees during the interview sessions to help them recall their thoughts and intentions. The study found that students were able to identify teacher behaviors and classroom techniques that elicit different kinds of thought processes. They reported that these techniques caused them to analyze, synthesize, support their claims with evidence, predict, evaluate, and think flexibly. Teachers' and students' answers were compared to assess the level of congruence between the students' perceptions of the lesson and the teachers' intentions; the congruence level ranged from 17% to 83%. A more detailed analysis of the students' interview data showed that immaturity, self-absorption, and a negative mind-set about the nature and value of schooling were related to a low level of congruence with the teachers' intentions. When asked whether they would apply the skills learned in class outside of the classroom, only 11th and 12th graders reported doing so.
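To make the congruence figures concrete for myself, I sketched how such a percentage could be computed, assuming it is simply the share of teacher-intended thinking skills that a student also names. The article does not spell out the exact calculation, so the rule and the example data below are my own illustration, not High's procedure:

    def congruence(teacher_intentions, student_perceptions):
        """Percentage of teacher-intended skills that the student also named.

        Both arguments are sets of skill labels. This is an assumed rule for
        illustration only; the article's own coding procedure may differ.
        """
        if not teacher_intentions:
            return 0.0
        matched = teacher_intentions & student_perceptions
        return 100.0 * len(matched) / len(teacher_intentions)


    # Hypothetical example: the teacher intended six skills, the student named one.
    teacher = {"analyze", "synthesize", "support with evidence",
               "predict", "evaluate", "think flexibly"}
    student = {"analyze"}
    print(f"Congruence: {congruence(teacher, student):.0f}%")  # -> 17%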

The importance of this article for the field is that it raises awareness of the question whether quantitative measures are appropriate for measuring critical thinking skills. The qualitative approach that the author suggests can certainly be criticized; to me it measures the students' ability to guess their teachers' intentions rather than critical thinking skill. However, the approach of conducting introspective, qualitative interviews instead of quantitative tests is worth consideration. Personally, I found the article very interesting because it provided me with a concise overview of the main critiques of quantitative critical thinking tests. Although I have doubts about the validity of the method used, as mentioned above, the article stimulated my thinking about methods to measure critical thinking.

 

Newman, D. R., Webb, B., Cochrane, C. (1995). A content analysis method to measure critical thinking in face-to-face and computer supported group learning. Interpersonal Computing and Technology, 3, 2, 56-77. http://www.helsinki.fi/science/optek/1995/n2

Following Garrison's theory of critical thinking and Henri's critical reasoning skills, the authors describe how they developed a content analysis method to evaluate critical thinking in face-to-face discussions and computer mediated communication. The content analysis method is based on a list of 24 paired indicators of critical thinking that are used to score transcripts of class discussions. Using these indicators, Newman et al. compare two sections of a course, one of which was taught online using an asynchronous conferencing system (CMC); the other was taught in a traditional classroom setting. A preliminary comparison of the two settings showed that fewer new ideas were generated in the CMC group, but more evaluation and justification statements, than in the classroom group. The authors conclude that in different settings, instructors
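To get a feel for how such paired (positive/negative) indicators could be turned into scores for a transcript, I wrote the following sketch. The indicator categories, the example tallies, and the scoring ratio are my own assumptions for illustration and are not necessarily the formula Newman et al. actually use:

    from collections import Counter

    # Hypothetical tallies: for each indicator category, how many statements in a
    # transcript were coded with the positive (+) or negative (-) form of the
    # indicator. Category names are illustrative, not the article's exact labels.
    tallies = {
        "relevance":     Counter({"+": 12, "-": 3}),
        "novelty":       Counter({"+": 4,  "-": 9}),
        "justification": Counter({"+": 10, "-": 2}),
    }

    def category_score(counts):
        """Score a category on a -1..+1 scale from paired indicator counts.

        score = (positive - negative) / (positive + negative)
        An assumed scoring rule for illustration; consult the article for the
        authors' actual procedure.
        """
        pos, neg = counts["+"], counts["-"]
        total = pos + neg
        return (pos - neg) / total if total else 0.0

    for category, counts in tallies.items():
        print(f"{category:>13}: {category_score(counts):+.2f}")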

I see the importance of this article for the field in that it is one of the few articles that give a detailed description of a content analysis technique to evaluate critical thinking that is based on a critical thinking theory. Even if this technique cannot be applied in every context (it would be far too complicated for teachers' use in schools, for example), I think it serves as a good starting point for thinking about alternative ways to evaluate critical thinking. To me personally this article was very interesting because it evaluates critical thinking in the context of computer mediated communication, a field I want to focus on in the future. I got a lot of ideas out of this article about what could serve as indicators of critical thinking.

 

Modjeski, R. B. & Michael, W. B. (1983). An evaluation by a panel of psychologists of the reliability and validity of two tests of critical thinking. Educational and Psychological Measurement, 43, 4, 1187-1197.

The article reports the results of an evaluation of the Cornell Critical Thinking Test and the Watson-Glaser Critical Thinking Appraisal. A panel of 12 psychologists judged the manuals of both tests against the validity and reliability standards expressed in the APA Standards for Educational and Psychological Measurement. Modjeski and Michael found that both tests were rated high on only very few validity and reliability standards. In particular, both tests received high ratings regarding the information available about the samples employed in validation and the description of procedures and samples used to determine reliability coefficients. Low scores were given concerning possible test bias and cross-validation efforts, as well as the reporting of the stability of scores over time. Overall, the Watson-Glaser test tended to receive better validity and reliability ratings than the Cornell test. The authors conclude that because both tests show considerable shortcomings in terms of validity and reliability, they should be revised in the near future.

Since the Watson-Glaser and the Cornell tests are the two most widely used measures of critical thinking skills, it is important to ensure their validity and reliability. The finding that the psychologists' panel detected serious weaknesses in both tests is very important to researchers and practitioners who use the tests to evaluate critical thinking. A flaw I see in this study is that the psychologists' validity and reliability ratings were based only on the information given in the test manuals. Thus, they did not necessarily rate the tests but rather the manuals. However, if relevant information about reliability and validity measures is missing from the manuals, the quality of the tests seems questionable. This result was relevant to me as it helped me become aware of possible problems with the critical thinking tests (stability over time, cultural bias).

 

Moss, P. A. & Kozdiol, S. M. (1991). Investigating the validity of a locally developed critical thinking test. Educational Measurement: Issues and Practice, 10, 3, 17-22.

The subject of this study is the evaluation of a critical thinking test that was developed as part of the Public Schools' Monitoring Achievement in Pittsburgh (MAP) Critical Thinking Project. For each grade from 3 through 11, the test consists of a short reading passage followed by an essay question that asks students to evaluate or draw an inference from the passage. The students' essays are evaluated by the classroom teachers, following a generic scoring guide.

This study seeks to evaluate the critical thinking test on several levels. The validity of the test is evaluated through a content analysis of the essay questions. The interrater reliability is determined by comparing the scores that five teachers assigned to the same essays. In addition, an exploratory factor analysis across the scoring categories and the parallel forms of the test for each grade level was performed to see whether there is evidence for a common skill underlying performance. Finally, data on the impact of the project on the social studies curriculum were gathered through a teaching practices inventory.
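As a reminder to myself of what such an interrater comparison involves, here is a minimal sketch that summarizes agreement among five raters as the average pairwise proportion of identical scores. The scores and the agreement index are invented for illustration and are not the study's data or its actual reliability statistic:

    from itertools import combinations

    # Hypothetical scores: five teachers rating the same six essays on a 1-4
    # scale (scale and numbers made up for illustration).
    scores = {
        "teacher_A": [3, 2, 4, 1, 3, 2],
        "teacher_B": [3, 3, 4, 2, 2, 2],
        "teacher_C": [2, 2, 3, 1, 3, 1],
        "teacher_D": [3, 2, 4, 1, 3, 2],
        "teacher_E": [4, 3, 4, 2, 3, 3],
    }

    def pairwise_exact_agreement(a, b):
        """Proportion of essays to which two raters gave the identical score."""
        return sum(x == y for x, y in zip(a, b)) / len(a)

    # Average exact agreement over all rater pairs -- one simple (and strict)
    # way to summarize interrater reliability; the study may have used
    # correlations or another index.
    pairs = list(combinations(scores, 2))
    mean_agreement = sum(
        pairwise_exact_agreement(scores[r1], scores[r2]) for r1, r2 in pairs
    ) / len(pairs)
    print(f"Mean pairwise exact agreement: {mean_agreement:.2f}")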

The results showed that there are serious flaws in both the validity and the reliability of the test. The content analysis revealed considerable qualitative differences in the parallel versions within some of the grade levels as well as between the grade levels. This finding was supported by the results of the factor analysis, which showed significant differences between the parallel versions. The comparison of the test scores determined by different teachers for the same essays showed that the scoring manual was interpreted differently by teachers, resulting in low interrater reliability scores and undermining the validity of the test. Finally, the teaching practices inventory indicated that the test has a positive influence on the curriculum, as it is used by the teachers to plan their curriculum. It was also seen as providing an opportunity for students to write in response to a critical thinking task and receive detailed feedback from their teachers.

The authors conclude that the validity and reliability of the test could only be enhanced by making substantial changes to the test itself. In addition, they suggest that the teachers be provided with more rigorous scoring guidelines for the essays and with training in the scoring procedure. The authors acknowledge, however, that the test has high validity for instructional purposes, as can be seen from the teaching practices inventory data.

This article points out the flaws of the main test used to evaluate a citywide critical thinking program. But I think the relevance of the results goes beyond the particular program that the test was supposed to evaluate: it also points out the difficulty of constructing parallel versions of open-ended essay questions to evaluate critical thinking, and it shows that especially in an open-ended test like this it is important to provide the evaluators with clear scoring guidelines and some training in the scoring procedure. To me the article was also particularly interesting because it shows the multitude of methods that can be employed to evaluate a critical thinking test.

 

 

Overview articles

Ennis, R. H. & Norris, S. P. (1990). Critical thinking assessment: status, issues, needs. In Legg, S. & Algina, J. (Eds.), Cognitive assessment of language and math outcomes (pp. 1-42). Norwood, NJ: Ablex Publishing.

Linn, R. L. (1991). Dimensions of thinking: implications for testing. In L. Idol & B. F. Jones (Eds.), Educational values and cognitive instruction: Implications for reform (pp. 179-208). Hillsdale, NJ: Lawrence Erlbaum Associates.

 

Descriptions of the development of techniques to measure critical thinking

Elliot, L. B. (1993). Using debates to teach psychology of women. Teaching of Psychology, 20, 1, 35-38.

Henri, F. (1991). Computer conferencing and content analysis. In O'Malley, C. (Ed.), Computer supported collaborative learning. Heidelberg: Springer.

High, M. H. (1991). Assessing the effect of high school lessons in thinking skills. High School Journal, 75, 1, 34-39.

Gibbs, L., Gambrill, E., Blakemore, J., Begun, A. (1995). A measure of critical thinking about practice. Research on Social Work Practice, 5, 2, 193-204.

Newman, D. R., Webb, B., Cochrane, C. (1995). A content analysis method to measure critical thinking in face-to-face and computer supported group learning. Interpersonal Computing and Technology, 3, 2, 56-77. http://www.helsinki.fi/science/optek/1995/n2

Shaw, R. E. (1997). An ecological approach to the on-line assessment of problem-solving paths: Principles and applications. Instructional Science, 25, 2, 151-166.

 

Evaluations of critical thinking tests

Modjeski, R. B. & Michael, W. B. (1983). An evaluation by a panel of psychologists of the reliability and validity of two tests of critical thinking. Educational and Psychological Measurement, 43, 4, 1187-1197.

Moss, P. A. & Kozdiol, S. M. (1991). Investigating the validity of a locally developed critical thinking test. Educational Measurement: Issues and Practice, 10, 3, 17-22.