"What Can You Tell From an N of 1?": Issues of Validity and Reliability in Qualitative Research Sharan B. Merriam The University of Georgia At conference presentations, in reviews of journal articles, at thesis defenses, the trustworthiness of qualitative research continues to be challenged, and rightly so. Rigor is needed in all kinds of research to insure that findings are to be trusted and believed. In applied fields like education, social work, counseling, and administration, the question of the trustworthiness of research findings looms large; after all, much research is designed to understand and improve practice. We want to feel confident incorporating research findings into our practice, for what we do affects the lives of real people. Questions most commonly posed to qualitative researchers reflect concerns with the validity and reliability of the research findings--questions such as the one in the title of this article, and others such as "How can you generalize from a small, non-random sample?", "If somebody else did this study, would they get the same results?", "How do you know the researcher isn't biased and just finding what he or she expects to find?", and "If the researcher is the primary instrument for data collection and anlysis, how can we be sure the researcher is a valid and reliable instrument?" These questions reflect legitimate concerns about the rigor of qualitative research; they also reflect philosophical assumptions underlying a quantitative or positivist worldview and are thus inappropriate for assessing the rigor of a qualitative study. The purpose of this paper is twofold: (l) to examine conceptions of validity and reliability from a qualitative or interpretive worldview; and (2) to present strategies for insuring for validity and reliability that are consonant with assumptions underlying the qualitative paradigm. 
Purposes of Qualitative Research

In assessing the trustworthiness of qualitative research, it is important to back up and ask what kinds of questions or problems qualitative research is designed to address. Qualitative research is ideal for the following: clarifying and understanding phenomena and situations when operative variables cannot be identified ahead of time; finding creative or fresh approaches to looking at over-familiar problems; understanding how participants perceive their roles or tasks in an organization; determining the history of a situation; and building theory, hypotheses, or generalizations.

The question of trustworthiness becomes how well a particular study does what it is designed to do. Notions of validity and reliability must be addressed from the perspective of the paradigm out of which the study has been conducted. That is, if I am trying to build hypotheses rather than test them, if I am trying to understand a phenomenon rather than "treat" it, if I am interested in participants' perspectives rather than my own, different questions will need to be asked about the conduct of the study.

Qualitative researchers have approached rigor from one of two angles. Some lay out the standard, positivist threats to validity and reliability made famous by Campbell and Stanley (1963) and Cook and Campbell (1979), and demonstrate how qualitative research addresses these threats. History, maturation, observer effects, selection and regression, mortality, spurious conclusions, and so on, can be addressed from a qualitative research perspective, as demonstrated by Guba and Lincoln (1981) and Goetz and LeCompte (1984). More commonly, writers make the case that qualitative research is based on different assumptions regarding reality, thus demanding different conceptualizations of validity and reliability. Some have proposed using a different nomenclature.
Agar (1986), for example, talks about credibility, accuracy of representation, and authority of the writer; Guba and Lincoln (1981) suggest credibility, dependability, and transferability.

The position expressed in this paper is that notions of validity and reliability need to be grounded in the worldview of qualitative research. Further, there are strategies that can be employed to ensure trustworthiness that are highly compatible with this worldview. The discussion that follows is presented in terms of the three major aspects of rigor--internal validity, reliability, and external validity (generalizability).

Internal Validity

Internal validity asks the question, how congruent are one's findings with reality? In quantitative research the question is often more precisely stated as, are we observing or measuring what we think we are observing or measuring? Key to understanding internal validity is the notion of reality. Is reality fixed and stable as the positivists believe, or constructed and interpreted as qualitative researchers believe? These two views of reality are eloquently contrasted in Steinbeck's (1941) log of his scientific journey to the Sea of Cortez, in which he considers how one might describe a fish:

the Mexican sierra has 'XVII-15-IX' spines in the dorsal fin. These can easily be counted. But if the sierra strikes hard on the line so that our hands are burned, if the fish sounds and nearly escapes and finally comes in over the rail, his colors pulsing and his tail beating the air, a whole new relational externality has come into being--an entity which is more than the sum of the fish plus the fisherman. The only way to count the spines of the sierra unaffected by this second relational reality is to sit in a laboratory, open an evil smelling jar, remove a stiff colorless fish from formalin solution, count the spines, and write the truth 'D. XVII-15-IX.'
There you have recorded a reality which cannot be assailed--probably the least important reality concerning either the fish or yourself. (p. 2)

Qualitative research assumes that reality is constructed, multidimensional, and ever-changing; there is no such thing as a single, immutable reality waiting to be observed and measured. Thus, there are interpretations of reality; in a sense the researcher offers his or her interpretation of someone else's interpretation of reality.

Just as in quantitative research there are things you can do (such as control for extraneous variables) to ensure that findings are valid according to that paradigm's notion of reality, so too in qualitative research. The following strategies can be employed to strengthen the internal validity of a qualitative study: [These strategies and the ones discussed in the next two sections are drawn from experience and the literature, in particular, Guba and Lincoln (1981), Merriam (1988), and Patton (1991).]

1. Triangulation - the use of multiple investigators, multiple sources of data, or multiple methods to confirm the emerging findings (Denzin, 1970; Mathison, 1988). For example, if the researcher hears about the phenomenon in interviews, sees it taking place in observations, and reads about it in pertinent documents, he or she can be confident that the "reality" of the situation, as perceived by those in it, is being conveyed as "truthfully" as possible.

2. Member Checks - taking data collected from study participants, and your tentative interpretations of these data, back to the people from whom they were derived, asking if the interpretations are plausible, if they "ring true."

3. Peer/Colleague Examination - asking peers or colleagues to examine your data and to comment on the plausibility of the emerging findings.

4. Statement of Researcher's Experiences, Assumptions, Biases - presenting the orientation, biases, and so on, of the researcher at the outset of the study.
This enables the reader to better understand how the data might have been interpreted in the manner they were.

5. Submersion/Engagement in the Research Situation - collecting data over a long enough period of time to ensure an in-depth understanding of the phenomenon. Criteria for determining how long is long enough can be found in Guba and Lincoln (1981), Merriam (1988), and Patton (1991).

Most writers agree that internal validity is a strength of qualitative research. There are fewer "layers" between the researcher and the phenomenon under investigation. The above strategies can help ensure that the interpretation of "reality" being presented is as "true" to the phenomenon as possible.

Reliability

Reliability is concerned with the extent to which one's findings will be found again. That is, if the inquiry is replicated, would the findings be the same? Reliability in the "hard" sciences revolves around repeated measures of a phenomenon. Typically investigators disassociate themselves from the phenomenon being investigated by using "objective" measures. The more times the findings of a study can be replicated, the more stable or reliable the phenomenon is thought to be. This was precisely the problem with the cold fusion findings announced by University of Utah scientists. Other scientists were unable to replicate their work; hence the reliability of the findings of the original investigations was called into question.

In the social sciences the whole notion of reliability in and of itself is problematic. That is, studying people and human behavior is not the same as studying inanimate matter. Human behavior is never static. Classroom interaction is not the same day after day, for example, nor is people's understanding of the world around them. Cronbach (1975, p. 123) notes that "an actuarial table describing human affairs changes from science into history before it can be set in type."
Further, the scientific notion of reliability assumes that repeated measures of a phenomenon (with the same results) establish the truth of the results. However, measurements and observations can be repeatedly wrong, especially where human beings are involved. Scriven (1972) makes the point that a lot of people experiencing the same thing does not necessarily mean that their accounts are more reliable than that of a single individual. Five hundred people reporting that they had seen a magician cut a person in half, for example, would not be as reliable a report as that of the lone stagehand who had witnessed the event from behind the curtain.

Qualitative researchers are not seeking to establish "laws" in which reliability of observation and measurement is essential. Rather, qualitative researchers seek to understand the world from the perspectives of those in it. Since there are many perspectives, and many possible interpretations, "there is no benchmark by which one can take repeated measures and establish reliability in the traditional sense" (Merriam, 1988, p. 170). Clearly, replication of a qualitative investigation will not yield the same results. This fact does not lead to discrediting the results of either study, however (as it might in quantitative research). Rather, both sets of results stand as two interpretations of the phenomenon.

Instead of reliability, one can strive for what Lincoln and Guba (1985, p. 288) call "dependability" or "consistency." The real question for qualitative researchers, they suggest, is not whether the results of one study are the same as the results of a second or third study, but whether the results of a study are consistent with the data collected. And, as with internal validity, there are strategies one can use to ensure greater consistency. Three such strategies are listed below:

1. Triangulation - the use of multiple methods of data collection in particular, as well as other forms of triangulation, can lead to dependability or consistency (as well as internal validity).

2. Peer Examination - again, this strategy provides a check that the investigator is plausibly interpreting the data; that is, someone else can be asked whether the emerging results appear to be consistent with the data collected.

3. Audit Trail - this strategy, suggested by Guba and Lincoln (1981), operates on the same premise as when an auditor verifies the accounts of a business. "In order for an audit to take place, the investigator must describe in detail how data were collected, how categories were derived, and how decisions were made throughout the inquiry" (Merriam, 1988, p. 172). Goetz and LeCompte (1984, p. 216) suggest that the audit trail should be so detailed "that other researchers can use the original report as an operating manual by which to replicate the study."

Reliability, then, cannot be thought of in qualitative research in the same way as it is in positivist research. The logic of reliability in quantitative research is based on philosophical assumptions and a worldview different from that of qualitative research. What one strives for is consistency and dependability, a sort of internal reliability in which the findings of an investigation reflect, to the best of the researcher's ability, the data collected.

External Validity

External validity, or generalizability, concerns the extent to which the findings of a study can be applied to other situations. Indeed, this question seems to haunt qualitative research more than any other, probably because most people think of generalizability in the statistical sense of extrapolating from a sample to a population.
Since qualitative researchers rarely select a random sample (which would then allow them to generalize to the population from which the sample was selected), it is often concluded that one cannot generalize in qualitative research. While some qualitative researchers view generalizability as a limitation of the method, or as simply not appropriate for the social sciences, most prefer to think of generalizability as something different from going from a sample to a population. The goal of qualitative research, after all, is to understand the particular in depth, rather than to find out what is generally true of the many. There are at least three alternative conceptions of generalizability that are congruent with the philosophical assumptions underlying qualitative research.

Cronbach (1975) thinks that empirical generalizations are too lofty a goal for social science research; rather, we should think in terms of working hypotheses. He writes: "Instead of making generalization the ruling consideration in our research, I suggest that we reverse our priorities. An observer collecting data in one particular situation is in a position to appraise a practice or proposition in that setting, observing effects in context....Generalization comes late....When we give proper weight to local conditions, any generalization is a working hypothesis, not a conclusion" (pp. 124-125). Working hypotheses reflect situation-specific conditions of a particular context. They can also be used to guide practice (Patton, 1991).

A second concept, called concrete universals, has been proposed by Erickson (1986). In attending to the particular, universals can be discovered. Concrete universals are based on the notion that particular situations convey insights that transcend the situation from which they emerge. The general lies in the particular. This is in fact how human beings make sense out of their world and how they cope with new situations.
What is learned in a particular situation is applied to similar situations subsequently encountered. For example, a person who receives a speeding ticket on a particular highway will most likely "generalize" to subsequent instances of being on the same road, and to other similar roads. People do not wait until they have a sample of experiences before they generalize to new situations.

A third way of viewing external validity is what is becoming known as reader or user generalizability. In this view, the extent to which findings from an investigation can be applied to other situations is determined by the people in those situations. It is not up to the researcher to speculate how his or her findings can be applied to other settings; it is up to the consumer of the research. Wilson (1979, p. 454) suggests the notion of "a continuum of usefulness" beginning on one end representing "the setting where the information was gathered and stretching to dissimilar settings."

Whether one thinks of generalizability in terms of working hypotheses, concrete universals, or user generalizability, there are strategies one can employ to strengthen this aspect of rigor in qualitative research. Four such strategies are discussed below:

1. Thick description - this involves providing enough information about and description of the phenomenon under study so that readers will be able to determine how closely their situations match the research situation, and hence, whether findings can be transferred.

2. Multi-site designs - the use of several sites, cases, or situations, especially those representing some variation (Glaser and Strauss, 1967), will allow the results to be applied to a greater range of other similar situations.

3. Modal comparison - this strategy involves describing how typical the program, event, or sample is compared with the majority of others in the same class.
In Wolcott's (1973) case study of a school principal, for example, he tells the reader how representative his subject is compared to the typical school principal.

4. Sampling within - a phenomenon being studied may have numerous component parts (teachers, administrators, students in a school system, for example), each of which could be randomly sampled for inclusion in the study. This would allow one to "generalize" to the larger group within the unit of study.

In summary, external validity or generalizability seems to be most problematic for those not well acquainted with qualitative research. To consider generalizability a limitation of this kind of research is to be thinking in terms of statistical generalization based in the quantitative paradigm. By viewing external validity from the perspective of the assumptions underlying qualitative research, several reformulations of generalizability are possible, such as working hypotheses, concrete universals, and reader or user generalizability.

What Can You Tell From an N of 1?

Viewed from a qualitative perspective, quite a bit can be learned from an N of 1. The trustworthiness of the findings of a study with a small N and no random sampling is dependent upon the internal validity, reliability, and external validity of the study. As was discussed in this article, there are ways to view each of these concerns that are congruent with the underlying assumptions and worldview of qualitative research. Likewise, there are strategies that investigators can employ that will ensure the validity and reliability of the study.

Rigor is as valid a concern in qualitative research as in any other kind of research. Qualitative researchers employ different means of "persuading" the reader that a study is trustworthy. This is what Firestone (1987) calls the "rhetoric" of this research.
While "the quantitative study must convince the reader that procedures have been followed faithfully because very little concrete description of what anyone does is provided," qualitative research persuades through its "classical strengths" of "concrete depiction of detail, portrayal of process in an active mode, and attention to the perspectives of those studied" (pp. 19, 20).

References

Agar, M. (1986). Speaking of ethnography. Beverly Hills, CA: Sage.

Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally College Publishers.

Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist, 30, 116-127.

Denzin, N. K. (1970). The research act: A theoretical introduction to sociological methods. Chicago: Aldine.

Erickson, F. (1986). Qualitative methods in research on teaching. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 119-161). New York: Macmillan.

Firestone, W. A. (1987). Meaning in method: The rhetoric of quantitative and qualitative research. Educational Researcher, 16(7), 16-21.

Goetz, J. P., & LeCompte, M. D. (1984). Ethnography and qualitative design in educational research. Orlando, FL: Academic Press.

Guba, E. G., & Lincoln, Y. S. (1981). Effective evaluation. San Francisco: Jossey-Bass.

Mathison, S. (1988). Why triangulate? Educational Researcher, 17, 13-17.

Merriam, S. B. (1988). Case study research in education: A qualitative approach. San Francisco: Jossey-Bass.

Patton, M. Q. (1991). Qualitative evaluation methods (2nd ed.). Newbury Park, CA: Sage.

Scriven, M. (1972). Objectivity and subjectivity in educational research. In L. G. Thomas (Ed.), Philosophical redirection of educational research: The seventy-first yearbook of the National Society for the Study of Education. Chicago: University of Chicago Press.

Steinbeck, J. (1941). Sea of Cortez. New York: The Viking Press.

Wilson, S. (1979). Explorations of the usefulness of case study evaluations. Evaluation Quarterly, 3, 446-459.

Wolcott, H. (1973). The man in the principal's office. New York: Holt, Rinehart and Winston.