Ece Hakim ’21
The experiment (total N = 68) was conducted to investigate the impact of conceptual priming in the well-known “surgeon riddle.” The surgeon riddle is as follows: “Father and son are driving a car. They get into a car accident, the dad dies, son gets rushed into hospital. The surgeon says, ‘I cannot operate, this is my son.’ Who is the surgeon?”
The correct answer is “the mother”; however, many participants fail to answer the question correctly. It was argued that this failure could be attributed to a priming effect rather than to implicit gender stereotypes. To test this hypothesis, three conditions were created: a control group, a non-priming condition, and a stereotype condition. For the stereotype condition, the riddle was reformulated as “A mother and her son are driving a car…” to observe the outcomes when the correct answer, the father, is consistent with the stereotype. For the non-priming condition, the riddle was reformulated as “A father and his daughter are driving a car…”, with the expectation that presenting both sexes would override the priming effect observed in the original riddle. It was predicted that, once the conceptual priming was overridden by this change in phrasing, the number of correct answers in the non-priming condition would match that in the stereotype condition. Our results fell between these two extremes: performance in the non-priming condition (father–daughter) improved (27%) but remained far from the ceiling (89%).
Conceptual Priming vs Gender Bias in the “Surgeon Riddle”
A social stereotype is a mental association, not necessarily reflective of statistical reality, between a social group or social category and a trait (Greenwald & Krieger, 2006). Research has firmly established that members of some social groups share certain associations regarding the personal traits of men and women (Ashmore & Del Boca, 1979; Cuddy et al., 2015; Eagly & Mladinic, 1989; Jackson & Cash, 1985). One of the most prevalent is the association of women with traditional domestic roles and collectivistic traits (e.g., nurturance and social sensitivity), as explained by social role theory (Eagly, 1987).
Despite the increasing number of women joining the workforce, the associations described by social role theory seem to remain pervasive. In particular, the gender gap in areas such as math and science is attributed to the belief that women possess fewer of the traits associated with scientists, such as analytical thinking and independence (Carli, Alawa, Lee, Zhao, & Kim, 2016). Although this is a relatively recent finding in psychology, the belief itself has been deeply ingrained for much longer. One piece of evidence for this is the surgeon riddle, which has become well known because of the large number of people who get it wrong:
“A father and his son are driving a car. They get into a car accident. The dad dies, and the son gets rushed into emergency. The surgeon says, ‘I cannot operate, this is my son.’ Who is the surgeon?”
The mother of the son.
The New York Times columnist Stephanie Coontz recalls being stumped upon hearing the riddle in 1962 (Coontz, 2013). However, the origin of the riddle is unknown. In academic writing, it can be traced back only as far as 1985, to Anthony J. Sanford’s book “Cognition and Cognitive Psychology.” In this book, Sanford uses the riddle to explain the phenomenon of presupposition. He attributes the failure to find the correct answer to the presupposition that surgeons are male (Sanford, 1985). Likewise, later research, both the studies that examined the riddle and those that merely referenced it, explained the phenomenon in terms of implicit gender stereotypes (Kollmayer, Pfaffel, Schober, & Brandt, 2018; Mineshima, 2008; Wapman, 2014). Although this explanation makes intuitive sense, which may explain why so little research has examined the riddle, I argue that it is a hasty conclusion. In this paper, I explore an alternative explanation for the surgeon riddle phenomenon: the priming effect.
Priming can be defined simply as the presentation of information designed to activate knowledge structures, such as trait concepts and stereotypes, and hence make them more accessible (Bargh, Chen, & Burrows, 1996; Gilovich, 2016). Research has shown that priming a social category (e.g., the elderly, women, African-Americans) affects social perception, so that judgement shifts toward the primed category, an assimilation effect (Bargh et al., 1996; Herr, 1986; Herr, Sherman, & Fazio, 1983). I argue that the presentation of two male figures (father and son) and no female figures in the riddle primes the social category of males. This shifts participants’ judgement toward males and therefore makes it harder for them to think of a female surgeon and find the correct answer. So, if this priming effect were overridden, participants would not fail to find the correct answer.
To test this hypothesis, two different conditions were created. The first condition (the non-priming condition) required a question that would not create a priming effect. To achieve this, the question was reformulated as follows:
“A father and his daughter are driving a car. They get into a car accident. The dad dies, and the daughter gets rushed into emergency. The surgeon says, ‘I cannot operate, this is my daughter.’ Who is the surgeon?”
The mother of the daughter.
I argue that the representation of both sexes, a male (father) and a female (daughter), would override the priming effect. In the second condition (stereotype condition), the question was reformulated in the following form:
“A mother and her son are driving a car. They get into a car accident. The mother dies, and the son gets rushed into emergency. The surgeon says, ‘I cannot operate, this is my son.’ Who is the surgeon?”
The father of the son.
This question also overrides the priming effect by presenting both sexes to the participant. However, in this condition, the expected answer from the participants was “the father.” So, if participants could find the answer to the question in this condition but not in the non-priming condition, then the failure of the participants could be attributed to implicit gender stereotypes. There was also a control group, which was given the original question. The performance of the participants was evaluated based on their accuracy (“Did they give the correct answer?”) and on their response latency (“How many seconds did it take them to find the answer?”).
In total, 68 people (34 female, 34 male) participated in the study. The participants’ ages ranged from 18 to 71, with a mean of 29.9. The participants were randomly assigned to one of three groups, each of which received one of the following questions:
“A father and his son are driving a car. They get into a car accident. The dad dies, and the son gets rushed into emergency. The surgeon who comes into the emergency room and sees the son says, ‘I cannot operate, this is my son.’ Who is the surgeon?”
The mother of the son.
“Father and daughter are driving a car. They get into a car accident, the father dies, daughter gets rushed into hospital. The surgeon says, ‘I cannot operate, this is my daughter.’ Who is the surgeon?”
The mother of the daughter.
“Mother and son are driving a car. They get into a car accident, the mother dies, son gets rushed into hospital. The surgeon says, ‘I cannot operate, this is my son.’ Who is the surgeon?”
The father of the son.
Each question was printed on a two-sided sheet of paper. On the front page, the participants were asked to provide their age, gender, and education level. After they filled out the form, they were informed about the procedure: they were told to turn the page, read the question, and write down their answer. The researcher used a timer to measure the response time, starting it as soon as the participant turned the page and stopping it just before the participant began writing. Since the thought process starts while the participant is reading the question and ends right before they start writing, this method was intended to capture the time spent finding the answer. If a participant had not found the answer after two minutes, they were stopped, and the answer was recorded as “N/A”. After the participants finished writing their answers, they were asked whether they were familiar with the question. At the end of the experiment, the researcher recorded the measured time in seconds. Each response was then scored for accuracy: correct answers were assigned a value of 1 and wrong answers a value of 0.
Of the 68 participants, 10 were familiar with the question, so their results were omitted from the calculations. In each condition, there were 19 participants who provided an answer. However, the number of participants who gave the correct answer varied across conditions: there were 4 correct answers in the control group, 17 in the stereotype condition, and 9 in the non-priming condition. In calculating the average response times, only the response times for correct answers were taken into account. The average response time for the control group was 33.8 seconds, and the average accuracy was 0.25. The average response time in the non-priming condition was 38.5 seconds, and the average accuracy was 0.53. The average response time in the stereotype condition (mother–son) was 30.6 seconds, and the average accuracy was 0.89 (Fig. 1).
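The scoring rules described above (1/0 accuracy coding, exclusion of participants familiar with the riddle, and response times averaged over correct answers only) can be sketched in Python as follows. This is an illustrative sketch rather than the analysis script used in the study: the `Response` record, the `summarize` helper, and the demo values are hypothetical, and treating timed-out “N/A” responses as unanswered is an assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Response:
    """One participant's record (hypothetical layout, not the study's data file)."""
    condition: str             # "control", "non_priming", or "stereotype"
    familiar: bool             # participant already knew the riddle
    accuracy: Optional[int]    # 1 = correct, 0 = wrong, None = "N/A" (timed out)
    seconds: Optional[float]   # response time; None when timed out


def summarize(responses, condition):
    """Accuracy and mean correct-answer response time for one condition."""
    # Assumption: familiar participants and timed-out responses are excluded.
    answered = [r for r in responses
                if r.condition == condition
                and not r.familiar
                and r.accuracy is not None]
    correct_times = [r.seconds for r in answered if r.accuracy == 1]
    return {
        "n": len(answered),
        "accuracy": mean([r.accuracy for r in answered]) if answered else None,
        # Only correct answers contribute to the average response time.
        "mean_correct_seconds": mean(correct_times) if correct_times else None,
    }


# Hypothetical records for illustration only; these are not the study's data.
demo = [
    Response("control", False, 1, 30.0),
    Response("control", False, 0, 20.0),
    Response("control", True, 1, 5.0),        # familiar with the riddle: excluded
    Response("control", False, None, None),   # timed out after two minutes
    Response("control", False, 1, 40.0),
]
summary = summarize(demo, "control")
```

Keeping the exclusion rules in one function makes it easy to see how a different choice, such as counting timed-out responses as wrong, would change the reported accuracy.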
The differences in the number of correct responses across conditions were tested using chi-squared tests. These tests were conducted in R using the following syntax:
chisq.test(rbind(c(4, 16), c(9, 10)))  # control vs. non-priming
chisq.test(rbind(c(4, 16), c(17, 2)))  # control vs. stereotype
chisq.test(rbind(c(9, 10), c(17, 2)))  # non-priming vs. stereotype
The results of these tests are as follows:
- Control group (father–son) vs. non-priming condition (father–daughter):
χ²(1) = 2.17, p = .141
- Control group (father–son) vs. stereotype condition (mother–son):
χ²(1) = 16.23, p < .001
- Non-priming condition (father–daughter) vs. stereotype condition (mother–son):
χ²(1) = 5.97, p = .014
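As a cross-check, the three statistics can be recomputed by hand. For a 2×2 table, R's `chisq.test` applies Yates' continuity correction by default, and the Python sketch below implements that corrected statistic using only the standard library. The function name `chi2_yates_2x2` and the variable names are illustrative choices; the cell counts are copied from the R calls above, with each row read as (correct, incorrect) for one condition.

```python
import math


def chi2_yates_2x2(table):
    """Chi-squared test with Yates' continuity correction for a 2x2 table.

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates' correction shrinks |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# (correct, incorrect) counts as passed to chisq.test in the R calls.
control = (4, 16)
non_priming = (9, 10)
stereotype = (17, 2)

stat_cn, p_cn = chi2_yates_2x2([control, non_priming])
stat_cs, p_cs = chi2_yates_2x2([control, stereotype])
stat_ns, p_ns = chi2_yates_2x2([non_priming, stereotype])

for label, stat in [("control vs. non-priming", stat_cn),
                    ("control vs. stereotype", stat_cs),
                    ("non-priming vs. stereotype", stat_ns)]:
    print(f"{label}: chi2(1) = {stat:.2f}")
```

Rounded to two decimals, the three statistics agree with the values reported above.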
It was concluded that there is a clear difference between the control group and the stereotype condition and a weaker difference between the non-priming condition and the stereotype condition. The difference between the control group and the non-priming condition did not reach statistical significance.
The differences in average response times across conditions were tested using t-tests in R. The difference between the non-priming condition and the control group was not statistically significant (p = 0.86). Likewise, the difference between the stereotype condition and the control group was not statistically significant (p = 0.33). Finally, the difference between the non-priming condition and the stereotype condition was not statistically significant (p = 0.19).
The results of the study partially support the hypothesis. When the priming effect was overridden by presenting both sexes in the question, the accuracy rate increased, although this increase did not reach statistical significance (p = .141). Moreover, the accuracy rate in the stereotype condition was still significantly higher than that in the non-priming condition. Therefore, it was concluded that although the priming effect may contribute to participants’ failure to find the correct answer to the original riddle (father–son), the impact of implicit gender stereotypes cannot be neglected. The results also showed that response time did not differ significantly across conditions and so did not contribute to the findings of the study. The participants who eventually found the answer spent roughly the same amount of time in all conditions. Neither the priming effect nor the expected answer (“the mother” or “the father”) changed the response time significantly.
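The text does not say which variant of the t-test was used. R's `t.test` defaults to Welch's unequal-variance test, so the sketch below assumes that variant. It is a minimal standard-library Python illustration, not the original analysis: the function names `welch_t` and `t_two_sided_p` and the response times in the example are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value from Simpson's rule on the Student t density.

    Intended for small samples; math.gamma overflows for very large df.
    """
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1 - 2 * central)


# Hypothetical response times in seconds, for illustration only.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
p_value = t_two_sided_p(t_stat, dof)
```

Because only response times for correct answers entered the comparison, the control group contributes very few observations, which limits the power of any such test.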
In discussing the results, it is important to acknowledge the limitations of the study. To start with, the small sample size (N = 68) is a significant limitation. In each condition, the sample size was around 20 participants. The fact that the surgeon riddle has become fairly well known since it was posted online by the BBC further reduced the number of participants whose responses could be evaluated. Although there were 68 participants, 10 of them had to be omitted from the study because they were familiar with the question. In the future, a larger sample size could make the results more reliable.
Another limitation is the willingness and persistence of the participants. Notably, the average response time of the college students who gave the wrong answer was 30.2 seconds, while the average response time of the participants who already knew the question was 19.8 seconds. This suggests that the college students who gave the wrong answer spent relatively little time looking for the answer. Had they been willing to spend more time, there might have been a higher number of correct answers. This limitation might be mitigated in future studies by offering participants compensation for their participation.
Another limitation of the study concerns internal validity. It is not unusual for a participant to be asked a “riddle-like” question. However, it is unusual for a question that sounds like a riddle to have such an easy answer. So, the nature of the experiment leads the participants to think that the question is “tricky.” Therefore, it is possible that they automatically disregard “the mother” as a possible answer. In future studies, this limitation could be overcome by embedding the question in a survey, alongside other questions that have very basic answers.
These conclusions are important in that they challenge the widely accepted explanation that people who are asked the surgeon riddle fail merely because of their implicit gender stereotypes. The fact that this explanation was so readily accepted by psychologists (see Hoagland, 1988; Kollmayer et al., 2018; Oakhill, Garnham, & Reynolds, 2005) suggests that, as the discourse concerning stereotypes deepens, it becomes easier to build theories on assumptions. Therefore, the study indicates that it might be necessary to revisit certain fundamental theories with a critical eye.
Table 1. The Control Group:
“Father and son are driving a car. They get into a car accident, the dad dies, son gets rushed into hospital. The surgeon says, ‘I cannot operate, this is my son.’ Who is the surgeon?”
Table 2. Sample 1 (Non-Priming Condition):
“Father and daughter are driving a car. They get into a car accident, the father dies, daughter gets rushed into hospital. The surgeon says, ‘I cannot operate, this is my daughter.’ Who is the surgeon?”
Table 3. Sample 2 (Stereotype Condition):
“Mother and son are driving a car. They get into a car accident, the mother dies, son gets rushed into hospital. The surgeon says, ‘I cannot operate, this is my son.’ Who is the surgeon?”
I would like to thank Fiery Cushman, Benedek Kurdi, and the fellow visitors of Harvard Square.
Ashmore, R. D., & Del Boca, F. K. (1979). Sex stereotypes and implicit personality theory: Toward a cognitive-social psychological conceptualization. Sex Roles, 5(2), 219–248. https://doi.org/10.1007/BF00287932
Bargh, J. A., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71(2), 230–244. https://doi.org/10.1037/0022-3514.71.2.230
Carli, L. L., Alawa, L., Lee, Y., Zhao, B., & Kim, E. (2016). Stereotypes About Gender and Science: Women ≠ Scientists. Psychology of Women Quarterly, 40(2), 244–260. https://doi.org/10.1177/0361684315622645
Coontz, S. (2013, June 9). Progress At Work, But Mothers Still Pay a Price. The New York Times, p. SR5.
Cuddy, A. J. C., Wolf, E. B., Glick, P., Crotty, S., Chong, J., & Norton, M. I. (2015). Men as cultural ideals: Cultural values moderate gender stereotype content. Journal of Personality and Social Psychology, 109(4), 622–635. https://doi.org/10.1037/pspi0000027
Eagly, A. H., & Mladinic, A. (1989). Gender Stereotypes and Attitudes Toward Women and Men. Personality and Social Psychology Bulletin, 15(4), 543–558. https://doi.org/10.1177/0146167289154008
Gilovich, T. (2016). Social psychology (4th ed.). New York: W. W. Norton & Company.
Greenwald, A. G., & Krieger, L. H. (2006). Implicit Bias: Scientific Foundations. California Law Review, 94(4), 945–967.
Herr, P. M. (1986). Consequences of priming: Judgment and behavior. Journal of Personality and Social Psychology, 51(6), 1106–1115. https://doi.org/10.1037/0022-3514.51.6.1106
Herr, P. M., Sherman, S. J., & Fazio, R. H. (1983). On the consequences of priming: Assimilation and contrast effects. Journal of Experimental Social Psychology, 19(4), 323–340. https://doi.org/10.1016/0022-1031(83)90026-4
Hoagland, S. L. (1988). Lesbian ethics: Beginning remarks. Women’s Studies International Forum, 11(6), 531–544. https://doi.org/10.1016/0277-5395(88)90107-0
Jackson, L. A., & Cash, T. F. (1985). Components of Gender Stereotypes: Their Implications for Inferences on Stereotypic and Nonstereotypic Dimensions. Personality and Social Psychology Bulletin, 11(3), 326–344. https://doi.org/10.1177/0146167285113008
Kollmayer, M., Pfaffel, A., Schober, B., & Brandt, L. (2018). Breaking Away From the Male Stereotype of a Specialist: Gendered Language Affects Performance in a Thinking Task. Frontiers in Psychology, 9. https://doi.org/10.3389/fpsyg.2018.00985
Mineshima, M. (n.d.). Gender Representations in an EFL Textbook, 20.
Oakhill, J., Garnham, A., & Reynolds, D. (2005). Immediate Activation of Stereotypical Gender Information. Memory & Cognition, 33(6), 972–983.
Sanford, A. J. (1985). Cognition and cognitive psychology. New York: Basic Books.