1 Review of “Threats and supports to female students’ math beliefs and achievement”
Chapter Author & Date of Article Review
Sara J. Finney. March 2025
Article Reference & Link
McKellar S.E., Marchand A.D., Diemer M.A., Malanchuk O., & Eccles J.S. (2019). Threats and supports to female students’ math beliefs and achievement. Journal of Research on Adolescence, 29(2), 449-465. https://doi.org/10.1111/jora.12384
Study’s Purpose, Methods, Analytic Approach, Results, & Implications
The purpose of the study was to examine how teachers’ behaviors related to female middle and high school students’ math beliefs and achievement. Longitudinal data were collected from 518 female adolescents. Structural equation models were specified a priori based on expectancy-value theory and research indicating differential treatment of female students in STEM courses. Specifically, the models specified that female students’ motivation and achievement in math were impacted by teachers’ gender-based differential treatment of students and teachers’ emphasis on the relevance of math skills. This study was unique because it not only examined downstream consequences of math motivational beliefs (e.g., achievement), but also used longitudinal data to examine upstream teacher practices that shaped these math motivational beliefs. Results indicated that teachers’ gender-based differential treatment was negatively related to students’ math motivational beliefs and achievement, whereas teachers’ emphasis on the relevance of math was positively associated with these outcomes. Implications of these findings are 1) acknowledgement of the impact of sexism on females’ pursuit of STEM content and careers and 2) the promotion of strategies that could be employed to support positive math motivational beliefs and achievement for young women that may attenuate the STEM gender gap.
Explanation of How Article Addresses Social Justice or Equity
There are important implications of students’ motivation in math for their academic success and careers (Wang & Degol, 2013; Watt, 2006). However, students are not motivated to achieve in math when they believe they are not capable (low self-efficacy) and when they believe math is not important (low value of math; Eccles et al., 2005). School context influences these motivational beliefs (Wigfield et al., 2015). Notably, teachers are critical socializing agents for adolescents (Eccles & Roeser, 2011) and teacher–student interactions are especially important for female students’ development of their motivational beliefs (Leaper & Brown, 2014). Female students are particularly vulnerable to teacher feedback; they internalize messages received in the classroom as diagnostic of their abilities (Pomerantz et al., 2002). Thus, female students face more social identity threats to their math beliefs than male students. One of these threats is female students’ perception that STEM classes are inhospitable to them due to their gender (Leaper et al., 2012). A review of discrimination in schools indicated that teachers contribute to declines in female students’ self-efficacy, interest, and achievement in STEM (Leaper & Brown, 2014).
Declines in motivational beliefs during adolescence have lasting effects on students’ paths in life and help explain the shortage of women in STEM (National Center for Science and Engineering Statistics, 2015). Underrepresentation of women in STEM has traditionally been explained by women’s lower achievement relative to men. However, contextual factors are important for understanding female students’ persistence in STEM (Riegle-Crumb, King, Grodsky, & Muller, 2012). Using this literature base, the authors provided a systems context to explain gender differences in math motivation and achievement. Differences in motivation and achievement between female and male students were hypothesized to be due to inequitable school contexts in the form of teacher differential treatment of female and male students. These inequities in the treatment of students were used to predict differences in motivation and achievement between males and females, rather than assuming differences were due to students’ capabilities (i.e., differences were not hypothesized to be due to individual failings but instead produced by broader systemic issues). By uncovering relations between teachers’ gender-based differential treatment of students and students’ motivation/achievement, discussion could focus on strategies to remedy this inequitable treatment of female students.
Alignment between the Purpose of the Study and the SEM Analytic Approach
The purpose of the study was to examine how middle and high school female students’ perceptions of two context factors (gendered differential treatment and relevant math instruction) were related to two math motivational beliefs (self-concept and importance) and math achievement. SEM aligned with this purpose in the following ways. First, the authors specified their models a priori using expectancy-value theory in order to include variables (self-concept and importance) known to influence achievement (i.e., avoid model misspecification). Second, using SEM, they could evaluate how teachers’ behavior (gendered differential treatment and relevant math instruction) related to achievement after controlling for self-concept and importance (i.e., isolate unique effects). Third, using SEM, they could estimate the indirect effects of teacher behavior on achievement via the motivational beliefs (i.e., could compare direct, indirect and total effects). Fourth, given the longitudinal design, they could specify and estimate direct and indirect relations between middle school beliefs and achievement and high school beliefs and achievement (i.e., estimate cross-lagged effects). Fifth, given the interest in beliefs over time, they could use SEM to test differential item functioning (DIF) prior to estimating the structural models (i.e., assessed longitudinal measurement invariance).
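The distinction between direct, indirect, and total effects can be illustrated with a minimal sketch. It assumes a single-mediator linear path model, and the coefficient values are hypothetical placeholders chosen for illustration, not estimates reported in the article. The function name `decompose_effects` is likewise an assumption.

```python
# Hypothetical standardized path coefficients (NOT estimates from the article).
a = -0.20        # teacher differential treatment -> math self-concept
b = 0.40         # math self-concept -> math achievement
c_prime = -0.10  # teacher differential treatment -> math achievement (direct path)


def decompose_effects(a: float, b: float, c_prime: float) -> dict:
    """Split a total effect into its direct and indirect parts.

    In a single-mediator linear path model, the indirect effect is the
    product of the two paths that pass through the mediator, and the
    total effect is the direct effect plus the indirect effect.
    """
    indirect = a * b
    return {"direct": c_prime, "indirect": indirect, "total": c_prime + indirect}


effects = decompose_effects(a, b, c_prime)
for name, value in effects.items():
    print(f"{name:>8}: {value:+.3f}")
```

With these placeholder values, part of the total association between teacher behavior and achievement runs through self-concept, which is the kind of information a model with direct effects alone would not reveal.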
Two Positive Aspects of the Analytic Approach
The authors tested measurement invariance of the constructs across male and female students and invariance wasn’t supported; thus, they focused on female students only. The authors clearly explained that because they could not establish full measurement invariance, they could not compare the processes specified in the full SEMs between men and women without ruling out differences as caused by differential measurement of the constructs between men and women. Thus, they correctly focused on only one population—women. They selected women given the historic issues associated with women in STEM.
The authors estimated indirect effects, in addition to direct effects, which is particularly important in order to understand why variables are related. For example, 8th grade gendered differential treatment was related to math self-concept three years later in 11th grade; hence gendered differential treatment had a long-lasting impact on females. However, it is important to uncover how the differential treatment in 8th grade related to math self-concept years later. The authors found that 8th grade gendered differential treatment had an indirect effect on self-concept in 11th grade via 9th grade achievement (8th grade differential treatment → 9th grade math achievement → 11th grade math self-concept). Had the authors only reported the direct effects, we would not understand these more complex relations. Moreover, they tested substantively informed indirect effects, not all indirect effects, which reduced the number of statistical tests and focused attention on the indirect effects relevant to the purpose of the study.
One Less Than Ideal/Negative/Missing Aspects of the Analytic Approach
I have concerns regarding how Math Importance was measured and thus how the parameter estimates were influenced by this measurement issue. The authors used Expectancy-Value Theory to specify their models. Value has historically been treated as multidimensional, in that task value could be utility value (i.e., importance for the future), attainment value (e.g., importance for self), or intrinsic value (i.e., enjoyment), and multiple items are typically used to operationalize each type of value. The authors used one item and it is unclear which type of value this one item represents: “Compared to other kids your age, how important are each of the following activities to you: math?” Moreover, the item asks students to rate the importance of math in reference to other students, which may be very difficult because how much other students value math isn’t observable. Further, whether a student values math more or less than other students do (normative comparison) does not indicate how much the student values math in an absolute sense. Finally, there is no way to evaluate the reliability of this one item. If math importance was not assessed well, then all paths in the model would be impacted (not only the paths that involve math importance). That is, if the paths that involve math importance are underestimated, then paths from other variables to the same outcomes may be overestimated.
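The underestimation concern can be made concrete with the classical attenuation formula, under which an observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below is only an illustration of that formula. The reliability and correlation values are hypothetical, since the reliability of the single item cannot be estimated from the study, and the sketch does not show the related overestimation of other paths.

```python
import math


def attenuated_correlation(true_r: float, rel_x: float, rel_y: float) -> float:
    """Observed correlation implied by classical test theory.

    observed r = true r * sqrt(reliability of x * reliability of y)
    """
    return true_r * math.sqrt(rel_x * rel_y)


# Hypothetical values for illustration only (not taken from the article).
true_r = 0.50           # assumed true correlation between importance and an outcome
rel_single_item = 0.50  # assumed reliability of a single-item measure
rel_outcome = 0.90      # assumed reliability of the outcome measure

observed_r = attenuated_correlation(true_r, rel_single_item, rel_outcome)
print(f"true r = {true_r:.2f}, observed r = {observed_r:.3f}")
```

Under these assumed values, a true correlation of .50 would be observed as roughly .34, which shows how an unreliable single item could mask a sizable relation.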
What was Found in Current Study
As expected, gendered differential treatment from teachers was negatively associated with female students’ motivational beliefs and math achievement. Moreover, relevant math instruction was positively associated with motivational beliefs. Specifically, gendered differential treatment in the 8th grade negatively related to female students’ math importance and math achievement in the same year. Differential treatment in the 8th grade also negatively predicted 9th grade achievement. Differential treatment in 11th grade was negatively related to 11th grade self-concept. In contrast and as expected, 8th grade relevant math instruction was positively related to math importance and self-concept in the 8th grade. Self-concept was also related to 8th grade achievement and math importance as well as achievement the following year. Eleventh-grade relevant math instruction was positively related to 11th grade self-concept and 11th grade self-concept related to math importance. There were few significant indirect effects. Eighth-grade relevant math instruction had an indirect effect upon 8th grade math importance via 8th grade self-concept. Further, 8th grade self-concept partially mediated the relation between 8th grade relevant instruction and self-concept in the 11th grade. Eleventh-grade relevant instruction had an indirect effect upon 11th grade math importance via 11th grade self-concept. Lastly, 8th grade differential treatment had an indirect effect on self-concept in 11th grade via 9th grade achievement.
Purpose/Research Questions of Future SEM Study that Builds Off Current Study
The purpose of the future study is to examine the measurement invariance of perceptions of differential treatment by teachers across subpopulations of female students (e.g., Asian, Black, Latina, White). That is, an obvious next question is whether the results of the current study replicate when examining these subpopulations separately. However, first we must evaluate whether the measurement of perceived differential treatment by teachers is equivalent across subpopulations. Recognition of academic discrimination may be more nuanced for some subpopulations than for others.
Methods & Analytic Approach to Address Future Study Purpose/Research Questions
Methods
Sample: The goal would be to obtain samples of female adolescent students who identify with various ethnic or racial identities (e.g., Asian, Black, Latina, White). Intersectionality of identity would be taken into account by allowing students to select “all that apply” when denoting their identity. Specific measurement of racial and ethnic identity would align with current APA recommendations and would be piloted prior to collecting data. Given that the measure (described below) is only four questions (eight estimated parameters for a unidimensional model) and the factor pattern coefficients were large (standardized values around .70) in the previous study, the goal would be to obtain samples of 300 to 500 for each subpopulation. Larger samples (near 500) would be helpful in case item distributions were grossly nonnormal or particular response options were not selected, in which case a variety of different estimation methods may need to be employed. This initial study would focus on assessing measurement invariance when the female students are in 8th grade rather than gathering these responses longitudinally across middle and high school as in the current study.
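The parameter count cited in the Sample paragraph can be checked with a short calculation. The sketch assumes a one-factor model fit to the covariance structure only, with the factor variance fixed to 1 for identification, so the free parameters are one loading and one residual variance per item. The function name is hypothetical.

```python
def one_factor_cfa_counts(n_items: int) -> dict:
    """Count data moments, free parameters, and df for a one-factor CFA.

    Assumes covariance structure only and a factor variance fixed to 1,
    so each item contributes one loading and one residual variance.
    """
    unique_moments = n_items * (n_items + 1) // 2  # variances + covariances
    free_parameters = 2 * n_items                  # loadings + residual variances
    return {
        "unique_moments": unique_moments,
        "free_parameters": free_parameters,
        "df": unique_moments - free_parameters,
    }


counts = one_factor_cfa_counts(4)
print(counts)
```

For four items this gives 10 unique variances and covariances, 8 free parameters, and 2 degrees of freedom, so the single-group model is overidentified and its fit can be tested.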
Measures: In the previous study, four items were used to operationalize “Perceived Gendered Differential Treatment by Teachers” and were answered using a response scale of 1 = never to 5 = more than six times.
- At school, how often do you feel like teachers called on you less often than they called on kids of the opposite sex?
- At school, how often do you feel like you got disciplined more harshly by teachers than kids of the opposite sex?
- At school, how often do you feel that teachers thought you were less smart than kids of the opposite sex?
- How often have you felt that teachers/counselors discouraged you from taking certain classes because of your sex?
Given the metric of the response scale, the instructions to students must contain a timeframe. If the measure is given near the end of the school year (e.g., April), the instructions would read: “Please reflect on your interactions with all your teachers over the past year. When answering each question, indicate how often you felt a particular way using the following response scale: 0 = never, 1 = once, 2 = twice, 3 = three times, 4 = four times, 5 = five times, 6 = six or more times.” I changed the response scale to be more intuitive and to incorporate a descriptor for each option. Moreover, “opposite sex” would be changed to “different genders” to recognize that there are more than two genders.
Procedure: Self-identified female students would complete this measure during school hours in the month of April, with the goal that the data collection take place for all students within the same week. Students would be told to reflect on all teachers over the past academic year. Students would be told that their responses are anonymous and that results would be used to provide better supports and opportunities for student learning. Students would be given 20 minutes to complete the measure via computer and indicate their racial or ethnic identity. For a subset of the respondents, the data collection would take place in a separate room in order to gather cognitive processing information via think alouds. That is, the computer would have headphones and a microphone so the student could explain how they are processing the meaning of the item and selecting their response. The think aloud information will be helpful if measurement invariance is not established across subpopulations (i.e., possible explanations for DIF may emerge from the recorded think alouds).
Analytic Approach
Data would be screened and an estimation method chosen based on item response distributions: ML estimation if all response options are chosen and distributions are approximately normal, robust ML if all response options are chosen and distributions are considered nonnormal (|skew| > 3 and |kurtosis| > 8), and robust DWLS estimation if not all response options are chosen (data treated as categorical).
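The screening rule can be sketched as a small decision function. This is an illustration under stated assumptions, not a prescribed procedure: the function names are hypothetical, kurtosis is computed as excess kurtosis, an item is flagged as nonnormal when either cutoff is exceeded (one reading of the rule), and the example responses are made up.

```python
from statistics import fmean


def skew_and_excess_kurtosis(values):
    """Return population skewness and excess kurtosis from central moments."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def choose_estimator(items, response_options, skew_cut=3.0, kurt_cut=8.0):
    """Pick an estimator label from item response distributions.

    items: dict mapping item name -> list of integer responses.
    """
    # If any item has an unused response option, treat the data as categorical.
    for responses in items.values():
        if set(response_options) - set(responses):
            return "robust DWLS"
    # Otherwise check each item against the nonnormality cutoffs.
    for responses in items.values():
        skew, kurt = skew_and_excess_kurtosis(responses)
        if abs(skew) > skew_cut or abs(kurt) > kurt_cut:
            return "robust ML"
    return "ML"


options = range(0, 7)  # 0 = never ... 6 = six or more times
symmetric = [0, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6]  # made-up responses
piled_at_zero = [0] * 200 + [1, 2, 3, 4, 5, 6]  # made-up, heavily skewed
missing_options = [0, 0, 1, 1, 2, 3]            # options 4 to 6 never chosen

print(choose_estimator({"item1": symmetric}, options))
print(choose_estimator({"item1": piled_at_zero}, options))
print(choose_estimator({"item1": missing_options}, options))
```

Because perceived discrimination items often pile up at "never," the second and third cases may be the more realistic ones, which is one reason the larger samples are attractive.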
Assuming all response options are chosen, measurement invariance would be tested across the subpopulations using the following procedures. First, configural invariance would be assessed by fitting a one-factor model to the four item responses separately for each subpopulation. If the configural model does not fit the data well, model misfit would be diagnosed and discussed, potentially with the think aloud data if helpful. If the configural model fits the data well for each group, metric invariance would be tested by constraining the unstandardized factor pattern coefficients to be equal across subpopulations. If the metric model indicates model-data misfit, the non-uniform DIF would be identified and discussed, potentially with the think aloud data if helpful. If the metric model fits the data well, scalar invariance would be tested by constraining the item intercepts to be equal across subpopulations. If the scalar invariance model indicates model-data misfit, uniform DIF would be identified and discussed, potentially with the think aloud data if helpful. If the scalar model fits the data well, then full measurement invariance would be established, indicating the measure functions the same across the subpopulations. At this point, differences in the latent mean of “Perceptions of Differential Treatment by Teachers” could be estimated across groups, along with differences in the latent variance of the factor. The means and variances of the observed composite score of “Perceptions of Differential Treatment by Teachers” could also be computed for each subpopulation. A comparison of the latent effect sizes and observed effect sizes would indicate the impact of measurement error on statements about subgroup mean differences on “Perceptions of Differential Treatment by Teachers”.
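The sequence of invariance tests described above can be summarized as a simple decision flow. The sketch below is only an outline: the function name is hypothetical, and the three inputs stand for the analyst's judgment of model-data fit at each step, since the specific fit criteria are not stated here.

```python
def invariance_decision(configural_fits: bool, metric_fits: bool, scalar_fits: bool):
    """Return (level of invariance supported, recommended next step).

    Each argument is the analyst's judgment that the corresponding
    multi-group model fits adequately; later steps are only consulted
    when every earlier step has been passed.
    """
    if not configural_fits:
        return ("none", "diagnose misfit of the one-factor model, using think aloud data if helpful")
    if not metric_fits:
        return ("configural", "identify non-uniform DIF (unequal factor pattern coefficients)")
    if not scalar_fits:
        return ("metric", "identify uniform DIF (unequal item intercepts)")
    return ("scalar", "compare latent means and variances across subpopulations")


# Example: loadings are equal across groups but intercepts are not.
level, next_step = invariance_decision(True, True, False)
print(level, "->", next_step)
```

The ordering matters because each model adds constraints to the one before it, so a failure at an early step makes the later comparisons uninterpretable.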