
3 Review of “Development and initial testing of the critical reflection on sexism scale (CROSS) among young adults in the U.S.”

Chapter Author & Date of Article Review

Sara J. Finney. April 2025

Article Reference & Link

Gee, M. N., & Johnson, S. K. (2025). Development and initial testing of the critical reflection on sexism scale (CROSS) among young adults in the U.S. Applied Developmental Science. Advance online publication. https://doi.org/10.1080/10888691.2025.245745

Study’s Purpose, Methods, Analytic Approach, Results, & Implications

The purpose of this two-study project was to create and evaluate the factor structure of the Critical Reflection on Sexism Scale (CROSS), in particular potential differential item functioning (DIF) across gender and racial groups. The CROSS was designed to measure a person’s analysis of systemic gender-based oppression across four factors: Sexism Awareness, Structural Attributions for Gender Inequality, Male Privilege Awareness, and Rejection of Individual Attributions for Gender Inequality. Regarding methodology, in Study 1, an exploratory factor analysis of item responses from White young adults (N = 598; Mage = 22 years) supported the four-factor model. In Study 2, the four-factor structure was supported via confirmatory factor analyses of responses from an independent sample of Black and White young adults (N = 1,486; Mage = 21 years). Using this second sample, results indicated full measurement invariance of the items (configural, metric, and scalar) across cisgender women and men and across Black and White young adults. Implications of these findings include the use of the CROSS to compare critical reflection on sexism across gender and racial groups. That is, after controlling for group differences in critical reflection, women’s responses to the items did not differ from men’s (likewise for Black and White individuals). Demonstrating invariance across these samples was an important step toward determining whether this sexism-specific critical reflection measure is psychometrically equivalent across groups differentially subjected to sexism and intersecting systems of oppression.

Explanation of How Article Addresses Social Justice or Equity

Critical consciousness (CC) is a person’s awareness of oppressive systemic forces, their self-efficacy for challenging these forces, and their action toward doing so (Freire, 2000). Generally, CC has been defined in terms of three dimensions: critical reflection (social analysis of oppressive systems and recognition of the systemic nature of inequalities); critical self-efficacy or motivation (perceived capacity to effect change or belief in the importance of changing systems); and critical action (engagement in behaviors that impact structural inequities). CC has been considered an “antidote to oppression” because it may be a source of empowerment and prompt mobilization against structural constraints that limit a person’s self-determination (Watts et al., 1999). In turn, CC has been used to study people’s beliefs about social inequality and their actions to promote social justice. Aspects of CC have been positively related to academic performance (e.g., Seider et al., 2020) and career aspirations (e.g., Diemer & Hsieh, 2008) for marginalized youth. CC has also been related to youth participation in community activism, which has the potential to change policy and reorganize institutional structures (Christens & Dolan, 2011). However, measures of CC are needed to better study and understand these beliefs and their influence on actions. More specifically, measures are needed that are domain-specific (i.e., focused on specific systems of oppression, such as racism or sexism) and that address specific aspects of CC (e.g., critical reflection). To address these needs, this two-study project described the development and psychometric evaluation of a measure of critical reflection (one aspect of CC) that is specific to sexism and evaluated its functioning across gender (cisgender men and women) and racial (Black and White young adults) groups. Critical reflection is conceptualized as the explanations or attributions individuals make for social inequality (e.g., Diemer et al., 2022).
Individuals who have developed critical reflection make structural attributions for social inequality (not personal attributions). When developing a domain-specific measure of critical reflection, one must determine whether the measure is psychometrically equivalent across social groups differentially subjected to sexism in two ways: 1) between groups with varying power within the system of oppression addressed by the measure (e.g., men’s vs. women’s critical reflection on sexism); and 2) between groups with varying power within other systems of oppression (e.g., Black vs. White individuals’ critical reflection on sexism), because individuals may understand and act against one system of oppression differently based on their position within another system of oppression.

Alignment between the Purpose of the Study and the SEM Analytic Approach

Previous critical reflection measures have three issues. First, critical reflection has typically been assessed generally/globally, and the few measures that assessed it using a domain-specific approach required complicated models. That is, domain-specific critical reflection measures of racism, classism, and sexism were best represented by bifactor models that partitioned item variance into both a general critical reflection factor and a domain-specific critical reflection factor. Thus, to model domain-specific critical reflection scores, full SEMs must be employed. Second, critical reflection has often been represented as unidimensional although theoretically it is considered multidimensional. Thus, CFA could be used to evaluate the factor structure of responses to domain-specific critical reflection on sexism items. Third, systems of oppression create differences between groups to justify subordination, which leads to common group experiences (e.g., gender discrimination). Due to these experiences, a measure of perceptions of oppression may not function equivalently across different groups. Measurement invariance testing can examine the degree to which the psychometric properties of items are the same across groups. Without measurement invariance, one does not know if different average scores across groups indicate real differences in levels of critical reflection or differences in how the measure operates for the different groups (Kline, 2011).

Two Positive Aspects of the Analytic Approach

Their justification for the groups compared via the invariance testing was coherent and complete. Often when assessing DIF, typical socially constructed groups are compared (e.g., gender groups, ethnic groups) with no clear explanation regarding why the items may function differently across these groups. The authors explained that different systems of oppression (gender vs. race) impact individuals in different ways. Thus, when asked about explanations or attributions for social inequality (the items under study), individuals experiencing society in different ways may answer the items in different ways even if their levels of critical reflection are equal. That is, the items may operate very differently for groups who experience more or less inequity. One can imagine some items being confusing for particular groups who do not experience sexism or certain aspects of life (e.g., White men responding to “By nature, women are happiest when they are making a home and caring for children”). In short, there was a clearly stated a priori justification for examining measurement invariance across these specific groups.

A rigorous process to evaluate measurement invariance was employed. First, the data were screened in order to choose the best estimation method. Second, the four-factor configural model was evaluated and, given the high factor correlations, a three-factor model was also tested but rejected due to model-data misfit. Third, regarding fit assessment of the configural model, the authors examined standardized covariance residuals in addition to global fit indices and explained their decision to champion the four-factor model based on these residuals. Fourth, they used three indices to evaluate the difference in fit between nested invariance models (difference in CFI, difference in RMSEA, and difference in chi-square) and were clear about how these comparisons and their results aligned with the tests of measurement invariance (metric and scalar).
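To make the nested-model comparison logic concrete, the decision rule can be sketched as a small function. The ΔCFI ≤ .01 and ΔRMSEA ≤ .015 cutoffs below are common conventions in the invariance-testing literature, not necessarily the exact values the authors adopted, and the fit values in the example are hypothetical:

```python
def invariance_step_holds(cfi_constrained, cfi_baseline,
                          rmsea_constrained, rmsea_baseline,
                          delta_cfi_cutoff=0.01, delta_rmsea_cutoff=0.015):
    """Return True if adding equality constraints (e.g., equal loadings for
    metric invariance, equal intercepts for scalar invariance) does not
    meaningfully worsen fit relative to the less constrained model."""
    delta_cfi = cfi_baseline - cfi_constrained      # positive = fit worsened
    delta_rmsea = rmsea_constrained - rmsea_baseline
    return delta_cfi <= delta_cfi_cutoff and delta_rmsea <= delta_rmsea_cutoff

# Hypothetical configural vs. metric comparison: fit barely changes,
# so the equality constraints on the loadings are retained.
print(invariance_step_holds(0.955, 0.960, 0.046, 0.044))  # True
```

In practice each step (metric, then scalar) repeats this comparison against the previous, less constrained model, alongside the chi-square difference test.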

One Less Than Ideal/Negative/Missing Aspect of the Analytic Approach

Even though measurement invariance was established, the manuscript would have been stronger if a conceptual explanation of each level of invariance had been provided. That is, what does it mean conceptually for the construct of critical reflection if configural invariance is supported vs. not supported? What does it mean conceptually if the factor loadings are of different magnitudes across groups and thus metric invariance is not established? What does it mean conceptually if item intercepts differ across groups and thus scalar invariance is not supported? These explanations would be most helpful if actual items on the scale were used as examples (e.g., what does it mean for the item “By nature, women are happiest when they are making a home and caring for children” to have different factor loadings across groups? What would it mean if this item’s intercepts differed across groups?). Relatedly, it would have been helpful if the invariance testing process were linked to the concepts of non-uniform DIF and uniform DIF. Finally, I was surprised that the latent means, variances, and correlations were not reported for each group. Given that measurement invariance was established, readers could have been presented with any differences (or lack thereof) between the groups with respect to the average levels of, and the variability about these averages for, each factor. Moreover, it would have been interesting to see if the factors were equally correlated for each group or if some groups better distinguished between the factors (i.e., the factors were more distinct for some groups). Including this information would have conveyed that establishing invariance of the factor structure and item parameters (no differences in the number of factors, factor loadings, and item intercepts) does not imply that there are no differences across groups in latent means, variances, and factor correlations.

What was Found in Current Study

The authors found that the newly constructed 12-item measure of critical reflection on sexism functioned well across gender (cisgender men vs. women) and racial (Black vs. White) groups. The four factors were moderately to strongly related to each other; notably, the Structural Attributions for Gender Inequality and Sexism Awareness factors were highly correlated (.84). Configural, metric, and scalar invariance were supported, suggesting the measure could be used in future research involving college-educated young adults from these groups without concern about whether the measure functions differentially across groups.

Purpose/Research Questions of Future SEM Study that Builds Off Current Study

The purpose of the future study would be to gather needed external validity evidence for the four factors. The current study indicated fairly strong correlations among some of the factors, yet the authors indicated the factors were substantively distinct. In particular, the Sexism Awareness factor had large correlations with both the Structural Attributions for Gender Inequality (.84) and Male Privilege Awareness (.74) factors. By correlating the four factors with external variables and evaluating whether the four factors relate differentially to these external variables, we would gather support for their distinctiveness. Moreover, if the external variables are selected intentionally based on prior research, the expected relations between each of the four factors and the external variables can be hypothesized. These a priori expected relations are necessary for strong convergent and divergent validity evidence that informs the naming of the factors (Benson, 1998). Although future studies need to evaluate the functioning of this measure among older adults and individuals without a college degree, the purpose of this study would be to extend the validity evidence for the same population examined in the current study (young adults with college experience).

Methods & Analytic Approach to Address Future Study Purpose/Research Questions

Methods

Sample: Given the measure functioned the same across gender groups and across Black and White young adults, I would gather a large, diverse sample with these characteristics. Specifically, in order to estimate full SEMs that specify the measurement model for the 12 items (the four-factor model), I would gather data from at least 800 individuals (200 Black men, 200 Black women, 200 White women, and 200 White men). Because this sample will not be disaggregated when estimating the full SEMs and the factor loadings were quite strong (over .70), 800 individuals is adequate. The models include 12 factor loadings, 12 error variances, 6 disturbance correlations (if the four factors are predicted by external variables) or 6 factor correlations (if the four factors predict external variables), and approximately 8 paths between two external variables and the four factors: approximately 40 parameters, or approximately 20 observations per parameter.
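The parameter bookkeeping behind the N = 800 target can be sketched as follows. The counts mirror those stated above, and the 20-observations-per-parameter heuristic is a common rule of thumb rather than a formal power analysis:

```python
def params_and_ratio(n, loadings=12, error_vars=12,
                     disturbance_covs=6, structural_paths=8):
    """Count the free parameters in the planned full SEM and return the
    resulting observations-per-parameter ratio for a sample of size n."""
    total = loadings + error_vars + disturbance_covs + structural_paths
    return total, n / total

total, ratio = params_and_ratio(800)
print(total)   # 38 free parameters (approximately 40)
print(ratio)   # ~21 observations per parameter (approximately 20)
```

A formal Monte Carlo power analysis would be preferable for a definitive sample-size claim, but this ratio check supports the rough adequacy argument in the text.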

Measures: In the previous study, 12 items operationalized the four factors and were answered using a response scale of 1 = strongly disagree to 10 = strongly agree. I would administer the same items, in the same order, with the same directions as the previous study. Two external variables would be gathered to provide convergent and divergent validity evidence. First, known-groups validity evidence would be gathered by predicting the four factors from a dichotomous variable indicating whether the individual has experienced gender discrimination (directly or vicariously). Those experiencing gender discrimination would be expected to score higher on the Sexism Awareness factor than those who have not, with smaller differences between the two experience groups on the other factors. If this pattern of results emerged, it would serve as validity evidence that the Sexism Awareness subscale is distinct from the other three subscales. Second, further convergent and divergent validity evidence could be gathered by examining the associations of the four CROSS factors with Case’s (2007) Male Privilege Awareness scale. The CROSS Male Privilege Awareness factor would be expected to correlate more strongly with Case’s measure than would the other three CROSS factors.

Procedure: Participants would be recruited nationally and locally using university participant pools in psychology and education departments at a variety of institutions to ensure the desired diversity in the sample. Adults (i.e., 18 years and older) who identify as Black or White and as male or female would be eligible for the study. Participants would be sent a link taking them to the survey webpage. Participants who provided consent and completed the measures would be compensated for their time ($10 Amazon gift card). All items would be forced response (i.e., no missing data), all item responses would be timed to evaluate rapid responding, and all external variable measures would be completed after the 12 items of the measure under study.

Analytic Approach

Responses to the 12 items would be screened and an estimation method chosen based on the item response distributions: ML estimation if at least 6 of the 10 response options were chosen for each item and distributions were approximately normal; robust ML if at least 6 of the 10 response options were chosen but distributions were considered nonnormal (|skew| > 3 and |kurtosis| > 8); and robust DWLS estimation if fewer than 6 response options were chosen (data treated as categorical).
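This decision rule can be sketched as a small function; the cutoffs follow the screening rules stated above (6 of 10 response options used; nonnormality flagged when both |skew| > 3 and |kurtosis| > 8), and the example values are hypothetical:

```python
def choose_estimator(n_options_used, skew, kurtosis):
    """Select an estimation method from an item's screening statistics,
    per the decision rule described in the text."""
    if n_options_used < 6:
        return "robust DWLS"    # too few options used: treat as categorical
    if abs(skew) > 3 and abs(kurtosis) > 8:
        return "robust ML"      # continuous but markedly nonnormal
    return "ML"                 # continuous and approximately normal

print(choose_estimator(8, 0.4, 1.1))   # ML
print(choose_estimator(8, 3.5, 9.2))   # robust ML
print(choose_estimator(4, 0.2, 0.5))   # robust DWLS
```

In practice the most conservative estimator implied by any item's screening results would be applied to the full 12-item model.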

First, a four-factor CFA would be estimated that aligns with the championed model from the initial study. Global fit would be assessed using the CFI, SRMR, and RMSEA (in addition to reporting the χ² value, degrees of freedom, and significance test). Local misfit would be assessed using correlation residuals. If the model did not fit the data, the model-data misfit would be investigated and explained.

Assuming the four-factor model fits the data adequately, the four latent factors would be predicted by the dichotomous discrimination experience variable (0/1). The disturbances of the four factors would be allowed to correlate because the discrimination experience variable is not expected to explain the relations among the four factors. The four paths from the discrimination experience variable to each CR factor would be estimated and their standardized coefficients compared if model-data fit were adequate. Any misfit associated with this model would be attributed to the paths from the discrimination experience variable to each item being fixed to zero (i.e., these fixed parameters reflect the assumption of no uniform DIF). If there were model-data misfit, correlation residuals would be used to evaluate which items were influenced by the discrimination experience variable after controlling for the CR factor (i.e., which items exhibited uniform DIF). If the model fit adequately (no uniform DIF), the standardized direct paths from the discrimination experience variable to each CR factor could be compared to evaluate whether the Sexism Awareness factor had the strongest relation, as expected.
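The correlation-residual check can be illustrated with a minimal sketch. The |.10| flagging cutoff is a common heuristic, and the residual values, variable names, and item labels below are illustrative assumptions, not values from the article:

```python
def flag_residuals(observed, implied, cutoff=0.10):
    """Return variable pairs whose correlation residual (observed minus
    model-implied correlation) exceeds the cutoff in absolute value."""
    flags = []
    for pair in observed:
        residual = observed[pair] - implied[pair]
        if abs(residual) > cutoff:
            flags.append((pair, round(residual, 3)))
    return flags

# Hypothetical residuals for paths fixed to zero from the discrimination
# experience variable to two items; a large residual suggests uniform DIF.
observed = {("discrim", "item3"): 0.34, ("discrim", "item7"): 0.08}
implied  = {("discrim", "item3"): 0.18, ("discrim", "item7"): 0.06}
print(flag_residuals(observed, implied))  # [(('discrim', 'item3'), 0.16)]
```

A flagged item would then be examined by freeing its direct path from the discrimination experience variable and evaluating the change in fit.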

A second model could be estimated in which the four CR factors are correlated (as in a typical CFA model) but directly predict scores on Case’s (2007) Male Privilege Awareness scale. Case’s scale would be modeled as a single-indicator latent variable given that its factor structure need not be assessed; instead, the scale’s reliability can be used to fix the indicator’s error variance. Given adequate model-data fit, the standardized direct paths from each CR factor to Case’s Male Privilege Awareness variable would be compared. If the CR factors are distinct, one would expect the CROSS Male Privilege Awareness factor to have the strongest direct path to Case’s Male Privilege Awareness scores.
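The single-indicator specification can be illustrated with the standard formula that fixes the indicator's error variance to (1 − reliability) × observed variance, so the latent variable carries only the reliable portion of the scale score. The reliability and variance values below are hypothetical:

```python
def fixed_error_variance(reliability, observed_variance):
    """Error variance to fix for a single-indicator latent variable:
    the unreliable portion of the observed score variance."""
    return (1.0 - reliability) * observed_variance

# e.g., hypothetical coefficient alpha = .88, observed scale variance = 0.52
print(fixed_error_variance(0.88, 0.52))  # 0.0624
```

With the error variance fixed to this value (and the loading fixed to 1), the structural paths from the CR factors to Case's scale are disattenuated for measurement error in the external variable.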