Exploring the Validity of Three-Dimensional Assessments for Understanding Student Learning of Science
Effective Years: 2022-2026
The Next Generation Science Standards (NGSS Lead States, 2013) organize goals for student learning along three dimensions: (1) science and engineering practices, such as developing and using models; (2) crosscutting concepts, such as cause and effect; and (3) disciplinary core ideas, which include foundational ideas from the life, physical, and earth sciences as well as engineering and technology. Students in 44 states are being, or will soon be, held accountable to science standards based on the NGSS. High-quality assessments along these three dimensions (termed "3-D assessments") are needed to help states, teachers, and researchers measure students’ progress and the impact of educational interventions. While there is emerging agreement about the features of high-quality 3-D assessments, developers have struggled to create assessments for the NGSS that meet the criteria established in the standards. In the process, they have developed assessments that students find more difficult to complete, and score lower on, than traditional ones. In addition, there is little evidence to determine whether current assessments measure what they purport to measure. This project addresses this issue by conducting studies to explore what these assessments are actually measuring. The project will extend prior work on assessment research and development and promises to have a significant impact on how the field of science education thinks about 3-D assessments. The findings will provide information about how the assessments function for English language learners, which could be used to increase the equity of 3-D assessments. The findings could also help assessment developers build instruments that better measure students’ ability to demonstrate knowledge along the three dimensions articulated in the NGSS.
Additionally, if the assessments used in this project are found to be valid measures of students’ 3-D science understanding, the project’s findings could be used to investigate the impact of 3-D science learning at the middle school level.
To advance the NGSS, it is critical to advance the assessment of learning along these three dimensions. Thus far, 3-D assessment development has largely focused on developing clusters of items built around a focal phenomenon, known as phenomena-based item clusters (PBICs). The overarching research question this project seeks to answer is: to what extent are PBICs valid measures of students’ science understanding along the three NGSS dimensions? This question will be answered by developing an evidence-based validity argument drawing on the varieties of evidence outlined in the updated Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014). First, the project team will assemble a pool of extant PBICs and investigate the extent to which they meet the quality criteria established for 3-D assessments (evidence based on test content). The results of this investigation will inform revisions to the assessments before they are used in a series of studies designed to gather validity evidence. Next, think-aloud interviews will be conducted with middle school students to study how well PBICs tap into the intended cognitive processes related to the NGSS dimensions (evidence based on response processes). Concurrently, a larger group of middle school students from across the U.S. who will be using the OpenSciEd materials will be recruited to participate in a pre-test/post-test study. Rasch modeling of the pre- and post-test data will be used to determine the dimensionality of the items that make up the PBICs (evidence based on internal structure), investigate the correlation between performance on PBICs and disciplinary core idea-focused, multiple-choice assessments (evidence based on relations to other variables), and explore whether the PBICs function differently for English learners and native English speakers (evidence based on internal structure).
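The Rasch analyses described above can be sketched formally. The following is a standard formulation, not taken from the project's instruments; it assumes dichotomously scored items, and the notation (student ability θ, item difficulty b, dimension index d) is illustrative:

```latex
% Unidimensional Rasch model: probability that student j answers item i
% correctly, given ability \theta_j and item difficulty b_i.
P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}

% Multidimensional extension used to probe dimensionality: each item i is
% assigned to one NGSS dimension d(i) (practice, crosscutting concept, or
% core idea), and student j has an ability vector (\theta_{j1}, \theta_{j2}, \theta_{j3}).
P(X_{ij} = 1 \mid \boldsymbol{\theta}_j, b_i) = \frac{\exp(\theta_{j,d(i)} - b_i)}{1 + \exp(\theta_{j,d(i)} - b_i)}
```

Comparing the fit of the unidimensional and multidimensional models (e.g., via information criteria) speaks to the internal-structure question, and differential item functioning for English learners versus native English speakers can be examined by allowing the difficulty parameters to vary across the two groups.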
Hierarchical linear models will be used to estimate what proportion of the variance in PBIC scores is explained by measures of reading or writing ability (evidence based on relations to other variables). Finally, measures of instructional sensitivity for both the PBICs and the disciplinary core idea-focused assessments will be calculated to investigate the extent to which these assessments are sensitive to science instruction along the three NGSS dimensions (evidence based on consequences of testing).
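The variance-partitioning analysis can be expressed as a two-level model. This is a minimal sketch under assumptions not stated in the abstract (students nested in classrooms, a single student-level reading measure; variable names are illustrative):

```latex
% Level 1 (student i in classroom j): PBIC score as a function of reading ability.
Y_{ij} = \beta_{0j} + \beta_{1}\,\mathrm{Reading}_{ij} + r_{ij}, \qquad r_{ij} \sim N(0, \sigma^2)

% Level 2 (classroom): random intercept.
\beta_{0j} = \gamma_{00} + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00})

% Proportion of student-level variance explained by reading ability,
% comparing the intercept-only (null) model to the model above:
R^2_{\text{level 1}} = \frac{\sigma^2_{\text{null}} - \sigma^2_{\text{reading}}}{\sigma^2_{\text{null}}}
```

An analogous model with a writing measure at level 1 would address the writing-ability question; in both cases, a large reduction in residual variance would suggest that PBIC performance depends substantially on literacy skills rather than 3-D science understanding alone.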
This project is supported by NSF's EHR Core Research (ECR) program. The ECR program emphasizes fundamental STEM education research that generates foundational knowledge in the field. Investments are made in critical areas that are essential, broad, and enduring: STEM learning and STEM learning environments, broadening participation in STEM, and STEM workforce development. The program supports the accumulation of robust evidence to inform efforts to understand, build theory to explain, and suggest interventions and innovations to address persistent challenges in education.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.