ECR Projects

Explore past and current fundamental STEM education research projects across the three research areas that NSF's EDU Core Research (ECR) program funds, as well as across ECR funding types. Other search filters draw from both NSF's data and the ECR Hub's hand coding of award abstracts.


STEM Learning and Learning Environments; Broadening Participation in STEM

Sub-group Fair Coding Taken to Scale for Science, Technology, Engineering, and Mathematics Learning

Effective Years: 2021-2026

This project will serve the national need for well-educated scientists, mathematicians, engineers, and technicians by creating and validating a process designed to code educational data effectively and fairly. Over its five-year duration, the project will develop and test an approach to coding data on learning in science, technology, engineering, and mathematics (STEM). The approach takes differences between groups into account without requiring researchers to code data by hand. It will therefore save time while also helping to answer questions about group differences in STEM learning. The approach will be made publicly available to researchers who code learning data and need to account for differences across groups.

As a result of these efforts, algorithms will be developed for learning science research that will: (i) produce fair classifiers and fair sets of codes, and identify conceptual shift among subgroups, (ii) identify intersectional subgroups that may be modeled unfairly, without requiring the interactions of group characteristics to be specified in advance, (iii) control for elevated Type I error rates, and (iv) provide an interface that can be used by researchers who care about fair coding. The technique being tested will enable this with less data than is typically needed outside of learning science research.

Though fairness in coding is a well-recognized challenge, particularly in STEM education research, it is handled almost exclusively by researchers examining their data by hand to see whether the coding appears fair. This is time-consuming and is usually done only when there are strong a priori reasons to be concerned about fairness. Data science offers relatively sophisticated approaches to fairness in coding, but they carry significant limitations (e.g., the need for very large amounts of human-coded data) that are currently insurmountable in learning science research. A validated method that codes data while systematically checking for subgroup fairness and identifying the presence of conceptual shift will therefore advance the field of data science in STEM learning research.

The protocol and algorithms associated with this process will be disseminated to other researchers through the R Project for Statistical Computing, journal articles, and presentations. This project is funded by the EHR Core Research (ECR) program, which supports work that advances fundamental research on STEM learning and learning environments, broadening participation in STEM, and STEM workforce development.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.