Contribution Summary
The paper provides early empirical evidence on using generative text models for deductive labeling of open-ended teamwork feedback.
Draft enrichment generated from extracted publication text; pending human review.
Plain-Language Summary
This study tests whether GPT-3.5 can classify open-ended student teamwork feedback using an existing taxonomy. It also examines whether the model can assess the accuracy of its own labels so instructors or researchers can focus human review on questionable cases.
Research Question
How well does GPT-3.5 match human labels when classifying student teamwork feedback into a predetermined taxonomy, and how well do its self-rated accuracy scores correspond to human evaluations?
Methods
- Sampled 200 student teamwork feedback comments and prompted GPT-3.5-turbo to apply labels from an existing teamwork feedback taxonomy.
- Had researchers rate the model's labels as accurate, unclear, or inaccurate.
- Prompted the model to rate the accuracy of its own labels on a ten-point scale and compared those ratings with human judgments.
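The comparison in the last step can be sketched as a small tabulation. The following is a minimal illustration, not the authors' analysis code: the record layout, the function names, and every value in `records` are hypothetical. It assumes each labeled comment carries one human rating (accurate, unclear, or inaccurate) and one model self-rating on the ten-point scale.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records: (human_rating, model_self_rating on a 1-10 scale).
# These values are invented for illustration and are NOT data from the study.
records = [
    ("accurate", 9),
    ("accurate", 8),
    ("unclear", 6),
    ("inaccurate", 4),
    ("accurate", 10),
]


def rating_shares(rows):
    """Return the share of labels that fall in each human rating category."""
    counts = Counter(human for human, _ in rows)
    total = len(rows)
    return {category: count / total for category, count in counts.items()}


def mean_self_rating_by_category(rows):
    """Return the model's mean self-rating within each human rating category."""
    grouped = defaultdict(list)
    for human, self_rating in rows:
        grouped[human].append(self_rating)
    return {category: mean(scores) for category, scores in grouped.items()}


shares = rating_shares(records)
means = mean_self_rating_by_category(records)
```

If the self-ratings carry useful signal, the mean for labels that humans rated accurate should sit above the means for unclear and inaccurate labels.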
Key Findings
- Researchers judged 85% of the model-generated labels as accurate, 8% as unclear, and 7% as inaccurate.
- The model handled many semantically similar comments well, but sometimes defaulted to the first label or missed negative sentiment.
- The self-checking step flagged some of the questionable labels, suggesting a possible triage workflow in which human reviewers focus on low-rated cases.
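One way such a triage workflow might look is sketched below. This is an assumption-laden illustration rather than a procedure from the paper: the cutoff of 7, the dictionary keys, and the example comments and labels are all hypothetical, and a real cutoff would need to be chosen against human-rated data.

```python
def triage(labeled_comments, threshold=7):
    """Split model-labeled comments into a human-review queue and an accepted set.

    `threshold` is a hypothetical cutoff on the ten-point self-rating scale;
    labels the model rates below it are routed to a human reviewer.
    """
    needs_review = [c for c in labeled_comments if c["self_rating"] < threshold]
    accepted = [c for c in labeled_comments if c["self_rating"] >= threshold]
    return needs_review, accepted


# Invented example inputs; the comments and label names are placeholders.
batch = [
    {"comment": "Everyone showed up prepared.", "label": "Label A", "self_rating": 9},
    {"comment": "It was fine, I guess.", "label": "Label B", "self_rating": 5},
    {"comment": "One member never replied.", "label": "Label A", "self_rating": 7},
]

review_queue, accepted = triage(batch)
```

Because the self-check flagged only some questionable labels, a spot check of the accepted set would still be prudent.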
Implications
Generative models can help classify open-ended student comments without forcing students into closed-response formats.
Human judgment remains necessary for ambiguous, negative, or context-dependent feedback comments.
Future systems should test label ordering, sentiment handling, privacy constraints, and local model alternatives.
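A label-ordering test could start from something like the sketch below, which shuffles the order of the taxonomy labels in each prompt so that a tendency to pick the first listed label becomes visible across runs. The label names, prompt wording, and function name are placeholders, not the taxonomy or prompt used in the study, and the call to the model itself is left out.

```python
import random

# Placeholder label names; the actual taxonomy is not reproduced here.
TAXONOMY = ["Label A", "Label B", "Label C", "Label D"]


def build_prompt(comment, labels, seed):
    """Build a classification prompt with the labels in a seeded random order.

    Returns the prompt text and the label order used, so that the position of
    the label the model picks can be recorded and analyzed afterwards.
    """
    rng = random.Random(seed)
    shuffled = rng.sample(labels, k=len(labels))  # shuffled copy; input untouched
    options = "\n".join(f"- {label}" for label in shuffled)
    prompt = (
        "Assign exactly one of the following labels to the comment.\n"
        f"{options}\n"
        f"Comment: {comment}"
    )
    return prompt, shuffled


prompt_text, label_order = build_prompt("Our team split the work evenly.", TAXONOMY, seed=1)
```

Running the same comments under several seeds and comparing how often the first-listed label is chosen would give a rough indication of position bias.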
Research Artifacts
Abstract
Publication on Exploring the Efficacy of ChatGPT in Analyzing Student Teamwork Feedback with an Existing Taxonomy
Related Projects
Using Large Language Models and Generative AI to Scale Qualitative Data Analysis
How can researchers combine qualitative judgment with open-source generative AI to scale thematic analysis without hiding methodological choices?
CAREER: Minds and Machines: Exploring Engineering Faculty Member Mental Models of Generative AI and Instructional Decisions
How do engineering faculty understand generative AI, and how do those mental models shape instructional decisions?
EAGER: Natural Language Processing for Teaching and Research in Engineering Education (NLPTREE)
How can NLP methods help engineering education researchers and instructors analyze text-rich learning data responsibly and at scale?