Exploring the Efficacy of ChatGPT in Analyzing Student Teamwork Feedback with an Existing Taxonomy

Contribution Summary

The paper provides early empirical evidence on using generative text models for deductive labeling of open-ended teamwork feedback.

Draft enrichment generated from extracted publication text; pending human review.

Plain-Language Summary

This study tests whether GPT-3.5 can classify open-ended student teamwork feedback using an existing taxonomy. It also examines whether the model can assess the accuracy of its own labels so instructors or researchers can focus human review on questionable cases.

Research Question

How well does GPT-3.5 match human labels when classifying student teamwork feedback into a predetermined taxonomy, and how well do its self-rated accuracy scores correspond to human evaluations?

Methods

Sampled 200 student teamwork feedback comments and prompted GPT-3.5-turbo to apply labels from an existing teamwork feedback taxonomy.
Rated each model-generated label as accurate, unclear, or inaccurate (human researcher judgment).
Prompted the model to rate the accuracy of its own labels on a ten-point scale and compared those ratings with human judgments.
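
The labeling and self-check steps above can be sketched in Python as follows. This is a minimal sketch, not the study's actual prompts: the label names in `EXAMPLE_LABELS`, the prompt wording, and all function names are hypothetical, and the 1 to 10 range is an assumption about how the ten-point scale was presented. The sketch only builds prompt strings and parses a reply; sending the prompts to a model is deliberately left out.

```python
import re
from typing import Optional, Sequence

# Hypothetical placeholder labels; the study's taxonomy is not reproduced here.
EXAMPLE_LABELS = ["communication", "contribution", "reliability"]


def build_labeling_prompt(comment: str, labels: Sequence[str]) -> str:
    """Build a prompt asking the model to pick labels from a fixed list."""
    label_lines = "\n".join(f"- {label}" for label in labels)
    return (
        "Classify the student teamwork feedback comment below.\n"
        "Use only labels from this list:\n"
        f"{label_lines}\n\n"
        f"Comment: {comment}\n"
        "Labels:"
    )


def build_self_check_prompt(comment: str, label: str) -> str:
    """Build a prompt asking the model to rate one of its own labels."""
    return (
        f"Comment: {comment}\n"
        f"Assigned label: {label}\n"
        "On a scale from 1 to 10, how accurate is this label? "
        "Reply with a single number."
    )


def parse_self_rating(reply: str) -> Optional[int]:
    """Return the first whole number from 1 to 10 in the reply, or None."""
    match = re.search(r"\b(10|[1-9])\b", reply)
    return int(match.group(1)) if match else None
```

Keeping prompt construction separate from the model call makes it easier to log the exact prompt text alongside each label, which matters when human raters later judge the output.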

Key Findings

Researchers judged 85% of the model-generated labels as accurate, 8% as unclear, and 7% as inaccurate.
The model handled many semantically similar comments well, but sometimes defaulted to the first label or missed negative sentiment.
The self-checking step tended to flag some questionable labels, suggesting a possible triage workflow for human review.
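
One way such a triage workflow might look is sketched below. The `LabeledComment` structure, the function names, and the default threshold of 7 are all assumptions made for illustration; the source does not state a cutoff, and any real threshold would need to be tuned against human judgments.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LabeledComment:
    comment: str
    label: str
    self_rating: Optional[int]  # model's 1-10 self-rating; None if unparsed


def needs_human_review(item: LabeledComment, threshold: int = 7) -> bool:
    """Flag items with a missing rating or a rating below the threshold."""
    return item.self_rating is None or item.self_rating < threshold


def triage(
    items: List[LabeledComment], threshold: int = 7
) -> Tuple[List[LabeledComment], List[LabeledComment]]:
    """Split items into (flagged for human review, provisionally accepted)."""
    flagged = [i for i in items if needs_human_review(i, threshold)]
    accepted = [i for i in items if not needs_human_review(i, threshold)]
    return flagged, accepted
```

Treating a missing or unparseable rating as a reason for review is a conservative choice: it routes uncertain cases to a person rather than silently accepting them.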

Implications

Generative models can help classify open-ended student comments without forcing students into closed-response formats.

Human judgment remains necessary for ambiguous, negative, or context-dependent feedback comments.

Future systems should test label ordering, sentiment handling, privacy constraints, and local model alternatives.
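
A label-ordering test of the kind suggested above could be set up roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, and it assumes each comment is shown to the model with its own shuffled label list and receives a single predicted label.

```python
import random
from typing import List, Sequence


def shuffled_orderings(
    labels: Sequence[str], n: int, seed: int = 0
) -> List[List[str]]:
    """Return n independently shuffled copies of the label list."""
    rng = random.Random(seed)  # seeded so the orderings can be reproduced
    orderings = []
    for _ in range(n):
        order = list(labels)
        rng.shuffle(order)
        orderings.append(order)
    return orderings


def first_label_rate(
    predictions: Sequence[str], orderings: Sequence[Sequence[str]]
) -> float:
    """Fraction of predictions equal to the first label in the list shown."""
    if len(predictions) != len(orderings):
        raise ValueError("predictions and orderings must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        1 for pred, order in zip(predictions, orderings) if pred == order[0]
    )
    return hits / len(predictions)
```

A first-label rate well above the rate at which human raters choose that same label would be one signal that position in the list, rather than the comment's content, is driving the prediction.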

Research Artifacts

Protocol: Teamwork feedback labeling prompt. Prompting workflow for applying an existing taxonomy to open-ended student teamwork comments.

Protocol: Model self-check prompt. Prompting workflow for asking the model to rate the accuracy of its own generated labels.

Abstract

Publication on Exploring the Efficacy of ChatGPT in Analyzing Student Teamwork Feedback with an Existing Taxonomy

Related Projects

Using Large Language Models and Generative AI to Scale Qualitative Data Analysis

How can researchers combine qualitative judgment with open-source generative AI to scale thematic analysis without hiding methodological choices?

CAREER: Minds and Machines: Exploring Engineering Faculty Member Mental Models of Generative AI and Instructional Decisions

How do engineering faculty understand generative AI, and how do those mental models shape instructional decisions?

EAGER: Natural Language Processing for Teaching and Research in Engineering Education (NLPTREE)

How can NLP methods help engineering education researchers and instructors analyze text-rich learning data responsibly and at scale?
