Expanding possibilities for generative AI in qualitative analysis: Fostering student feedback literacy through the application of a feedback quality rubric

Contribution Summary

The paper offers evidence about both the promise and limits of using local LLMs as rubric raters, while also extending feedback literacy theory to peer feedback in first-year engineering.


Plain-Language Summary

This study tests whether a local open-source large language model can apply a feedback quality rubric to first-year engineering peer feedback comments. It uses that methodological test to examine what students' comments reveal about feedback literacy and self-regulated learning.

Research Question

How reliably can an open-source local LLM apply a feedback quality rubric to engineering student peer feedback comments, and what do those ratings reveal about first-year students' feedback quality?

Methods

Applied a four-criteria feedback quality rubric to 295 peer feedback comments from a first-year engineering course.
Piloted multiple open-source local models and used qwen2.5-32b to rate comments against the rubric criteria.
Compared model ratings with researcher ratings using Cohen's quadratic weighted kappa and analyzed the resulting scores and comment typology.
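
As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen's quadratic weighted kappa in plain Python. It is not the authors' analysis code. The function name, the 0 to 2 rating scale, and the example ratings are hypothetical and chosen only for demonstration.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Cohen's kappa with quadratic disagreement weights for ordinal ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    k = max_rating - min_rating + 1  # number of ordinal categories
    if k < 2:
        raise ValueError("at least two rating categories are required")
    n = len(rater_a)

    # Observed confusion matrix: rows are rater A, columns are rater B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic weight: larger ordinal distance is penalized more.
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # count expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters used a single identical category; treat as full agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings on an assumed 0-2 scale (not data from the study).
researcher = [0, 1, 2, 1, 0, 2, 1, 1]
model = [0, 1, 1, 1, 0, 2, 2, 1]
kappa = quadratic_weighted_kappa(researcher, model, 0, 2)
print(f"quadratic weighted kappa = {kappa:.3f}")
```

Quadratic weights suit ordinal rubric scores because a two-level disagreement counts four times as much as a one-level disagreement, whereas unweighted kappa treats every mismatch the same.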

Key Findings

The LLM produced coherent ratings but was less reliable than human researchers overall, especially on more subjective criteria.
The model reached strong agreement on the Action criterion and behaved in ways similar to a novice rater applying a new rubric.
Students' feedback comments were generally low to medium quality, with constructive elements such as performance gaps and actionable next steps appearing less often than positive descriptions.

Implications

LLM-supported qualitative rating requires careful rubric design, prompt iteration, pilot testing, and human oversight before substantive claims are made.

Local open-source models can support privacy-sensitive educational research workflows, but model choice and compute requirements remain important constraints.

Engineering instructors should explicitly teach students how to write feedback that includes specific behaviors, performance gaps, and actionable next steps.

Research Artifacts

Instrument: Feedback quality rubric. A four-criteria rubric for assessing the task, behavior, gap, and action content of peer feedback comments.

Protocol: LLM rubric-rating workflow. A local-model workflow for piloting prompts, applying the rubric, and comparing model ratings with researcher ratings.
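
One way to picture the rubric-application step of such a workflow is sketched below. This is an assumption-laden illustration and not the study's protocol: the prompt wording, the 0 to 2 scale, the JSON reply format, and the names `build_prompt` and `parse_ratings` are all hypothetical. Only the four criterion names come from the rubric description above. The call to the local model itself is deliberately left out.

```python
import json

# Criterion names follow the rubric description; the scale is an assumption.
CRITERIA = ("task", "behavior", "gap", "action")
SCALE = (0, 1, 2)


def build_prompt(comment: str) -> str:
    """Assemble a rating prompt for one peer feedback comment (hypothetical wording)."""
    criteria_list = ", ".join(CRITERIA)
    return (
        "Rate the peer feedback comment below on each rubric criterion "
        f"({criteria_list}) using an integer from {SCALE[0]} to {SCALE[-1]}.\n"
        "Reply with a JSON object whose keys are the criterion names.\n\n"
        f"Comment: {comment}"
    )


def parse_ratings(raw: str) -> dict[str, int]:
    """Validate a model reply and return one integer score per criterion."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model reply is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("model reply must be a JSON object")

    ratings = {}
    for criterion in CRITERIA:
        value = data.get(criterion)
        # Reject missing, non-integer, boolean, or out-of-scale values.
        if isinstance(value, bool) or not isinstance(value, int) or value not in SCALE:
            raise ValueError(f"invalid or missing score for {criterion!r}")
        ratings[criterion] = value
    return ratings


# Hypothetical reply text standing in for a local model's output.
example_reply = '{"task": 2, "behavior": 1, "gap": 0, "action": 1}'
print(parse_ratings(example_reply))
```

Validating every reply before it enters the dataset keeps malformed or out-of-scale model output from silently distorting the later comparison with researcher ratings.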

Abstract

This study investigates the reliability and utility of using generative AI to apply a feedback quality rubric to peer feedback comments from first-year engineering students, and explores how LLMs can assist in qualitative analysis.

Related Projects

Using Large Language Models and Generative AI to Scale Qualitative Data Analysis

How can researchers combine qualitative judgment with open-source generative AI to scale thematic analysis without hiding methodological choices?

Project

CAREER: Minds and Machines: Exploring Engineering Faculty Member Mental Models of Generative AI and Instructional Decisions

How do engineering faculty understand generative AI, and how do those mental models shape instructional decisions?

Project
