Research Project

Using Large Language Models and Generative AI to Scale Qualitative Data Analysis

Leveraging open-source large language models and generative AI to build workflows for large-scale qualitative data analysis

Virginia Tech Academy of Data Science Discovery Fund | $10,000 | 2024-2025 | Active Study

Research Question

How can researchers combine qualitative judgment with open-source generative AI to scale thematic analysis without hiding methodological choices?

Approach

  • Design reproducible workflows for inductive qualitative codebook development
  • Compare model-assisted coding outputs with researcher interpretation and validation practices
  • Document where AI systems can support analysis and where human qualitative judgment remains essential
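One common way to compare model-assisted codes with a researcher's codes on the same excerpts is a chance-corrected agreement statistic such as Cohen's kappa. The sketch below is illustrative only and is not taken from the project's publications. The function name `cohens_kappa` and the example labels are assumptions made for this example, and the project may rely on different validation practices.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders on the same items.

    labels_a and labels_b are equal-length sequences of code labels,
    e.g. one from a model and one from a researcher (hypothetical setup).
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both coders must label the same non-empty set of items")

    n = len(labels_a)
    # Proportion of items on which the two coders chose the same code.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each coder's marginal code frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    all_codes = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in all_codes) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical code for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four excerpts.
model_codes = ["workload", "workload", "belonging", "belonging"]
researcher_codes = ["workload", "workload", "belonging", "workload"]
kappa = cohens_kappa(model_codes, researcher_codes)
```

A low kappa does not by itself show that the model is wrong. It marks excerpts where the researcher should look closely at both interpretations before settling the codebook.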

Evidence and Outputs

Publication | Published | Humanities and Social Sciences Communications | 2026

Thematic analysis with open-source generative AI and machine learning

Featured paper introducing an open-source generative AI and machine learning method for inductive qualitative codebook development.

Publication | Published | Journal of Engineering Education | 2025

Using generative AI for large-scale qualitative analysis of social media posts

Study using generative AI to analyze more than 10,000 Reddit posts about why people leave computer science.

Research Artifacts

Workflow | In progress

GATOS qualitative analysis workflow

Open-source workflow pattern for moving from raw qualitative text to candidate themes, researcher validation, and documented codebook decisions.
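As a rough illustration of the "candidate themes, researcher validation, documented decisions" pattern, the sketch below records a rationale for every accept or reject decision. It is not the GATOS implementation. The class names `CandidateTheme`, `CodebookDecision`, and `Codebook`, the `review` method, and the sample themes are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateTheme:
    """A theme proposed from raw text, e.g. by a model (hypothetical structure)."""
    label: str
    example_excerpts: list[str]


@dataclass
class CodebookDecision:
    """A researcher's documented decision about one candidate theme."""
    theme_label: str
    accepted: bool
    rationale: str


@dataclass
class Codebook:
    themes: list[CandidateTheme] = field(default_factory=list)
    decisions: list[CodebookDecision] = field(default_factory=list)

    def review(self, theme: CandidateTheme, accepted: bool, rationale: str) -> None:
        """Record the decision; only accepted themes enter the codebook."""
        if not rationale.strip():
            raise ValueError("every decision needs a documented rationale")
        self.decisions.append(CodebookDecision(theme.label, accepted, rationale))
        if accepted:
            self.themes.append(theme)


# Hypothetical candidate themes and researcher decisions.
codebook = Codebook()
codebook.review(
    CandidateTheme("workload", ["The course load was too heavy."]),
    accepted=True,
    rationale="Recurs across many excerpts and is distinct from other themes.",
)
codebook.review(
    CandidateTheme("misc", ["Other reasons."]),
    accepted=False,
    rationale="Too vague to apply consistently.",
)
```

Keeping rejected themes in the decision log, and not only the accepted ones, is what leaves the methodological choices visible to later readers.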

Why It Matters

  • Creates a public methodological bridge between traditional qualitative analysis and reproducible AI-assisted workflows
  • Makes the homepage evidence strip point to a substantive project rather than a generic method claim

People

  • PI: Dr. Andrew Katz