Research Project
Using Large Language Models and Generative AI to Scale Qualitative Data Analysis
Building workflows that use open-source large language models and generative AI to conduct large-scale qualitative data analysis
Virginia Tech Academy of Data Science Discovery Fund | $10,000 | 2024-2025 | Active Study
Research Question
How can researchers combine qualitative judgment with open-source generative AI to scale thematic analysis without hiding methodological choices?
Approach
- Design reproducible workflows for inductive qualitative codebook development
- Compare model-assisted coding outputs with researcher interpretation and validation practices
- Document where AI systems can support analysis and where human qualitative judgment remains essential
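One common way to compare model-assisted coding with researcher coding is a chance-corrected agreement statistic such as Cohen's kappa. The sketch below is illustrative only and is not taken from the project's own workflow: the function names, the code labels, and the six sample excerpts are all hypothetical, and it assumes each excerpt receives exactly one code from the researcher and one from the model.

```python
from collections import Counter


def cohen_kappa(human_codes, model_codes):
    """Chance-corrected agreement between two single-label codings."""
    if len(human_codes) != len(model_codes) or not human_codes:
        raise ValueError("Both codings must be non-empty and the same length.")
    n = len(human_codes)

    # Observed agreement: share of excerpts given the same code by both.
    observed = sum(h == m for h, m in zip(human_codes, model_codes)) / n

    # Expected agreement if both coders assigned codes independently
    # at their own marginal rates.
    human_freq = Counter(human_codes)
    model_freq = Counter(model_codes)
    expected = sum(human_freq[c] * model_freq[c] for c in human_freq) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical code; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreements(human_codes, model_codes):
    """Indices of excerpts where the codings differ, for researcher review."""
    return [i for i, (h, m) in enumerate(zip(human_codes, model_codes)) if h != m]


# Hypothetical codes for six excerpts (labels are made up for illustration).
human = ["belonging", "workload", "workload", "belonging", "career", "workload"]
model = ["belonging", "workload", "career", "belonging", "career", "workload"]

kappa = cohen_kappa(human, model)
to_review = disagreements(human, model)
print(f"kappa = {kappa:.2f}; excerpts to review: {to_review}")
```

An agreement score alone does not say whose interpretation is right. Listing the disagreeing excerpts keeps the researcher in the loop, which matches the emphasis above on documenting where human judgment remains essential.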
Evidence and Outputs
Publication | Published | Humanities and Social Sciences Communications | 2026
Thematic analysis with open-source generative AI and machine learning
Featured paper introducing an open-source generative AI and machine learning method for inductive qualitative codebook development.
Publication | Published | Journal of Engineering Education | 2025
Using generative AI for large-scale qualitative analysis of social media posts
Journal of Engineering Education study using generative AI to analyze more than 10,000 Reddit posts about why people leave computer science.
Research Artifacts
Workflow | In progress
GATOS qualitative analysis workflow
Open-source workflow pattern for moving from raw qualitative text to candidate themes, researcher validation, and documented codebook decisions.
Why It Matters
- Creates a public methodological bridge between traditional qualitative analysis and reproducible AI-assisted workflows
- Makes the homepage evidence strip point to a substantive project rather than a generic method claim
People
- PI: Dr. Andrew Katz