Research Assistant · 2024

GraphEidos Dataset

Published dataset on visual rhetoric in digital humanities

Python · ETL · NLP · Data Mining

The Problem

Researchers studying visual rhetoric in digital humanities lacked a comprehensive, structured dataset of publications in the field. Collecting and organizing thousands of publications by hand was impractical.

Approach

I built a modular Python pipeline to scrape publication metadata from multiple sources, then added cleaning and deduplication workflows to ensure data integrity. The final dataset was structured for accessibility and deposited as an open-access resource.
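
To illustrate the deduplication step, here is a minimal sketch of how records scraped from several sources might be collapsed into unique entries. It is not the project's actual code: the field names (`doi`, `title`, `year`) and the helper functions `normalize_title`, `record_key`, and `deduplicate` are assumptions made for this example. The idea is to match on a DOI when one is present and otherwise fall back to a normalized title plus publication year.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase a title, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Replace every run of non-alphanumeric characters with a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", without_accents.lower())
    return cleaned.strip()


def record_key(record: dict) -> tuple:
    """Build a comparison key (hypothetical schema: doi, title, year)."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = normalize_title(record.get("title") or "")
    return ("title", title, record.get("year"))


def deduplicate(records: list) -> list:
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    # Invented sample records, for demonstration only.
    sample = [
        {"title": "Visual Rhetoric: A Survey", "year": 2019, "doi": "10.1000/ABC"},
        {"title": "Visual rhetoric - a survey", "year": 2019, "doi": "10.1000/abc"},
        {"title": "Émile's Images", "year": 2020},
        {"title": "Emile's images!", "year": 2020},
    ]
    for rec in deduplicate(sample):
        print(rec["title"])
```

A sketch like this would miss a duplicate pair in which only one copy carries a DOI, so a real pipeline would likely need a second pass or fuzzy matching on titles.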

Results

  • 3,274 publication records scraped, cleaned, and structured
  • Published as an open-access dataset in the Journal of Open Humanities Data