By Will Dean and Matt Shoemaker
In 2020, Temple Libraries launched the university’s first institutional repository, TUScholarShare, with an integrated research data collection and deposit service. The first deposit to the collection was a dataset collected by Matt Shoemaker, Head of the Loretta C. Duckworth Scholars Studio. Five years is a long time in the academic world, roughly the length of an undergraduate degree, and the last few have contained more than their fair share of events, so we wanted to celebrate this milestone with a look at our first dataset.

The original “Gen Con Programs” dataset contained records from all events held at the Gen Con gaming convention from 1968 to 2017 (and a recent update brings the dataset up to 2025). Gen Con is the largest, longest-running, and one of the oldest analog game conventions in the world. Today, the 4-day event hosts more than 20,000 gaming events each year for more than 70,000 attendees, and this dataset allows researchers to see how analog gaming has changed since 1968. Which games were most popular, how they were described, how many people could play them, and more are all contained within this dataset.
The event data for Gen Con was difficult for researchers to access due to its ephemeral nature. From 1968 to 2002 it was available only in printed programs, many of which are now scarce because attendees often threw them away once the convention was over for the year. The data after 2002 was captured by downloading a CSV dump of the convention’s online event catalog. This had to be done promptly because the catalog, too, is lost to the ether shortly after the convention ends. For the physical programs, staff in the Loretta C. Duckworth Scholars Studio scanned each program and trained ABBYY FineReader to extract and format the event data using optical character recognition (OCR). The resulting spreadsheets then underwent minimal cleaning to make sure their columns matched across the years before being compiled and submitted for deposit to TUScholarShare.
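
For readers curious what the column-matching step might look like in practice, here is a minimal Python sketch. It is not the workflow the Scholars Studio actually used; the column names, the header aliases, and the sample rows are all hypothetical and chosen only for illustration. It maps differing headers from each year onto one shared set of columns and compiles the years into a single CSV.

```python
import csv
import io

# Hypothetical shared columns; the real dataset's headers may differ.
CANONICAL_COLUMNS = ["year", "title", "description", "max_players"]

# Hypothetical header variants that might appear in different years.
HEADER_ALIASES = {
    "event title": "title",
    "name": "title",
    "desc": "description",
    "players": "max_players",
    "maximum players": "max_players",
}


def normalize_header(header):
    """Lowercase a header and map known variants to a canonical name."""
    key = header.strip().lower()
    return HEADER_ALIASES.get(key, key.replace(" ", "_"))


def harmonize(year, csv_text):
    """Return one year's rows as dicts keyed by the canonical columns."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {column: "" for column in CANONICAL_COLUMNS}
        row["year"] = str(year)
        for header, value in raw.items():
            if header is None:
                continue  # cells beyond the header row are ignored
            name = normalize_header(header)
            if name in row and name != "year":
                row[name] = (value or "").strip()
        rows.append(row)
    return rows


def compile_years(sources):
    """Combine {year: csv_text} into one CSV string with matching columns."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=CANONICAL_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for year in sorted(sources):
        writer.writerows(harmonize(year, sources[year]))
    return out.getvalue()


# Made-up sample rows, not taken from the Gen Con dataset.
sources = {
    1975: "Name,Players\nExample Wargame,8\n",
    2017: "Event Title,Description,Maximum Players\nExample Board Game,Trade and build,4\n",
}
result = compile_years(sources)
print(result)
```

Columns missing from a given year are left blank rather than guessed, which keeps the cleaning minimal and leaves interpretation to the researcher.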

The Libraries’ Research Data Services (RDS) team, which oversees data deposits to TUScholarShare, used the dataset to test our data curation workflow, which was adapted from the Data Curation Network. The process involved closely examining the spreadsheets within the dataset, creating new descriptive information (or metadata) to facilitate search and retrieval, and communicating frequently with the depositor. It also presented an opportunity to use some tools that were new to RDS in order to preserve this dataset openly.
As an open access repository, TUScholarShare is committed to making its content openly available and reusable, including its file types. For ease of use, the deposit is available as an XLSX file that opens in Excel and displays multiple sheets via tabs. While the XLSX file type is ubiquitous at the moment thanks to Microsoft’s domination of the office software market, it is not an open format that anyone can freely use. The comma-separated values (CSV) file type is the most widely used open format for textual data, and we used the Excel Archival Tool, openly available under the GNU GPLv3 license, to convert the many spreadsheet tabs quickly and easily into individual CSV files. Both file versions are available in the deposit, making the dataset as openly available as possible.
Over the past five years, our data deposit workflow has been refined to be clearer and more efficient, and the breadth of deposit types has grown to include materials from nine of Temple’s schools and colleges. Updating our first deposit, in collaboration with Matt, with more recent data has allowed us to reflect on how far we have come and demonstrates how data deposits are meant to be reused and updated as we learn more about the world around us through research. Check out the updated Gen Con Programs dataset and consider contributing your own work to the Research Data collection via TUScholarShare’s data deposit form.