Programme
Hybrid Workshop at the Max Planck Institute for Legal History and Legal Theory (mpilhlt), 04 November 2025
All times are Frankfurt local time, that is, CET (UTC+1)
Tuesday 04 November 2025
Onboarding
09:00-09:15 Arrival/Registration
09:15-09:45 Christian Boulanger/Andreas Wagner (mpilhlt): Welcome and Upshot from RefExtract2023, State of the Discussion
09:45-10:00 Coffee Break
Research presentations
10:00-12:30
Hiba Arnaout (TU Darmstadt): In-depth Research Impact Summarization through Fine-Grained Temporal Citation Analysis
Yurui Zhu/Matteo Romanello (Odoma): Benchmarking Large Language Models on Reference Extraction and Parsing in the Social Sciences and Humanities
Sofía Aguilar Valdez (Saarland University): How Scientific Ideas Evolve
Open Discussion and Ad-Hoc Presentation of Research
12:30-13:30 Lunch
Datasets, Infrastructure and Interoperability
13:30-15:30
Angelo Di Iorio/Matteo Guenci/Marta Soricetti*/Silvio Peroni/Lorenzo Paolini*/Ivan Heibi (University of Bologna): Citation Extractor and Classifier: Pipeline and Datasets (*presenting)
Tamara Heck/Christoph Schindler/Verena Weimer/Philipp Mayr/Ahsan Shahid (DIPF/GESIS): Open Citation Data for Educational Research
Christian Boulanger/Andreas Wagner (mpilhlt): Datasets in the Legal Theory Knowledge Graph Project
Interoperability Roundtable: Open Discussion on Data Models and Data Formats
15:30-16:00 Coffee Break
Tools, Workflows and Pipelines
16:00-17:30
Raphael Schlattmann/Malte Vogl (mpigea)/Aleksandra Kaye (TU Berlin/mpigea): LLM-Based Knowledge Graph Extraction Pipeline
Luca Foppiano (ScienciaLAB): Training the Grobid Reference Extraction Models
Christian Boulanger/Andreas Wagner (mpilhlt): Annotation Tools for Machine Learning: PDF-TEI Editor (for LLamore & Grobid), Prodigy, TEI-Publisher
17:30-18:30 Takeaways, Way Forward, Closing
19:00 Dinner (self-paid)
Restaurant Zur Stalburg, Glauburgstraße 80, 60318 Frankfurt am Main