

The Learnable Typewriter: A Generative Approach to Text Analysis.
Ioannis Siglidis, Nicolas Gonthier, Juliette Gaubil, Tom Monnier, and Mathieu Aubry.
In International Conference on Document Analysis and Recognition (ICDAR), 2024.
↳ webpage, chapter.
An Interpretable Deep Learning Approach for Morphological Script Type Analysis.
Malamatenia Vlachou-Efstathiou, Ioannis Siglidis, Dominique Stutzmann, and Mathieu Aubry.
In International Workshop on Computational Paleography (IWCP), 2024.
↳ webpage, chapter.
Diffusion Models as Data Mining Tools.
Ioannis Siglidis, Aleksander Holynski, Alexei A. Efros, Mathieu Aubry, and Shiry Ginosar.
In European Conference on Computer Vision (ECCV), 2024.
↳ webpage, chapter.
Name | Institution | Jury Role |
---|---|---|
Alexei A. Efros | UC Berkeley | President |
Jean Ponce | École Normale Supérieure | Reviewer |
Josef Sivic | Czech Technical University | Reviewer |
Hadar Averbuch-Elor | Cornell University | Examiner |
Shiry Ginosar | Toyota Technological Institute at Chicago | Examiner |
Mathieu Aubry | École des Ponts ParisTech | Advisor |
Image archives contain a great deal of hidden knowledge that researchers in the digital humanities would like to discover at scale. Much of this knowledge is visual, which makes it challenging to describe in text or, conversely, to manually ground existing textual descriptions in visual evidence. Given collections of images that are identified by general predefined classes, for example a type of script or the name of a country, the goal of this thesis is to develop machine learning approaches that can mine the informative visual structure hiding behind those labels. Our work focuses on two specific problems of image data mining. The first is to summarize and help refine existing typologies of handwritten characters. Character morphology has been central to the field of palaeography, where existing typologies are described through textual descriptions, hindering qualitative analysis. Our first contribution, the "Learnable Typewriter", achieves an explicit decomposition of a manuscript's text lines into small images of characters called sprites, which allows for an interpretable quantitative comparison. The second problem is to summarize the visual structure that makes images typical of their assigned label. Analyzing historical or cultural image datasets by counting the presence of predefined attributes often yields only very general observations that cannot capture the visual details typical of the input label. In our second contribution, "Diffusion Models as Data Mining Tools", we leverage the abstract and scalable compositional synthesis capabilities of diffusion models to mine typical visual vocabularies from versatile labeled datasets, including portraits, geographical images, and scenes, on the order of thousands to millions of images.
Document | Description | Size |
---|---|---|
Acknowledgements | Thesis acknowledgements | 120 KB |
Thesis | (printed) | 847 MB |
Unfortunately, no video of the presentation remains. Although I had tested going live on YouTube beforehand, on the day of my thesis defense I clicked "go live" and immediately switched tabs to start my presentation. Only when I returned to the YouTube tab to end the stream did I realize that nothing had been recorded: apparently because of thread parallelism or YouTube's implementation on Chrome, the thread froze, and the stream went live only when I came back to the tab. The only artifact that remains is my "oh no". Still, some content remains: