Image Mining via Synthesis

The Learnable Typewriter: A Generative Approach to Text Analysis.
Ioannis Siglidis, Nicolas Gonthier, Juliette Gaubil, Tom Monnier, and Mathieu Aubry.
In International Conference on Document Analysis and Recognition (ICDAR), 2024.
↳ webpage, chapter.

An Interpretable Deep Learning Approach for Morphological Script Type Analysis.
Malamatenia Vlachou-Efstathiou, Ioannis Siglidis, Dominique Stutzmann, and Mathieu Aubry.
In International Workshop on Computational Paleography (IWCP), 2024.
↳ webpage, chapter.

Diffusion Models as Data Mining Tools.
Ioannis Siglidis, Aleksander Holynski, Alexei A. Efros, Mathieu Aubry, and Shiry Ginosar.
In European Conference on Computer Vision (ECCV), 2024.
↳ webpage, chapter.

Name	Institution	Jury Role
Alexei A. Efros	UC Berkeley	President
Jean Ponce	École Normale Supérieure	Reviewer
Josef Sivic	Czech Technical University	Reviewer
Hadar Averbuch-Elor	Cornell University	Examiner
Shiry Ginosar	Toyota Technological Institute at Chicago	Examiner
Mathieu Aubry	École des Ponts ParisTech	Advisor

Image archives contain a wealth of hidden knowledge that researchers in the digital humanities would like to discover at scale. Much of this knowledge is visual, which makes it challenging to capture in textual descriptions or, conversely, to manually ground existing textual descriptions in visual evidence. Given collections of images that are identified by general predefined classes, for example a type of script or the name of a country, the goal of this thesis is to develop machine learning approaches that can mine the informative visual structure hiding behind those labels. Our work focuses on two specific problems of image data mining. The first is to summarize and help refine existing typologies of handwritten characters. Character morphology has been central to the field of palaeography, where existing typologies are expressed through textual descriptions, hindering quantitative analysis. Our first contribution, the "Learnable Typewriter", achieves an explicit decomposition of a manuscript's text lines into small images of characters called sprites, which allows for an interpretable quantitative comparison. The second problem is to summarize the visual structure that makes images typical of their assigned label. Analyzing historical or cultural image datasets by counting the presence of predefined attributes often yields only very general observations, which cannot capture the visual details that are typical of the input label. In our second contribution, "Diffusion Models as Data Mining Tools", we leverage the abstract and scalable compositional synthesis capabilities of diffusion models to mine typical visual vocabularies from diverse labeled datasets, including portraits, geographical images, and scenes, on the order of thousands to millions of images.

Document	Description	Size
Acknowledgements	Categories I made along the way.	120 KB
Ch. 1: Introduction	Image Data Mining as a precursor to visual categorization.	14.4 MB
Ch. 2: Related Work	Image Data Mining as summarized human interpretable visual discovery.	25.8 MB
Ch. 3,4: Main Contributions	Learnable Typewriter & Diffusion Models as Data Mining Tools.	48.2 MB
Ch. 5: Epilogue	Intelligence from dataset learning to dataset making.	7.5 MB
Thesis	(online version)	847 MB
Thesis	(printed version)	847 MB

Unfortunately, no video of the presentation remains. Although I had tested going live on YouTube beforehand, on the day of my thesis defense I clicked "go live" and immediately switched tabs to start my presentation. Only when I returned to the YouTube tab to end the stream did I realize that nothing had been recorded: due to thread parallelism or YouTube's implementation on Chrome, the thread had frozen, and the stream only went live once I came back to the tab, which was still showing "go live". The only artifact that remains is my "oh no". Still, some content remains: