  • This event has passed.

Digital Heritage Seminar: Layout Analysis and OCR with Deep Learning and Heuristics

Online event

11 April 2022
14:00 - 15:30

Price

Free

Tags

Digital Heritage Seminar: Image Processing

 

KBR invites you to attend a scholarly series on digital cultural heritage: the KBR Digital Heritage Seminar, in cooperation with ULB-UGent-VUB-UCL.

In this series, running from February to June 2022, we will virtually host three academic scholars presenting their work on cultural heritage, specifically on image processing.

“The devil is in the details!” When it comes to digital cultural heritage, this is as true as “The devil is in the images!” Great efforts have been devoted to digitizing original collections in the cultural heritage sector. On the one hand, this helps greatly in promoting the collections and gives the general public much easier access to them (e.g. through images published on websites like our digital library Belgica). On the other hand, technologies still need to advance in order to fully exploit the information (e.g. texts) that is still locked inside the digitized images.

In this series, we are honored to host three researchers with rich experience in image analysis, especially in extracting information from digitized collections.

 

Programme

Clemens Neudecker, Berlin State Library, Berlin, Germany

 

“New Tools for Old Documents – Layout Analysis and OCR with Deep Learning and Heuristics”

This talk will discuss the main achievements and experiences of the QURATOR project at the Berlin State Library (SBB) in document layout analysis. Historical documents that are being digitized in large quantities by libraries and archives frequently exhibit a wide array of features that disturb layout analysis, such as complex layouts with multiple columns, drop capitals and illustrations, skewed or curved text lines, noise and annotations.

To deal with these challenges and defects, a robust document layout analysis method was developed, implemented as pixel-wise segmentation using convolutional neural networks. In addition, heuristic methods are applied to detect columns or marginalia, and to determine the reading order of text regions. A key objective is to feed the resulting outputs to subsequent processes such as a text recognition (OCR) engine or an image similarity search.

Practical information

Registration is free but mandatory. On the morning of the event, you will be sent the link to the webinar. Should you have any further questions, please email gna.yh@xoe.or.

Duration: 1.5 hours

About the speaker

Clemens Neudecker studied Philosophy, Computer Science and Political Science at Ludwig Maximilian University (LMU) of Munich. For more than 15 years, he has been working in R&D at various digital libraries, including the Bavarian State Library and the National Library of the Netherlands. Clemens is currently a researcher and project coordinator at the Berlin State Library. He is also a member of the Council at Europeana, the European Union’s digital platform for cultural heritage.