Excalibur

Extract tables from PDFs into CSVs

Available for Windows, Mac and Linux

Install Excalibur with pip:

pip install excalibur-py

Or run directly with the executable!


About


The Portable Document Format

A PDF file consists of instructions that place characters at precise x,y coordinates relative to the bottom-left corner of the page. Words are simulated by placing some characters closer together than others, and spaces by placing words relatively far apart. Tables, finally, are simulated by placing words where they would appear in a spreadsheet. The format has no internal representation of table structure.
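To make the idea concrete, here is a minimal sketch of how words can be recovered from positioned characters. It is an illustration only, not Excalibur's or Camelot's actual code: the `Glyph` tuple, the `group_into_words` function and the `gap_threshold` value are all hypothetical, and the sketch assumes every glyph sits on the same text line.

```python
from typing import List, Optional, Tuple

# Hypothetical glyph record: (character, x of left edge, width), all on one text line.
Glyph = Tuple[str, float, float]


def group_into_words(glyphs: List[Glyph], gap_threshold: float = 2.0) -> List[str]:
    """Join glyphs into words; a horizontal gap wider than gap_threshold starts a new word."""
    words: List[str] = []
    current = ""
    prev_right: Optional[float] = None
    # Process glyphs left to right, since a PDF may list them in any order.
    for char, x, width in sorted(glyphs, key=lambda g: g[1]):
        if prev_right is not None and x - prev_right > gap_threshold:
            words.append(current)
            current = ""
        current += char
        prev_right = x + width
    if current:
        words.append(current)
    return words


glyphs = [
    ("H", 10.0, 6.0), ("i", 16.5, 3.0),                    # close together: one word
    ("y", 30.0, 5.0), ("o", 35.5, 5.0), ("u", 41.0, 5.0),  # after a wide gap: next word
]
print(group_into_words(glyphs))
```

The same reasoning applies one level up: a table is only visible as a pattern of larger, regular gaps between words, which is why it has to be inferred rather than read from the file.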

Extracting tables from PDFs is hard

The Portable Document Format was not designed for tabular data. Unfortunately, a lot of open data is shared as PDFs, and getting tables out for analysis is painful: a simple copy-and-paste does not preserve the table structure. Excalibur makes PDF table extraction easy by automatically detecting tables in PDFs and letting you save them as CSV and Excel files through a web interface.

Why another tool?

Both open-source and closed-source tools are widely used for PDF table extraction, but they tend to either produce clean output or fail outright. Excalibur is powered by Camelot, which gives users additional settings to tweak table extraction and get the best results. You can see how it performs better than other open-source tools and libraries in this comparison.

Secure and built for scale

You get complete control over your data, since all file storage and processing happens on your own local or remote machine. Excalibur can also be configured with MySQL and Celery to execute table extraction jobs in a parallel and distributed manner. By default, jobs are executed sequentially.
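The difference between sequential and parallel job execution can be sketched with Python's standard library. This is a stand-in for illustration: it uses a local thread pool rather than Celery, and `extract_job`, `run_sequential` and `run_parallel` are hypothetical names, not part of Excalibur.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def extract_job(pdf_name: str) -> str:
    # Placeholder for a real table extraction job (hypothetical).
    return f"{pdf_name}: done"


def run_sequential(jobs: List[str]) -> List[str]:
    # One job at a time, in order: the default behaviour described above.
    return [extract_job(job) for job in jobs]


def run_parallel(jobs: List[str], workers: int = 4) -> List[str]:
    # Several jobs at once; pool.map still returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_job, jobs))


jobs = ["a.pdf", "b.pdf", "c.pdf"]
print(run_sequential(jobs))
print(run_parallel(jobs))
```

A distributed setup goes one step further than this sketch by handing jobs to worker processes that may run on other machines.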

Usage


Upload a PDF

You can upload a PDF using the web interface. You can also interact with previous uploads.

Autodetect tables

Excalibur can automatically detect tables in your PDF.

Or draw table areas and/or column separators

You can guide the tool by drawing table areas and column separators in cases where the tables are buried deep inside the text and autodetection fails.
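As a rough illustration of what a column separator does, the sketch below assigns the words of one table row to cells based on their x-coordinates. It assumes each word is given as text plus the x position of its left edge; `split_row` and the sample coordinates are hypothetical and do not reflect Excalibur's internals.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical word record: (text, x position of the word's left edge).
Word = Tuple[str, float]


def split_row(words: List[Word], separators: List[float]) -> List[str]:
    """Place each word in the column bounded by the given x-coordinate separators."""
    seps = sorted(separators)
    # N separators define N + 1 columns.
    cells: List[List[str]] = [[] for _ in range(len(seps) + 1)]
    for text, x in sorted(words, key=lambda w: w[1]):
        # bisect_right counts how many separators lie at or left of x.
        cells[bisect_right(seps, x)].append(text)
    return [" ".join(cell) for cell in cells]


row = [("New", 12.0), ("Delhi", 40.0), ("2018", 150.0), ("42.5", 260.0)]
print(split_row(row, separators=[120.0, 240.0]))
```

With separators drawn at x = 120 and x = 240, "New" and "Delhi" land in the same cell even though there is a space between them, which is the kind of hint autodetection may lack.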

Or load saved settings

You can save table extraction settings for a PDF once and apply them to new PDFs to extract tables with similar structures.
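One way to picture saved settings is as a small JSON document that is written once and read back for later PDFs. The schema below (`flavor`, `table_areas`, `columns`) and the file name are assumptions made for this sketch, not Excalibur's actual storage format.

```python
import json
import os
import tempfile

# Hypothetical settings for one document layout.
settings = {
    "flavor": "stream",
    "table_areas": [[50.0, 700.0, 550.0, 400.0]],  # x1, y1, x2, y2 in PDF coordinates
    "columns": [[120.0, 240.0]],                   # separator x positions per table area
}


def save_settings(settings: dict, path: str) -> None:
    # Write the settings as human-readable JSON.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(settings, f, indent=2)


def load_settings(path: str) -> dict:
    # Read previously saved settings back into a dict.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "invoice_rule.json")
save_settings(settings, path)
reloaded = load_settings(path)
print(reloaded == settings)
```

Because the settings describe positions on the page rather than content, they carry over to any PDF that shares the same layout.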

View and download data

Finally, you can view the extracted tables and download them as CSV or Excel files. Excalibur also supports JSON and HTML output.
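To show how the same extracted table maps onto these formats, here is a small sketch using only Python's standard library. It assumes a table is a list of rows of strings with the header in the first row; the function names are hypothetical, and Excel output is left out because it needs a third-party package.

```python
import csv
import io
import json
from html import escape
from typing import List

Table = List[List[str]]  # first row is treated as the header


def to_csv(rows: Table) -> str:
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def to_json(rows: Table) -> str:
    # One JSON object per data row, keyed by the header cells.
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body])


def to_html(rows: Table) -> str:
    lines = ["<table>"]
    for row in rows:
        # Escape cell text so characters like < and & stay valid HTML.
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


table = [["City", "Year"], ["New Delhi", "2018"]]
print(to_csv(table))
print(to_json(table))
print(to_html(table))
```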

Contact


Do you have feedback or want us to build a new feature? Just holler!