Getting Started
Requirements
Java
Java 8+
Python
3.8+
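The two requirements above can be checked from Python before installing anything. The following is a minimal sketch, not part of tabula-py: the helper names parse_java_major and check_requirements are made up for illustration, and the version parsing assumes the usual `java -version` output format (for example `java version "1.8.0_144"` or `openjdk version "11.0.2"`).

```python
import re
import shutil
import subprocess
import sys


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy scheme: "1.8.0_144" means Java 8; newer scheme: "11.0.2" means Java 11.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def check_requirements():
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    java = shutil.which("java")
    if java is None:
        problems.append("java was not found on PATH")
        return problems
    try:
        result = subprocess.run(
            [java, "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        problems.append("java could not be executed")
        return problems
    # `java -version` usually prints to stderr, so look at both streams.
    major = parse_java_major(result.stderr + result.stdout)
    if major is None:
        problems.append("could not determine the Java version")
    elif major < 8:
        problems.append(f"Java 8+ is required, found Java {major}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("OK" if not issues else "\n".join(issues))
```

An empty result only means that a suitable `java` executable is reachable from this Python process; it does not verify that tabula-py itself is installed.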
Installation
Before installing tabula-py, make sure a Java runtime is available in your environment.
You can install tabula-py from PyPI with pip:
pip install tabula-py
If you want faster execution with jpype, install tabula-py with the jpype extra:
pip install tabula-py[jpype]
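To see whether the jpype extra ended up in the current environment, one option is to ask Python whether the jpype module is importable. This is only a sketch: the helper name jpype_available is hypothetical, and the check says nothing about whether tabula-py will actually use jpype at run time.

```python
import importlib.util


def jpype_available():
    """Return True if the jpype module can be imported in this environment."""
    return importlib.util.find_spec("jpype") is not None


if __name__ == "__main__":
    print("jpype installed:", jpype_available())
```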
Note
The conda recipe on conda-forge is not maintained by us.
We recommend installing via pip to use the latest version of tabula-py.
Get tabula-py working (Windows 10)
These instructions were originally written by @lahoffm. Thanks!
If you don’t have it already, install Java.
Try to run the example code (replace the PDF file name as appropriate).
If there’s a FileNotFoundError when it calls read_pdf(), and when you type java on command line it says 'java' is not recognized as an internal or external command, operable program or batch file, you should set PATH environment variable to point to the Java directory.
Find the main Java folder like jre... or jdk.... On Windows 10 it was under C:\Program Files\Java
On Windows 10: Control Panel -> System and Security -> System -> Advanced System Settings -> Environment Variables -> Select PATH -> Edit
Add the bin folder like C:\Program Files\Java\jre1.8.0_144\bin, hit OK a bunch of times.
On command line, java should now print a list of options, and tabula.read_pdf() should run.
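As an alternative to editing the system settings, PATH can be adjusted for the current Python process only. The sketch below rests on some assumptions: the helper name prepend_to_path is hypothetical, JAVA_BIN reuses the example folder from the steps above and must be replaced with your own Java bin folder, and the change is only expected to help when java is launched as a subprocess from this same process.

```python
import os
import shutil


def prepend_to_path(directory, path_value):
    """Return path_value with directory placed first, without duplicating it."""
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    entries = [entry for entry in entries if entry != directory]
    return os.pathsep.join([directory] + entries)


# Example location taken from the steps above; adjust to your installation.
JAVA_BIN = r"C:\Program Files\Java\jre1.8.0_144\bin"

if __name__ == "__main__":
    if shutil.which("java") is None:
        # Affects only this process and the subprocesses it starts.
        os.environ["PATH"] = prepend_to_path(JAVA_BIN, os.environ.get("PATH", ""))
    print("java found at:", shutil.which("java"))
```

The change is lost when the process exits, so the Environment Variables dialog remains the way to make it permanent.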
Example
tabula-py enables you to extract tables from a PDF into a DataFrame or JSON. It can also extract tables from a PDF and save them as a CSV, TSV, or JSON file.
import tabula
# Read pdf into a list of DataFrame
dfs = tabula.read_pdf("test.pdf", pages='all')
# Read remote pdf into a list of DataFrame
dfs2 = tabula.read_pdf("https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf")
# convert PDF into CSV
tabula.convert_into("test.pdf", "output.csv", output_format="csv", pages='all')
# convert all PDFs in a directory
tabula.convert_into_by_batch("input_directory", output_format='csv', pages='all')
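When converting a single file, it can be convenient to derive the output name from the input name. The following sketch wraps the tabula.convert_into call shown above. The helper names csv_path_for and convert_pdf_to_csv are hypothetical, and writing the CSV next to the PDF is an assumption made for illustration.

```python
from pathlib import Path


def csv_path_for(pdf_path):
    """Return the CSV path next to the given PDF (same name, .csv suffix)."""
    return Path(pdf_path).with_suffix(".csv")


def convert_pdf_to_csv(pdf_path, pages="all"):
    """Convert one PDF to CSV with tabula.convert_into and return the output path."""
    # Imported lazily so the path helper works even without tabula-py installed.
    import tabula

    output = csv_path_for(pdf_path)
    tabula.convert_into(str(pdf_path), str(output), output_format="csv", pages=pages)
    return output
```

With tabula-py and Java available, calling convert_pdf_to_csv("test.pdf") would be expected to write test.csv in the same directory.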
See the example notebook for more details. I also recommend reading the tutorial article written by @aegis4048 and another tutorial written by @tdpetrou.
Note
If you run into issues, we recommend trying tabula.app to see the limitations of tabula-java. See the FAQ as well.