- Java 8+
Before installing tabula-py, ensure a Java runtime is installed in your environment.
You can install tabula-py from PyPI with
pip install tabula-py
The conda recipe on conda-forge is not maintained by us. We recommend installing via pip to get the latest version of tabula-py.
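To confirm which version pip actually installed, a minimal sketch using only the standard library (Python 3.8+) could look like the following. The helper name `installed_version` is hypothetical and not part of tabula-py; it simply looks up the installed distribution's metadata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "tabula-py") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    `installed_version` is an illustrative helper, not a tabula-py function.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in this Python environment.
        return None
```

Calling `installed_version()` returns a version string when tabula-py is installed in the current environment, and `None` otherwise.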
Get tabula-py working (Windows 10)
These instructions were originally written by @lahoffm. Thanks!
- If you don’t have it already, install Java
- Try to run the example code (replace the PDF file name as appropriate).
- If there’s a FileNotFoundError when it calls read_pdf(), and when you type java on the command line it says 'java' is not recognized as an internal or external command, operable program or batch file, you should set the PATH environment variable to point to the Java directory.
- Find the main Java folder like jdk.... On Windows 10 it was under
- On Windows 10: Control Panel -> System and Security -> System -> Advanced System Settings -> Environment Variables -> Select PATH -> Edit
- Add the Java bin folder, like C:\Program Files\Java\jre1.8.0_144\bin, and hit OK a bunch of times.
- On the command line, java should now print a list of options.
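The PATH check in the steps above can also be done from Python before calling tabula-py. The following is a minimal sketch using only the standard library; the helper names `find_java` and `java_version` are hypothetical and not part of tabula-py.

```python
import shutil
import subprocess
from typing import Optional


def find_java(executable: str = "java") -> Optional[str]:
    """Return the full path of the Java launcher found on PATH, or None."""
    return shutil.which(executable)


def java_version(executable: str = "java") -> Optional[str]:
    """Return the text printed by `java -version`, or None if Java is not on PATH."""
    path = find_java(executable)
    if path is None:
        return None
    # `java -version` usually writes to stderr, so capture both streams.
    result = subprocess.run(
        [path, "-version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return (result.stderr or result.stdout).strip()
```

If `find_java()` returns `None`, the Python process cannot see Java on PATH, which matches the 'java' is not recognized situation described above. Note that a terminal or IDE opened before PATH was edited may need to be restarted to pick up the change.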
tabula-py enables you to extract tables from a PDF into a DataFrame or JSON. It can also extract tables from a PDF and save them as a CSV, TSV, or JSON file.
import tabula

# Read pdf into a list of DataFrame
dfs = tabula.read_pdf("test.pdf", pages='all')

# Read remote pdf into a list of DataFrame
dfs2 = tabula.read_pdf("https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf")

# convert PDF into CSV
tabula.convert_into("test.pdf", "output.csv", output_format="csv", pages='all')

# convert all PDFs in a directory
tabula.convert_into_by_batch("input_directory", output_format='csv', pages='all')
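Because read_pdf() returns a list of DataFrames in the example above, it can be handy to write each table to its own CSV file. The sketch below is one possible way to do that; `save_tables` is a hypothetical helper, not a tabula-py function, and it only assumes each item has a pandas-style `to_csv(path, index=False)` method.

```python
from pathlib import Path
from typing import Iterable, List


def save_tables(tables: Iterable, out_dir: str, stem: str = "table") -> List[Path]:
    """Write each table to its own CSV file and return the paths written.

    `tables` is expected to be the list returned by tabula.read_pdf();
    each item only needs a pandas-style to_csv(path, index=False) method.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for number, table in enumerate(tables, start=1):
        # Files are named <stem>_1.csv, <stem>_2.csv, ... in reading order.
        path = target / f"{stem}_{number}.csv"
        table.to_csv(path, index=False)
        written.append(path)
    return written


# Example usage (assumes tabula-py and a local test.pdf are available):
#   import tabula
#   dfs = tabula.read_pdf("test.pdf", pages="all")
#   save_tables(dfs, "tables")
```

This differs from convert_into(), which writes a single output file for the whole PDF; the per-table layout here is a design choice for illustration only.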