Extension:TikaAllTheFiles

The TikaAllTheFiles (TATF) extension enables full-text search over uploaded files using the Apache Tika content analysis toolkit, which "detects and extracts metadata and text from over a thousand different file types".

In practical terms: if you already have Extension:CirrusSearch set up and working on your wiki, TATF lets you run full-text searches over the contents of almost any uploaded file, not just PDFs.

TATF's features and capabilities:
 * extract embedded digital text from any type of uploaded file so that it can be indexed for full-text search;
 * extract and index printed text from bitmap image files and from images embedded in document files, e.g., image-only PDFs (requires Tesseract OCR);
 * extract metadata from any type of uploaded file for display on pages;
 * index metadata properties along with text, to enable simple searching for properties within full-text search.
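
To give a feel for the kind of text and metadata extraction that Apache Tika provides, here is a minimal Python sketch that talks to a standalone Tika server over its REST interface. It illustrates Tika itself and is not a description of how TATF invokes Tika internally. It assumes a Tika server is already running on localhost at the default port 9998; the helper names (build_tika_request, extract_text, extract_metadata) are hypothetical and introduced only for this example.

```python
import json
import urllib.request

# Assumption: a Tika server is running locally on its default port.
TIKA_URL = "http://localhost:9998"


def build_tika_request(data: bytes, endpoint: str, accept: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request for a Tika server endpoint."""
    return urllib.request.Request(
        url=f"{TIKA_URL}/{endpoint}",
        data=data,
        method="PUT",
        headers={"Accept": accept},
    )


def extract_text(data: bytes) -> str:
    """Send file bytes to the /tika endpoint and return the extracted plain text."""
    req = build_tika_request(data, "tika", "text/plain")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def extract_metadata(data: bytes) -> dict:
    """Send file bytes to the /meta endpoint and return the metadata as a dict."""
    req = build_tika_request(data, "meta", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, calling extract_text() and extract_metadata() on the bytes of an uploaded file would return roughly the two kinds of data listed above: the text to be indexed for search and the metadata properties.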

Installation
This extension can be installed using.

The complete installation and configuration instructions can be found in.

Configuration parameters
The complete description of configuration parameters can be found in.