Extension:TikaAllTheFiles
| Release status | beta |
|---|---|
| Implementation | Media, Search |
| Description | Using Apache Tika, provides text and metadata extraction for thousands of file types, enabling full-text search of almost any uploaded file |
| Author(s) | Matt Marjanovic (CtapMaddog) |
| Maintainer(s) | Center for Transparent Analysis and Policy |
| Latest version | 2.0.1 (2025-08-15) |
| Compatibility policy | Master maintains backward compatibility. |
| MediaWiki | 1.37+ |
| PHP | 8.1+ |
| Database changes | No |
| Composer | centertap/tika-all-the-files |
| License | GNU General Public License 3.0 or later |
| Download | Codeberg: README.md RELEASE-NOTES.md |
The TikaAllTheFiles (TATF) extension facilitates full-text search over uploaded files by using the Apache Tika content analysis toolkit, which "detects and extracts metadata and text from over a thousand different file types".
In practical terms: if you already have Extension:CirrusSearch set up and working on your wiki, TATF will allow you to perform full-text searches over the contents of almost any uploaded file, not just PDFs.
TATF's features and capabilities:
- extract embedded digital text from any type of uploaded file so that it can be indexed for full-text search;
- extract and index printed text from bitmap image files and from images embedded in document files, e.g., image-only PDFs (requires Tesseract OCR);
- extract metadata from any type of uploaded file for display on File: pages;
- index metadata properties along with text, to enable simple searching for properties within full-text search.
Installation
This extension can be installed using Composer.
The complete installation and configuration instructions can be found in README.md.
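As a rough sketch of the Composer route, assuming a standard MediaWiki tree where local dependencies are recorded in composer.local.json: only the package name centertap/tika-all-the-files comes from this page, the install path below is hypothetical, and README.md remains the authoritative procedure.

```shell
# Hypothetical MediaWiki root; adjust to your installation.
cd /var/www/mediawiki

# Record the package in composer.local.json without installing yet.
COMPOSER=composer.local.json composer require --no-update centertap/tika-all-the-files

# Download the extension and its dependencies.
composer update centertap/tika-all-the-files --no-dev
```

MediaWiki extensions are then usually enabled with a wfLoadExtension( 'TikaAllTheFiles' ); line in LocalSettings.php. Treat that line, and any Apache Tika or Tesseract setup, as assumptions to confirm against README.md.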
Configuration parameters
The complete description of configuration parameters can be found in README.md.
