Platform Engineering Team/Event Platform Value Stream/Event Catalog

This page documents the initial prototype of the Event Catalog for Apache Flink.

The Event Catalog in Wikimedia Event Utilities provides SQL-like access to Wikimedia's Kafka topics for stream and batch processing. It validates schemas and automatically normalizes the  and   fields.

Getting Started
(This assumes you already have Apache Flink installed.)
 * Package versions in the examples here may change.

1. Build Event Utilities from this patch (If it's merged, then pull from main) to get

2. Download

3. Download

4. Start Flink's SQL client with these libraries. In this example they're all in a  folder.

4a. If you're inserting, also start the Flink cluster beforehand.

5. Create the catalog

6. Use the catalog

7. Check that you can query the Kafka topics
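
As a rough illustration of steps 5 to 7, the statements below sketch what a session in the Flink SQL client might look like. Only the statement forms (CREATE CATALOG, USE CATALOG, SHOW DATABASES, SHOW TABLES, SELECT) are standard Flink SQL. The catalog name event_catalog, the 'type' value, and the table name are hypothetical placeholders, and the real catalog also needs the default options described under Catalog Options.

```sql
-- Sketch only: 'event_catalog', the 'type' value, and the table name are
-- placeholders, not values confirmed by this page.

-- Step 5: register the catalog. Replace the placeholder with the catalog's
-- actual factory identifier and add its required default options.
CREATE CATALOG event_catalog WITH (
  'type' = '<event-catalog-type>'
);

-- Step 6: make it the current catalog for this session.
USE CATALOG event_catalog;

-- Step 7: list what the catalog exposes, then read a few rows from one table.
SHOW DATABASES;
SHOW TABLES;
SELECT * FROM `<table-name>` LIMIT 10;
```

If SHOW TABLES returns nothing, the catalog options (for example the Kafka connection settings) are a reasonable first thing to check.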

Catalog Options
To create the catalog, you need to provide it with some default options.

Table Options
Tables within the catalog require some custom options in addition to the ones needed for the connector and format.

Limitations

 * When you create a table from scratch, you must use a  column (see examples)
 * When inserting, all columns (besides $schema and meta) must be present for it to succeed. (See T328211)
 * You cannot directly insert into a catalog-provided table.
 * You cannot alter the schema or its version after a table is created.
 * To use a table with a schema version other than latest, you must create the entire table from scratch.