Trust and Safety Product/Decision records/2025-02-05-IPoid-OpenSearch
Authors
Status
- Proposed
Reviewers
Context
The IPoid service imports Spur data into a relational database. This presents several operational challenges:
- Daily high-volume inserts, deletions, and updates are difficult to course-correct when errors occur
- Complex reconstruction of the data into a relational format
- Limited data recency, due to the time each import takes
- Database drift during daily updates
Options considered
1. Maintain the status quo.
- Attempt to optimize imports
- Invest in maintenance and observability
2. Use an OpenSearch instance for data storage and querying
- Simplified data pipeline via direct JSONL ingestion
- Native handling of IP address data types and queries
- Initial import and query experiments are promising:
  - Initial import of 37.28M records completed in 50 minutes (roughly 12,400 records per second)
  - Query response times of 4-20 ms for IP lookups
  - Query response times under 50 ms for attribute queries
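To illustrate the native IP handling described above, here is a minimal sketch of an index mapping and two query DSL bodies. The field names (`ip`, `last_seen`, `tags`) are hypothetical and may not match the actual IPoid or Spur schema; the code only builds plain Python dicts and does not contact a cluster.

```python
import json

# Hypothetical index mapping; field names are illustrative only.
# "ip" uses OpenSearch's native ip field type, which stores IPv4 and
# IPv6 addresses and supports CIDR lookups.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "ip": {"type": "ip"},
            "last_seen": {"type": "date"},
            "tags": {"type": "keyword"},
        }
    }
}


def ip_lookup_query(ip_or_cidr: str) -> dict:
    """Build a query DSL body matching one address or a CIDR range."""
    # A term query on an ip field accepts a single address ("192.0.2.1")
    # or CIDR notation ("192.0.2.0/24").
    return {"query": {"term": {"ip": ip_or_cidr}}}


def attribute_query(tag: str) -> dict:
    """Build a query DSL body for records carrying a given attribute value."""
    # Filter context skips relevance scoring, which exact matches do not need.
    return {"query": {"bool": {"filter": [{"term": {"tags": tag}}]}}}


if __name__ == "__main__":
    print(json.dumps(ip_lookup_query("192.0.2.0/24")))
    print(json.dumps(attribute_query("VPN")))
```

With the `opensearch-py` client, these bodies could be passed to `client.indices.create(index=..., body=INDEX_MAPPING)` and `client.search(index=..., body=...)`; the Node.js client takes the same JSON bodies. Because the term query on an `ip` field understands CIDR notation, range lookups need no separate query type.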
Decision
We propose migrating the IPoid Node.js import and serving application to use an OpenSearch instance both as the data store and for serving web requests.
Consequences
1. Simplified data pipeline
- Direct ingestion of JSONL format without complex transformations
- Ability to run multiple imports per day, reducing data staleness
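- Native typed fields, including an `ip` type for IP addresses (dynamic mapping indexes strings as text/keyword, so the `ip` type needs an explicit mapping or dynamic template)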
- Built-in handling of updates and merges
- Built-in support for historical record querying
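A minimal sketch of direct JSONL ingestion with merge-on-import follows. It assumes each feed record carries an `ip` field that can serve as the document `_id`, and uses a hypothetical index name `ipoid`. Feed lines are converted into the newline-delimited body expected by the OpenSearch `_bulk` API, using `update` actions with `doc_as_upsert` so that a re-import updates existing documents instead of duplicating them.

```python
import ipaddress
import json
from typing import Iterable, Iterator


def jsonl_to_bulk_upserts(lines: Iterable[str], index: str = "ipoid") -> Iterator[str]:
    """Yield OpenSearch _bulk action/body lines that upsert each JSONL record.

    Assumes each record has an "ip" field (hypothetical) used as the _id.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines in the feed
        record = json.loads(raw)
        # Canonicalize the address (e.g. "2001:0db8::0001" -> "2001:db8::1")
        # so different spellings of one IP map to the same document.
        doc_id = str(ipaddress.ip_address(record["ip"]))
        yield json.dumps({"update": {"_index": index, "_id": doc_id}}) + "\n"
        # doc_as_upsert creates the document if it is missing; otherwise the
        # given fields are merged into the existing document.
        yield json.dumps({"doc": record, "doc_as_upsert": True}) + "\n"


def build_bulk_payload(lines: Iterable[str], index: str = "ipoid") -> str:
    """Join the lines into one NDJSON body; _bulk requires a final newline."""
    return "".join(jsonl_to_bulk_upserts(lines, index))


if __name__ == "__main__":
    feed = ['{"ip": "192.0.2.1", "tags": ["VPN"]}', '{"ip": "2001:0db8::0001"}']
    print(build_bulk_payload(feed), end="")
```

Keying documents by IP is one design choice among several; keeping one document per snapshot would preserve more history at the cost of index size. Note that a partial update merges object fields but replaces array fields wholesale.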
2. Querying
- Ability to track historical data through the `last_seen` field
- Flexible querying using the OpenSearch query DSL
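As a sketch of history-aware querying, the function below builds a query DSL body that combines an IP term filter with a range filter on the `last_seen` field. The `ip` field name, the default 30-day window, and the result size are assumptions for illustration, not values taken from IPoid.

```python
import json


def seen_since_query(ip_or_cidr: str, since: str = "now-30d", size: int = 100) -> dict:
    """Query DSL body: records for an address or range seen at or after `since`."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"ip": ip_or_cidr}},
                    # Date math such as "now-30d" is resolved by OpenSearch;
                    # an absolute date like "2025-01-01" also works.
                    {"range": {"last_seen": {"gte": since}}},
                ]
            }
        },
        # Most recently seen records first.
        "sort": [{"last_seen": {"order": "desc"}}],
    }


if __name__ == "__main__":
    print(json.dumps(seen_since_query("198.51.100.0/24"), indent=2))
```

Placing both clauses in `filter` context skips relevance scoring and makes them eligible for caching, which suits exact-match lookups.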
Risks
- Unclear path to hosting an OpenSearch instance at WMF; T362105 is currently deprioritized.