User:Dantman/SpamDB
Idea: Write a service that accepts submissions of spam and indexes them in a large database. In the future this service could be used to build some sort of filter to judge the spamminess of a wiki edit.
Suggested stack of technology:
- Python, with a gevent-based server so it can handle a large number of concurrent requests.
- Riak for storage of the data blobs.
- Another database engine (to be determined) for storing indexes and other data we may want to iterate over.
Every spam entry will have a short document or row. The typical data pieces will be title and text. Each of these is mapped to a SHA hash used as a key for blob storage; looking the key up in Riak retrieves the original data for use.
{
    "title": "[...shasum...]",
    "text": "[...shasum...]"
}
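A minimal sketch of how such an entry could be built, assuming SHA-256 as the hash and using a plain dict as a stand-in for the Riak blob store (the helper names here are hypothetical, not part of any Riak client API):

```python
import hashlib

# In-memory stand-in for the Riak blob store; a real deployment
# would write through a Riak client instead.
blob_store = {}

def store_blob(data):
    """Store a blob keyed by its SHA-256 hex digest and return the key."""
    key = hashlib.sha256(data.encode("utf-8")).hexdigest()
    blob_store[key] = data
    return key

def make_entry(title, text):
    """Build the short spam document: each field maps to a blob key."""
    return {
        "title": store_blob(title),
        "text": store_blob(text),
    }

entry = make_entry("Cheap watches!!!", "Buy now, limited offer...")
print(entry)
```

Because the key is derived from the content, identical spam payloads submitted many times deduplicate to a single blob automatically.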
Though, since we'll want to separate the kinds of data stored in Riak without losing flexibility in our document keys, we may want to store values as something like "riak:title/[...shasum...]".
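A sketch of what building and parsing such namespaced references might look like, again assuming SHA-256 (the function names are illustrative only):

```python
import hashlib

def blob_ref(namespace, data):
    """Build a namespaced blob reference like 'riak:title/<sha256>'.

    The 'riak:' prefix keeps document keys flexible: another backend
    or key layout could be added later without changing the document
    format, since the reference itself says where the blob lives.
    """
    digest = hashlib.sha256(data.encode("utf-8")).hexdigest()
    return "riak:%s/%s" % (namespace, digest)

def parse_ref(ref):
    """Split a reference back into (store, namespace, digest)."""
    store, rest = ref.split(":", 1)
    namespace, digest = rest.split("/", 1)
    return store, namespace, digest

ref = blob_ref("title", "Cheap watches!!!")
print(ref)
```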