Wikimedia Labs/Reverse proxy for web services

THIS IS COMPLETED.

Currently, we need to allocate a public IP address to every project that hosts a web server. This is wasteful: in general, all of these services could be proxied behind a single IP address. Instead, we should have a proxy service that lets users associate a proxy with an instance, or with a list of instances (for load balancing).

Spec

We should make a RESTful, OpenStack-like API that allows for the following actions:

  1. Create proxy service
    1. Host name of proxy service
    2. From port
    3. To port
    4. Type of load balancing
    5. Enable SSL (optional - v2?)
      • If SSL is enabled, which certificate to use, or upload a new certificate
      • This is likely hard to implement, as it would require an IP address per certificate unless a wildcard ("star") certificate were used.
      • Also, certificates would need to be per-project, or global (for private clouds), which would complicate matters further.
    6. Enable caching (optional - v2?)
      • This is likely hard to implement, as it would require letting the user supply caching rules.
      • nginx appears to have a module for selective purging.
      • Varnish supports selective purging natively.
      • As long as the backend supports an "allow purge from" list of IP addresses, or a secret key, this would be possible: we could add the backends to the allow-from list, or share the secret key with the end user.
  2. Associate proxy service with instances
  3. Disassociate proxy service from instances
  4. Delete proxy service
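
The four actions above can be sketched as a small in-memory model. This is only an illustration of the data the API would need to track, not a proposed implementation: the names `ProxyService` and `ProxyRegistry`, the `"round-robin"` default, and the choice to key services by host name are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyService:
    """One proxy service, mirroring the fields listed in the spec."""
    host_name: str
    from_port: int
    to_port: int
    load_balancing: str = "round-robin"  # hypothetical value name
    instances: set = field(default_factory=set)


class ProxyRegistry:
    """In-memory stand-in for the state behind the proposed API."""

    def __init__(self):
        self._services = {}  # host name -> ProxyService

    def create(self, host_name, from_port, to_port, load_balancing="round-robin"):
        # Assumption: the host name doubles as the unique service name.
        if host_name in self._services:
            raise ValueError(f"proxy service {host_name!r} already exists")
        service = ProxyService(host_name, from_port, to_port, load_balancing)
        self._services[host_name] = service
        return service

    def associate(self, host_name, instance):
        self._services[host_name].instances.add(instance)

    def disassociate(self, host_name, instance):
        self._services[host_name].instances.discard(instance)

    def delete(self, host_name):
        del self._services[host_name]

    def get(self, host_name):
        return self._services[host_name]
```

A REST layer would map each method to one endpoint; the optional SSL and caching settings are left out here because the spec defers them to a possible v2.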

Using nginx on Ubuntu as an example of the proxy service's environment: creating a proxy service would add a configuration file at /etc/nginx/sites-available/<proxy-name> and would create the host name. (Should the proxy service name simply be the host name? That would make it unique, which makes things easier.) When the service is associated with instances, it would modify /etc/nginx/sites-available/<proxy-name>, link it to /etc/nginx/sites-enabled/<proxy-name> (if not already linked), and reload nginx. When instances are disassociated, the service would modify /etc/nginx/sites-available/<proxy-name> to remove the instance, and would reload nginx. If the last instance is removed, it would also unlink /etc/nginx/sites-enabled/<proxy-name>. When a proxy service is deleted, /etc/nginx/sites-available/<proxy-name> and the associated host name would be deleted.
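
The file handling described above might look roughly like the following sketch. The function names `render_config` and `sync_proxy`, the `nginx_dir` parameter, and the shape of the generated configuration are assumptions for illustration; DNS changes and the nginx reload itself are deliberately left to the caller.

```python
import os


def render_config(proxy_name, from_port, to_port, instances):
    """Return nginx configuration text for one proxy service."""
    # Derive an upstream name from the host name (illustrative choice).
    upstream = proxy_name.replace(".", "_").replace("-", "_")
    servers = "\n".join(
        f"    server {addr}:{to_port};" for addr in sorted(instances)
    )
    return (
        f"upstream {upstream} {{\n{servers}\n}}\n\n"
        "server {\n"
        f"    listen {from_port};\n"
        f"    server_name {proxy_name};\n"
        "    location / {\n"
        f"        proxy_pass http://{upstream};\n"
        "    }\n"
        "}\n"
    )


def sync_proxy(proxy_name, from_port, to_port, instances, nginx_dir="/etc/nginx"):
    """Write the config and link or unlink the site; return True if enabled.

    The caller is expected to reload nginx after this returns.
    """
    available = os.path.join(nginx_dir, "sites-available", proxy_name)
    enabled = os.path.join(nginx_dir, "sites-enabled", proxy_name)

    with open(available, "w") as handle:
        handle.write(render_config(proxy_name, from_port, to_port, instances))

    if instances:
        # Link into sites-enabled only if not already linked.
        if not os.path.islink(enabled):
            os.symlink(available, enabled)
        return True

    # Last instance removed: an upstream with no servers is not valid,
    # so the site is unlinked rather than left enabled.
    if os.path.islink(enabled):
        os.unlink(enabled)
    return False
```

Calling `sync_proxy` after every associate or disassociate keeps the file and the link in step with the instance list; a real service would follow it with a reload such as `nginx -s reload`.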

Possible implementation issues

What happens when a proxy service is created with the same host name as one already attached to a floating IP address? Conversely, what happens when the DNS service later adds that same host name to a floating IP address?
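
One possible answer, offered only as a sketch, is to check for an existing claim on the host name before either side creates it. The function `find_conflict` and its two input collections are hypothetical; how the proxy service and the DNS service would actually share this information is an open question.

```python
def find_conflict(host_name, proxy_hosts, floating_ip_hosts):
    """Return which source already claims host_name, or None if it is free."""

    def normalize(name):
        # DNS names are case-insensitive and may carry a trailing dot.
        return name.rstrip(".").lower()

    wanted = normalize(host_name)
    sources = (("proxy", proxy_hosts), ("floating-ip", floating_ip_hosts))
    for source, names in sources:
        if wanted in {normalize(name) for name in names}:
            return source
    return None
```

Both the proxy API and the DNS service would need to run the same check for it to cover both questions.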

Dependencies

This also depends on having a reliable DNS service available, which should be included with OpenStack Nova's Essex release.

Alternatives to implementing this ourselves

There's an existing OpenStack project for this called Atlas [1] [2]. Unfortunately, the only implemented driver is for the Zeus load balancer, which is proprietary. They are also implementing an HAProxy backend. Another drawback of Atlas is that it is written in Java rather than Python.

Other suggestions

We need to add IPv6 support to pybal. Twisted currently doesn't support IPv6 in its stable releases. Maybe we could switch pybal to eventlet and, after doing so, extend it to act as an OpenStack-like service that supports LVS and HAProxy. There are several major benefits to doing so:

  1. We need to update pybal anyway, and this would be one step in that direction
  2. It would be awesome to be able to pool/depool realservers via an API call
  3. pybal supports BGP, which would allow us to have redundancy for the load balancers
  4. We'd be using the same service in production and in Labs, and would be using the same service for L2/3/7 services