Fundraising tech/Failmail zoo

A guide to the species of failmail encountered in our systems

Fail Mail (civi1001) run-job: XXX failed with code 1
This phylum of failmail is generated by the process-control script. The subject line indicates which job failed, and with which exit code, but for any other details you will have to log on to civi1001 and read the log indicated in the body of the failmail.

Get recipient data from Silverpop failed with code 1
This happens almost daily with error message 'Login Failed'. Should we be retrying logins? Or is this usually during Silverpop's planned maintenance?

Fredge Multiqueue Consumer failed with code 1
This indicates either a DB access problem, or a coding error on our part. Look in the log referenced in the failmail. If it's a DB access issue, check with FR-ops and turn off the fredge queue consumer if necessary.

If it's an issue with one of the messages, like a badly formatted or missing field, determine the source of the message from the source_ fields and fix the offending component.

Error validating Amazon message. Firewall problem?
Our IPN listener receives messages from Amazon Simple Notification Services when donors send an Amazon Pay donation. Each notification contains a URL for a certificate, and part of the Amazon Pay SDK's logic retrieves that cert and uses it to check the signature. This failmail means something went wrong getting the cert.

There is an IP address in this failmail, but it's not the one we're trying to connect to and failing. The IP logged for all Amazon SNS messages is the SOURCE of the notification, not the IP address we try to contact to get the cert. To find the offending IP address, you will have to ssh into thulium / frpig1001 and try to fetch the cert from the URL mentioned in the log.

For example:

Fundraising ops will have to make sure that IP address is whitelisted in IPtables on thulium / frpig1001.

INVALID_MESSAGE Cannot create contribution, civi error!
This is usually due to a DB lock issue. Our queue consumer creates the contact and their contribution within a single transaction involving upwards of a dozen tables. Depending on what else is happening in Civi, this can result in a deadlock and the contact insert can fail. In this case, the queue message is generally shunted to the 'damaged' table. Follow the link at the bottom of the failmail and click 'resend' at the bottom of the damaged message UI to try the insert again.