Extension talk:AWS



Can this extension also store files other than media files on AWS?

Summary by Edward Chernenko

All uploads (via Special:Upload) are stored in S3.

MavropaliasG (talkcontribs)

Hi, I would like to have MediaWiki store uploaded datasets, CSV files, and ZIP files (and in general any file) on AWS. Can I do this with this extension, or is it only for images (and videos)? Thanks

T0lk (talkcontribs)

If you can upload it to your wiki, this extension will put that file on S3. Does that help?

MavropaliasG (talkcontribs)

Thank you @T0lk, so this extension puts ALL uploads, regardless of their file type, on S3?

Ciencia Al Poder (talkcontribs)

Yes. This works for all uploads (from Special:Upload)

MavropaliasG (talkcontribs)

Thank you for the reply @Ciencia Al Poder. Can I somehow also integrate it with uploads through the visual editor? (i.e. when you edit a page with the visual editor and press Insert > Media > Upload?)

Ciencia Al Poder (talkcontribs)

AFAIK, it affects *all uploads* to the local wiki, no matter how they're uploaded (Special:Upload was just an example), since this is the common repository for the wiki and there's no way to choose between file repositories on upload.

MavropaliasG (talkcontribs)

Thank you for the information, much appreciated.

Summary by Edward Chernenko

Answered 1) why the recommended IAM permissions are not overly broad, 2) how to mark all uploaded S3 objects as non-public.

148.252.132.218 (talkcontribs)

In my S3 bucket permissions I am trying to enable the four "public access blocks". But when I do this, MediaWiki cannot access the bucket, and therefore the extension doesn't work.


Now, given that I am using an IAM policy, I understand the "public access blocks" should not be an issue. I saw in an AWS video that an overly permissive IAM policy is considered public by AWS (I couldn't find the video while writing this), so I think that is what is happening here: the recommended policy is surely overly permissive. Has anyone had this issue?


The recommended IAM policy is:

"Action": "s3:*",
"Resource": "arn:aws:s3:::<something>/*",

"Action": [ "s3:Get*", "s3:List*" ],
"Resource": [ "arn:aws:s3:::<something>" ]


Would you agree that the following permissions should work too (or am I missing any needed permissions)?

"Action": [ "s3:ListObjects", "s3:GetObject", "s3:PutObject" ],
"Resource": "arn:aws:s3:::<something>/*",

"Action": [ "s3:GetObject", "s3:ListObjects" ],
"Resource": [ "arn:aws:s3:::<something>" ]


Thank you

Edward Chernenko (talkcontribs)
  1. It's not overly permissive. You are meant to use a separate S3 bucket for images, and if the S3 bucket contains only images, then there is no extra security gained by permitting only certain operations. (With PutObject alone, a malicious user can effectively delete all files by overwriting them with zeroes, and you can't really restrict PutObject, while the very point of minimizing permissions is to reduce the possible impact of such a malicious user's actions.)
  2. Currently the extension also performs CopyObject (when an image is moved and/or deleted), DeleteObject and the like. You can find all API calls it uses by searching for client-> in the code.
  3. There is no guarantee that additional API calls won't be used in future versions. (That is, if some future change requires "GetObjectVersion", its addition won't be considered a breaking change, since the permission recommended in the README is Get* - but it will break setups where only GetObject is allowed.)
148.252.132.218 (talkcontribs)

Understood, and thank you Edward for the prompt response.


The problem I am having is that using those permissions means I cannot add the extra security layer of enforcing the four "public access blocks". While I appreciate that the risk and potential impact are low, I tend to tighten restrictions as much as possible.


For the moment I will add only the permissions used in the client-> calls.

Edward Chernenko (talkcontribs)
85.255.233.161 (talkcontribs)

Yes, I was referring to those Public Access Policies in your link.


I used to have the Public Access Policies blocked for all S3 buckets, both at the individual bucket level and at the account level, as a security setting. Using this extension with the recommended setup forces me to remove the account-level block and the individual block for the bucket used by the extension.


I couldn't find clear documentation about this issue specifically with regard to IAM. I did watch a YouTube video in which I recall someone from AWS saying something along the lines of "we don't like IAM roles that have *, so they might be considered public". And when I remove the public access policies block, the extension works correctly. So it seems to me that something in my setup is being 'considered public'.


Other IAM and bucket policies I am using do not create this issue on other buckets. For example, if I turn on the public access policies block on the bucket used by MediaWiki, CloudFront will still serve images from that particular S3 bucket. So CloudFront, ironically, has no 'public permissions' despite effectively making the entire bucket publicly readable by serving all of its contents through a subdomain...


In the link you provided, there is a section called "The meaning of 'public'", which discusses the issue of granting * for some elements of a bucket policy.


It's really a pain, but to keep the desired security setting of globally denying all Public Access Policies for S3 buckets, I would like to find a solution.


I was planning to do some testing with more restricted permissions to see if that solves it. Maybe restricting the Actions, or restricting access to my VPC only, for instance?

Edward Chernenko (talkcontribs)

The only reason why your S3 bucket is considered "public" is that any visitor of a non-private wiki can see/download the images that you have uploaded to it. This is by design (it's supposed to be public). It has nothing to do with the IAM permissions that you mentioned above.

If you have a private wiki, then Extension:AWS marks all uploaded S3 objects with "private" ACL, so they are not accessible regardless of what you write in IAM.

In short, you are trying to solve a nonexistent problem. Since it has nothing to do with this extension, I cannot provide further support on this matter.

85.255.233.161 (talkcontribs)

[Note: _dot_ and _colon_ are used below to circumvent the ⧼abusefilter-warning-linkspam⧽]


It is not related to the images being accessible by any viewer of the non-private wiki. As I mentioned above:


"Cloudfront, ironically, has no 'public permissions' despite effectively making the entire bucket publicly readable via serving all of its contents through a subdomain..."


Why does the fact that any visitor can view/download the images not conflict with the S3 buckets not being considered "public"? Because of the following configuration:

  1. set $wgAWSBucketDomain = 'img_dot_mysite_dot_com'; as indicated in the extension's README
  2. set CloudFront to serve the images from my bucket at 'img_dot_mysite_dot_com'
  3. set a bucket policy on the S3 bucket to allow CloudFront to serve the images, as follows:


{
    "Version": "2008-10-17",
    "Id": "PolicyForCloudFrontPrivateContent",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::cloudfront:user/XXXXXXXXXXXXX"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::XXXXXXXXXXXXXXX/*"
        }
    ]
}


Here the buckets are not considered public with regard to the four "public access blocks" for S3, but CloudFront is granted access through the bucket policy. That read access is well isolated from any other S3 access, e.g. Put. Additionally, it adds the extra layer of security of having the four public access blocks enabled. This is our desired security setting.


And if I use the extension with the four "public access blocks" enabled, viewing the images in the wiki is not a problem. Any viewer of the non-private wiki can view the images, since they are served by AWS CloudFront.


The problem comes when the EC2 server tries to write to the bucket. Why? It seems to me that the IAM policy is "considered public" by S3, and therefore I get an error such as the following:


Warning: doCreateInternal: S3Exception: Error executing "PutObject" on "https_colon_//s3_dot_amazonaws_dot_com/XXXXXXXXXX/thumb/XXXXXXXXXXXXXXXXX.jpg/120px-XXXXXXXXXXXXXXXXX.jpg"; AWS HTTP error: Client error: `PUT https_colon_//s3_dot_amazonaws_dot_com/XXXXXXXXXXXXXXXXX/thumb/XXXXXXXXXXXXXXXXX.jpg/120px-XXXXXXXXXXXXXXXXX.jpg` resulted in a `403 Forbidden` response: AccessDeniedAccess DeniedXXXXXX (truncated...) AccessDenied (client): Access Denied - AccessDeniedAccess Denied XXXXXXXXXXXXXXXXX/XXXXXXXXXXXXXXXXX/XXXXXXXXXXXXXXXXX/seo= in /var/www/html/w/extensions/AWS/s3/AmazonS3FileBackend.php on line 1117

Kris Ludwig (talkcontribs)

The IAM policy is not too broad.

The reason you are getting that error is likely not that the IAM policy is considered public by S3, but that the extension is trying to PutObject with ACL = public-read:

e.g. AmazonS3FileBackend.php:347: 'ACL' => $this->isSecure( $container ) ? 'private' : 'public-read',

Can you try the following in LocalSettings.php?

$wgFileBackends['s3']['privateWiki'] = true;
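
For context, a hedged sketch of how this flag sits alongside other settings mentioned elsewhere in this discussion (region, credentials, bucket domain); all values are placeholders, not an authoritative configuration:

$wgAWSRegion = 'eu-west-1';
$wgAWSCredentials = array(
    'key'    => 'yourkey',
    'secret' => 'yoursecret'
);
$wgAWSBucketDomain = 'img.example.com'; // hypothetical CloudFront alias serving the bucket
// Upload objects with ACL 'private' instead of 'public-read' (see the isSecure() line above):
$wgFileBackends['s3']['privateWiki'] = true;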

185.69.145.145 (talkcontribs)

Seems to work.


Thank you!

Has anyone gotten this to work with $wgUseSharedUploads?

T0lk (talkcontribs)

I have a wiki-family setup and I installed this extension on my shared media repository (similar to commons.wikimedia.org); however, none of the other wikis (for example, en.wikipedia, fa.wikipedia) could generate thumbnails on demand anymore. Images and thumbnails that existed before the migration loaded from S3 without a problem. Thumbnails are generated in S3 when the media repository wiki is the one making the request, just not when the other wikis in the family make the request. These are the shared upload settings I'm using:


$wgUseSharedUploads = true;
//$wgSharedUploadPath = 'https://example.com/images'; # old setting
$wgSharedUploadPath = 'https://examplebucket.s3.amazonaws.com'; # new setting
$wgHashedSharedUploadDirectory = true;
$wgSharedUploadDirectory = "images";
$wgFetchCommonsDescriptions = true;
$wgSharedUploadDBname = 'example_wikidbcommons';
$wgSharedUploadDBprefix = '';
$wgRepositoryBaseUrl = "https://example.com/File:";


This is the equivalent code using $wgForeignFileRepos:

$wgForeignFileRepos[] = [
    'class' => 'ForeignDBRepo',
    'name' => 'mywiki',
    //'url' => "https://example.com/images", # old
    'url' => "https://examplebucket.s3.amazonaws.com", # new
    'directory' => 'images',
    'hashLevels' => 2, // This must be the same for the other family members
    'dbType' => $wgDBtype,
    'dbServer' => $wgDBserver,
    'dbUser' => $wgDBuser,
    'dbPassword' => $wgDBpassword,
    'dbFlags' => DBO_DEFAULT,
    'dbName' => 'example_wikidbcommons',
    'tablePrefix' => '',
    'hasSharedCache' => false,
    'descBaseUrl' => 'https://example.com/File:',
    'fetchDescription' => true
];

I'm not sure what else I should be changing; $wgSharedUploadPath says "Thumbnails will also be looked for and generated in this directory."

T0lk (talkcontribs)

I was able to resolve the problem. The solution is to modify the $wgForeignFileRepos entry I posted above as follows (see the sketch after the two changes):

Change: 'name' => 'local',

Add: 'backend' => 'AmazonS3',
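
For reference, a hedged sketch of the resulting repo entry with those two changes applied, reusing the same placeholder names as in the config posted above (a restatement of what worked for me, not an authoritative configuration):

$wgForeignFileRepos[] = [
    'class' => 'ForeignDBRepo',
    'name' => 'local',        // changed from 'mywiki'
    'backend' => 'AmazonS3',  // added: use the S3 file backend registered by Extension:AWS
    'url' => "https://examplebucket.s3.amazonaws.com",
    'directory' => 'images',
    'hashLevels' => 2, // must match the repository wiki
    'dbType' => $wgDBtype,
    'dbServer' => $wgDBserver,
    'dbUser' => $wgDBuser,
    'dbPassword' => $wgDBpassword,
    'dbFlags' => DBO_DEFAULT,
    'dbName' => 'example_wikidbcommons',
    'tablePrefix' => '',
    'hasSharedCache' => false,
    'descBaseUrl' => 'https://example.com/File:',
    'fetchDescription' => true
];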


Amazon EFS drive and mount it to $wgUploadDirectory.

Donxello (talkcontribs)

You mention: "Instead of using Amazon S3 (and this extension), you can create an Amazon EFS drive and mount it to $wgUploadDirectory. It's recommended for small wikis".

I have created an EFS drive and mounted it to the instance where the wiki is running.

But how can I make this EFS drive the upload directory in LocalSettings.php? I have a DNS name for the EFS, but that's it.

Thanks Tristan

Edward Chernenko (talkcontribs)

Just mount it to /path/to/your/mediawiki/images, that's it.
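
For concreteness, a minimal sketch assuming the EFS filesystem ID, region and paths below are placeholders for your own values; the mount itself is an OS-level step, and no LocalSettings.php change is needed at all if you mount over the default images/ directory:

// On the EC2 instance (shell, not PHP), mount the EFS filesystem over the upload directory, e.g.:
//   sudo mount -t nfs4 -o nfsvers=4.1 fs-XXXXXXXX.efs.eu-west-1.amazonaws.com:/ /path/to/your/mediawiki/images
// In LocalSettings.php, only needed if you mounted EFS at a non-default location:
$wgUploadDirectory = '/mnt/efs/mediawiki-images'; // hypothetical mount point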

Donxello (talkcontribs)

This time it worked; the other day it did not.

We run several wikis and your extension is great, but will you keep maintaining it? What is the plan for the future?

Anyway, highly appreciate your work so far!

Best regards.

Edward Chernenko (talkcontribs)

1) If you use EFS, you don't need this extension. Mounting EFS works with MediaWiki out-of-the-box. 2) No particular plans (I don't use it on my own wikis). But feature-wise, this extension has 99% of what's needed, plus it's covered with automated tests and all.


JSON for IAM Policy update

HyverDev (talkcontribs)

Been looking at this, and it seems the JSON for the IAM role isn't correct anymore. Maybe Amazon changed their policy grammar since the original entry. This is what I have got to:

{
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::<something>/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": "arn:aws:s3:::<something>"
        }
    ]
}
Edward Chernenko (talkcontribs)

Nothing changed. The example in the article was always meant to be inserted into the Statement array. It is not a "replace your IAM inline policy with this" example, because an IAM inline policy may already exist (and contain other rules that shouldn't be overwritten).
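
To illustrate, here is a hedged sketch of a full policy document with the two recommended statements inserted into an existing Statement array; the first statement and the bucket name are hypothetical placeholders for rules you may already have:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SomePreExistingRuleThatMustBeKept",
            "Effect": "Allow",
            "Action": "ec2:DescribeInstances",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::<something>/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": "arn:aws:s3:::<something>"
        }
    ]
}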


Can an existing bucket be updated with new images from an /images folder?

164.144.55.1 (talkcontribs)

We have been using this extension with success in our non-production wiki, although production still uses the original /images folder on its server filesystem. Now that we are preparing to promote our non-prod system to production, we need a way to trigger this extension's "copy files from /images to the S3 bucket" process with the new images that have been uploaded to our production system over the last few weeks. Ideally it would recognize images that already exist in the bucket and only copy new ones over, but we would be fine with clearing the bucket and triggering another full load, similar to what happened the first time this extension was installed and run.


I've replaced the old /images folder with a new one containing a few new images, but the AWS extension doesn't seem to notice them or upload them to S3. Is it possible to trigger this manually?


Thank you for maintaining this extension! We're using RDS now as well, and are very excited to have stateful data removed from our server to take advantage of better failover/autoscaling.

Edward Chernenko (talkcontribs)

You should copy these files from the local directory to the S3 bucket manually, e.g. via the "aws s3 sync" command from the AWS CLI. When this extension is enabled, the local directory is completely ignored by MediaWiki, so images in the local directory won't be detected or automatically moved.

164.144.55.1 (talkcontribs)

Thanks! I can try that.


Quick follow up question:

Will this handle thumbnails properly? In the local /images dir, thumbnail file paths appear to be separated into numbered subfolders with more nested numbered subfolders, like this:

thumb/
  1/
    10/
      chart.jpg/
        121px-chart.jpg
        161px-chart.jpg
        180px-chart.jpg
        300px-chart.jpg
      cat.jpg/
        121px-cat.jpg
        161px-cat.jpg
        180px-cat.jpg
        300px-cat.jpg

Does the extension perhaps rework the thumbnail creation process to create these thumbnails after the originals end up in S3? (In which case I wouldn't need to worry about uploading thumbnails at all, and could just aws s3 sync the original images and let thumbnail creation happen on its own.)

164.144.55.1 (talkcontribs)

Sorry about that formatting. Visual editor didn't maintain my indents in those code block lines.

Edward Chernenko (talkcontribs)

You should set

$wgAWSRepoHashLevels = 2;
$wgAWSRepoDeletedHashLevels = 3;

to maintain the same naming (with 1/10/cat.jpg instead of just cat.jpg) as in the local directory.


Proposal to archive this extension

Summary by Edward Chernenko

Old topic, related to a previous version of this extension. As of 10 Oct 2018, the extension is maintained again.

MarcoAurelio (talkcontribs)
Jkmartindale (talkcontribs)

Might as well

Is AWS still a valid extension?

Summary by Edward Chernenko

Topic archived as no longer relevant: as of 10 Oct 2018, the extension is maintained.

Ahancie (talkcontribs)

The latest topics in this discussion are at least a year old.  Does that mean that there is a better solution than using this extension?

Edward Chernenko (talkcontribs)

Do you only need the Amazon S3 part? (to store images in S3)

  1. I have a stable fork of S3-related functionality of Extension:AWS. It has been used in production for more than a year. If you want, I can send you the current state of the code.
  2. I did send patches to Extension:AWS, but some very important bugfixes which make it stable (e.g. https://gerrit.wikimedia.org/r/#/c/255534/ ) are sadly not yet merged into Extension:AWS.
  3. Unfortunately, the maintainer seems to be unresponsive, and I can't contact him.

I don't know what to do here. Edward Chernenko (talk) 00:55, 29 December 2016 (UTC)

Edward Chernenko (talkcontribs)
Ahancie (talkcontribs)

Thank you for sharing this with me.

Florianschmidtwelzow (talkcontribs)

@Edward Chernenko Do you perhaps want to publish the extension/fork on mediawiki.org, and perhaps also host it in Gerrit? (That would bring the benefit that core developers can check it when removing deprecated code which may be used by your extension :)

Edward Chernenko (talkcontribs)

Sure, why not.

Will create the page for it (and move it to Gerrit) in a week or so.

Invalid queue name for Semantic Jobs

Summary by Edward Chernenko

Archived, no longer relevant: the current version of the extension (as of 10 Oct 2018) no longer provides SQS features (only S3 features).

Karlpietsch (talkcontribs)

If you have Semantic MediaWiki enabled, there is a job queue named SMW\UpdateJob. The backslash is invalid in SQS queue names, so you will need to change the code to normalise the queue names and replace the backslash with something else (a sketch follows the error below):

MWException from line 116 of /var/www/html/w/extensions/AWS/sqs/JobQueueAmazonSqs.php: Amazon SQS error: Error: Can only include alphanumeric characters, hyphens, or underscores. 1 to 80 in length
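
A hypothetical sketch of that normalisation, applied before the queue name is passed to SQS (only of historical interest, since the current version of the extension no longer includes the SQS job queue):

// Replace characters that are invalid in SQS queue names,
// e.g. the backslash in "SMW\UpdateJob" becomes an underscore: "SMW_UpdateJob"
$queueName = preg_replace( '/[^a-zA-Z0-9_-]/', '_', $queueName );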

Exception & cannot create directory

Summary by Edward Chernenko

archived, not applicable to the current version of extension (as of 10 Oct 2018)

KraizeeM (talkcontribs)

On all pages of the wiki, with the AWS extension enabled, I get this error at the very bottom of the page:

Exception encountered, of type "Aws\Common\Exception\InvalidArgumentException"

And then when attempting to upload I get this:

Could not create directory ‘mwstore://AmazonS3/local-public/2/26’.

My config is in this format:

require_once("$IP/extensions/AWS/AWS.php");

$wgFileBackends['s3'] = array(
    'name'        => 'AmazonS3',
    'class'       => 'AmazonS3FileBackend',
    'lockManager' => 'nullLockManager',
    'awsKey'      => '****',
    'awsSecret'   => '****',
    'awsRegion'   => 'eu-west-1',
    'containerPaths' => array(
        'wiki_id-local-public'  => 'https://publicbucketname.s3-website-eu-west-1.amazonaws.com/',
        'wiki_id-local-thumb'   => 'https://thumbbucketname.s3-website-eu-west-1.amazonaws.com/',
        'wiki_id-local-deleted' => 'https://deletedbucketname.s3-website-eu-west-1.amazonaws.com/',
        'wiki_id-local-temp'    => 'https://tempbucketname.s3-website-eu-west-1.amazonaws.com/',
    )
);

$wgLocalFileRepo = array(
    'class'           => 'LocalRepo',
    'name'            => 'local',
    'backend'         => 'AmazonS3',
    'scriptDirUrl'    => $wgScriptPath,
    'scriptExtension' => $wgScriptExtension,
    'url'             => $wgScriptPath . '/img_auth.php',
    'zone'            => array(
        'public'  => array( 'container' => 'public' ),
        'thumb'   => array( 'container' => 'thumb' ),
        'temp'    => array( 'container' => 'temp' ),
        'deleted' => array( 'container' => 'deleted' )
    )
);

Karlpietsch (talkcontribs)

try this:

$wgAWSRegion = 'eu-west-1';

$wgFileBackends['s3']['containerPaths'] = array(
    'wiki_id-local-public' => 'publicbucketname',
    'wiki_id-local-thumb' => 'thumbbucketname',
    'wiki_id-local-deleted' => 'deletedbucketname',
    'wiki_id-local-temp' => 'tempbucketname'
);

// Make MediaWiki use Amazon S3 for file storage.
$wgLocalFileRepo = array (
    'class'             => 'LocalRepo',
    'name'              => 'local',
    'backend'           => 'AmazonS3',
    'scriptDirUrl'      => $wgScriptPath,
    'scriptExtension'   => $wgScriptExtension,
    'url'               => $wgScriptPath . '/img_auth.php',
    'zones'             => array(
        'public'  => array( 'url' => 'http://publicbucketname.s3-eu-west-1.amazonaws.com/' ),
        'thumb'   => array( 'url' => 'http://thumbbucketname.s3-eu-west-1.amazonaws.com/' ),
        'temp'    => array( 'url' => 'http://tempbucketname.s3-eu-west-1.amazonaws.com/' ),
        'deleted' => array( 'url' => 'http://deletedbucketname.s3-eu-west-1.amazonaws.com/' )
    )
);
KraizeeM (talkcontribs)

Where should the key/secret go with that?

Karlpietsch (talkcontribs)

just add

$wgAWSCredentials = array(
    'key' => 'yourkey',
    'secret' => 'yoursecret'
);

KraizeeM (talkcontribs)

If I change the config to that, I get this error:

Warning: Cannot modify header information - headers already sent by (output started at /var/www/html/wiki/includes/OutputPage.php:2322) in /var/www/html/wiki/includes/WebResponse.php on line 37

KraizeeM (talkcontribs)