User:Bawolff/review/Newsletter

From mediawiki.org

phab:T110170 This review is of 60404a97575e8. Any line numbers refer to that revision.

This is review of Newsletter extension for conformance with MediaWiki standards, performance, and security. I'm not reviewing anything to do with User requirements, or architecture.

I've including things here that are suggestions to consider as well as things that are more critical. Generally stuff involving db or performance is more important than anything I say about the UI or any time I say "Would be nice". Stuff that's really important is marked (important).

Most of my comments are pretty brief, and don't really explain why or how. If you're confused by any of them, don't understand them, or have any other questions, don't hesitate to ask.


General[edit]

  • The code mostly follows mediawiki coding conventions, and is generally easy to read :)
  • What happens if the newsletter owner leaves (or disappears suddenly, has account blocked, etc). There perhaps needs to be someway to hand off ownership, possibly without the original owner's corporation.
  • Logging should probably be added for most events.
  • (important) How does someone delete a newsletter. How does one clean up after a vandal making a bunch of bogus newsletters
  • (very minor). Most of the php files have execute permissions on them. They probably should not.

Newsletter.php[edit]

  • My understanding is that this is a complete rewrite from the code Siebrand wrote. In that case the * @copyright Copyright Š 2013 Siebrand Mazeland should be changed. Yes Done
  • There is no extension.json file. Yes Done
  • Extension credits should include license. Yes Done

SQL[edit]

Preparing patches - Tinaj1234 014:16, 27 Aug 2015 (UTC)

  • Generally MediaWiki code uses column names that start with a short prefix of the table name, even for foreign keys in the table, which newsletter does not do. However, I don't think this is a hard rule for extensions (?)
  • The table defintions use REFERENCES keyword. Most of the MediaWiki mysql schema avoids foreign key constraints. I believe this is intentional, although I'm not sure the exact reason. You should double check with someone about using them.
  • (important) nl_newsletters needs a secondary index on nl_name - Yes Done
  • (important) nl_newsletters needs a secondary index on nl_owner_id - Yes Done
  • (important) nl_publishers primary index should be in the other order PRIMARY KEY (publisher_id, newsletter_id) not PRIMARY KEY (newsletter_id, publisher_id) - Yes Done
  • In nl_newsletters nl_desc VARCHAR (256) is a bit odd. 255 is the max varchar size where the overhead is only 1 byte. If you're going to have the field length be around 256 bytes long, you might as well make it 255. However I think nl_desc varbinary(767) would be a more appropriate size. If you don't care about having an index on the field (which you shouldn't need), you can make the size even bigger than 767 bytes. Remember also that in some languages, 1 character = 4 bytes. - Yes Done

Newsletter.hooks.php[edit]

onEchoGetDefaultNotifiedUsers[edit]

  • Queries to user table should be done in batches, as opposed to 1 by 1. Especially if there are a lot of subscribers.
  • How does this function scale. Will there ever be more than say 20,000 subscribers to a newsletter? More than 100,000 subscribers? Might have to put things like this in the job queue if the number of subscribers is large.

modules[edit]

  • console.log( data ). I think this should be mw.log. Also, i think this could be an error on old browsers without native console. Yes Done
  • You list a lot of dependencies:
                'jquery.cookie',
                'jquery.tabIndex',
                'mediawiki.jqueryMsg',
                'mediawiki.api',
                'jquery.confirmable'

As far as I can tell, only mediawiki.api is needed. Yes Done

api[edit]

  • All the module names end in the word api, which seems superfluous (none of the other modules use this naming convention).
  • (important) Need parameter self-documentation (Usually in i18n file I think)
  • (important) Needs to validate parameters
  • (important) Needs to be limited to post requests only Yes Done
  • (important) Needs to require a CSRF token - Yes Done
  • (important) Should limit to logged in users. Right now anons can subscribe to things, which is bad.
  • (important) newsletter_id has the type of string instead of integer (?) Other parameters have incorrect types too.
  • (important) No response given for success. It should return success to user.
  • Its very unclear how somebody would discover what id a newsletter has.
    • Would be nice to have an api module that could enumerate all newsletters, including details like id.

Special pages[edit]

  • It would be nice (Although possibly not critical) if all these worked without js.
  • in getFieldNames() $headers[$field] = $this->msg( "newsletter-header-$property" )->text(); - Anytime you programmaticly create a message name, you should include the message name in a comment, so that someone can find it using grep. Sometimes its important to be able to find out everywhere a specific message is used.
  • It might be nice to have a navigation header at the top possibly, linking together all the different newsletter special pages. In particular, Special:NewsletterManage should definitely link to Special:NewsletterCreate somewhere
  • It would be nice if all these special pages had their own section in Special:SpecialPages
  • In HTMLForm, if you have an i18n message, use label-message, not label

SpecialNewsletters.php[edit]

  • No submit button. Most users don't expect that adjusting a radio button would cause a subscription action without some other indication of success (Although the subscriber count does help indicate that it does). I think a submit button would be more inline with what users expect. However, this is perhaps something that it would be good to get wmf design team input on (Or whatever the equiv is post-reorg).
    • It would also be nice if newsletters you are subscribed to were highlighted somewhere (E.g. the table row becomes light blue or something)
    • I find putting the subscriber count in a disabled text box kind of odd. Why not just a normal table cell?
  • (important) Lots of static fields. For most of these, I think a private field would make more sense. Especially things like $subscribedNewsletterId, which depends on the specific instance of the object (As it depends on the user it was constructed with), and has the potential (albeit a very low potential given how MW usually works, but the lifetime of SpecialPage objects in a normal MW request is not something you should rely on) of becoming out of sync. (This applies to other special pages too)
    • (important) getSubscribedNewsletters, should be a public function, not a static one. It should probably simply return its data, instead of storing it in member fields of itself


                $resl = $dbr->select(
                        'nl_subscriptions',
                        array( 'newsletter_id' ),
                        array(),
                        __METHOD__
                );

                foreach( $resl as $row ){
                        $result = $dbr->selectRowCount(
                                'nl_subscriptions',
                                array(),
                                array( 'newsletter_id' => $row->newsletter_id ),
                                __METHOD__
                        );
                        self::$allSubscribedNewsletterId[] = $row->newsletter_id;
                        self::$subscriberCount[$row->newsletter_id] = $result;
                }
  • Two things about the previous code block:
    • Provided you're querying the total subscribers for all newsletters, it should be done as a single query, e.g. SELECT newsletter_id, count(*) from nl_subscriptions group by newsletter_id; (The separate queries might make more sense if you had a LIMIT clause to not do a full count of subscribers)
    • This has poor scalability. It requires a full table scan. This is only ok if the sum of the subscribers to all newsletters is small (Say less than 5000)
    • Doing it this way, doesn't take into account paging. You have to get the subscriber count of all newsletters, even if you're only displaying a subset of newsletters. It would be better to do this queries from the Pager code.
                                $res = $dbr->select(
                                        'nl_newsletters',
                                        array( 'nl_main_page_id' ),
                                        array( 'nl_name' => $value ),
                                        __METHOD__
                                );

                                $mainPageId = '';
                                foreach( $res as $row ) {
                                        $mainPageId = $row->nl_main_page_id;
                                }
  • Two things for this block of code:
    • If you're only interested in a single field, use $dbr->selectField instead.
    • Queries like this should really be batched, or better yet, done directly from getQueryInfo() using a join
return '<a href="' . $url . '">'. $value . '</a>';
  • Use Linker class instead. I would also suggest in the case where the main page id isn't a valid page, to simply not link instead of linking to the current page.
in_array( $this->mCurrentRow->nl_id, SpecialNewsletters::$allSubscribedNewsletterId ) ?
                                                SpecialNewsletters::$subscriberCount[$this->mCurrentRow->nl_id] : 0,
  • in_array is somewhat inefficient on large arrays. Not critical, but it would be better to do isset( SpecialNewsletters::$subscriberCount[$this->mCurrentRow->nl_id] ) ? SpecialNewsletters::$subscriberCount[$this->mCurrentRow->nl_id] : 0
  • (important) line 157, 147: $this->msg( 'newsletter-subscribe-button-label' ); This should have a ->parse() on the end. Additionally, you may want to consider using the <label> html element here.
        function isFieldSortable( $field ) {
                return false;
        }
  • Seems like nl_name should be sortable (So people can go in reverse order).
  • Not critical, as its mostly understandable from context, but it would be nice if each function had code comment blocks with @param...

SpecialNewsletterManage.php[edit]

  • The degenerate case where there are no newsletters to manage has a bunch of empty drop down boxes. This potentially could have a nicer ui in that case (e.g. An error message of the form, you can't add a publisher as you own no newsletters).
                $res = $dbr->select(
                        'nl_publishers',
                        array( 'newsletter_id' ),
                        array( 'publisher_id' => $this->getUser()->getId() ),
                        __METHOD__
                );

                foreach( $res as $row ) {
                        $newsletterIds[$row->newsletter_id] = $row->newsletter_id;
                }

                foreach( $newsletterIds as $value ) {
                        $resl = $dbr->select(
                                'nl_newsletters',
                                array( 'nl_name', 'nl_id' ),
                                array( 'nl_id' => $value ),
                                __METHOD__
                        );

                        foreach ( $resl as $row ) {
                                $newsletterNames[$row->nl_name] = $row->nl_id;
                        }
                }
  • These two queries should be combined using an inner join between the nl_publishers table and the nl_newsletters table. (As a general rule, if you're creating a query in a for loop, the query can likely be combined with another query, or failing that should perhaps be batched together)
  • (minor) line 40 $output->returnToMain(); - normally a return to main link is given after a user has completed an action, and is no longer likely to be on the special page. I'm not sure if the link belongs here.
                                $dbr = wfGetDB( DB_SLAVE );
                                $issueCount = $dbr->selectRowCount(
                                        'nl_issues',
                                        array( 'issue_id' ),
                                        array( 'issue_newsletter_id' => $newsletterId ),
                                        __METHOD__,
                                        array()
                                );
                                //inserting to database
                                $dbw = wfGetDB( DB_MASTER );
                                $rowData = array(
                                        'issue_id' => $issueCount + 1,
                                        'issue_page_id' => $pageId,
                                        'issue_newsletter_id' => $newsletterId,
                                        'issue_publisher_id' => $formData['publisher']
                                );
  • Multiple things with this block:
    • line 149: selectRowCount You should really only use selectRowCount if you have a LIMIT statement limiting the number of rows. Otherwise you should use count(*) directly [imho]. However in this particular case, I'm not sure you actually want to count rows (See below).
    • (important)Why not just make issue_id auto-increment? (But if there was some reason not to do that, you should use MAX( 'issue_id' ) instead of counting, both because that's more what you're after, and depending on indexes it's a lot more efficient)
    • The code as written has race conditions. Anytime you're inserting stuff into the database based on stuff already in the db, use $dbw for both the select, and the insert. Ideally you'd put the two things in a transaction ($dbw->begin()), but if you use auto increment in issue_id these concerns become unnecessary.
  • Line 146 isset( $newsletterId ). The isset should be a couple lines before at $newsletterId = $formData['issue-newsletter']. Iif $formData['issue-newsletter'] is not set, then assigning to the local variable would trigger a warning.
  • Line 167 (selecting nl_name). Should use $dbr->selectField() instead of select()
  • line 139 static function onSubmitIssue( $formData ) {. No need for this to be static. Should be fine as a public method. If its public instead of static, you can also stop using all the RequestContext::getMain(), which should be avoided.
  • (Tad nitpicky): !empty( $formData['newsletter-name'] ). Its generally better to use isset instead of !empty, since empty suppresses all errors, including fatal ones, so typos can lead to really confusing errors.
                                try {
                                        $dbww->insert('nl_publishers', $rowData, __METHOD__);
                                        RequestContext::getMain()->getOutput()->addWikiMsg( 'newsletter-new-publisher-confirmation' );

                                        return true;
                                } catch ( DBQueryError $e ) {
                                        return RequestContext::getMain()->msg( 'newsletter-invalid-username-error' );
                                }
  • I'm confused under what circumstances this error would happen. Before inserting to the database, you should verify that the given username is valid, and exists. An invalid username might trigger a fatal error on the $user->getId() line (if $user is false).
  • (important) Validation of form input seems to be somewhat missing. The select fields are automatically validated by HTMLForm, but the free text fields (and probably the hidden fields) should be validated before use. See also the validation-callback option of HTMLForm.
  • How does the table at the top scale? How likely is it that there will be say 500 newsletters? Perhaps only the newsletters where the user is a publisher or owner should be shown
  • There should be a rowspan attribute on cells in the first row of the table instead of just being blank. (I'm not exactly sure how easy this is to do with the HTMLForm class. Probably would involve going through results first by overriding preprocessResults(), to get how many rows each newsletter takes, and then accessing that in getCellAttrs)
  • From a UX prespective, the action column is kind of confusing for user's who are not owners of any newsletter that has other publishers
        function getDefaultSort() {
                return 'newsletter_id';
        }
  • You should order on a coumn that's unique (so paging works. Although the page's layout kind of assumes table doesn't page). I think you might need to override getIndexField() additionally. Not sure.
  • (important) Can blocked users still send out newsletters? They probably should not be able to
  • (very important) line 294 return $user->getName();, line 289 return $newsletterName; Run through htmlspecialchars() before outputting.
  • line 322 value' => 'Remove', - needs i18n
  • $this->msg( 'newsletter-owner-radiobutton-label' ); and $this->msg( 'newsletter-publisher-radiobutton-label' ) need ->parse() at the end.
                $info = array(
                        'tables' => array( 'nl_publishers' ),
                        'fields' => array(
                                'newsletter_id',
                                'publisher_id'
                        )
                );
  • This should be a more complex query instead of having it spread out over multiple queries. The $newsletterOwners stuff right after it (as well as the query on line 276 for the newsletter_id field), should be accomplished either by inner join to nl_newsletters, or as a separate query in preprocessResults method. If its a separate query, it should probably specify a where clause equal to all the newsletter_id's fetched in the main query instead of just doing all newsletters (for the case where paging takes place and we don't care about all newsletters). However I'd recommend the inner join. I also am unsure of the purpose of the array( 'DISTINCT' ) on line 257, as the results should already be unique afaict.
    • Similar, User::newFromId internally generates a new query on each row. If the number of rows is small that's fine, otherwise they should probably be batched in preprocessResults (or as an inner join and use User::newFromRow).
  • No way to delete a newsletter from the UI. This is probably something that should be do-able from the UI and not just the api.

SpecialNewsletterCreate.php[edit]

  • No error handling if newsletter main page doesn't exist.
  • After creating a newsletter, you're more likely going to want to go to Special:NewsletterManage, than you would want to return to main page.
  • 50 bytes might be a bit on the small size for newsletter name. Better to err on the side of caution. Keep in mind that 1 character = 4 bytes in some languages, so that's only about 12 letters in some languages.
  • No maxlength/byte limitting set on the name field or the description field.
  • (minor) If you have a value selected for the frequency select box, the freeform text box should disappear. See also HTMLSelectAndOtherField class
  • The frequency field uses maxlength, but should use jquery.byteLimit, as maxlength is measured in number of UTF-16 bytes, where db uses number of UTF-8 bytes.
  • (important) Putting the current user id as a hidden field seems unnecessary. Better just to get it directly when you need it.
  • (important) validate newsletter page name before using it. If someone puts in invalid page name (e.g. "I <3 newsletters"), causes fatal.
  • There should probably be a user right you need to have to create a newsletter.
  • line 89 'nl_name' => $formData['name'] (and similar). Any freeform length limited data coming from the user needs to be put through $wgContLang->truncate() [In case its too long, avoid dangling unicode code points].
  • Ideally you would also check in the validation phase if newsletter name already exists, so htmlform can handle the error on still keep all the other form fields (However, need both validation check and check at time of insert due to potential race condition)
  • line 104: use selectField() instead.
  • Are you aware that by using page_id for the issue main page, and newsletter main page that if someone moves the page to a new title, this will automatically update (probably a good thing all in all). However if someone deletes and undelete's the page, the reference will be lost (probably a bad thing). Its unclear, but it might be better to store that as a (namespace, page_title) pair.
  • line 82: needs i18n.