Toolserver:Code snippets

Calculate replag
NB: This actually shows how long ago the last edit to the wiki was. If the wiki hasn't been edited for a while, it might appear the replag is higher than it really is, because the last edit was some time ago.
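
The age of the last edit can be worked out from a MediaWiki-style timestamp. What follows is a minimal sketch, not the original snippet from this page: the helper name age_of_timestamp is hypothetical, GNU date is assumed, and the timestamp is assumed to come from the newest rc_timestamp value in the recentchanges table.

```bash
# age_of_timestamp TS [NOW]  (hypothetical helper)
# Print the seconds elapsed between a YYYYMMDDHHMMSS timestamp (UTC) and NOW
# (epoch seconds, defaulting to the current time). Assumes GNU date (-d).
age_of_timestamp() {
    local ts now iso last
    ts=$1
    now=${2:-$(date -u +%s)}
    # Rewrite 20240101000000 as "2024-01-01 00:00:00" so date can parse it.
    iso=$(printf '%s\n' "$ts" | sed -E 's/^(.{4})(..)(..)(..)(..)(..)$/\1-\2-\3 \4:\5:\6/')
    last=$(date -u -d "$iso" +%s) || return 1
    echo $(( now - last ))
}

# Example: one minute after midnight UTC on 2024-01-01 (epoch 1704067260).
age_of_timestamp 20240101000000 1704067260
```

Called with a single argument, the helper compares the timestamp against the current time, which is the figure the note above describes.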

Select a server
See Queries for a description of the "sql" script.

Copying user database
You can dump your user database from sql and import it to sql-s1, sql-s2 and sql-s3 using: mysqldump -h sql u_username | mysql -h sql-s1 u_username

You can also produce a database dump during the process (can be useful to restore the database in case of problems): mysqldump -h sql u_username | tee u_username.sql | mysql -h sql-s1 u_username

Kill processes
To get the process ID (pid), see viewing processes. Once you have the pid, you can kill it: kill pid

If the process doesn't exit, kill it harder: kill -9 pid

But be aware that kill -9 doesn't give the process any opportunity to clean up.

If the process is running in the foreground (i.e. from the shell), you can also kill it with CTRL-C. If that doesn't work, try CTRL-\ (which, like kill -9, gives the process no opportunity to clean up).

To kill all your processes at once, including any and all ssh processes: pkill -u username

Calculate size of home
du -hs ~

Find folders taking up space
du --max-depth=1 | sort -k 1 -nr | head -n 10

Bash
These scripts and settings should be placed in your .bash_profile file. After updating the file, you need to source it in order for the changes to become active ($ source ~/.bash_profile).

The env command lists the variables in your current environment along with their values.

Setting a default editor
The default editor on nightshade is currently nano. If you want to change your personal default editor (for example, to use joe), use: export EDITOR=joe in ~/.bash_profile

MySQL queries
If you primarily run queries on one particular database, there are bash scripts that can make life easier. The "sql" script is great for selecting the appropriate server; however, it requires a lot of typing for commonly needed functions, and when a query finishes, you don't know how many results were output.

The "query" script runs an .sql file on the particular database and then prints the number of lines returned from the query.

The "gquery" script runs an .sql file on the particular database, prints the number of lines returned from the query, and then gzips the output.

Both scripts are invoked using the script name followed by the file name (without an extension), for example: query articles-ns-0
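
The two helpers described above might look roughly like the following. This is a sketch under stated assumptions, not the original scripts: the database name, the directory holding the .sql files, the output file naming, and the idea that "sql DBNAME" reads a query from standard input are all assumptions.

```bash
# Assumptions (not from the original page): the database name, the directory
# holding the .sql files, and that "sql DBNAME" reads a query on stdin.
QUERY_DB="${QUERY_DB:-enwiki_p}"
QUERY_DIR="${QUERY_DIR:-$HOME/queries}"

# query NAME: run NAME.sql, save the output to NAME.txt, print the line count.
query() {
    sql "$QUERY_DB" < "$QUERY_DIR/$1.sql" > "$QUERY_DIR/$1.txt" || return 1
    wc -l < "$QUERY_DIR/$1.txt"
}

# gquery NAME: same as query, then gzip the output file (NAME.txt.gz).
gquery() {
    query "$1" && gzip -f "$QUERY_DIR/$1.txt"
}

# Usage, once the functions are defined in ~/.bash_profile:
#   query articles-ns-0
```

The line count includes any header row the client prints, so it can be one higher than the number of result rows.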

Symlinks
If you frequently need to make files public from a particular directory, you can create a "mkpub" shortcut that symlinks the files into your public directory (public_html/).
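
One possible shape for such a shortcut is shown below. It is a sketch rather than the original snippet: the PUBLIC_DIR override variable is an assumption added for illustration, and files are linked from the current directory.

```bash
# mkpub FILE...  (sketch)
# Symlink each named file from the current directory into the public
# directory. PUBLIC_DIR is a hypothetical override; the default is
# ~/public_html as described in the text.
mkpub() {
    local pub f
    pub="${PUBLIC_DIR:-$HOME/public_html}"
    for f in "$@"; do
        ln -s -- "$PWD/$f" "$pub/$(basename -- "$f")" || return 1
    done
}

# Usage, from the directory that holds the files:
#   mkpub report.txt stats.csv
```

Using an absolute target ($PWD/...) keeps the link valid regardless of where the public directory lives.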

Simple regexp
This detects all IPv4 addresses made with 1 to 3 digits (including extra leading zeroes) per component (copied from http://www.geekzilla.co.uk/View0CBFD9A7-621D-4B0C-9554-91FD48AADC77.htm):
 * (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

Note that this will also match "000.000.000.000" or "010.01.001.1", which are NOT usable as valid hostnames in IPv4 address format; such names would instead have to be resolved to some unknown address via a DNS hostname lookup. Such pseudo-IP address formats are currently accepted as valid user account names on Wikimedia sites, but they are NOT used for anonymous users, who are identified just by their IPv4 address.

Improved regexp
This is a more restrictive version which detects only IPv4 addresses with a valid decimal format, without any extra leading zeroes in components:
 * (0|1[0-9]?[0-9]?|2|2[0-4][0-9]?|25[0-5]?|2[6-9]|[3-9][0-9]?)\.(0|1[0-9]?[0-9]?|2|2[0-4][0-9]?|25[0-5]?|2[6-9]|[3-9][0-9]?)\.(0|1[0-9]?[0-9]?|2|2[0-4][0-9]?|25[0-5]?|2[6-9]|[3-9][0-9]?)\.(0|1[0-9]?[0-9]?|2|2[0-4][0-9]?|25[0-5]?|2[6-9]|[3-9][0-9]?)

Note that extra parentheses could be used here to group the alternatives whose leading digit is "2", but this offers no benefit in terms of total expression length, and it would renumber the found subexpressions from \1, \2, \3, \4 above, into \1, \3, \5, \7 below:
 * (0|1[0-9]?[0-9]?|2([0-4][0-9]?|5[0-5]?|[6-9])?|[3-9][0-9]?)\.(0|1[0-9]?[0-9]?|2([0-4][0-9]?|5[0-5]?|[6-9])?|[3-9][0-9]?)\.(0|1[0-9]?[0-9]?|2([0-4][0-9]?|5[0-5]?|[6-9])?|[3-9][0-9]?)\.(0|1[0-9]?[0-9]?|2([0-4][0-9]?|5[0-5]?|[6-9])?|[3-9][0-9]?)

If you don't need the four separate components, and your regexp engine supports expression counters, you may also use this equivalent compact regexp where \2 is matched 3 times (once for each of the first three components), and \4 matches the fourth component:
 * ((0|1[0-9]{0,2}|2([0-4][0-9]?|5[0-5]?|[6-9])?|[3-9][0-9]?)\.){3}(0|1[0-9]{0,2}|2([0-4][0-9]?|5[0-5]?|[6-9])?|[3-9][0-9]?)
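
To see the difference between the simple and the restrictive IPv4 patterns, both can be tried against a few sample strings. The sketch below assumes a grep that supports extended regular expressions (-E) and whole-line matching (-x); the helper name is_match and the sample addresses are made up for illustration. The patterns must be anchored (here via -x), otherwise they also match substrings of longer input.

```bash
# One component of each pattern, copied from the expressions in the text.
o='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
s='(0|1[0-9]{0,2}|2([0-4][0-9]?|5[0-5]?|[6-9])?|[3-9][0-9]?)'

simple="${o}\\.${o}\\.${o}\\.${o}"   # allows extra leading zeroes
strict="(${s}\\.){3}${s}"            # compact form, no leading zeroes

# is_match PATTERN STRING: succeed only if the whole string matches.
is_match() {
    printf '%s\n' "$2" | grep -Eqx -e "$1"
}

for addr in 192.168.1.1 255.255.255.255 010.01.001.1 256.1.1.1; do
    if is_match "$simple" "$addr"; then r1=yes; else r1=no; fi
    if is_match "$strict" "$addr"; then r2=yes; else r2=no; fi
    printf '%-16s simple=%-3s strict=%s\n' "$addr" "$r1" "$r2"
done
```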

IPv6
The valid IPv6 address format is less restricted than the one for IPv4, because its use within URLs is possible ONLY within delimiting [square brackets], so such a host address will never be confused with a valid DNS hostname. For this reason, extra leading zeroes are accepted (provided that each component doesn’t have more than 4 hex digits). Additionally, the letter case of hex digits is not significant.

Here is a simple deterministic regexp matching only unabbreviated IPv6 addresses:
 * [0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){7}

Such an unabbreviated address may still be canonicalized to uppercase and to its shortest form (without unnecessary leading zeroes). The following regexp matches only this shortest unabbreviated form (which should be the one used for anonymous IPv6 users on MediaWiki sites, where the first character of user account names is forced to uppercase):
 * (0|[1-9A-F][0-9A-F]{0,3})(:(0|[1-9A-F][0-9A-F]{0,3})){7}
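
The canonicalization itself can be done with standard text tools. This is a minimal sketch that handles unabbreviated addresses only; the helper name canon_ipv6 is hypothetical, and sed -E (extended regexps) is assumed to be available.

```bash
# canon_ipv6 ADDR (hypothetical helper): uppercase the hex digits and strip
# leading zeroes from each component of an unabbreviated IPv6 address.
canon_ipv6() {
    printf '%s\n' "$1" | tr 'abcdef' 'ABCDEF' | sed -E 's/(^|:)0+([0-9A-F])/\1\2/g'
}

# Shortest-form pattern copied from the text above.
canonical='(0|[1-9A-F][0-9A-F]{0,3})(:(0|[1-9A-F][0-9A-F]{0,3})){7}'

addr=$(canon_ipv6 2001:0db8:0000:0000:0000:ff00:0042:8329)
echo "$addr"   # 2001:DB8:0:0:0:FF00:42:8329

# Check that the result matches the canonical pattern (whole string).
if printf '%s\n' "$addr" | grep -Eqx -e "$canonical"; then
    echo "canonical form"
fi
```

The sed expression always leaves one digit per component, so an all-zero component such as 0000 becomes 0 rather than disappearing.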

However, the complexity of the standard notation is that, in the abbreviated form, one or more successive 16-bit hex components among the eight (separated by colons) may be dropped, provided these components are all zero. ONLY ONE such run of components may be left empty, which results in at most one occurrence of the double colon "::". With this syntax, the total number of colons may be between 2 and 8 (instead of exactly 7), and there may be between 0 and 8 hexadecimal 16-bit numbers.

A deterministic regexp to match all abbreviated and unabbreviated IPv6 addresses follows (if you want to understand how it is structured, look at the wiki code source):
 * (::([0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,6})?|[0-9A-Fa-f]{1,4}:(:([0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,5})?|[0-9A-Fa-f]{1,4}:(:([0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,4})?|[0-9A-Fa-f]{1,4}:(:([0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,3})?|[0-9A-Fa-f]{1,4}:(:([0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,2})?|[0-9A-Fa-f]{1,4}:(:([0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4})?)?|[0-9A-Fa-f]{1,4}:(:([0-9A-Fa-f]{1,4})?|[0-9A-Fa-f]{1,4}:(:|[0-9A-Fa-f]{1,4}))))))))</tt>

For example, this regexp will accept "::", or "::0", or "0::", or "0::0", or the shortest unabbreviated format "0:0:0:0:0:0:0:0", or the full format "0000:0000:0000:0000:0000:0000:0000:0000" as they are all equivalent IPv6 addresses with a valid syntax. The strings "0:0:0:0:0:0:0:0::", or "00000000", or "::0::", or "0::0::0" will not be accepted as they are not in a valid IPv6 address format (too many colons, or missing colons with too many digits in a component, or ambiguous notation in the last two cases).
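
The accepted and rejected strings listed above can be checked mechanically. The sketch below rebuilds the same regexp piece by piece, one line per possible position of "::", which also makes its nested structure easier to follow. It assumes grep with -E and -x; the helper name is_ipv6 is hypothetical. The pattern must be anchored (here via -x), since an unanchored search would match a valid substring of the rejected strings.

```bash
h='[0-9A-Fa-f]{1,4}'   # one 16-bit component

# Each line handles "::" after one more leading component; the last line
# is the unabbreviated tail and closes all eight nested groups.
ipv6="(::(${h}(:${h}){0,6})?|${h}:"
ipv6="${ipv6}(:(${h}(:${h}){0,5})?|${h}:"
ipv6="${ipv6}(:(${h}(:${h}){0,4})?|${h}:"
ipv6="${ipv6}(:(${h}(:${h}){0,3})?|${h}:"
ipv6="${ipv6}(:(${h}(:${h}){0,2})?|${h}:"
ipv6="${ipv6}(:(${h}(:${h})?)?|${h}:"
ipv6="${ipv6}(:(${h})?|${h}:"
ipv6="${ipv6}(:|${h}))))))))"

# is_ipv6 STRING: succeed only if the whole string matches.
is_ipv6() {
    printf '%s\n' "$1" | grep -Eqx -e "$ipv6"
}

for a in '::' '::0' '0::' '0::0' '0:0:0:0:0:0:0:0' \
         '0:0:0:0:0:0:0:0::' '00000000' '::0::' '0::0::0'; do
    if is_ipv6 "$a"; then echo "accepted: $a"; else echo "rejected: $a"; fi
done
```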

It is still preferable to canonicalize all abbreviated IPv6 addresses into the shortest unabbreviated form with a single letter case, as shown above, notably if the address is used as the default user name (or talk page name) for users contributing without being logged in to their own named account, since it will be recorded in edit histories. (More information would be welcome from MediaWiki developers about which canonical format they will use as the default user name for IPv6 users editing pages without being logged in, or for the private webserver logs.)

'Unit' testing
If you need to check various pieces of data against multiple tests and see which data passes each test, a quick and dirty way to do that is like so:
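
As an illustration of the idea in shell (a sketch only, not the original snippet), each test can be a function that returns success or failure, and a loop can report which items pass each one. The test names is_numeric and is_short and the sample data are made up for this example.

```bash
# Hypothetical tests: each returns 0 (pass) or non-zero (fail).
is_numeric() { case $1 in ''|*[!0-9]*) return 1 ;; esac; }
is_short()   { [ "${#1}" -le 3 ]; }

# run_tests DATA...: for each test, print the items that pass it.
run_tests() {
    local t d
    for t in is_numeric is_short; do
        printf '%s:' "$t"
        for d in "$@"; do
            if "$t" "$d"; then printf ' %s' "$d"; fi
        done
        printf '\n'
    done
}

run_tests 42 abc 12345 hello
# is_numeric: 42 12345
# is_short: 42 abc
```

Adding a test means writing one more function and adding its name to the loop.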

Dealing with UTF-8
One problem you might run into, especially when dealing with non-English wikis, is that page titles can contain UTF-8. Perl uses an internal flag to denote whether a string is UTF-8. Concatenating or processing strings with mixed flags can often lead to strange side effects such as double encoding or mangled text. Doing the following can help avoid most, if not all, problems: