Analytics/Onboarding

=Background=
You will need a number of accounts, memberships and keys to become a productive member of the Analytics team. Here is an overview of the things you should do in your first week.

= Office =
https://office.wikimedia.org/wiki/Getting_Started_With_User_Info_and_Talk_Pages

=Servers=
All Analytics team members should request access to the following machines:
 * stat1001
 * stat1002
 * stat1003?
 * bast1001.wikimedia.org
 * bastion.wmflabs.org
 * Hadoop cluster
 * vanadium

Talk with Andrew Otto about how to submit your public key. You will likely need to proxy your ssh connection through a known machine to access some of the hosts above. Do not use the same ssh key for labs (testing) and stat1 machines (production).

The easiest approach is to ask a team member for their .ssh/config file and copy the proxy setup from it.

Keep in mind that access to production machines (stat1) and testing machines (labs) is requested through different processes.

Sample ssh config:
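
As a minimal sketch of the proxy setup described above, a config might look like the following. The bastion host names come from the server list; the user names, the key file names and the fully qualified name used for stat1001 are hypothetical placeholders, so ask a team member for the real values.

```text
# Hypothetical example: user names, key file names and the
# stat1001 HostName below are assumptions, not confirmed values.

# Production bastion: use the production-only key.
Host bast1001.wikimedia.org
    User your-production-username
    IdentityFile ~/.ssh/id_rsa_production
    IdentitiesOnly yes

# Production host reached by proxying through the bastion.
Host stat1001
    HostName stat1001.wikimedia.org
    User your-production-username
    IdentityFile ~/.ssh/id_rsa_production
    IdentitiesOnly yes
    ProxyCommand ssh -W %h:%p bast1001.wikimedia.org

# Labs bastion: use the separate labs (testing) key.
Host bastion.wmflabs.org
    User your-labs-username
    IdentityFile ~/.ssh/id_rsa_labs
    IdentitiesOnly yes
```

With entries like these, `ssh stat1001` would connect through the bastion without extra flags, and keeping one `IdentityFile` per environment makes it harder to offer the labs key to a production host by accident.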

=RT=
 * 1) RT is the tool for handling operations tickets. You need it for things such as getting access to all the labs machines mentioned above.
 * 2) Ask mutante (Daniel Zahn) on #wikimedia-operations for instructions on requesting an RT account.

= LDAP =
Some tools require LDAP authentication. Talk to techsupport@wikimedia.org to be added to the "wmf" LDAP group.

=Mediawiki=
 * 1) Create account: https://www.mediawiki.org/w/index.php?title=Special:UserLogin&returnto=Analytics&type=signup
 * 2) Log in

=Labs=
 * 1) Labs is a cluster of virtual machines
 * 2) Create account: https://wikitech.wikimedia.org/w/index.php?title=Special:UserLogin&type=signup&returnto=Main+Page
 * 3) Log in
 * 4) You need to set up ssh keys
 * 5) Upload your public SSH key: https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-openstack
 * 6) Keep in mind that labs is a testing environment, so this ssh key should only be used for testing. If you need access to machines in the production cluster, use a different ssh key.
 * 7) Configure your ~/.ssh/config with bastion hosts
 * 8) Get familiar with the labs environment, how to use the labs interface to spin up nodes, remove nodes, etc

=Gerrit=
 * 1) Gerrit is the code review workflow we use, built on top of git
 * 2) Log in to Gerrit using your Labs credentials.
 * 3) To verify everything works, clone a repo from https://gerrit.wikimedia.org/r/#/admin/projects/?filter=analytics using SSH.
 * 4) Take a look at how to deal with gerrit in different work scenarios: http://etherpad.wikimedia.org/p/analytics-gerrit
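
To make the SSH clone in step 3 less error-prone, an entry like the following could be added to ~/.ssh/config. This is only a sketch under assumptions: 29418 is Gerrit's default SSH port and should be confirmed for this installation, and the user name and key file name are hypothetical placeholders.

```text
# Hypothetical example: confirm the port, user name and key file
# with a team member before relying on this entry.
Host gerrit.wikimedia.org
    Port 29418
    User your-gerrit-username
    IdentityFile ~/.ssh/id_rsa_gerrit
    IdentitiesOnly yes
```

With such an entry in place, a clone would take the form `git clone ssh://gerrit.wikimedia.org/<project-name>`, where `<project-name>` is copied from the project list linked in step 3.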

=Mailinglists=

Reading mailing lists is important. All the projects we build or use are open source and, like most open source projects, they have communities that come together on mailing lists. There is much knowledge to be gained from these lists.

For an overview of all available mailing lists see https://lists.wikimedia.org/mailman/listinfo
 * Please subscribe to:
 * 1) https://lists.wikimedia.org/mailman/listinfo/analytics
 * 2) https://lists.wikimedia.org/mailman/listinfo/wikimetrics
 * 3) https://lists.wikimedia.org/mailman/listinfo/wmfresearch
 * Please request access to:
 * 1) Analytics internal (email Toby)
 * 2) Operations
 * 3) Engineering


 * Optionally you may want to read archives or subscribe to the following mailing lists:
 * 1) Mediawiki
 * 2) Mobile
 * 3) Mediawiki API

If there are mailing lists you want to read without subscribing you may consider using the following gateways:
 * 1) Mail Archive
 * 2) Gmane
 * 3) Gossamer Threads

=Bugzilla=
 * 1) Create an account at https://bugzilla.wikimedia.org/
 * 2) Email [mailto:aklapper@wikimedia.org Andre Klapper] and ask to be CC'ed by default for components that you will be involved with. Frontend engineers: Limn, Wikimetrics. Backend engineers: ask Diederik

=IRC=
 * 1) Install an IRC client -- ask team members for recommendations (options include quassel, irssi, pidgin, xchat, or textual or adium if you're on a Mac)
 * 2) Follow instructions on https://meta.wikimedia.org/wiki/IRC/Cloaks to request an IRC cloak
 * 3) Connect to #wikimedia-analytics on Freenode

= Process =

Google Calendar

 * 1) Add the Analytics Team Calendar to your default view. Most likely someone needs to "share" the calendar with you.

Scrum
Analytics/Scrum_Planning

=Equipment=

Hardware
As far as equipment goes you will need a good development machine.

Minimum machine specs:


 * >=4GB RAM
 * i7 >= 2.4 GHz quad-core or better
 * 300GB disk

Recommended machine specs:


 * >=8GB RAM
 * i7 >= 2.4 GHz quad-core or better
 * SSD (if you're going to be working on wikimetrics)
 * 300GB disk

These specs may look excessive at first sight, but you will have to run VMs and use Vagrant to re-create various environments (sometimes with multiple nodes), so you will need the hardware.

Headset
Please buy a high-quality headset -- your colleagues will love you for this. For more tips see https://office.wikimedia.org/wiki/Office_IT/Projects/Telepresence

=Optional accounts=

You could consider creating accounts for:
 * GitHub

=Operating System=

The machines we deploy on run Ubuntu, so it is most convenient to have Ubuntu, or another UNIX-based operating system, installed on your development machine; this will make your work considerably easier. You may choose any other Linux distribution you're familiar with.

A Mac is also a viable choice.

=Misc=

This is a collection of things you might find useful in your work.

Sync tools
You may find the following tools useful for syncing files between your local machine and remote machines (one-way or two-way). Some of them also let you mount remote directories as if they were local directories:


 * 1) sshfs
 * 2) rsync
 * 3) lsyncd
 * 4) unison
 * 5) scp

IDEs and editors
For Java development, use whichever IDE you feel comfortable with. Eclipse is the IDE du jour, but you might also want to look at IDEA. For remote development you may find vim useful (or a combination of a sync tool and your favorite editor/IDE). Other editors you might find useful include Sublime Text and Emacs.

Searching
You may find the following tools useful to search through configuration files or code:


 * 1) Ack (mainly for grepping code; there is also a video presentation)
 * 2) grep
 * 3) GNU find

Environment simulation
It may be useful to familiarize yourself with Vagrant and Puppet so that you can recreate smaller environments and conditions on your machine to test the software you're developing or contributing to.

Random Docs
Thorough description of our Hadoop infrastructure: https://plus.google.com/u/0/events/c53ho5esd0luccd09a1c30rlrmg