I like this introduction very much. One thing that puzzles me is the term "approximate stemming". Another is the heavy focus on stemming.
In my opinion, stemming is always an approximation of the end result we want to achieve: finding the dictionary form of the word. Stemming algorithms exist for many languages. I can also think of other methods, such as fuzzy matching, machine learning with user training, or morphological analysis, which is available for Finnish, for example.
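To make the point concrete, here is a minimal Python sketch, not a proposal for the actual implementation. It contrasts a toy suffix stripper with fuzzy matching against a list of dictionary forms. The glossary contents, the function names `naive_stem` and `fuzzy_lookup`, and the 0.6 cutoff are all assumptions made up for illustration; only `difflib.get_close_matches` is a real standard-library call.

```python
import difflib

# Hypothetical glossary of dictionary forms (illustrative only).
GLOSSARY = ["run", "study", "translate", "glossary"]


def naive_stem(word):
    """Toy suffix stripper; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ies", "es", "ed", "s"):
        # Only strip if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def fuzzy_lookup(word, cutoff=0.6):
    """Return the closest glossary entry, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word, GLOSSARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for form in ("studies", "translated"):
        print(form, "stem:", naive_stem(form), "fuzzy:", fuzzy_lookup(form))
```

The stem of "studies" comes out as "stud", which is not a dictionary form, while the fuzzy lookup lands on the glossary entry "study". Neither approach is reliable on its own, which is why I see stemming as just one of several approximations.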
The above processes can be combined with some parsing to learn whether a word is a noun or a verb, for example, which makes a huge difference in English.
Also, the section about word segmentation does not say anything about identifying multiword expressions (as described on Wikipedia), which are also important for effective use of glossaries and dictionaries.
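As a rough sketch of what identifying multiword expressions could look like after segmentation, the Python snippet below does a greedy longest match of token sequences against a set of known expressions. The function name `find_multiword`, the `max_len` limit and the sample expressions are hypothetical and only meant to illustrate the idea; a real system would likely need something more robust.

```python
def find_multiword(tokens, expressions, max_len=4):
    """Group tokens into known multiword expressions using greedy longest match.

    `expressions` is a set of token tuples, e.g. {("shut", "down")}.
    Tokens that are not part of any known expression are kept as single units.
    """
    units = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shrink down to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i : i + n])
            if n == 1 or candidate in expressions:
                units.append(" ".join(candidate))
                i += n
                break
    return units


if __name__ == "__main__":
    # Hypothetical glossary entries, for illustration only.
    known = {("shut", "down"), ("virtual", "machine")}
    sentence = "please shut down the virtual machine now".split()
    print(find_multiword(sentence, known))
```

With these sample entries, "shut down" and "virtual machine" are each kept together as one unit, so a glossary lookup would see the whole expression instead of its separate words.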