User:Pfctdayelise/Using Wikipedia as a resource for computational linguistics
The aim of this book is to outline some areas of research in computational linguistics and natural language processing where Wikipedia and, by extension, the other Wikimedia projects have the potential to be valuable resources. It is not intended to serve as an introduction to either of these fields, and it does not assume any knowledge of Wikipedia.
Description of Wikimedia projects
Wiktionary, Wikinews, Commons, Wikibooks, Wikisource, Wikiquote. Languages. Meta & Commons direct translations of help (etc) pages. Who contributes? Growth.
Database dumps
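As a starting point for working with the dumps, the following is a minimal sketch of streaming pages out of a MediaWiki XML export without loading the whole file into memory. It assumes the usual export layout of `<page>` elements containing a `<title>` and a `<revision>` with a `<text>` child. The function names `iter_pages` and `local_name`, and the tiny inline sample, are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(xml_file):
    """Yield (title, wikitext) pairs from a MediaWiki XML export, one page at a time."""
    title, text = None, ""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # drop the finished page's contents to limit memory use

# Hypothetical two-page sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Some '''wikitext'''.</text></revision>
  </page>
  <page>
    <title>Ex</title>
    <revision><text>#REDIRECT [[Example]]</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

For a real, compressed dump the same generator could be fed a file object from `bz2.open(path, "rb")`, where `path` is whichever dump file has been downloaded.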
Post(pre?)-processing tools
Description of the English Wikipedia
- License
- Accessibility - database dumps
- Coverage - biased towards pop-culture and geek topics (which have the best coverage); WikiProjects
- Format - Manual of Style (MOS), but not reliably followed
- Featured articles (FAs), cleanup tags
- Redirects (RDRs)
Interwiki links
Categories
Disambiguation pages
Possible tasks
Word sense disambiguation
Word and phrase translation
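One way interwiki links might feed word and phrase translation is by pairing an article title with the titles it links to in other languages. The sketch below assumes the links are written inline in the wikitext as `[[xx:Title]]`, which is not the case for every dump (newer ones keep these links outside the article text). The name `language_links` and the sample text are made up for illustration.

```python
import re

# Assumption: an interlanguage link looks like [[xx:Title]], where xx is a short
# lowercase language code. Any other prefix of the same shape would match as well.
LANGLINK_RE = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]")

def language_links(wikitext):
    """Return {language_code: title} for the interlanguage links found in wikitext."""
    return {lang: title.strip() for lang, title in LANGLINK_RE.findall(wikitext)}

# Hypothetical article text for the English page "Linguistics".
sample = (
    "'''Linguistics''' is the study of language.\n"
    "[[Category:Linguistics]]\n"
    "[[de:Sprachwissenschaft]]\n"
    "[[fr:Linguistique]]\n"
)
links = language_links(sample)
```

Each entry in `links` is a candidate translation pair for the page title. The pairs are only as good as the links themselves, since linked articles do not always cover exactly the same concept.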
Web mining, data mining
Machine translation
Geospatial term disambiguation and named entity recognition
Image analysis (?)
Synonymy, abbreviations (RDRs)
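As a rough illustration of how redirects could be mined for synonyms and abbreviations, the sketch below groups redirect titles by the page they point to. It relies on the MediaWiki convention that a redirect page begins with `#REDIRECT [[Target]]`. The names `redirect_target` and `synonym_sets` and the sample pages are made up for illustration.

```python
import re
from collections import defaultdict

# A redirect page starts with "#REDIRECT [[Target]]" (the keyword is case-insensitive).
# The capture stops at "]", "|" or "#" so that section anchors are left out.
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)

def redirect_target(wikitext):
    """Return the title a redirect page points to, or None for an ordinary page."""
    match = REDIRECT_RE.match(wikitext)
    return match.group(1).strip() if match else None

def synonym_sets(pages):
    """Map each redirect target to the set of titles that redirect to it."""
    groups = defaultdict(set)
    for title, text in pages:
        target = redirect_target(text)
        if target:
            groups[target].add(title)
    return dict(groups)

# Hypothetical (title, wikitext) pairs.
sample_pages = [
    ("USA", "#REDIRECT [[United States]]"),
    ("U.S.A.", "#redirect [[United States]]"),
    ("United States", "The United States is a country."),
]
synonyms = synonym_sets(sample_pages)
```

Not every redirect is a true synonym: misspellings and related-but-narrower terms also redirect, so the resulting sets would need filtering before being used as a lexical resource.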
- http://wm.sieheauch.de/?p=48 papers
- Wiki Research Bibliography
- Michael Strube
- "Web corpus mining by instance of Wikipedia"
- Video mining