Appearance
Cleaning the Textual Variables
- Deleting Description
- Replace common French HTML entities with their characters
- Language Recognition with langdetect
- sent_tokenize with French language (?!?)
- Removal of special characters and numbers, i.e. everything matching [^a-zA-ZéàèêëîïôùüçÀÉÈÊËÎÏÔÙÜÇ]
- word_tokenize --> dic
- Stemming with detected language and SnowballStemmer in en/de/fr
- join dic to string
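
The text-cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the original code: the names `clean_text`, `detect_lang`, `make_stemmer`, `NON_LETTERS` and `SNOWBALL_NAMES` are hypothetical, `html.unescape` stands in for the hand-written replacement of French HTML entities, a plain whitespace split stands in for `word_tokenize`, and the `sent_tokenize` step is left out. Language detection and stemming are passed in as callables so the sketch runs without langdetect or NLTK installed, and the fallback to French for other detected languages is an assumption.

```python
import html
import re
from typing import Callable, Dict

# Character class from the list above: anything NOT in it becomes a space.
NON_LETTERS = re.compile(r"[^a-zA-ZéàèêëîïôùüçÀÉÈÊËÎÏÔÙÜÇ]")

# Assumed mapping from ISO language codes to SnowballStemmer language names.
SNOWBALL_NAMES: Dict[str, str] = {"en": "english", "de": "german", "fr": "french"}


def clean_text(
    text: str,
    detect_lang: Callable[[str], str],
    make_stemmer: Callable[[str], Callable[[str], str]],
    default_lang: str = "fr",  # assumption: fall back to French
) -> str:
    """Clean one text field: entities, language, characters, tokens, stems."""
    # Replace HTML entities such as &eacute; with their characters.
    text = html.unescape(text)

    # Detect the language on the still-readable text.
    code = detect_lang(text)
    if code not in SNOWBALL_NAMES:
        code = default_lang

    # Remove special characters and numbers.
    text = NON_LETTERS.sub(" ", text)

    # Tokenize (whitespace split as a stand-in for word_tokenize).
    tokens = text.split()

    # Stem each token with the stemmer for the detected language.
    stem = make_stemmer(SNOWBALL_NAMES[code])
    stems = [stem(token) for token in tokens]

    # Join the stems back into a single string.
    return " ".join(stems)


# Possible wiring with the libraries named above (requires both packages):
# from langdetect import detect
# from nltk.stem.snowball import SnowballStemmer
# cleaned = clean_text(raw_text, detect, lambda name: SnowballStemmer(name).stem)
```

Detecting the language before stripping characters keeps punctuation and accents available to the detector; whether the original pipeline relies on that ordering for any other reason is not stated.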
