Tokenization is the process of breaking a piece of text into smaller units, such as words, phrases, symbols, and other meaningful elements, which are called tokens. Depending on the granularity chosen, even a whole sentence can be treated as a token. During tokenization, some characters, such as punctuation marks, may be discarded. The resulting tokens then serve as input to other text-mining steps, such as parsing.
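To make the idea concrete, here is a minimal sketch in Python using only the standard `re` module. The function names `tokenize_words` and `tokenize_sentences` and the regular expressions are illustrative assumptions, not a standard API. Real tokenizers handle many more cases, such as abbreviations, URLs, and non-English scripts.

```python
import re

def tokenize_words(text):
    """Return word tokens, dropping punctuation (hypothetical helper).

    Assumption: a token is a run of ASCII letters or digits, optionally
    followed by an apostrophe suffix such as "'s".
    """
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)

def tokenize_sentences(text):
    """Return sentence tokens (hypothetical helper).

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]

sample = "Hello, world! It's fine."
print(tokenize_words(sample))      # ['Hello', 'world', "It's", 'fine']
print(tokenize_sentences(sample))  # ['Hello, world!', "It's fine."]
```

The word-level call removes the comma, exclamation mark, and period, while the sentence-level call treats each whole sentence as a single token. This mirrors the two granularities described above.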
Asked In: Many Interviews