d532: FastText for text representations and text classifiers

FastText - Library for efficient text classification and representation learning
FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices. https://fasttext.cc/

Download pre-trained fasttext word vectors

Pre-trained word vectors learned on different sources can be downloaded below:

  1. wiki-news-300d-1M.vec.zip: 1 million word vectors trained on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens).
  2. wiki-news-300d-1M-subword.vec.zip: 1 million word vectors trained with subword information on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens).
  3. crawl-300d-2M.vec.zip: 2 million word vectors trained on Common Crawl (600B tokens).
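
Once unzipped, each archive above contains a plain-text .vec file. As a minimal sketch, assuming the usual .vec layout (a header line with the number of words and the vector dimension, then one word per line followed by its space-separated values), the file can be read with the standard library alone. The function name `load_vectors`, the `limit` parameter, and the tiny in-memory sample are illustrative choices, not part of the downloads themselves.

```python
import io


def load_vectors(stream, limit=None):
    """Parse .vec text: a 'count dim' header, then 'word v1 ... vd' per line."""
    header = stream.readline().split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for i, line in enumerate(stream):
        # Optionally stop early; the real files hold 1-2 million rows.
        if limit is not None and i >= limit:
            break
        tokens = line.rstrip().split(" ")
        vectors[tokens[0]] = [float(x) for x in tokens[1:]]
    return count, dim, vectors


# Tiny in-memory stand-in for a real .vec file (made-up values).
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
count, dim, vectors = load_vectors(sample)
print(count, dim, sorted(vectors))
```

For a real file, the same function should work on a handle opened with something like `open("wiki-news-300d-1M.vec", encoding="utf-8", errors="ignore")`; passing a `limit` keeps memory use manageable while experimenting.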

These models achieve state-of-the-art performance on several benchmarks (up to 88% accuracy on the popular word analogy dataset). https://www.facebook.com/groups/1174547215919768/permalink/1631075090266976/
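
To make the word analogy benchmark concrete, here is a rough sketch of the common vector-offset approach: for a question "a is to b as c is to ?", pick the word whose vector is closest (by cosine similarity) to b - a + c. The source does not describe the exact evaluation protocol behind the 88% figure, so this is an assumption about the general technique, and the toy vectors below are made up for illustration rather than taken from the pre-trained files.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c):
    """Return the word closest to b - a + c, excluding the three query words."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical 3-dimensional vectors, chosen so the offsets line up exactly.
toy = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.0, -1.0],
}
result = analogy(toy, "man", "woman", "king")
print(result)
```

With real 300-dimensional vectors, the linear scan over every word is slow for millions of entries; a matrix-based similarity computation would be the usual next step.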