LDA++
|
LDA++ is a C++ library and a set of accompanying console applications that enable the inference of various Latent Dirichlet Allocation models.
The project provides three console applications
and a library that can be used from your own C++ projects.
You can read the documentation site at ldaplusplus.com and there is of course an API documentation as well.
We use CMake for building the project and currently only provide the option to build from source. The LDA++ installation process is straightforward and documented at our site.
We expect that the preferred way of using LDA++ will be through the provided console applications. You can read thorough documentation for them as well. All our console applications are designed to read matrix files serialized in numpy format so that one can easily create files in a python session.
It suffices to say that the following shell session runs the Fast Supervised LDA (fsLDA) on the scikit learn digits dataset (provided you have installed LDA++).
$ python Python 2.7.12 (default, Jul 1 2016, 15:12:24) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from sklearn.datasets import load_digits >>> import numpy as np >>> d = load_digits() >>> with open("digits.npy", "wb") as f: ... np.save(f, d.data.astype(np.int32).T) ... np.save(f, d.target.astype(np.int32)) ... >>> exit() $ fslda train digits.npy model.npy E-M Iteration 1 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 log p(y | \bar{z}, eta): -4137.75 log p(y | \bar{z}, eta): -3230.67 log p(y | \bar{z}, eta): -2758.81 log p(y | \bar{z}, eta): -2498.32 log p(y | \bar{z}, eta): -2341.4 log p(y | \bar{z}, eta): -2240.48 log p(y | \bar{z}, eta): -2172.38 log p(y | \bar{z}, eta): -2124.71 log p(y | \bar{z}, eta): -2090.4 log p(y | \bar{z}, eta): -2065.15 ... $ python Python 2.7.12 (default, Jul 1 2016, 15:12:24) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> with open("model.npy") as f: ... alpha = np.load(f) ... beta = np.load(f) ... eta = np.load(f) ... >>> import matplotlib.pyplot as plt >>> plt.imshow(beta[0].reshape(8, 8), interpolation='nearest', cmap='gray') <matplotlib.image.AxesImage object at 0x7f4cf201b810> >>> plt.show() >>> exit()
MIT license found in the LICENSE file.