LDA++
#include <MultinomialLogisticRegression.hpp>
Public Member Functions
    MultinomialLogisticRegression (const MatrixX &X, const Eigen::VectorXi &y, VectorX Cy, Scalar L)
    MultinomialLogisticRegression (const MatrixX &X, const Eigen::VectorXi &y, Scalar L)
    Scalar value (const MatrixX &eta) const
    void gradient (const MatrixX &eta, Eigen::Ref< MatrixX > grad) const
MultinomialLogisticRegression is an implementation of the multinomial logistic loss function (without a bias unit).
It follows the protocol expected by GradientDescent. For the specific function implementations see value() and gradient().
ldaplusplus::optimization::MultinomialLogisticRegression< Scalar >::MultinomialLogisticRegression ( const MatrixX & X, const Eigen::VectorXi & y, VectorX Cy, Scalar L )
Parameters
    X    The documents defining the minimization problem ( \(X \in \mathbb{R}^{D \times N}\) )
    y    The class indexes for each document ( \(y \in \mathbb{N}^N\) )
    Cy   A different weight for each class in the optimization problem
    L    The L2 regularization penalty for the weights
ldaplusplus::optimization::MultinomialLogisticRegression< Scalar >::MultinomialLogisticRegression ( const MatrixX & X, const Eigen::VectorXi & y, Scalar L )
Parameters
    X    The documents defining the minimization problem ( \(X \in \mathbb{R}^{D \times N}\) )
    y    The class indexes for each document ( \(y \in \mathbb{N}^N\) )
    L    The L2 regularization penalty for the weights
void ldaplusplus::optimization::MultinomialLogisticRegression< Scalar >::gradient ( const MatrixX & eta, Eigen::Ref< MatrixX > grad ) const
The gradient of the objective function implemented in value().
We use \(I(y) \in \mathbb{R}^Y\) as the indicator vector of \(y\) (a vector whose entries are all 0 except for a 1 at the yth position).
\[ \nabla_{\eta} J = -\sum_{n=1}^N C_{y_n} \left( X_n I(y_n)^T - \frac{\sum_{\hat{y}=1}^Y X_n I(\hat{y})^T \exp(\eta_{\hat{y}}^T X_n)} {\sum_{\hat{y}=1}^Y \exp(\eta_{\hat{y}}^T X_n)} \right) + L \eta \]
Parameters
    eta    The weights of the linear model ( \(\eta \in \mathbb{R}^{D \times Y}\) )
    grad   A matrix of dimensions equal to \(\eta\) that will hold the result
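The gradient formula above can be evaluated directly. The sketch below is an unofficial illustration using plain std::vector containers rather than the library's Eigen types, with eta stored one row per class (i.e. transposed relative to \(\eta \in \mathbb{R}^{D \times Y}\)); the free function and its signature are assumptions for the example, not LDA++'s API:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-per-class storage: eta[yh] is the weight vector of class yh.
// (Illustrative stand-in for the library's Eigen matrices.)
using Mat = std::vector<std::vector<double>>;

// Evaluates the gradient formula above: X[n] is document n, y[n] its class,
// c[yh] the class weight C_yh, and l the L2 penalty L.
// Returns a matrix shaped like eta.
Mat gradient(const Mat& X, const std::vector<int>& y,
             const Mat& eta, const std::vector<double>& c, double l) {
    const std::size_t Y = eta.size();
    const std::size_t D = eta[0].size();
    Mat grad(Y, std::vector<double>(D, 0.0));
    for (std::size_t n = 0; n < X.size(); ++n) {
        // Unnormalized softmax weights p[yh] = exp(eta_yh^T X_n) and their sum z
        std::vector<double> p(Y);
        double z = 0.0;
        for (std::size_t yh = 0; yh < Y; ++yh) {
            double dot = 0.0;
            for (std::size_t d = 0; d < D; ++d)
                dot += eta[yh][d] * X[n][d];
            p[yh] = std::exp(dot);
            z += p[yh];
        }
        // Accumulate -C_{y_n} (I(y_n) - softmax) X_n^T, class by class
        for (std::size_t yh = 0; yh < Y; ++yh) {
            const double indicator = (static_cast<int>(yh) == y[n]) ? 1.0 : 0.0;
            const double coeff = -c[y[n]] * (indicator - p[yh] / z);
            for (std::size_t d = 0; d < D; ++d)
                grad[yh][d] += coeff * X[n][d];
        }
    }
    // Regularization term: + L * eta
    for (std::size_t yh = 0; yh < Y; ++yh)
        for (std::size_t d = 0; d < D; ++d)
            grad[yh][d] += l * eta[yh][d];
    return grad;
}
```

At eta = 0 every softmax probability is \(1/Y\), so for a document of class \(y_n\) the contribution to its own class row is \(-C_{y_n}(1 - 1/Y)X_n\) and to every other row \(C_{y_n}X_n/Y\), which is a quick sanity check for the implementation.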
Scalar ldaplusplus::optimization::MultinomialLogisticRegression< Scalar >::value ( const MatrixX & eta ) const
The value of the objective function to be minimized.
\(N\) is the number of documents (distinct vectors), \(X_n \in \mathbb{R}^D\) is the nth document, \(\eta_y \in \mathbb{R}^D\) is the weight vector for class \(y\), defining the hyperplane that separates class \(y\) from all the others; finally, \(y_n\) is the class of the nth document.
\[ J = -\sum_{n=1}^N C_{y_n}\left(\eta_{y_n}^T X_n - \log\left( \sum_{\hat{y}=1}^Y \exp\left( \eta_{\hat{y}}^T X_n \right) \right)\right) + \frac{L}{2} \left\| \eta \right\|_F^2 \]
Parameters
    eta    The weights of the linear model ( \(\eta \in \mathbb{R}^{D \times Y}\) )
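The loss \(J\) above can likewise be computed term by term. The sketch below mirrors the formula with plain std::vector containers (eta stored one row per class); the free function and its signature are illustrative assumptions, not the library's Eigen-based API:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-per-class storage: eta[yh] is the weight vector of class yh.
// (Illustrative stand-in for the library's Eigen matrices.)
using Mat = std::vector<std::vector<double>>;

// Evaluates J as given above: X[n] is document n, y[n] its class,
// c[yh] the class weight C_yh, and l the L2 penalty L.
double value(const Mat& X, const std::vector<int>& y,
             const Mat& eta, const std::vector<double>& c, double l) {
    double J = 0.0;
    for (std::size_t n = 0; n < X.size(); ++n) {
        double correct = 0.0;   // eta_{y_n}^T X_n
        double sum_exp = 0.0;   // sum over classes of exp(eta_yhat^T X_n)
        for (std::size_t yh = 0; yh < eta.size(); ++yh) {
            double dot = 0.0;
            for (std::size_t d = 0; d < X[n].size(); ++d)
                dot += eta[yh][d] * X[n][d];
            sum_exp += std::exp(dot);
            if (static_cast<int>(yh) == y[n])
                correct = dot;
        }
        // -C_{y_n} (eta_{y_n}^T X_n - log sum_exp)
        J -= c[y[n]] * (correct - std::log(sum_exp));
    }
    // Regularization term: (L/2) * ||eta||_F^2
    double frob = 0.0;
    for (const auto& row : eta)
        for (double w : row)
            frob += w * w;
    return J + 0.5 * l * frob;
}
```

As a sanity check, at eta = 0 each document contributes \(C_{y_n}\log Y\) (every class is equally likely), and any eta that separates the classes well drives \(J\) below that baseline.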