Language Models
For your convenience, we have included some ready-to-use neural network models in NLarge.
Ready-to-Use Models
Several NLP models are implemented as ready-to-use classes. Each class provides a different neural network architecture for text classification, allowing you to select a model based on specific needs such as attention mechanisms, multi-head attention, or recurrent layers (LSTM, GRU, and vanilla RNN).
RNN.py
This module offers models based on a vanilla RNN for text classification, providing alternatives with and without max pooling.
1. TextClassifierRNN
A simple RNN-based classifier that applies a fully connected layer to the final hidden state for sequence classification. Suitable for shorter sequences, where RNN limitations such as vanishing gradients are less significant.
2. TextClassifierRNNMaxPool
Extends TextClassifierRNN by applying a max-pooling operation over the RNN outputs. This version can better capture the most relevant features across the entire sequence.
Example Usage:
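The snippet below is a minimal sketch of how the two RNN classifiers might be constructed and called. The import path and the constructor/forward signatures (vocabulary size, embedding and hidden dimensions, number of output classes) are assumptions made for illustration; check RNN.py for the exact API.

```python
import torch
from NLarge.model.RNN import TextClassifierRNN, TextClassifierRNNMaxPool  # assumed import path

# Hypothetical hyperparameters -- argument names are illustrative, not the documented API.
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, OUTPUT_DIM = 20_000, 100, 256, 2

# Classifier that predicts from the final hidden state.
rnn_clf = TextClassifierRNN(VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, OUTPUT_DIM)

# Variant that max-pools over all time-step outputs instead.
rnn_maxpool_clf = TextClassifierRNNMaxPool(VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, OUTPUT_DIM)

# Dummy batch of token IDs with shape (batch_size, sequence_length).
token_ids = torch.randint(0, VOCAB_SIZE, (8, 50))
logits = rnn_clf(token_ids)              # expected shape: (8, OUTPUT_DIM)
logits_pool = rnn_maxpool_clf(token_ids)
```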
LSTM.py
This module provides LSTM-based models for text classification, with and without attention mechanisms.
1. TextClassifierLSTM
Implements a bidirectional LSTM classifier, where the final hidden state of the sequence is passed to a fully connected layer for binary classification.
2. Attention
An attention layer for use with LSTM outputs. It computes attention scores for each step in the sequence, providing a weighted sum of hidden states based on their importance.
3. TextClassifierLSTMWithAttention
An LSTM classifier incorporating the attention mechanism. It uses a bidirectional LSTM to capture contextual information, followed by attention to highlight critical sequence components, which enhances interpretability.
Example Usage:
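A comparable sketch for the LSTM classifiers is shown below. Again, the import path and signatures are assumptions, and the forward call may additionally expect sequence lengths for packed sequences; consult LSTM.py before use.

```python
import torch
from NLarge.model.LSTM import (  # assumed import path
    TextClassifierLSTM,
    TextClassifierLSTMWithAttention,
)

# Hypothetical hyperparameters -- verify the real constructor arguments in LSTM.py.
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, OUTPUT_DIM = 20_000, 100, 256, 2

# Bidirectional LSTM classifier using the final hidden state.
lstm_clf = TextClassifierLSTM(VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, OUTPUT_DIM)

# Same backbone with an attention layer over the per-step outputs.
lstm_attn_clf = TextClassifierLSTMWithAttention(VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, OUTPUT_DIM)

token_ids = torch.randint(0, VOCAB_SIZE, (8, 50))  # (batch, seq_len)
with torch.no_grad():
    logits = lstm_attn_clf(token_ids)              # expected: (8, OUTPUT_DIM)
```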
GRU.py
This module contains GRU-based models for text classification, providing options with and without attention.
1. TextClassifierGRU
Implements a GRU-based classifier with bidirectional GRU layers, capturing context from both directions of the sequence. A fully connected layer is applied to the final hidden state to produce the classification.
2. Attention
This class defines an attention mechanism specific to the GRU output, helping focus on the most informative parts of the sequence.
3. TextClassifierGRUWithAttention
A GRU-based classifier that incorporates attention to focus on important parts of the sequence before the final classification. It uses a bidirectional GRU, applies attention to its outputs, and passes the resulting context vector to the final output layer.
Example Usage:
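The sketch below illustrates the GRU variants under the same assumed signatures and import path; the conversion of logits to probabilities at the end is only an example of how the output might be consumed.

```python
import torch
from NLarge.model.GRU import TextClassifierGRU, TextClassifierGRUWithAttention  # assumed import path

# Hypothetical hyperparameters -- check GRU.py for the actual constructor arguments.
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, OUTPUT_DIM = 20_000, 100, 256, 2

# Bidirectional GRU classifier using the final hidden state.
gru_clf = TextClassifierGRU(VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, OUTPUT_DIM)

# Variant with attention over the GRU outputs.
gru_attn_clf = TextClassifierGRUWithAttention(VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, OUTPUT_DIM)

token_ids = torch.randint(0, VOCAB_SIZE, (4, 64))  # (batch, seq_len)
logits = gru_attn_clf(token_ids)
probs = torch.softmax(logits, dim=-1)   # per-class probabilities for each example
predictions = probs.argmax(dim=-1)      # predicted class index per example
```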
attention.py
This module provides attention-based models for text classification.
1. TextClassifierAttentionNetwork
Implements a basic attention mechanism with query, key, and value layers. The model computes attention weights, uses them to form a weighted context vector, and then applies a fully connected layer followed by a sigmoid activation for binary classification.
2. MultiHeadAttention
A multi-head attention layer that splits the input into multiple heads to capture various aspects of the input representation, making it effective in handling complex relationships in the input sequence.
3. TextClassifierMultiHeadAttentionNetwork
Uses the MultiHeadAttention class for multi-head attention, followed by a fully connected layer for classification. It aggregates attention across multiple heads to increase model interpretability and capture richer sequence-level information.
Example Usage:
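Finally, a sketch for the attention-based classifiers. The import path, the argument names, and in particular the num_heads argument of the multi-head variant are assumptions for illustration; since the single-head network is described as ending in a sigmoid, its output is treated as a probability here.

```python
import torch
from NLarge.model.attention import (  # assumed import path
    TextClassifierAttentionNetwork,
    TextClassifierMultiHeadAttentionNetwork,
)

# Hypothetical hyperparameters -- verify against attention.py.
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 20_000, 100, 256

# Single-head attention classifier (sigmoid output for binary classification).
attn_clf = TextClassifierAttentionNetwork(VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, 1)

# Multi-head variant; the num_heads keyword is a hypothetical argument name.
mha_clf = TextClassifierMultiHeadAttentionNetwork(
    VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, 1, num_heads=4
)

token_ids = torch.randint(0, VOCAB_SIZE, (8, 50))  # (batch, seq_len)
probs = attn_clf(token_ids)       # assumed sigmoid output in [0, 1]
probs_mha = mha_clf(token_ids)
```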