DALIB Basic Modules

Classifier

class dalib.modules.classifier.Classifier(backbone: torch.nn.modules.module.Module, num_classes: int, bottleneck: Optional[torch.nn.modules.module.Module] = None, bottleneck_dim: Optional[int] = -1, head: Optional[torch.nn.modules.module.Module] = None)[source]

Bases: torch.nn.modules.module.Module

A generic Classifier class for domain adaptation.

Parameters:
  • backbone (nn.Module): Any backbone that extracts 1-d features from data
  • num_classes (int): Number of classes
  • bottleneck (nn.Module, optional): Any bottleneck layer. No bottleneck is used by default
  • bottleneck_dim (int, optional): Feature dimension of the bottleneck layer. Default: -1
  • head (nn.Module, optional): Any classifier head. nn.Linear is used by default

Note

Different classifiers are used in different domain adaptation algorithms to achieve better accuracy, and we provide a suggested Classifier for each algorithm. Keep in mind that the classifier is not the core of an algorithm: you can implement your own Classifier and combine it with any domain adaptation algorithm in this library.

Note

By default, the learning rate of this classifier is set to 10 times that of the feature extractor for better accuracy. If you use another optimization strategy, please override get_parameters.

Inputs:
  • x (tensor): input data fed to backbone
Outputs: predictions, features
  • predictions: classifier’s predictions
  • features: features after bottleneck layer and before head layer
Shape:
  • Inputs: (minibatch, *) where * means any number of additional dimensions
  • predictions: (minibatch, num_classes)
  • features: (minibatch, features_dim)
features_dim

The dimension of features before the final head layer

get_parameters() → List[Dict[KT, VT]][source]

Return a parameter list that decides optimization hyper-parameters, such as the relative learning rate of each layer
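
A minimal usage sketch (the toy backbone is illustrative, and we assume Classifier reads the backbone's output dimension from an out_features attribute, as the backbones shipped with this library provide):

    import torch
    import torch.nn as nn
    from dalib.modules.classifier import Classifier

    # Toy backbone producing 1-d features of dimension 32.
    backbone = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )
    backbone.out_features = 32  # assumed convention, see note above

    classifier = Classifier(backbone, num_classes=10)
    x = torch.randn(4, 3, 32, 32)
    predictions, features = classifier(x)   # outputs as documented above
    print(predictions.shape)  # torch.Size([4, 10])
    print(features.shape)     # torch.Size([4, 32])

    # get_parameters() feeds the relative per-layer learning rates into the optimizer.
    optimizer = torch.optim.SGD(classifier.get_parameters(), lr=0.01, momentum=0.9)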

Domain Discriminator

class dalib.modules.domain_discriminator.DomainDiscriminator(in_feature: int, hidden_size: int)[source]

Bases: torch.nn.modules.module.Module

Domain discriminator model from “Domain-Adversarial Training of Neural Networks”

Distinguishes whether the input features come from the source domain or the target domain. The source domain label is 1 and the target domain label is 0.

Parameters:
  • in_feature (int): dimension of the input feature
  • hidden_size (int): dimension of the hidden features
Shape:
  • Inputs: (minibatch, in_feature)
  • Outputs: (minibatch, 1)
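
A minimal sketch of the adversarial objective this discriminator is typically trained with (the feature tensors and sizes are illustrative; we assume the outputs are sigmoid probabilities, matching the domain labels above):

    import torch
    import torch.nn.functional as F
    from dalib.modules.domain_discriminator import DomainDiscriminator

    discriminator = DomainDiscriminator(in_feature=256, hidden_size=1024)
    f_s = torch.randn(32, 256)   # source-domain features
    f_t = torch.randn(32, 256)   # target-domain features
    d_s, d_t = discriminator(f_s), discriminator(f_t)   # each (32, 1)

    # Source domain label is 1, target domain label is 0, as documented above.
    loss = F.binary_cross_entropy(d_s, torch.ones_like(d_s)) \
         + F.binary_cross_entropy(d_t, torch.zeros_like(d_t))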

GRL

class dalib.modules.grl.WarmStartGradientReverseLayer(alpha: Optional[float] = 1.0, lo: Optional[float] = 0.0, hi: Optional[float] = 1.0, max_iters: Optional[int] = 1000, auto_step: Optional[bool] = False)[source]

Bases: torch.nn.modules.module.Module

Gradient Reverse Layer \(\mathcal{R}(x)\) with warm start

The forward and backward behaviours are:

\[\mathcal{R}(x) = x, \qquad \dfrac{d\mathcal{R}}{dx} = -\lambda I.\]

\(\lambda\) is initialized at \(lo\) and is gradually changed to \(hi\) using the following schedule:

\[\lambda = \dfrac{2(hi-lo)}{1+\exp(-\alpha \dfrac{i}{N})} - (hi-lo) + lo\]

where \(i\) is the iteration step. At \(i = 0\) this gives \(\lambda = lo\), and \(\lambda\) approaches \(hi\) as \(i \to \infty\).

Parameters:
  • alpha (float, optional): \(\alpha\). Default: 1.0
  • lo (float, optional): Initial value of \(\lambda\). Default: 0.0
  • hi (float, optional): Final value of \(\lambda\). Default: 1.0
  • max_iters (int, optional): \(N\). Default: 1000
  • auto_step (bool, optional): If True, increase \(i\) each time forward is called. Otherwise, use the function step() to increase \(i\). Default: False
step()[source]

Increase iteration number \(i\) by 1
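
A minimal sketch of the warm-start behaviour (the feature tensor and its size are illustrative):

    import torch
    from dalib.modules.grl import WarmStartGradientReverseLayer

    grl = WarmStartGradientReverseLayer(alpha=1.0, lo=0.0, hi=1.0,
                                        max_iters=1000, auto_step=True)
    features = torch.randn(8, 256, requires_grad=True)
    out = grl(features)    # identity mapping in the forward pass
    out.sum().backward()
    # Gradients flowing back through the layer are scaled by -lambda: close
    # to lo = 0 early in training, approaching hi = 1 as iteration i grows.
    print(features.grad.abs().max())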

Kernels

class dalib.modules.kernels.GaussianKernel(sigma: Optional[float] = None, track_running_stats: Optional[bool] = True, alpha: Optional[float] = 1.0)[source]

Bases: torch.nn.modules.module.Module

Gaussian Kernel Matrix

Gaussian Kernel k is defined by

\[k(x_1, x_2) = \exp \left( - \dfrac{\| x_1 - x_2 \|^2}{2\sigma^2} \right)\]

where \(x_1, x_2 \in \mathbb{R}^d\) are 1-d tensors.

The Gaussian Kernel Matrix K is defined on an input group \(X=(x_1, x_2, \ldots, x_m)\) by

\[K(X)_{i,j} = k(x_i, x_j)\]

By default, during training this layer keeps running estimates of the mean of squared L2 distances, which are then used to set the bandwidth \(\sigma\). Mathematically, the estimate is \(\sigma^2 = \dfrac{\alpha}{n^2}\sum_{i,j} \| x_i - x_j \|^2\). If track_running_stats is set to False, this layer does not keep running estimates and uses a fixed \(\sigma\) instead.

Parameters:
  • sigma (float, optional): bandwidth \(\sigma\). Default: None
  • track_running_stats (bool, optional): If True, this module tracks the running mean of \(\sigma^2\). Otherwise, it does not track such statistics and always uses a fixed \(\sigma^2\). Default: True
  • alpha (float, optional): \(\alpha\), which decides the magnitude of \(\sigma^2\) when track_running_stats is set to True. Default: 1.0
Inputs:
  • X (tensor): input group \(X\)
Shape:
  • Inputs: \((minibatch, F)\) where F means the dimension of input features.
  • Outputs: \((minibatch, minibatch)\)
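
A minimal usage sketch with a fixed bandwidth (the input sizes are illustrative):

    import torch
    from dalib.modules.kernels import GaussianKernel

    # Fixed-bandwidth kernel; with sigma=None and track_running_stats=True,
    # the bandwidth would instead be estimated from the batch as described above.
    kernel = GaussianKernel(sigma=1.0, track_running_stats=False)
    X = torch.randn(16, 32)    # (minibatch, F)
    K = kernel(X)              # (minibatch, minibatch) kernel matrix
    print(K.shape)             # torch.Size([16, 16])
    print(K.diagonal()[:3])    # diagonal entries are exp(0) = 1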