Masked classes for pruning BERT-like Transformers


The classes in this module are adapted from Victor Sanh's implementation of Movement Pruning: Adaptive Sparsity by Fine-Tuning, found in the examples/research_projects directory of the transformers repository. The main changes are as follows:

  • To make these classes compatible with v4 of transformers, we have replaced all instances of BertLayerNorm with torch.nn.LayerNorm.
  • In the forward method of TopKBinarizer, we check whether threshold is a float or a list, since the latter arises when thresholds are stored in the datasets format (see the sketch below).
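As a minimal sketch of that check (the function name and structure here are illustrative assumptions, not the exact implementation):

def resolve_threshold(threshold):
    # Thresholds loaded through the datasets format can arrive as a
    # one-element list rather than a bare float, so unwrap before use.
    if isinstance(threshold, list):
        threshold = threshold[0]
    return float(threshold)

assert resolve_threshold(0.5) == 0.5
assert resolve_threshold([0.5]) == 0.5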

Masked versions of BERT

Computing the adaptive mask during pruning requires a special set of "masked" BERT classes that account for sparsity in the model's weights. François Lagunas plans to release a generic version of these classes around March 2021, so we should consider switching to those once they become available.

class MaskedBertConfig[source]

MaskedBertConfig(vocab_size=30522, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, pad_token_id=0, pruning_method='topK', mask_init='constant', mask_scale=0.0, **kwargs) :: PretrainedConfig

A class replicating transformers.BertConfig, with additional parameters for the pruning/masking configuration.
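A minimal usage sketch, showing only the pruning-specific parameters from the signature above (the import path is an assumption based on the movement-pruning research project):

from emmental import MaskedBertConfig  # hypothetical import path

config = MaskedBertConfig(
    pruning_method='topK',  # how weights are selected for masking
    mask_init='constant',   # how the score matrix is initialized
    mask_scale=0.0,         # initial value used by the constant init
)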

class MaskedBertPreTrainedModel[source]

MaskedBertPreTrainedModel(config:PretrainedConfig, *inputs, **kwargs) :: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

class MaskedBertModel[source]

MaskedBertModel(config) :: MaskedBertPreTrainedModel

The MaskedBertModel class replicates the transformers.BertModel class and adds specific inputs to compute the adaptive mask on the fly. Note that the embedding modules are frozen at their pre-trained values.
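A hedged usage sketch: the forward pass is assumed to take a threshold argument that controls the mask, and the import path and attribute names are assumptions based on the movement-pruning project:

import torch
from emmental import MaskedBertConfig, MaskedBertModel  # hypothetical import path

model = MaskedBertModel(MaskedBertConfig())
# Freezing the embeddings at their pre-trained values amounts to:
for param in model.embeddings.parameters():
    param.requires_grad = False

input_ids = torch.tensor([[101, 2023, 2003, 1037, 3231, 102]])  # toy token ids
outputs = model(input_ids, threshold=0.5)  # keep the top 50% of weight scores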

class BertEmbeddings[source]

BertEmbeddings(config) :: Module

Construct the embeddings from word, position and token_type embeddings.
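This follows the standard BERT recipe: three embedding lookups are summed, then layer normalization and dropout are applied. A self-contained illustration, using the default dimensions from MaskedBertConfig:

import torch
import torch.nn as nn

word_emb = nn.Embedding(30522, 768)  # vocab_size x hidden_size
pos_emb = nn.Embedding(512, 768)     # max_position_embeddings x hidden_size
type_emb = nn.Embedding(2, 768)      # type_vocab_size x hidden_size
norm = nn.LayerNorm(768, eps=1e-12)
drop = nn.Dropout(0.1)

input_ids = torch.tensor([[101, 2023, 102]])
positions = torch.arange(input_ids.size(1)).unsqueeze(0)
token_types = torch.zeros_like(input_ids)

embeddings = drop(norm(word_emb(input_ids) + pos_emb(positions) + type_emb(token_types)))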

class BertEncoder[source]

BertEncoder(config) :: Module

The stack of BertLayer modules that makes up the Transformer encoder. Each layer receives the pruning threshold so that its masks can be computed during the forward pass.

class MaskedBertForQuestionAnswering[source]

MaskedBertForQuestionAnswering(config) :: MaskedBertPreTrainedModel

Masked BERT model with a span-classification head on top for extractive question-answering tasks such as SQuAD: a linear layer on top of the hidden states that computes span start logits and span end logits (see the sketch below).
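To illustrate the head (a sketch with stand-in tensors; the layer shapes follow the standard transformers question-answering models):

import torch

qa_outputs = torch.nn.Linear(768, 2)      # hidden_size -> (start, end) logits
sequence_output = torch.randn(1, 6, 768)  # stand-in for the encoder output
start_logits, end_logits = qa_outputs(sequence_output).split(1, dim=-1)
start = start_logits.squeeze(-1).argmax(dim=-1)  # predicted span start index
end = end_logits.squeeze(-1).argmax(dim=-1)      # predicted span end index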

class BertLayer[source]

BertLayer(config) :: Module

A single Transformer encoder layer, combining BertAttention, BertIntermediate and BertOutput.

class BertAttention[source]

BertAttention(config) :: Module

The attention block of a Transformer layer, wrapping BertSelfAttention and BertSelfOutput.

class BertSelfAttention[source]

BertSelfAttention(config) :: Module

Multi-head self-attention in which the query, key and value projections are MaskedLinear layers, so that the adaptive mask is applied to their weights on the fly.

class MaskedLinear[source]

MaskedLinear(in_features:int, out_features:int, bias:bool=True, mask_init:str='constant', mask_scale:float=0.0, pruning_method:str='topK') :: Linear

Fully connected layer with an on-the-fly adaptive mask. If needed, a score matrix is created to store the importance of each associated weight.
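A minimal sketch of how such a layer can combine the score matrix with the binarizer in its forward pass (class and attribute names are illustrative; TopKBinarizer is documented below and its import path is assumed):

import torch
import torch.nn as nn
import torch.nn.functional as F
from emmental import TopKBinarizer  # hypothetical import path

class MaskedLinearSketch(nn.Linear):
    def __init__(self, in_features, out_features, bias=True, mask_scale=0.0):
        super().__init__(in_features, out_features, bias=bias)
        # One importance score per weight, with constant initialization.
        self.mask_scores = nn.Parameter(torch.full_like(self.weight, mask_scale))

    def forward(self, x, threshold):
        # Binarize the scores at the given threshold, then gate the weights.
        mask = TopKBinarizer.apply(self.mask_scores, threshold)
        return F.linear(x, self.weight * mask, self.bias)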

class BertSelfOutput[source]

BertSelfOutput(config) :: Module

Projects the self-attention output back to the hidden size with a MaskedLinear layer, followed by dropout and layer normalization with a residual connection.

class BertIntermediate[source]

BertIntermediate(config) :: Module

The feed-forward expansion of a Transformer layer: a MaskedLinear projection from the hidden size to the intermediate size, followed by the configured activation function.

class BertOutput[source]

BertOutput(config) :: Module

The feed-forward contraction of a Transformer layer: a MaskedLinear projection from the intermediate size back to the hidden size, followed by dropout and layer normalization with a residual connection.

class BertPooler[source]

BertPooler(config) :: Module

Pools the encoder output by taking the hidden state of the first token and passing it through a dense layer with a tanh activation.

class TopKBinarizer[source]

TopKBinarizer() :: Function

Top-k binarizer. Computes a binary mask M from a real-valued matrix S such that M_{i,j} = 1 if and only if S_{i,j} is among the k% highest values of S.

The implementation is inspired by https://github.com/allenai/hidden-networks ("What's Hidden in a Randomly Weighted Neural Network?" by Vivek Ramanujan, Mitchell Wortsman, Aniruddha Kembhavi, Ali Farhadi, and Mohammad Rastegari).
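A self-contained sketch of the idea, i.e. top-k binarization with a straight-through gradient (an illustration, not the exact implementation):

import torch
from torch.autograd import Function

class TopKBinarizerSketch(Function):
    @staticmethod
    def forward(ctx, scores, threshold):
        # Set the top `threshold` fraction of scores to 1, the rest to 0.
        k = max(1, int(threshold * scores.numel()))
        mask = torch.zeros_like(scores)
        _, topk_idx = scores.flatten().topk(k)
        mask.view(-1)[topk_idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the gradient passes to the scores
        # unchanged; the threshold receives no gradient.
        return grad_output, None

scores = torch.randn(4, 4, requires_grad=True)
mask = TopKBinarizerSketch.apply(scores, 0.25)  # keep the top 25% of scores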