Gluon Face Toolkit

Gluon Face is a toolkit based on MXNet Gluon that provides SOTA deep learning algorithms and models for face recognition. If you are new to MXNet, please check out the dmlc 60-minute crash course.

Hint

For Chinese readers, here is the zh-doc.

Gluon Face provides implementations of recent loss functions, including SoftmaxCrossEntropyLoss, ArcLoss, TripletLoss, RingLoss, CosLoss, L2Softmax, ASoftmax, CenterLoss, ContrastiveLoss and more, and we will keep adding new ones.

Hint

GitHub: see details in the gluon-face repo.

If there is any method we overlooked, please open an issue.

Losses in GluonFR:

The last column of this chart is the best LFW accuracy reported in each paper; the models were trained with different data and networks. Later we will report our own results for these methods using the same training data and network.

| Method            | Paper                   | MNIST visualization | Best LFW accuracy |
|-------------------|-------------------------|---------------------|-------------------|
| Contrastive Loss  | ContrastiveLoss         | –                   | –                 |
| Triplet Loss      | 1503.03832              | –                   | 99.63±0.09        |
| Center Loss       | CenterLoss              | img2                | 99.28             |
| L2-Softmax        | 1703.09507              | –                   | 99.33             |
| A-Softmax         | 1704.08063              | –                   | 99.42             |
| CosLoss/AMSoftmax | 1801.09414 / 1801.05599 | img3                | 99.17             |
| ArcLoss           | 1801.07698              | img4                | 99.82             |
| Ring Loss         | 1803.00130              | img5                | 99.52             |
| LGM Loss          | 1803.02988              | img6                | 99.20±0.03        |

Authors

haoxintong, Yangxv

Discussion

Chinese community: Gluon-Forum. Feel free to use English there :D

References

  1. MXNet Documentation and Tutorials: https://zh.diveintodeeplearning.org/
  2. NVIDIA DALI documentation
  3. DeepInsight insightface

Installation

Gluon Face supports Python 3.5 or later. To install this package, you need to install GluonCV and MXNet first:

pip install gluoncv --pre
pip install mxnet-mkl --pre --upgrade
# if cuda XX is installed
pip install mxnet-cuXXmkl --pre --upgrade

Then install gluonfr:

  • From source (recommended)
pip install git+https://github.com/THUFutureLab/gluon-face.git@master
  • Pip
pip install gluonfr
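
A quick sanity check that both packages import correctly:

python -c "import mxnet, gluonfr; print(mxnet.__version__)"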

Datasets

gluonfr.data provides input pipelines for training and validation. All datasets are aligned by MTCNN and cropped to (112, 112) by DeepInsight, who converted the images to train.rec, train.idx and val_data.bin files; please check out [insightface/Dataset-Zoo] for more information. In examples/dali_utils.py there is a simple example of NVIDIA DALI, which is worth trying when data augmentation on the CPU cannot keep up with GPU training.

The files should be prepared like:

face/
    emore/
        train.rec
        train.idx
        property
    ms1m/
        train.rec
        train.idx
        property
    lfw.bin
    agedb_30.bin
    ...
    vgg2_fp.bin

We use ~/.mxnet/datasets as the default dataset root to match the MXNet setting.
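
As a quick check that the files are in place, here is a minimal sketch of building the training and validation datasets; the name strings "emore" and "lfw" are taken from the layout above and may differ from what the API actually expects:

import os
import numpy as np
from mxnet.gluon.data import DataLoader
from gluonfr.data import FRTrainRecordDataset, FRValDataset

root = os.path.expanduser("~/.mxnet/datasets/face")

# Scale pixels to [0, 1]; any callable works as a transform.
transform = lambda data, label: (data.astype(np.float32) / 255, label)

train_set = FRTrainRecordDataset("emore", root=root, transform=transform)
val_set = FRValDataset("lfw", root=root)

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
for data, label in train_loader:
    print(data.shape, label.shape)  # expected: (64, 3, 112, 112) (64,)
    break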

Model Zoo

MobileFaceNet results.

| Test set | Ours  | InsightFace | Proposed in paper |
|----------|-------|-------------|-------------------|
| LFW      | 99.56 | 99.50       | 99.55             |
| CFP-FP   | 92.98 | 88.94       | –                 |
| AgeDB-30 | 95.86 | 95.91       | 96.07             |

Reference:

  1. Our training script and log/model (Baidu: y5zh, Google Drive).
  2. InsightFace results.
  3. MobileFaceNet paper (no open-source project).
Details

| Test set | Flip: False       | Flip: True        |
|----------|-------------------|-------------------|
| lfw      | 0.995500±0.003337 | 0.995667±0.003432 |
| calfw    | 0.951000±0.012069 | 0.973083±0.022889 |
| cplfw    | 0.882000±0.014295 | 0.938556±0.045234 |
| cfp_fp   | 0.927714±0.015309 | 0.929880±0.035907 |
| agedb_30 | 0.958667±0.008492 | 0.934903±0.033667 |
| cfp_ff   | 0.995571±0.002744 | 0.944868±0.037657 |
| vgg2_fp  | 0.920600±0.010920 | 0.940581±0.032677 |
Information
  1. Some projects on GitHub train to higher accuracy, but with an embedding size of 512. Compared with them, we use the embedding size of 128 proposed in the original paper, so the model size is only 4.1 MB.
  2. You are welcome to use our training script for further exploration, and if you get better results, feel free to open a PR.
  3. The pre-trained model was trained with L2 regularization; its output is cos(theta).
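
Since the pre-trained model outputs cos(theta), verification reduces to comparing L2-normalized embeddings by cosine similarity. A minimal NumPy sketch (the 0.5 threshold is purely illustrative, not a tuned value):

import numpy as np

def cosine_similarity(emb1, emb2):
    # Normalize, then take the dot product: this is cos(theta).
    emb1 = emb1 / np.linalg.norm(emb1)
    emb2 = emb2 / np.linalg.norm(emb2)
    return float(np.dot(emb1, emb2))

# Stand-ins for two 128-d embeddings produced by the network.
e1, e2 = np.random.rand(128), np.random.rand(128)
same_person = cosine_similarity(e1, e2) > 0.5  # illustrative threshold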

API Reference

gluonfr.data

Hint

Please refer to Datasets for the description of the datasets listed in this page, and how to download and extract them.

API Reference

This module provides popular face recognition datasets.

class gluonfr.data.FRTrainRecordDataset

A dataset wrapping over a rec serialized file provided by InsightFace Repo.

Parameters:
  • name (str.) – Name of the dataset.
  • root (str.) – Path to face folder. Default is '$(HOME)/.mxnet/datasets/face'.
  • transform (function, default None) –

    A user-defined callback that transforms each sample. For example:

    transform=lambda data, label: (data.astype(np.float32)/255, label)
    
class gluonfr.data.FRValDataset

A dataset wrapping over a pickle serialized (.bin) file provided by InsightFace Repo.

Parameters:
  • name (str.) – Name of val dataset.
  • root (str.) – Path to face folder. Default is '$(HOME)/.mxnet/datasets/face'.
  • transform (callable, default None.) –

    A function that takes data and transforms them:

    transform = lambda data: data.astype(np.float32)/255
    

gluonfr.nn

Neural Network Components.

Hint

Not every component listed here is a HybridBlock, which means some of them are not hybridizable. However, we try our best to make sure the components required during inference are hybridizable, so the entire network can be exported and run in other languages.

For example, encoders are usually non-hybridizable but are only required during training. In contrast, decoders are mostly HybridBlocks.

Basic Blocks

Blocks that are commonly used in face recognition.

API Reference

Basic Blocks used in GluonFR.

class gluonfr.nn.basic_blocks.NormDense

Norm Dense

class gluonfr.nn.basic_blocks.SELayer

SE Layer

class gluonfr.nn.basic_blocks.FrBase

This is the base class for all face recognition networks. It defines the NormDense layer and the control flow shared by subclasses. A subclass only needs to implement features and embedding_layer; normally embedding_layer is appended to features, as in the sketch after the parameter list.

Parameters:
  • classes (int) – Number of classification classes.
  • embedding_size (int) – Units of embedding layer.
  • weight_norm (bool, default False) – Whether to use weight normalization in the NormDense layer.
  • feature_norm (bool, default False) – Whether to use feature normalization in the NormDense layer.
  • need_cls_layer (bool, default True) – Whether to use the NormDense layer. Normally this depends on your loss function: set it to True when using Softmax, ArcLoss or another Softmax-based loss; set it to False when you only need the embedding output, e.g. when predicting or training with triplet loss.
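
Below is a hedged sketch of what a subclass might look like, based only on the description above; the constructor argument order and the exact wiring of features are assumptions:

from mxnet.gluon import nn
from gluonfr.nn.basic_blocks import FrBase

class ToyFaceNet(FrBase):
    """Minimal subclass: define features (with the embedding layer appended);
    the base class adds NormDense when need_cls_layer=True."""
    def __init__(self, classes, embedding_size=128, **kwargs):
        super().__init__(classes, embedding_size, **kwargs)
        with self.name_scope():
            self.features = nn.HybridSequential()
            self.features.add(
                nn.Conv2D(64, kernel_size=3, padding=1, activation="relu"),
                nn.GlobalAvgPool2D(),
                nn.Dense(embedding_size),  # the embedding layer
            )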

gluonfr.loss

gluonfr.loss.get_loss – Return the loss by name.
gluonfr.loss.get_loss_list – Get the entire list of loss names in losses.
API Reference

Custom losses

gluonfr.loss.get_loss(name, **kwargs)[source]

Return the loss by name.

Parameters:
  • name (str.) – Name of an available loss in Gluon Face.
  • kwargs – Check the docs of each loss for details.
Returns:

The loss.

Return type:

HybridBlock

gluonfr.loss.get_loss_list()[source]

Get the entire list of loss names in losses.

Returns: Entire list of loss names in losses.
Return type: list of str.
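
For example, listing the registered losses and constructing one by name (the "arcface" name string is a guess; use whatever get_loss_list() actually reports):

from gluonfr.loss import get_loss, get_loss_list

print(get_loss_list())  # all registered loss names
# Name string and kwargs are assumptions; see each loss class for its parameters.
arc = get_loss("arcface", classes=10, m=0.5, s=64)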
class gluonfr.loss.ArcLoss

ArcLoss from “ArcFace: Additive Angular Margin Loss for Deep Face Recognition” paper.

Parameters:
  • classes (int.) – Number of classes.
  • m (float.) – Margin parameter for loss.
  • s (int.) – Scale parameter for loss.
Outputs:
  • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
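
A hedged sketch of a training step with ArcLoss, using random stand-ins for the cos(theta) scores a NormDense head would produce (the (pred, label) call signature is an assumption based on the softmax-style losses in this module):

from mxnet import nd, autograd
from gluonfr.loss import ArcLoss

classes = 10
arc_loss = ArcLoss(classes, m=0.5, s=64)

# Stand-in cos(theta) scores and integer labels for a batch of 8.
pred = nd.random.uniform(-1, 1, shape=(8, classes))
label = nd.array([0, 1, 2, 3, 4, 5, 6, 7])

pred.attach_grad()
with autograd.record():
    loss = arc_loss(pred, label)
loss.backward()
print(loss.shape)  # (8,)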
class gluonfr.loss.TripletLoss

Calculates triplet loss given three input tensors and a positive margin. Triplet loss measures the relative similarity between a prediction, a positive example and a negative example:

\[L = \sum_i \max(\Vert {pred}_i - {pos_i} \Vert_2^2 - \Vert {pred}_i - {neg_i} \Vert_2^2 + {margin}, 0)\]

pred, positive and negative can have arbitrary shape as long as they have the same number of elements.

Parameters:
  • margin (float) – Margin of separation between correct and incorrect pair.
  • weight (float or None) – Global scalar weight for loss.
  • batch_axis (int, default 0) – The axis that represents mini-batch.
Inputs:
  • pred: prediction tensor with arbitrary shape
  • positive: positive example tensor with arbitrary shape. Must have the same size as pred.
  • negative: negative example tensor with arbitrary shape. Must have the same size as pred.
Outputs:
  • loss: loss tensor with shape (batch_size,).
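
For instance, with random stand-in embeddings:

from mxnet import nd
from gluonfr.loss import TripletLoss

triplet = TripletLoss(margin=1.0)
anchor = nd.random.normal(shape=(4, 128))    # pred
positive = nd.random.normal(shape=(4, 128))  # same identity as anchor
negative = nd.random.normal(shape=(4, 128))  # different identity
loss = triplet(anchor, positive, negative)   # shape (4,)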
class gluonfr.loss.RingLoss

Computes the Ring Loss from “Ring loss: Convex Feature Normalization for Face Recognition” paper.

\[L = -\sum_i \log \mathrm{softmax}({pred})_{i,{label}_i} + \frac{\lambda}{2m} \sum_{i=1}^{m} (\Vert \mathcal{F}({x}_i)\Vert_2 - R )^2\]
Parameters:
  • lamda (float.) – The loss weight enforcing a trade-off between the softmax loss and ring loss.
  • r_init (float.) – The initial value of the hyperparameter R.
Outputs:
  • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
class gluonfr.loss.CosLoss

CosLoss from “CosFace: Large Margin Cosine Loss for Deep Face Recognition” paper.

It is also AM-Softmax from “Additive Margin Softmax for Face Verification” paper.

Parameters:
  • classes (int.) – Number of classes.
  • m (float, default 0.4) – Margin parameter for loss.
  • s (int, default 64) – Scale parameter for loss.
Outputs:
  • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
class gluonfr.loss.L2Softmax

L2Softmax from “L2-constrained Softmax Loss for Discriminative Face Verification” paper.

Parameters:
  • classes (int.) – Number of classes.
  • alpha (float.) – The scaling parameter; a hypersphere with small alpha limits the surface area for embedding features.
  • p (float, default is 0.9.) – The expected average softmax probability for correctly classifying a feature.
  • from_normx (bool, default is False.) – Whether input has already been normalized.
Outputs:
  • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
class gluonfr.loss.ASoftmax

ASoftmax from “SphereFace: Deep Hypersphere Embedding for Face Recognition” paper. The inputs (weight, x) are assumed to be already normalized.

Parameters:
  • classes (int.) – Number of classes.
  • m (float.) – Margin parameter for loss.
  • s (int.) – Scale parameter for loss.
Outputs:
  • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
class gluonfr.loss.CenterLoss

Computes the Center Loss from “A Discriminative Feature Learning Approach for Deep Face Recognition” paper.

Implementation refers to https://github.com/ShownX/mxnet-center-loss/blob/master/center_loss.py

Parameters:
  • classes (int.) – Number of classes.
  • lamda (float) – The loss weight enforcing a trade-off between the softmax loss and center loss.
Outputs:
  • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
class gluonfr.loss.ContrastiveLoss

Computes the contrastive loss. See “Dimensionality Reduction by Learning an Invariant Mapping” paper. This loss encourages embeddings to be close to each other for samples of the same label, and to be at least the margin apart for samples of different labels.

Parameters: margin (float, default is 1.) – Margin term in the loss definition.
Inputs:
  • anchor: prediction tensor. Embeddings should be l2 normalized.
  • positive: positive example tensor with arbitrary shape. Must have the same size as anchor. Embeddings should be l2 normalized.
  • labels: array with shape (batch_size,) of binary labels indicating positive vs negative pair.
Outputs:
  • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
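
A minimal sketch with random stand-in embeddings; the label convention (1 for a positive pair, 0 for a negative pair) is an assumption:

from mxnet import nd
from gluonfr.loss import ContrastiveLoss

contrastive = ContrastiveLoss(margin=1.0)
emb1 = nd.L2Normalization(nd.random.normal(shape=(4, 128)))
emb2 = nd.L2Normalization(nd.random.normal(shape=(4, 128)))
labels = nd.array([1, 0, 1, 0])  # assumed: 1 = same identity, 0 = different
loss = contrastive(emb1, emb2, labels)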
class gluonfr.loss.LGMLoss

LGM Loss from “Rethinking Feature Distribution for Loss Functions in Image Classification” paper.

Implementation refers to https://github.com/LeeJuly30/L-GM-Loss-For-Gluon/blob/master/L_GM.py

Parameters:
  • num_classes (int.) – Number of classes.
  • embedding_size (int.) – The size of embedding feature.
  • alpha (float.) – A non-negative parameter controlling the size of the expected margin between two classes on the training set.
  • lamda (float.) – A non-negative weighting coefficient.
  • lr_mult (float.) – The variance parameters need a relatively low learning rate compared to the overall learning rate.
class gluonfr.loss.MPSLoss

Computes the MPS Loss from “DocFace: Matching ID Document Photos to Selfies” paper.

Parameters: m (float) – Margin parameter for loss.
Outputs:
  • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
class gluonfr.loss.GitLoss

Computes the Git Loss from “Git Loss for Deep Face Recognition” paper.

This implementation requires the batch size to remain constant during training and validation. That is usually fine: the last incomplete batch is typically discarded during training, and the loss does not need to be computed during validation.

Parameters:
  • classes (int.) – Number of classes.
  • embedding_size (int.) – Size of feature.
  • lamda_c (float.) – The loss weight enforcing a trade-off between the softmax loss and center loss.
  • lamda_g (float.) – The loss weight enforcing a trade-off between the softmax loss and git loss.
  • batch_size_per_gpu (int.) – The number of samples on each GPU or device, not the total batch size.
Outputs:
  • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
class gluonfr.loss.COCOLoss

Computes the COCO Loss from “Rethinking Feature Discrimination and Polymerization for Large-scale Recognition” paper.

This loss can be replaced by NormDense with Softmax, so using it is not recommended.

Parameters:
  • classes (int.) – Number of classes.
  • embedding_size (int.) – Size of feature.
  • alpha (float.) – The scaling parameter; a hypersphere with small alpha limits the surface area for embedding features.
Outputs:
  • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
class gluonfr.loss.SVXSoftmax

SVXSoftmax from “Support Vector Guided Softmax Loss for Face Recognition” paper.

With the default parameters, SV-X-Softmax reduces to the original softmax loss.

Parameters:
  • classes (int.) – Number of classes.
  • s (int.) – Scale parameter for loss.
  • t (float.) – Indicator parameter of SV.
  • m1 (float.) – Margin parameter for sphere softmax.
  • m2 (float.) – Margin parameter for cos/am softmax.
  • m3 (float.) – Margin parameter for arc softmax.
Outputs:
  • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.

gluonfr.model_zoo

gluonfr.model_zoo.get_model – Returns a model by name.
gluonfr.model_zoo.get_model_list – Get the entire list of model names in model_zoo.

Hint

This is the recommended method for getting a pre-defined model.
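
For example (the "mobilefacenet" name string is a guess; check get_model_list() for the exact names):

from gluonfr.model_zoo import get_model, get_model_list

print(get_model_list())  # all available model names
# Name string is an assumption; need_cls_layer=False returns embeddings only.
net = get_model("mobilefacenet", classes=-1, need_cls_layer=False)
net.initialize()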

API Reference

Models for face recognition

class gluonfr.model_zoo.AttentionNet

AttentionNet Model from “Residual Attention Network for Image Classification” paper.

Parameters:
  • classes (int.) – Number of classification classes.
  • modules (list.) – The number of Attention Modules in each stage.
  • p (int.) – Number of pre-processing Residual Units before splitting into the trunk and mask branches.
  • t (int.) – Number of Residual Units in the trunk branch.
  • r (int.) – Number of Residual Units between adjacent pooling layers in the mask branch.
  • kwargs
class gluonfr.model_zoo.AttentionNetFace

AttentionNet model for 112x112 input images.

Parameters:
  • classes (int.) – Number of classification classes.
  • modules (list.) – The number of Attention Modules in each stage.
  • p (int.) – Number of pre-processing Residual Units before splitting into the trunk and mask branches.
  • t (int.) – Number of Residual Units in the trunk branch.
  • r (int.) – Number of Residual Units between adjacent pooling layers in the mask branch.
  • embedding_size (int) – Units of embedding layer.
  • weight_norm (bool, default False) – Whether to use weight normalization in the NormDense layer.
  • feature_norm (bool, default False) – Whether to use feature normalization in the NormDense layer.
  • need_cls_layer (bool, default True) – Whether to use the NormDense layer. Normally this depends on your loss function: set it to True when using Softmax, ArcLoss or another Softmax-based loss; set it to False when you only need the embedding output, e.g. when predicting or training with triplet loss.
class gluonfr.model_zoo.MobileFaceNet

Mobile FaceNet

gluonfr.model_zoo.attention_net128(classes=-1, need_cls_layer=True, **kwargs)[source]

AttentionNet 128 Model for face recognition.

Parameters:
  • classes (int, -1) – Number of classification classes.
  • need_cls_layer (bool, default True) – Whether to use NormDense output layer.
gluonfr.model_zoo.attention_net164(classes=-1, need_cls_layer=True, **kwargs)[source]

AttentionNet 164 Model for face recognition.

Parameters:
  • classes (int, -1) – Number of classification classes.
  • need_cls_layer (bool, default True) – Whether to use NormDense output layer.
gluonfr.model_zoo.attention_net236(classes=-1, need_cls_layer=True, **kwargs)[source]

AttentionNet 236 Model for face recognition.

Parameters:
  • classes (int, -1) – Number of classification classes.
  • need_cls_layer (bool, default True) – Whether to use NormDense output layer.
gluonfr.model_zoo.attention_net452(classes=-1, need_cls_layer=True, **kwargs)[source]

AttentionNet 452 Model for face recognition.

Parameters:
  • classes (int, -1) – Number of classification classes.
  • need_cls_layer (bool, default True) – Whether to use NormDense output layer.
gluonfr.model_zoo.attention_net56(classes=-1, need_cls_layer=True, **kwargs)[source]

AttentionNet 56 Model for face recognition.

Parameters:
  • classes (int, -1) – Number of classification classes.
  • need_cls_layer (bool, default True) – Whether to use NormDense output layer.
gluonfr.model_zoo.attention_net92(classes=-1, need_cls_layer=True, **kwargs)[source]

AttentionNet 92 Model for face recognition.

Parameters:
  • classes (int, -1) – Number of classification classes.
  • need_cls_layer (bool, default True) – Whether to use NormDense output layer.
gluonfr.model_zoo.get_attention_face(classes=-1, num_layers=128, embedding_size=512, need_cls_layer=True, **kwargs)[source]

AttentionNet Model for 112x112 face images from “Residual Attention Network for Image Classification” paper.

Parameters:
  • classes (int, -1) – Number of classification classes.
  • num_layers (int, default 128) – Number of layers. Options are 56, 92, 128, 164, 236, 452.
  • embedding_size (int, default 512) – Feature dimension of the embedding layer.
  • need_cls_layer (bool, default True) – Whether to use NormDense output layer.
gluonfr.model_zoo.get_attention_net(classes, num_layers, **kwargs)[source]

AttentionNet Model from “Residual Attention Network for Image Classification” paper.

Parameters:
  • classes (int,) – Number of classification classes.
  • num_layers (int) – Number of layers. Options are 56, 92, 128, 164, 236, 452.
gluonfr.model_zoo.get_mobile_facenet(classes=-1, need_cls_layer=True, **kwargs)[source]
Parameters:
  • classes (int, -1) – Number of classification classes.
  • need_cls_layer (bool, default True) – Whether to use NormDense output layer.
gluonfr.model_zoo.get_mobile_facenet_re(classes=-1, need_cls_layer=True, **kwargs)[source]
Parameters:
  • classes (int, -1) – Number of classification classes.
  • need_cls_layer (bool, default True) – Whether to use NormDense output layer.
gluonfr.model_zoo.get_model(name, **kwargs)[source]

Returns a model by name.

Parameters:
  • name (str) – Name of the model.
  • classes (int) – Number of classes for the output layer.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
Returns:

The model.

Return type:

HybridBlock

gluonfr.model_zoo.get_model_list()[source]

Get the entire list of model names in model_zoo.

Returns: Entire list of model names in model_zoo.
Return type: list of str
gluonfr.model_zoo.get_se_resnet(num_layers, **kwargs)[source]

SE_ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. SE_ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • version (int) – Version of ResNet. Options are 1, 2.
  • num_layers (int) – Number of layers. Options are 18, 34, 50, 101, 152.
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluonfr.model_zoo.se_resnet101_v2(**kwargs)[source]

SE_ResNet-101 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluonfr.model_zoo.se_resnet152_v2(**kwargs)[source]

SE_ResNet-152 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluonfr.model_zoo.se_resnet18_v2(**kwargs)[source]

SE_ResNet-18 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluonfr.model_zoo.se_resnet34_v2(**kwargs)[source]

SE_ResNet-34 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluonfr.model_zoo.se_resnet50_v2(**kwargs)[source]

SE_ResNet-50 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

gluonfr.metrics

Metrics used in training a face recognition model.

API Reference

This module provides metrics used in face recognition.

class gluonfr.metrics.FaceVerification

Compute the confusion matrix for the 1:1 face verification problem (also applicable to other fields). Use update() to collect outputs and compute distances for each batch, then use get() to compute the confusion matrix and accuracy on the val dataset.

Parameters:
  • nfolds (int, default is 10) – Number of folds for cross-validation.
  • thresholds (ndarray, default is None.) – Use np.arange to generate thresholds. If thresholds=None, np.arange(0, 2, 0.01) will be used for euclidean distance.
  • far_target (float, default is 1e-3.) – Used to get the verification accuracy at the expected FAR.
  • dist_type (int, default is 0.) – Options are {0, 1}: 0 for euclidean distance, 1 for cosine similarity. For cosine, we use 1 - cosine as the final distance.
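
A hedged sketch of the update()/get() cycle described above, with random stand-in embeddings; the update() argument order and the exact structure of get()'s return value are assumptions:

from mxnet import nd
from gluonfr.metrics import FaceVerification

metric = FaceVerification(nfolds=10)

# Stand-in embeddings for 32 face pairs plus binary "same identity" labels.
emb0 = nd.random.normal(shape=(32, 128))
emb1 = nd.random.normal(shape=(32, 128))
same = nd.round(nd.random.uniform(shape=(32,)))

metric.update(same, emb0, emb1)  # argument order is an assumption
print(metric.get())              # confusion-matrix stats and accuracy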

gluonfr.utils

We implement a broad range of utility functions covering visualization, file handling, downloads and training helpers.

Visualization
plot_accuracy
plot_roc
API Reference