libfacedetection

This is an open source library for CNN-based face detection in images. The CNN model has been converted to static variables in C++ source files. The source code does not depend on any other library; all you need is a C++ compiler. It compiles on Windows, Linux, ARM, and any other platform with a C++ compiler.

SIMD instructions are used to speed up the detection. You can enable AVX2 on Intel CPUs or NEON on ARM.
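Because the instruction set is selected at build time, it can help to confirm which target a given binary was compiled for. The sketch below is not part of libfacedetection; `compiled_simd_target` is a hypothetical helper that only inspects the predefined macros GCC, Clang, and MSVC set when flags such as `-mavx2` or `/arch:AVX2` are passed.

```cpp
#include <string>

// Hypothetical helper, not part of libfacedetection.
// Reports the widest SIMD instruction set the compiler was told to target,
// based on predefined compiler macros (checked from widest to narrowest).
std::string compiled_simd_target() {
#if defined(__AVX512F__)
    return "AVX512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    return "NEON";
#else
    return "scalar";
#endif
}
```

Printing the returned string at program start is a cheap way to catch a build that silently fell back to the scalar path.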

The model is provided as C++ arrays in src/facedetectcnn-data.cpp and as an ONNX model in OpenCV Zoo. You can try our scripts (C++ and Python) in opencv_dnn/ with the ONNX model. View the network architecture here.

Please note that OpenCV DNN does not support the latest version of YuNet with dynamic input shape. To run the latest YuNet with OpenCV DNN, make sure the input has exactly the same shape as the one in the ONNX model.
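One way to meet the fixed-shape requirement is to resize each image to the model's input shape and then scale the detected boxes back. The following is a minimal sketch of that bookkeeping only, assuming a plain resize without letterboxing. `Shape`, `Box`, `matches_model_input`, and `to_image_coords` are hypothetical names for illustration; they are not OpenCV or libfacedetection APIs.

```cpp
// Hypothetical helper types for illustration only.
struct Shape { int width; int height; };
struct Box { float x, y, w, h; };

// True when the image already has the shape the fixed-shape model expects,
// so no resize is needed before inference.
bool matches_model_input(Shape image, Shape model) {
    return image.width == model.width && image.height == model.height;
}

// Map a box predicted in model-input coordinates back to the original image,
// assuming the image was plainly resized (no padding) to the model shape.
Box to_image_coords(Box b, Shape image, Shape model) {
    const float sx = static_cast<float>(image.width) / model.width;
    const float sy = static_cast<float>(image.height) / model.height;
    return Box{b.x * sx, b.y * sy, b.w * sx, b.h * sy};
}
```

The model shape itself should be read from the ONNX file you are using rather than hard-coded from this sketch.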

examples/detect-image.cpp and examples/detect-camera.cpp show how to use the library.

The model was trained with libfacedetection.train.

Examples

How to use the code

You can copy the files in the src/ directory into your project and compile them like the other files in your project. The source code is written in standard C/C++ and should compile on any platform that supports C/C++.
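Once the sources are compiled into a project, the detector returns its results in a caller-provided buffer. The sketch below decodes such a buffer into a vector of faces. The layout it assumes (a 32-bit face count followed by sixteen 16-bit values per face) is modeled on examples/detect-image.cpp and should be verified against the version you build; `Face` and `decode_faces` are hypothetical names, not library API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One detected face. The field layout is an ASSUMPTION modeled on
// examples/detect-image.cpp; verify it against your copy of the library.
struct Face {
    int confidence;     // detection score (assumed to be scaled to 0..100)
    int x, y, w, h;     // bounding box in pixels
    int landmarks[10];  // five (x, y) landmark pairs
};

// Decode a result buffer with the assumed layout:
// [int32 face count][per face: 16 x int16 = confidence, x, y, w, h, 10 landmark values, 1 unused]
std::vector<Face> decode_faces(const unsigned char* buffer) {
    std::vector<Face> faces;
    if (buffer == nullptr) return faces;

    std::int32_t count = 0;
    std::memcpy(&count, buffer, sizeof(count));  // memcpy avoids alignment and aliasing issues

    const unsigned char* records = buffer + sizeof(std::int32_t);
    for (std::int32_t i = 0; i < count; ++i) {
        std::int16_t p[16];
        std::memcpy(p, records + static_cast<std::size_t>(i) * sizeof(p), sizeof(p));

        Face f{};
        f.confidence = p[0];
        f.x = p[1]; f.y = p[2]; f.w = p[3]; f.h = p[4];
        for (int k = 0; k < 10; ++k) f.landmarks[k] = p[5 + k];
        faces.push_back(f);
    }
    return faces;
}
```

In a real program the buffer would come from the detection call shown in examples/detect-image.cpp; here it is treated as opaque bytes so the decoding step can be tested on its own.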

Some tips:

You can also compile the source code to a static or dynamic library, and then use it in your project.

How to compile

CNN-based Face Detection on Intel CPU

Using AVX2 instructions

| Method | Time (X64, single-thread) | FPS (X64, single-thread) | Time (X64, multi-thread) | FPS (X64, multi-thread) |
|---|---|---|---|---|
| cnn (CPU, 640x480) | 50.02ms | 19.99 | 6.55ms | 152.65 |
| cnn (CPU, 320x240) | 13.09ms | 76.39 | 1.82ms | 550.54 |
| cnn (CPU, 160x120) | 3.61ms | 277.37 | 0.57ms | 1745.13 |
| cnn (CPU, 128x96) | 2.11ms | 474.60 | 0.33ms | 2994.23 |
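The Time and FPS columns are two views of the same measurement: FPS is 1000 divided by the per-frame time in milliseconds. A minimal sketch of that arithmetic, with hypothetical helper names, is shown here. For the 640x480 single-thread row, 50.02 ms corresponds to roughly 19.99 FPS, and the multi-thread time of 6.55 ms is about 7.6 times faster.

```cpp
// Hypothetical helpers for reading the benchmark tables.

// Convert a per-frame latency in milliseconds to frames per second.
double fps_from_ms(double ms) { return 1000.0 / ms; }

// Speedup of a faster configuration relative to a slower one.
double speedup(double slow_ms, double fast_ms) { return slow_ms / fast_ms; }
```

Values recomputed this way can differ from the published FPS in the last digits, presumably because the published times are rounded to two decimals.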

Using AVX512 instructions

| Method | Time (X64, single-thread) | FPS (X64, single-thread) | Time (X64, multi-thread) | FPS (X64, multi-thread) |
|---|---|---|---|---|
| cnn (CPU, 640x480) | 46.47ms | 21.52 | 6.39ms | 156.47 |
| cnn (CPU, 320x240) | 12.10ms | 82.67 | 1.67ms | 599.31 |
| cnn (CPU, 160x120) | 3.37ms | 296.47 | 0.46ms | 2155.80 |
| cnn (CPU, 128x96) | 1.98ms | 504.72 | 0.31ms | 3198.63 |

CNN-based Face Detection on ARM Linux (Raspberry Pi 4 B)

| Method | Time (single-thread) | FPS (single-thread) | Time (multi-thread) | FPS (multi-thread) |
|---|---|---|---|---|
| cnn (CPU, 640x480) | 404.63ms | 2.47 | 125.47ms | 7.97 |
| cnn (CPU, 320x240) | 105.73ms | 9.46 | 32.98ms | 30.32 |
| cnn (CPU, 160x120) | 26.05ms | 38.38 | 7.91ms | 126.49 |
| cnn (CPU, 128x96) | 15.06ms | 66.38 | 4.50ms | 222.28 |

Performance on WIDER Face

Run on default settings: scales=[1.], confidence_threshold=0.02, floating point:

AP_easy=0.887, AP_medium=0.871, AP_hard=0.768

Highway Optimized Version

An independent Highway-based implementation has been added under highway/. It keeps the original implementation untouched and exposes a separate C API:

#include "facedetect_hw.h"

int* results = facedetect_hw_cnn(result_buffer, bgr_image_data,
                                 width, height, step);

The Highway version follows the same deployment model as the original project: build separately for each instruction set/platform. It does not currently use Highway runtime dynamic dispatch. The current x86 performance path is:

| Backend | Description |
|---|---|
| pure Highway | Portable Highway kernels for pointwise, depthwise, maxpool, image/network/postprocess flow |
| x86 hybrid AVX2 | Highway packed pointwise plus guarded AVX2/FMA intrinsics for selected depthwise/maxpool ceiling kernels |

The public API uses thread-local internal workspaces, so the recommended multi-threading model is external parallelism: call facedetect_hw_cnn from multiple threads, with one result buffer per calling thread.
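The external-parallelism model described above can be sketched as follows. To keep the example self-contained, `detect_stub` stands in for `facedetect_hw_cnn` and only writes a face count of zero; the buffer size `kResultBufferSize` is an assumed value, and `detect_parallel` is a hypothetical helper. The point is the ownership pattern: each calling thread gets its own result buffer and never shares it.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for facedetect_hw_cnn so this sketch compiles on its own.
// The real function is declared in facedetect_hw.h; this stub reports zero faces.
int* detect_stub(unsigned char* result_buffer, unsigned char* /*bgr_image_data*/,
                 int /*width*/, int /*height*/, int /*step*/) {
    int* count = reinterpret_cast<int*>(result_buffer);
    *count = 0;
    return count;
}

// ASSUMED buffer size; use the constant defined by the library headers instead.
constexpr std::size_t kResultBufferSize = 0x9000;

// Run detection on several BGR frames in parallel:
// one thread per frame, and one private result buffer per thread.
std::vector<int> detect_parallel(std::vector<std::vector<unsigned char>>& frames,
                                 int width, int height) {
    std::vector<int> face_counts(frames.size(), -1);
    std::vector<std::vector<unsigned char>> buffers(
        frames.size(), std::vector<unsigned char>(kResultBufferSize));

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < frames.size(); ++i) {
        workers.emplace_back([&, i] {
            // Each thread touches only its own frame, buffer, and output slot.
            int* results = detect_stub(buffers[i].data(), frames[i].data(),
                                       width, height, width * 3);
            face_counts[i] = results ? *results : 0;
        });
    }
    for (auto& w : workers) w.join();
    return face_counts;
}
```

Spawning one thread per frame keeps the sketch short; a long-running application would more likely reuse a fixed pool of worker threads, each holding its buffer for its whole lifetime.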

Measured resolution benchmark:

| ISA / Backend | 640x480 single | 640x480 external MT | 320x240 single | 320x240 external MT | 160x120 single | 160x120 external MT | 128x96 single | 128x96 external MT |
|---|---|---|---|---|---|---|---|---|
| origin default/scalar | 65.66 ms / 15.23 FPS | 5.58 ms / 179.19 FPS | 16.41 ms / 60.93 FPS | 1.48 ms / 674.91 FPS | 3.69 ms / 270.69 FPS | 0.47 ms / 2110.46 FPS | 2.20 ms / 453.71 FPS | 0.30 ms / 3291.92 FPS |
| origin AVX2 | 33.42 ms / 29.92 FPS | 4.56 ms / 219.10 FPS | 7.70 ms / 129.91 FPS | 1.08 ms / 926.34 FPS | 1.49 ms / 673.02 FPS | 0.23 ms / 4390.31 FPS | 0.88 ms / 1140.45 FPS | 0.15 ms / 6750.38 FPS |
| origin AVX512 | N/A on this CPU | N/A on this CPU | N/A on this CPU | N/A on this CPU | N/A on this CPU | N/A on this CPU | N/A on this CPU | N/A on this CPU |
| Highway scalar | 53.50 ms / 18.69 FPS | 4.64 ms / 215.70 FPS | 13.93 ms / 71.81 FPS | 1.39 ms / 718.85 FPS | 3.46 ms / 288.90 FPS | 0.44 ms / 2298.29 FPS | 2.06 ms / 484.78 FPS | 0.32 ms / 3112.95 FPS |
| Highway SSE/default | 14.84 ms / 67.39 FPS | 2.62 ms / 381.29 FPS | 3.84 ms / 260.39 FPS | 0.48 ms / 2062.60 FPS | 0.97 ms / 1036.19 FPS | 0.18 ms / 5678.19 FPS | 0.58 ms / 1710.98 FPS | 0.10 ms / 10125.38 FPS |
| Highway AVX2 pure | 9.20 ms / 108.72 FPS | 2.45 ms / 408.33 FPS | 2.34 ms / 427.32 FPS | 0.43 ms / 2323.32 FPS | 0.58 ms / 1727.04 FPS | 0.13 ms / 7747.01 FPS | 0.35 ms / 2894.80 FPS | 0.06 ms / 17268.13 FPS |
| Highway AVX2 hybrid | 8.48 ms / 117.89 FPS | 2.44 ms / 410.30 FPS | 2.14 ms / 468.15 FPS | 0.39 ms / 2538.84 FPS | 0.53 ms / 1903.34 FPS | 0.09 ms / 10740.69 FPS | 0.32 ms / 3158.93 FPS | 0.05 ms / 19047.05 FPS |
| Highway AVX512 pure | N/A on this CPU | N/A on this CPU | N/A on this CPU | N/A on this CPU | N/A on this CPU | N/A on this CPU | N/A on this CPU | N/A on this CPU |

Notes:

Rust Crate

A pure Rust implementation is available under rust/ as the crate libfacedetection_rs. It uses the same CNN model data, generated into Rust source so the crate is self-contained and does not need to parse the C++ model file at build time.

The Rust crate is intended for normal Rust dependency usage:

[dependencies]
libfacedetection_rs = "0.1"

Minimal usage:

use libfacedetection_rs::{image, Detector};

fn main() -> Result<(), libfacedetection_rs::DetectError> {
    let image = image::open("face.jpg")?;
    let mut detector = Detector::new();
    let detection = detector.detect(&image)?;

    println!("faces: {}", detection.face_count());
    Ok(())
}

Current crate shape:

| Item | Status |
|---|---|
| Public API | Safe Rust API: Detector, Detection, Face, DetectError |
| Image loading | Via the image crate, with PNG/JPEG enabled |
| Model data | Generated static Rust source in rust/src/generated/model_data.rs |
| Acceleration | Scalar fallback plus runtime AVX2 dispatch on x86/x86_64 |
| C ABI | Not published in the Rust crate 0.1.0; C wrapper may be split later |

Measured resolution benchmark against the original C++ implementation:

| Backend | 640x480 single | 640x480 external MT | 320x240 single | 320x240 external MT | 160x120 single | 160x120 external MT | 128x96 single | 128x96 external MT |
|---|---|---|---|---|---|---|---|---|
| origin C++ AVX2 | 35.93 ms / 27.83 FPS | 5.07 ms / 197.39 FPS | 7.57 ms / 132.07 FPS | 0.93 ms / 1070.13 FPS | 1.49 ms / 669.31 FPS | 0.23 ms / 4339.53 FPS | 0.89 ms / 1122.29 FPS | 0.15 ms / 6879.46 FPS |
| Rust AVX2 | 8.73 ms / 114.60 FPS | 2.18 ms / 458.86 FPS | 2.25 ms / 444.14 FPS | 0.36 ms / 2809.03 FPS | 0.53 ms / 1880.12 FPS | 0.08 ms / 11948.88 FPS | 0.32 ms / 3125.11 FPS | 0.07 ms / 14812.76 FPS |

Notes:

Author

Contributors

All contributors on GitHub.com are listed here.

Contributors not listed on GitHub.com:

Acknowledgment

The work was partly supported by the Science Foundation of Shenzhen (Grant No. 20170504160426188).

Citation

The master's thesis of Mr. Wei Wu contains all details of the algorithm. It can be downloaded at 吴伟硕士毕业论文 (Wei Wu's Master's Thesis).

@thesis{wu2023thesisyunet,
    author      = {吴伟},
    title       = {面向边缘设备的高精度毫秒级人脸检测技术研究},
    type        = {硕士学位论文},
    institution = {南方科技大学},
    year        = {2023},
}

The paper describing the main idea of this repository: https://link.springer.com/article/10.1007/s11633-023-1423-y.

@article{wu2023miryunet,
	title     = {YuNet: A Tiny Millisecond-level Face Detector},
	author    = {Wu, Wei and Peng, Hanyang and Yu, Shiqi},
	journal   = {Machine Intelligence Research},
	pages     = {1--10},
	year      = {2023},
	doi       = {10.1007/s11633-023-1423-y},
	publisher = {Springer}
}

A survey paper on face detection that evaluates different methods. It is available via open access at https://ieeexplore.ieee.org/document/9580485

@article{feng2022face,
	author  = {Feng, Yuantao and Yu, Shiqi and Peng, Hanyang and Li, Yan-Ran and Zhang, Jianguo},
	journal = {IEEE Transactions on Biometrics, Behavior, and Identity Science}, 
	title   = {Detect Faces Efficiently: A Survey and Evaluations}, 
	year    = {2022},
	volume  = {4},
	number  = {1},
	pages   = {1-18},
	doi     = {10.1109/TBIOM.2021.3120412}
}

The loss used in training is EIoU, a novel extended IoU. The paper is available via open access at https://ieeexplore.ieee.org/document/9429909.

@article{peng2021eiou,
	author  = {Peng, Hanyang and Yu, Shiqi},
	journal = {IEEE Transactions on Image Processing}, 
	title   = {A Systematic IoU-Related Method: Beyond Simplified Regression for Better Localization}, 
	year    = {2021},
	volume  = {30},
	pages   = {5032-5044},
	doi     = {10.1109/TIP.2021.3077144}
}
