NobleBlocks

Peng Cheng Laboratory

facilityShenzhen, China

Research output, citation impact, and the most-cited recent papers from Peng Cheng Laboratory (China). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works
15.7K
Citations
528.9K
h-index
247
i10-index
9.7K
Also known as
Peng Cheng LaboratoryShenzhen Cyberspace Science and Technology Laboratory鹏城实验室

Top-cited papers from Peng Cheng Laboratory

A Survey on Multi-Task Learning
Yu Zhang, Qiang Yang
2021· IEEE Transactions on Knowledge and Data Engineering2.1Kdoi:10.1109/tkde.2021.3070203

Multi-Task Learning (MTL) is a learning paradigm in machine learning and its aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a survey for MTL from the perspective of algorithmic modeling, applications and theoretical analyses. For algorithmic modeling, we give a definition of MTL and then classify different MTL algorithms into five categories, including feature learning approach, low-rank approach, task clustering approach, task relation learning approach and decomposition approach as well as discussing the characteristics of each approach. In order to improve the performance of learning tasks further, MTL can be combined with other learning paradigms including semi-supervised learning, active learning, unsupervised learning, reinforcement learning, multi-view learning and graphical models. When the number of tasks is large or the data dimensionality is high, we review online, parallel and distributed MTL models as well as dimensionality reduction and feature hashing to reveal their computational and storage advantages. Many real-world applications use MTL to boost their performance and we review representative works in this paper. Finally, we present theoretical analyses and discuss several future directions for MTL.

The Roadmap to 6G: AI Empowered Wireless Networks
Khaled B. Letaief, Wei Chen, Yuanming Shi, Jun Zhang +1 more
2019· Rare & Special e-Zone (The Hong Kong University of Science and Technology)2.0Kdoi:10.1109/mcom.2019.1900271

The recent upsurge of diversified mobile applications, especially those supported by AI, is spurring heated discussions on the future evolution of wireless communications. While 5G is being deployed around the world, efforts from industry and academia have started to look beyond 5G and conceptualize 6G. We envision 6G to undergo an unprecedented transformation that will make it substantially different from the previous generations of wireless cellular systems. In particular, 6G will go beyond mobile Internet and will be required to support ubiquitous AI services from the core to the end devices of the network. Meanwhile, AI will play a critical role in designing and optimizing 6G architectures, protocols, and operations. In this article, we discuss potential technologies for 6G to enable mobile AI applications, as well as AI-enabled methodologies for 6G network design and optimization. Key trends in the evolution to 6G will also be discussed.

Pre-Trained Image Processing Transformer
Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu +4 more
20212.0Kdoi:10.1109/cvpr46437.2021.01212

As the computing power of modern hardware is increasing strongly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods. The big progress is mainly contributed to the representation ability of transformer and its variant architectures. In this paper, we study the low-level computer vision task (e.g., denoising, super-resolution and deraining) and develop a new pre-trained model, namely, image processing transformer (IPT). To maximally excavate the capability of transformer, we present to utilize the well-known ImageNet benchmark for generating a large amount of corrupted image pairs. The IPT model is trained on these images with multi-heads and multi-tails. In addition, the contrastive learning is introduced for well adapting to different image processing tasks. The pre-trained model can therefore efficiently employed on desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks. Code is available at https://github.com/huawei-noah/Pretrained-IPT and https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/IPT

Second-Order Attention Network for Single Image Super-Resolution
Tao Dai, Jianrui Cai, Yongbing Zhang, Shu‐Tao Xia +1 more
20191.9Kdoi:10.1109/cvpr.2019.01132

Recently, deep convolutional neural networks (CNNs) have been widely explored in single image super-resolution (SISR) and obtained remarkable performance. However, most of the existing CNN-based SISR methods mainly focus on wider or deeper architecture design, neglecting to explore the feature correlations of intermediate layers, hence hindering the representational power of CNNs. To address this issue, in this paper, we propose a second-order attention network (SAN) for more powerful feature expression and feature correlation learning. Specifically, a novel train- able second-order channel attention (SOCA) module is developed to adaptively rescale the channel-wise features by using second-order feature statistics for more discriminative representations. Furthermore, we present a non-locally enhanced residual group (NLRG) structure, which not only incorporates non-local operations to capture long-distance spatial contextual information, but also contains repeated local-source residual attention groups (LSRAG) to learn increasingly abstract feature representations. Experimental results demonstrate the superiority of our SAN network over state-of-the-art SISR methods in terms of both quantitative metrics and visual quality.

NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
Jun Liu, Amir Shahroudy, Mauricio Pérez, Gang Wang +2 more
2019· IEEE Transactions on Pattern Analysis and Machine Intelligence1.7Kdoi:10.1109/tpami.2019.2916873

Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding.

Toward Convolutional Blind Denoising of Real Photographs
Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo +1 more
20191.2Kdoi:10.1109/cvpr.2019.00181

While deep convolutional neural networks (CNNs) have achieved impressive success in image denoising with additive white Gaussian noise (AWGN), their performance remains limited on real-world noisy photographs. The main reason is that their learned models are easy to overfit on the simplified AWGN model which deviates severely from the complicated real-world noise model. In order to improve the generalization ability of deep CNN denoisers, we suggest training a convolutional blind denoising network (CBDNet) with more realistic noise model and real-world noisy-clean image pairs. On the one hand, both signal-dependent noise and in-camera signal processing pipeline is considered to synthesize realistic noisy images. On the other hand, real-world noisy photographs and their nearly noise-free counterparts are also included to train our CBDNet. To further provide an interactive strategy to rectify denoising result conveniently, a noise estimation subnetwork with asymmetric learning to suppress under-estimation of noise level is embedded into CBDNet. Extensive experimental results on three datasets of real-world noisy photographs clearly demonstrate the superior performance of CBDNet over state-of-the-arts in terms of quantitative met- rics and visual quality. The code has been made available at https://github.com/GuoShi28/CBDNet.

Attention on Attention for Image Captioning
Lun Huang, Wenmin Wang, Jie Chen, Xiao-Yong Wei
20191.0Kdoi:10.1109/iccv.2019.00473

Attention mechanisms are widely used in current encoder/decoder frameworks of image captioning, where a weighted average on encoded vectors is generated at each time step to guide the caption decoding process. However, the decoder has little idea of whether or how well the attended vector and the given attention query are related, which could make the decoder give misled results. In this paper, we propose an Attention on Attention (AoA) module, which extends the conventional attention mechanisms to determine the relevance between attention results and queries. AoA first generates an information vector and an attention gate using the attention result and the current context, then adds another attention by applying element-wise multiplication to them and finally obtains the attended information, the expected useful knowledge. We apply AoA to both the encoder and the decoder of our image captioning model, which we name as AoA Network (AoANet). Experiments show that AoANet outperforms all previously published methods and achieves a new state-of-the-art performance of 129.8 CIDEr-D score on MS COCO Karpathy offline test split and 129.6 CIDEr-D (C40) score on the official online testing server. Code is available at https://github.com/husthuaan/AoANet.

Siamese Box Adaptive Network for Visual Tracking
Zedu Chen, Bineng Zhong, Guorong Li, Shengping Zhang +1 more
2020977doi:10.1109/cvpr42600.2020.00670

Most of the existing trackers usually rely on either a multi-scale searching scheme or pre-defined anchor boxes to accurately estimate the scale and aspect ratio of a target. Unfortunately, they typically call for tedious and heuristic configurations. To address this issue, we propose a simple yet effective visual tracking framework (named Siamese Box Adaptive Network, SiamBAN) by exploiting the expressive power of the fully convolutional network (FCN). SiamBAN views the visual tracking problem as a parallel classification and regression problem, and thus directly classifies objects and regresses their bounding boxes in a unified FCN. The no-prior box design avoids hyper-parameters associated with the candidate boxes, making SiamBAN more flexible and general. Extensive experiments on visual tracking benchmarks including VOT2018, VOT2019, OTB100, NFS, UAV123, and LaSOT demonstrate that SiamBAN achieves state-of-the-art performance and runs at 40 FPS, confirming its effectiveness and efficiency. The code will be available at https://github.com/hqucv/siamban.

FedHealth: A Federated Transfer Learning Framework for Wearable Healthcare
Yiqiang Chen, Xin Qin, Jindong Wang, Chaohui Yu +1 more
2020· IEEE Intelligent Systems965doi:10.1109/mis.2020.2988604

With the rapid development of computing technology, wearable devices make it easy to get access to people's health information. Smart healthcare achieves great success by training machine learning models on a large quantity of user personal data. However, there are two critical challenges. First, user data often exist in the form of isolated islands, making it difficult to perform aggregation without compromising privacy security. Second, the models trained on the cloud fail on personalization. In this article, we propose FedHealth, the first federated transfer learning framework for wearable healthcare to tackle these challenges. FedHealth performs data aggregation through federated learning, and then builds relatively personalized models by transfer learning. Wearable activity recognition experiments and real Parkinson's disease auxiliary diagnosis application have evaluated that FedHealth is able to achieve accurate and personalized healthcare without compromising privacy and security. FedHealth is general and extensible in many healthcare applications.

Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection
Jinyuan Liu, Xin Fan, Zhanbo Huang, Guanyao Wu +3 more
2022· 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)941doi:10.1109/cvpr52688.2022.00571

This study addresses the issue of fusing infrared and visible images that appear differently for object detection. Aiming at generating an image of high visual quality, previous approaches discover commons underlying the two modalities and fuse upon the common space either by iterative optimization or deep networks. These approaches neglect that modality differences implying the complementary information are extremely important for both fusion and subsequent detection task. This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network. The fusion network with one generator and dual discriminators seeks commons while learning from differences, which preserves structural information of targets from the infrared and textural details from the visible. Furthermore, we build a synchronized imaging system with calibrated infrared and optical sensors, and collect currently the most comprehensive benchmark covering a wide range of scenarios. Extensive experiments on several public datasets and our benchmark demonstrate that our method outputs not only visually appealing fusion but also higher detection mAP than the state-of-the-art approaches. The source code and benchmark are available at https://github.com/dlut-dimt/TarDAL.

Toward Fast, Flexible, and Robust Low-Light Image Enhancement
Long Ma, Tengyu Ma, Risheng Liu, Xin Fan +1 more
2022· 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)939doi:10.1109/cvpr52688.2022.00555

Existing low-light image enhancement techniques are mostly not only difficult to deal with both visual quality and computational efficiency but also commonly invalid in unknown complex scenarios. In this paper, we develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening images in real-world low-light scenarios. To be specific, we establish a cascaded illumination learning process with weight sharing to handle this task. Considering the computational burden of the cascaded pattern, we construct the self-calibrated module which realizes the convergence between results of each stage, producing the gains that only use the single basic block for inference (yet has not been exploited in previous works), which drastically diminishes computation cost. We then define the unsupervised training loss to elevate the model capability that can adapt general scenes. Further, we make comprehensive explorations to excavate SCI's inherent properties (lacking in existing works) including operation-insensitive adaptability (acquiring stable performance under the settings of different simple operations) and model-irrelevant generality (can be applied to illumination-based existing works to improve performance). Finally, plenty of experiments and ablation studies fully indicate our superiority in both quality and efficiency. Applications on low-light face detection and nighttime semantic segmentation fully reveal the latent practical values for SCI. The source code is available at https://github.com/vis-opt-group/SCI.

Deep Residual Learning for Image Recognition: A Survey
Muhammad Shafiq, Zhaoquan Gu
2022· Applied Sciences910doi:10.3390/app12188972

Deep Residual Networks have recently been shown to significantly improve the performance of neural networks trained on ImageNet, with results beating all previous methods on this dataset by large margins in the image classification task. However, the meaning of these impressive numbers and their implications for future research are not fully understood yet. In this survey, we will try to explain what Deep Residual Networks are, how they achieve their excellent results, and why their successful implementation in practice represents a significant advance over existing techniques. We also discuss some open questions related to residual learning as well as possible applications of Deep Residual Networks beyond ImageNet. Finally, we discuss some issues that still need to be resolved before deep residual learning can be applied on more complex problems.

Toward Smart Wireless Communications via Intelligent Reflecting Surfaces: A Contemporary Survey
Shimin Gong, Xiao Lu, Dinh Thai Hoang, Dusit Niyato +3 more
2020· IEEE Communications Surveys & Tutorials905doi:10.1109/comst.2020.3004197

This paper presents a literature review on recent applications and design aspects of the intelligent reflecting surface (IRS) in the future wireless networks. Conventionally, the network optimization has been limited to transmission control at two endpoints, i.e., end users and network controller. The fading wireless channel is uncontrollable and becomes one of the main limiting factors for performance improvement. The IRS is composed of a large array of scattering elements, which can be individually configured to generate additional phase shifts to the signal reflections. Hence, it can actively control the signal propagation properties in favor of signal reception, and thus realize the notion of a smart radio environment. As such, the IRS's phase control, combined with the conventional transmission control, can potentially bring performance gain compared to wireless networks without IRS. In this survey, we first introduce basic concepts of the IRS and the realizations of its reconfigurability. Then, we focus on applications of the IRS in wireless communications. We overview different performance metrics and analytical approaches to characterize the performance improvement of IRS-assisted wireless networks. To exploit the performance gain, we discuss the joint optimization of the IRS's phase control and the transceivers' transmission control in different network design problems, e.g., rate maximization and power minimization problems. Furthermore, we extend the discussion of IRS-assisted wireless networks to some emerging use cases. Finally, we highlight important practical challenges and future research directions for realizing IRS-assisted wireless networks in beyond 5G communications.

HRank: Filter Pruning Using High-Rank Feature Map
Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang +3 more
2020858doi:10.1109/cvpr42600.2020.00160

Neural network pruning offers a promising prospect to facilitate deploying deep neural networks on resource-limited devices. However, existing methods are still challenged by the training inefficiency and labor cost in pruning designs, due to missing theoretical guidance of non-salient network components. In this paper, we propose a novel filter pruning method by exploring the High Rank of feature maps (HRank). Our HRank is inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches CNNs receive. Based on HRank, we develop a method that is mathematically formulated to prune filters with low-rank feature maps. The principle behind our pruning is that low-rank feature maps contain less information, and thus pruned results can be easily reproduced. Besides, we experimentally show that weights with high-rank feature maps contain more important information, such that even when a portion is not updated, very little damage would be done to the model performance. Without introducing any additional constraints, HRank leads to significant improvements over the state-of-the-arts in terms of FLOPs and parameters reduction, with similar accuracies. For example, with ResNet-110, we achieve a 58.2%-FLOPs reduction by removing 59.2% of the parameters, with only a small loss of 0.14% in top-1 accuracy on CIFAR-10. With Res-50, we achieve a 43.8%-FLOPs reduction by removing 36.7% of the parameters, with only a loss of 1.17% in the top-1 accuracy on ImageNet. The codes can be available at https://github.com/lmbxmu/HRank.

Multi-Scale Interactive Network for Salient Object Detection
Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu
2020843doi:10.1109/cvpr42600.2020.00943

Deep-learning based salient object detection methods achieve great progress. However, the variable scale and unknown category of salient objects are great challenges all the time. These are closely related to the utilization of multi-level and multi-scale features. In this paper, we propose the aggregate interaction modules to integrate the features from adjacent levels, in which less noise is introduced because of only using small up-/down-sampling rates. To obtain more efficient multi-scale features from the integrated features, the self-interaction modules are embedded in each decoder unit. Besides, the class imbalance issue caused by the scale variation weakens the effect of the binary cross entropy loss and results in the spatial inconsistency of the predictions. Therefore, we exploit the consistency-enhanced loss to highlight the fore-/back-ground difference and preserve the intra-class consistency. Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches. The source code will be publicly available at https://github.com/lartpang/MINet.

A Survey on Deep Semi-Supervised Learning
Xiangli Yang, Zixing Song, Irwin King, Zenglin Xu
2022· IEEE Transactions on Knowledge and Data Engineering814doi:10.1109/tkde.2022.3220219

Deep semi-supervised learning is a fast-growing field with a range of practical applications. This paper provides a comprehensive survey on both fundamentals and recent advances in deep semi-supervised learning methods from perspectives of model design and unsupervised loss functions. We first present a taxonomy for deep semi-supervised learning that categorizes existing methods, including deep generative methods, consistency regularization methods, graph-based methods, pseudo-labeling methods, and hybrid methods. Then we provide a comprehensive review of 60 representative methods and offer a detailed comparison of these methods in terms of the type of losses, architecture differences, and test performance results. In addition to the progress in the past few years, we further discuss some shortcomings of existing methods and provide some tentative heuristic solutions for solving these open problems.

Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction
Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao +1 more
2021· npj Digital Medicine811doi:10.1038/s41746-021-00455-y

Deep learning (DL)-based predictive models from electronic health records (EHRs) deliver impressive performance in many clinical tasks. Large training cohorts, however, are often required by these models to achieve high accuracy, hindering the adoption of DL-based models in scenarios with limited training data. Recently, bidirectional encoder representations from transformers (BERT) and related models have achieved tremendous successes in the natural language processing domain. The pretraining of BERT on a very large training corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets. Inspired by BERT, we propose Med-BERT, which adapts the BERT framework originally developed for the text domain to the structured EHR domain. Med-BERT is a contextualized embedding model pretrained on a structured EHR dataset of 28,490,650 patients. Fine-tuning experiments showed that Med-BERT substantially improves the prediction accuracy, boosting the area under the receiver operating characteristics curve (AUC) by 1.21-6.14% in two disease prediction tasks from two clinical databases. In particular, pretrained Med-BERT obtains promising performances on tasks with small fine-tuning training sets and can boost the AUC by more than 20% or obtain an AUC as high as a model trained on a training set ten times larger, compared with deep learning models without Med-BERT. We believe that Med-BERT will benefit disease prediction studies with small local training datasets, reduce data collection expenses, and accelerate the pace of artificial intelligence aided healthcare.

VerifyNet: Secure and Verifiable Federated Learning
Guowen Xu, Hongwei Li, Sen Liu, Kan Yang +1 more
2019· IEEE Transactions on Information Forensics and Security784doi:10.1109/tifs.2019.2929409

As an emerging training model with neural networks, federated learning has received widespread attention due to its ability to update parameters without collecting users' raw data. However, since adversaries can track and derive participants' privacy from the shared gradients, federated learning is still exposed to various security and privacy threats. In this paper, we consider two major issues in the training process over deep neural networks (DNNs): 1) how to protect user's privacy (i.e., local gradients) in the training process and 2) how to verify the integrity (or correctness) of the aggregated results returned from the server. To solve the above problems, several approaches focusing on secure or privacy-preserving federated learning have been proposed and applied in diverse scenarios. However, it is still an open problem enabling clients to verify whether the cloud server is operating correctly, while guaranteeing user's privacy in the training process. In this paper, we propose VerifyNet, the first privacy-preserving and verifiable federated learning framework. In specific, we first propose a double-masking protocol to guarantee the confidentiality of users' local gradients during the federated learning. Then, the cloud server is required to provide the Proof about the correctness of its aggregated results to each user. We claim that it is impossible that an adversary can deceive users by forging Proof, unless it can solve the NP-hard problem adopted in our model. In addition, VerifyNet is also supportive of users dropping out during the training process. The extensive experiments conducted on real-world data also demonstrate the practical performance of our proposed scheme.

Conformer: Local Features Coupling Global Representations for Visual Recognition
Zhiliang Peng, Wei Huang, Shanzhi Gu, Lingxi Xie +3 more
2021· 2021 IEEE/CVF International Conference on Computer Vision (ICCV)778doi:10.1109/iccv48922.2021.00042

Within Convolutional Neural Network (CNN), the convolution operations are good at extracting local features but experience difficulty to capture global representations. Within visual transformer, the cascaded self-attention modules can capture long-distance feature dependencies but unfortunately deteriorate local feature details. In this paper, we propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning. Conformer roots in the Feature Coupling Unit (FCU), which fuses local features and global representations under different resolutions in an interactive fashion. Conformer adopts a concurrent structure so that local features and global representations are retained to the maximum extent. Experiments show that Conformer, under the comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet. On MSCOCO, it outperforms ResNet-101 by 3.7% and 3.6% mAPs for object detection and instance segmentation, respectively, demonstrating the great potential to be a general backbone network. Code is available at github.com/pengzhiliang/Conformer.

Real-World Underwater Enhancement: Challenges, Benchmarks, and Solutions Under Natural Light
Risheng Liu, Xin Fan, Ming Zhu, Minjun Hou +1 more
2020· IEEE Transactions on Circuits and Systems for Video Technology777doi:10.1109/tcsvt.2019.2963772

Underwater image enhancement is such an important low-level vision task with many applications that numerous algorithms have been proposed in recent years. These algorithms developed upon various assumptions demonstrate successes from various aspects using different data sets and different metrics. In this work, we setup an undersea image capturing system, and construct a large-scale Real-world Underwater Image Enhancement (RUIE) data set divided into three subsets. The three subsets target at three challenging aspects for enhancement, i.e., image visibility quality, color casts, and higher-level detection/classification, respectively. We conduct extensive and systematic experiments on RUIE to evaluate the effectiveness and limitations of various algorithms to enhance visibility and correct color casts on images with hierarchical categories of degradation. Moreover, underwater image enhancement in practice usually serves as a preprocessing step for mid-level and high-level vision tasks. We thus exploit the object detection performance on enhanced images as a brand new task-specific evaluation criterion. The findings from these evaluations not only confirm what is commonly believed, but also suggest promising solutions and new directions for visibility enhancement, color correction, and object detection on real-world underwater images. The benchmark is available at: https://github.com/dlut-dimt/Realworld-Underwater-Image-Enhancement-RUIE-Benchmark.