NobleBlocks

Institute of Automation

Facility · Beijing, China

Research output, citation impact, and the most-cited recent papers from Institute of Automation (China). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works
14.0K
Citations
1.1M
h-index
368
i10-index
15.0K
Also known as
Institute of Automation, 自动化研究所

Top-cited papers from Institute of Automation

Squeeze-and-Excitation Networks
Jie Hu, Li Shen, Samuel Albanie, Gang Sun +1 more
2019 · IEEE Transactions on Pattern Analysis and Machine Intelligence · 12.4K citations · doi:10.1109/tpami.2019.2913372

The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251 percent, surpassing the winning entry of 2016 by a relative improvement of ∼ 25 percent. Models and code are available at https://github.com/hujie-frank/SENet.
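As a rough illustration of the channel recalibration this abstract describes, the sketch below applies a squeeze step (global average pooling per channel), a small two-layer gating function, and a per-channel rescale. The function name `se_block`, the bias-free weight matrices, and the toy input are assumptions made for illustration; this is not the authors' released code, which lives in the repository linked in the abstract.

```python
import math

def se_block(feature_maps, w_reduce, w_expand):
    """Recalibrate the channels of `feature_maps`.

    feature_maps: list of C maps, each an H x W nested list of floats.
    w_reduce: (C/r) x C weights; w_expand: C x (C/r) weights (hypothetical, no biases).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]
    # Excitation: bottleneck layer with ReLU, then expansion with a sigmoid gate.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in fmap]
        for fmap, gate in zip(feature_maps, gates)
    ]
    return scaled, gates

# Toy input: 2 channels of size 2 x 2, reduced to a single hidden unit.
x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w_reduce = [[0.5, 0.5]]
w_expand = [[1.0], [-1.0]]
y, gates = se_block(x, w_reduce, w_expand)
```

With these toy weights the first channel receives a gate above 0.5 and the second a gate below 0.5, so the block emphasises one channel and suppresses the other based on statistics pooled over the whole input.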

Dual Attention Network for Scene Segmentation
Jun Fu, Jing Liu, Haijie Tian, Yong Li +3 more
2019 · 6.8K citations · doi:10.1109/cvpr.2019.00326

In this paper, we address the scene segmentation task by capturing rich contextual dependencies based on the self-attention mechanism. Unlike previous works that capture contexts by multi-scale features fusion, we propose a Dual Attention Networks (DANet) to adaptively integrate local features with their global dependencies. Specifically, we append two types of attention modules on top of traditional dilated FCN, which model the semantic interdependencies in spatial and channel dimensions respectively. The position attention module selectively aggregates the features at each position by a weighted sum of the features at all positions. Similar features would be related to each other regardless of their distances. Meanwhile, the channel attention module selectively emphasizes interdependent channel maps by integrating associated features among all channel maps. We sum the outputs of the two attention modules to further improve feature representation which contributes to more precise segmentation results. We achieve new state-of-the-art segmentation performance on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context and COCO Stuff dataset. In particular, a Mean IoU score of 81.5% on Cityscapes test set is achieved without using coarse data.
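The position attention idea in this abstract, a weighted sum over the features at all positions, can be sketched in a few lines. The version below is a simplification under stated assumptions: it uses raw dot-product similarity with a softmax and omits the learned projections and residual connection that a trained network would normally include. The name `position_attention` and the toy features are hypothetical.

```python
import math

def position_attention(features):
    """features: list of N position vectors, each a list of C floats."""
    outputs = []
    for query in features:
        # Similarity of this position to every position (dot product).
        scores = [sum(q * k for q, k in zip(query, key)) for key in features]
        # Softmax turns the similarities into weights that sum to 1.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the features at all positions.
        aggregated = [
            sum(w * feat[c] for w, feat in zip(weights, features))
            for c in range(len(query))
        ]
        outputs.append(aggregated)
    return outputs

# Positions 0 and 1 share the same feature; position 2 differs.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
out = position_attention(feats)
```

Positions 0 and 1 end up with identical outputs even though they could be far apart in the image, which mirrors the abstract's point that similar features are related regardless of distance.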

The Human Brainnetome Atlas: A New Brain Atlas Based on Connectional Architecture
Lingzhong Fan, Hai Li, Junjie Zhuo, Yu Zhang +4 more
2016 · Cerebral Cortex · 3.1K citations · doi:10.1093/cercor/bhw157

The human brain atlases that allow correlating brain anatomy with psychological and cognitive functions are in transition from ex vivo histology-based printed atlases to digital brain maps providing multimodal in vivo information. Many current human brain atlases cover only specific structures, lack fine-grained parcellations, and fail to provide functionally important connectivity information. Using noninvasive multimodal neuroimaging techniques, we designed a connectivity-based parcellation framework that identifies the subdivisions of the entire human brain, revealing the in vivo connectivity architecture. The resulting human Brainnetome Atlas, with 210 cortical and 36 subcortical subregions, provides a fine-grained, cross-validated atlas and contains information on both anatomical and functional connections. Additionally, we further mapped the delineated structures to mental processes by reference to the BrainMap database. It thus provides an objective and stable starting point from which to explore the complex relationships between structure, connectivity, and function, and eventually improves understanding of how the human brain works. The human Brainnetome Atlas will be made freely available for download at http://atlas.brainnetome.org, so that whole brain parcellations, connections, and functional data will be readily available for researchers to use in their investigations into healthy and pathological states.

Action recognition by dense trajectories
Heng Wang, Alexander Kläser, Cordelia Schmid, Cheng‐Lin Liu
2011 · 2.2K citations · doi:10.1109/cvpr.2011.5995407

Feature trajectories have shown to be efficient for representing videos. Typically, they are extracted using the KLT tracker or matching SIFT descriptors between frames. However, the quality as well as quantity of these trajectories is often not sufficient. Inspired by the recent success of dense sampling in image classification, we propose an approach to describe videos by dense trajectories. We sample dense points from each frame and track them based on displacement information from a dense optical flow field. Given a state-of-the-art optical flow algorithm, our trajectories are robust to fast irregular motions as well as shot boundaries. Additionally, dense trajectories cover the motion information in videos well. We, also, investigate how to design descriptors to encode the trajectory information. We introduce a novel descriptor based on motion boundary histograms, which is robust to camera motion. This descriptor consistently outperforms other state-of-the-art descriptors, in particular in uncontrolled realistic videos. We evaluate our video description in the context of action classification with a bag-of-features approach. Experimental results show a significant improvement over the state of the art on four datasets of varying difficulty, i.e. KTH, YouTube, Hollywood2 and UCF sports.

A Survey on Visual Surveillance of Object Motion and Behaviors
Wenhan Hu, T.N. Tan, Liang Wang, Stephen J. Maybank
2004 · IEEE Transactions on Systems Man and Cybernetics Part C (Applications and Reviews) · 2.1K citations · doi:10.1109/tsmcc.2004.829274

Visual surveillance in dynamic scenes, especially for humans and vehicles, is currently one of the most active research topics in computer vision. It has a wide spectrum of promising applications, including access control in special areas, human identification at a distance, crowd flux statistics and congestion analysis, detection of anomalous behaviors, and interactive surveillance using multiple cameras, etc. In general, the processing framework of visual surveillance in dynamic scenes includes the following stages: modeling of environments, detection of motion, classification of moving objects, tracking, understanding and description of behaviors, human identification, and fusion of data from multiple cameras. We review recent developments and general strategies of all these stages. Finally, we analyze possible research directions, e.g., occlusion handling, a combination of two- and three-dimensional tracking, a combination of motion analysis and biometrics, anomaly detection and behavior prediction, content-based retrieval of surveillance videos, behavior understanding and natural language description, fusion of information from multiple sensors, and remote surveillance.

Data-Driven Intelligent Transportation Systems: A Survey
Junping Zhang, Fei‐Yue Wang, Kunfeng Wang, Wei-Hua Lin +2 more
2011 · IEEE Transactions on Intelligent Transportation Systems · 1.8K citations · doi:10.1109/tits.2011.2158001

For the last two decades, intelligent transportation systems (ITS) have emerged as an efficient way of improving the performance of transportation systems, enhancing travel security, and providing more choices to travelers. A significant change in ITS in recent years is that much more data are collected from a variety of sources and can be processed into various forms for different stakeholders. The availability of a large amount of data can potentially lead to a revolution in ITS development, changing an ITS from a conventional technology-driven system into a more powerful multifunctional data-driven intelligent transportation system (D²ITS): a system that is vision, multisource, and learning algorithm driven to optimize its performance. Furthermore, D²ITS is trending to become a privacy-aware people-centric more intelligent system. In this paper, we provide a survey on the development of D²ITS, discussing the functionality of its key components and some deployment issues associated with D²ITS. Future research directions for the development of D²ITS are also presented.

Fast Online Object Tracking and Segmentation: A Unifying Approach
Qiang Wang, Li Zhang, Luca Bertinetto, Weiming Hu +1 more
2019 · 1.5K citations · doi:10.1109/cvpr.2019.00142

In this paper we illustrate how to perform both visual object tracking and semi-supervised video object segmentation, in real-time, with a single simple approach. Our method, dubbed SiamMask, improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task. Once trained, SiamMask solely relies on a single bounding box initialisation and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second. Despite its simplicity, versatility and fast speed, our strategy allows us to establish a new state-of-the-art among real-time trackers on VOT-2018, while at the same time demonstrating competitive performance and the best speed for the semi-supervised video object segmentation task on DAVIS-2016 and DAVIS-2017.

Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks
Daojian Zeng, Kang Liu, Yubo Chen, Jun Zhao
2015 · 1.2K citations · doi:10.18653/v1/d15-1203

Two problems arise when using distant supervision for relation extraction. First, in this method, an already existing knowledge base is heuristically aligned to texts, and the alignment results are treated as labeled data. However, the heuristic alignment can fail, resulting in wrong label problem. In addition, in previous approaches, statistical models have typically been applied to ad hoc features. The noise that originates from the feature extraction process can cause poor performance.

Silhouette analysis-based gait recognition for human identification
Liang Wang, Tieniu Tan, Huazhong Ning, Weiming Hu
2003 · IEEE Transactions on Pattern Analysis and Machine Intelligence · 1.2K citations · doi:10.1109/tpami.2003.1251144

Human identification at a distance has recently gained growing interest from computer vision researchers. Gait recognition aims essentially to address this problem by identifying people based on the way they walk. In this paper, a simple but efficient gait recognition algorithm using spatial-temporal silhouette analysis is proposed. For each image sequence, a background subtraction algorithm and a simple correspondence procedure are first used to segment and track the moving silhouettes of a walking figure. Then, eigenspace transformation based on principal component analysis (PCA) is applied to time-varying distance signals derived from a sequence of silhouette images to reduce the dimensionality of the input feature space. Supervised pattern classification techniques are finally performed in the lower-dimensional eigenspace for recognition. This method implicitly captures the structural and transitional characteristics of gait. Extensive experimental results on outdoor image sequences demonstrate that the proposed algorithm has an encouraging recognition performance with relatively low computational cost.

Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification
Weihua Chen, Xiaotang Chen, Jianguo Zhang, Kaiqi Huang
2017 · 1.2K citations · doi:10.1109/cvpr.2017.145

Person re-identification (ReID) is an important task in wide area video surveillance which focuses on identifying people across different cameras. Recently, deep learning networks with a triplet loss become a common framework for person ReID. However, the triplet loss pays main attentions on obtaining correct orders on the training set. It still suffers from a weaker generalization capability from the training set to the testing set, thus resulting in inferior performance. In this paper, we design a quadruplet loss, which can lead to the model output with a larger inter-class variation and a smaller intra-class variation compared to the triplet loss. As a result, our model has a better generalization ability and can achieve a higher performance on the testing set. In particular, a quadruplet deep network using a margin-based online hard negative mining is proposed based on the quadruplet loss for the person ReID. In extensive experiments, the proposed network outperforms most of the state-of-the-art algorithms on representative datasets which clearly demonstrates the effectiveness of our proposed method.
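To make the contrast with the triplet loss concrete, here is a minimal sketch of a quadruplet-style loss. It assumes a two-margin formulation in which the second term compares the positive pair against a pair of negatives that does not involve the anchor, and it substitutes plain squared Euclidean distance for a learned similarity. The margin values and function names are illustrative assumptions, not values taken from the paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=1.0, margin2=0.5):
    """anchor/positive share an identity; negative1 and negative2 are two
    further identities, different from each other and from the anchor."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative1)
    d_neg_pair = squared_distance(negative1, negative2)
    # Triplet-style term: the positive pair should be closer than anchor-negative.
    triplet_term = max(0.0, d_pos - d_neg + margin1)
    # Extra term: the positive pair should also be closer than a negative
    # pair that does not share the anchor.
    extra_term = max(0.0, d_pos - d_neg_pair + margin2)
    return triplet_term + extra_term

# The triplet term is already satisfied here, but the two negatives sit as
# close to each other as the positive pair, so the extra term still fires.
loss = quadruplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0])
```

In this toy case the loss is nonzero only because of the second term, which is the mechanism the abstract credits with enlarging inter-class variation while shrinking intra-class variation.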

HMDD v2.0: a database for experimentally supported human microRNA and disease associations
Yang Li, Chengxiang Qiu, Jian Tu, Bin Geng +3 more
2013 · Nucleic Acids Research · 1.2K citations · doi:10.1093/nar/gkt1023

Comprehensive databases of microRNA-disease associations are continuously demanded in biomedical researches. The recently launched version 3.0 of Human MicroRNA Disease Database (HMDD v3.0) manually collects a significant number of miRNA-disease association entries from literature. Comparing to HMDD v2.0, this new version contains 2-fold more entries. Besides, the associations have been more accurately classified based on literature-derived evidence code, which results in six generalized categories (genetics, epigenetics, target, circulation, tissue and other) covering 20 types of detailed evidence code. Furthermore, we added new functionalities like network visualization on the web interface. To exemplify the utility of the database, we compared the disease spectrum width of miRNAs (DSW) and the miRNA spectrum width of human diseases (MSW) between version 3.0 and 2.0 of HMDD. HMDD is freely accessible at http://www.cuilab.cn/hmdd. With accumulating evidence of miRNA-disease associations, HMDD database will keep on growing in the future.

Efficient Image Dehazing with Boundary Constraint and Contextual Regularization
Gaofeng Meng, Ying Wang, Jiangyong Duan, Shiming Xiang +1 more
2013 · 1.2K citations · doi:10.1109/iccv.2013.82

Images captured in foggy weather conditions often suffer from bad visibility. In this paper, we propose an efficient regularization method to remove hazes from a single input image. Our method benefits much from an exploration on the inherent boundary constraint on the transmission function. This constraint, combined with a weighted L_1-norm based contextual regularization, is modeled into an optimization problem to estimate the unknown scene transmission. A quite efficient algorithm based on variable splitting is also presented to solve the problem. The proposed method requires only a few general assumptions and can restore a high-quality haze-free image with faithful colors and fine image details. Experimental results on a variety of haze images demonstrate the effectiveness and efficiency of the proposed method.

A Light CNN for Deep Face Representation With Noisy Labels
Xiang Wu, Ran He, Zhenan Sun, Tieniu Tan
2018 · IEEE Transactions on Information Forensics and Security · 1.1K citations · doi:10.1109/tifs.2018.2833032

The volume of convolutional neural network (CNN) models proposed for face recognition has been continuously growing larger to better fit the large amount of training data. When training data are obtained from the Internet, the labels are likely to be ambiguous and inaccurate. This paper presents a Light CNN framework to learn a compact embedding on the large-scale face data with massive noisy labels. First, we introduce a variation of maxout activation, called max-feature-map (MFM), into each convolutional layer of CNN. Different from maxout activation that uses many feature maps to linearly approximate an arbitrary convex activation function, MFM does so via a competitive relationship. MFM can not only separate noisy and informative signals but also play the role of feature selection between two feature maps. Second, three networks are carefully designed to obtain better performance, meanwhile, reducing the number of parameters and computational costs. Finally, a semantic bootstrapping method is proposed to make the prediction of the networks more consistent with noisy labels. Experimental results show that the proposed framework can utilize large-scale noisy data to learn a Light model that is efficient in computational costs and storage spaces. The learned single network with a 256-D representation achieves state-of-the-art results on various face benchmarks without fine-tuning.
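The max-feature-map (MFM) operation mentioned in this abstract can be sketched as an element-wise maximum over two halves of the channel dimension. The pairing of channel i with channel i + k and the flat-list representation of each feature map are assumptions made for this illustration.

```python
def max_feature_map(channels):
    """channels: list of 2k feature maps, each a flat list of activations.

    Returns k feature maps, each the element-wise maximum of a channel pair.
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of feature maps")
    half = len(channels) // 2
    return [
        [max(a, b) for a, b in zip(channels[i], channels[i + half])]
        for i in range(half)
    ]

maps = [[1, -2, 3], [0, 5, -1], [4, -3, 2], [-1, 6, 0]]
selected = max_feature_map(maps)
```

Each output value is chosen by competition between two candidate maps, so the operation halves the channel count while keeping the stronger response at every location.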

Robust Image Segmentation Using FCM With Spatial Constraints Based on New Kernel-Induced Distance Measure
S. Chen, David Zhang
2004 · IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics) · 1.1K citations · doi:10.1109/tsmcb.2004.831165

Fuzzy c-means clustering (FCM) with spatial constraints (FCM_S) is an effective algorithm suitable for image segmentation. Its effectiveness contributes not only to the introduction of fuzziness for belongingness of each pixel but also to exploitation of spatial contextual information. Although the contextual information can raise its insensitivity to noise to some extent, FCM_S still lacks enough robustness to noise and outliers and is not suitable for revealing non-Euclidean structure of the input data due to the use of Euclidean distance (L2 norm). In this paper, to overcome the above problems, we first propose two variants, FCM_S1 and FCM_S2, of FCM_S to aim at simplifying its computation and then extend them, including FCM_S, to corresponding robust kernelized versions KFCM_S, KFCM_S1 and KFCM_S2 by the kernel methods. Our main motives of using the kernel methods consist in: inducing a class of robust non-Euclidean distance measures for the original data space to derive new objective functions and thus clustering the non-Euclidean structures in data; enhancing robustness of the original clustering algorithms to noise and outliers, and still retaining computational simplicity. The experiments on the artificial and real-world datasets show that our proposed algorithms, especially with spatial constraints, are more effective.
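One way to see why a kernel-induced distance can be more robust to outliers than the Euclidean distance is to compute both for a distant point. The sketch below assumes a Gaussian kernel, for which the squared distance in feature space reduces to 2(1 - K(x, v)) because K(x, x) = 1. The kernel choice, the width parameter `sigma`, and the function names are assumptions for illustration.

```python
import math

def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (sigma ** 2))

def kernel_induced_distance(x, v, sigma=1.0):
    # ||phi(x) - phi(v)||^2 = K(x, x) + K(v, v) - 2 K(x, v) = 2 (1 - K(x, v))
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

center = [0.0]
near = kernel_induced_distance([0.5], center)
outlier = kernel_induced_distance([100.0], center)
```

The squared Euclidean distance of the outlier is 10,000, whereas its kernel-induced distance is capped at 2, so a single extreme pixel value has a bounded influence on the clustering objective.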

Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition
Linhao Dong, Shuang Xu, Bo Xu
2018 · 1.1K citations · doi:10.1109/icassp.2018.8462506

Recurrent sequence-to-sequence models using encoder-decoder architecture have made great progress in speech recognition task. However, they suffer from the drawback of slow training speed because the internal recurrence limits the training parallelization. In this paper, we present the Speech-Transformer, a no-recurrence sequence-to-sequence model entirely relies on attention mechanisms to learn the positional dependencies, which can be trained faster with more efficiency. We also propose a 2D-Attention mechanism, which can jointly attend to the time and frequency axes of the 2-dimensional speech inputs, thus providing more expressive representations for the Speech-Transformer. Evaluated on the Wall Street Journal (WSJ) speech recognition dataset, our best model achieves competitive word error rate (WER) of 10.9%, while the whole training process only takes 1.2 days on 1 GPU, significantly faster than the published results of recurrent sequence-to-sequence models.

Disrupted small-world networks in schizophrenia
Yong Liu, Meng Liang, Yuan Zhou, Yong He +4 more
2008 · Brain · 1.1K citations · doi:10.1093/brain/awn018

The human brain has been described as a large, sparse, complex network characterized by efficient small-world properties, which assure that the brain generates and integrates information with high efficiency. Many previous neuroimaging studies have provided consistent evidence of 'dysfunctional connectivity' among the brain regions in schizophrenia; however, little is known about whether or not this dysfunctional connectivity causes disruption of the topological properties of brain functional networks. To this end, we investigated the topological properties of human brain functional networks derived from resting-state functional magnetic resonance imaging (fMRI). Data was obtained from 31 schizophrenia patients and 31 healthy subjects; then functional connectivity between 90 cortical and sub-cortical regions was estimated by partial correlation analysis and thresholded to construct a set of undirected graphs. Our findings demonstrated that the brain functional networks had efficient small-world properties in the healthy subjects; whereas these properties were disrupted in the patients with schizophrenia. Brain functional networks have efficient small-world properties which support efficient parallel information transfer at a relatively low cost. More importantly, in patients with schizophrenia the small-world topological properties are significantly altered in many brain regions in the prefrontal, parietal and temporal lobes. These findings are consistent with a hypothesis of dysfunctional integration of the brain in this illness. Specifically, we found that these altered topological measurements correlate with illness duration in schizophrenia. Detection and estimation of these alterations could prove helpful for understanding the pathophysiological mechanism as well as for evaluation of the severity of schizophrenia.

Graph Contrastive Learning with Adaptive Augmentation
Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu +2 more
2021 · 1.0K citations · doi:10.1145/3442381.3449802

Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and maximize the agreement of representations in the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes—a crucial component in CL—remains rarely explored. We argue that the data augmentation schemes should preserve intrinsic structures and attributes of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to enforce the model to recognize underlying semantic information. We perform extensive experiments of node classification on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation.
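The topology-level augmentation in this abstract can be illustrated with a small sketch that assigns higher drop probabilities to edges between low-centrality nodes. It assumes degree centrality, a log-scaled edge score, and a normalisation with a base rate `p_base` and a cap `p_max`. These parameter names and default values are hypothetical, and other centrality measures could be substituted.

```python
import math

def edge_drop_probabilities(edges, p_base=0.3, p_max=0.7):
    """Return one drop probability per undirected edge (u, v)."""
    # Degree centrality: number of edges touching each node.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Edge importance: log of the mean centrality of its two endpoints.
    scores = [math.log((degree[u] + degree[v]) / 2.0) for u, v in edges]
    s_max = max(scores)
    s_mean = sum(scores) / len(scores)
    if math.isclose(s_max, s_mean):
        # All edges equally important: fall back to uniform dropping.
        return [min(p_base, p_max)] * len(edges)
    # Less important edges get a higher probability, capped at p_max.
    return [min((s_max - s) / (s_max - s_mean) * p_base, p_max) for s in scores]

# Node "a" is a hub; edge (d, e) hangs off the periphery.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
probs = edge_drop_probabilities(edges)
```

The most central edge, (a, d), is never dropped in this toy graph, while the peripheral edge (d, e) gets the highest probability, in contrast to the uniform edge dropping that the abstract argues is suboptimal.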

A Framework for Evaluating the Effect of View Angle, Clothing and Carrying Condition on Gait Recognition
Shiqi Yu, Daoliang Tan, Tieniu Tan
2006 · 1.0K citations · doi:10.1109/icpr.2006.67

Gait recognition has gained increasing interest from researchers, but there is still no standard evaluation method to compare the performance of different gait recognition algorithms. In this paper, a framework is proposed in an attempt to tackle this problem. The framework consists of a large gait database, a large set of well designed experiments and some evaluation metrics. There are 124 subjects in the database, and the gait data was captured from 11 views. Three variations, namely view angle, clothing and carrying condition changes, are separately considered in the database. The database is one of the largest database among the existing databases. Three sets of experiments, including a total of 363 experiments, are designed in the framework. Some metrics are proposed to evaluate gait recognition algorithms.

An SVD-based watermarking scheme for protecting rightful ownership
Ruizhen Liu, Tieniu Tan
2002 · IEEE Transactions on Multimedia · 990 citations · doi:10.1109/6046.985560

Digital watermarking has been proposed as a solution to the problem of copyright protection of multimedia documents in networked environments. There are two important issues that watermarking algorithms need to address. First, watermarking schemes are required to provide trustworthy evidence for protecting rightful ownership. Second, good watermarking schemes should satisfy the requirement of robustness and resist distortions due to common image manipulations (such as filtering, compression, etc.). In this paper, we propose a novel watermarking algorithm based on singular value decomposition (SVD). Analysis and experimental results show that the new watermarking method performs well in both security and robustness.

Personal identification based on iris texture analysis
Li Ma, Tieniu Tan, Yunhong Wang, Dexin Zhang
2003 · IEEE Transactions on Pattern Analysis and Machine Intelligence · 956 citations · doi:10.1109/tpami.2003.1251145

With an increasing emphasis on security, automated personal identification based on biometrics has been receiving extensive attention over the past decade. Iris recognition, as an emerging biometric recognition approach, is becoming a very active topic in both research and practical applications. In general, a typical iris recognition system includes iris imaging, iris liveness detection, and recognition. This paper focuses on the last issue and describes a new scheme for iris recognition from an image sequence. We first assess the quality of each image in the input sequence and select a clear iris image from such a sequence for subsequent recognition. A bank of spatial filters, whose kernels are suitable for iris recognition, is then used to capture local characteristics of the iris so as to produce discriminating texture features. Experimental results show that the proposed method has an encouraging performance. In particular, a comparative study of existing methods for iris recognition is conducted on an iris image database including 2,255 sequences from 213 subjects. Conclusions based on such a comparison using a nonparametric statistical method (the bootstrap) provide useful information for further research.