Xi'an Institute of Optics and Precision Mechanics
facilityXi'an, China
Research output, citation impact, and the most-cited recent papers from Xi'an Institute of Optics and Precision Mechanics (China). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Xi'an Institute of Optics and Precision Mechanics
Remote sensing image scene classification plays an important role in a wide range of applications and hence has been receiving remarkable attention. During the past years, significant efforts have been made to develop various data sets or present a variety of approaches for scene classification from remote sensing images. However, a systematic review of the literature concerning data sets and methods for scene classification is still lacking. In addition, almost all existing data sets have a number of limitations, including the small scale of scene classes and the image numbers, the lack of image variations and diversity, and the saturation of accuracy. These limitations severely limit the development of new approaches especially deep learning-based methods. This paper first provides a comprehensive review of the recent progress. Then, we propose a large-scale data set, termed “NWPU-RESISC45,” which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This data set contains 31 500 images, covering 45 scene classes with 700 images in each class. The proposed NWPU-RESISC45 1) is large-scale on the scene classes and the total image number; 2) holds big variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion; and 3) has high within-class diversity and between-class similarity. The creation of this data set will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated using the proposed data set, and the results are reported as a useful baseline for future research.
Aerial scene classification, which aims to automatically label an aerial image with a specific semantic category, is a fundamental problem for understanding high-resolution remote sensing imagery. In recent years, it has become an active task in the remote sensing area, and numerous algorithms have been proposed for this task, including many machine learning and data-driven approaches. However, the existing data sets for aerial scene classification, such as UC-Merced data set and WHU-RS19, contain relatively small sizes, and the results on them are already saturated. This largely limits the development of scene classification algorithms. This paper describes the Aerial Image data set (AID): a large-scale data set for aerial scene classification. The goal of AID is to advance the state of the arts in scene classification of remote sensing images. For creating AID, we collect and annotate more than 10000 aerial scene images. In addition, a comprehensive review of the existing aerial scene classification techniques as well as recent widely used deep learning methods is given. Finally, we provide a performance analysis of typical aerial scene classification and deep learning approaches on AID, which can be served as the baseline results on this benchmark.
nearest neighbor (kNN) method is a popular classification method in data mining and statistics because of its simple implementation and significant classification performance. However, it is impractical for traditional kNN methods to assign a fixed value (even though set by experts) to all test samples. Previous solutions assign different values to different test samples by the cross validation method but are usually time-consuming. This paper proposes a kTree method to learn different optimal values for different test/new samples, by involving a training stage in the kNN classification. Specifically, in the training stage, kTree method first learns optimal values for all training samples by a new sparse reconstruction model, and then constructs a decision tree (namely, kTree) using training samples and the learned optimal values. In the test stage, the kTree fast outputs the optimal value for each test sample, and then, the kNN classification can be conducted using the learned optimal value and all training samples. As a result, the proposed kTree method has a similar running cost but higher classification accuracy, compared with traditional kNN methods, which assign a fixed value to all test samples. Moreover, the proposed kTree method needs less running cost but achieves similar classification accuracy, compared with the newly kNN methods, which assign different values to different test samples. This paper further proposes an improvement version of kTree method (namely, k*Tree method) to speed its test stage by extra storing the information of the training samples in the leaf nodes of kTree, such as the training samples located in the leaf nodes, their kNNs, and the nearest neighbor of these kNNs. We call the resulting decision tree as k*Tree, which enables to conduct kNN classification using a subset of the training samples in the leaf nodes rather than all training samples used in the newly kNN methods. This actually reduces running cost of test stage. Finally, the experimental results on 20 real data sets showed that our proposed methods (i.e., kTree and k*Tree) are much more efficient than the compared methods in terms of classification tasks.
Recent technological advancements have led to a deluge of data from distinctive domains (e.g., health care and scientific sensors, user-generated data, Internet and financial companies, and supply chain systems) over the past two decades. The term big data was coined to capture the meaning of this emerging trend. In addition to its sheer volume, big data also exhibits other unique characteristics as compared with traditional data. For instance, big data is commonly unstructured and require more real-time analysis. This development calls for new system architectures for data acquisition, transmission, storage, and large-scale data processing mechanisms. In this paper, we present a literature survey and system tutorial for big data analytics platforms, aiming to provide an overall picture for nonexpert readers and instill a do-it-yourself spirit for advanced audiences to customize their own big-data solutions. First, we present the definition of big data and discuss big data challenges. Next, we present a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics. These four modules form a big data value chain. Following that, we present a detailed survey of numerous approaches and mechanisms from research and industry communities. In addition, we present the prevalent Hadoop framework for addressing big data challenges. Finally, we outline several evaluation benchmarks and potential research directions for big data systems.
Sparse representation has attracted much attention from researchers in fields of signal processing, image processing, computer vision, and pattern recognition. Sparse representation also has a good reputation in both theoretical research and practical applications. Many different algorithms have been proposed for sparse representation. The main purpose of this paper is to provide a comprehensive study and an updated review on sparse representation and to supply guidance for researchers. The taxonomy of sparse representation methods can be studied from various viewpoints. For example, in terms of different norm minimizations used in sparsity constraints, the methods can be roughly categorized into five groups: 1) sparse representation with l <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">0</sub> -norm minimization; 2) sparse representation with lp-norm (0 <; p <; 1) minimization; 3) sparse representation with l <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sub> -norm minimization; 4) sparse representation with l <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sub> ,1-norm minimization; and 5) sparse representation with l2-norm minimization. In this paper, a comprehensive overview of sparse representation is provided. The available sparse representation algorithms can also be empirically categorized into four groups: 1) greedy strategy approximation; 2) constrained optimization; 3) proximity algorithm-based optimization; and 4) homotopy algorithm-based sparse representation. The rationales of different algorithms in each category are analyzed and a wide range of sparse representation applications are summarized, which could sufficiently reveal the potential nature of the sparse representation theory. In particular, an experimentally comparative study of these sparse representation algorithms was presented.
Regular machine learning and data mining techniques study the training data for future inferences under a major assumption that the future data are within the same feature space or have the same distribution as the training data. However, due to the limited availability of human labeled training data, training data that stay in the same feature space or have the same distribution as the future data cannot be guaranteed to be sufficient enough to avoid the over-fitting problem. In real-world applications, apart from data in the target domain, related data in a different domain can also be included to expand the availability of our prior knowledge about the target future data. Transfer learning addresses such cross-domain learning problems by extracting useful information from data in a related domain and transferring them for being used in target tasks. In recent years, with transfer learning being applied to visual categorization, some typical problems, e.g., view divergence in action recognition tasks and concept drifting in image classification tasks, can be efficiently solved. In this paper, we survey state-of-the-art transfer learning algorithms in visual categorization applications, such as object recognition, image classification, and human action recognition.
Complex optical photon states with entanglement shared among several modes are critical to improving our fundamental understanding of quantum mechanics and have applications for quantum information processing, imaging, and microscopy. We demonstrate that optical integrated Kerr frequency combs can be used to generate several bi- and multiphoton entangled qubits, with direct applications for quantum communication and computation. Our method is compatible with contemporary fiber and quantum memory infrastructures and with chip-scale semiconductor technology, enabling compact, low-cost, and scalable implementations. The exploitation of integrated Kerr frequency combs, with their ability to generate multiple, customizable, and complex quantum states, can provide a scalable, practical, and compact platform for quantum technologies.
Recovering a large matrix from a small subset of its entries is a challenging problem arising in many real applications, such as image inpainting and recommender systems. Many existing approaches formulate this problem as a general low-rank matrix approximation problem. Since the rank operator is nonconvex and discontinuous, most of the recent theoretical studies use the nuclear norm as a convex relaxation. One major limitation of the existing approaches based on nuclear norm minimization is that all the singular values are simultaneously minimized, and thus the rank may not be well approximated in practice. In this paper, we propose to achieve a better approximation to the rank of matrix by truncated nuclear norm, which is given by the nuclear norm subtracted by the sum of the largest few singular values. In addition, we develop a novel matrix completion algorithm by minimizing the Truncated Nuclear Norm. We further develop three efficient iterative procedures, TNNR-ADMM, TNNR-APGL, and TNNR-ADMMAP, to solve the optimization problem. TNNR-ADMM utilizes the alternating direction method of multipliers (ADMM), while TNNR-AGPL applies the accelerated proximal gradient line search method (APGL) for the final optimization. For TNNR-ADMMAP, we make use of an adaptive penalty according to a novel update rule for ADMM to achieve a faster convergence rate. Our empirical study shows encouraging results of the proposed algorithms in comparison to the state-of-the-art matrix completion algorithms on both synthetic and real visual datasets.
Most state-of-the-art scene text detection algorithms are deep learning based methods that depend on bounding box regression and perform at least two kinds of predictions: text/non-text classification and location regression. Regression plays a key role in the acquisition of bounding boxes in these methods, but it is not indispensable because text/non-text prediction can also be considered as a kind of semantic segmentation that contains full location information in itself. However, text instances in scene images often lie very close to each other, making them very difficult to separate via semantic segmentation. Therefore, instance segmentation is needed to address this problem. In this paper, PixelLink, a novel scene text detection algorithm based on instance segmentation, is proposed. Text instances are first segmented out by linking pixels within the same instance together. Text bounding boxes are then extracted directly from the segmentation result without location regression. Experiments show that, compared with regression-based methods, PixelLink can achieve better or comparable performance on several benchmarks, while requiring many fewer training iterations and less training data.
Over the past decade, optical frequency combs generated by high-Q microresonators, or optical microcombs, which feature compact device footprints, low power consumption, and high repetition rates in broad optical bandwidths, have led to a revolution in a wide range of fields including metrology, telecommunications, radio frequency (RF) photonics, spectroscopy, sensing, and quantum optics. Among these, an application that has attracted great interest is the use of optical microcombs for RF photonics, where they offer enhanced functionalities as well as reduced size and power consumption over other approaches. This paper reviews the recent advances in this emerging field. We provide an overview of the main achievements that have been obtained to date, and highlight the strong potential of optical microcombs for RF photonics applications. We also discuss some of the open challenges and limitations that need to be addressed for practical applications.
For many computer vision applications, the data sets distribute on certain low-dimensional subspaces. Subspace clustering is to find such underlying subspaces and cluster the data points correctly. In this paper, we propose a novel multi-view subspace clustering method. The proposed method performs clustering on the subspace representation of each view simultaneously. Meanwhile, we propose to use a common cluster structure to guarantee the consistence among different views. In addition, an efficient algorithm is proposed to solve the problem. Experiments on four benchmark data sets have been performed to validate our proposed method. The promising results demonstrate the effectiveness of our method.
Real-time spectroscopy access to ultrafast fiber lasers opens new opportunities for exploring complex soliton interaction dynamics. Here, we have reported the first observation, to the best of our knowledge, of the entire buildup process of soliton molecules (SMs) in a mode-locked laser. We have observed that the birth dynamics of a stable SM experiences five different stages, i.e., the raised relaxation oscillation (RO) stage, beating dynamics stage, transient single pulse stage, transient bound state, and finally the stable bound state. We have discovered that the evolution of pulses in the raised RO stage follows a law that only the strongest one can ultimately survive and, meanwhile, the pulses periodically appear at the same temporal positions for all lasing spikes during the same RO stage (named as memory ability) but they lose such ability between different RO stages. Moreover, we have found that the buildup dynamics of SMs is quite sensitive to both the polarization state of intracavity light and the fluctuation of pump power. These results provide new perspectives into the ultrafast transient process in mode-locked lasers and the dynamics of complex nonlinear systems.
Image super-resolution (SR) reconstruction is essentially an ill-posed problem, so it is important to design an effective prior. For this purpose, we propose a novel image SR method by learning both non-local and local regularization priors from a given low-resolution image. The non-local prior takes advantage of the redundancy of similar patches in natural images, while the local prior assumes that a target pixel can be estimated by a weighted average of its neighbors. Based on the above considerations, we utilize the non-local means filter to learn a non-local prior and the steering kernel regression to learn a local prior. By assembling the two complementary regularization terms, we propose a maximum a posteriori probability framework for SR recovery. Thorough experimental results suggest that the proposed SR method can reconstruct higher quality results both quantitatively and perceptually.
Inspired by recent development of artificial satellite, remote sensing images have attracted extensive attention. Recently, notable progress has been made in scene classification and target detection. However, it is still not clear how to describe the remote sensing image content with accurate and concise sentences. In this paper, we investigate to describe the remote sensing images with accurate and flexible sentences. First, some annotated instructions are presented to better describe the remote sensing images considering the special characteristics of remote sensing images. Second, in order to exhaustively exploit the contents of remote sensing images, a large-scale aerial image data set is constructed for remote sensing image caption. Finally, a comprehensive review is presented on the proposed data set to fully advance the task of remote sensing caption. Extensive experiments on the proposed data set demonstrate that the content of the remote sensing image can be completely described by generating language descriptions. The data set is available at https://github.com/201528014227051/RSICD_optimal.
Feature selection has aroused considerable research interests during the last few decades. Traditional learning-based feature selection methods separate embedding learning and feature ranking. In this paper, we propose a novel unsupervised feature selection framework, termed as the joint embedding learning and sparse regression (JELSR), in which the embedding learning and sparse regression are jointly performed. Specifically, the proposed JELSR joins embedding learning with sparse regression to perform feature selection. To show the effectiveness of the proposed framework, we also provide a method using the weight via local linear approximation and adding the l2,1 -norm regularization, and design an effective algorithm to solve the corresponding optimization problem. Furthermore, we also conduct some insightful discussion on the proposed feature selection approach, including the convergence analysis, computational complexity, and parameter determination. In all, the proposed framework not only provides a new perspective to view traditional methods but also evokes some other deep researches for feature selection. Compared with traditional unsupervised feature selection methods, our approach could integrate the merits of embedding learning and sparse regression. Promising experimental results on different kinds of data sets, including image, voice data and biological data, have validated the effectiveness of our proposed algorithm.
Many state-of-the-art methods have been proposed for infrared small target detection. They work well on the images with homogeneous backgrounds and high-contrast targets. However, when facing highly heterogeneous backgrounds, they would not perform very well, mainly due to: 1) the existence of strong edges and other interfering components, 2) not utilizing the priors fully. Inspired by this, we propose a novel method to exploit both local and nonlocal priors simultaneously. First, we employ a new infrared patch-tensor (IPT) model to represent the image and preserve its spatial correlations. Exploiting the target sparse prior and background nonlocal self-correlation prior, the target-background separation is modeled as a robust low-rank tensor recovery problem. Moreover, with the help of the structure tensor and reweighted idea, we design an entrywise local-structure-adaptive and sparsity enhancing weight to replace the globally constant weighting parameter. The decomposition could be achieved via the elementwise reweighted higher order robust principal component analysis with an additional convergence condition according to the practical situation of target detection. Extensive experiments demonstrate that our model outperforms the other state-of-the-arts, in particular for the images with very dim targets and heavy clutters.
We demonstrate advanced transversal radio frequency (RF) and microwave functions based on a Kerr optical comb source generated by an integrated micro-ring resonator. We achieve extremely high performance for an optical true time delay aimed at tunable phased array antenna applications, as well as reconfigurable microwave photonic filters. Our results agree well with theory. We show that our true time delay would yield a phased array antenna with features that include high angular resolution and a wide range of beam steering angles, while the microwave photonic filters feature high Q factors, wideband tunability, and highly reconfigurable filtering shapes. These results show that our approach is a competitive solution to implementing reconfigurable, high performance and potentially low cost RF and microwave signal processing functions for applications including radar and communication systems.
In multiview learning, it is essential to assign a reasonable weight to each view according to its importance. Thus, for multiview clustering task, a wise and elegant method should achieve clustering multiview data while learning the view weights. In this paper, we address this problem by exploring a Laplacian rank constrained graph, which can be approximately as the centroid of the built graph for each view with different confidences. We start our work with a natural thought that the weights can be learned by introducing a hyperparameter. By analyzing the weakness of it, we further propose a new multiview clustering method which is totally self-weighted. Furthermore, once the target graph is obtained in our models, we can directly assign the cluster label to each data point and do not need any postprocessing such as $K$-means in standard spectral clustering. Evaluations on two synthetic datasets prove the effectiveness of our methods. Compared with several representative graph-based multiview clustering approaches on four real-world datasets, experimental results demonstrate that the proposed methods achieve the better performances and our new clustering method is more practical to use.
We report a broadband RF channelizer based on an integrated optical frequency Kerr micro-comb source, with an RF channelizing bandwidth of 90 GHz, a high RF spectral slice resolution of 1.04 GHz, and experimentally verify the RF performance up to 19 GHz. This approach of realizing RF channelizers offers reduced complexity, size, and potential cost for a wide range of applications to microwave signal detection.
We propose and experimentally demonstrate a microwave photonic intensity differentiator based on a Kerr optical comb generated by a compact integrated micro-ring resonator (MRR). The on-chip Kerr optical comb, containing a large number of comb lines, serves as a high-performance multi-wavelength source for implementing a transversal filter, which will greatly reduce the cost, size, and complexity of the system. Moreover, owing to the compactness of the integrated MRR, frequency spacings of up to 200-GHz can be achieved, enabling a potential operation bandwidth of over 100 GHz. By programming and shaping individual comb lines according to calculated tap weights, a reconfigurable intensity differentiator with variable differentiation orders can be realized. The operation principle is theoretically analyzed, and experimental demonstrations of the first-, second-, and third-order differentiation functions based on this principle are presented. The radio frequency amplitude and phase responses of multi-order intensity differentiations are characterized, and system demonstrations of real-time differentiations for a Gaussian input signal are also performed. The experimental results show good agreement with theory, confirming the effectiveness of our approach.