
Stevens Institute of Technology
University, Hoboken, United States
Research output, citation impact, and the most-cited recent papers from Stevens Institute of Technology (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Stevens Institute of Technology
With the continuous expansion of data availability in many large-scale, complex, and networked systems, such as surveillance, security, Internet, and finance, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, the problem of learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews. Due to the inherent complex characteristics of imbalanced data sets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. In this paper, we provide a comprehensive review of the development of research in learning from imbalanced data. Our focus is to provide a critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario. Furthermore, in order to stimulate future research in this field, we also highlight the major opportunities and challenges, as well as potential important research directions for learning from imbalanced data.
This paper presents a novel adaptive synthetic (ADASYN) sampling approach for learning from imbalanced data sets. The essential idea of ADASYN is to use a weighted distribution for different minority class examples according to their level of difficulty in learning, where more synthetic data is generated for minority class examples that are harder to learn compared to those minority examples that are easier to learn. As a result, the ADASYN approach improves learning with respect to the data distributions in two ways: (1) reducing the bias introduced by the class imbalance, and (2) adaptively shifting the classification decision boundary toward the difficult examples. Simulation analyses on several machine learning data sets show the effectiveness of this method across five evaluation metrics.
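The weighting idea in the ADASYN abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the function name `adasyn_sketch` and the parameters `k` and `beta` are hypothetical, difficulty is taken to be the share of majority-class points among a minority point's `k` nearest neighbours, and each synthetic point is interpolated toward one of the nearest minority neighbours.

```python
import math
import random


def adasyn_sketch(minority, majority, k=3, beta=1.0, seed=0):
    """Simplified ADASYN-style oversampling (illustrative sketch only).

    Returns (synthetic_points, difficulty), where difficulty[i] is the
    fraction of majority points among the k nearest neighbours of
    minority[i].
    """
    rng = random.Random(seed)
    points = list(minority) + list(majority)
    n_min = len(minority)

    # Step 1: difficulty of each minority point.
    difficulty = []
    for i in range(n_min):
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        difficulty.append(sum(1 for j in neighbours if j >= n_min) / k)

    total = sum(difficulty)
    if total == 0:
        return [], difficulty  # no minority point has majority neighbours

    # Step 2: share the synthetic-sample budget in proportion to difficulty.
    budget = beta * (len(majority) - n_min)
    synthetic = []
    for i in range(n_min):
        n_new = round(budget * difficulty[i] / total)
        partners = sorted(
            (j for j in range(n_min) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        if not partners:
            continue
        # Step 3: interpolate between the point and a random minority partner.
        for _ in range(n_new):
            partner = points[rng.choice(partners)]
            lam = rng.random()
            synthetic.append(
                tuple(a + lam * (b - a) for a, b in zip(points[i], partner))
            )
    return synthetic, difficulty
```

In this sketch a minority point surrounded by majority points receives most of the budget, while a minority point whose neighbours are all minority receives none, which is the adaptive behaviour the abstract describes.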
Variance-based structural equation modeling is extensively used in information systems research, and many related findings may have been distorted by hidden collinearity. This is a problem that may extend to multivariate analyses, in general, in the field of information systems as well as in many other fields. In multivariate analyses, collinearity is usually assessed as a predictor-predictor relationship phenomenon, where two or more predictors are checked for redundancy. This type of assessment addresses vertical, or “classic”, collinearity. However, another type of collinearity may also exist, here called “lateral” collinearity. It refers to predictor-criterion collinearity. Lateral collinearity problems are exemplified based on an illustrative variance-based structural equation modeling analysis. The analysis employs WarpPLS 2.0, with the results double-checked with other statistical analysis software tools. It is shown that standard validity and reliability tests do not properly capture lateral collinearity. A new approach for the assessment of both vertical and lateral collinearity in variance-based structural equation modeling is proposed and demonstrated in the context of the illustrative analysis.
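The distinction between vertical and lateral collinearity above can be illustrated with variance inflation factors (VIFs). The sketch below is an assumption-laden illustration and not necessarily the procedure the paper proposes: it computes VIFs as the diagonal of the inverse correlation matrix, first over the predictors only and then over the predictors together with the criterion. The helper names and the synthetic data are hypothetical.

```python
import random


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long columns of numbers."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        z.append([(v - mean) / sd for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[i][i] for i in range(len(columns))]


# Synthetic example: two unrelated predictors, and a criterion that is
# almost a copy of the first predictor.
rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(400)]
x2 = [rng.gauss(0, 1) for _ in range(400)]
y = [a + 0.1 * rng.gauss(0, 1) for a in x1]

predictor_only = vifs([x1, x2])   # vertical check: predictors only
full = vifs([x1, x2, y])          # check that also includes the criterion
```

In this toy data the predictor-only VIFs stay close to 1, so a classic check raises no flag, while the VIFs for `x1` and `y` in the full check are very large, which is the kind of predictor-criterion redundancy the abstract calls lateral collinearity.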
The three-dimensional classical many-body system is approximated by the use of collective coordinates, through the assumed knowledge of two-body correlation functions. The resulting approximate statistical state is used to obtain the two-body correlation function. Thus, a self-consistent formulation is available for determining the correlation function. Then, the self-consistent integral equation is solved in virial expansion, and the thermodynamic quantities of the system thereby ascertained. The first three virial coefficients are exactly reproduced, while the fourth is nearly correct, as evidenced by numerical results for the case of hard spheres.
We propose a lightweight and ground-optimized lidar odometry and mapping method, LeGO-LOAM, for realtime six degree-of-freedom pose estimation with ground vehicles. LeGO-LOAM is lightweight, as it can achieve realtime pose estimation on a low-power embedded system. LeGO-LOAM is ground-optimized, as it leverages the presence of a ground plane in its segmentation and optimization steps. We first apply point cloud segmentation to filter out noise, and feature extraction to obtain distinctive planar and edge features. A two-step Levenberg-Marquardt optimization method then uses the planar and edge features to solve different components of the six degree-of-freedom transformation across consecutive scans. We compare the performance of LeGO-LOAM with a state-of-the-art method, LOAM, using datasets gathered from variable-terrain environments with ground vehicles, and show that LeGO-LOAM achieves similar or better accuracy with reduced computational expense. We also integrate LeGO-LOAM into a SLAM framework to eliminate the pose estimation error caused by drift, which is tested using the KITTI dataset.
We propose a framework for tightly-coupled lidar inertial odometry via smoothing and mapping, LIO-SAM, that achieves highly accurate, real-time mobile robot trajectory estimation and map-building. LIO-SAM formulates lidar-inertial odometry atop a factor graph, allowing a multitude of relative and absolute measurements, including loop closures, to be incorporated from different sources as factors into the system. The estimated motion from inertial measurement unit (IMU) pre-integration de-skews point clouds and produces an initial guess for lidar odometry optimization. The obtained lidar odometry solution is used to estimate the bias of the IMU. To ensure high performance in real-time, we marginalize old lidar scans for pose optimization, rather than matching lidar scans to a global map. Scan-matching at a local scale instead of a global scale significantly improves the real-time performance of the system, as does the selective introduction of keyframes, and an efficient sliding window approach that registers a new keyframe to a fixed-size set of prior "sub-keyframes." The proposed method is extensively evaluated on datasets gathered from three platforms over various scales and environments.
1 Introduction.- 2 Limit Process Expansions Applied to Ordinary Differential Equations.- 3 Multiple-Variable Expansion Procedures.- 4 Applications to Partial Differential Equations.- 5 Examples from Fluid Mechanics.- Author Index.
Industry devices (i.e., entities) such as server machines, spacecraft, engines, etc., are typically monitored with multivariate time series, whose anomaly detection is critical for an entity's service quality management. However, due to the complex temporal dependence and stochasticity of multivariate time series, their anomaly detection remains a big challenge. This paper proposes OmniAnomaly, a stochastic recurrent neural network for multivariate time series anomaly detection that works robustly for various devices. Its core idea is to capture the normal patterns of multivariate time series by learning their robust representations with key techniques such as stochastic variable connection and planar normalizing flow, reconstruct input data by the representations, and use the reconstruction probabilities to determine anomalies. Moreover, for a detected entity anomaly, OmniAnomaly can provide interpretations based on the reconstruction probabilities of its constituent univariate time series. The evaluation experiments are conducted on two public datasets from aerospace and a new server machine dataset (collected and released by us) from an Internet company. OmniAnomaly achieves an overall F1-Score of 0.86 on three real-world datasets, significantly outperforming the best performing baseline method by 0.09. The interpretation accuracy for OmniAnomaly is up to 0.89.
An equilibrium theory of rigid sphere fluids is developed based on the properties of a new distribution function G(r) which measures the density of rigid sphere molecules in contact with a rigid sphere solute of arbitrary size. A number of exact relations which describe rather fully the functional form of G(r) are derived. These are based on both geometrical considerations and the virial theorem. A knowledge of G(a) where a is the diameter of a rigid sphere enables one to arrive at the equation of state. The resulting analytical expression which is exact up to the third virial coefficient gives the fourth virial coefficient within 3% and the fifth, insofar as it is known, within 5%. Furthermore over the entire range of fluid density, the equation of state derived from theory agrees with that computed using machine methods. Theory also gives an expression for the surface tension of a hard sphere fluid in contact with a perfectly repelling wall. The dependence of surface tension on curvature is also given. The expressions obtained correlate nicely with those adduced by other thermodynamic and statistical mechanical theories. They also suggest that macroscopic consideration on surface tension can sometimes be successfully extrapolated to molecular dimensions.
In real-world face detection, large visual variations, such as those due to pose, expression, and lighting, demand an advanced discriminative model to accurately differentiate faces from the backgrounds. Consequently, effective models for the problem tend to be computationally prohibitive. To address these two conflicting challenges, we propose a cascade architecture built on convolutional neural networks (CNNs) with very powerful discriminative capability, while maintaining high performance. The proposed CNN cascade operates at multiple resolutions, quickly rejects the background regions in the fast low resolution stages, and carefully evaluates a small number of challenging candidates in the last high resolution stage. To improve localization effectiveness, and reduce the number of candidates at later stages, we introduce a CNN-based calibration stage after each of the detection stages in the cascade. The output of each calibration stage is used to adjust the detection window position for input to the subsequent stage. The proposed method runs at 14 FPS on a single CPU core for VGA-resolution images and 100 FPS using a GPU, and achieves state-of-the-art detection performance on two public face detection benchmarks.
Is there any connection between the vastness of the universes of stars and galaxies and the existence of life on a small planet out in the suburbs of the Milky Way? This book shows that there is. In their classic work, John Barrow and Frank Tipler examine the question of Mankind's place in the Universe, taking the reader on a tour of many scientific disciplines and offering fascinating insights into issues such as the nature of life, the search for extraterrestrial intelligence, and the past history and fate of our universe.
Deep Learning has recently become hugely popular in machine learning for its ability to solve end-to-end learning systems, in which the features and the classifiers are learned simultaneously, providing significant improvements in classification accuracy in the presence of highly-structured and large databases.
This paper proposes a definition of systems thinking for use in a wide variety of disciplines, with particular emphasis on the development and assessment of systems thinking educational efforts. The definition was derived from a review of the systems thinking literature combined with the application of systems thinking to itself. Many different definitions of systems thinking can be found throughout the systems community, but key components of a singular definition can be distilled from the literature. This researcher considered these components both individually and holistically, then proposed a new definition of systems thinking that integrates these components as a system. The definition was tested for fidelity against a System Test and against three widely accepted system archetypes. Systems thinking is widely believed to be critical in handling the complexity facing the world in the coming decades; however, it still resides in the educational margins. In order for this important skill to receive mainstream educational attention, a complete definition is required. Such a definition has not yet been established. This research is an attempt to rectify this deficiency by providing such a definition.
A new development in the dynamical behavior of elementary quantum systems is the surprising discovery that correlation between two quantum units of information called qubits can be degraded by environmental noise in a way not seen previously in studies of dissipation. This new route for dissipation attacks quantum entanglement, the essential resource for quantum information as well as the central feature in the Einstein-Podolsky-Rosen so-called paradox and in discussions of the fate of Schrödinger's cat. The effect has been labeled ESD, which stands for early-stage disentanglement or, more frequently, entanglement sudden death. We review recent progress in studies focused on this phenomenon.
The flow about a spinning sphere moving in a viscous fluid is calculated for small values of the Reynolds number. With this solution the force and torque on the sphere are computed. It is found that in addition to the drag force determined by Stokes, the sphere experiences a force $\mathbf{F}_L$ orthogonal to its direction of motion. This force is given by $\mathbf{F}_L = \pi a^3 \rho\, \boldsymbol{\Omega} \times \mathbf{V}\,[1 + O(R)]$. Here $a$ is the radius of the sphere, $\boldsymbol{\Omega}$ is its angular velocity, $\mathbf{V}$ is its velocity, $\rho$ is the fluid density and $R$ is the Reynolds number, $R = \rho \mu^{-1} V a$. For small values of $R$, the transverse force is independent of the viscosity $\mu$. This force is in such a direction as to account for the curving of a pitched baseball, the long range of a spinning golf ball, etc. It is used as a basis for the discussion of the flow of a suspension of spheres through a tube. The calculation involves the Stokes and Oseen expansions. A representation of solutions of the Oseen equations in terms of two scalar functions is also presented.
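As a small worked example of the leading-order transverse force quoted above (π a³ ρ Ω × V, dropping the O(R) correction), the following sketch evaluates it for given inputs. The function names and the sample values are illustrative assumptions; units are whatever consistent system the caller uses.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def lift_force(radius, density, omega, velocity):
    """Leading-order transverse force: pi * a^3 * rho * (Omega x V)."""
    scale = math.pi * radius ** 3 * density
    return tuple(scale * c for c in cross(omega, velocity))


def reynolds_number(density, viscosity, speed, radius):
    """R = rho * V * a / mu, as defined in the abstract."""
    return density * speed * radius / viscosity


# Illustrative values: sphere spinning about z while moving along x.
force = lift_force(radius=1.0, density=1.0,
                   omega=(0.0, 0.0, 1.0), velocity=(1.0, 0.0, 0.0))
```

With spin about the z-axis and motion along the x-axis, the force points along the y-axis, perpendicular to the direction of motion, and the viscosity does not appear in `lift_force`, in line with the abstract's remark. The expression is only meaningful when `reynolds_number` is small.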
Federated learning enables a large number of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (FedAvg) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of FedAvg on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. Importantly, our bound demonstrates a trade-off between communication-efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging schemes; low device participation rate can be achieved without severely slowing down the learning. Our results indicate that heterogeneity of data slows down the convergence, which matches empirical observations. Furthermore, we provide a necessary condition for FedAvg on non-iid data: the learning rate $\eta$ must decay, even if full-gradient is used; otherwise, the solution will be $\Omega(\eta)$ away from the optimal.
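The last claim above, that a fixed learning rate leaves FedAvg a distance from the optimum on non-iid data even with full gradients, can be seen in a toy setting. The sketch below is a deliberately simplified assumption and not the paper's setup: two devices with one-dimensional quadratic objectives 0.5·a_k·(w − c_k)², full participation, full local gradients, and several local steps per round. All names and constants are hypothetical.

```python
def fedavg_quadratic(curvatures, centers, rounds, local_steps, lr_schedule, w0=0.0):
    """FedAvg on 1-D quadratics f_k(w) = 0.5 * a_k * (w - c_k)**2.

    Every device starts each round from the shared model, takes
    `local_steps` full-gradient steps, and the server averages the results.
    """
    w = w0
    for r in range(rounds):
        lr = lr_schedule(r)
        local_models = []
        for a, c in zip(curvatures, centers):
            wk = w
            for _ in range(local_steps):
                wk -= lr * a * (wk - c)  # full gradient of the local objective
            local_models.append(wk)
        w = sum(local_models) / len(local_models)
    return w


# Heterogeneous (non-iid) devices: different curvature and different minimiser.
curvatures = [1.0, 4.0]
centers = [0.0, 1.0]

# Minimiser of the average objective: curvature-weighted mean of the centres.
w_star = sum(a * c for a, c in zip(curvatures, centers)) / sum(curvatures)

w_fixed = fedavg_quadratic(curvatures, centers, rounds=2000, local_steps=5,
                           lr_schedule=lambda r: 0.1)
w_decay = fedavg_quadratic(curvatures, centers, rounds=2000, local_steps=5,
                           lr_schedule=lambda r: 0.5 / (r + 5))

error_fixed = abs(w_fixed - w_star)
error_decay = abs(w_decay - w_star)
```

In this toy problem `error_fixed` settles at a gap that does not shrink with more rounds, whereas `error_decay` keeps shrinking as the learning rate decays. The gap arises because multiple local steps pull each device toward its own minimiser at a rate that depends on its curvature, so plain averaging no longer matches a gradient step on the average objective.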
Laboratory-based courses play a critical role in scientific education. Automation is changing the nature of these laboratories, and there is a long-running debate about the value of hands-on versus simulated laboratories. In addition, the introduction of remote laboratories adds a third category to the debate. Through a review of the literature related to these labs in education, the authors draw several conclusions about the state of current research. The debate over different technologies is confounded by the use of different educational objectives as criteria for judging the laboratories: Hands-on advocates emphasize design skills, while remote lab advocates focus on conceptual understanding. We observe that the boundaries among the three labs are blurred in the sense that most laboratories are mediated by computers, and that the psychology of presence may be as important as technology. We also discuss areas for future research.
We create a culture dictionary using one of the latest machine learning techniques, the word embedding model, and 209,480 earnings call transcripts. We score the five corporate cultural values of innovation, integrity, quality, respect, and teamwork for 62,664 firm-year observations over the period 2001–2018. We show that an innovative culture is broader than the usual measures of corporate innovation: R&D expenses and the number of patents. Moreover, we show that corporate culture correlates with business outcomes, including operational efficiency, risk-taking, earnings management, executive compensation design, firm value, and deal making, and that the culture-performance link is more pronounced in bad times. Finally, we present suggestive evidence that corporate culture is shaped by major corporate events, such as mergers and acquisitions.
Atmospheric-pressure, non-equilibrium plasmas are susceptible to instabilities and, in particular, to arcing (glow-to-arc transition). Spatially confining the plasma to dimensions of 1 mm or less is a promising approach to the generation and maintenance of stable glow discharges at atmospheric pressure. Often referred to as microdischarges or microplasmas, these weakly-ionized discharges represent a new and fascinating realm of plasma science, where issues such as the possible breakdown of 'pd scaling' and the role of boundary-dominated phenomena come to the fore. Microplasmas are generated under conditions that promote the efficient production of transient molecular species such as the rare gas excimers, which generally are formed by three-body collisions. Pulsed excitation on a sub-microsecond time scale results in microplasmas with significant shifts in both the temperatures and energy distribution functions associated with the ions and electrons. This allows for the selective production of chemically reactive species and opens the door to a wide range of new applications of microplasmas. The implementation of semiconductor, microelectronics, and MEMS microfabrication techniques has resulted in the realization of microplasma arrays as large as 250,000 devices. Fabricated in silicon or ceramics with characteristic device dimensions as small as 10 µm and at packing densities up to $10^4$ cm$^{-2}$, these arrays offer optical and electrical characteristics well suited for applications in medical diagnostics, displays and environmental sensing. Several microplasma device structures, including their fundamental properties and selected applications, will be discussed.
Energy harvesting technologies that are engineered to miniature sizes, while still increasing the power delivered to wireless electronics, (1, 2) portable devices, stretchable electronics, (3) and implantable biosensors, (4, 5) are strongly desired. Piezoelectric nanowire- and nanofiber-based generators have potential uses for powering such devices through a conversion of mechanical energy into electrical energy. (6) However, the piezoelectric voltage constant of the semiconductor piezoelectric nanowires in the recently reported piezoelectric nanogenerators (7-12) is lower than that of lead zirconate titanate (PZT) nanomaterials. Here we report a piezoelectric nanogenerator based on PZT nanofibers. The PZT nanofibers, with a diameter and length of approximately 60 nm and 500 µm, respectively, were aligned on interdigitated electrodes of platinum fine wires and packaged using a soft polymer on a silicon substrate. The measured output voltage and power under periodic stress application to the soft polymer were 1.63 V and 0.03 µW, respectively.