Hewlett-Packard (United States)
companyPalo Alto, California, United States
Research output, citation impact, and the most-cited recent papers from Hewlett-Packard (United States) (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Hewlett-Packard (United States)
A novel and robust automated docking method that predicts the bound conformations of flexible ligands to macromolecular targets has been developed and tested, in combination with a new scoring function that estimates the free energy change upon binding. Interestingly, this method applies a Lamarckian model of genetics, in which environmental adaptations of an individual's phenotype are reverse transcribed into its genotype and become heritable traits (sic). We consider three search methods, Monte Carlo simulated annealing, a traditional genetic algorithm, and the Lamarckian genetic algorithm, and compare their performance in dockings of seven protein–ligand test systems having known three-dimensional structure. We show that both the traditional and Lamarckian genetic algorithms can handle ligands with more degrees of freedom than the simulated annealing method used in earlier versions of AUTODOCK, and that the Lamarckian genetic algorithm is the most efficient, reliable, and successful of the three. The empirical free energy function was calibrated using a set of 30 structurally known protein–ligand complexes with experimentally determined binding constants. Linear regression analysis of the observed binding constants in terms of a wide variety of structure-derived molecular properties was performed. The final model had a residual standard error of 9.11 kJ mol−1 (2.177 kcal mol−1) and was chosen as the new energy function. The new search methods and empirical free energy function are available in AUTODOCK, version 3.0. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1639–1662, 1998
A new signature scheme is proposed, together with an implementation of the Diffie-Hellman key distribution scheme that achieves a public key cryptosystem. The security of both systems relies on the difficulty of computing discrete logarithms over finite fields.
The gem5 simulation infrastructure is the merger of the best aspects of the M5 [4] and GEMS [9] simulators. M5 provides a highly configurable simulation framework, multiple ISAs, and diverse CPU models. GEMS complements these features with a detailed and exible memory system, including support for multiple cache coherence protocols and interconnect models. Currently, gem5 supports most commercial ISAs (ARM, ALPHA, MIPS, Power, SPARC, and x86), including booting Linux on three of them (ARM, ALPHA, and x86). The project is the result of the combined efforts of many academic and industrial institutions, including AMD, ARM, HP, MIPS, Princeton, MIT, and the Universities of Michigan, Texas, and Wisconsin. Over the past ten years, M5 and GEMS have been used in hundreds of publications and have been downloaded tens of thousands of times. The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.
From the Publisher: Phadke was trained in robust design techniques by Genichi Taguchi, the mastermind behind Japanese quality manufacturing technologies and the father of Japanese quality control. Taguchi's approach is currently under consideration to be adopted as a student protocol with the US govrnment. The foreword is written by Taguchi. This book offers a complete blueprint for structuring projects to achieve rapid completion with high engineering productivity during the research and development phase to ensure that high quality products can be made quickly and at the lowest possible cost. Some topics covered are: orthogonol arrays, how to construct orthogonal arrays, computer-aided robutst design techniques, dynamic systems design methods, and more.
Linear optics with photon counting is a prominent candidate for practical quantum computing. The protocol by Knill, Laflamme, and Milburn [2001, Nature (London) 409, 46] explicitly demonstrates that efficient scalable quantum computing with single photons, linear optical elements, and projective measurements is possible. Subsequently, several improvements on this protocol have started to bridge the gap between theoretical scalability and practical implementation. The original theory and its improvements are reviewed, and a few examples of experimental two-qubit gates are given. The use of realistic components, the errors they induce in the computation, and how these errors can be corrected is discussed.
This work presents a new efficient method for fitting ellipses to scattered data. Previous algorithms either fitted general conics or were computationally expensive. By minimizing the algebraic distance subject to the constraint 4ac-b/sup 2/=1, the new method incorporates the ellipticity constraint into the normalization factor. The proposed method combines several advantages: It is ellipse-specific, so that even bad data will always return an ellipse. It can be solved naturally by a generalized eigensystem. It is extremely robust, efficient, and easy to implement.
In this paper, we study the linking patterns and discussion topics of political bloggers. Our aim is to measure the degree of interaction between liberal and conservative blogs, and to uncover any differences in the structure of the two communities. Specifically, we analyze the posts of 40 "A-list" blogs over the period of two months preceding the U.S. Presidential Election of 2004, to study how often they referred to one another and to quantify the overlap in the topics they discussed, both within the liberal and conservative communities, and also across communities. We also study a single day snapshot of over 1,000 political blogs. This snapshot captures blogrolls (the list of links to other blogs frequently found in sidebars), and presents a more static picture of a broader blogosphere. Most significantly, we find differences in the behavior of liberal and conservative blogs, with conservative blogs linking to each other more frequently and in a denser pattern.
Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications. This paper provides an overview of data warehousing and OLAP technologies, with an emphasis on their new requirements. We describe back end tools for extracting, cleaning and loading data into a data warehouse; multidimensional data models typical of OLAP; front end client tools for querying and data analysis; server extensions for efficient query processing; and tools for metadata management and for managing the warehouse. In addition to surveying the state of the art, this paper also identifies some promising research issues, some of which are related to problems that the database research community has worked on for years, but others are only just beginning to be addressed. This overview is based on a tutorial that the authors presented at the VLDB Conference, 1996.
Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison of twelve feature selection methods (e.g. Information Gain) evaluated on a benchmark of 229 text classification problem instances that were gathered from Reuters, TREC, OHSUMED, etc. The results are analyzed from multiple goal perspectives—accuracy, F-measure, precision, and recall—since each is appropriate in different situations. The results reveal that a new feature selection metric we call ‘Bi-Normal Separation ’ (BNS), outperformed the others by a substantial margin in most situations. This margin widened in tasks with high class skew, which is rampant in text classification problems and is particularly challenging for induction algorithms. A new evaluation methodology is offered that focuses on the needs of the data mining practitioner faced with a single dataset who seeks to choose one (or a pair of) metrics that are most likely to yield the best performance. From this perspective, BNS was the top single choice for all goals except precision, for which Information Gain yielded the best result most often. This analysis also revealed, for example, that Information Gain and Chi-Squared have correlated failures, and so they work poorly together. When choosing optimal pairs of metrics for each of the four performance goals, BNS is consistently a member of the pair—e.g., for greatest recall, the pair BNS + F1-measure yielded the best performance on the greatest number of tasks by a considerable margin.
This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area2 product (EDA2P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account configuring clusters with 4 cores gives the best EDA2P and EDAP.
This paper presents methods for collecting and analyzing physiological data during real-world driving tasks to determine a driver's relative stress level. Electrocardiogram, electromyogram, skin conductance, and respiration were recorded continuously while drivers followed a set route through open roads in the greater Boston area. Data from 24 drives of at least 50-min duration were collected for analysis. The data were analyzed in two ways. Analysis I used features from 5-min intervals of data during the rest, highway, and city driving conditions to distinguish three levels of driver stress with an accuracy of over 97% across multiple drivers and driving days. Analysis II compared continuous features, calculated at 1-s intervals throughout the entire drive, with a metric of observable stressors created by independent coders from videotapes. The results show that for most drivers studied, skin conductivity and heart rate metrics are most closely correlated with driver stress level. These findings indicate that physiological signals can provide a metric of driver stress in future cars capable of physiological monitoring. Such a metric could be used to help manage noncritical in-vehicle information systems and could also provide a continuous measure of how different road and traffic conditions affect drivers.
We present an analysis of a person-to-person recommendation network, consisting of 4 million people who made 16 million recommendations on half a million products. We observe the propagation of recommendations and the cascade sizes, which we explain by a simple stochastic model. We analyze how user behavior varies within user communities defined by a recommendation network. Product purchases follow a ‘long tail’ where a significant share of purchases belongs to rarely sold items. We establish how the recommendation network grows over time and how effective it is from the viewpoint of the sender and receiver of the recommendations. While on average recommendations are not very effective at inducing purchases and do not spread very far, we present a model that successfully identifies communities, product, and pricing categories for which viral marketing seems to be very effective.
In recent years, social media has become ubiquitous and important for social networking and content sharing.And yet, the content that is generated from these websites remains largely untapped.In this paper, we demonstrate how social media content can be used to predict real-world outcomes.In particular, we use the chatter from Twitter.com to forecast box-office revenues for movies.We show that a simple model built from the rate at which tweets are created about particular topics can outperform market-based predictors.We further demonstrate how sentiments extracted from Twitter can be further utilized to improve the forecasting power of social media.
Collaborative tagging describes the process by which many users add metadata in the form of keywords to shared content. Recently, collaborative tagging has grown in popularity on the web, on sites that allow users to tag bookmarks, photographs and other content. In this paper we analyze the structure of collaborative tagging systems as well as their dynamic aspects. Specifically, we discovered regularities in user activity, tag frequencies, kinds of tags used, bursts of popularity in bookmarking and a remarkable stability in the relative proportions of tags within a given URL. We also present a dynamic model of collaborative tagging that predicts these stable patterns and relates them to imitation and shared knowledge.
In the past few years elliptic curve cryptography has moved from a fringe activity to a major challenger to the dominant RSA/DSA systems. Elliptic curves offer major advances on older systems such as increased speed, less memory and smaller key sizes. As digital signatures become more and more important in the commercial world the use of elliptic curve-based signatures will become all pervasive. This book summarizes knowledge built up within Hewlett-Packard over a number of years, and explains the mathematics behind practical implementations of elliptic curve systems. Due to the advanced nature of the mathematics there is a high barrier to entry for individuals and companies to this technology. Hence this book will be invaluable not only to mathematicians wanting to see how pure mathematics can be applied but also to engineers and computer scientists wishing (or needing) to actually implement such systems.
Sequential pattern mining is an important data mining problem with broad applications. It is challenging since one may need to examine a combinatorially explosive number of possible subsequence patterns. Most of the previously developed sequential pattern mining methods follow the methodology of \t which may substantially reduce the number of combinations to be examined. However, \t still encounters problems when a sequence database is large and/or when sequential patterns to be mined are numerous and/or long. In this paper, we propose a novel sequential pattern mining method, called PrefixSpan (i.e., Prefix-projected Sequential pattern mining), which explores prefixprojection in sequential pattern mining. PrefixSpan mines the complete set of patterns but greatly reduces the efforts of candidate subsequence generation. Moreover, prefix-projection substantially reduces the size of projected databases and leads to efficient processing. Our performance study shows that PrefixSpan outperforms both the -based GSP algorithm and another recently proposed method, FreeSpan, in mining large sequence databases. 1
This correspondence describes the construction of complex codes of the form exp <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">i \alpha_k</tex> whose discrete circular autocorrelations are zero for all nonzero lags. There is no restriction on code lengths.
The authors first study the behavior of a finite-state channel where a binary symmetric channel is associated with each state and Markov transitions between states are assumed. Such a channel is referred to as a finite-state Markov channel (FSMC). By partitioning the range of the received signal-to-noise ratio into a finite number of intervals, FSMC models can be constructed for Rayleigh fading channels. A theoretical approach is conducted to show the usefulness of FSMCs compared to that of two-state Gilbert-Elliott channels. The crossover probabilities of the binary symmetric channels associated with its states are calculated. The authors use the second-order statistics of the received SNR to approximate the Markov transition probabilities. The validity and accuracy of the model are confirmed by the state equilibrium equations and computer simulation.< <ETX xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">></ETX>
Optimized scale factors for calculating vibrational harmonic and fundamental frequencies and zero-point energies have been determined for 145 electronic model chemistries, including 119 based on approximate functionals depending on occupied orbitals, 19 based on single-level wave function theory, three based on the neglect-of-diatomic-differential-overlap, two based on doubly hybrid density functional theory, and two based on multicoefficient correlation methods. Forty of the scale factors are obtained from large databases, which are also used to derive two universal scale factor ratios that can be used to interconvert between scale factors optimized for various properties, enabling the derivation of three key scale factors at the effort of optimizing only one of them. A reduced scale factor optimization model is formulated in order to further reduce the cost of optimizing scale factors, and the reduced model is illustrated by using it to obtain 105 additional scale factors. Using root-mean-square errors from the values in the large databases, we find that scaling reduces errors in zero-point energies by a factor of 2.3 and errors in fundamental vibrational frequencies by a factor of 3.0, but it reduces errors in harmonic vibrational frequencies by only a factor of 1.3. It is shown that, upon scaling, the balanced multicoefficient correlation method based on coupled cluster theory with single and double excitations (BMC-CCSD) can lead to very accurate predictions of vibrational frequencies. With a polarized, minimally augmented basis set, the density functionals with zero-point energy scale factors closest to unity are MPWLYP1M (1.009), τHCTHhyb (0.989), BB95 (1.012), BLYP (1.013), BP86 (1.014), B3LYP (0.986), MPW3LYP (0.986), and VSXC (0.986).
LOCO-I (LOw COmplexity LOssless COmpression for Images) is the algorithm at the core of the new ISO/ITU standard for lossless and near-lossless compression of continuous-tone images, JPEG-LS. It is conceived as a "low complexity projection" of the universal context modeling paradigm, matching its modeling unit to a simple coding unit. By combining simplicity with the compression potential of context models, the algorithm "enjoys the best of both worlds." It is based on a simple fixed context model, which approaches the capability of the more complex universal techniques for capturing high-order dependencies. The model is tuned for efficient performance in conjunction with an extended family of Golomb-type codes, which are adaptively chosen, and an embedded alphabet extension for coding of low-entropy image regions. LOCO-I attains compression ratios similar or superior to those obtained with state-of-the-art schemes based on arithmetic coding. Moreover, it is within a few percentage points of the best available compression ratios, at a much lower complexity level. We discuss the principles underlying the design of LOCO-I, and its standardization into JPEC-LS.