INFORMS Nashville – 2016
Tuesday, 10:00AM - 10:50AM
Tuesday Plenary
Davidson Ballroom-MCC
Big Data and Big Decisions
Plenary Session
Chair: Shabbir Ahmed, Georgia Tech,
shabbir.ahmed@isye.gatech.edu
1 - Big Data And Big Decisions
Suvrajeet Sen, University of Southern California, 3715 McClintock
Ave, Los Angeles, CA, 90089, United States,
s.sen@usc.edu
Over the past decade, the world of Statistical and Machine Learning has made
dramatic inroads into some of the more challenging AI problems ranging from
speech recognition and natural language processing, to bio and health
informatics. Both supervised and unsupervised learning methods have exploded
in daily use for applications covering business analytics, e-commerce,
educational/tutoring systems, and others. In many cases, new models and
algorithms have been developed so that the results of learning are also easier to
interpret (for a human decision maker). The partnership between AI and human
cognition is not new, but its widespread success in recent years has transformed
the way we do business today. The combination of modern informatics and high
dimensional statistics has often been credited with this transformation. This
lecture will not only highlight some successes of Big Data, but also explore
settings where human cognition may not provide the best test of decision quality.
This new class of problems involves not only Big Data, but also Big Decisions. This
lecture will explore the continuum between Big Data and Big Decisions.
Tuesday, 11:00AM - 12:30PM
TB01
101A-MCC
Clustering Methods in Data Mining
Sponsored: Data Mining
Sponsored Session
Chair: Majeed Simaan, RPI, 231 Congress Street, Troy, NY, 12180,
United States,
simaam@rpi.edu
1 - Parable: A Parallel Random Partition Based Hierarchical
Clustering Algorithm For The MapReduce Framework
Haimonti Dutta, University at Buffalo, 325P Jacobs Management
Center, Buffalo, NY, 14260, United States,
haimonti@buffalo.edu
Large datasets, on the order of terabytes and petabytes, are prevalent in many scientific
domains. To effectively store, query and analyze these gigantic repositories,
parallel and distributed architectures are popular. Apache Hadoop is one such
parallel framework for supporting data-intensive applications. In this paper, we
present a PArallel, RAndom-partition Based hierarchicaL clustEring algorithm for
the MapReduce framework on Hadoop. It proceeds in two steps: local
hierarchical clustering on nodes and integration of results by a novel dendrogram
alignment technique. Empirical results indicate that significant scalability benefits
can be obtained while maintaining good cluster quality.
2 - Clustering The Traffic Data Errors Using K-mean
Clustering Method
Amin Ariannezhad, Graduate Research Assistant, University of
Arizona, 1209 E. 2nd Street, Tucson, AZ, 85719, United States,
ariannezhad@email.arizona.edu, Yao-Jan Wu
This study aims to identify the meaningful patterns of errors observed in traffic
data collected from dual loop detectors in Phoenix, Arizona. A set of data quality
control criteria was implemented to calculate the percentage of different types of
errors observed during each day of data for each loop detector. The K-means clustering
method was then used to cluster the 15 possible error categories detected in the
data from each loop detector on a daily basis. Seven significant patterns were
found in these errors based on the relationship between them. Findings from the
field visit revealed that the clustering method could successfully find different
meaningful patterns in data errors.
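The clustering step described above can be illustrated with a minimal sketch. It assumes each detector-day is summarized as a vector of error percentages, one entry per error category, and applies plain Lloyd's k-means. The function name `kmeans`, the three-category toy data, and the choice of k = 2 are hypothetical and are not taken from the study, which used 15 categories and reported seven patterns.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on equal-length lists of floats."""
    rng = random.Random(seed)
    # Initialize centers with k distinct input points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers


# Made-up daily error percentages for six detector-days and three
# hypothetical error categories (illustration only, not study data).
days = [
    [40.0, 2.0, 1.0], [38.0, 3.0, 0.0], [42.0, 1.0, 2.0],  # mostly category 0
    [1.0, 35.0, 2.0], [0.0, 37.0, 1.0], [2.0, 33.0, 3.0],  # mostly category 1
]
labels, centers = kmeans(days, k=2)
```

On this toy input the first three detector-days and the last three end up in different clusters; which cluster receives label 0 depends on the random initialization.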
3 - A New Optimization Model For Supervised Biclustering Problem
In Biomedical Dataset Classification
Cem Iyigun, Associate Professor, Middle East Technical University,
Inonu Blvd, Endustri Muhendisligi, Ankara, 06800, Turkey,
iyigun@ie.metu.edu.tr, Saziye Deniz Oguz Arikan
Biclustering groups samples and features simultaneously in the given set of data.
We focus on a supervised biclustering problem leading to unsupervised feature
selection for binary class and multi-class problems. For this problem, we have
proposed a new supervised biclustering optimization model which aims to
maximize classification accuracy by selecting almost all features.
4 - Large Scale Spectral Partitioning By Simulated Mixing
Shahzad Bhatti, University of Illinois at Urbana Champaign,
104 S Mathews Ave, Urbana, IL, 61801, United States,
bhatti2@illinois.edu, Carolyn Beck, Angelia Nedich
Several problems, such as data clustering, graph partitioning, community
detection, and image segmentation, can be cast as spectral partitioning problems.
However, the computational complexity of eigenvalue decompositions has
hindered the application of spectral partitioning to large-scale problems. Several
recent algorithms focus on accelerating spectral partitioning, but
they sacrifice accuracy to gain speed. Our algorithm, on the other
hand, does not require an eigenvalue decomposition; instead, it recursively
bi-partitions the data by finding an approximate linear combination of eigenvectors
of the normalized adjacency matrix of the underlying graph.
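The general idea of partitioning without an eigenvalue decomposition can be sketched as follows. This is not the authors' algorithm, whose details the abstract does not give. It is a generic illustration that assumes the "normalized adjacency matrix" is the random-walk matrix D^-1 A: a random vector is repeatedly averaged over neighbors, so that fast-mixing components die out and what remains is a linear combination of eigenvectors dominated by the slowest-mixing one. The function name `mixing_bipartition`, the step count, and the example graph are all hypothetical.

```python
import random


def mixing_bipartition(adj, steps=30, seed=0):
    """Split a connected undirected graph in two by repeated local averaging.

    adj maps each node to a list of its neighbours.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    deg = {v: len(adj[v]) for v in nodes}
    total_deg = sum(deg.values())
    x = {v: rng.gauss(0.0, 1.0) for v in nodes}

    for _ in range(steps):
        # One lazy random-walk step: x <- (x + D^-1 A x) / 2.
        x = {v: 0.5 * (x[v] + sum(x[u] for u in adj[v]) / deg[v])
             for v in nodes}
        # Remove the constant component (eigenvalue 1), which carries no
        # cluster information, by subtracting the degree-weighted mean.
        mean = sum(deg[v] * x[v] for v in nodes) / total_deg
        x = {v: x[v] - mean for v in nodes}
        # Rescale so the values do not shrink toward zero.
        scale = max(abs(val) for val in x.values()) or 1.0
        x = {v: val / scale for v, val in x.items()}

    # The sign pattern of the surviving vector defines the two parts.
    left = {v for v in nodes if x[v] < 0}
    right = set(nodes) - left
    return left, right


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
left, right = mixing_bipartition(graph)
```

On this toy graph the split separates the two triangles. A recursive variant would apply the same function to the subgraph induced by each part.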
TB02
101B-MCC
Methods for Analysis of Next-Generation
Sequencing Data
Sponsored: Data Mining
Sponsored Session
Chair: Paul Brooks, Virginia Commonwealth Univ, Richmond, VA,
United States,
jpbrooks@vcu.edu
1 - Quality Control For Microbiome Experiments
David Edwards, Virginia Commonwealth University, Richmond,
VA, United States,
dedwards7@vcu.edu, Paul Brooks
Microbiome studies aim to understand the role of the bacterial communities in
physiology and disease. The primary goal of the Vaginal Microbiome Consortium
is to develop methods to facilitate the discovery of patterns in 16S rRNA data and
extensive clinical and demographic data as it relates to women’s health.
Maintaining internal consistency and understanding measurement variation in
microbiome experiments are key to identifying and avoiding batch effects. In this
talk, we discuss and illustrate how statistical quality control techniques (and
related visualizations) are useful for assessing data consistency across time via
positive and negative controls.
2 - Characterizing The Vaginal Microbiome Based On A Large
Observational Study
Victoria Pokhilko, Virginia Commonwealth University,
Richmond, VA, United States,
pokhilkovv@vcu.edu, Paul Brooks,
David Edwards
We conducted an analysis of 16S rRNA surveys of the vaginal microbiome based
on samples from over 6,000 women. Vaginal microbiome profiles are typically
dominated by a single bacterium, leading to a classification of samples into groups
that we call vagitypes. Vagitype classifications facilitate the discovery of
relationships between microbiome profile and clinical data. The presence or
absence of Lactobacillus species and a diagnosis of bacterial vaginosis have been
shown to play an important role in the reproductive health of a woman. Our
analysis provides information about these patterns and suggests roles for other
bacteria in health and dysbiosis.
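The dominance-based grouping described above can be sketched in a few lines. The abstract does not state the classification rule, so this sketch assumes a sample is labeled by its most abundant taxon when that taxon reaches a minimum share of the reads. The function name `assign_vagitype`, the 30% threshold, the fallback labels, and the read counts are all hypothetical illustrations.

```python
def assign_vagitype(counts, min_share=0.30):
    """Label a sample by its dominant taxon.

    counts maps taxon name to read count. min_share is an assumed
    threshold, not a value given in the abstract.
    """
    total = sum(counts.values())
    if total == 0:
        return "no reads"
    taxon, top = max(counts.items(), key=lambda kv: kv[1])
    return taxon if top / total >= min_share else "no dominant taxon"


# Made-up read counts for two samples (illustration only).
dominated = {"Lactobacillus crispatus": 900, "Gardnerella vaginalis": 100}
mixed = {"taxon A": 25, "taxon B": 25, "taxon C": 25, "taxon D": 25}

type_dominated = assign_vagitype(dominated)
type_mixed = assign_vagitype(mixed)
```

With these inputs the first sample is labeled by its dominant species, and the second falls into the fallback group because no taxon reaches the assumed threshold.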
3 - Longitudinal Data Analysis Techniques For Analyzing
Microbiome Data
Eugenie Jackson, University of Wyoming,
ejacks20@uwyo.edu
Microbiome data is characterized by a high degree of sparseness, a number of
observations much smaller than the number of taxa, and often a small set of taxa
that dominates the data. Goals of analysis include identifying and characterizing
microbiome profiles, discovering relationships between microbial populations and
health states, and understanding interdependencies among taxa. Changes in
human microbial communities and their respective hosts across time are of
fundamental interest. We present an overview of recent longitudinal analysis
techniques for microbiome data. We discuss their respective strengths and uses,
open problems, and directions for future work.