2016 INFORMS Annual Meeting Program

INFORMS Nashville – 2016

Tuesday, 10:00AM - 10:50AM

Tuesday Plenary

Davidson Ballroom-MCC

Big Data and Big Decisions

Plenary Session

Chair: Shabbir Ahmed, Georgia Tech,

shabbir.ahmed@isye.gatech.edu

1 - Big Data And Big Decisions

Suvrajeet Sen, University of Southern California, 3715 McClintock

Ave, Los Angeles, CA, 90089, United States

Over the past decade, the fields of statistical and machine learning have made

dramatic inroads into some of the more challenging AI problems, ranging from

speech recognition and natural language processing, to bio and health

informatics. Both supervised and unsupervised learning methods have exploded

in daily use for applications covering business analytics, e-commerce,

educational/tutoring systems, and others. In many cases, new models and

algorithms have been developed so that the results of learning are also easier to

interpret (for a human decision maker). The partnership between AI and human

cognition is not new, but its widespread success in recent years has transformed

the way we do business today. The combination of modern informatics and high

dimensional statistics has often been credited with this transformation. This

lecture will not only highlight some successes of Big Data, but also explore

settings where human cognition may not provide the best test of decision quality.

This new class of problems involves not only Big Data, but also Big Decisions. This

lecture will explore the continuum between Big Data and Big Decisions.

Tuesday, 11:00AM - 12:30PM

TB01

101A-MCC

Clustering Methods in Data Mining

Sponsored: Data Mining

Sponsored Session

Chair: Majeed Simaan, RPI, 231 Congress Street, Troy, NY, 12180,

United States,

1 - Parable: A Parallel Random Partition Based Hierarchical

Clustering Algorithm For The Mapreduce Framework

Haimonti Dutta, University at Buffalo, 325P Jacobs Management

Center, Buffalo, NY, 14260, United States,

haimonti@buffalo.edu

Large datasets, on the order of terabytes and petabytes, are prevalent in many scientific

domains. To effectively store, query and analyze these gigantic repositories,

parallel and distributed architectures are popular. Apache Hadoop is one such

parallel framework for supporting data-intensive applications. In this paper, we

present a PArallel, RAndom-partition Based hierarchicaL clustEring algorithm for

the MapReduce framework on Hadoop. It proceeds in two steps: local

hierarchical clustering on nodes and integration of results by a novel dendrogram

alignment technique. Empirical results indicate that significant scalability benefits

can be obtained while maintaining good cluster quality.

2 - Clustering The Traffic Data Errors Using K-mean

Clustering Method

Amin Ariannezhad, Graduate Research Assistant, University of

Arizona, 1209 E. 2nd Street, Tucson, AZ, 85719, United States,

ariannezhad@email.arizona.edu,

Yao-Jan Wu

This study aims to identify the meaningful patterns of errors observed in traffic

data collected from dual loop detectors in Phoenix, Arizona. A set of data quality

control criteria was implemented to calculate the percentage of different types of

errors observed during each day of data for each loop detector. The K-means clustering

method was then used to cluster the 15 possible error categories

detected at each loop detector on a daily basis. Seven significant patterns were

found in these errors based on the relationship between them. Findings from the

field visit revealed that the clustering method could successfully find different

meaningful patterns in data errors.
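
The clustering step described in this abstract can be sketched as follows. This is a minimal illustration of plain K-means (Lloyd's algorithm), not the authors' implementation. It assumes each detector-day is represented as a vector of error percentages; the three error categories and all values below are hypothetical (the study uses 15 categories), and the centroids are simply seeded with the first k records.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; centroids are seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Hypothetical detector-days: percentage of records failing each of three checks.
records = [
    [90.0, 5.0, 0.0],
    [2.0, 80.0, 1.0],
    [0.0, 2.0, 95.0],
    [88.0, 7.0, 1.0],
    [1.0, 85.0, 0.0],
    [1.0, 0.0, 92.0],
]
labels, centroids = kmeans(records, k=3)
print(labels)  # [0, 1, 2, 0, 1, 2]
```

Each resulting cluster groups detector-days that share a dominant error type, which is one way to read the "patterns" the abstract refers to.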

3 - A New Optimization Model For Supervised Biclustering Problem

In Biomedical Dataset Classification

Cem Iyigun, Associate Professor, Middle East Technical University,

Inonu Blvd, Endustri Muhendisligi, Ankara, 06800, Turkey,

iyigun@ie.metu.edu.tr,

Saziye Deniz Oguz Arikan

Biclustering groups samples and features simultaneously in the given set of data.

We focus on a supervised biclustering problem leading to unsupervised feature

selection for binary class and multi-class problems. For this problem, we have

proposed a new supervised biclustering optimization model which aims to

maximize classification accuracy by selecting almost all features.

4 - Large Scale Spectral Partitioning By Simulated Mixing

Shahzad Bhatti, University of Illinois at Urbana Champaign,

104 S Mathews Ave, Urbana, IL, 61801, United States,

bhatti2@illinois.edu,

Carolyn Beck, Angelia Nedich

Several problems, such as data clustering, graph partitioning, community

detection, and image segmentation, can be cast as spectral partitioning problems.

However, the computational complexity of eigenvalue decomposition has

hindered the application of spectral partitioning to large-scale problems. Several

recent algorithms focus on accelerating spectral partitioning, but they sacrifice

accuracy to gain speed. Our algorithm, on the other hand, does not require

eigenvalue decomposition. Instead, it recursively bipartitions the data by

finding an approximate linear combination of eigenvectors of the normalized

adjacency matrix of the underlying graph.
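
The general idea of bipartitioning without an eigenvalue decomposition can be illustrated with a short sketch. This is not the authors' algorithm; it is a generic power-iteration-style stand-in that assumes the row-normalized adjacency matrix D^-1 A, a lazy random-walk averaging step, and a sign-based split. The function name, the step count, and the example graph are all hypothetical.

```python
def mixing_bipartition(adj, steps=30):
    """Split a connected graph in two by repeated local averaging.

    adj is a symmetric 0/1 adjacency matrix with no isolated nodes.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    total = sum(deg)
    x = [float(i) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(steps):
        # Lazy random-walk averaging step: x <- (x + D^-1 A x) / 2.
        walk = [sum(adj[i][j] * x[j] for j in range(n)) / deg[i] for i in range(n)]
        x = [(xi + wi) / 2 for xi, wi in zip(x, walk)]
        # Remove the constant component, which mixing never changes.
        mean = sum(d * xi for d, xi in zip(deg, x)) / total
        x = [xi - mean for xi in x]
    # What survives is dominated by the slowest-mixing eigenvectors,
    # so the sign pattern separates weakly connected groups.
    return [xi > 0 for xi in x]

# Two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

side = mixing_bipartition(adj)
print(side)
```

Applying the same function recursively to each side would give a hierarchical partition; only matrix-vector products are needed, which is what makes this family of methods attractive for large graphs.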

TB02

101B-MCC

Methods for Analysis of Next-Generation

Sequencing Data

Sponsored: Data Mining

Sponsored Session

Chair: Paul Brooks, Virginia Commonwealth Univ, Richmond, VA,

United States,

jpbrooks@vcu.edu

1 - Quality Control For Microbiome Experiments

David Edwards, Virginia Commonwealth University, Richmond,

VA, United States,

dedwards7@vcu.edu,

Paul Brooks

Microbiome studies aim to understand the role of the bacterial communities in

physiology and disease. The primary goal of the Vaginal Microbiome Consortium

is to develop methods to facilitate the discovery of patterns in 16S rRNA data and

extensive clinical and demographic data as it relates to women’s health.

Maintaining internal consistency and understanding measurement variation in

microbiome experiments is key to identifying and avoiding batch effects. In this

talk, we discuss and illustrate how statistical quality control techniques (and

related visualizations) are useful for assessing data consistency across time via

positive and negative controls.
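
As one illustration of the kind of statistical quality control mentioned here, the sketch below flags control runs that fall outside three-sigma limits computed from a baseline. It is a simplified Shewhart-style check, not the consortium's procedure. The monitored metric (the proportion of reads assigned to an expected taxon in a positive control) and all values are hypothetical.

```python
import statistics

def control_limits(baseline):
    """Return (lower, upper) three-sigma limits from baseline control runs."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return center - 3 * sigma, center + 3 * sigma

def out_of_control(baseline, new_runs):
    """Indices of new runs that fall outside the baseline control limits."""
    lower, upper = control_limits(baseline)
    return [i for i, value in enumerate(new_runs) if not lower <= value <= upper]

# Hypothetical positive-control metric from earlier sequencing batches.
baseline = [0.50, 0.52, 0.48, 0.51, 0.49]
new_runs = [0.51, 0.30, 0.49]

flagged = out_of_control(baseline, new_runs)
print(flagged)  # [1]
```

A flagged run would then prompt a closer look at that batch before its samples are pooled with the rest of the data.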

2 - Characterizing The Vaginal Microbiome Based On A Large

Observational Study

Victoria Pokhilko, Virginia Commonwealth University,

Richmond, VA, United States,

pokhilkovv@vcu.edu,

Paul Brooks,

David Edwards

We conducted an analysis of 16S rRNA surveys of the vaginal microbiome based

on samples from over 6,000 women. Vaginal microbiome profiles are typically

dominated by a single bacterium, leading to a classification of samples into groups

that we call vagitypes. Vagitype classifications facilitate the discovery of

relationships between microbiome profile and clinical data. The presence or

absence of Lactobacillus species and a diagnosis of bacterial vaginosis have been

shown to play an important role in the reproductive health of a woman. Our

analysis provides information about these patterns and suggests roles for other

bacteria in health and dysbiosis.
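
The classification by dominant bacterium can be sketched as a simple rule. This is a hedged illustration, not the authors' procedure: the 30% relative-abundance threshold, the "No Type" label, the function name, and the read counts are all assumptions made for the example.

```python
def assign_vagitype(counts, threshold=0.30):
    """Label a sample by its most abundant taxon.

    counts maps taxon name to read count. The threshold is an assumed
    minimum relative abundance for a taxon to count as dominant.
    """
    total = sum(counts.values())
    if total == 0:
        return "No Type"
    taxon, top = max(counts.items(), key=lambda item: item[1])
    return taxon if top / total >= threshold else "No Type"

# Hypothetical read counts for two samples.
dominated = {"Lactobacillus crispatus": 900, "Gardnerella vaginalis": 100}
mixed = {"Taxon A": 25, "Taxon B": 25, "Taxon C": 25, "Taxon D": 25}

print(assign_vagitype(dominated))  # Lactobacillus crispatus
print(assign_vagitype(mixed))      # No Type
```

Grouping samples by this label gives a categorical variable that can be cross-tabulated against clinical data.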

3 - Longitudinal Data Analysis Techniques For Analyzing

Microbiome Data

Eugenie Jackson, University of Wyoming,

ejacks20@uwyo.edu

Microbiome data is characterized by a high degree of sparseness, a number of

observations much smaller than the number of taxa, and often a small set of taxa

that dominates the data. Goals of analysis include identifying and characterizing

microbiome profiles, discovering relationships between microbial populations and

health states, and understanding interdependencies among taxa. Changes in

human microbial communities and their respective hosts across time are of

fundamental interest. We present an overview of recent longitudinal analysis

techniques for microbiome data. We discuss their respective strengths and uses,

open problems, and directions for future work.
