COSA Workshop 2012

Combinatorial Optimization, Statistics, and Applications

September 6-7, 2012, TU Munich, Germany

Zentrum Mathematik
Boltzmannstrasse 3, 85747 Garching (MAP Campus Garching)
Room 02.04.011 (2nd floor of "Mathematik und Informatik" building, between "fingers" 4 and 6)
Take the U6 and get off at Garching-Forschungszentrum.

Organizers: Raymond Hemmecke (TU Munich, Germany), Milan Studený (UTIA Prague, Czech Republic)

This workshop is in the spirit of the former Model Selection Days, which were held at a variety of places with the intention of bringing together researchers from different fields such as algebra, (algorithmic) discrete mathematics and optimization, computer science, and mathematical statistics, with applications for example in the life sciences or in machine learning. The first COSA workshop was held in 2011, also at TU Munich.

There will be two talks on the afternoon of September 6 (Thursday) and two talks on the morning of September 7 (Friday). In addition to this "official part", we encourage discussions on Thursday morning and Friday afternoon among those participants planning to stay for the full two days.

No registration is required for the workshop. However, a short email to the organizers is appreciated if you intend to come.


September 6, 09:30 a.m. - Time for discussions and lunch
September 6, 01:30 p.m. - James Cussens: Bayesian network structure learning with integer linear programming: results and prospects
September 6, 03:00 p.m. - Jing Xi: The Characteristic Imset Polytope for Diagnosis Models

September 7, 10:00 a.m. - David Haws: Detecting Epistasis using Bayesian Networks
September 7, 11:30 a.m. - Robert Cowell: A simple greedy algorithm for reconstructing pedigrees
September 7, 12:30 p.m. - Lunch and time for discussions

Speakers and abstracts

James Cussens
Department of Computer Science
University of York, UK
Bayesian network structure learning with integer linear programming: results and prospects

BN structure learning is an NP-hard problem even with convenient assumptions such as complete discrete data. Integer linear programming (ILP) provides a well-established mathematical and computational framework for solving hard problems like this. In this talk I will describe which ILP techniques have proved effective for BN structure learning with a particular focus on cutting plane algorithms which generate new constraints in the course of solving. Further progress depends crucially on generating new *variables* 'on the fly' using column generation techniques. As well as discussing column generation and the bounds which are needed to make it practical, I will also discuss useful results on facet-defining inequalities for the set covering problem which I expect to be useful for BN learning.
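As a small illustration of the kind of ILP encoding the talk discusses, the sketch below uses "family" variables (one 0-1 choice per node and candidate parent set) together with cluster constraints, which require every cluster of nodes to contain a node whose chosen parents all lie outside the cluster. This is a toy enumeration for illustration only, not the speaker's software and not an actual ILP solve; for three variables, the feasible integer points correspond exactly to the 25 DAGs on three labeled nodes.

```python
from itertools import combinations, product

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

nodes = ["a", "b", "c"]
# One "family variable" per (node, candidate parent set): choosing W for v
# corresponds to setting x[v][W] = 1 in the ILP.
parent_sets = {v: subsets(set(nodes) - {v}) for v in nodes}

def feasible(choice):
    """Cluster constraints: every cluster C of two or more nodes must
    contain a node whose chosen parents all lie outside C.  On integer
    points this is exactly acyclicity of the chosen graph."""
    for r in range(2, len(nodes) + 1):
        for C in map(frozenset, combinations(nodes, r)):
            if not any(choice[v] & C == frozenset() for v in C):
                return False
    return True

# Enumerate all integer assignments (each node picks exactly one parent set)
count = sum(
    feasible(dict(zip(nodes, pick)))
    for pick in product(*(parent_sets[v] for v in nodes))
)
print(count)  # → 25, the number of DAGs on 3 labeled nodes
```

In the actual ILP, these cluster constraints are exponentially many, which is why they are added as cutting planes during the solve rather than all up front.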

Jing Xi
Department of Statistics
University of Kentucky, Lexington, USA
The Characteristic Imset Polytope for Diagnosis Models

In 2010, M. Studený, R. Hemmecke, and S. Lindner explored a new algebraic description of graphical models: characteristic imsets. Compared with standard imsets, characteristic imsets have several advantages: they are still unique vector representatives of conditional independence structures, they are 0-1 vectors, and they are more intuitive in terms of graphs than standard imsets. After defining the characteristic imset polytope as the convex hull of all characteristic imsets for a given set of nodes, they also showed that model selection in graphical models, which is essentially the problem of maximizing a quality criterion, can be converted into an integer programming problem over the characteristic imset polytope. However, this integer programming problem is very hard in general. Therefore, we focus here on diagnosis models, which can be described by bipartite graphs with a set of m nodes and a set of n nodes for any m, n \in Z_+, and on their characteristic imset polytope. First, we show that the characteristic imsets of diagnosis models have very nice properties, including that the number of non-zero coordinates is at most n * (2^m - 1); with these properties we are able to find a combinatorial description of all edges of the characteristic imset polytopes for diagnosis models. Then we prove that these characteristic imset polytopes are direct products of n simplices, each of dimension 2^m - 1. Expressions for all facets are given as well. Finally, we end with further questions on this topic.
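The non-zero-coordinate bound can be checked on a tiny instance. The sketch below assumes the standard definition of the characteristic imset, c_G(S) = 1 for |S| >= 2 iff some i in S satisfies S \ {i} ⊆ pa_G(i); the model itself (m = n = 2, every "disease" node a parent of every "symptom" node) is an invented toy example, where the bound n * (2^m - 1) happens to hold with equality.

```python
from itertools import combinations

# Invented toy diagnosis model: m = 2 disease nodes, each pointing to
# all of n = 2 symptom nodes (a complete bipartite DAG).
parents = {"d1": set(), "d2": set(),
           "s1": {"d1", "d2"}, "s2": {"d1", "d2"}}
nodes = sorted(parents)

def c(S):
    """Characteristic imset entry (assumed standard definition):
    c(S) = 1 iff some i in S has S \ {i} contained in pa(i)."""
    return int(any(S - {i} <= parents[i] for i in S))

imset = {S: c(S)
         for r in range(2, len(nodes) + 1)
         for S in map(frozenset, combinations(nodes, r))}

m, n = 2, 2
nonzero = sum(imset.values())
print(nonzero, n * (2**m - 1))  # → 6 6: the bound is tight here
```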

David Haws
IBM Yorktown Heights, USA
Detecting Epistasis using Bayesian Networks

Gene-by-gene epistatic interactions play a major role in the expression of many phenotypes, e.g. disease and disease resistance. Epistasis is the interaction between multiple genes such that the total effect cannot be attributed to the sum of their marginal effects alone. Recently, Jiang et al. demonstrated the ability of Bayesian networks to accurately detect epistasis. Bayesian networks are graphical models represented by directed acyclic graphs, and are used in a plethora of machine learning and statistics applications. Jiang et al. limited their search to at most 4-way interactions, yet still showed good performance. However, in order to properly detect epistasis one must calculate a Bayesian network score for all possible k-way interactions: if the number of loci is n, then one may have to exhaustively score 2^n interactions. Fortunately, for certain Bayesian network scoring criteria (e.g. BIC, BDe), Campos et al. demonstrated that in practice many higher-order scores (interactions) can be excluded, i.e. pruned. Here we study Campos' conditions and additional pruning conditions proposed by Studený which further reduce the search space. We demonstrate our pruning ability with custom C++ software on selected data sets from the UCI machine learning database and on simulated epistasis data. The pruning of large sets of DAGs also has applications to the general problem of using polyhedral approaches to learning Bayesian networks, which the authors are also investigating.
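The subset-pruning idea can be sketched generically: for a decomposable score, a candidate parent set is never optimal if some subset of it has an equal or better local score, since the subset is also easier to fit into an acyclic graph. The function below is a sketch of that principle only, not the authors' C++ implementation, and the scores are invented illustrative numbers.

```python
from itertools import combinations

def prune(local_scores):
    """Keep only parent sets not dominated by a subset with >= score.
    local_scores maps frozenset(parent set) -> local score (higher is
    better).  A dominated parent set can be discarded before the global
    structure search without losing any optimal network."""
    kept = {}
    for W, s in local_scores.items():
        dominated = any(
            sub in local_scores and local_scores[sub] >= s
            for r in range(len(W))
            for sub in map(frozenset, combinations(W, r))
        )
        if not dominated:
            kept[W] = s
    return kept

# Invented local scores for one node with candidate parents {a, b}.
scores = {frozenset(): -10.0,
          frozenset({"a"}): -8.0,
          frozenset({"b"}): -11.0,      # worse than the empty set: pruned
          frozenset({"a", "b"}): -9.0}  # worse than {a}: pruned
print(sorted(tuple(sorted(W)) for W in prune(scores)))  # → [(), ('a',)]
```

In practice, conditions of this kind let many of the 2^n candidate parent sets be discarded without ever scoring them exhaustively against the data.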

Robert Cowell
Cass Business School
City University London, UK
A simple greedy algorithm for reconstructing pedigrees

I present a simple greedy-search algorithm for finding high-likelihood pedigrees using microsatellite (STR) genotype information on a complete sample of related individuals. The core idea behind the algorithm is not new, but I believe that putting it into a greedy-search setting, and specifically the application to pedigree learning, is novel. The algorithm does not require age or sex information, but such information can be incorporated if desired. Prior information concerning pedigree structure is readily incorporated, after the greedy search is completed, into the high-likelihood pedigrees found. The algorithm is applied to human and non-human genetic data and in a simulation study.
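Abstracted away from pedigrees and likelihoods, the greedy-search pattern underlying approaches like this is repeated best-improvement moves until no neighbour scores higher. The skeleton below is a generic sketch with an invented toy score and move set, not Prof. Cowell's algorithm.

```python
def greedy_search(initial, neighbours, score):
    """Generic greedy hill-climbing: repeatedly move to the best-scoring
    neighbour until no neighbour improves the current score."""
    current = initial
    while True:
        best = max(neighbours(current), key=score, default=current)
        if score(best) <= score(current):
            return current
        current = best

# Toy usage: maximize -(x - 3)^2 over the integers with +/-1 moves.
print(greedy_search(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2))  # → 3
```

For pedigree learning, the state would be a candidate pedigree, the moves local edits to parentage assignments, and the score the likelihood of the genotype data.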