DOI: 10.1145/974614.974618
Article

An exact solution for finding minimum recombinant haplotype configurations on pedigrees with missing data by integer linear programming

Published: 27 March 2004

Abstract

We study the problem of reconstructing haplotype configurations from genotypes on pedigree data with missing alleles under the Mendelian law of inheritance and the minimum recombination principle, which is important for the construction of haplotype maps and genetic linkage/association analysis. Our previous results show that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. The existing algorithms for MRHC are either heuristic in nature and cannot guarantee optimality, or work only under some restrictions (e.g., on the size and structure of the input pedigree, the number of marker loci, or the number of recombinants in the pedigree). In addition, most of them cannot handle data with missing alleles, and those that do consider missing data usually do not perform well in terms of minimizing the number of recombinants when a significant fraction of alleles are missing. In this paper, we develop an effective integer linear programming (ILP) formulation of the MRHC problem with missing data and a branch-and-bound strategy that utilizes a partial order relationship (and some other special relationships) among variables to decide the branching order. The partial order relationship is discovered in the preprocessing of constraints by considering unique properties in our ILP formulation. A directed graph is built based on the variables and their partial order relationship. By identifying and collapsing the strongly connected components in the graph, we may greatly reduce the size of an ILP instance. Non-trivial (lower and upper) bounds on the optimal number of recombinants are introduced at each branching node to effectively prune the search tree. When multiple solutions exist, a best haplotype configuration is selected based on a maximum likelihood approach. Our results on simulated data show that the algorithm could recover haplotypes with 50 loci from a pedigree of size 29 in seconds on a standard PC. Its accuracy is more than 99.8% for data with no missing alleles and 98.3% for data with 20% missing alleles in terms of correctly recovered phase information at each marker locus. As an application of our algorithm to real data, we present some test results on reconstructing haplotypes from a genome-scale SNP data set consisting of 12 pedigrees that have 0.8% to 14.5% missing alleles.
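
The graph-collapsing step mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the relation between variables, so the code below assumes, purely for illustration, that a directed edge (u, v) stands for a constraint of the form x_u <= x_v between binary variables. Under that assumption, every variable on a directed cycle must take the same value, so each strongly connected component can be replaced by a single variable. The function name `collapse_sccs` and the edge encoding are hypothetical, and the component search uses Kosaraju's two-pass algorithm as a stand-in for whatever method the paper uses.

```python
from collections import defaultdict


def collapse_sccs(num_vars, edges):
    """Collapse strongly connected components of a variable graph.

    Assumption (not from the paper): an edge (u, v) encodes x_u <= x_v,
    so variables in one component are forced to be equal.
    Returns (component_of, reduced_edges).
    """
    graph = defaultdict(list)
    reverse = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)

    # Pass 1: iterative DFS on the original graph, recording finish order.
    visited = [False] * num_vars
    order = []
    for start in range(num_vars):
        if visited[start]:
            continue
        visited[start] = True
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            advanced = False
            for nxt in neighbours:
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, iter(graph[nxt])))
                    advanced = True
                    break
            if not advanced:
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finish order;
    # each search tree is one strongly connected component.
    component_of = [-1] * num_vars
    count = 0
    for root in reversed(order):
        if component_of[root] != -1:
            continue
        component_of[root] = count
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in reverse[node]:
                if component_of[prev] == -1:
                    component_of[prev] = count
                    stack.append(prev)
        count += 1

    # Keep only edges that connect different components.
    reduced_edges = {
        (component_of[u], component_of[v])
        for u, v in edges
        if component_of[u] != component_of[v]
    }
    return component_of, reduced_edges


# Variables 0, 1, 2 form one cycle and 3, 4 form another.
example_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
component_of, reduced_edges = collapse_sccs(5, example_edges)
```

In this toy instance five variables shrink to two merged variables joined by a single remaining constraint, which is the kind of size reduction the abstract describes; the actual gain on a real ILP instance depends on the constraint structure.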

        Published In

        RECOMB '04: Proceedings of the eighth annual international conference on Research in computational molecular biology
        March 2004
        370 pages
        ISBN:1581137559
        DOI:10.1145/974614
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. branch-and-bound algorithm
        2. haplotyping
        3. integer linear programming
        4. missing data imputation
        5. pedigree analysis
        6. recombination

        Qualifiers

        • Article

        Conference

        RECOMB04

        Acceptance Rates

        Overall Acceptance Rate 148 of 538 submissions, 28%
