icipe Digital Repository

Research and development of a weighted most recent common ancestor algorithm for metagenomic taxonomic assignment

dc.contributor.author Butungi, Hellen
dc.date.accessioned 2019-12-04T08:01:42Z
dc.date.available 2019-12-04T08:01:42Z
dc.date.issued 2012
dc.identifier.uri http://hdl.handle.net/123456789/1131
dc.description A Dissertation submitted to the Centre for Biotechnology and Bioinformatics (CEBIB) in partial fulfillment of the requirements for the Award of a Master of Science Degree in Bioinformatics of the University of Nairobi en_US
dc.description.abstract Metagenomics has gained tremendous popularity in recent years, largely because rapid advances in DNA sequencing technology now produce large amounts of sequence data in much less time than conventional sequencing methods. These data need to be characterised taxonomically by assigning individual sequence reads to their constituent taxa. However, up-to-date, customized software tools for this task are lacking, and an automated taxonomic classification scheme is necessary. The overall objective of this study was to improve the accuracy of the most recent common ancestor (MRCA) estimation method used to score metagenomic reads in the pathogen profiling pipeline (PPP). The specific objectives were to investigate sequence comparison algorithms, other than the MRCA, that have been used to assign sequence reads to taxa; to compare the taxonomic classification accuracy of MEGAN and MRCA on the same simulated metagenomic dataset; and to design a weighted MRCA algorithm that attains the maximum possible classification accuracy and implement it in the PPP. A novel "weighted most recent common ancestor" (weighted MRCA) algorithm was developed as a set of Perl scripts and evaluated for taxonomic accuracy. The evaluation datasets were simulated with the QSA Read simulator from reference viral and prokaryotic (Bacteria and Archaea) genomes obtained from the NCBI RefSeq database. Compared with MEGAN and MRCA, the results showed improvements in mapping at the species rank of up to 3.6% for viral sequences and 8.4% for prokaryotic sequences (p-values as low as 0.0043 at a significance level of α = 0.05). In environmental science and medicine, improvements of this size matter because they inform key decisions in public health.
For large-scale pathogen discovery projects, this method facilitates more accurate analysis and reporting of candidate etiological agents in complex nucleic acid mixtures, which strengthens outbreak preparedness by improving the capacity for early recognition and containment of pathogens. en_US
dc.description.sponsorship German Academic Exchange Service (DAAD); World Federation of Scientists, Switzerland en_US
dc.publisher University of Nairobi en_US
dc.rights Attribution-NonCommercial-ShareAlike 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/us/ *
dc.subject ancestor algorithm en_US
dc.subject metagenomic taxonomic en_US
dc.title Research and development of a weighted most recent common ancestor algorithm for metagenomic taxonomic assignment en_US
dc.type Thesis en_US
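
For readers unfamiliar with the MRCA approach named in the abstract, the following is a minimal Python sketch of the plain (unweighted) MRCA step and of one possible weighted variant. This record does not describe the thesis's actual weighting scheme, and the original implementation was written in Perl. The score-weighted vote, the `min_support` threshold, the function names, and the toy lineages are therefore illustrative assumptions, not the author's method.

```python
from collections import defaultdict


def mrca(lineages):
    """Return the deepest taxon shared by all root-to-leaf lineages."""
    if not lineages:
        return None
    common = None
    # zip(*lineages) walks all lineages in parallel, one depth at a time.
    for taxa_at_depth in zip(*lineages):
        if all(t == taxa_at_depth[0] for t in taxa_at_depth):
            common = taxa_at_depth[0]
        else:
            break
    return common


def weighted_mrca(hits, min_support=0.8):
    """Assign a read using a score-weighted descent (hypothetical scheme).

    hits: list of (lineage, score) pairs, e.g. one per database match.
    At each depth the taxon with the largest share of the total score is
    followed, as long as that share is at least min_support.
    """
    total = sum(score for _, score in hits)
    if total <= 0:
        return None
    assigned = None
    candidates = hits
    depth = 0
    while True:
        support = defaultdict(float)
        for lineage, score in candidates:
            if depth < len(lineage):
                support[lineage[depth]] += score
        if not support:
            break  # every remaining lineage has been fully consumed
        taxon, weight = max(support.items(), key=lambda kv: kv[1])
        if weight / total < min_support:
            break  # not enough weighted agreement to go deeper
        assigned = taxon
        candidates = [
            (lin, s) for lin, s in candidates
            if depth < len(lin) and lin[depth] == taxon
        ]
        depth += 1
    return assigned


# Toy example: two strong hits to one species, one weak hit to another genus.
example_hits = [
    (["root", "Bacteria", "Proteobacteria", "Escherichia", "Escherichia coli"], 90.0),
    (["root", "Bacteria", "Proteobacteria", "Escherichia", "Escherichia coli"], 85.0),
    (["root", "Bacteria", "Proteobacteria", "Salmonella", "Salmonella enterica"], 20.0),
]
plain = mrca([lineage for lineage, _ in example_hits])
weighted = weighted_mrca(example_hits, min_support=0.8)
```

In this toy case the plain MRCA stops at the phylum because a single weak hit disagrees, while the weighted variant can still reach the species. This is the kind of species-rank gain the abstract reports, although the sketch makes no claim about how the thesis achieves it.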