

Phylogenetic analysis of housekeeping genes of Streptococcus agalactiae isolated from bulk tank milk samples in Colombia

Final assignment for the Bioinformatics course, winter 2016*

University of Prince Edward Island


Claudia Gisela Cobo-Angel1

The data for this assignment was collected in collaboration with:

Ana Sofia Jaramillo1; Sandra Bibiana Aguilar1; Juan Carlos Rodriguez-Lecompte2; Javier Sanchez2; Ruth Zadoks3; Alejandro Ceballos1

1Research Group in Biology of Livestock and Animal Science; CLEV group. Universidad de Caldas, Manizales, Caldas, Colombia.

2Atlantic Veterinary College, University of Prince Edward Island, Canada.

3Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, United Kingdom

*This analysis was carried out by the student as part of the course; the collaborators did not review the manuscript. The information presented here is for course purposes only and is not scientific evidence.


S. agalactiae, an aerobic, Gram-positive coccus classified in Lancefield group B, is a pathogen of the bovine mammary gland that causes subclinical mastitis and economic losses to the dairy industry. It is also a human pathogen: it is responsible for neonatal sepsis and, in adults, can cause meningitis, abscesses, urinary tract infections and arthritis.

The objective of this work is to present a phylogenetic analysis of 11 isolates of S. agalactiae, obtained from bulk tank milk samples from Colombian dairy farms, based on the DNA sequences of seven housekeeping genes, which are highly conserved and encode functions of the pathogen's energy metabolism.

Bacterial DNA was extracted and the following genes were amplified: alcohol dehydrogenase (adhp), phenylalanine/tRNA ligase (phes), serine/threonine kinase (atr), glutamine synthetase (glna), succinate dehydrogenase (sdha), glucokinase (glck) and transketolase (tkt). Sequences of approximately 600 bp were obtained for each gene by sequencing the amplicons with Sanger technology.

The phylogenetic analysis consisted of a multiple sequence alignment and a subsequent phylogenetic tree for each gene, built with Clustal W®, in order to evaluate possible evolutionary processes among isolates. This comparative analysis showed that none of the selected housekeeping genes, used alone, is an appropriate phylogenetic marker for S. agalactiae, since the trees of individual genes showed inconsistent relationships among isolates. However, the combination of genes in a multilocus analysis such as MLST provides a reliable indication of genetic variation and clustering within the species.

Keywords: Housekeeping, phylogenetic, Streptococcus



Streptococcus agalactiae is an aerobic, encapsulated, Gram-positive coccus classified in Lancefield group B (Lancefield, 1934). In bovines, it is considered a major mastitis pathogen because of its large effect on milk quality and production. It produces mastitis that is generally subclinical and chronic, often highly contagious, and has a low probability of self-cure (Keefe, 2012).

Moreover, S. agalactiae is a significant human pathogen and the main cause of neonatal septicemia, producing severe infections in newborns, infants and the elderly (Bisharat et al., 2004). In women, this pathogen can cause abscesses and mastitis during lactation, which is a risk factor for infection of the infant (Dinger et al., 2002). In adults, asymptomatic colonization with S. agalactiae is frequent (20-40%) (Manning et al., 2008). However, it may cause meningitis or septicemia, as well as localized infections such as subcutaneous abscesses, urinary tract infections or arthritis (Chaiwarith et al., 2011), reaching mortality rates of approximately 15% in developed countries (Pereira et al., 2010).

Interspecies transmission is still a major open question in S. agalactiae epidemiology. To address it, the phylogenetic relationships between isolates from bovines and humans must be studied and compared. Available methods include the analysis of highly conserved sequences in a multilocus sequence analysis, or the identification of molecular markers (Zadoks and Schukken, 2006). This study presents a phylogenetic analysis of S. agalactiae isolated from bovine milk, based on seven housekeeping genes used as individual molecular markers and on multilocus sequence typing (MLST), in order to compare both approaches.

Materials and methods

Bacteria isolation and confirmation

A total of 11 isolates of S. agalactiae were obtained from bulk tank milk in western Colombia. Identification was based on culture and the CAMP reaction, following the methodology proposed by the National Mastitis Council (NMC). Chromosomal DNA was isolated from presumptive positive and negative colonies using an UltraClean Microbial Isolation Kit (MoBio®). Isolates were confirmed by polymerase chain reaction (PCR). The species-specific primers used for amplification, V1 (5’-GCGTGCCTAATACATGCAA-3’) and V2 (5’-TACAACGCAGGTCCATCT-3’), target the 16S rRNA gene. When amplification with V1 and V2 was negative, the genus-specific primers C1 (5’-TTTGGTGTTTACACTAGACTG-3’) and C2 (5’-TGTGTTAATTACTCTTATGCG-3’) were used (Elias et al., 2012). The amplification conditions were: initial denaturation at 94 ºC for 4 min; 35 cycles of 94 ºC for 30 s, 45.1 ºC for 30 s and 72 ºC for 30 s; and a final extension at 72 ºC for 5 min.
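As a quick sanity check on the confirmation primers listed above, the following Python sketch computes the length, GC content and a rough melting temperature of each primer. The Wallace rule (Tm = 2(A+T) + 4(G+C)) is an assumption chosen for illustration; it is not a calculation reported in this study, and it is only a coarse approximation for short oligonucleotides.

```python
# Primer sequences as given in the text (the stray space inside C2 is removed).
PRIMERS = {
    "V1": "GCGTGCCTAATACATGCAA",
    "V2": "TACAACGCAGGTCCATCT",
    "C1": "TTTGGTGTTTACACTAGACTG",
    "C2": "TGTGTTAATTACTCTTATGCG",
}


def gc_content(seq):
    """Return the percentage of G and C bases in a primer."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C using the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: length={len(seq)} nt, GC={gc_content(seq):.1f}%, "
          f"Tm(Wallace)={wallace_tm(seq)} C")
```

A check of this kind only helps to spot transcription errors in the primer sequences; it says nothing about primer specificity.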

Housekeeping genes amplification and sequencing

The following housekeeping genes were selected and amplified according to Jones et al. (2003): alcohol dehydrogenase (adhp), phenylalanine/tRNA ligase (phes), serine/threonine kinase (atr), glutamine synthetase (glna), succinate dehydrogenase (sdha), glucokinase (glck) and transketolase (tkt). Primers are shown in Table 1. The amplification conditions were: initial denaturation at 94 ºC for 3 min; 30 cycles of 94 ºC for 1 min, 55 ºC for 45 s and 72 ºC for 1 min; and a final extension at 72 ºC for 10 min. Amplicons were sequenced by the Sanger method using the primers described in Table 1.

Table 1. Primers used to amplify and sequence the housekeeping genes

Forward (5′ to 3′)    Reverse (5′ to 3′)    Amplicon size (bp)

Sequence analysis

Sequences were examined and assembled using Seqman® software (DNAStar, Madison, WI). Multiple sequence alignment was performed using Clustal-Omega® with the following parameters: gap penalty, 15; gap length penalty, 6.66; DNA transition weight, 0.5. The alignment was then used to build and bootstrap the phylogenetic tree in DNAStar software with the neighbor-joining method. Bootstrapping was performed with 1000 replicates and a random seed of 111.
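The bootstrap step described above resamples alignment columns with replacement to create pseudo-replicate alignments, and a tree is then built from each replicate. The Python sketch below illustrates only the resampling step, under stated assumptions: the toy alignment and isolate names are hypothetical, and Python's random generator seeded with 111 will not reproduce the random stream used by DNAStar.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: alignment columns sampled with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=1000, seed=111):
    """Generate n_replicates resampled alignments from a fixed seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment (three isolates, ten aligned positions).
toy = {
    "iso_A": "ACGTACGTAC",
    "iso_B": "ACGTACGTTC",
    "iso_C": "ACGAACGTTC",
}

reps = bootstrap_replicates(toy, n_replicates=1000, seed=111)
print(len(reps), "replicates; first replicate:", reps[0])
```

Each replicate keeps the same isolates and alignment length as the original, so any tree-building method can be applied to it unchanged.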

In addition, MLST typing was performed by assigning an allele number to each gene, based on alignments of each gene sequence against the publicly available MLST database. Each isolate was therefore designated by a seven-integer number, one integer per gene, constituting its allelic profile. Isolates with the same allelic profile were assigned to the same sequence type (ST) (Jones et al., 2003). Concatenated sequences of the seven genes were used to produce a multiple sequence alignment and, subsequently, a phylogenetic tree, with the same parameters as those used for the individual gene sequences.
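The assignment of a sequence type from an allelic profile amounts to a table lookup, which can be sketched in Python as follows. The lookup table here contains only the five profiles reported in Table 2 of this work; a real assignment would query the full MLST database, and the function name `assign_st` is hypothetical.

```python
# Locus order used for the allelic profile (same order as in Table 2).
LOCI = ("adhp", "phes", "atr", "glna", "sdha", "glck", "tkt")

# Allelic profile -> sequence type, copied from Table 2 of this work.
# Assumption: a complete lookup would come from the MLST database instead.
PROFILE_TO_ST = {
    (13, 1, 81, 13, 1, 1, 1): 718,
    (13, 1, 2, 41, 1, 1, 5): 356,
    (16, 18, 2, 2, 9, 2, 2): 248,
    (13, 1, 1, 13, 1, 2, 1): 337,
    (1, 1, 2, 1, 1, 2, 2): 1,
}


def assign_st(alleles):
    """Return the ST for a dict of locus -> allele number, or None if unknown."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILE_TO_ST.get(profile)


# Allele numbers of isolate 1106, taken from Table 2.
isolate_1106 = {"adhp": 13, "phes": 1, "atr": 81, "glna": 13,
                "sdha": 1, "glck": 1, "tkt": 1}
print("Isolate 1106 is ST", assign_st(isolate_1106))
```

Returning `None` for an unknown profile makes it explicit that a new combination of alleles would need a new ST rather than being forced into an existing one.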

Results and discussion

Housekeeping genes analyzed independently

Sequences of housekeeping genes are highly conserved because they encode metabolic functions that are essential for living cells. Sequences of this kind have been used to identify evolutionary events (Vesth et al., 2010). Suitable molecular markers for identification purposes exhibit the smallest amount of heterogeneity within a species/genomovar and result in maximal separation between the different species/genomovars (Martens et al., 2008).

In this study, the most conserved sequence among isolates was the atr gene, with less than one nucleotide substitution per 100 residues (Fig 1g), followed by adhp, with a maximum of seven nucleotide substitutions per 100 residues (Fig 1b). In contrast, glck was the most variable gene, with 75 nucleotide substitutions per 100 residues in one of the groups (Fig 1e).
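To make the unit "nucleotide substitutions per 100 residues" concrete, the Python sketch below computes the uncorrected proportion of differing sites between two aligned sequences and, optionally, a Jukes-Cantor corrected distance. Both are assumptions for illustration: the text does not state which distance model the tree software applied, and the toy sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned fragments that differ at one of ten positions.
a = "ACGTACGTAC"
b = "ACGTACGTTC"
p = p_distance(a, b)
print(f"uncorrected: {100 * p:.1f} substitutions per 100 residues")
print(f"Jukes-Cantor: {100 * jukes_cantor(p):.1f} substitutions per 100 residues")
```

Corrected distances grow faster than the raw proportion of differences, which is one reason very divergent sequences can show large values per 100 residues.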

Other studies have used some of the genes included in this analysis for bacterial classification, such as glck in Bacillus subtilis (Mesak et al., 2004) and phes in Lactobacillus (Naser et al., 2007). Nonetheless, the phylogenetic analysis of individual housekeeping genes showed inconsistent relationships among isolates (Fig. 1), with only a few agreements between phylogenetic trees. For example, tkt and adhp both clustered isolate 1133 away from the other isolates (Fig 1a & 1b), but the tkt sequences of the remaining isolates were very similar, whereas the adhp sequences formed two further groups.

In addition, glna and adhp showed a similar distribution of isolates in the phylogenetic tree (Fig 1c & 1d), in which isolate 1038 clustered apart and the rest of the isolates were close to one another.

The phylogenetic tree of atr was the only one in which the isolates were divided into two groups of six and five isolates (Fig 1g). The phes and glck trees did not show a pattern shared with any other gene.


Fig. 1. Phylogenetic analysis of the seven housekeeping genes analyzed in this work. Only bootstrap values above 70 are shown.


Some of the phylogenetic trees have low bootstrap values, which reflects the uncertainty of analyzing genes individually. It has been suggested that, under favorable conditions, a bootstrap value of more than 70% corresponds to a probability of more than 95% that the true phylogeny has been found (Hillis and Bull, 1993). By this criterion, the phylogenetic tree of the phes gene in this study was very unreliable, because it had no bootstrap values above 70% (Fig 1f). This could indicate that the phes gene is not a good molecular marker by itself in this bacterium.
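The bootstrap value of a group of isolates is the percentage of replicate trees in which that group appears as a clade. The Python sketch below shows this calculation and the 70% threshold discussed above. The replicate trees are hypothetical and are represented simply as sets of clades; extracting clades from real trees is outside the scope of this sketch.

```python
def bootstrap_support(clade, replicate_clades):
    """Percentage of replicate trees that contain the given clade."""
    clade = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


# Hypothetical replicate trees over isolates A-D, each given as its set of clades.
replicates = [
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "D"})},
    {frozenset({"A", "C"}), frozenset({"B", "D"})},
]

for clade in ({"A", "B"}, {"C", "D"}):
    support = bootstrap_support(clade, replicates)
    label = "shown" if support > 70 else "not shown"
    print(sorted(clade), f"{support:.0f}%", label)
```

With 1000 replicates, as used in this work, the same calculation gives support values with much finer resolution than in this four-replicate toy example.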

On the other hand, under certain conditions high bootstrap values can make the wrong phylogeny look good; therefore, the conditions of the analysis must be considered (Hillis and Bull, 1993). For instance, the atr gene did not have high bootstrap values near the root, reflecting a lack of consensus at the deeper levels, but it had a high value (100%) close to the leaves (Fig 1a). This points to a significant probability of support for incorrect relationships among the isolates included, despite the high bootstrap value (Leekitcharoenphon et al., 2012).

Even though some researchers have used housekeeping genes as substitutes for the 16S rRNA gene, showing improved efficacy in species identification, it remains unlikely that a single gene can always reflect the subtle differences between genomes of the same species (Leekitcharoenphon et al., 2012). However, these limitations of a single gene may be overcome by the simultaneous analysis of multiple genes, as in multilocus sequence typing (MLST), which has found wide application, especially in phylogenetic studies.

MLST analysis

MLST typing produced five groups for the 11 isolates (Table 2). The most common group was ST 718 (36.36%), followed by ST 356 (27.27%) and ST 248 (18.18%).

Table 2. Results of MLST typing

Isolate  ST   adhp  phes  atr  glna  sdha  glck  tkt
1106     718  13    1     81   13    1     1     1
1108     718  13    1     81   13    1     1     1
1384     718  13    1     81   13    1     1     1
1134     718  13    1     81   13    1     1     1
1034     356  13    1     2    41    1     1     5
1133     356  13    1     2    41    1     1     5
1125     356  13    1     2    41    1     1     5
1284     248  16    18    2    2     9     2     2
1038     248  16    18    2    2     9     2     2
1393     337  13    1     1    13    1     2     1
1137     1    1     1     2    1     1     2     2
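The ST frequencies reported for Table 2 can be recomputed directly from the isolate-to-ST assignments. The following Python sketch is only a check of that arithmetic; the variable names are illustrative.

```python
from collections import Counter

# Isolate -> sequence type, copied from Table 2.
ISOLATE_ST = {
    1106: 718, 1108: 718, 1384: 718, 1134: 718,
    1034: 356, 1133: 356, 1125: 356,
    1284: 248, 1038: 248,
    1393: 337,
    1137: 1,
}

counts = Counter(ISOLATE_ST.values())
total = len(ISOLATE_ST)

# Percentage of isolates belonging to each ST, rounded to two decimals.
percentages = {st: round(100 * n / total, 2) for st, n in counts.items()}

for st, n in counts.most_common():
    print(f"ST {st}: {n}/{total} isolates ({percentages[st]:.2f}%)")
```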

The phylogenetic analysis of the concatenated sequences showed the importance of using more than one gene as a molecular marker: its distribution of isolates differed from that of the individual gene trees, and bootstrap values were above 70 on all branches (Fig. 2), indicating that the analysis is reliable (Hillis and Bull, 1993). The tree successfully placed the different STs in separate groups, with ST 248 (isolates 1284 and 1038) being the most distant from the other groups. Other authors have reported that the concatenation of a sufficient number of genes overwhelms possible conflicting phylogenetic signals in different genes (Guo et al., 2008).
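Concatenation joins the sequences of each isolate in a fixed locus order, so that every isolate contributes one long sequence to the alignment. The Python sketch below shows the idea with two loci and hypothetical short sequences; the function name `concatenate_loci` and the toy data are assumptions, not the procedure used in DNAStar.

```python
def concatenate_loci(per_gene, loci):
    """Join per-gene sequences of each isolate in the given locus order.

    per_gene maps locus -> {isolate: sequence}; every locus must contain
    exactly the same set of isolates.
    """
    isolates = set(per_gene[loci[0]])
    for locus in loci:
        if set(per_gene[locus]) != isolates:
            raise ValueError(f"isolate set differs for locus {locus}")
    return {iso: "".join(per_gene[locus][iso] for locus in loci)
            for iso in sorted(isolates)}


# Hypothetical toy data: two loci, three isolates.
toy_genes = {
    "adhp": {"1106": "ACGT", "1034": "ACGA", "1137": "TCGA"},
    "tkt": {"1106": "GGCC", "1034": "GGCT", "1137": "GACT"},
}

concatenated = concatenate_loci(toy_genes, loci=("adhp", "tkt"))
for isolate, seq in concatenated.items():
    print(isolate, seq)
```

Keeping the locus order identical for all isolates is what allows homologous positions to line up in the subsequent multiple sequence alignment.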


Fig. 2 Phylogenetic tree of concatenated sequences of the seven housekeeping genes



Regarding the epidemiological implications of the groups found in this study, it is notable that one of the isolates was ST 1, which is frequently isolated from colonized pregnant women and infected neonates (Manning et al., 2009). Finding this ST in milk could indicate an adaptation process that allows the bacterium to survive in different environments. Other authors have found this ST in both humans and bovines (Manning et al., 2010), which supports the hypothesis that interspecies transmission is possible in a farm environment. Further research is needed to confirm this.


This comparative phylogenetic analysis showed that none of the selected housekeeping genes, used alone, is an appropriate phylogenetic marker for S. agalactiae. However, the combination of genes in a multilocus analysis such as MLST provides a reliable indication of genetic variation and clustering within the species. Other multilocus schemes should be investigated to find the most suitable marker for this pathogen.


Bisharat, N., D. W. Crook, J. Leigh, R. M. Harding, P. N. Ward, T. J. Coffey, M. C. Maiden, T. Peto, and N. Jones. 2004. Hyperinvasive neonatal group B streptococcus has arisen from a bovine ancestor. J Clin Microbiol 42(5):2161-2167.

Chaiwarith, R., W. Jullaket, M. Bunchoo, N. Nuntachit, T. Sirisanthana, and K. Supparatpinyo. 2011. Streptococcus agalactiae in adults at Chiang Mai University Hospital: a retrospective study. BMC infectious diseases 11:149.

Dinger, J., A. Topfer, P. Schaller, and R. Schwarze. 2002. Functional residual capacity and compliance of the respiratory system after surfactant treatment in premature infants with severe respiratory distress syndrome. European journal of pediatrics 161(9):485-490.

Elias, A. O., A. Cortez, P. E. Brandao, R. C. da Silva, and H. Langoni. 2012. Molecular detection of Streptococcus agalactiae in bovine raw milk samples obtained directly from bulk tanks. Res Vet Sci 93(1):34-38.

Guo, Y., W. Zheng, X. Rong, and Y. Huang. 2008. A multilocus phylogeny of the Streptomyces griseus 16S rRNA gene clade: use of multilocus sequence analysis for streptomycete systematics. Int J Syst Evol Microbiol 58(Pt 1):149-159.

Hillis, D. M. and J. J. Bull. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst Biol 42(2):182-192.

Jones, N., J. F. Bohnsack, S. Takahashi, K. A. Oliver, M. S. Chan, F. Kunst, P. Glaser, C. Rusniok, D. W. Crook, R. M. Harding, N. Bisharat, and B. G. Spratt. 2003. Multilocus sequence typing system for group B streptococcus. J Clin Microbiol 41(6):2530-2536.

Keefe, G. 2012. Update on control of Staphylococcus aureus and Streptococcus agalactiae for management of mastitis. The Veterinary clinics of North America. Food animal practice 28(2):203-216.

Lancefield, R. C. 1934. A serological differentiation of specific types of bovine hemolytic streptococci (Group B). The Journal of experimental medicine 59(4):441-458.

Leekitcharoenphon, P., O. Lukjancenko, C. Friis, F. M. Aarestrup, and D. W. Ussery. 2012. Genomic variation in Salmonella enterica core genes for epidemiological typing. BMC Genomics 13:88.

Manning, S. D., M. A. Lewis, A. C. Springman, E. Lehotzky, T. S. Whittam, and H. D. Davies. 2008. Genotypic diversity and serotype distribution of group B Streptococcus isolated from women before and after delivery. Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 46(12):1829-1837.

Manning, S. D., A. C. Springman, E. Lehotzky, M. A. Lewis, T. S. Whittam, and H. D. Davies. 2009. Multilocus sequence types associated with neonatal group B streptococcal sepsis and meningitis in Canada. J Clin Microbiol 47(4):1143-1148.

Manning, S. D., A. C. Springman, A. D. Million, N. R. Milton, S. E. McNamara, P. A. Somsel, P. Bartlett, and H. D. Davies. 2010. Association of Group B Streptococcus colonization and bovine exposure: A prospective cross-sectional cohort study. PloS one 5(1):e8795.

Martens, M., P. Dawyndt, R. Coopman, M. Gillis, P. De Vos, and A. Willems. 2008. Advantages of multilocus sequence analysis for taxonomic studies: a case study using 10 housekeeping genes in the genus Ensifer (including former Sinorhizobium). Int J Syst Evol Microbiol 58(Pt 1):200-214.

Mesak, L. R., F. M. Mesak, and M. K. Dahl. 2004. Bacillus subtilis GlcK activity requires cysteines within a motif that discriminates microbial glucokinases into two lineages. BMC Microbiol 4:6.

Naser, S. M., P. Dawyndt, B. Hoste, D. Gevers, K. Vandemeulebroecke, I. Cleenwerck, M. Vancanneyt, and J. Swings. 2007. Identification of lactobacilli by pheS and rpoA gene sequence analyses. Int J Syst Evol Microbiol 57(Pt 12):2777-2789.

Pereira, U. P., G. F. Mian, I. C. Oliveira, L. C. Benchetrit, G. M. Costa, and H. C. Figueiredo. 2010. Genotyping of Streptococcus agalactiae strains isolated from fish, human and cattle and their virulence potential in Nile tilapia. Veterinary microbiology 140(1-2):186-192.

Vesth, T., T. M. Wassenaar, P. F. Hallin, L. Snipen, K. Lagesen, and D. W. Ussery. 2010. On the origins of a Vibrio species. Microb Ecol 59(1):1-13.

Zadoks, R. N. and Y. H. Schukken. 2006. Use of molecular epidemiology in veterinary practice. Vet Clin North Am Food Anim Pract 22(1):229-261.

Course Co-ordinator’s Welcome Message

Welcome to the Bioinformatics for Graduate Students (VPM-885) course. This course is an introduction to bioinformatics and a practical guide to the analysis of genes and proteins. It is designed to familiarize students with the tools and principles of contemporary bioinformatics.

Biological research was revolutionized by the advent of recombinant DNA technology in the early 1970s. Since then, molecular biology has been transformative, perhaps too successfully: our challenge now is to find better ways of handling, analyzing and understanding the massive data generated by genome sequencing and other emerging technologies (such as synthetic biology design, network and pathway analysis, analysis of high-throughput sequencing data, and innovative visualizations). The next transformative tools are related to bioinformatics, hence the importance of this graduate course (Bioinformatics for Graduate Students VPM-885) for current and future researchers in the life sciences.

By the end of this course, you will have a working knowledge of a variety of publicly available databases and computational tools important in bioinformatics, and a grasp of the underlying principles that will be adequate for you to evaluate and use novel techniques as they arise in the future.

The Nucleic Acids Research online Molecular Biology Database Collection currently lists 1685 online databases sorted into 15 categories and 41 subcategories (Rigden et al., 2016. The 2016 database issue of Nucleic Acids Research and an updated molecular biology database collection. Nucleic Acids Research 44, Database issue D1–D6 doi: 10.1093/nar/gkv1356).

The Bioinformatics Links Directory currently has 1548 web tools listed.

Some of the sequence analysis software that I personally use on a daily basis is available at the following links:

Best wishes.

Fred Kibenge
Course Co-ordinator.

Welcome to VPM885

This is the first offering of VPM885; welcome aboard. Before VPM885 was set up, some graduate students wanted to take CS322/BIO322 (Bioinformatics) and also receive graduate course credit. VPM885 was created to accommodate these requests.

In my understanding, graduate and undergraduate students approach this course slightly differently: both want to learn bioinformatics knowledge and skills, but graduate students have higher expectations. Graduate students hope that the course will be helpful in their thesis research; they may be more interested in some of its special topics; and their course projects may involve deeper exploration and consideration.

Based on this understanding, we have carefully tailored the components of this course. We hope every student will take an active part in course activities in the classroom, in Moodle and on this website.