Volume 45, Number 6, December 2012

Characterization of subtypes of the influenza A hemagglutinin (HA) gene using profile hidden Markov models

Yu-Nong Gong, Guang-Wu Chen, Shin-Ru Shih

Received: July 29, 2011    Revised: September 27, 2011    Accepted: October 16, 2011   


Corresponding author:
  • Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan
  • Research Center for Emerging Viral Infections, Chang Gung University, Taoyuan, Taiwan
  • These authors contributed equally to this work.
  • Corresponding author. Department of Computer Science and Information Engineering, Chang Gung University, 259 Wen-Hua 1st Road, Kwei-Shan, Taoyuan 333, Taiwan.


Background and purpose: 

The influenza A virus has evolved into 16 hemagglutinin (HA) subtypes with different antigenic properties. Thus far, subtyping has been primarily assay-based, but the many sequences available from the US National Center for Biotechnology Information (NCBI) offer alternative ways of characterizing the HA gene.



Methods:

All available HA sequences from the NCBI were analyzed. The software package HMMER was used to score how well a training sequence fitted a profile hidden Markov model (profile HMM) constructed from the consensus sequence of one particular HA subtype, Hx, where x = 1 to 16. Scores from sequences of the same subtype and from other subtypes were then compared to see whether they were separable. This approach was implemented in a stepwise manner, using a sliding window of 100 amino acids with 10-amino-acid increments to build many subtype-specific models, and then assessing which 100-amino-acid segments yielded the desired differentiability.
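
The windowing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 566-residue example length, and the separability rule (every same-subtype score exceeds every other-subtype score) are assumptions made for the example. The abstract does not state which statistical criterion was used, and the scores shown are made-up numbers, not HMMER output.

```python
from typing import Iterator, List, Tuple


def sliding_windows(seq_len: int, width: int = 100,
                    step: int = 10) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) windows of `width` residues.

    Windows advance by `step` residues; a trailing stretch shorter than
    `width` is not covered.
    """
    for start in range(0, seq_len - width + 1, step):
        yield start, start + width


def is_separable(same_scores: List[float], other_scores: List[float]) -> bool:
    """Assumed criterion: the two score sets do not overlap at all."""
    return min(same_scores) > max(other_scores)


# Hypothetical alignment length, chosen only for illustration.
windows = list(sliding_windows(566))
print(len(windows), windows[0], windows[-1])

# Made-up scores for one window: same-subtype vs. other-subtype sequences.
print(is_separable([310.2, 295.7], [120.4, 88.1]))
```

In such a sketch, each window would be used to cut the corresponding segment out of a subtype's sequences before a segment-specific profile HMM is built and scored. A stricter or more statistical separability test could be substituted for `is_separable` without changing the windowing logic.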



Results:

Segment-based analysis revealed domains that correlate with HA sequence heterogeneity between subtypes. For example, we showed that H1 segments covering only the second half of HA are not statistically separable from H2, H5 and H6 within the same region, suggesting evolutionary relatedness among these subtypes. The HA1 domain was found to be mostly differentiable between subtypes, which is in line with wet-lab findings that the domain is antigenicity-rich. We also report a couple of regions that can be conveniently used to characterize all HA subtypes.




Conclusion:

We established an analysis framework for assessing sequence-subtype association, which provides insights into HA subtypes with close evolutionary relationships.


Key words:

HA subtype; Influenza A virus; Profile hidden Markov model; Sequence analysis