The gene family analysis was performed using the following procedure:
1) All-against-all BLASTP search of the protein sequences from the four species (e-value cutoff 1e-5, maximum of 500 hits).
2) TribeMCL clustering (inflation value 5.0) to group the proteins into families (“Family”, denoted by the prefix “F”).
3) All-against-all BLASTP search within each “Family” (e-value cutoff 1e-5, maximum of 500 hits).
4) TribeMCL clustering (inflation value 5.0) to split each family into groups (“Group”, denoted by the prefix “G”). Protein sequences in a family that could not be clustered into any “Group” were aggregated into the last Group of that family.
5) The protein sequences in each “Family” and “Group” were aligned with the multiple sequence alignment (MSA) program MUSCLE using default settings.
6) Gaps and unconserved blocks with more than 20% sequence divergence were removed from each MSA based on the BLOSUM62 scoring matrix.
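7) The phylogenetic tree was built with PhyML (parameters -d aa -m LG -b -4 -a e, i.e., amino-acid data, the LG substitution model, SH-like approximate likelihood-ratio branch supports, and a maximum-likelihood estimate of the gamma shape parameter).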
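The search and clustering settings in steps 1 to 4 could be reproduced roughly as below. This is a minimal sketch, assuming the BLAST+ suite (`makeblastdb`, `blastp`) and the standalone `mcl` program are on the PATH; plain `mcl --abc` stands in for the TribeMCL wrapper, and all file names are hypothetical. The original analysis may have used different tools or versions.

```python
# Hypothetical file names; the flags are standard BLAST+ and MCL options.
def blast_commands(fasta, db="all_proteins", out="all_vs_all.tsv",
                   evalue=1e-5, max_hits=500):
    """Build argv lists for an all-against-all BLASTP search (steps 1 and 3)."""
    make_db = ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", db]
    blastp = ["blastp", "-query", fasta, "-db", db,
              "-evalue", str(evalue), "-max_target_seqs", str(max_hits),
              "-outfmt", "6", "-out", out]  # -outfmt 6 = tabular output
    return [make_db, blastp]


def mcl_command(abc_file, inflation=5.0, out="clusters.txt"):
    """Build an argv list for MCL clustering (steps 2 and 4).

    --abc reads 'label1<TAB>label2<TAB>weight' lines; -I sets the inflation value.
    """
    return ["mcl", abc_file, "--abc", "-I", str(inflation), "-o", out]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. For step 3 the same functions would be called once per family FASTA file.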
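Before MCL clustering, BLAST hits must be turned into a weighted graph. A sketch in the spirit of TribeMCL, which scores protein pairs by −log10(e-value), is shown below. The symmetrisation rule (average of both directions when present, otherwise the single direction) and the `MIN_EVALUE` floor for e-values of 0.0 are assumptions made for illustration, not details taken from the original analysis.

```python
import math
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # e-value cutoff from steps 1 and 3
MAX_HITS = 500         # per-query hit limit from steps 1 and 3
MIN_EVALUE = 1e-200    # hypothetical floor so an e-value of 0.0 still gets a finite weight


def blast_to_similarity(lines, cutoff=E_VALUE_CUTOFF, max_hits=MAX_HITS):
    """Turn BLAST -outfmt 6 lines into {(a, b): weight}, weight = -log10(e-value)."""
    # Smallest e-value per (query, subject) pair that passes the cutoff; self-hits skipped.
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query == subject or evalue > cutoff:
            continue
        if evalue < best.get((query, subject), math.inf):
            best[(query, subject)] = evalue

    # Keep at most max_hits subjects per query, ranked by e-value.
    per_query = defaultdict(list)
    for (query, subject), evalue in best.items():
        per_query[query].append((evalue, subject))
    directed = {}
    for query, hits in per_query.items():
        for evalue, subject in sorted(hits)[:max_hits]:
            directed[(query, subject)] = -math.log10(max(evalue, MIN_EVALUE))

    # Symmetrise: average both directions when both exist, else keep the one present.
    similarity = {}
    for (a, b), weight in directed.items():
        key = (min(a, b), max(a, b))
        if key not in similarity:
            reverse = directed.get((b, a))
            similarity[key] = weight if reverse is None else (weight + reverse) / 2
    return similarity


def to_abc(similarity):
    """Format edges as 'a<TAB>b<TAB>weight' lines, the input format of `mcl --abc`."""
    return "".join(f"{a}\t{b}\t{w:.3f}\n" for (a, b), w in sorted(similarity.items()))
```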
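To show what the inflation value in steps 2 and 4 controls, here is a small pure-Python sketch of the Markov Cluster (MCL) algorithm: alternate expansion (matrix squaring) and inflation (element-wise power followed by column normalisation) until the matrix stops changing. It is a toy illustration, not the optimised `mcl` program; the self-loop weighting and the 1e-6 attractor threshold are assumptions.

```python
def _normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(similarity, inflation=5.0, max_iter=100, tol=1e-9):
    """Cluster an undirected weighted graph given as {(a, b): weight}."""
    nodes = sorted({node for pair in similarity for node in pair})
    if not nodes:
        return []
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), weight in similarity.items():
        m[index[a]][index[b]] = m[index[b]][index[a]] = weight
    # Self-loops weighted by each node's strongest edge (a common preprocessing choice).
    for i in range(n):
        m[i][i] = max(m[i]) or 1.0
    m = _normalize_columns(m)

    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: two-step random walk
        inflated = _normalize_columns(
            [[x ** inflation for x in row] for row in expanded])  # inflation
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break

    # Rows that keep mass are attractors; their non-negligible columns form one cluster.
    clusters = {frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)
```

Higher inflation values sharpen the flow faster and so tend to give more, smaller clusters; 5.0 is at the fine-grained end of the range usually explored.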
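The two-level “F”/“G” labelling, including the rule that unclustered proteins go into the last Group of their family, might be implemented as follows. This sketch assumes that "could not be clustered" means a protein ended up in a singleton group or in no group at all (for example, no hit passing the cutoff), and the ID format `F1`/`G1` is hypothetical.

```python
def assign_ids(families, groups_by_family, min_group_size=2):
    """Return {protein: (family_id, group_id)}.

    families: list of protein-ID lists from the first clustering round.
    groups_by_family: parallel list; each entry holds the groups from the
        second clustering round for that family.
    """
    labels = {}
    for f_idx, (members, groups) in enumerate(zip(families, groups_by_family), start=1):
        family_id = f"F{f_idx}"  # hypothetical ID format
        real_groups = [g for g in groups if len(g) >= min_group_size]
        grouped = {p for g in real_groups for p in g}
        for g_idx, group in enumerate(real_groups, start=1):
            for protein in group:
                labels[protein] = (family_id, f"G{g_idx}")
        # Singletons and proteins absent from every group share one final Group.
        leftovers = [p for p in members if p not in grouped]
        if leftovers:
            last_id = f"G{len(real_groups) + 1}"
            for protein in leftovers:
                labels[protein] = (family_id, last_id)
    return labels
```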
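The trimming idea in step 6 can be approximated with the simplified, column-wise filter below. It is a stand-in, not the original method: it measures divergence as the fraction of residues differing from the column's most common residue instead of using BLOSUM62 scores, and it works on single columns rather than blocks. The parameter names are hypothetical.

```python
from collections import Counter


def trim_alignment(seqs, max_divergence=0.2, max_gap_fraction=0.0):
    """Drop gapped columns and columns whose identity-based divergence exceeds the limit.

    seqs: {name: aligned_sequence}, all of equal length.
    """
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for col in range(length):
        column = [seqs[n][col].upper() for n in names]
        gaps = sum(c in "-." for c in column)
        if gaps / len(column) > max_gap_fraction:
            continue  # too many gaps in this column
        residues = [c for c in column if c not in "-."]
        if not residues:
            continue
        consensus_count = Counter(residues).most_common(1)[0][1]
        divergence = 1 - consensus_count / len(residues)
        if divergence <= max_divergence:
            keep.append(col)
    return {n: "".join(seqs[n][c] for c in keep) for n in names}
```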
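PhyML reads PHYLIP-formatted alignments, so step 7 needs the trimmed MSA converted first. The sketch below writes each sequence on a single line (one block, which reads the same as sequential PHYLIP) and builds the command with the parameters listed in step 7. The executable name `phyml` is an assumption; installed binaries are sometimes named differently.

```python
def to_phylip(seqs):
    """Format {name: aligned_sequence} as PHYLIP with one line per sequence."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    nchar = lengths.pop()
    lines = [f"{len(seqs)} {nchar}"]
    for name, seq in seqs.items():
        if any(ch.isspace() for ch in name):
            raise ValueError("sequence names must not contain whitespace")
        lines.append(f"{name}  {seq}")
    return "\n".join(lines) + "\n"


def phyml_command(alignment_path, executable="phyml"):
    """argv for PhyML with the options from step 7 (amino acids, LG, SH-like supports, estimated gamma)."""
    return [executable, "-i", alignment_path,
            "-d", "aa", "-m", "LG", "-b", "-4", "-a", "e"]
```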