Molecular systematics has been made possible by the availability of techniques for gene sequencing, which allow the determination of the exact sequence of nucleotides, or bases, in either DNA or RNA. At present it is still a long and expensive process to sequence the entire genome of an organism, and this has been done for only a few species. However, it is quite feasible to determine the sequence of a defined area of a particular chromosome. Typical molecular systematic analyses require the sequencing of around 1000 base pairs. At any position within such a sequence, the base found may vary between organisms. The particular sequence found in a given organism is referred to as its haplotype. In principle, since there are four base types, with 1000 base pairs we could have 4^1000 distinct haplotypes. However, for organisms within a particular species, or in a group of related species, it turns out as a matter of empirical fact that
In a molecular systematic analysis, the haplotypes from a substantial sample of individuals of the target species or other taxon are determined for a defined area of genetic material. Haplotypes of individuals from a comparably sized sample of closely related, but supposedly different, taxa are also determined. Finally, haplotypes from a smaller number of individuals from a definitely different taxon are determined: these are referred to as an outgroup. The base sequences for the haplotypes are then compared. In the simplest case, the difference between two haplotypes is assessed by counting the number of locations where they have different bases: this is referred to as the number of substitutions. Other kinds of differences between haplotypes can also occur, for example the insertion of a section of nucleic acid in one haplotype that is not present in another. Usually the number of substitutions is re-expressed as a percentage divergence, by dividing the number of substitutions by the number of base pairs analysed: the hope is that this measure will be independent of the location and length of the section of DNA that is sequenced.
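The substitution count and percentage divergence described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the two sequences are taken to be already aligned and of equal length, insertions and deletions are ignored, and the function names and the two 10-base haplotypes are invented for the example rather than taken from any real data set or library.

```python
def count_substitutions(a, b):
    """Count the positions at which two aligned sequences have different bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def percent_divergence(a, b):
    """Express the substitution count as a percentage of the base pairs compared."""
    if not a:
        raise ValueError("sequences must be non-empty")
    return 100.0 * count_substitutions(a, b) / len(a)


# Hypothetical haplotypes, invented purely for illustration.
hap_a = "ACGTACGTAC"
hap_b = "ACGTTCGTAA"

print(count_substitutions(hap_a, hap_b))  # 2 (positions 5 and 10 differ)
print(percent_divergence(hap_a, hap_b))   # 20.0
```

Dividing by the sequence length is what makes results from sections of different lengths comparable, at least in principle.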
An alternative approach is to determine the divergences between the genotypes of individuals by DNA-DNA hybridisation instead of by determining and comparing gene sequences. The advantage of using hybridisation rather than gene sequencing is that it is based on the entire genotype, rather than a particular section of DNA. Its disadvantage is that precise haplotypes are not determined.
Once the divergences between all pairs of samples have been determined, the resulting triangular matrix of differences is submitted to some form of statistical cluster analysis, and the resulting dendrogram is examined to see whether or not the samples cluster in the way that would be expected from current ideas about the taxonomy of the group. Any group of haplotypes that are all more similar to one another than any of them is to any other haplotype may be said to constitute a clade. Statistical significance tests are available to examine whether it is possible to reject the hypothesis that a particular group of haplotypes lies in a single clade.
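As a rough illustration of the pairwise matrix and of the clade criterion stated above, the following Python sketch builds one triangle of the divergence matrix and tests whether a chosen group satisfies that criterion. It is not a cluster analysis or a significance test: it only checks the definition directly. The sample names, the sequences, and the helper functions `divergence_matrix` and `is_clade` are hypothetical, and the sequences are assumed to be aligned and of equal length.

```python
from itertools import combinations


def percent_divergence(a, b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def divergence_matrix(haplotypes):
    """Map each unordered pair of sample names to its percentage divergence.

    One entry is stored per pair, which corresponds to one triangle of the
    full symmetric matrix.
    """
    return {
        frozenset((p, q)): percent_divergence(haplotypes[p], haplotypes[q])
        for p, q in combinations(haplotypes, 2)
    }


def is_clade(group, haplotypes, matrix):
    """True if every within-group divergence is smaller than every
    divergence between a group member and a sample outside the group."""
    group = set(group)
    outside = set(haplotypes) - group
    if len(group) < 2 or not outside:
        raise ValueError("need at least two members and one non-member")
    widest_within = max(matrix[frozenset(pair)] for pair in combinations(group, 2))
    closest_outside = min(matrix[frozenset((m, o))] for m in group for o in outside)
    return widest_within < closest_outside


# Hypothetical samples: two putative taxa plus one outgroup individual.
samples = {
    "sp1_a": "ACGTACGTAC",
    "sp1_b": "ACGTACGTAA",
    "sp2_a": "ACGAACCTAC",
    "sp2_b": "ACGAACCTAT",
    "out": "TTGAACCTGG",
}
matrix = divergence_matrix(samples)

print(is_clade({"sp1_a", "sp1_b"}, samples, matrix))  # True
print(is_clade({"sp1_a", "sp2_a"}, samples, matrix))  # False
```

In practice the matrix would be passed to a clustering routine to produce the dendrogram; the direct check here is only meant to make the definition concrete.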