Lucia Pettinato - INFN Firenze # Statistical analysis of promoter sequences #
Promoters are DNA sequences located upstream of each gene. Through various biochemical mechanisms, they regulate the transcription rate of the corresponding gene, directing when and in which tissues the information stored in the gene must (or must not) be used.
In this work, I analyze a sample of Homo sapiens promoters to search for statistically significant properties of promoter sequences. A clustering algorithm developed specifically for this purpose identifies classes of promoters with similar base-composition statistics. The resulting classes are further characterized with an algorithm that detects regular regions (i.e. regions that are periodic or homogeneous in composition); such regions are known to play an important role in determining promoter function.
Combining these methods makes it possible to identify and characterize three classes of promoters in Homo sapiens; it also provides crucial clues for understanding the pervasive presence of transposons (DNA sequences capable of moving from one location in the genome to another) in one of these classes. Repeating the analysis for other species allows a comparison aimed at finding evolutionary trends in promoter structure.
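To make the two ingredients mentioned above more concrete, here is a minimal Python sketch of (a) a base-composition summary of a sequence and (b) a simple detector of regular regions. The abstract does not specify the algorithms actually used, so this is only an illustration under stated assumptions: the function names `base_composition` and `periodic_regions`, the exact-match periodicity criterion, and the `min_len` threshold are all hypothetical choices, not the method of this work.

```python
from collections import Counter


def base_composition(seq):
    """Return the fraction of A, C, G and T in seq (other symbols are ignored).

    Hypothetical helper: a feature vector like this could feed a clustering step.
    """
    seq = seq.upper()
    counts = Counter(b for b in seq if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {b: 0.0 for b in "ACGT"}
    return {b: counts[b] / total for b in "ACGT"}


def periodic_regions(seq, period, min_len):
    """Find maximal regions in which seq[i] == seq[i + period] holds exactly.

    period=1 detects homogeneous (single-base) runs; larger periods detect
    exact tandem repeats. Returns half-open (start, end) index pairs whose
    length is at least min_len. Assumption: exact matching, no mismatches.
    """
    seq = seq.upper()
    regions = []
    run_start = None
    for i in range(len(seq) - period):
        if seq[i] == seq[i + period]:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The last matching pair was (i - 1, i - 1 + period),
            # so the region ends (exclusive) at i + period.
            end = i + period
            if end - run_start >= min_len:
                regions.append((run_start, end))
            run_start = None
    if run_start is not None and len(seq) - run_start >= min_len:
        regions.append((run_start, len(seq)))
    return regions


if __name__ == "__main__":
    print(base_composition("AACG"))
    print(periodic_regions("ACGTAAAAAACG", period=1, min_len=5))
    print(periodic_regions("GGCACACACATT", period=2, min_len=6))
```

A real analysis would likely need to tolerate mismatches and assess statistical significance against a background model; the exact-match rule here is chosen only to keep the sketch short.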