Monday, July 20, 2020

HUMAN GENOME PROJECT


The genetic makeup of an organism or an individual lies in its DNA sequences. If two individuals differ, then their DNA sequences should also differ, at least at some places. These assumptions led to the quest to find the complete DNA sequence of the human genome. Once genetic engineering techniques made it possible to isolate and clone any piece of DNA, and simple, fast techniques for determining DNA sequences became available, a very ambitious project to sequence the human genome was launched in 1990.

The Human Genome Project (HGP) was called a mega project. You can imagine the magnitude and requirements of the project from the following estimates:

The human genome is said to have approximately 3 x 10^9 bp. If the cost of sequencing is US $3 per bp (the estimated cost at the outset), the total estimated cost of the project would be approximately 9 billion US dollars. Further, if the obtained sequences were stored in typed form in books, with each page containing 1000 letters and each book containing 1000 pages, then about 3000 such books would be required to store the DNA sequence information from a single human cell (the often-quoted figure of 3300 books corresponds to a genome of about 3.3 x 10^9 bp). HGP was closely associated with the rapid development of a new area in biology called bioinformatics.
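The arithmetic in this paragraph can be checked with a short Python sketch. The constant and function names are illustrative only, and the 3.3 x 10^9 bp value in the last line is an assumption used to reproduce the 3300-book figure.

```python
# Back-of-the-envelope figures from the paragraph above.
GENOME_BP = 3 * 10**9      # approximate size of the human genome in base pairs
COST_PER_BP_USD = 3        # early estimated sequencing cost per base pair
LETTERS_PER_PAGE = 1000
PAGES_PER_BOOK = 1000


def total_cost_usd(genome_bp=GENOME_BP, cost_per_bp=COST_PER_BP_USD):
    """Total sequencing cost in US dollars at a flat price per base pair."""
    return genome_bp * cost_per_bp


def books_needed(genome_bp=GENOME_BP):
    """Number of 1000-page books (1000 letters per page) needed, rounded up."""
    letters_per_book = LETTERS_PER_PAGE * PAGES_PER_BOOK
    return -(-genome_bp // letters_per_book)  # ceiling division


print(total_cost_usd())             # 9000000000 (9 billion US dollars)
print(books_needed())               # 3000
print(books_needed(3_300_000_000))  # 3300 (assumed 3.3 x 10^9 bp genome)
```

With exactly 3 x 10^9 bp the book count comes to 3000; the figure of 3300 follows only if the genome is taken as roughly 3.3 x 10^9 bp.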

Goals of HGP 

Some of the important goals of HGP are as follows :

1. Identify all the genes in human DNA.

2. Determine the sequences of the 3 billion chemical base pairs that make up human DNA.

3. Store this information in databases.

4. Improve tools for data analysis.

5. Transfer related technologies to other sectors, such as industries.

6. Address the ethical, legal and social issues (ELSI) that may arise from the project.

The project was completed in 2003. Knowledge about the effects of DNA variations among individuals can lead to revolutionary new ways to diagnose, treat and someday prevent the thousands of disorders that affect human beings. Besides providing clues to understanding human biology, learning about the DNA sequences of non-human organisms can lead to an understanding of their natural capabilities, which can be applied toward solving challenges in health care, agriculture, energy production and environmental remediation. Many non-human model organisms, such as bacteria, yeast, Caenorhabditis elegans (a free-living non-pathogenic nematode), Drosophila (the fruit fly) and plants (rice and Arabidopsis), have also been sequenced.

Methodologies: The methods involved two major approaches. (1) Expressed Sequence Tags (ESTs): identifying all the genes that are expressed as RNA. (2) Sequence Annotation: the blind approach of simply sequencing the whole genome, including all coding and non-coding sequences, and later assigning functions to different regions of the sequence. For sequencing, the total DNA from a cell is isolated and converted into random fragments of relatively smaller sizes (recall that DNA is a very long polymer, and there are technical limitations in sequencing very long pieces of DNA), which are then cloned in a suitable host using specialised vectors.

The cloning resulted in amplification of each DNA fragment so that it could subsequently be sequenced with ease. The commonly used hosts were bacteria and yeast, and the vectors were called BAC (bacterial artificial chromosomes) and YAC (yeast artificial chromosomes).

The fragments were sequenced using automated DNA sequencers that worked on the principle of a method developed by Frederick Sanger (remember, Sanger is also credited with developing the method for determining amino acid sequences in proteins). These sequences were then arranged based on overlapping regions present in them, which required generating overlapping fragments for sequencing. Aligning these sequences by hand was not humanly possible, so specialised computer programs were developed. The sequences were subsequently annotated and assigned to each chromosome. The sequence of chromosome 1 was completed only in May 2006 (this was the last of the 24 human chromosomes, that is, 22 autosomes plus X and Y, to be sequenced). Another challenging task was assigning the genetic and physical maps on the genome. These maps were generated using information on polymorphism of restriction endonuclease recognition sites and on some repetitive DNA sequences known as microsatellites.
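The idea of arranging fragments by their overlapping regions can be illustrated with a toy example. The sketch below is a minimal greedy merge written for this post, not the software actually used in the HGP; the function names, the `min_len` threshold and the sample fragments are all hypothetical, and real assemblers also have to deal with sequencing errors, repeats and both DNA strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len characters exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three made-up overlapping fragments of one short sequence.
reads = ["ATGCGT", "CGTTAC", "TACGGA"]
print(greedy_assemble(reads))  # ['ATGCGTTACGGA']
```

Even this small example compares every fragment with every other fragment at each step, which hints at why aligning millions of fragments could not be done by hand.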

Salient Features of Human Genome

Some of the salient observations drawn from the Human Genome Project are as follows:

1. The human genome contains 3164.7 million nucleotide bases.

2. The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.

3. The total number of genes is estimated at 30,000, much lower than previous estimates of 80,000 to 140,000 genes.

4. Almost all (99.9 per cent) nucleotide bases are exactly the same in all people.

5. The functions are unknown for over 50 percent of discovered genes.

6. Less than 2 percent of the genome codes for proteins.

7. Repeated sequences make up a very large portion of the human genome.

8. Repetitive sequences are stretches of DNA that are repeated many times, sometimes hundreds to thousands of times. They are thought to have no direct coding functions, but they shed light on chromosome structure, dynamics and evolution.

9. Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).

10. Scientists have identified about 1.4 million locations where single-base DNA differences (SNPs, single nucleotide polymorphisms, pronounced 'snips') occur in humans. This information promises to revolutionise the processes of finding chromosomal locations for disease-associated sequences and tracing human history.
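The notions of single-base differences and percentage identity from the list above can be made concrete with a small sketch. It assumes two already aligned sequences of equal length; the function names and the two short sequences are made up for illustration and are not real human data.

```python
def snp_positions(seq_a, seq_b):
    """Return 0-based positions where two aligned sequences differ by one base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]


def percent_identity(seq_a, seq_b):
    """Percentage of positions at which the two aligned sequences agree."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = len(snp_positions(seq_a, seq_b))
    return 100.0 * (len(seq_a) - differences) / len(seq_a)


person_1 = "ATGCCGTA"  # hypothetical sequence
person_2 = "ATGACGTA"  # same region with one base changed

print(snp_positions(person_1, person_2))     # [3]
print(percent_identity(person_1, person_2))  # 87.5
```

At genome scale, 99.9 per cent identity over about 3 x 10^9 bases still leaves room for differences at millions of positions, which is consistent with the 1.4 million SNP locations mentioned above.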


(a) The first prokaryote whose complete genome was sequenced is Haemophilus influenzae.

(b) The first eukaryote whose complete genome was sequenced is Saccharomyces cerevisiae (yeast).

(c) The first plant whose complete genome was sequenced is Arabidopsis thaliana (a small mustard plant).

(d) The first animal whose complete genome was sequenced is Caenorhabditis elegans (a nematode).

The β-globin and insulin genes are each less than 10 kilobase pairs long. The T.D.F. gene is often cited as the smallest gene (14 base pairs), and the Duchenne muscular dystrophy (dystrophin) gene, at 2400 kilobase pairs, is the longest.




Thank you....🤞
