Westfield Students in the News for Gene Annotations

>Sample Gene from Chlamydia trachomatis
ATGACAGAGTCATATGTAAACAAAGAAGAAATCATCTCTTTAGCAAAGAA
TGCTGCATTGGAGTTGGAAGATGCCCACGTGGAAGAGTTCGTAACATCTA
TGAATGACGTCATTGCTTTAATGCAGGAAGTAATCGCGATAGATATTTCG
GATATCATTCTTGAAGCTACAGTGCATCATTTCGTTGGTCCAGAGGATCT
TAGAGAAGACATGGTGACTTCGGATTTTACTCAAGAAGAATTTTTATCTA
ACGTTCCCGTGTCGTTGGGAGGATTAGTCAAAGTCCCTACAGTTATCAAA
TAG

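Every three letters of the DNA sequence are called a codon. Each codon codes for an amino acid, except the final stop codon (TAG here), which marks the end of the gene. The amino acids are joined together to form a protein, which serves a specific purpose.

Scientists have given each amino acid a one-letter code.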

>amino acid sequence of the above Chlamydia trachomatis gene
MTESYVNKEEIISLAKNAALELEDAHVEEFVTSMNDVIALMQEVIAIDIS
DIILEATVHHFVGPEDLREDMVTSDFTQEEFLSNVPVSLGGLVKVPTVIK
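
The conversion from the DNA sequence to the amino acid sequence above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the standard genetic code; the names `CODON_TABLE`, `translate` and `SAMPLE_GENE` are made up for this example and are not part of any course tool.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # drop line breaks and spaces
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# The sample Chlamydia trachomatis gene from this page.
SAMPLE_GENE = """
ATGACAGAGTCATATGTAAACAAAGAAGAAATCATCTCTTTAGCAAAGAA
TGCTGCATTGGAGTTGGAAGATGCCCACGTGGAAGAGTTCGTAACATCTA
TGAATGACGTCATTGCTTTAATGCAGGAAGTAATCGCGATAGATATTTCG
GATATCATTCTTGAAGCTACAGTGCATCATTTCGTTGGTCCAGAGGATCT
TAGAGAAGACATGGTGACTTCGGATTTTACTCAAGAAGAATTTTTATCTA
ACGTTCCCGTGTCGTTGGGAGGATTAGTCAAAGTCCCTACAGTTATCAAA
TAG
"""

protein = translate(SAMPLE_GENE)
print(protein)
print(len(protein), "amino acids")
```

The printed sequence can be compared letter by letter with the amino acid sequence listed above.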

Convert DNA sequence to Amino Acid Sequence.
Amino acids also have chemical properties, such as polar, neutral, basic, acidic, or hydrophobic.
Polar amino acids: able to participate in hydrogen bonding. Hydrophilic. Glycine (G), Serine (S), Threonine (T), Tyrosine (Y) and Cysteine (C).
Neutral polar amino acids: the amide side chains do NOT produce basic solutions; they can act as a proton donor or proton acceptor. Hydrophilic. Asparagine (N) and Glutamine (Q).
Basic amino acids: polar; raise pH. Hydrophilic. Nitrogen-containing side chains act as proton acceptors and form positive charges. Lysine (K), Arginine (R) and Histidine (H).
Acidic amino acids: polar; lower pH. Hydrophilic. Carboxylic acid side chains act as proton donors and form negative charges. Aspartic acid (D) and Glutamic acid (E).
Hydrophobic amino acids: "water fearing"; found buried in the core of a protein. Side chains are composed mostly of carbon and hydrogen. Alanine (A), Isoleucine (I), Leucine (L), Methionine (M), Phenylalanine (F), Valine (V), Proline (P) and Tryptophan (W).
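
The property groups above can be turned into a small lookup table. The sketch below is one possible way to do this, assuming the grouping used elsewhere on this page (Glycine counted as polar, Tryptophan as hydrophobic); the names `CLASSES`, `CLASS_OF` and `classify` are hypothetical.

```python
from collections import Counter

# Groupings as used on this page (an assumption of this sketch: Glycine is
# counted as polar and Tryptophan as hydrophobic).
CLASSES = {
    "Polar": "GSTYC",
    "Neutral": "NQ",
    "Basic": "KRH",
    "Acidic": "DE",
    "Hydrophobic": "AVLIPWFM",
}
# One-letter code -> class name
CLASS_OF = {aa: name for name, letters in CLASSES.items() for aa in letters}

def classify(protein):
    """Count how many residues of a protein fall into each property class."""
    return Counter(CLASS_OF[aa] for aa in protein.upper())

counts = classify("MTESYVNKEE")  # first ten residues of the sample protein
for name, n in counts.most_common():
    print(name, n)
```

Running `classify` on the full amino acid sequence gives a quick overview of how acidic, basic, or hydrophobic the protein is overall.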

CODON Converter

ATG - Methionine (M), Start codon

ACA - Threonine (T), Polar

GAG - Glutamic acid (E), Acidic

TCA - Serine (S), Polar

TAT - Tyrosine (Y), Polar

GTA - Valine (V), Hydrophobic (water fearing)

AAC - Asparagine (N), Neutral

AAA - Lysine (K), Basic

GAA - Glutamic acid (E), Acidic

GAA - Glutamic acid (E), Acidic

ATC - Isoleucine (I), Hydrophobic

ATC - Isoleucine (I), Hydrophobic

TCT - Serine (S), Polar

TTA - Leucine (L), Hydrophobic

GCA - Alanine (A), Hydrophobic

AAG - Lysine (K), Basic

AAT - Asparagine (N), Neutral

GCT - Alanine (A), Hydrophobic

GCA - Alanine (A), Hydrophobic

TTG - Leucine (L), Hydrophobic

GAG - Glutamic acid (E), Acidic

TTG - Leucine (L), Hydrophobic

GAA - Glutamic acid (E), Acidic

GAT - Aspartic acid (D), Acidic

GCC - Alanine (A), Hydrophobic

CAC - Histidine (H), Basic

GTG - Valine (V), Hydrophobic

GAA - Glutamic acid (E), Acidic

GAG - Glutamic acid (E), Acidic

TTC - Phenylalanine (F), Hydrophobic

GTA - Valine (V), Hydrophobic

ACA - Threonine (T), Polar

TCT - Serine (S), Polar

ATG - Methionine (M), Hydrophobic

AAT - Asparagine (N), Neutral

GAC - Aspartic acid (D), Acidic

GTC - Valine (V), Hydrophobic

ATT - Isoleucine (I), Hydrophobic

GCT - Alanine (A), Hydrophobic

TTA - Leucine (L), Hydrophobic

ATG - Methionine (M), Hydrophobic

CAG - Glutamine (Q), Neutral

GAA - Glutamic acid (E), Acidic

GTA - Valine (V), Hydrophobic

ATC - Isoleucine (I), Hydrophobic

GCG - Alanine (A), Hydrophobic

ATA - Isoleucine (I), Hydrophobic

GAT - Aspartic acid (D), Acidic

ATT - Isoleucine (I), Hydrophobic

TCG - Serine (S), Polar

GAT - Aspartic acid (D), Acidic

ATC - Isoleucine (I), Hydrophobic

ATT - Isoleucine (I), Hydrophobic

CTT - Leucine (L), Hydrophobic

GAA - Glutamic acid (E), Acidic

GCT - Alanine (A), Hydrophobic

ACA - Threonine (T), Polar

GTG - Valine (V), Hydrophobic

CAT - Histidine (H), Basic

CAT - Histidine (H), Basic

TTC - Phenylalanine (F), Hydrophobic

GTT - Valine (V), Hydrophobic

GGT - Glycine (G), Polar

CCA - Proline (P), Hydrophobic

GAG - Glutamic acid (E), Acidic

GAT - Aspartic acid (D), Acidic

CTT - Leucine (L), Hydrophobic

AGA - Arginine (R), Basic

GAA - Glutamic acid (E), Acidic

GAC - Aspartic acid (D), Acidic

ATG - Methionine (M), Hydrophobic

GTG - Valine (V), Hydrophobic

ACT - Threonine (T), Polar

TCG - Serine (S), Polar

GAT - Aspartic acid (D), Acidic

TTT - Phenylalanine (F), Hydrophobic

ACT - Threonine (T), Polar

CAA - Glutamine (Q), Neutral

GAA - Glutamic acid (E), Acidic

You may try completing the rest.

GAA - _________

TTT - _________

TTA - _________

TCT - _________

AAC - _________

GTT - _________

CCC - _________

GTG - _________

TCG - _________

TTG - _________

GGA - Glycine (G), Polar

GGA - Glycine (G), Polar

TTA - Leucine (L), Hydrophobic

GTC - Valine (V), Hydrophobic

AAA - Lysine (K), Basic

GTC - Valine (V), Hydrophobic

CCT - Proline (P), Hydrophobic

ACA - Threonine (T), Polar

GTT - Valine (V), Hydrophobic

ATC - Isoleucine (I), Hydrophobic

AAA - Lysine (K), Basic

TAG - STOP codon (*)

What would this protein look like?

Source: http://www.rcsb.org/structure/3QH7

Important Information

Basic DNA YouTube Videos

What is DNA and How Does it Work?

What is a gene?

Mr. DNA from Jurassic Park

Module 1: Basic Information

The locus tag, sequence coordinates, DNA sequence, DNA sequence length, amino acid sequence, and amino acid sequence length for your gene will be added to your gene notebook.
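
The two lengths recorded in the notebook are linked by simple arithmetic: every codon is three bases long, and the final stop codon does not add an amino acid. A minimal sketch of that check is shown here, assuming a complete gene that ends in a stop codon; the function name `expected_protein_length` is hypothetical.

```python
def expected_protein_length(dna_length):
    """Amino acid count for a complete gene: one per codon, minus the stop codon."""
    if dna_length % 3 != 0:
        raise ValueError("a complete coding sequence should be a multiple of 3 bases")
    return dna_length // 3 - 1

# The sample gene on this page has six lines of 50 bases plus the final TAG.
dna_length = 6 * 50 + 3
print(dna_length)                           # 303
print(expected_protein_length(dna_length))  # 100
```

If the two lengths in a notebook entry do not agree in this way, it is worth rechecking the copied sequences.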

PDF of Instructions.

Instructional Video 1

Instructional Video 2

Module 2: Sequence-based Similarity Data Module

This module answers the question: Is the protein you are annotating similar to other known proteins? It involves pasting the sequence into web tools and learning how to interpret the results.

PDF of Instructions

BLAST Training video

BLAST

BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance.

CDD Training Video

CDD (Conserved Domain Database) is a protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins.

T-COFFEE Training Video

T-Coffee Multiple Sequence Alignment

T-Coffee is a multiple sequence alignment program.

WEBlogo Training video

WebLogo

WebLogo is a web based application designed to make the generation of sequence logos as easy and painless as possible.

Training Video Module 2 version 2

BLACK - Hydrophobic: A, V, L, I, P, W, F & M.

RED - Acidic: D & E.

BLUE - Basic: K, R & H.

GREEN - Polar: G, S, T, Y & C.

PURPLE - Neutral: Q & N.

N-Terminus - (also known as the amino-terminus, NH2-terminus, N-terminal end or amine-terminus) the start of a protein or polypeptide, referring to the free amine group (-NH2) located at the end of a polypeptide.

C-Terminus - (also known as the carboxyl-terminus, carboxy-terminus, C-terminal tail, C-terminal end, or COOH-terminus) the end of an amino acid chain (protein or polypeptide), terminated by a free carboxyl group (-COOH).

Interpretation: This WebLogo is more conserved at the C-terminus than the N-terminus.
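
"More conserved" can be made concrete: in a sequence logo, the height of each stack reflects the information content of that alignment column, measured in bits. The sketch below computes this for a made-up toy alignment, assuming a 20-letter protein alphabet and leaving out the small-sample correction that logo software typically applies; the function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter

def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, without small-sample correction."""
    residues = [r for r in column if r != "-"]  # ignore gaps
    total = len(residues)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in Counter(residues).values():
        p = count / total
        entropy -= p * math.log2(p)
    return math.log2(alphabet_size) - entropy

def conservation_profile(alignment):
    """Information content for every column of equal-length aligned sequences."""
    return [column_information(col) for col in zip(*alignment)]

# Made-up toy alignment: variable at the N-terminus, conserved at the C-terminus.
toy_alignment = ["KTLVIK", "RSAVIK", "QELVIK", "HDGVIK"]
profile = conservation_profile(toy_alignment)
for position, bits in enumerate(profile, start=1):
    print(position, round(bits, 2))
```

Columns where every sequence has the same residue reach the maximum of about 4.32 bits, while columns with many different residues score lower, which is the pattern described in the interpretation above.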

Module 3. Structure-based Evidence Module:

Is the protein you are annotating functionally similar to other known proteins?

PDF of Instructions

TIGRFAMs website

GENI-ACT - TIGRFAM Video

TIGRFAM and Pfam video

TIGRFAMs supports searches of a protein sequence against a database of hidden Markov models (HMMs) based upon protein families.

Pfam website

The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).

Protein Data Bank

pfam Video

The Vision of the PDB is to enable open access to the accumulating knowledge of 3D structure, function, and evolution of biological macromolecules, expanding the frontiers of fundamental biology, biomedicine, and biotechnology.

PDB Video

Module 4. Cellular Localization:

Is the protein you are annotating located in the cytoplasm of the cell, embedded in the cytoplasmic membrane or secreted?

PDF of Instructions

PubMed Search

Prediction of transmembrane helices in proteins

TMHMM and SignalP Video

TMHMM is a method for predicting transmembrane helices based on a hidden Markov model.

Signal Peptide Predictor

The SignalP 5.0 server predicts the presence of signal peptides and the location of their cleavage sites in proteins from Archaea, Gram-positive Bacteria, Gram-negative Bacteria and Eukarya.

Predict lipoprotein signal peptides in Gram-negative Eubacteria

Predict whether your protein is found within the cytoplasm, an integral membrane protein or a secreted protein.

Phobius: A combined transmembrane topology and signal peptide predictor

Phobius website

Phobius Video

PSORTb: described by its developers as the most precise bacterial localization prediction tool available.

PSORTb website

PSORTb Video

Module 6: Enzymatic Function.

Is the protein you are annotating an enzyme? If so, what is its function?

PDF of Instructions.

KEGG pathway database

KEGG PATHWAY is a collection of manually drawn pathway maps representing our knowledge on the molecular interaction, reaction and relation networks for:

1. Metabolism
2. Genetic Information Processing
3. Environmental Information Processing
4. Cellular Processes
5. Organismal Systems
6. Human Diseases
7. Drug Development

MetaCyc pathway

MetaCyc is a curated database of experimentally elucidated metabolic pathways from all domains of life.

ExPASy Enzyme

ENZYME is a repository of information relative to the nomenclature of enzymes.

Module 8: Horizontal Gene Transfer

Did the bacteria get the gene from another organism?

BLAST

T-Coffee website

NCBI Taxonomy

France Phylogenetic Tree site

IMG/JGI site

Module 5: Alternative Open Reading Frame

Did the gene caller call the start codon correctly?

If not, what is the correct start codon?
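
One small part of this question can be explored by listing every possible start codon in the same reading frame as the called gene. The sketch below is only an illustration, not the method used by any gene caller; it assumes the three start codons most commonly used by bacteria (ATG, GTG and TTG), and the function name `in_frame_starts` and the toy sequence are made up.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # start codons commonly used by bacteria
STOP_CODONS = {"TAA", "TAG", "TGA"}

def in_frame_starts(dna):
    """Return (position, codon) for each possible start codon in reading frame 0,
    scanning codon by codon until the first stop codon. Positions are 0-based."""
    dna = dna.upper()
    hits = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            hits.append((i, codon))
    return hits

# Made-up toy sequence, not a real gene.
print(in_frame_starts("ATGAAAGTGCCCTTGTAG"))
# [(0, 'ATG'), (6, 'GTG'), (12, 'TTG')]
```

Each candidate would still need supporting evidence, such as a nearby ribosomal binding site or agreement with similar proteins, before it is preferred over the called start.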

PDF of Instructions

JGI IMG HOME

Vocabulary:

Shine-Dalgarno sequence - (5′-AGGAGGU-3′) a ribosomal binding site in bacterial messenger RNA, generally located around 8 bases upstream of the start codon.
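
A rough way to look for a Shine-Dalgarno-like site is to scan the DNA just upstream of a candidate start codon for the core motif. The sketch below makes several assumptions: it searches the DNA form of the core motif (AGGAGG), allows one mismatch by default, and uses a made-up upstream sequence; the function name `find_shine_dalgarno` is hypothetical.

```python
def find_shine_dalgarno(upstream, motif="AGGAGG", max_mismatches=1):
    """Search the DNA just upstream of a start codon for a Shine-Dalgarno-like motif.

    `upstream` is the sequence immediately 5' of the start codon (its last base
    sits right next to the start codon). Returns a list of
    (distance_to_start, matched_text, mismatches), fewest mismatches first.
    """
    upstream = upstream.upper()
    hits = []
    for i in range(len(upstream) - len(motif) + 1):
        window = upstream[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            # number of bases between the end of the motif and the start codon
            distance = len(upstream) - (i + len(motif))
            hits.append((distance, window, mismatches))
    return sorted(hits, key=lambda hit: hit[2])

# Made-up upstream region (19 bases), not taken from a real genome.
print(find_shine_dalgarno("TTTAAAGGAGGTACAATCT"))
# [(8, 'AGGAGG', 0)]
```

A hit at a distance of roughly 8 bases supports the candidate start codon; no hit, or a hit much closer or farther away, is weaker evidence.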

Selection of Gene Annotation Research Posters by Westfield Students

Geni-ACT.org

Guiding Education through Novel Investigation-Academic Collaboration Toolkit

Funded by an Education Impact Mini Grant from the United Way of Northern Chautauqua County