Skip to content

BIOINFORMATICS

METADATA

The teacher is not a biologist, but a computer scientist.

ProfessorSimeoni Marta
Rate9/10

EXAM

Written and oral (if you pass the written part) or 2 partials.

  • Written part (core of the exam): Open questions and exercises.
  • (15 minutes, 3 points, i.e. 1 week of work) oral part or assignment to present in class (group in 2/3 people)
    • depends on the teacher (probably it will be an assingment), we will know this a mid course (november)

In short: Or:

  • Written part -> oral
  • Presentation + Written part
  • written part + presentation

Materials

There is no a textbook to buy.

  • Introduction to algorithms (Cormen)
  • Algorithms on strings (Gusfield)
  • Biologicalsequence analysis, probabilistic models and nucleic acids (Durbin)
  • Bioinformatics-sequence and genome analysis (Mount)

Intro

Important to remember

  • Bioinformatics applications generally have to deal with:
    • enormous amount of data
    • data which contains redunadncy and errors
    • imcomplete information
    • rules with many exceptions
    • heuristic solutions
  • It is nevessary to have a deep knowledge of the biological problem to be solved.
  • It is necessary to work together with biologists, statisticians, ...
  • It is a difficult process beacause we don't speak the same language.

Bioinformatics syllabus

  • Introduction to molecualr biolagy
    • DNA, RNA, Proteins
  • Exact sequence search
    • pattern preprocessing (FA algorithm, KMP algorithm)
    • text preprocessing (suffix trees, suffix array)
  • Approximate sequence search
    • Pair alignment (heuristics, scores, exact methods for global and local alignment)
    • Multiple alignment
  • Genome sequencing
    • Techlogies and techniques (encode some partes of molecules in strings)
  • Protein structure determination
    • introduction
  • Phylogenetic trees
    • characteristics and inference methods (compare genes of different species)
  • Sequence modelling
    • HMM for modelling biological data
  • Systems biology
    • general introduction and examples

Biological background

Outline

  • DNA and genome
  • Proteins

Metaphor: DNA as software

  • Software -> a component specialized to contain data and commands to interpret then (DNA)
  • Hardware -> a component able to interpret it and to traslate it iinto actions (cell).

DNA

DNA (Deoxyribose Nucleic Acid) is the nucleic acid which encodes the genetic program of both prokaryotes (bacteria and archea) and eukaryotes (animales and plant)

Trasforming principle

Experiment:

  • non patogenous bacteria -> mouse -> alive
  • patogenous bacteria -> mouse -> dead
  • killed patogenous bacteria with heat -> mouse -> alive
  • killed patogenous bacteria with heat + non patogenous bacteria -> mouse -> dead

This experiment showed that the transforming principle is the DNA. Previuosly they believed that proteins were responsible for carrying the genetic information.

Dna is a long polymer made from repeating units called nucleotides or bases 4 (loong chain of billion bases):

DNA/RNA bases:

  • A: Adenine
  • C: Cytosine
  • G: Guanine
  • T: Thymine (only in DNA)
  • U: Uracil (only in RNA)

The DNA is a bounded by two strands of nucleotides which form a double helix. The bases are in the middle of the helix and they are bounded by hydrogen bonds. The hydrogen can be broken easily. Strands: are antiparallel. They are made by a sugar-phosphate backbone.

Pairs: A-T C-G

This means if we know one helix we can know the other one.

DNA replication

DNA polymerase is the enzyme which is responsible for the replication of DNA. It is able to read a DNA strand and to create a new one which is complementary to the original one.