Python for Bioinformatics

Python for Bioinformatics is a book by Sebastian Bassi, who leads the DNALinux distribution and is a member of the BioPython development team. This post collects some notes from the book; the book itself covers much more.

• Some scientific Python distributions include Python(x,y) and the Enthought Python Distribution.

• String methods include replace, count, find, index, and split.
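
As a quick illustration of these string methods, here is a minimal sketch. The DNA fragment and the header string are made up for this example and do not come from the book.

```python
# Hypothetical DNA fragment and header, used only for illustration.
dna = "ATGCGATACGCTTGA"

rna = dna.replace("T", "U")    # swap every T for U
g_count = dna.count("G")       # how many times "G" occurs
first_cg = dna.find("CG")      # index of the first "CG", or -1 if absent
first_ta = dna.index("TA")     # like find, but raises ValueError if absent
fields = "sp|P12345|NAME".split("|")  # split a header on a separator

print(rna, g_count, first_cg, first_ta, fields)
```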

• You can add elements into a list by using append, insert, or extend.

• You can remove elements from a list by using pop, remove, or del.
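
A short sketch of the list operations named in the two notes above, using a made-up list of bases:

```python
bases = ["A", "C"]

# Adding elements
bases.append("G")           # add one item at the end
bases.insert(0, "T")        # add one item at a given position
bases.extend(["N", "N"])    # add every item from another sequence

# Removing elements
last = bases.pop()          # remove and return the last item
bases.remove("N")           # remove the first item equal to "N"
del bases[0]                # remove the item at index 0

print(bases, last)
```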

• Operations shared by lists, tuples, and strings include indexing, slicing, the in keyword, concatenation, len/max/min, and conversion with list().
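
These shared operations can be sketched on a string, a tuple, and a list. The values are invented for illustration.

```python
seq = "ATGGCC"              # string
codons = ("ATG", "GCC")     # tuple
quality = [30, 12, 40]      # list

first_base = seq[0]             # indexing
middle = seq[1:4]               # slicing
has_gcc = "GCC" in codons       # membership test with the in keyword
extended = seq + "TAA"          # concatenation
n_scores = len(quality)         # number of items
best, worst = max(quality), min(quality)
as_list = list(seq)             # convert any sequence to a list

print(first_base, middle, has_gcc, extended, n_scores, best, worst, as_list)
```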

• You can list a dictionary's keys and values with its keys() and values() methods.
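
For example, with a small codon-to-amino-acid dictionary (a three-entry excerpt chosen for this sketch, not a full table):

```python
# Three-entry excerpt of the standard genetic code; "*" marks a stop codon.
codon_table = {"ATG": "M", "TGG": "W", "TAA": "*"}

codons = list(codon_table.keys())      # the dictionary's keys
aminos = list(codon_table.values())    # the dictionary's values

print(codons, aminos)
```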

• Sets are created with the set() function.

• The intersection() method returns the elements common to two sets.
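
A minimal sketch of creating sets and intersecting them. The gene lists are hypothetical sample data.

```python
# Hypothetical gene lists from two samples.
sample_a = set(["BRCA1", "TP53", "EGFR"])
sample_b = set(["TP53", "KRAS", "EGFR"])

shared = sample_a.intersection(sample_b)   # genes present in both sets

print(sorted(shared))
```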

• Flow control structures: if-else, for loop, while loop

• The break statement exits the enclosing loop immediately.
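
The following sketch combines a for loop, if-else, and break to find the first in-frame stop codon. The sequence is made up, and the approach is only one possible way to write this.

```python
seq = "ATGAAATGACCC"                  # hypothetical sequence
stop_codons = {"TAA", "TAG", "TGA"}   # standard stop codons

stop_position = None
# Walk the sequence one codon (three bases) at a time.
for i in range(0, len(seq) - 2, 3):
    codon = seq[i:i + 3]
    if codon in stop_codons:
        stop_position = i
        break                         # no need to look any further

if stop_position is None:
    print("No stop codon found")
else:
    print("First stop codon starts at index", stop_position)
```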

• Some examples of programs using control structures are estimating the net charge of a protein and searching for a low degeneration zone.
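
To give a flavor of the net charge example, here is a deliberately crude sketch. It is not necessarily the book's method: it assumes neutral pH, counts lysine and arginine as +1 and aspartate and glutamate as -1, and ignores histidine, the termini, and pKa values. The function name net_charge is hypothetical.

```python
def net_charge(protein):
    """Rough net charge: +1 for K and R, -1 for D and E (simplifying assumption)."""
    charges = {"K": 1, "R": 1, "D": -1, "E": -1}
    total = 0
    for residue in protein.upper():
        # Residues not in the table contribute no charge in this sketch.
        total += charges.get(residue, 0)
    return total

print(net_charge("MKRLDEK"))   # made-up peptide
```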

• Reading files is a three-step process:

1. Open the file.

2. Read the file.

3. Close the file.
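
The three steps above can be sketched as follows. The file name and contents are invented, and the file is written to a temporary directory first so the example is self-contained. The with statement at the end is a common alternative that closes the file automatically.

```python
import os
import tempfile

# Create a small example file to read (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "example.fasta")
out = open(path, "w")
out.write(">seq1\nATGC\n")
out.close()

handle = open(path)        # 1. Open the file.
content = handle.read()    # 2. Read the file.
handle.close()             # 3. Close the file.

# Alternative: the with statement closes the file when the block ends.
with open(path) as handle:
    lines = handle.readlines()

print(content)
```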

• The os module provides functions for file system operations.

• Examples of programs that involve working with files include consolidating multiple DNA or protein sequences in one FASTA file, as well as estimating net charge of several proteins.
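
As a sketch of the consolidation example, the function below uses the os module to find FASTA files in a directory and writes them into one file. The function name, the file extensions, and the demo data are assumptions made for illustration, not the book's code.

```python
import os
import tempfile

def consolidate_fasta(input_dir, output_path):
    """Concatenate every .fasta/.fa file in input_dir into output_path."""
    names = sorted(
        name for name in os.listdir(input_dir)
        if name.endswith((".fasta", ".fa"))
    )
    with open(output_path, "w") as out:
        for name in names:
            with open(os.path.join(input_dir, name)) as handle:
                text = handle.read()
            # Make sure each record ends with a newline before the next header.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return names

# Demo with two made-up files in a temporary directory.
in_dir = tempfile.mkdtemp()
with open(os.path.join(in_dir, "a.fasta"), "w") as f:
    f.write(">a\nAT")
with open(os.path.join(in_dir, "b.fa"), "w") as f:
    f.write(">b\nGC\n")

merged_path = os.path.join(tempfile.mkdtemp(), "all.fasta")
merged_names = consolidate_fasta(in_dir, merged_path)
with open(merged_path) as f:
    merged = f.read()
print(merged)
```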

• An example of object-oriented bioinformatics programming in Python is a class named Plasmid that inherits methods and properties from a class named Sequence and adds members such as ABres (which reports whether the plasmid is resistant to a particular antibiotic) and AbResDict (which holds information about antibiotic-resistance regions).
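
A minimal sketch of that class design follows. Only the names Sequence, Plasmid, ABres, and AbResDict come from the notes above; the method bodies, the length method, and the marker sequences are placeholders invented for this example.

```python
class Sequence:
    """Base class holding a nucleotide sequence."""

    def __init__(self, seq):
        self.seq = seq.upper()

    def length(self):
        return len(self.seq)


class Plasmid(Sequence):
    """A Sequence that also knows about antibiotic resistance."""

    # Placeholder marker sequences, not real resistance genes.
    AbResDict = {"Tet": "CTAGCAT", "Amp": "CACTACTG"}

    def ABres(self, antibiotic):
        """Return True if the marker for this antibiotic occurs in the plasmid."""
        return self.AbResDict[antibiotic] in self.seq


plasmid = Plasmid("aaCACTACTGtt")
print(plasmid.length(), plasmid.ABres("Amp"), plasmid.ABres("Tet"))
```

Plasmid reuses the constructor and length method from Sequence, which is the inheritance the note describes.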

• The re module provides support for regular expressions.
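
For instance, the re module can locate every occurrence of a restriction site. The sketch below searches a made-up sequence for GAATTC, the EcoRI recognition site.

```python
import re

seq = "AAGAATTCTTGAATTCGG"          # hypothetical sequence
pattern = re.compile("GAATTC")      # EcoRI recognition site

# finditer yields one match object per non-overlapping occurrence.
positions = [match.start() for match in pattern.finditer(seq)]

print(positions)
```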

• Some key modules in BioPython include:

1. Alphabet: works with alphabets such as DNA and amino acids

2. Seq: composed of the sequence and an alphabet

3. MutableSeq: a mutable version of Seq, for sequences that need to be modified in place (Seq objects themselves are not mutable)

4. SeqRecord: stores metadata about a sequence

5. Align: deals with sequence alignments

6. AlignInfo: used for extracting information from alignment objects

7. ClustalW: has classes and functions for interacting with ClustalW

8. SeqIO: interface to input and output sequence file formats

9. AlignIO: input/output interface for alignments

10. BLAST: for sequence similarity search

11. Data: works with built-in biological data

12. Entrez: integrates with NCBI databases

13. PDB: works with information on 3D molecular structures

14. PROSITE: works with protein database information

15. Restriction: for working with restriction enzymes

16. SeqUtils: for working with DNA and protein sequences

17. Sequencing: works with sequence data

18. SwissProt: works with protein sequence database information

• Some examples of Python bioinformatics applications included in the book (with source code) are:

Sequence manipulation in batch

Web application for filtering vector contamination

Searching for PCR primers using Primer3

Calculating melting temperature from a set of primers

Filtering out specific fields from a Genbank file

Converting XML BLAST file into HTML

Inferring splicing sites

DNA mutations with restrictions

Web server for multiple alignment

Drawing marker positions using data stored in a database
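
To illustrate the melting temperature application from the list above, here is a sketch based on the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, a rough estimate for short primers. The book may use a different formula, and the function name and primers are hypothetical.

```python
def wallace_tm(primer):
    """Estimate melting temperature (deg C) with the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

# Hypothetical primer set.
primers = ["ATGCATGCATGCATGC", "GGGCCCAAATTTGGG"]
tms = {primer: wallace_tm(primer) for primer in primers}

for primer, tm in tms.items():
    print(primer, tm)
```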
