Python for Bioinformatics is a book by Sebastian Bassi, who leads the DNALinux distribution and is a member of the Biopython development team. This post collects some notes from the book; the book itself covers much more.
• Scientific Python distributions include Python(x,y) and the Enthought Python Distribution.
• String methods include replace, count, find, index, and split.
• You can add elements into a list by using append, insert, or extend.
• You can remove elements from a list by using pop, remove, or del.
• Operations common to lists, tuples, and strings include indexing, slicing, the in keyword, concatenation, len/max/min, and conversion with list().
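The string and list operations above can be sketched as follows. The DNA fragment and variable names are made up for illustration and are not taken from the book.

```python
# Hypothetical DNA fragment used only for illustration.
dna = "ATGCGTATGA"

# String methods. Strings are immutable, so each call returns a new value.
rna = dna.replace("T", "U")      # "AUGCGUAUGA"
atg_count = dna.count("ATG")     # number of non-overlapping matches
first_atg = dna.find("ATG")      # index of the first match, or -1 if absent
# index() behaves like find() but raises ValueError when nothing matches.
codons = "ATG-CGT-ATG".split("-")

# Adding elements to a list.
bases = ["A", "C"]
bases.append("G")                # add one element at the end
bases.insert(0, "T")             # add one element at a given position
bases.extend(["U", "N"])         # add every element of another iterable

# Removing elements from a list.
last = bases.pop()               # remove and return the last element
bases.remove("U")                # remove the first element equal to "U"
del bases[0]                     # remove the element at index 0

# Operations shared by lists, tuples, and strings.
print(dna[0], dna[-1])           # indexing
print(dna[0:3])                  # slicing
print("CGT" in dna)              # membership test
print(len(dna), max(dna), min(dna))
print(list(dna[:3]) + bases)     # conversion with list(), then concatenation
```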
• You can work with dictionaries by using the keys() and values() methods.
• Sets are created with the set() constructor.
• The intersection() method (or the & operator) returns the elements common to two sets.
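A short sketch of the dictionary and set operations mentioned above. The lookup table and residue strings are hypothetical examples. The items() method is not listed in these notes but is the usual companion to keys() and values().

```python
# Hypothetical lookup table: one-letter amino acid code -> name.
names = {"A": "alanine", "C": "cysteine", "D": "aspartate"}

print(sorted(names.keys()))       # the keys
print(sorted(names.values()))     # the values
for code, name in names.items():  # key/value pairs together
    print(code, name)

# set() builds a set from any iterable; duplicates collapse to one element.
seq1_residues = set("ACDA")
seq2_residues = set("CDE")

# Two equivalent ways to get the elements present in both sets.
common = seq1_residues.intersection(seq2_residues)
same = seq1_residues & seq2_residues
print(sorted(common))
```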
• Flow control structures: if-else, for loop, while loop
• The break statement exits the innermost enclosing loop.
• Some examples of programs using control structures are estimating the net charge of a protein and searching for a low degeneration zone.
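As a rough illustration of these control structures, here is a minimal sketch. It is not the book's code. The charge model is a deliberate simplification (assumed: +1 for each K or R, -1 for each D or E), and the stop-codon search is a hypothetical example added only to show while and break.

```python
def net_charge(protein):
    """Rough net charge: +1 per K or R, -1 per D or E (a simplification)."""
    charge = 0
    for residue in protein.upper():      # for loop over each residue
        if residue in "KR":              # if / elif branching
            charge += 1
        elif residue in "DE":
            charge -= 1
    return charge


def first_stop_codon(dna):
    """Return the index of the first in-frame stop codon, or -1 if none."""
    stops = ("TAA", "TAG", "TGA")
    position = 0
    found = -1
    while position + 3 <= len(dna):      # while loop over whole codons
        if dna[position:position + 3] in stops:
            found = position
            break                        # leave the loop at the first hit
        position += 3
    return found


print(net_charge("MKKD"))
print(first_stop_codon("ATGAAATAGCCC"))
```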
• Reading files is a three-step process:
1. Open the file.
2. Read the file.
3. Close the file.
• The os module is involved in file system operations.
• Examples of programs that work with files include consolidating multiple DNA or protein sequences into one FASTA file and estimating the net charge of several proteins.
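The open/read/close steps and the FASTA consolidation idea might look like the sketch below. The function names, the ".fasta" extension filter, and the directory layout are assumptions made for this example, not details from the book.

```python
import os


def read_text(path):
    """Open, read, and close a file explicitly (the three steps)."""
    handle = open(path)              # 1. open
    try:
        text = handle.read()         # 2. read
    finally:
        handle.close()               # 3. close, even if reading failed
    return text


def consolidate_fasta(directory, output_path, extension=".fasta"):
    """Concatenate every file in `directory` whose name ends in `extension`.

    Returns the number of files copied. The output file should not itself
    match `extension` inside `directory`, or it would be read as input.
    """
    count = 0
    # The with statement closes the output file automatically.
    with open(output_path, "w") as out:
        for name in sorted(os.listdir(directory)):
            if not name.endswith(extension):
                continue
            text = read_text(os.path.join(directory, name))
            if text and not text.endswith("\n"):
                text += "\n"         # keep records on separate lines
            out.write(text)
            count += 1
    return count
```

In everyday code the with form is usually preferred over an explicit close(), since the file is closed even when an exception is raised.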
• An example of object-oriented bioinformatics programming in Python is a class named Plasmid that inherits methods and attributes from a class named Sequence and adds its own, such as ABres (which reports whether the plasmid carries resistance to a particular antibiotic) and AbResDict (which holds information on antibiotic-resistance regions).
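A minimal sketch of that inheritance pattern follows. Only the names Sequence, Plasmid, ABres, and AbResDict come from the notes above. The method bodies, the transcription() helper, and the marker sequences are made up for illustration and may differ from the book's version.

```python
class Sequence:
    """Minimal DNA sequence holder (hypothetical base class)."""

    def __init__(self, seqstring):
        self.seqstring = seqstring.upper()

    def transcription(self):
        """Return the RNA version of the sequence."""
        return self.seqstring.replace("T", "U")


class Plasmid(Sequence):
    """A Sequence that also knows about antibiotic-resistance markers."""

    # Made-up marker sequences, for illustration only.
    AbResDict = {"Tet": "CTAGCAT", "Amp": "CACTACTG"}

    def ABres(self, antibiotic):
        """Return True if the marker for `antibiotic` occurs in the sequence."""
        return self.AbResDict[antibiotic] in self.seqstring


plasmid = Plasmid("aaCTAGCATtt")
print(plasmid.ABres("Tet"))        # method defined on Plasmid
print(plasmid.transcription())     # method inherited from Sequence
```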
• The re module is involved in working with regular expressions.
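For instance, the re module can locate sequence motifs or clean up raw input. The sketch below assumes the N-glycosylation pattern N-{P}-[ST]-{P} as the motif; the protein string and the raw text are hypothetical.

```python
import re

# Hypothetical protein sequence.
protein = "MNGTAPNPSQNKSA"

# N, then any residue except P, then S or T, then any residue except P.
pattern = re.compile(r"N[^P][ST][^P]")

# finditer() yields non-overlapping matches from left to right.
hits = [(m.start(), m.group()) for m in pattern.finditer(protein)]
print(hits)

# search() returns the first match object, or None when nothing matches.
first = pattern.search(protein)
print(first.start() if first else None)

# sub() replaces every match; here it strips digits and whitespace
# from sequence text pasted from a numbered listing.
raw = "1 atgc\n 5 ggta"
clean = re.sub(r"[\s\d]", "", raw)
print(clean)
```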
• Some key modules in Biopython include:
1. Alphabet: works with alphabets such as DNA and amino acids
2. Seq: composed of the sequence and an alphabet
3. MutableSeq: a mutable counterpart to Seq, whose objects cannot be changed in place
4. SeqRecord: stores metadata about a sequence
5. Align: deals with sequence alignments
6. AlignInfo: used for extracting information from alignment objects
7. ClustalW: has classes and functions for interacting with ClustalW
8. SeqIO: interface to input and output sequence file formats
9. AlignIO: input/output interface for alignments
10. BLAST: for sequence similarity search
11. Data: works with built-in biological data
12. Entrez: integrates with NCBI databases
13. PDB: works with information on 3D molecular structures
14. PROSITE: works with protein database information
15. Restriction: for working with restriction enzymes
16. SeqUtils: for working with DNA and protein sequences
17. Sequencing: works with output from sequencing and assembly tools
18. SwissProt: works with protein sequence database information
• Some examples of Python bioinformatics applications included in the book (with source code) are:
1. Sequence manipulation in batch
2. Web application for filtering vector contamination
3. Searching for PCR primers using Primer3
4. Calculating melting temperature from a set of primers
5. Filtering out specific fields from a GenBank file
6. Converting an XML BLAST file into HTML
7. Inferring splicing sites
8. DNA mutations with restrictions
9. Web server for multiple alignment
10. Drawing marker positions using data stored in a database
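To give a flavor of the melting temperature application, here is a small sketch using the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, a rough estimate suited only to short primers. This is an assumed stand-in and not necessarily the method the book uses; the function name and primer list are hypothetical.

```python
def wallace_tm(primer):
    """Estimate the melting temperature (in °C) with the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Hypothetical primer set.
primers = ["ATGCATGC", "GGGGCCCC"]
for primer in primers:
    print(primer, wallace_tm(primer))
```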