1st Edition

Managing Your Biological Data with Python

    560 Pages 33 B/W Illustrations
    by Chapman & Hall

    Take Control of Your Data and Use Python with Confidence

    Requiring no prior programming experience, Managing Your Biological Data with Python empowers biologists and other life scientists to work with biological data on their own using the Python language. The book teaches them not only how to program but also how to manage their data. It shows how to read data from files in different formats, analyze and manipulate the data, and write the results to a file or computer screen.

    The first part of the text introduces the Python language and teaches readers how to write their first programs. The second part presents the basic elements of the language, enabling readers to write small programs independently. The third part explains how to create bigger programs using techniques to write well-organized, efficient, and error-free code. The fourth part on data visualization shows how to plot data and draw a figure for an article or slide presentation. The fifth part covers the Biopython programming library for reading and writing several biological file formats, querying the NCBI online databases, and retrieving biological records from the web. The last part provides a cookbook of 20 specific programming "recipes," ranging from secondary structure prediction and multiple sequence alignment analyses to superimposing protein three-dimensional structures.

    Tailoring the programming topics to the everyday needs of biologists, the book helps them easily analyze data and ultimately make better discoveries. Every piece of code in the text is aimed at solving real biological problems.

    Getting Started
    The Python Shell
    In This Chapter You Will Learn
    Story: Calculating the ΔG of ATP Hydrolysis
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Your First Python Program
    In This Chapter You Will Learn
    Story: How to Calculate the Frequency of Amino Acids from Insulin
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Data Management
    Analyzing a Data Column
    In This Chapter You Will Learn
    Story: Dendritic Lengths
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Parsing Data Records
    In This Chapter You Will Learn
    Story: Integrating Mass Spectrometry Data into Metabolic Pathways
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Searching Data
    In This Chapter You Will Learn
    Story: Translating an RNA Sequence into the Corresponding Protein Sequence
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Filtering Data
    In This Chapter You Will Learn
    Story: Working with RNA-Seq Output Data
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Managing Tabular Data
    In This Chapter You Will Learn
    Story: Determining Protein Concentrations
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Sorting Data
    In This Chapter You Will Learn
    Story: Sort a Data Table
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Pattern Matching and Text Mining
    In This Chapter You Will Learn
    Story: Search a Phosphorylation Motif in a Protein Sequence
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Modular Programming
    Divide a Program into Functions
    In This Chapter You Will Learn
    Story: Working with Three-Dimensional Coordinate Files
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Managing Complexity with Classes
    In This Chapter You Will Learn
    Story: Mendelian Inheritance
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Debugging
    In This Chapter You Will Learn
    Story: When Your Program Does Not Work
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Using External Modules: The Python Interface to R
    In This Chapter You Will Learn
    Story: Reading Numbers from a File and Calculating Their Mean Value Using R with Python
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Building Program Pipelines
    In This Chapter You Will Learn
    Story: Building an NGS Pipeline
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Writing Good Programs
    In This Chapter You Will Learn
    Problem Description: Uncertainty
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Data Visualization
    Creating Scientific Diagrams
    In This Chapter You Will Learn
    Story: Nucleotide Frequencies in the Ribosome
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Creating Molecule Images with PyMOL
    In This Chapter You Will Learn
    Story: The Zinc Finger
    Seven Steps to Create a High-Resolution Image
    Examples
    Testing Yourself

    Manipulating Images
    In This Chapter You Will Learn
    Story: Plot a Plasmid
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Biopython
    Working with Sequence Data
    In This Chapter You Will Learn
    Story: How to Translate a DNA Coding Sequence into the Corresponding Protein Sequence and Write It to a FASTA File
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Retrieving Data from Web Resources
    In This Chapter You Will Learn
    Story: Searching Publications by Keywords in PubMed, Downloading the Corresponding Records, and Writing Papers Published in a Given Year to a File
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Working with 3D Structure Data
    In This Chapter You Will Learn
    Story: Extracting Atom Names and Three-Dimensional Coordinates from a PDB File
    What Do the Commands Mean?
    Examples
    Testing Yourself

    Cookbook
    Recipe 1: The PyCogent Library
    Recipe 2: Reversing and Randomizing a Sequence
    Recipe 3: Creating a Random Sequence with Probabilities
    Recipe 4: Parsing Multiple Sequence Alignments Using Biopython
    Recipe 5: Calculating a Consensus Sequence from a Multiple Sequence Alignment
    Recipe 6: Calculating the Distance between Phylogenetic Tree Nodes
    Recipe 7: Codon Frequencies in a Nucleotide Sequence
    Recipe 8: Parsing RNA 2D Structures in the Vienna Format
    Recipe 9: Parsing BLAST XML Output
    Recipe 10: Parsing SBML Files
    Recipe 11: Running BLAST
    Recipe 12: Accessing, Downloading, and Reading Web Pages in Python
    Recipe 13: Parsing HTML Files
    Recipe 14: Split a PDB File into PDB Chain Files
    Recipe 15: Find the Two Closest Cα Atoms in a PDB Structure
    Recipe 16: Extract the Interface between Two PDB Chains
    Recipe 17: Building Homology Models Using Modeller
    Recipe 18: RNA 3D Homology Modeling with ModeRNA
    Recipe 19: Calculating RNA Base Pairs from a 3D Structure
    Recipe 20: A Real Case of Structural Superimposition: The Serine Protease Catalytic Triad

    Appendix A: Command Overview
    Appendix B: Python Resources
    Appendix C: Record Samples
    Appendix D: Handling Directories and Programs with UNIX

    Biography

    Allegra Via, Kristian Rother, Anna Tramontano

    “… a significant step forward … The book is cleverly designed to cover a wide range of subjects in a pleasant, easy-to-follow sequence of chapters. These have been carefully prepared so that the minimum level of interdependence is kept, making it possible to begin working at virtually any level without falling into intricate cross-references. A beginner will find the first chapters quite welcoming while a person with medium or even high levels of programming experience can easily find a suitable entry point in the middle.
    The book is written using an entertaining style that pushes the reader into a naturally built engaging experience … the authors have chosen a collection of underlying subject areas that cover a very wide variety of interests, ensuring that mixed audiences are kept engaged. In that sense, the content becomes adaptable to the wide diversity of learners that are found in today's communities of specialised biologists.
    … also usable as a reference guide, due to the richness of its worked examples that will prove valuable as seeds for code development for programmers at any level. … as a single book to support learning Python for problem solvers in the life sciences, this book is certainly a very smart choice. It is also ready for creative teachers to develop more in the same direction.”
    —Pedro L. Fernandes, Instituto Gulbenkian de Ciência

    "Reading Managing Your Biological Data with Python brings back memories of the time I started writing my first lines of code nearly a decade ago. As a beginning structural biologist without any coding experience, I would have found this book a welcome companion to quickly get me started on my bioinformatics projects with Python. It is this often pragmatic attitude that scientists have towards programming that makes Python the language of choice for many. A clear syntax, powerful built-in functions, and a lively ecosystem of user-contributed modules allow you to do advanced things with only a few lines of code.
    The book introduces you to the basic principles of programming in Python using the many built-in functions. It does so using practical examples that you can start using right away in your day-to-day research.
    Python’s modular design principles can even be seen in the organization of this book. If you have never written a line of code in your life, the first chapters are indispensable for teaching you basic coding principles, but if you have some experience, you can safely skip these. I would, however, recommend reading the ones introducing the built-in functions. It never hurts to refresh your memory on the many powerful built-ins Python actually has; I had certainly forgotten about one or two of them. Working your way through the first chapters will help you get comfortable with Python and lay the foundation for writing more advanced programs in the remaining chapters. These chapters introduce some of the powerful community-contributed Python modules that make your life as a biologist a whole lot easier. Again, the example code introducing these modules is of high practical value, and together with the coding recipes in the ‘cookbook’ chapter they provide a solid blueprint for you to build your own code upon.
    I’m confident that reading Managing Your Biological Data with Python will quickly allow you to get the most out of your data and start answering those thrilling scientific questions you have, and do all of that while having fun."
    —Marc van Dijk, Structural biologist, bioinformatician, and eScience entrepreneur, Bijvoet Center for Biomolecular Research, Utrecht University, The Netherlands

    "For many biologists faced with computational challenges, Python has become the language of choice, due to its power, elegance, and simplicity. Managing Your Biological Data with Python by Allegra Via et al. teaches Python using biological examples and discusses important Python-driven applications, such as PyMOL and Biopython. The book is an excellent resource for any biologist needing relevant programming skills."
    —Thomas Hamelryck, Associate Professor, Bioinformatics Center, University of Copenhagen, Denmark

    "Biological data volumes are growing rapidly as high-throughput technologies (e.g., DNA microarrays or DNA/RNA sequencing) improve. Managing and analyzing biological data are becoming more demanding, and the application of programming techniques has simply become a standard. Managing Your Biological Data with Python is one of very few user-friendly books for biologists. It is amazing how clearly the authors explain the possible applications of Python for data management (parsing data records, filtering and sorting data) and data visualization (also using the Python interface to R). The book also offers a description of modular programming, which is simply excellent! It guides readers from writing simple functions through writing classes to building program pipelines, everything according to Python coding standards and in an easy-to-follow way. This is absolutely the best book to start learning Python. Intermediate Python users can use this book to learn some new tricks that they could implement in their own code. I can highly recommend this book to researchers, students, and their lecturers."
    —Dr. Barbara Uszczynska, Centre de Regulació Genòmica (CRG), Barcelona, Spain