Home

This project is part of the 2007 Phyloinformatics Summer of Code, which is itself part of the Google Summer of Code. This page is the central resource for information related to the project "A Perl-based Command Line Interface to a Topological Query Application for BioSQL in Support of High Throughput Classification and Analysis of LTR Retrotransposons in Plant Genomes", which is being developed by Jamie Estill.

I will use Perl to create a set of command line programs for topological queries in BioSQL. The goal of this project is to create an interface that is suitable for high-throughput creation and modification of SQL-based phylogenies. I will use this interface to further my research on the classification of plant LTR retrotransposons.

Table of Contents

News
Project Overview
System Requirements
phyinit - Initialize a database
- Synopsis
- Full Documentation
- Source
- Relevant existing code
phyimport - Import trees into PhyloDB
- Synopsis
- Full Documentation
- Source
- Test Input Files
- Relevant existing code
phyexport - Export PhyloDB trees
- Synopsis
- Full Documentation
- Source
- Relevant existing code
phyopt - Compute optimization values
- Synopsis
- Full Documentation
- Source
- Relevant existing code
phyreport - Print report of information for a tree
- Synopsis
- Full Documentation
- Source
- Relevant existing code
phymod - Modify PhyloDB trees
- Synopsis
- Full Documentation
- Source
- Relevant existing code
References

News

Finally loaded the scripts to biosql-schema/scripts --Jestill 16:23, 23 October 2007 (EDT)
PhyInit now available from the project source code repository --Jestill 14:57, 1 June 2007 (EDT)
Started a Project Blog -- 23 April 2007 (EDT)
This page started -- 19 April 2007 (EDT)

Project Overview

I will be coding in Perl and using MySQL as the development RDBMS. I will use standard Perl and BioPerl modules where available, including Bio::TreeIO, DBI, and Getopt::Std. The Felidae tree available from TreeBASE will be used as the test phylogeny dataset for the duration of the project. This test tree is of moderate size, has some named parent nodes, and includes one small comb. Phylogenies of LTR retrotransposons will be created and stored in the database framework throughout the development process. These phylogenies will be quite large and will use individual occurrences of LTR retrotransposons as the OTUs. Development will assume a single phylogeny per database. The variables for database name, user name, user password, and host will have default values.

Student Homepage: James Estill

Mentor(s): Hilmar Lapp (primary), Weigang Qiu, Bill Piel, Mike Muratet (secondary)

Project Blog: phylosoc2007jestill.blogspot.com

Source code: code.google.com/p/phylosoc2007jestill

System Requirements

I will be adding to the system requirements as the project develops.

Perl
Perl DBI
BioPerl
The following modules are used:
- Bio::Tree::TreeI
Database:
- MySQL - 4.1 or newer
  Currently I am developing this only on MySQL. Nested SQL queries (subqueries) require version 4.1 or newer. I will actively try to keep this compatible with the oldest possible version of MySQL.
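
To make the "nested SQL" requirement concrete, here is a small sketch of the kind of subquery involved. It is written in Python with an in-memory SQLite database standing in for MySQL, and the `node` and `edge` tables and their columns are simplified, hypothetical stand-ins rather than the PhyloDB schema itself.

```python
import sqlite3

# SQLite is used here only so the sketch runs anywhere; the project targets MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (node_id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE edge (parent_node_id INTEGER, child_node_id INTEGER);
    INSERT INTO node VALUES (1, 'root'), (2, 'A'), (3, 'B');
    INSERT INTO edge VALUES (1, 2), (1, 3);
""")

# The inner SELECT is the nested part: nodes that never appear as a
# parent are the leaves of the tree.
leaf_rows = conn.execute("""
    SELECT label FROM node
    WHERE node_id NOT IN (SELECT parent_node_id FROM edge)
    ORDER BY node_id
""").fetchall()
leaf_labels = [row[0] for row in leaf_rows]
print(leaf_labels)
```

A query of this shape cannot be run on MySQL versions older than 4.1, which is why that version is listed as the minimum.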

Additional useful applications:

TreeView

phyinit - Initialize a database

Create PhyloDB tables and foreign keys.

Synopsis

  USAGE: phyinit.pl -d 'DBI:mysql:database=biosql;host=localhost' 
                    -u UserName -p dbPass

      REQUIRED ARGUMENTS:
        --dsn        # The DSN string for the DB connection
        --dbuser     # User name to connect with
        --dbpass     # User password to connect with
      ALTERNATIVE TO --dsn:
        --driver     # DB Driver "mysql", "Pg", "Oracle" 
        --dbname     # Name of database to use
        --host       # Host to connect with (e.g. localhost)
      ADDITIONAL OPTIONS:
        --sqldir     # SQL Dir that contains the SQL to create tables
        --quiet      # Run the program in quiet mode.
        --verbose    # Run the program with maximum output
      ADDITIONAL INFORMATION:
        --version    # Show the program version     
        --usage      # Show program usage
        --help       # Show a short help message
        --man        # Show full program manual
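
The relationship between --dsn and its alternative (--driver, --dbname, --host) can be illustrated with a short sketch. This is Python's argparse rather than the Perl option handling the scripts use, the function name `parse_init_args` is hypothetical, and the default values for driver and host are assumptions made for the example.

```python
import argparse


def parse_init_args(argv):
    """Parse phyinit-style options and settle on a single DSN string."""
    parser = argparse.ArgumentParser(prog="phyinit")
    parser.add_argument("-d", "--dsn")
    parser.add_argument("-u", "--dbuser", required=True)
    parser.add_argument("-p", "--dbpass", required=True)
    parser.add_argument("--driver", default="mysql")    # assumed default
    parser.add_argument("--dbname")
    parser.add_argument("--host", default="localhost")  # assumed default
    args = parser.parse_args(argv)
    if args.dsn is None:
        if args.dbname is None:
            parser.error("give either --dsn or --dbname")
        # Same layout as the example DSN in the usage line above;
        # other drivers may expect different keys.
        args.dsn = f"DBI:{args.driver}:database={args.dbname};host={args.host}"
    return args


args = parse_init_args(["-u", "UserName", "-p", "dbPass", "--dbname", "biosql"])
print(args.dsn)
```

With these inputs the sketch arrives at the same DSN as the usage example, so either style of invocation ends up describing the same connection.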

Full Documentation

Full documentation is available at PhyloSoC:phyinit

Source

phyinit.pl - PERL code to initialize the phylo tables in a database
biosql-phylodb-mysql.sql - MySQL schema for representing trees or networks

Relevant existing code

phyimport - Import trees into PhyloDB

This will initially support only a few "standard" formats and will make use of the Bio::TreeIO module in BioPerl, so the open source community can extend it to additional file formats as needed. The formats supported initially will be NEXUS and Newick.
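
As an illustration of what an import has to produce, the sketch below turns a Newick string into node labels and parent/child edges, which is roughly the shape of data a tree database stores. It is a hand-written Python parser for this page only, not Bio::TreeIO; the function name `parse_newick` is hypothetical, and quoted labels and comments are deliberately not handled.

```python
def parse_newick(text):
    """Parse a Newick string into (labels, edges).

    labels maps node id -> label ('' for unnamed nodes);
    edges is a list of (parent_id, child_id) pairs.
    Branch lengths are read and discarded.
    """
    text = text.strip().rstrip(";")
    labels, edges = {}, []
    pos = 0

    def read_node(parent):
        nonlocal pos
        node_id = len(labels) + 1
        labels[node_id] = ""          # reserve the id before reading children
        if parent is not None:
            edges.append((parent, node_id))
        if pos < len(text) and text[pos] == "(":
            pos += 1                  # skip '('
            read_node(node_id)
            while text[pos] == ",":
                pos += 1
                read_node(node_id)
            pos += 1                  # skip ')'
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        labels[node_id] = text[start:pos].strip()
        if pos < len(text) and text[pos] == ":":
            pos += 1
            while pos < len(text) and text[pos] not in ",()":
                pos += 1              # skip the branch length
        return node_id

    read_node(None)
    return labels, edges


labels, edges = parse_newick("((A,B)AB,C)root;")
print(labels)
print(edges)
```

Node ids are assigned in the order nodes are first encountered, so the root always receives id 1 in this sketch; a real import would take its ids from the database.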

Synopsis

  USAGE: phyimport.pl -d 'DBI:mysql:database=biosql;host=localhost' 
                      -u UserName -p dbPass -i InFilePath -f InFileFormat 

    REQUIRED ARGUMENTS:
        --dsn        # The DSN string for the DB connection
        --dbuser     # User name to connect with
        --dbpass     # User password to connect with
        --infile     # Full path to the tree file to import to the db
        --format     # "newick", "nexus" (default "newick")
    ALTERNATIVE TO --dsn:
        --driver     # DB Driver "mysql", "Pg", "Oracle" 
        --dbname     # Name of database to use
        --host       # Host to connect with (e.g. localhost)
    ADDITIONAL OPTIONS:
        --tree       # Tree name to use
        --quiet      # Run the program in quiet mode.
        --verbose    # Run the program in verbose mode.
    ADDITIONAL INFORMATION:
        --version    # Show the program version     
        --usage      # Show program usage
        --help       # Print short help message
        --man        # Open full program manual

Full Documentation

Full documentation is available at PhyloSoC:phyimport

Source

phyimport.pl - Perl code to import trees into database.

Test Input Files

randtree_26.tre
A randomly generated Newick format tree. This was generated using the RandTree.pl script.
nhx_example_unique.nhx
Modified from the example nhx file at the atv documentation page. This file was modified to have unique leaf node names.
nhx_example.nhx
Example nhx format file from the atv documentation page. This file currently breaks PhyImport due to unique name constraints in the database.
cats.nex
Example tree in NEXUS format. This is a real data tree downloaded from TreeBASE.
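
Since duplicate names are what breaks the nhx_example.nhx import, a pre-import check along the following lines could flag the problem before the database rejects it. This is a Python sketch with a hypothetical helper name, and the sample labels are made up for the example rather than taken from the test files.

```python
from collections import Counter


def duplicate_labels(labels):
    """Return the sorted labels that occur more than once, ignoring empty ones."""
    counts = Counter(label for label in labels if label)
    return sorted(label for label, n in counts.items() if n > 1)


# Made-up labels: one name is repeated, and unnamed nodes are skipped.
print(duplicate_labels(["geneA", "geneB", "geneA", "", ""]))
```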

Relevant existing code

parseTreesPG.pl - uses an internal method to parse NEXUS files
Bio::TreeIO - creates Bio::Tree::TreeI objects
Bio::NEXUS - an object-oriented API to the NEXUS file format
Bio::Phylo::Adaptor::Bioperl::Tree - provides a BioPerl-compatible interface to Bio::Phylo tree objects
Newick Format
- Newick format description
- Newick Example: SSU_Euk_rep.newick
  phylogenetic tree data file for the set of 140 representative eukaryotic sequences with full sequence descriptions in newick format

phyexport - Export PhyloDB trees

This will initially support whole tree export in the formats given below. It will later be extended to export a single subtree resulting from a query of the tree. This subset function will make use of the precomputed nested sets and transitive closure. The exported trees can be viewed in TreeView for visual inspection of branch IDs.
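
The export step can be pictured as the reverse of an import: walk the stored parent/child edges from a chosen node and write Newick. The Python sketch below does this with plain recursion over an in-memory edge list rather than the precomputed nested sets or transitive closure mentioned above, and the name `to_newick` and the sample data are hypothetical.

```python
def to_newick(labels, edges, root):
    """Serialize the subtree under `root` as a Newick string."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    def render(node):
        kids = children.get(node, [])
        label = labels.get(node, "")
        if not kids:
            return label
        return "(" + ",".join(render(k) for k in kids) + ")" + label

    return render(root) + ";"


labels = {1: "root", 2: "AB", 3: "A", 4: "B", 5: "C"}
edges = [(1, 2), (2, 3), (2, 4), (1, 5)]
print(to_newick(labels, edges, 1))   # whole tree
print(to_newick(labels, edges, 2))   # subtree, as with --parent-node
```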

Synopsis

  USAGE: phyexport.pl

    REQUIRED ARGUMENTS:
        --dsn         # The DSN string the database to connect to
                      # Must conform to:
                      # 'DBI:mysql:database=biosql;host=localhost' 
        --dbuser      # User name to connect with
        --dbpass      # Password to connect with
        --outfile     # Full path to output file that will be created.
    ALTERNATIVE TO --dsn:
        --driver      # DB Driver "mysql", "Pg", "Oracle" 
        --dbname      # Name of database to use
        --host        # Host to connect with (e.g. localhost)
    ADDITIONAL OPTIONS:
        --format      # "newick", "nexus" (default "newick")
        --tree        # Name of the tree to export
        --parent-node # Node to serve as root for a subtree export
        --help        # Print this help message
        --quiet       # Run the program in quiet mode.
        --db-node-id  # Preserve DB node names in export

Full Documentation

Full documentation is available at PhyloSoC:phyexport

Source

phyexport.pl - In progress

Relevant existing code

print-trees.pl
Bio::TreeIO - creates Bio::Tree::TreeI objects
Bio::Phylo::Forest::Node

phyopt - Compute optimization values

The phyopt program will optimize trees in a PhyloDB database by computing transitive closure paths as well as the left and right index values for the nested set indexes.
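
Both optimizations can be computed in a single depth-first walk of the tree. The following Python sketch shows the idea on an in-memory edge list; the function name `optimize` is hypothetical, and details such as whether zero-length self paths are stored are schema decisions that this sketch does not address.

```python
def optimize(edges, root):
    """Return (nested_set, paths) for the tree under `root`.

    nested_set maps node -> (left, right) index pair;
    paths lists (ancestor, descendant, distance) for every
    ancestor/descendant pair, i.e. the transitive closure of edges.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    nested_set, paths = {}, []
    counter = 0

    def visit(node, ancestors):
        nonlocal counter
        counter += 1
        left = counter                # left index: assigned on the way down
        # ancestors is ordered root-first, so distance counts back from the end
        for distance, anc in enumerate(reversed(ancestors), start=1):
            paths.append((anc, node, distance))
        for child in children.get(node, []):
            visit(child, ancestors + [node])
        counter += 1
        nested_set[node] = (left, counter)   # right index: on the way back up

    visit(root, [])
    return nested_set, paths


nested_set, paths = optimize([(1, 2), (2, 3), (2, 4), (1, 5)], root=1)
print(nested_set)
print(paths)
```

A node's descendants are exactly the nodes whose left index falls between its own left and right values, which is what makes subtree queries cheap once these values are stored. For very deep trees the recursion would need to be replaced with an explicit stack.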

Synopsis

  USAGE: phyopt.pl -d 'DBI:mysql:database=biosql;host=localhost' 
                   -u UserName -p dbPass -t MyTree

    REQUIRED ARGUMENTS:
        --dsn        # The DSN string the database to connect to
                     # Must conform to:
                     # 'DBI:mysql:database=biosql;host=localhost' 
        --dbuser     # User name to connect with
        --dbpass     # Password to connect with
    ALTERNATIVE TO --dsn:
        --driver     # "mysql", "Pg", "Oracle" (default "mysql")
        --dbname     # Name of database to use
        --host       # optional: host to connect with
    ADDITIONAL OPTIONS:
        --tree       # Name of the tree to optimize.
                     # Otherwise the entire db is optimized.
        --quiet      # Run the program in quiet mode.
        --verbose    # Run the program in verbose mode.
    ADDITIONAL INFORMATION:
        --version    # Show the program version     
        --usage      # Show program usage
        --help       # Print short help message
        --man        # Open full program manual

Full Documentation

Full documentation is available at PhyloSoC:phyopt

Source

phyopt.pl

Relevant existing code

phyreport - Print report of information for a tree

Return a standard set of information for a given tree or for the entire database, including (1) the number of leaf nodes and (2) the node IDs and names of terminal taxa. The report will be written to the given output file path.
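
As a rough picture of the two report items listed above, the Python sketch below derives the leaf count and the terminal taxa from node labels and parent/child edges. The function name `tree_report`, the output layout, and the sample data are all hypothetical and are not taken from phyreport.pl.

```python
def tree_report(labels, edges):
    """Build a plain-text report: leaf count, then one 'id<TAB>name' line per leaf."""
    parents = {parent for parent, _ in edges}
    # A leaf is any node that never appears as a parent.
    leaves = sorted(node for node in labels if node not in parents)
    lines = [f"Number of leaf nodes: {len(leaves)}"]
    lines += [f"{node}\t{labels[node]}" for node in leaves]
    return "\n".join(lines)


labels = {1: "root", 2: "AB", 3: "A", 4: "B", 5: "C"}
edges = [(1, 2), (2, 3), (2, 4), (1, 5)]
print(tree_report(labels, edges))
```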

Synopsis

  Usage: phyreport.pl -o PhyloDbReport.txt

    REQUIRED ARGUMENTS:
        --dsn         # The DSN string the database to connect to
                      # Must conform to:
                      # 'DBI:mysql:database=biosql;host=localhost' 
        --dbuser      # User name to connect with
        --dbpass      # Password to connect with
        --outfile     # Full path to output file that will be created.
    ALTERNATIVE TO --dsn:
        --driver      # DB Driver "mysql", "Pg", "Oracle" 
        --dbname      # Name of database to use
        --host        # Host to connect with (e.g. localhost)
    ADDITIONAL OPTIONS:
        --tree        # Name of the tree to report on
                      # Otherwise generate report for all trees
        --quiet       # Run the program in quiet mode.
        --verbose     # Run the program in verbose mode.
    ADDITIONAL INFORMATION:
        --version     # Show the program version     
        --usage       # Show program usage
        --help        # Print short help message
        --man         # Open full program manual

Full Documentation

Full documentation is available at PhyloSoC:phyreport

Source

phyreport.pl

Relevant existing code

print-trees.pl

phymod - Modify PhyloDB trees

Modify an existing phylogeny in the database. This will use -x, -c, and -v as command line arguments to indicate remove branch (cut), copy branch (copy), and add branch (paste). The add branch function will at first assume that the user is attempting to add an additional tree from an external file source to an existing database. Future development will allow for cut or copy and paste from one tree in the database to another tree. The program will assume that the user knows the ID of the branch that will be removed or added to. All precomputed fields will be set to null following changes in tree topology. By default, the program will attempt to warn the user before doing something destructive; these warnings can be turned off with the quiet flag.

DELETE:
To request a delete query, simply specify a node to cut without providing another node to paste to. This will delete the target node and all child nodes. The node attributes, edges, and edge attributes will also be deleted from the database:

PhyMod -d dbName -u dbUserName -x RemoveNodeID [-h dbHost]
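
The delete behaviour described above amounts to collecting the target node with all of its descendants and then dropping every node and edge in that set. Here is a Python sketch on in-memory data; the name `delete_subtree` is hypothetical, and attribute tables are left out for brevity.

```python
def delete_subtree(labels, edges, target):
    """Remove `target` and all of its descendants; return the removed ids."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    # Walk down from the target, collecting every node in its subtree.
    doomed, stack = set(), [target]
    while stack:
        node = stack.pop()
        doomed.add(node)
        stack.extend(children.get(node, []))

    for node in doomed:
        labels.pop(node, None)
    # Drop every edge that touches a removed node, including the edge
    # that attached the target to its parent.
    edges[:] = [(p, c) for p, c in edges if p not in doomed and c not in doomed]
    return doomed


labels = {1: "root", 2: "AB", 3: "A", 4: "B", 5: "C"}
edges = [(1, 2), (2, 3), (2, 4), (1, 5)]
removed = delete_subtree(labels, edges, 2)
print(removed, labels, edges)
```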

COPY AND PASTE:
Copy node from a source tree and place in the destination tree. If the destination tree name does not exist, a new tree will be created. Note: the source tree name is not required if the node id is passed since the node id is unique in the database.

PhyMod -d dbName -u dbUserName -c SourceNodeID -v DestinationBranchID -t DestinationTreeName [-h dbHost]

CUT AND PASTE:
Cut node from the source tree and place in the destination tree. If the destination tree name does not exist, a new tree will be created. The data from the source tree will be deleted.

PhyMod -d dbName -u dbUserName -x CutNodeID -v DestNodeID [-h dbHost]
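
Within a single tree, a cut and paste reduces to replacing one edge and then invalidating the precomputed values. The Python sketch below shows that, along with one example of the kind of check that could sit behind a warning (pasting a branch inside itself). The name `cut_and_paste` and the in-memory data layout are hypothetical.

```python
def cut_and_paste(edges, nested_set, node, dest):
    """Move `node` (with its subtree) under `dest` by rewriting one edge."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    # Refuse to paste a branch inside itself, which would create a cycle.
    stack, subtree = [node], set()
    while stack:
        current = stack.pop()
        subtree.add(current)
        stack.extend(children.get(current, []))
    if dest in subtree:
        raise ValueError("destination lies inside the branch being moved")

    # Drop the edge to the old parent and attach the node to its new parent.
    edges[:] = [(p, c) for p, c in edges if c != node] + [(dest, node)]
    # Topology changed, so the precomputed indexes are no longer valid.
    for key in nested_set:
        nested_set[key] = None


edges = [(1, 2), (2, 3), (2, 4), (1, 5)]
nested_set = {1: (1, 10), 2: (2, 7), 3: (3, 4), 4: (5, 6), 5: (8, 9)}
cut_and_paste(edges, nested_set, node=3, dest=5)
print(edges)
```

After a change like this the optimization values would have to be recomputed before any query that relies on them.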

Synopsis

  Usage: phymod.pl

    REQUIRED ARGUMENTS:
        --dsn        # The DSN string for the DB connection
        --dbuser     # User name to connect with
        --dbpass     # User password to connect with
        --infile     # Full path to the tree file to import to the db
        --format     # "newick", "nexus" (default "newick")
    ALTERNATIVE TO --dsn:
        --driver     # DB Driver "mysql", "Pg", "Oracle" 
        --dbname     # Name of database to use
        --host       # Host to connect with (e.g. localhost)
    ADDITIONAL OPTIONS:
        --tree       # Tree name to use
        --quiet      # Run the program in quiet mode.
        --verbose    # Run the program in verbose mode.
    ADDITIONAL INFORMATION:
        --version    # Show the program version     
        --usage      # Show program usage
        --help       # Print short help message
        --man        # Open full program manual

Full Documentation

Full documentation is available at PhyloSoC:phymod

Source

phymod.pl - In progress

phymodwork.sql - In progress

Relevant existing code

References

The following references are relevant to the goals of this project:

BioSQL wiki
BioPERL HOWTO:Trees
J. Celko. 2004. Joe Celko's Trees and Hierarchies in SQL for Smarties.
G. Dong, L. Libkin, J. Su, L. Wong. 1999. "Maintaining Transitive Closure of Graphs in SQL". Int. Journal of Information Technology.
L. Nakhleh, D. Miranker, F. Barbancon, W. H. Piel, M. Donoghue. 2003. "Requirements of Phylogenetic Databases". Proceedings of the 3rd IEEE Symposium on Bioinformatics and Bioengineering.
R.D.M. Page. 2004. "Phyloinformatics: Towards a Phylogenetic Database." p 219-241 in Data Mining in Bioinformatics. Wang et al eds. Springer Berlin Heidelberg.
R.D.M. Page. 2005. "Towards a Taxonomically Intelligent Phylogenetic Database". Technical Reports in Taxonomy 04-01. Paper presented at Database Issues in Biological Databases, Edinburgh, Jan. 8-9, 2005.
R.D.M. Page. 2007. "TBMap: A taxonomic perspective on the phylogenetic database TreeBASE". BMC Bioinformatics. 8:158. Additional info: TBMap website

I would appreciate any input for further references --Jestill 12:19, 19 April 2007 (EDT)
