Home
This project is part of the 2007 Phyloinformatics Summer of Code, which is part of the Google Summer of Code. This web page serves as the central resource for information related to the project "A Perl-based Command Line Interface to a Topological Query Application for BioSQL in Support of High Throughput Classification and Analysis of LTR Retrotransposons in Plant Genomes", which is being developed by Jamie Estill.
I will use Perl to create a set of command line programs for topological queries in BioSQL. The goal of this project is to create an interface that is suitable for high throughput creation and modification of SQL-based phylogenies. I will use this interface to further my research on the classification of plant LTR retrotransposons.
- Finally loaded the scripts to biosql-schema/scripts --Jestill 16:23, 23 October 2007 (EDT)
- PhyInit now available from the project source code repository--Jestill 14:57, 1 June 2007 (EDT)
- Started a Project Blog -- 23 April 2007 (EDT)
- This page started -- 19 April 2007 (EDT)
I will be coding in Perl and will be using MySQL as the development RDBMS. I will use standard Perl and BioPerl modules when available. These modules include Bio::TreeIO, DBI, and Getopt::Std. The Felidae tree available from TreeBASE will be used as the test phylogeny dataset for the duration of the project. This test tree is of moderate size, has some named parent nodes, and includes one small comb. Phylogenies of LTR retrotransposons will be created and stored in the database framework throughout the development process. These phylogenies will be quite large and will use individual occurrences of LTR retrotransposons as the OTUs. Development will assume a single phylogeny per database. The variables for database name, user name, user password, and host will have default values.
Student Homepage: James Estill
Mentor(s): Hilmar Lapp (primary), Weigang Qiu, Bill Piel, Mike Muratet (secondary)
Project Blog: phylosoc2007jestill.blogspot.com
Source code: code.google.com/p/phylosoc2007jestill
I will be adding to the system requirements as the project develops.
- Perl
- Perl DBI
- BioPerl. The following modules are used:
  - Bio::Tree::TreeI
- Database:
  - MySQL 4.1 or newer. Currently I am developing this only on MySQL. Nested SQL queries require version 4.1 or newer. I will actively try to make this compatible with the oldest version of MySQL possible.
Create PhyloDB tables and foreign keys.
    USAGE:
        phyinit.pl -d 'DBI:mysql:database=biosql;host=localhost'
                   -u UserName -p dbPass

    REQUIRED ARGUMENTS:
        --dsn        # The DSN string for the DB connection
        --dbuser     # User name to connect with
        --dbpass     # User password to connect with
    ALTERNATIVE TO --dsn:
        --driver     # DB Driver "mysql", "Pg", "Oracle"
        --dbname     # Name of database to use
        --host       # Host to connect with (ie. localhost)
    ADDITIONAL OPTIONS:
        --sqldir     # SQL Dir that contains the SQL to create tables
        --quiet      # Run the program in quiet mode.
        --verbose    # Run the program with maximum output
    ADDITIONAL INFORMATION:
        --version    # Show the program version
        --usage      # Show program usage
        --help       # Show a short help message
        --man        # Show full program manual
Full documentation is available at PhyloSoC:phyinit
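The usage text above offers two equivalent ways to name the database: a full `--dsn` string, or the `--driver`, `--dbname`, and `--host` parts. The following is a minimal sketch of how the two forms relate, written in Python for illustration only (the project itself is Perl). The function names and the default values are assumptions taken from the example DSN in the usage text, and other DBI drivers may use different key names than `database` and `host`.

```python
def build_dsn(driver="mysql", dbname="biosql", host="localhost"):
    """Compose a DBI-style DSN string from its parts.

    The defaults are assumptions copied from the example DSN in the usage text.
    """
    return f"DBI:{driver}:database={dbname};host={host}"


def parse_dsn(dsn):
    """Split a DSN of the form shown in the usage text back into its parts."""
    prefix, driver, rest = dsn.split(":", 2)
    if prefix.upper() != "DBI":
        raise ValueError(f"not a DBI-style DSN: {dsn!r}")
    # The remainder is a ';'-separated list of key=value pairs.
    fields = dict(item.split("=", 1) for item in rest.split(";") if item)
    return {
        "driver": driver,
        "dbname": fields.get("database"),
        "host": fields.get("host"),
    }


if __name__ == "__main__":
    dsn = build_dsn()
    print(dsn)
    print(parse_dsn(dsn))
```

Keeping both directions in one place makes it easy to accept either form on the command line and normalize to a single connection string before connecting.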
- phyinit.pl - PERL code to initialize the phylo tables in a database
- biosql-phylodb-mysql.sql - MySQL schema for representing trees or networks
- create_mysql_db.pl
- biosqldb-mysql.sql
- biosqldb-pg.sql
- biosqldb-hsqldb.sql
- biosql-phylodb-pg.sql
- drop-tables.sql
This will initially support only a few "standard" formats and make use of the Bio::TreeIO module in BioPERL. It can therefore be extended by the open source community to include additional file formats as needed. The file formats supported initially will be the NEXUS and Newick formats.
    USAGE:
        phyimport.pl -d 'DBI:mysql:database=biosql;host=localhost'
                     -u UserName -p dbPass -i InFilePath -f InFileFormat

    REQUIRED ARGUMENTS:
        --dsn        # The DSN string for the DB connection
        --dbuser     # User name to connect with
        --dbpass     # User password to connect with
        --infile     # Full path to the tree file to import to the db
        --format     # "newick", "nexus" (default "newick")
    ALTERNATIVE TO --dsn:
        --driver     # DB Driver "mysql", "Pg", "Oracle"
        --dbname     # Name of database to use
        --host       # Host to connect with (ie. localhost)
    ADDITIONAL OPTIONS:
        --tree       # Tree name to use
        --quiet      # Run the program in quiet mode.
        --verbose    # Run the program in verbose mode.
    ADDITIONAL INFORMATION:
        --version    # Show the program version
        --usage      # Show program usage
        --help       # Print short help message
        --man        # Open full program manual
Full documentation is available at PhyloSoC:phyimport
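To make the import step concrete, here is a small sketch of how a Newick string can be turned into node and edge rows of the kind a tree database stores. It is written in Python for illustration only; phyimport.pl itself relies on Bio::TreeIO, and this is not its implementation. The function name and the row layout are hypothetical, and the parser handles only plain labels and branch lengths (no quoted labels, comments, or NHX tags).

```python
import itertools


def newick_to_rows(newick):
    """Parse a simple Newick string into (nodes, edges).

    nodes: {node_id: label or None}
    edges: list of (parent_id, child_id, branch_length or None)
    """
    text = newick.strip().rstrip(";")
    ids = itertools.count(1)
    nodes, edges = {}, []
    pos = 0

    def parse_subtree(parent_id):
        nonlocal pos
        node_id = next(ids)
        nodes[node_id] = None
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                parse_subtree(node_id)
                if pos < len(text) and text[pos] == ",":
                    pos += 1  # consume "," and read the next sibling
                    continue
                break
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # Optional label after a leaf or after a closing parenthesis.
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        nodes[node_id] = text[start:pos].strip() or None
        # Optional branch length introduced by ":".
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in "(),":
                pos += 1
            length = float(text[start:pos])
        if parent_id is not None:
            edges.append((parent_id, node_id, length))

    parse_subtree(None)
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = newick_to_rows("((A:1,B:2)AB:0.5,C:3)root;")
    print(nodes)
    print(sorted(edges))
```

Node IDs are assigned in the order nodes are first encountered, so the root always receives ID 1 in this sketch; a real import would use the IDs generated by the database.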
phyimport.pl - Perl code to import trees into database.
- randtree_26.tre
  A randomly generated Newick format tree. This was generated using the RandTree.pl script.
- nhx_example_unique.nhx
  Modified from the example NHX file at the ATV documentation page. This file was modified to have unique leaf node names.
- nhx_example.nhx
  Example NHX format file from the ATV documentation page. This file currently breaks PhyImport due to unique name constraints in the database.
- cats.nex
  Example tree in NEXUS format. This is a real data tree downloaded from TreeBASE.
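Because duplicate leaf names break the import, it can help to check a tree file before loading it. The following Python sketch (illustration only, not part of the project's Perl code) lists leaf labels that occur more than once in a Newick string. The function name is hypothetical, and the pattern assumes simple unquoted labels; it treats any label that directly follows `(` or `,` as a leaf.

```python
import re
from collections import Counter

# A leaf label directly follows "(" or ","; internal node labels follow ")".
LEAF_LABEL = re.compile(r"[(,]\s*([^\s():,;\[\]]+)")


def duplicate_leaf_labels(newick):
    """Return the sorted leaf labels that appear more than once."""
    counts = Counter(LEAF_LABEL.findall(newick))
    return sorted(label for label, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(duplicate_leaf_labels("((A:0.1,B:0.2):0.3,A:0.4);"))
```

An empty result means every leaf name is unique under these assumptions, which is the condition the database constraint requires.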
- parseTreesPG.pl - uses an internal method to parse NEXUS files
- Bio::TreeIO - creates Bio::Tree::TreeI objects
- Bio::NEXUS - an object-oriented API to the NEXUS file format
- Bio::Phylo::Adaptor::Bioperl::Tree - provides a BioPerl-compatible interface to Bio::Phylo tree objects
- Newick Format
- Newick format description
- Newick Example: SSU_Euk_rep.newick
  Phylogenetic tree data file, in Newick format, for the set of 140 representative eukaryotic sequences with full sequence descriptions.
This will initially support whole-tree export in the formats given below. It will later be extended to export a single tree resulting from a query of the tree. This subset function will make use of the precomputed nested sets and transitive closure. The exported trees can be viewed in TreeView for visual inspection of branch IDs.
    USAGE:
        phyexport.pl

    REQUIRED ARGUMENTS:
        --dsn          # The DSN string the database to connect to
                       # Must conform to:
                       # 'DBI:mysql:database=biosql;host=localhost'
        --dbuser       # User name to connect with
        --dbpass       # Password to connect with
        --outfile      # Full path to output file that will be created.
    ALTERNATIVE TO --dsn:
        --driver       # DB Driver "mysql", "Pg" "Oracle"
        --dbname       # Name of database to use
        --host         # Host to connect with (ie. localhost)
    ADDITIONAL OPTIONS:
        --format       # "newick", "nexus" (default "newick")
        --tree         # Name of the tree to export
        --parent-node  # Node to serve as root for a subtree export
        --help         # Print this help message
        --quiet        # Run the program in quiet mode.
        --db-node-id   # Preserve DB node names in export
Full documentation is available at PhyloSoC:phyexport
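As an illustration of whole-tree and subtree export, the sketch below serializes a tree held as a parent-to-children map into Newick, starting from any chosen root. It is written in Python for illustration only and is not the phyexport.pl implementation. The function name, the data layout, and the `use_node_ids` flag are hypothetical; the flag is only one possible reading of labelling nodes by their database IDs for visual inspection.

```python
def to_newick(children, labels, root, use_node_ids=False):
    """Serialize the subtree under `root` as a Newick string.

    children: {node_id: [child_id, ...]}
    labels:   {node_id: label or None}
    use_node_ids: if True, write node IDs instead of labels (hypothetical option).
    """
    def render(node):
        label = str(node) if use_node_ids else (labels.get(node) or "")
        kids = children.get(node, [])
        if not kids:
            return label
        return "(" + ",".join(render(kid) for kid in kids) + ")" + label

    return render(root) + ";"


if __name__ == "__main__":
    children = {1: [2, 5], 2: [3, 4]}
    labels = {1: "root", 2: "AB", 3: "A", 4: "B", 5: "C"}
    print(to_newick(children, labels, 1))                     # whole tree
    print(to_newick(children, labels, 2))                     # subtree under node 2
    print(to_newick(children, labels, 1, use_node_ids=True))  # IDs as labels
```

The recursion is fine for moderate trees; for very deep trees, such as large retrotransposon phylogenies, an iterative traversal would avoid Python's recursion limit.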
phyexport.pl - In progress
- print-trees.pl
- Bio::TreeIO - creates Bio::Tree::TreeI objects
- Bio::Phylo::Forest::Node
The phyopt program will optimize trees in a PhyloDB database by computing transitive closure paths as well as the left and right index values for the nested set indexes.
    USAGE:
        phyopt.pl -d 'DBI:mysql:database=biosql;host=localhost'
                  -u UserName -p dbPass -t MyTree

    REQUIRED ARGUMENTS:
        --dsn        # The DSN string the database to connect to
                     # Must conform to:
                     # 'DBI:mysql:database=biosql;host=localhost'
        --dbuser     # User name to connect with
        --dbpass     # Password to connect with
    ALTERNATIVE TO --dsn:
        --driver     # "mysql", "Pg", "Oracle" (default "mysql")
        --dbname     # Name of database to use
        --host       # optional: host to connect with
    ADDITIONAL OPTIONS:
        --tree       # Name of the tree to optimize.
                     # Otherwise the entire db is optimized.
        --quiet      # Run the program in quiet mode.
        --verbose    # Run the program in verbose mode.
    ADDITIONAL INFORMATION:
        --version    # Show the program version
        --usage      # Show program usage
        --help       # Print short help message
        --man        # Open full program manual
Full documentation is available at PhyloSoC:phyopt
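The two precomputed structures that phyopt produces can be sketched as follows. This is a Python illustration of the general techniques only, not the phyopt.pl code: the function names and the in-memory data layout are assumptions. The first function assigns nested set left and right index values with a depth-first walk; the second lists every ancestor-descendant pair with its distance, which is the transitive closure of the parent-child edges.

```python
def nested_set_indexes(children, root):
    """Return {node: (left, right)} assigned by a depth-first walk."""
    indexes, left = {}, {}
    counter = 0
    stack = [(root, False)]  # (node, True once its children are finished)
    while stack:
        node, done = stack.pop()
        counter += 1
        if done:
            indexes[node] = (left[node], counter)
        else:
            left[node] = counter
            stack.append((node, True))
            # Push children in reverse so they are visited in listed order.
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
    return indexes


def transitive_closure(children, root):
    """Return sorted (ancestor, descendant, distance) rows with distance >= 1."""
    paths = []
    stack = [(root, [])]  # node together with its ancestors, root first
    while stack:
        node, ancestors = stack.pop()
        for distance, ancestor in enumerate(reversed(ancestors), start=1):
            paths.append((ancestor, node, distance))
        for child in children.get(node, []):
            stack.append((child, ancestors + [node]))
    return sorted(paths)


if __name__ == "__main__":
    children = {1: [2, 5], 2: [3, 4]}
    idx = nested_set_indexes(children, 1)
    print(idx)
    # Descendants of node 2 are the nodes whose interval lies inside its own.
    lo, hi = idx[2]
    print(sorted(n for n, (l, r) in idx.items() if lo < l and r < hi))
    print(transitive_closure(children, 1))
```

With the indexes in place, "all descendants of a node" becomes a simple range comparison, and "is X an ancestor of Y" becomes a single lookup in the closure rows. Both avoid walking the tree at query time.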
Return a standard set of information for a given tree or for the entire database. This set will include (1) the number of leaf nodes, (2) the node IDs and names of terminal taxa, etc. The output will be written to the file given as the output file path.
    USAGE:
        phyreport.pl -o PhyloDbReport.txt

    REQUIRED ARGUMENTS:
        --dsn        # The DSN string the database to connect to
                     # Must conform to:
                     # 'DBI:mysql:database=biosql;host=localhost'
        --dbuser     # User name to connect with
        --dbpass     # Password to connect with
        --outfile    # Full path to output file that will be created.
    ALTERNATIVE TO --dsn:
        --driver     # DB Driver "mysql", "Pg", "Oracle"
        --dbname     # Name of database to use
        --host       # Host to connect with (ie. localhost)
    ADDITIONAL OPTIONS:
        --tree       # Name of the tree to report on
                     # Otherwise generate report for all trees
        --quiet      # Run the program in quiet mode.
        --verbose    # Run the program in verbose mode.
    ADDITIONAL INFORMATION:
        --version    # Show the program version
        --usage      # Show program usage
        --help       # Print short help message
        --man        # Open full program manual
Full documentation is available at PhyloSoC:phyreport
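The kind of summary described above can be derived from node and edge rows alone: a leaf is any node that never appears as a parent. The sketch below shows this in Python for illustration only. It is not the phyreport.pl code, and the function names, data layout, and report wording are hypothetical.

```python
def tree_report(nodes, edges):
    """Summarize a tree given {node_id: label} and (parent_id, child_id) pairs."""
    parents = {parent for parent, _child in edges}
    leaves = sorted(node for node in nodes if node not in parents)
    return {
        "node_count": len(nodes),
        "leaf_count": len(leaves),
        "leaves": [(node, nodes[node]) for node in leaves],
    }


def format_report(report):
    """Render the summary as plain text lines suitable for an output file."""
    lines = [
        f"Nodes: {report['node_count']}",
        f"Leaf nodes: {report['leaf_count']}",
        "Terminal taxa (node ID, name):",
    ]
    lines += [f"  {node_id}\t{name}" for node_id, name in report["leaves"]]
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = {1: "root", 2: "AB", 3: "A", 4: "B", 5: "C"}
    edges = [(1, 2), (1, 5), (2, 3), (2, 4)]
    print(format_report(tree_report(nodes, edges)))
```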
Modify an existing phylogeny in the database. This will use -x, -c, and -v as command line arguments to indicate remove branch (cut), move branch (copy), and add branch (paste). The add branch function will at first assume that the user is attempting to add an additional tree from an external file source to an existing database. Future development will allow for cut or copy and paste from one tree in the database to another tree. The program will assume that the user knows the ID of the branch that will be removed or added to. All precomputed fields will be set to null following changes in tree topology. By default, the program will attempt to warn the user before making a destructive change; these warnings can be turned off with the quiet flag.
DELETE:
To request a delete query, simply specify a node to cut without providing another node to paste to. This will delete the target node and all child nodes. The node attributes, edges, and edge attributes will also be deleted from the database:
PhyMod -d dbName -u dbUserName -x RemoveNodeID [-h dbHost]
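The delete behaviour described above, removing the target node together with all of its descendants and their edges, can be sketched as follows. This is a Python illustration only, not the phymod.pl implementation; the function name and the in-memory data layout are assumptions, and attribute tables are left out for brevity.

```python
def delete_subtree(nodes, edges, target):
    """Return copies of nodes and edges without `target` and its descendants.

    nodes: {node_id: label}
    edges: list of (parent_id, child_id)
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    # Collect the target and everything below it.
    doomed, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node in doomed:
            continue
        doomed.add(node)
        stack.extend(children.get(node, []))

    kept_nodes = {n: label for n, label in nodes.items() if n not in doomed}
    # Drop every edge that touches a deleted node, including the edge
    # that attached the target to its parent.
    kept_edges = [(p, c) for p, c in edges if p not in doomed and c not in doomed]
    return kept_nodes, kept_edges


if __name__ == "__main__":
    nodes = {1: "root", 2: "AB", 3: "A", 4: "B", 5: "C"}
    edges = [(1, 2), (1, 5), (2, 3), (2, 4)]
    print(delete_subtree(nodes, edges, 2))
```

Note that this sketch leaves the parent of the deleted node in place even if it ends up with a single child; whether such nodes should be collapsed is a separate design decision.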
COPY AND PASTE:
Copy a node from a source tree and place it in the destination tree. If the destination tree name does not exist, a new tree will be created. Note: the source tree name is not required if the node ID is passed, since the node ID is unique in the database.
PhyMod -d dbName -u dbUserName -c SourceNodeID -v DestinationBranchID -t DestinationTreeName [-h dbHost]
CUT AND PASTE:
Cut a node from the source tree and place it in the destination tree. If the destination tree name does not exist, a new tree will be created. The data from the source tree will be deleted.
PhyMod -d dbName -u dbUserName -x CutNodeID -v DestNodeID [-h dbHost]
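A cut and paste within one tree amounts to re-attaching a node to a new parent, after which any precomputed indexes are stale and are reset to null, as stated in the description of this program. The sketch below shows that idea in Python for illustration only. It is not the phymod.pl implementation: the function name and data layout are assumptions, and the guard against pasting a node into its own subtree is an added safety check that the source does not describe.

```python
def cut_and_paste(parent_of, indexes, cut_node, dest_node):
    """Move `cut_node` (with its subtree) under `dest_node`.

    parent_of: {child_id: parent_id}
    indexes:   {node_id: (left, right)} precomputed nested set values
    Returns a new parent map and an index map with every entry reset to None.
    """
    # Walk up from the destination; reaching cut_node would create a cycle.
    node = dest_node
    while node is not None:
        if node == cut_node:
            raise ValueError("destination lies inside the subtree being moved")
        node = parent_of.get(node)

    new_parent_of = dict(parent_of)
    new_parent_of[cut_node] = dest_node
    # Topology changed, so precomputed values are no longer valid.
    cleared = {n: None for n in indexes}
    return new_parent_of, cleared


if __name__ == "__main__":
    parent_of = {2: 1, 5: 1, 3: 2, 4: 2}
    indexes = {1: (1, 10), 2: (2, 7), 3: (3, 4), 4: (5, 6), 5: (8, 9)}
    print(cut_and_paste(parent_of, indexes, 3, 5))
```

After a change like this, the indexes would need to be recomputed before nested set or transitive closure queries can be trusted again.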
    USAGE:
        phymod.pl

    REQUIRED ARGUMENTS:
        --dsn        # The DSN string for the DB connection
        --dbuser     # User name to connect with
        --dbpass     # User password to connect with
        --infile     # Full path to the tree file to import to the db
        --format     # "newick", "nexus" (default "newick")
    ALTERNATIVE TO --dsn:
        --driver     # DB Driver "mysql", "Pg", "Oracle"
        --dbname     # Name of database to use
        --host       # Host to connect with (ie. localhost)
    ADDITIONAL OPTIONS:
        --tree       # Tree name to use
        --quiet      # Run the program in quiet mode.
        --verbose    # Run the program in verbose mode.
    ADDITIONAL INFORMATION:
        --version    # Show the program version
        --usage      # Show program usage
        --help       # Print short help message
        --man        # Open full program manual
Full documentation is available at: PhyloSoC:phymod
phymod.pl - In progress
phymodwork.sql - In progress
The following references are relevant to the goals of this project:
- BioSQL wiki
- BioPERL HOWTO:Trees
- J. Celko. 2004. Joe Celko's Trees and Hierarchies in SQL for Smarties.
- G. Dong, L. Libkin, J. Su, L. Wong. 1999. "Maintaining Transitive Closure of Graphs in SQL". Int. Journal of Information Technology
- L. Nakhleh, D. Miranker, F. Barbancon, W. H. Piel, M. Donoghue. 2003. "Requirements of Phylogenetic Databases". Proceedings of the 3rd IEEE Symposium on Bioinformatics and Bioengineering.
- R.D.M. Page. 2004. "Phyloinformatics: Towards a Phylogenetic Database." p 219-241 in Data Mining in Bioinformatics. Wang et al eds. Springer Berlin Heidelberg.
- R.D.M. Page. 2005. "Towards a Taxonomically Intelligent Phylogenetic Database". Technical Reports in Taxonomy 04-01. Paper presented at Database Issues in Biological Databases, Edinburgh, Jan. 8-9, 2005.
- R.D.M. Page. 2007. "TBMap: A taxonomic perspective on the phylogenetic database TreeBASE". BMC Bioinformatics. 8:158. Additional info: TBMap website