Important tips for future Updates
2009-07-22
- At the very beginning, clean up the IDs of the large FASTA file (the concatenation of UniProt, JGI, and TAIR):
The first word of each ID must not contain pipes (fastacmd screws up when it sees pipes)
- Define this operation formally so that it is reversible: the initial FASTA file must be recoverable without any extra metadata
- In all file names that contain gene family names, pad the names with dashes so that they all have equal length
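The tips above could be sketched roughly as follows. This is only a sketch under assumptions: the notes do not say which encoding to use, so the percent-style escapes (`%7C` for a pipe, `%25` for a literal percent sign) are my own choice, and whether fastacmd accepts `%` in IDs is untested. The function names `encode_header`, `decode_header` and `pad_names` are hypothetical. Padding on the right is inferred from the trailing dashes in file names such as `..._pep---`.

```python
import re

# Matches a FASTA header: ">" + first word (the ID) + everything else.
# DOTALL lets the second group keep the trailing newline untouched.
HEADER = re.compile(r"^>(\S*)(.*)$", re.DOTALL)


def encode_header(line):
    """Remove pipes from the first word of a FASTA header line.

    Assumed scheme (not from the original notes): "%" becomes "%25" and
    "|" becomes "%7C". Escaping "%" first keeps decoding unambiguous, so
    no side file with metadata is needed to undo the change.
    """
    match = HEADER.match(line)
    if match is None:
        return line  # sequence line, leave as-is
    first, rest = match.groups()
    first = first.replace("%", "%25").replace("|", "%7C")
    return ">" + first + rest


def decode_header(line):
    """Inverse of encode_header: recover the original header line."""
    match = HEADER.match(line)
    if match is None:
        return line
    first, rest = match.groups()
    first = first.replace("%7C", "|").replace("%25", "%")
    return ">" + first + rest


def pad_names(names, fill="-"):
    """Right-pad every name with dashes up to the length of the longest."""
    if not names:
        return []
    width = max(len(name) for name in names)
    return [name.ljust(width, fill) for name in names]
```

Only the first word is rewritten, so pipes in the description part of the header stay as they are. Running `decode_header` over every line of the cleaned file should give back the initial file byte for byte.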
Cellwall 2.0 Database
We are currently re-populating the database. See Cellwall-database.
Bioinformatics
Updating a Single Family against new Protein Sequences
See also Arabidopsis-GEN240B-project
Generating a Multiple Alignment of a Family
- Download the FASTA file for one of the families
I chose 3.1.1 Expansins (EXP), so my link was http://bioweb.ucr.edu/Cellwall/family.pl?action=download_Family&format=fasta&family_id=13
I saved it to http://biocluster.ucr.edu/~alevchuk/cellwall/one-family-exercise/expansins-cellwall2005.fasta
- Install T-Coffee 7.81
- Perform the following compute task
t_coffee -output=fasta_aln expansins-cellwall2005.fasta
Completed in 65 minutes. Results are here: http://biocluster.ucr.edu/~alevchuk/cellwall/one-family-exercise/
Building the Markov Model
- Locate the Multiple Alignment file
- Run
hmmbuild expansins-cellwall2005.hmm expansins-cellwall2005.aln.fasta
- Run
hmmcalibrate expansins-cellwall2005.hmm
After calibration, the .hmm file should contain an "EVD" value
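A quick way to check for the "EVD" value without opening the file by hand might look like this. It is a sketch that assumes the HMM was saved in the plain-text HMMER 2 format, where the calibration parameters sit on a line whose first token is `EVD`. The function name `has_evd` is hypothetical.

```python
def has_evd(hmm_path):
    """Return True if some line of the HMM file starts with the EVD tag.

    Assumption: plain-text HMMER 2 save file; a binary HMM file would
    need a different check.
    """
    with open(hmm_path) as handle:
        # split()[:1] is [] for blank lines, so they never match.
        return any(line.split()[:1] == ["EVD"] for line in handle)
```

For example, `has_evd("expansins-cellwall2005.hmm")` should be False right after hmmbuild and True after hmmcalibrate, if the assumption about the file format holds.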
Searching against one large FASTA source
- Run
hmmpfam --acc -E 0.1 -T 0.1 --domE 0.1 expansins-cellwall2005.hmm ~/.html/cellwall/source_2009-04-10_fasta_tair-v20080412_pep--- | ~/.html/cellwall/scripts/filter-hmmpfam-output > 2009-04-10_fasta_tair-v20080412_pep---_results
filter-hmmpfam-output is a simple script that blocks out all the useless output.
View or Download: http://biocluster.ucr.edu/~alevchuk/cellwall/scripts/filter-hmmpfam-output
You have now recovered some family members: http://biocluster.ucr.edu/~alevchuk/cellwall/one-family-exercise/2009-04-10_fasta_tair-v20080412_pep---_results
Looks good. Time to run against all publicly available peptide sequences.
Searching against all large FASTA sources
Put something here
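One possible approach, not yet tried here, is to repeat the single-source command once per source file. The sketch below only builds the command lines so they can be reviewed before anything is run. The helper names are hypothetical, and the results naming (drop the directory and the `source_` prefix, append `_results`) is inferred from the single example above. Paths should be given already expanded (no `~`), because `shlex.quote` would quote the tilde and stop the shell from expanding it.

```python
import os
import shlex


def results_name(source_file):
    """Derive a results file name from a source file name.

    Assumption: mirrors the one example in these notes, where
    source_<name> produced <name>_results.
    """
    base = os.path.basename(source_file)
    if base.startswith("source_"):
        base = base[len("source_"):]
    return base + "_results"


def build_search_command(hmm_file, source_file, filter_script, results_file):
    """Compose the hmmpfam | filter > results pipeline for one source."""
    search = ["hmmpfam", "--acc", "-E", "0.1", "-T", "0.1", "--domE", "0.1",
              hmm_file, source_file]
    return "{} | {} > {}".format(
        " ".join(shlex.quote(arg) for arg in search),
        shlex.quote(filter_script),
        shlex.quote(results_file),
    )


def build_all_commands(hmm_file, source_files, filter_script):
    """Return one shell command string per source file."""
    return [
        build_search_command(hmm_file, src, filter_script, results_name(src))
        for src in source_files
    ]
```

Printing the returned strings gives a list of commands that can be pasted into a shell or a job script one by one.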
Old Webapp management
Tried perl -MCPAN -e "install DBD::Pg" as suggested on http://articles.techrepublic.com.com/5100-10878_11-6039937.html
- Updating bioweb:/etc/perl/CPAN/Config.pm