Full_lengther_NEXT v0.0.8 database setup and run
2015/11/13 02:27:22
Full_lengther_next is an excellent program for testing whether the sequences in your EST assembly are full-length, and for estimating what percentage of them are.
For the first step of the analysis, you need a database for the program to search with BLAST+. The built-in script ‘download_fln_dbs.rb’ is designed to set up the database for you. Basically, the script downloads the taxonomy database from UniProt, the splice database from UniProt, and the non-coding RNA database from SCBI, filters out the non-full-length sequences, and builds the BLAST+ databases. Unfortunately, the script may fail with Net::FTP or gzip errors.
I ran ‘download_fln_dbs.rb’ and got an error like:
550 Permission denied. (Net::FTPPermError)
This is probably due to an error in either Ruby's Net::FTP or the UniProt FTP server. (Sorry, I am a PERLer, NOT a RUBYer.)
And if you get a gzip error like:
gzip *.gz not found
you probably need to change your environment variable BLASTDB in ~/.bashrc to a format like:
export BLASTDB=/path/to/your/blastdb/
The trailing ‘/’ is needed so that the script looks for the /path/to/your/blastdb/*.gz files downloaded from UniProt; without it, it will look for /path/to/your/blastdb*.gz files.
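To see why the trailing slash matters, here is a minimal sketch. It assumes the script builds the file pattern by plain string concatenation of $BLASTDB and ‘*.gz’, which is what the error suggests but which I have not checked in the Ruby source. The function name normalize_blastdb is hypothetical and not part of Full_lengther_next.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Full_lengther_next): return a directory
# path that ends in exactly one "/".
normalize_blastdb() {
    local path="$1"
    # Strip any trailing slashes, then add a single one back.
    while [ "${path%/}" != "$path" ]; do
        path="${path%/}"
    done
    printf '%s/\n' "$path"
}

# Assumed behaviour: plain concatenation of the directory and a file pattern.
without_slash="/path/to/your/blastdb"
with_slash="$(normalize_blastdb "$without_slash")"
echo "${without_slash}*.gz"   # /path/to/your/blastdb*.gz   (next to the directory)
echo "${with_slash}*.gz"      # /path/to/your/blastdb/*.gz  (inside the directory)
```

Something like export BLASTDB="$(normalize_blastdb "$HOME/blastdb")" in ~/.bashrc would then always give the format with the trailing slash.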
OK, let’s set things up semi-manually:
1. Set up the environment variable BLASTDB if you have NOT done so already
To test whether you already have one:
$ echo $BLASTDB
If it returns a path like /path/to/your/blastdb/ (with the trailing ‘/’), skip this step
$ vim ~/.bashrc
#press i or a to enter insert mode
#add one more line at the end of the file
#NOTE: change /path/to/your/blastdb/ to the actual path on your machine, e.g. $HOME/blastdb/
export BLASTDB=/path/to/your/blastdb/
#press Esc and type :wq in vim to save the changes
#log out and back in, or run source ~/.bashrc, for the change to take effect
#Just running ‘export BLASTDB=/path/to/blastdb’ in the shell seemed to have no effect, no idea why. I know the script reads ENV[‘BLASTDB’], but it still used the path set in ~/.bashrc
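One way to narrow down the export puzzle is to look at what a child process actually inherits, since the Ruby script only sees exported variables. This is a minimal sketch, and the function name blastdb_seen_by_child is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of BLASTDB as a child process sees it.
# A variable that is set in the shell but not exported shows up as <unset>.
blastdb_seen_by_child() {
    sh -c 'printf "%s\n" "${BLASTDB:-<unset>}"'
}

# Example:
#   BLASTDB=/path/to/your/blastdb/   # set, but not exported
#   blastdb_seen_by_child            # prints <unset>
#   export BLASTDB
#   blastdb_seen_by_child            # prints /path/to/your/blastdb/
```

If this prints the path you expect, the script should receive the same value when started from that shell.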
2. Download the necessary files from UniProt:
$ mkdir -p $BLASTDB
#download into $BLASTDB, which is where the uniprot*.gz files are unpacked later
$ cd $BLASTDB
$ wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_trembl_plants.dat.gz
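Since a half-finished download is one possible source of gzip trouble later on, it may be worth testing the archive before going further. A minimal sketch using gzip -t, which reads the file without unpacking it; the function name check_gz is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each given .gz file passes gzip's
# built-in integrity test (gzip -t reads the file but writes nothing).
check_gz() {
    local f status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "BROKEN  $f"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   check_gz "$BLASTDB"/uniprot*.gz
```

If a file is reported as BROKEN, running the wget command again with the -c option should resume the download instead of starting over.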
#set up the non-coding RNA database
$ mkdir $BLASTDB/nc_rna_db
$ cd $BLASTDB/nc_rna_db
$ wget http://www.scbi.uma.es/downloads/FLNDB/ncrna_fln_100.fasta.zip
$ unzip ncrna_fln_100.fasta.zip
$ mv ncrna_fln_100.fasta ncrna_fln_100.fasta.oldID
#change the seqids
$ perl -ne 'BEGIN {$seqid="ncrna0000000001";} if (/^>/) {chomp; s/^>//; $_=">".$seqid." ".$_; print $_, "\n"; $seqid++;} else {print;}' ncrna_fln_100.fasta.oldID > ncrna_fln_100.fasta
#make the BLAST DB
$ makeblastdb -in ncrna_fln_100.fasta -dbtype nucl -parse_seqids
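If you prefer not to use Perl, the same renumbering can be sketched with awk. This is an alternative under the same assumption as the one-liner above (every header gets a new ID of the form ncrna plus ten digits, with the old header kept as the description); the function name renumber_fasta is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite FASTA headers as ">ncrna0000000001 <old header>",
# ">ncrna0000000002 <old header>", and so on. Sequence lines pass through unchanged.
renumber_fasta() {
    awk '/^>/ { printf(">ncrna%010d %s\n", ++n, substr($0, 2)); next }
         { print }' "$1"
}

# Example:
#   renumber_fasta ncrna_fln_100.fasta.oldID > ncrna_fln_100.fasta
```

The new IDs contain only letters and digits, which should keep -parse_seqids from tripping over special characters in the original headers.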
3. Find download_fln_dbs.rb and edit it
The script may not be found with the which command
#it is located in either /var/lib/gems//gems/full_lengther_next-0.0.8/bin/ or $HOME/.gem/ruby//gems/full_lengther_next-0.0.8/bin/
#open it with an editor:
#go to line 202
my_array = ["human","fungi","invertebrates","mammals","plants","rodents","vertebrates"]
#add a # at the start of the line to make it inactive:
#my_array = ["human","fungi","invertebrates","mammals","plants","rodents","vertebrates"]
#and edit the following line:
my_array = ["plants","human"]
#remove the # if the line is commented out, and drop the species you do not need, like:
my_array = ["plants"]
I used plants, and downloaded the plants UniProt database as shown in the download step (step 2)
#comment out the following line, so that it reads:
#conecta_uniprot(my_array, formatted_db_path)
#this line is used to download the UniProt database, and it usually reports a Net::FTP error
#and edit the following line to:
system('gunzip ' + File.join(formatted_db_path, 'uniprot*.gz'))
#Haha, just learning Ruby. Ruby looks like Python. This fixes the gzip uncompress error and avoids uncompressing all the .gz files in your $BLASTDB
#find the line that downloads the non-coding RNA database
#and comment it out with #
#this deactivates the download of the non-coding RNA database, for which makeblastdb could not create the BLAST+ database; I guess some special characters in the seqids are to blame
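Before making the edits described in this step, it may help to locate the script without which and to keep an untouched copy of it. A minimal sketch with find and cp; both function names are hypothetical, and the directories in the example are only the two gem locations mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: search the given directories for download_fln_dbs.rb
# and print every match.
find_fln_script() {
    find "$@" -type f -name 'download_fln_dbs.rb' 2>/dev/null
}

# Hypothetical helper: keep an untouched copy next to the script before editing.
backup_script() {
    cp -p "$1" "$1.orig"
}

# Example (adjust the directories to your machine):
#   find_fln_script /var/lib/gems "$HOME/.gem"
#   backup_script /full/path/printed/above/download_fln_dbs.rb
```

With the .orig copy in place, diff can show exactly which lines were changed, and the original script is easy to restore.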
Then run download_fln_dbs.rb:
$ download_fln_dbs.rb
You will have 3 folders:
$ ls $BLASTDB/tr_plants
tr_plants.fasta tr_plants.fasta.00.pog tr_plants.fasta.00.psq tr_plants.fasta.01.pog tr_plants.fasta.01.psq
tr_plants.fasta.00.phr tr_plants.fasta.00.psd tr_plants.fasta.01.phr tr_plants.fasta.01.psd tr_plants.fasta.pal
tr_plants.fasta.00.pin tr_plants.fasta.00.psi tr_plants.fasta.01.pin tr_plants.fasta.01.psi
$ ls $BLASTDB/sp_plants
sp_plants.fasta sp_plants.fasta.pin sp_plants.fasta.psd sp_plants.fasta.psq
sp_plants.fasta.phr sp_plants.fasta.pog sp_plants.fasta.psi
$ ls $BLASTDB/nc_rna_db
ncrna_fln_100.fasta ncrna_fln_100.fasta.nin ncrna_fln_100.fasta.nsd ncrna_fln_100.fasta.nsq
ncrna_fln_100.fasta.nhr ncrna_fln_100.fasta.nog ncrna_fln_100.fasta.nsi
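Rather than comparing the listings by eye, the presence of each database can be checked with a few lines of shell. This is a minimal sketch based only on the file names in the listings: a single-volume database has an index file (.pin for protein, .nin for nucleotide), and a multi-volume one has an alias file (.pal or .nal). The function name has_blast_db is hypothetical, and the check says nothing about whether the contents are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that BLAST+ database files exist for a FASTA file.
#   $1 = path to the FASTA file that was given to makeblastdb
#   $2 = "prot" or "nucl"
# Succeeds if the index file of a single-volume database (.pin / .nin) or the
# alias file of a multi-volume database (.pal / .nal) is present.
has_blast_db() {
    local prefix="$1" letter
    case "$2" in
        prot) letter=p ;;
        nucl) letter=n ;;
        *) echo "usage: has_blast_db <fasta> prot|nucl" >&2; return 2 ;;
    esac
    [ -e "${prefix}.${letter}in" ] || [ -e "${prefix}.${letter}al" ]
}

# Example:
#   has_blast_db "${BLASTDB}tr_plants/tr_plants.fasta" prot && echo "tr_plants OK"
#   has_blast_db "${BLASTDB}nc_rna_db/ncrna_fln_100.fasta" nucl && echo "ncRNA OK"
```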
Run full_lengther_next as normal:
$ full_lengther_next -fasta $fastafile -taxon_group plants
I am still testing, but at least the databases were created successfully. I am waiting to see whether any further errors turn up, and will update this post if there is anything new.
Aha… Have fun… Hope it helps.