Full_lengther_NEXT v0.0.8 database setup and run

2015/11/13 02:27:22

Full_lengther_next is an excellent program to test if you EST assembly are full-length or not, or test how much percentage it can get.

for the 1st step of analysis, you probably needs a database for the program to blast+. the built-in script ‘download_fln_dbs.rb’ is designed to setup the database for you. Basically, the script would download taxonomy database from uniprot, splice database from uniprot, and non-coding RNA database from SCBI, filter out the non-full-length sequences, and make BLAST+ database for BLAST+ program. Unfortunately, the script would report NET::FTP or gzip problems.

I ran ‘download_fln_dbs.rb’, and got error like:

550 Permission denied. (Net::FTPPermError)

This is probably duo to errors either from ruby NET::FTP and uniprot FTP sever. (Sorry I am a PERLer, NOT a RUBYer).

and if you get a gzip error,

gzip *.gz not found

probably you need to change you environment variable BLASTDB in ~/.bashrc to a format like:

BLASTDB=/path/to/your/blastdb/

The last letter ‘/’is needed to find /path/to/your/blastdb/.gz files downloaded from uniprot, or else if will find /path/to/your/blastdb.gz files.

 

OK, let’s start to setup half-manually:

 

1. Setup environment variable BLASTDB if you did NOT do this

to test if you already had one:

$ echo $BLASTDB

If it returns a path like /path/to/your/blastdb, ignore this step

$ vim ~/.bashrc

#add one more line at the end of file

#press letter i or a

#NOTE: change /path/to/your/blastDB/ to a specified path depending on your machine, like: $HOME/blastdb

export BLASTDB=/path/to/your/blastDB/

#Esc and :wq in vim to save the changes

#log out-and-in or use source ~/.bashrc to take effect

#It seems no effect just using shell ‘export BLASTDB=/path/to/blastdb, no idea why. I knew it will detect $ENV[‘BLASTDB’], but it still  shows the path in ./.bashrc

2. Download necessary files from uniprot:

$ mkdir –p $BLASTDB

$ cd $BLASTDB

$ wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_trembl_plants.dat.gz

$ wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_plants.dat.gz

$ wget ftp://ftp.uniprot.org//pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot_varsplic.fasta.gz

#setup non-coding database

$ mkdir $BLASTDB/nc_rna_db

$ cd $BLASTDB/nc_rna_db

$ wget http://www.scbi.uma.es/downloads/FLNDB/ncrna_fln_100.fasta.zip

$ unzip ncrna_fln_100.fasta.zip

$ mv ncrna_fln_100.fasta ncrna_fln_100.fasta.oldID

#change seqids

$ perl -ne ‘BEGIN {$seqid=ncrna0000000001;} if (/^>/) {chomp; s/^>//; $=”>”.$seqid.” “.$; print $_, “\n”;$seqid++;}else {print;}’ ncrna_fln_100.fasta.oldID > ncrna_fln_100.fasta

make blast DB

$ makeblastdb -in ncrna_fln_100.fasta -dbtype nucl -parse_seqids

3. find download_fln_dbs.rb  and edit

The script should not be detected using which cmd

#it either locates in /var/lib/gems//gems/full_lengther_next-0.0.8/bin/ or $HOME/.gem/ruby//gems/full_lengther_next-0.0.8/bin/

#open it with editor:

#go to line 202

my_array = [“human”,”fungi”,”invertebrates”,”mammals”,”plants”,”rodents”,”vertebrates”]

#Add # from left, to make it inactive

#my_array = [“human”,”fungi”,”invertebrates”,”mammals”,”plants”,”rodents”,”vertebrates”]

#and edit the following line:

my_array = [“plants”,”human”]

#Remove # and unnecessary species, like:

my_array = [“plants”]

I used PLANTS and download PLANTS uniprot database as showed in DOWNLOAD section

comment the following line to:

#conecta_uniprot(my_array, formatted_db_path)

#This line it used to down load the uniprot database, which usually report some NET::FTP error

#and edit following line to:

system(‘gunzip ‘File.join(formatted_db_path, ‘uniprot*.gz’))

#Haha, Just learning RUBY. Ruby looks like python. That will fix the gzip uncompress error and avoid to uncompress all the GZ files in your $BLASTDB

#

#find line below:

download_ncrna(formatted_db_path)

#and comment with #, to

#download_ncrna(formatted_db_path)

#this will inactivate the downloading of NON-CODING RNA database, which can not successfully create the BLAST+ database using makeblastdb, guess some special letters in seqids

And then execute download_fln_dbs.rb

$ download_fln_dbs.rb

You will have 3 folders:

$ ls $BLASTDB/tr_plants

tr_plants.fasta tr_plants.fasta.00.pog tr_plants.fasta.00.psq tr_plants.fasta.01.pog tr_plants.fasta.01.psq

tr_plants.fasta.00.phr tr_plants.fasta.00.psd tr_plants.fasta.01.phr tr_plants.fasta.01.psd tr_plants.fasta.pal

tr_plants.fasta.00.pin tr_plants.fasta.00.psi tr_plants.fasta.01.pin tr_plants.fasta.01.psi

$BLASTDB/sp_plants

sp_plants.fasta sp_plants.fasta.pin sp_plants.fasta.psd sp_plants.fasta.psq

sp_plants.fasta.phr sp_plants.fasta.pog sp_plants.fasta.psi

$BLASTDB/nc_rna_db

ncrna_fln_100.fasta ncrna_fln_100.fasta.nin ncrna_fln_100.fasta.nsd ncrna_fln_100.fasta.nsq

ncrna_fln_100.fasta.nhr ncrna_fln_100.fasta.nog ncrna_fln_100.fasta.nsi

Run full_length_next as normal:

$ full_lengther_next -fasta $fastafile -taxon_group plants

I am still testing, at least I had database successfully created. Waiting for further error report. Will Update this post once have any updates

Aha… Have fun….Hope it helps