Version 1.0
Release Date: 03-28-06

Authors:
   George Karypis (karypis AT cs.umn.edu)
   Huzefa Rangwala (rangwala AT cs.umn.edu)

LICENSE: Please read LICENSE.txt

Please Cite:
   Huzefa Rangwala and George Karypis, Profile-based direct kernels for
   remote homology detection and fold recognition.
   Bioinformatics 2005 21(23):4239-4247

DOCUMENTATION

Using the profile-based kernels for remote homology detection and fold
recognition involves a series of steps, and this directory contains the
code for each of them. The end result of these steps is a precomputed
kernel matrix that you can plug into an SVM for discriminative learning.

   1. Generating profiles for a set of sequences.
   2. Running aasim1.0 to compute similarity matrices.
   3. Running a Matlab script to convert the matrix into a valid kernel
      matrix.

----------------------------------------------------------------------
Step 1.0  Running the modified psiblast code

Note that our kernel computation program takes a modified PSSM, not the
one generated by standard psiblast. We currently provide binaries of
this code in ./psi-blast-bin/. The options are similar to those of the
standard PSI-BLAST code; see the NCBI website for details.

We generate our PSSMs with:

   blastpgp -d nr -j 3 -i $infile -Q $pssmfile -o $outfile -J

We will make this code available in the future.

----------------------------------------------------------------------
Step 2.0  Building and running aasim1.0

To build:
   Enter GKlib (make)
   Enter aasim (make)
   * Executable will be in the bin directory *

To run aasim1.0 we need a directory with the precomputed alignments from
Step 1.0 and an input file that lists the IDs of these sequence PSSMs.
Please look at the example in the test directory: the file test.all
holds the test IDs, and ./alignments/ is the alignment directory.

Usage: aasim [options] infile

 Required parameters
    infile
        Stores the filenames of the sequences in the database.

 Optional parameters
    -simtype=string
        Specifies the method to be used for computing similarity between
        the sequences.
        The possible values are:
           kmer-pssm    - An AF-PSSM model
           sw-pssm      - An SW-based model using a PSSM scoring matrix
           bkmer-pssm   - A BF-PSSM model
           abkmer-pssm  - A BV-PSSM model

    -wmer=int
        Specifies the length of the wmer. The default value is 2.

    -gocost=float
        Specifies the gap opening cost. The default value is 10.

    -gecost=float
        Specifies the gap extension cost. The default value is 2.

    -shift=float
        Specifies the value to be added to the p2pscore. Default is 0.0.

    -help
        Prints this message.

----------------------------------------------------------------------
Example

   aasim1.0 -simtype=sw-pssm -gocost=3.0 -gecost=0.75 -shift=1.5 -sntype=none test.all

will run the SW-PSSM alignment with gap opening and extension costs of
3.0 and 0.75, respectively. The shift parameter will be set to 1.5. All
the other options are enabled as well.

----------------------------------------------------------------------
Step 3.0  Converting the matrix to a valid kernel matrix

Run the Matlab script in ./aasim/matlab to convert the matrix generated
by aasim1.0 into a valid kernel matrix.

----------------------------------------------------------------------
Trouble, bugs, or suggestions? Contact rangwala@cs.umn.edu
----------------------------------------------------------------------
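
Sketch: chaining Step 1.0 and Step 2.0

The commands from Step 1.0 and the Example can be strung together for a
whole list of sequences. The following is a minimal dry-run sketch, not
part of the distribution: it only prints the commands so they can be
inspected before anything is run. The function name print_pipeline, the
one-ID-per-line layout of the list file, and the <id>.fasta, <id>.pssm
and <id>.out file names are assumptions made for illustration; adapt
them to your own directory layout.

```shell
#!/bin/sh
# Dry-run sketch: print the Step 1.0 and Step 2.0 commands for a list of IDs.
# Assumptions (hypothetical, not from this README): the list file holds one
# sequence ID per line, FASTA input lives at <seq_dir>/<id>.fasta, and the
# PSSM and report are written to <aln_dir>/<id>.pssm and <aln_dir>/<id>.out.

print_pipeline() {
    ids_file=$1   # list of sequence IDs, e.g. test.all
    seq_dir=$2    # hypothetical directory holding the FASTA files
    aln_dir=$3    # alignment directory, e.g. ./alignments

    # Step 1.0: one blastpgp command per ID, with the flags given in Step 1.0.
    while IFS= read -r id || [ -n "$id" ]; do
        [ -z "$id" ] && continue   # skip blank lines
        echo "blastpgp -d nr -j 3 -i ${seq_dir}/${id}.fasta -Q ${aln_dir}/${id}.pssm -o ${aln_dir}/${id}.out -J"
    done < "$ids_file"

    # Step 2.0: a single aasim1.0 command over the whole list,
    # using the option values from the Example section.
    echo "aasim1.0 -simtype=sw-pssm -gocost=3.0 -gecost=0.75 -shift=1.5 -sntype=none ${ids_file}"
}
```

Calling "print_pipeline test.all ./fasta ./alignments" prints one
blastpgp line per ID followed by one aasim1.0 line. Printing instead of
executing keeps the sketch safe to try on a machine where the binaries
or the nr database are not installed.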