Building Multiclass Classifiers for Remote Homology Detection and Fold Recognition
By Huzefa Rangwala (rangwala AT cs . umn dot edu) & George Karypis (karypis@ cs. umn dot edu)
sf95 and sf40 are set up for remote homology detection (RH), whereas
fd25 and fd40 are set up for the fold recognition problem (FD).
Dataset | Sequences in FASTA format | Sequence Identifiers for dataset* | Test Set Identifiers | Class Definitions |
sf40 | SCOP v 1.67 (Astral 40%) | Identifiers (1119 sequences) | Identifiers (238 sequences) | RH Identifiers (37 superfamilies) |
fd25 | SCOP v 1.67 (Astral 25%) | Identifiers (1294 sequences) | Identifiers (278 sequences) | FD Identifiers (25 folds) |
fd40 | SCOP v 1.67 (Astral 40%) | Identifiers (1651 sequences) | Identifiers (344 sequences) | FD Identifiers (27 folds) |
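As a rough illustration of how the files listed above might be combined, the following Python sketch parses FASTA records and separates them into training and test sets using a test-set identifier list. It is only a sketch under stated assumptions: the function names `parse_fasta` and `split_by_test_ids` are hypothetical, the first whitespace-delimited token of each FASTA header is assumed to match the identifiers in the identifier files, and the test identifiers are assumed to be a subset of the dataset identifiers. The sequences and identifiers in the example are made up for illustration and do not come from the datasets.

```python
from typing import Dict, Iterable, Set, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier to its sequence.

    Assumption: the identifier is the first whitespace-delimited
    token after '>' on the header line.
    """
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("FASTA header without an identifier")
            current = tokens[0]
            records[current] = ""
        elif current is not None:
            # Sequence data may span several lines; concatenate them.
            records[current] += line
    return records


def split_by_test_ids(
    records: Dict[str, str], test_ids: Set[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (train, test) dictionaries based on the test identifier set."""
    train = {k: v for k, v in records.items() if k not in test_ids}
    test = {k: v for k, v in records.items() if k in test_ids}
    return train, test


# Made-up example data (not taken from the actual datasets).
example_fasta = [
    ">seqA hypothetical description",
    "ACDE",
    "FG",
    ">seqB",
    "KLMN",
]
example_records = parse_fasta(example_fasta)
example_train, example_test = split_by_test_ids(example_records, {"seqB"})
```

To apply this to the downloaded files, one would pass an open file handle to `parse_fasta` and build the test identifier set by reading the test-set identifier file one identifier per line, assuming that is how the file is laid out.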
CODES
The programs we used in this study were