StarFunc: When Old-School Biology and Deep Learning Had a Baby That Outperformed Both Parents

DeepMind won a Nobel Prize for predicting protein shapes. Meta trained ESM2 on 250 million protein sequences. Google poured resources into AlphaFold databases covering basically every known protein on Earth. And yet, a team of three researchers at the University of Michigan just showed that none of these deep learning heavyweights, on their own, can beat a method that also borrows tricks from the bioinformatics playbook circa 2005.

Meet StarFunc. It came 5th out of 1,625 teams in the world's largest protein function prediction competition. Its secret weapon? Refusing to pick a side.

The Problem Nobody Talks About at AI Conferences

Here's a number that should bother you: we know the sequences of over 240 million proteins. We have experimentally confirmed functions for less than 0.3% of them. That's like having a library with 240 million books where someone has read fewer than 720,000 of them - and you need to figure out what the rest are about by looking at the covers.

Protein function prediction is the task of guessing what a protein actually does in a cell, categorized using the Gene Ontology (GO) - a massive structured vocabulary with roughly 40,000 terms covering molecular activities, biological processes, and cellular locations. It's a multi-label classification problem on steroids, where predicting one label implies all its parent labels in a directed acyclic graph. Fun stuff.
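To make the "one label implies all its parents" rule concrete, here is a minimal sketch of propagating a predicted term up the graph. The term names and edges are a simplified toy hierarchy chosen for illustration, not a real GO file, and `propagate` is a hypothetical helper.

```python
from collections import deque

# Toy child -> parents map. Real GO terms use IDs and have more
# intermediate levels; these names and edges are illustrative only.
PARENTS = {
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["transferase activity"],
    "transferase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}

def propagate(terms, parents):
    """Return the input terms plus every ancestor reachable in the DAG."""
    seen = set(terms)
    queue = deque(terms)
    while queue:
        term = queue.popleft()
        for parent in parents.get(term, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

labels = propagate({"kinase activity"}, PARENTS)
print(sorted(labels))
# ['catalytic activity', 'kinase activity', 'molecular_function', 'transferase activity']
```

Note that the more specific child term, "protein kinase activity", is not implied: the rule only runs upward.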

Two Tribes, One Problem

The field has split into two camps. The template crowd says: "If a protein looks like something we already know, it probably does the same thing." They use sequence homology, structural similarity, protein-protein interaction networks, and domain family databases. Solid reasoning. Works great until you hit an orphan protein with no known relatives - roughly 30-40% of all sequences.
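The template idea can be sketched in a few lines. This assumes a search has already returned templates with a similarity score each, and it scores every GO term by the similarity-weighted share of templates that carry it. That weighting is one common scheme, used here as an assumption; it is not necessarily the formula StarFunc uses. The term IDs are placeholders.

```python
from collections import defaultdict

def transfer_annotations(hits):
    """Score each GO term by the similarity-weighted share of hits carrying it.

    `hits` is a list of (similarity, set_of_terms) pairs, one per template.
    Returns {} for an orphan query with no usable hits.
    """
    total = sum(sim for sim, _ in hits)
    if total == 0:
        return {}
    scores = defaultdict(float)
    for sim, terms in hits:
        for term in terms:
            scores[term] += sim / total
    return dict(scores)

# Hypothetical search results: (similarity, annotations of the template).
hits = [
    (0.9, {"GO:A", "GO:B"}),
    (0.6, {"GO:A"}),
    (0.5, {"GO:C"}),
]
scores = transfer_annotations(hits)
print(scores)
print(transfer_annotations([]))  # orphan protein: nothing to transfer
```

The empty result for an orphan protein is exactly the failure mode described above: with no relatives, a template method has nothing to say.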

The deep learning crowd says: "Just throw a protein language model at it." Methods like ESM2 and ProtT5 learn patterns from millions of sequences and can make predictions even for those loner proteins. Impressive, but they sometimes miss rare functions and struggle with the long tail of GO terms that appear in only a handful of training examples.

Zhang, Liu, and Freddolino looked at both camps and said: "Why not both?" (Zhang et al., 2025).

Five Ingredients, One Random Forest to Rule Them All

StarFunc runs five independent prediction pipelines and combines them using random forest classifiers:

Structural threading against the AlphaFold database and BioLiP - finding proteins with similar 3D shapes and stealing their annotations
Sequence homology via searches against UniProt-GOA - the classic "this protein looks like that protein" approach
Protein-protein interaction partners from the STRING database - guilt by association
Pfam domain families - identifying functional building blocks within the protein
InterLabelGO+, a deep learning model using ESM2 embeddings with a clever loss function that handles the messy reality of imbalanced, interdependent GO labels (Liu & Zhang, 2024)
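One way to picture the fusion step: for each candidate GO term, collect the score from each of the five pipelines into a single feature row. The sketch below shows only that assembly step, under the assumption that every component emits a per-term score and that a missing score counts as 0.0. The component names, the `build_features` helper, and the placeholder term IDs are all hypothetical, and the actual StarFunc feature set may differ.

```python
# Order of the five component scores in each feature row (assumed names).
COMPONENTS = ["structure", "sequence", "ppi", "pfam", "deep_learning"]

def build_features(component_scores, terms):
    """Return one row of five scores per GO term, in COMPONENTS order.

    A component with no score for a term (for example, no template was
    found) contributes 0.0.
    """
    return {
        term: [component_scores.get(name, {}).get(term, 0.0) for name in COMPONENTS]
        for term in terms
    }

# Hypothetical per-component scores for one query protein.
component_scores = {
    "structure": {"GO:A": 0.8},
    "sequence": {"GO:A": 0.7, "GO:B": 0.2},
    "ppi": {"GO:B": 0.4},
    "pfam": {},
    "deep_learning": {"GO:A": 0.6, "GO:B": 0.5},
}
rows = build_features(component_scores, ["GO:A", "GO:B"])
print(rows["GO:A"])  # [0.8, 0.7, 0.0, 0.0, 0.6]
print(rows["GO:B"])  # [0.0, 0.2, 0.4, 0.0, 0.5]
```

Rows like these, paired with a 0/1 label from known annotations, are the kind of input a random forest classifier could be trained on to produce one consensus score per term.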

The structural component is where things get interesting. Before AlphaFold predicted structures for 214 million proteins and earned Demis Hassabis and John Jumper the 2024 Nobel Prize in Chemistry (Jumper et al., 2021), structure-based function prediction was limited to around 200,000 experimentally solved structures. Now StarFunc can perform structural threading at genomic scale - something that would have sounded ridiculous five years ago.

The Scoreboard Doesn't Lie

StarFunc was tested in CAFA5, the fifth Critical Assessment of Functional Annotation - basically the Olympics of protein function prediction, hosted on Kaggle with 1,625 teams from 96 countries. StarFunc placed 5th overall. Separately, in independent benchmarks, its weighted F-measure (Fmax) ran 12% higher than that of the second-best approach tested. For context, the top CAFA5 finisher was GOCurator from Fudan University (also an ensemble method), and second-place ProtBoost from Institut Curie combined protein language model embeddings with gradient boosting (Chervov et al., 2024).
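For readers unfamiliar with Fmax, here is a minimal sketch of the plain, unweighted, protein-centric version: sweep a score threshold, average precision over proteins that have at least one prediction at that threshold, average recall over all proteins, and keep the best F-measure. The weighted variant mentioned above additionally weights each term by how informative it is, which this sketch leaves out. The function name, input layout, and term IDs are assumptions made for illustration.

```python
def fmax(predictions, truth, thresholds=None):
    """Unweighted protein-centric Fmax.

    predictions: {protein: {term: score in [0, 1]}}
    truth:       {protein: set of true terms}, every set non-empty
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with predictions
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best

# Toy data with placeholder term IDs.
predictions = {"P1": {"GO:A": 0.9, "GO:B": 0.4}, "P2": {"GO:A": 0.3}}
truth = {"P1": {"GO:A"}, "P2": {"GO:A", "GO:C"}}
print(fmax(predictions, truth))  # 0.75
```

In this toy case the best trade-off comes at low thresholds, where both proteins get predictions and precision and recall both average 0.75.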

Here's the kicker: StarFunc's standalone deep learning component, InterLabelGO+, placed 6th by itself. Adding the template-based pipelines bumped it up to 5th and dramatically improved accuracy. The templates aren't dead weight from a bygone era - they're carrying information that neural networks simply don't capture on their own.

Why This Matters Beyond Benchmarks

If you're thinking "cool, another leaderboard paper," consider this: understanding what proteins do is fundamental to drug discovery, disease understanding, and synthetic biology. Tools like mapb2.io can help researchers visually map out the complex relationships between protein functions, but the raw predictions still need to come from somewhere. StarFunc provides pre-computed predictions for the entire human reference proteome and offers a free web server where you can submit your own protein structures.

The deeper lesson is one machine learning keeps relearning across domains: ensemble approaches that combine fundamentally different information sources tend to beat any single paradigm. A review of deep learning methods for protein function prediction (Boadu et al., 2025) and evaluations of protein language model strategies (Frontiers in Bioengineering, 2025) both point to the same conclusion - the best results come from methods that don't put all their eggs in one basket.

The Bottom Line

StarFunc isn't flashy. It doesn't have a catchy brand name backed by a trillion-dollar company. It's three people at a university combining proven bioinformatics techniques with modern deep learning, glued together by random forests. And it works better than approaches that cost orders of magnitude more to develop.

Sometimes the right answer isn't choosing between the old way and the new way. It's making them work a shift together.

References

Zhang, C., Liu, Q., & Freddolino, L. (2025). StarFunc: Fusing Template-based and Deep Learning Approaches for Accurate Protein Function Prediction. Genomics, Proteomics & Bioinformatics. DOI: 10.1093/gpbjnl/qzag018. PMID: 41942113
Liu, Q. & Zhang, C. (2024). InterLabelGO+: Unraveling Label Correlations in Protein Function Prediction. Bioinformatics, 40(11), btae655. DOI: 10.1093/bioinformatics/btae655. PMID: 39499152
Jumper, J. et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583-589. DOI: 10.1038/s41586-021-03819-2
Chervov, A. et al. (2024). ProtBoost: Protein Function Prediction with Py-Boost and Graph Neural Networks. arXiv: 2412.04529
Kulmanov, M. et al. (2024). Protein Function Prediction as Approximate Semantic Entailment. Nature Machine Intelligence, 6, 220-228. DOI: 10.1038/s42256-024-00795-w
Boadu, F. et al. (2025). Deep learning methods for protein function prediction. Proteomics. DOI: 10.1002/pmic.202300471. PMID: 38996351
