Heng Yang

Research Areas

Exploring the frontiers of AI, Genomics, and Software Engineering

Large Language Models

Building LLM frameworks such as InstOptima for multi-objective instruction evolution and RNADesign-GRPO for reinforcement learning-based RNA sequence design.

LLM, Instruction-tuning, InstOptima, Genetic Algorithms

Genomic Foundation Models

Advancing AI4Science research by developing Genomic Foundation Models (e.g., OmniGenome) that address challenges such as sequence sparsity and single-nucleotide variants (SNVs), significantly improving RNA structure prediction and sequence design.

AI4Science, Genomics, OmniGenome, RNA Design

Aspect-Based Sentiment Analysis

Creator of PyABSA, a widely used open-source toolkit for fine-grained sentiment analysis that supports over 30 models and datasets and simplifies the research-to-application pipeline.

NLP, PyABSA, Sentiment Analysis, Open Source

Adversarial ML & Fairness

Investigating the intersection of adversarial robustness and model fairness, showing that adversarial attacks can mitigate bias and that adversarial training can form a Pareto front between accuracy and fairness.

Robustness, Fairness, Adversarial Attack, AdvFairness

Software Defect Prediction

Developing the LMDP framework, which leverages Pre-trained Language Models for more accurate, line-level software defect prediction, outperforming traditional AST/GNN-based methods.

Code PLM, Defect Prediction, Software Engineering, LMDP

Text Data Augmentation

Author of BoostAug, a novel text augmentation technique that uses global feature distribution and instance filtering to consistently improve performance across various NLP tasks.

Data Augmentation, BoostAug, NLP, ACL 2023

Selected Publications

Contributing to top-tier conferences and journals

2025

Bridging Sequence-Structure Alignment in RNA Foundation Models

Heng Yang, Ke Li

AAAI 2025 (CCF-A)

2024

PlantRNA-FM: An Interpretable RNA Foundation Model for Exploring Functional RNA Motifs in Plants

Haopeng Yu#, Heng Yang#, Wenqing Sun, Zongyun Yan, Xiaofei Yang, Huakun Zhang, Yiliang Ding, Ke Li

Nature Machine Intelligence (Co-first Author)

2024

MP-RNA: Unleashing Multi-species RNA Foundation Model via Calibrated Secondary Structure Prediction

Heng Yang, Ke Li

EMNLP 2024 (CCF-B)

2024

The Best Defense is Attack: Repairing Semantics in Textual Adversarial Examples

Heng Yang, Ke Li

EMNLP 2024 (CCF-B)

2024

DaNuoYi: Evolutionary Multi-Task Injection Testing on Web Application Firewalls

Ke Li#, Heng Yang#, Willem Visser

IEEE Transactions on Software Engineering (CCF-A)

2024

Modeling Aspect Sentiment Coherency via Local Sentiment Aggregation

Heng Yang, Ke Li

EACL 2024 (CORE-A)

2024

OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking

Heng Yang, Jack Cole, Ke Li

arXiv Preprint

2024

Robustness Meets Fairness: Investigating Adversarial Attack Effects on Alleviating Model Bias

Heng Yang, Geyong Min, Ke Li

OpenReview Preprint

2023

InstOptima: Evolutionary Multi-objective Instruction Optimization via LLM-based Instruction Operators

Heng Yang, Ke Li

EMNLP 2023 (CCF-B)

2023

BoostAug: Boosting Text Augmentation via Hybrid Instance Filtering Framework

Heng Yang, Ke Li

ACL 2023 (CCF-A)

2023

PyABSA: A Modularized Framework for Reproducible Aspect-Based Sentiment Analysis

Heng Yang, Chen Zhang, Ke Li

CIKM 2023 (CCF-B)

2020

A Multi-task Learning Model for Chinese-oriented Aspect Polarity Classification and Aspect Term Extraction

Heng Yang, Biqing Zeng

Neurocomputing (CCF-C)

Preprint

Tokenization or Featurization? Leveraging Language Models for Code Defect Prediction

Heng Yang, Ke Li

Preprint

Publication Venues

Nature Machine Intelligence, ACL, EMNLP, CIKM, EACL, IEEE Transactions on Software Engineering, Neurocomputing

Featured Hugging Face Models & Spaces

State-of-the-art models and interactive demos

150+ Likes & Favorites

deberta-v3-base-absa-v1.1

Aspect-Based Sentiment Analysis

A top-performing, lightweight model for fine-grained sentiment analysis. Featured in the Stanford AI Index Report 2022 as the leading open-source ABSA model.

Hugging Face: 1,900,000+ Downloads

AAAI 2025

OmniGenome-186M

RNA Foundation Model

A sequence-structure alignment model that raised RNA design success rates from 3% to 84%.

Hugging Face: 67,000+ Downloads

Nature Machine Intelligence

PlantRNA-FM

Plant RNA Foundation Model

An interpretable foundation model for discovering functional RNA motifs in plants, advancing agricultural biotechnology and crop improvement research. Published in Nature Machine Intelligence.

Hugging Face: 35M Parameters

EMNLP 2024

MP-RNA

Multi-Species RNA Model

A multi-species RNA foundation model with calibrated secondary structure prediction, enabling cross-species RNA analysis.

Hugging Face: 186M Parameters

CIKM 2023

PyABSA Space

Interactive Aspect-Based Sentiment Analysis

A comprehensive, interactive demo hub for ABSA. Featured as an official demo by Gradio-Blocks.

Hugging Face Space: 142,000+ Visits

arXiv 2024

OmniGenBench

Genomic Foundation Models Leaderboard

The official online leaderboard for OmniGenBench, where researchers can try out the benchmark framework.

Hugging Face Space: First platform for GFMs

EMNLP 2024

Rapid Textual Adversarial Defense

Textual Adversarial Attack & Defense Demo

An interactive demo for textual adversarial attack and defense.

Hugging Face Space

AI Art

Super-Resolution-Anime-Diffusion

Anime Image Super-Resolution Demo

An interactive demo for anime image super-resolution using diffusion models.

Hugging Face Space: 1 Million+ Visits, 81+ Likes

Featured GitHub Projects

Open-source tools making AI accessible

PyABSA

Modularized framework for reproducible aspect-based sentiment analysis with pre-trained models and comprehensive benchmarks.

500K+ Downloads, 1,041+ Stars

Python, NLP, Transformers

OmniGenBench

Automated large-scale benchmarking framework for genomic foundation models, enabling comprehensive evaluation across multiple tasks and datasets.

53K+ Downloads, 351+ Stars

Python, Genomics, Benchmarking

FindFile

Intelligent file searching tool with advanced filtering and pattern matching capabilities.

GitHub

Python, CLI Tool

MetricVisualizer

Comprehensive visualization toolkit for machine learning metrics and model performance analysis.

GitHub

Python, Visualization

BoostAug

Data augmentation library for boosting machine learning model performance with intelligent augmentation strategies.

GitHub

Python, Data Augmentation

PlantRNA-FM

Interpretable RNA foundation model for exploring functional RNA motifs in plants, published in Nature Machine Intelligence.

Hugging Face

RNA Analysis, Foundation Model

InstOptima

Evolutionary multi-objective instruction optimization via LLM-based instruction operators.

GitHub

LLM, Instruction Optimization

AdvFairness

A framework to investigate the effects of adversarial attacks on alleviating model bias.

GitHub

Adversarial ML, Fairness

CodeT5DefectDetection

Leveraging language models for code defect prediction at the line level.

GitHub

Software Engineering, Code PLM

DaNuoYi

Evolutionary multi-task injection testing on Web Application Firewalls (WAFs).

GitHub

Cybersecurity, WAF Testing

AI Researcher & PhD Student
