
Heng Yang

AI Researcher & PhD Student

I am a PhD student at the University of Exeter, specializing in Large Language Models, Genomic Foundation Models, and Aspect-Based Sentiment Analysis. My research develops computational methods for biological sequence modeling and sentiment analysis. I lead open-source projects with over 1.5 million downloads and have published in top-tier venues such as Nature Machine Intelligence.

500k+ lines of code (LOC) across open-source GitHub repositories

Research Areas

Exploring the frontiers of AI, Genomics, and Software Engineering

Large Language Models

Building LLM-based frameworks such as InstOptima, for multi-objective instruction evolution, and RNADesign-GRPO, for reinforcement learning-based RNA sequence design.

LLM · Instruction-tuning · InstOptima · Genetic Algorithms

Genomic Foundation Models

Pioneering AI4Science research by developing Genomic Foundation Models (e.g., OmniGenome) that tackle challenges such as sequence sparsity and single-nucleotide variants (SNVs), significantly improving RNA structure prediction and sequence design.

AI4Science · Genomics · OmniGenome · RNA Design

Aspect-Based Sentiment Analysis

Creator of PyABSA, a widely used open-source toolkit for fine-grained sentiment analysis that supports over 30 models and datasets and simplifies the research-to-application pipeline.

NLP · PyABSA · Sentiment Analysis · Open Source

Adversarial ML & Fairness

Investigating the intersection of adversarial robustness and model fairness, revealing that adversarial attacks can help mitigate model bias and that adversarial training can form a Pareto front between accuracy and fairness.

Robustness · Fairness · Adversarial Attack · AdvFairness

Software Defect Prediction

Developing the LMDP framework, which leverages pre-trained language models (PLMs) for more accurate, line-level software defect prediction and outperforms traditional AST/GNN-based methods.

Code PLM · Defect Prediction · Software Engineering · LMDP

Text Data Augmentation

Author of BoostAug, a novel text augmentation technique that uses global feature distribution and instance filtering to consistently improve performance across various NLP tasks.

Data Augmentation · BoostAug · NLP · ACL 2023

Selected Publications

Contributing to top-tier conferences and journals

2025

Bridging Sequence-Structure Alignment in RNA Foundation Models

Heng Yang, Ke Li

AAAI 2025 (CCF-A)

2024

PlantRNA-FM: An Interpretable RNA Foundation Model for Exploring Functional RNA Motifs in Plants

Haopeng Yu#, Heng Yang#, Wenqing Sun, Zongyun Yan, Xiaofei Yang, Huakun Zhang, Yiliang Ding, Ke Li

Nature Machine Intelligence (Co-first Author)

2024

MP-RNA: Unleashing Multi-species RNA Foundation Model via Calibrated Secondary Structure Prediction

Heng Yang, Ke Li

EMNLP 2024 (CCF-B)

2024

The Best Defense is Attack: Repairing Semantics in Textual Adversarial Examples

Heng Yang, Ke Li

EMNLP 2024 (CCF-B)

2024

DaNuoYi: Evolutionary Multi-Task Injection Testing on Web Application Firewalls

Ke Li#, Heng Yang#, Willem Visser

IEEE Transactions on Software Engineering (CCF-A)

2024

Modeling Aspect Sentiment Coherency via Local Sentiment Aggregation

Heng Yang, Ke Li

EACL 2024 (CORE-A)

2024

OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking

Heng Yang, Jack Cole, Ke Li

arXiv Preprint

2024

Robustness Meets Fairness: Investigating Adversarial Attack Effects on Alleviating Model Bias

Heng Yang, Geyong Min, Ke Li

OpenReview Preprint

2023

InstOptima: Evolutionary Multi-objective Instruction Optimization via LLM-based Instruction Operators

Heng Yang, Ke Li

EMNLP 2023 (CCF-B)

2023

BoostAug: Boosting Text Augmentation via Hybrid Instance Filtering Framework

Heng Yang, Ke Li

ACL 2023 (CCF-A)

2023

PyABSA: A Modularized Framework for Reproducible Aspect-Based Sentiment Analysis

Heng Yang, Chen Zhang, Ke Li

CIKM 2023 (CCF-B)

2020

A Multi-task Learning Model for Chinese-oriented Aspect Polarity Classification and Aspect Term Extraction

Heng Yang, Biqing Zeng

Neurocomputing (CCF-C)

Preprint

Tokenization or Featurization? Leveraging Language Models for Code Defect Prediction

Heng Yang, Ke Li

Preprint

Publication Venues

Nature Machine Intelligence · AAAI · ACL · EMNLP · CIKM · EACL · IEEE Transactions on Software Engineering · Neurocomputing

Featured Hugging Face Models & Spaces

State-of-the-art models and interactive demos

150+ Likes & Favorites
Most Popular

deberta-v3-base-absa-v1.1

Aspect-Based Sentiment Analysis

A top-performing, lightweight model for fine-grained sentiment analysis. Featured in the Stanford AI Index Report 2022 as the leading open-source ABSA model.

AAAI 2025

OmniGenome-186M

RNA Foundation Model

A sequence-structure alignment model for RNA that improved RNA design success rates from 3% to 84%.

Nature Machine Intelligence

PlantRNA-FM

Plant RNA Foundation Model

An interpretable foundation model for discovering functional RNA motifs in plants, supporting agricultural biotechnology and crop improvement research.

EMNLP 2024

MP-RNA

Multi-Species RNA Model

A multi-species RNA foundation model with calibrated secondary structure prediction, enabling cross-species RNA analysis.

CIKM 2023

PyABSA Space

Interactive Aspect-Based Sentiment Analysis

A comprehensive, interactive demo hub for ABSA. Featured as an official demo by Gradio-Blocks.

arXiv 2024

OmniGenBench

Genomic Foundation Models Leaderboard

The official online leaderboard for OmniGenBench, where researchers can try out the benchmark framework.

EMNLP 2024

Rapid Textual Adversarial Defense

Textual Adversarial Attack & Defense Demo

An interactive demo for textual adversarial attack and defense.

AI Art

Super-Resolution-Anime-Diffusion

Anime Image Super-Resolution Demo

An interactive demo for anime image super-resolution using diffusion models.

Featured GitHub Projects

Open-source tools making AI accessible

FindFile

Intelligent file searching tool with advanced filtering and pattern matching capabilities.

Python · CLI Tool

MetricVisualizer

Comprehensive visualization toolkit for machine learning metrics and model performance analysis.

Python · Visualization

BoostAug

Text augmentation library implementing the BoostAug hybrid instance-filtering framework (ACL 2023) to improve model performance across NLP tasks.

Python · Data Augmentation

PlantRNA-FM

Interpretable RNA foundation model for exploring functional RNA motifs in plants, published in Nature Machine Intelligence.

RNA Analysis · Foundation Model

InstOptima

Evolutionary multi-objective instruction optimization via LLM-based instruction operators.

LLM · Instruction Optimization

AdvFairness

A framework to investigate the effects of adversarial attacks on alleviating model bias.

Adversarial ML · Fairness

CodeT5DefectDetection

Leveraging language models for line-level code defect prediction.

Software Engineering · Code PLM

DaNuoYi

Evolutionary multi-task injection testing on Web Application Firewalls (WAFs).

Cybersecurity · WAF Testing

Let's Connect

Open to collaborations and discussions

Institution

University of Exeter
Computer Science Department

Email

Academic: hy345@exeter.ac.uk

Personal: yangheng2021@gmail.com

Location

Innovation Centre Phase 1

Exeter, EX4 4RN, United Kingdom