BioKA: a curated and integrated biomarker knowledgebase for animals|Yibo Wang, Yi‐Hao Lin, Sicheng Wu et al.|Nucleic Acids Research|2023 Biomarkers play an important role in various areas such as personalized medicine, drug development, clinical care, and molecular breeding. However, existing biomarker resources predominantly focus on human diseases, leaving a significant gap in non-human animal disease understanding and breeding research. To address this limitation, we present BioKA (Biomarker Knowledgebase for Animals, https://ngdc.cncb.ac.cn/bioka), a curated and integrated knowledgebase encompassing multiple animal species, diseases/traits, and annotated resources. Currently, BioKA houses 16 296 biomarkers associated with 951 mapped diseases/traits across 31 species from 4747 references, including 11 925 gene/protein biomarkers, 1784 miRNA biomarkers, 1043 mutation biomarkers, 773 metabolic biomarkers, 357 circRNA biomarkers and 127 lncRNA biomarkers. Furthermore, BioKA integrates various annotations, including GO terms, protein structures, protein-protein interaction networks and miRNA targets, and constructs an interactive knowledge network of biomarkers covering circRNA-miRNA-mRNA associations, lncRNA-miRNA associations and protein-protein associations, which supports efficient data exploration. Moreover, BioKA provides detailed information on 308 breeds/strains of 13 species and homologous annotations for 8784 biomarkers across 16 species, and offers three online application tools. The comprehensive knowledge provided by BioKA not only advances human disease research but also contributes to a deeper understanding of animal diseases and supports livestock breeding.
MACdb: A Curated Knowledgebase for Metabolic Associations across Human Cancers|Yanling Sun, Xinchang Zheng, Guoliang Wang et al.|Molecular Cancer Research|2023 Cancer is one of the leading causes of human death. As metabolomics techniques become more widely used in cancer research, metabolites are increasingly recognized as crucial factors in both cancer diagnosis and treatment. In this study, we developed MACdb (https://ngdc.cncb.ac.cn/macdb), a curated knowledgebase that collects metabolic associations between metabolites and cancers. Unlike conventional data-driven resources, MACdb integrates cancer-metabolic knowledge from extensive publications, providing high-quality metabolite associations and tools to support multiple research purposes. In the current implementation, MACdb has integrated 40,710 cancer-metabolite associations, covering 267 traits from 17 categories of cancers with high incidence or mortality, based entirely on manual curation from 1,127 studies reported in 462 publications (screened from 5,153 research papers). MACdb offers intuitive browsing functions to explore associations along multiple dimensions (metabolite, trait, study, and publication), and constructs a knowledge graph to provide an overall landscape of cancers, traits, and metabolites. Furthermore, the NameToCid tool (which maps metabolite names to PubChem CIDs) and the Enrichment tool are provided to help users enrich the association of metabolites with various cancer types and traits. IMPLICATION: MACdb offers an informative and practical way to evaluate cancer-metabolite associations and has great potential to help researchers identify key predictive metabolic markers in cancers.
The Current Status and Future Prospects of Carbon Capture, Carbon Transportation, And Carbon Utilization Technologies in Chinese Coal-Fired Power Plants Within the Context of Dual Carbon Goals|Yao Fu, Yibo Wang, Tongyang Zheng|Highlights in Science Engineering and Technology|2024 Carbon dioxide capture, utilization, and storage (CCUS) is a vital measure for achieving net emissions reduction in China. Against the backdrop of China's “dual carbon” initiative, this paper examines the key technological principles, merits and demerits, and bottlenecks of carbon capture within CCUS in the country’s thermal power plants at the current stage. It combines the prevailing industrial application status and the future development trajectory of the relevant technologies with considerations of industrial economic benefits and business models. Building upon this analysis, the paper summarizes the application status of CCUS technology in China’s thermal power plant industry and presents future prospects. On the technological front, the three capture methods all face the challenge of high costs. Therefore, efforts should be intensified to develop low-cost, efficient carbon capture materials and to research and apply low-cost oxygen generation technology. Economically, CCUS technology currently faces high costs, low returns, and certain commercial barriers, leaving China's business models at the demonstration stage. To overcome these challenges and realize the economic benefits of CCUS technology, government incentives and technological innovation are imperative. These measures aim to reduce the overall cost of CCUS technology, stimulate large-scale development, and foster commercialization, ultimately contributing to the achievement of China’s carbon emission reduction targets.
Self-Attention Mechanisms in HPC Job Scheduling: A Novel Framework Combining Gated Transformers and Enhanced PPO|Xu Gao, Hang Dong, Lianji Zhang et al.|Applied Sciences|2025 In HPC systems, job scheduling plays a critical role in determining resource allocation and task execution order. With the continuous expansion of computing scale and increasing system complexity, modern HPC scheduling faces two major challenges: a massive decision space consisting of tens of thousands of computing nodes and a huge job queue, as well as complex temporal dependencies between jobs and dynamically changing resource states. Traditional heuristic algorithms and basic reinforcement learning methods often struggle to address these challenges in dynamic HPC environments. This study proposes a novel scheduling framework that combines GTrXL with PPO, achieving significant performance improvements through multiple technical innovations. The framework leverages the sequence modeling capabilities of the Transformer architecture and selectively filters relevant historical scheduling information through a dual-gate mechanism, improving long-sequence modeling efficiency compared to standard Transformers. The proposed SECT module further enhances resource awareness through dynamic feature recalibration, achieving improved system utilization compared to similar attention mechanisms. Experimental results on multiple datasets (ANL-Intrepid, Alibaba, SDSC-SP2) demonstrate that the proposed components achieve significant performance improvements over baseline PPO implementations. Comprehensive evaluations on synthetic workloads and real HPC trace data show improvements in resource utilization and waiting time, particularly under high-load conditions, while maintaining good robustness across various cluster configurations.
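For readers unfamiliar with the PPO baseline mentioned in this entry, the following is a minimal sketch of the standard clipped surrogate loss from the original PPO formulation. It is not the paper's "enhanced" variant, whose details the abstract does not give; the function name `ppo_clip_loss`, its parameters, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed default; a commonly used value
) -> float:
    """Standard PPO clipped surrogate loss (to be minimized).

    Hypothetical helper for illustration only, not the paper's method.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and old policy for this action.
        ratio = math.exp(nl - ol)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic (lower) bound of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    # Negate the mean objective so that gradient descent maximizes it.
    return -total / len(advantages)
```

When the new and old policies agree, every ratio is 1 and the loss reduces to the negative mean advantage. The clipping only takes effect once the policy has moved far enough from the one that collected the data.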
Localize, Retrieve and Fuse: A Generalized Framework for Free-Form Question Answering over Tables|Wenting Zhao, Ye Liu, Yao Wan et al.|arXiv (Cornell University)|2023 Question answering on tabular data (a.k.a. TableQA), which aims at generating answers to questions grounded on a provided table, has gained significant attention recently. Prior work primarily produces concise factual responses through information extraction from individual or limited table cells, lacking the ability to reason across diverse table cells. Yet the realm of free-form TableQA, which demands intricate strategies for selecting relevant table cells and the sophisticated integration and inference of discrete data fragments, remains mostly unexplored. To this end, this paper proposes a generalized three-stage approach, called TAG-QA, to address the challenge of inferring long free-form answers in generative TableQA: Table-to-Graph conversion and cell localizing, external knowledge retrieval, and the fusion of table and text. In particular, TAG-QA (1) locates relevant table cells using a graph neural network to gather intersecting cells between relevant rows and columns, (2) leverages external knowledge from Wikipedia, and (3) generates answers by integrating both tabular data and natural linguistic information. Experiments showcase the superior capabilities of TAG-QA in generating sentences that are both faithful and coherent, particularly when compared to several state-of-the-art baselines. Notably, TAG-QA surpasses the robust pipeline-based baseline TAPAS by 17% and 14% in terms of BLEU-4 and PARENT F-score, respectively. Furthermore, TAG-QA outperforms the end-to-end model T5 by 16% and 12% on BLEU-4 and PARENT F-score, respectively.
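The cell-localization step in this entry, gathering cells at the intersection of relevant rows and columns, can be illustrated with a small sketch. This assumes that per-row and per-column relevance scores are already available. In the paper those would come from a graph neural network, which is replaced here by hand-picked numbers and a simple threshold. The function name `localize_cells`, the threshold, and the toy table are all hypothetical.

```python
from typing import List, Sequence, Tuple


def localize_cells(
    rows: Sequence[Sequence[str]],
    row_scores: Sequence[float],
    col_scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[int, int, str]]:
    """Return (row_idx, col_idx, value) for cells whose row AND column are relevant."""
    keep_rows = [i for i, s in enumerate(row_scores) if s >= threshold]
    keep_cols = [j for j, s in enumerate(col_scores) if s >= threshold]
    # A cell is selected only where a relevant row meets a relevant column.
    return [(i, j, rows[i][j]) for i in keep_rows for j in keep_cols]


# Toy table with columns (team, city, titles); scores are stand-ins for model output.
rows = [
    ["Lions", "Leeds", "3"],
    ["Bears", "Bath", "5"],
    ["Hawks", "Hull", "1"],
]
row_scores = [0.1, 0.9, 0.2]  # only the second row is judged relevant
col_scores = [0.8, 0.3, 0.7]  # the first and third columns are judged relevant

cells = localize_cells(rows, row_scores, col_scores)
print(cells)
```

With these stand-in scores, only the two cells of the second row that fall in the first and third columns are kept. A later stage would combine them with retrieved text to generate the answer.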