Chuanyang Zheng

Progressive-Hint Prompting Improves Reasoning in Large Language Models

Chuanyang Zheng, Zhengying Liu, Enze Xie et al.|arXiv (Cornell University)|2023

Cited by 33|Open Access

The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP), that enables automatic multiple interactions between users and LLMs by using previously generated answers as hints to progressively guide toward the correct answers. PHP is orthogonal to CoT and self-consistency, making it easy to combine with state-of-the-art techniques to further improve performance. We conducted extensive and comprehensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sample paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performances on SVAMP (89.1% -> 91.9%), GSM8K (92% -> 95.5%), AQuA (76.4% -> 79.9%) and MATH (50.3% -> 53.9%).
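The hint loop described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: `ask` is a hypothetical callable standing in for any LLM call, and the hint wording, the stop-when-the-answer-repeats rule, and the `max_rounds` cap are assumptions made for the example.

```python
from typing import Callable, List


def progressive_hint_prompting(
    question: str,
    ask: Callable[[str], str],  # hypothetical: sends a prompt, returns an answer
    max_rounds: int = 5,        # assumed safety cap on follow-up rounds
) -> str:
    """Re-ask the question with earlier answers as hints until the answer repeats."""
    hints: List[str] = []
    answer = ask(question)  # first round: plain prompt, no hints yet
    for _ in range(max_rounds):
        hints.append(answer)
        # Illustrative hint format; the exact wording is an assumption.
        prompt = f"{question} (Hint: The answer is near to {', '.join(hints)}.)"
        new_answer = ask(prompt)
        if new_answer == answer:
            # Two consecutive rounds agree, so treat the answer as settled.
            return new_answer
        answer = new_answer
    return answer  # no agreement within the cap: return the latest answer
```

Because `ask` is just a callable, the same loop could wrap a CoT prompt or a self-consistency vote, which is one way to read the abstract's claim that PHP is orthogonal to those methods.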

DeepKOA: a deep-learning model for predicting progression in knee osteoarthritis using multimodal magnetic resonance images from the osteoarthritis initiative

Jiaping Hu, Chuanyang Zheng, Qingling Yu et al.|Quantitative Imaging in Medicine and Surgery|2023

Cited by 31|Open Access

Background: No investigations have thoroughly explored the feasibility of combining magnetic resonance (MR) images and deep-learning methods for predicting the progression of knee osteoarthritis (KOA). We thus aimed to develop a potential deep-learning model for predicting OA progression based on MR images for the clinical setting. Methods: A longitudinal case-control study was performed using data from the Foundation for the National Institutes of Health (FNIH), composed of progressive cases [182 osteoarthritis (OA) knees with both radiographic and pain progression for 24-48 months] and matched controls (182 OA knees not meeting the case definition). DeepKOA was developed through 3-dimensional (3D) DenseNet169 to predict KOA progression over 24-48 months based on sagittal intermediate-weighted turbo-spin echo sequences with fat-suppression (SAG-IW-TSE-FS), sagittal 3D dual-echo steady-state water excitation (SAG-3D-DESS-WE) and its axial and coronal multiplanar reformation, and their combined MR images with patient-level labels at baseline, 12, and 24 months to eventually determine the probability of progression. The classification performance of the DeepKOA was evaluated using 5-fold cross-validation. An X-ray-based model and traditional models that used clinical variables via multilayer perceptron were built. Combined models were also constructed, which integrated clinical variables with DeepKOA. The area under the curve (AUC) was used as the evaluation metric. Results: The performance of SAG-IW-TSE-FS in predicting OA progression was similar to or higher than that of other single and combined sequences. The DeepKOA based on SAG-IW-TSE-FS achieved an AUC of 0.664 (95% CI: 0.585-0.743) at baseline, 0.739 (95% CI: 0.703-0.775) at 12 months, and 0.775 (95% CI: 0.686-0.865) at 24 months. The X-ray-based model achieved an AUC ranging from 0.573 to 0.613 at 3 time points. However, adding clinical variables to DeepKOA did not improve performance (P>0.05).
Initial visualizations from gradient-weighted class activation mapping (Grad-CAM) indicated that the frequency with which the patellofemoral joint was highlighted increased as time progressed, which contrasted with the trend observed in the tibiofemoral joint. The meniscus, the infrapatellar fat pad, and muscles posterior to the knee were highlighted to varying degrees. Conclusions: This study initially demonstrated the feasibility of DeepKOA in the prediction of KOA progression and identified the potential responsible structures, which may inform the future development of more clinically practical methods.

FIMO: A Challenge Formal Dataset for Automated Theorem Proving

Chengwu Liu, Jianhao Shen, Huajian Xin et al.|arXiv (Cornell University)|2023

Cited by 10|Open Access

We present FIMO, an innovative dataset comprising formal mathematical problem statements sourced from the International Mathematical Olympiad (IMO) Shortlisted Problems. Designed to facilitate advanced automated theorem proving at the IMO level, FIMO is currently tailored for the Lean formal language. It comprises 149 formal problem statements, accompanied by both informal problem descriptions and their corresponding LaTeX-based informal proofs. Through initial experiments involving GPT-4, our findings underscore the existing limitations in current methodologies, indicating a substantial journey ahead before achieving satisfactory IMO-level automated theorem proving outcomes.

LEGO-Prover: Neural Theorem Proving with Growing Libraries

Haiming Wang, Huajian Xin, Chuanyang Zheng et al.|arXiv (Cornell University)|2023

Cited by 9|Open Access

Despite the success of large language models (LLMs), the task of theorem proving still remains one of the hardest reasoning tasks that is far from being fully solved. Prior methods using language models have demonstrated promising results, but they still struggle to prove even middle school level theorems. One common limitation of these methods is that they assume a fixed theorem library during the whole theorem proving process. However, as we all know, creating new useful theorems or even new theories is not only helpful but crucial and necessary for advancing mathematics and proving harder and deeper results. In this work, we present LEGO-Prover, which employs a growing skill library containing verified lemmas as skills to augment the capability of LLMs used in theorem proving. By constructing the proof modularly, LEGO-Prover enables LLMs to utilize existing skills retrieved from the library and to create new skills during the proving process. These skills are further evolved (by prompting an LLM) to enrich the library on another scale. Modular and reusable skills are constantly added to the library to enable tackling increasingly intricate mathematical problems. Moreover, the learned library further bridges the gap between human proofs and formal proofs by making it easier to impute missing steps. LEGO-Prover advances the state-of-the-art pass rate on miniF2F-valid (48.0% to 57.0%) and miniF2F-test (45.5% to 47.1%). During the proving process, LEGO-Prover also manages to generate over 20,000 skills (theorems/lemmas) and adds them to the growing library. Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%. We also release our code and all the generated skills.
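The growing-library idea in this abstract can be sketched roughly as below. This is an illustrative toy and not the released LEGO-Prover code: `SkillLibrary`, `prove_with_library`, `prover`, and `verify` are hypothetical names, and the word-overlap retrieval is a simple stand-in for whatever retrieval the actual system uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SkillLibrary:
    """Toy store of verified lemmas ("skills"), keyed by name."""
    skills: Dict[str, str] = field(default_factory=dict)

    def retrieve(self, problem: str, k: int = 3) -> List[str]:
        # Stand-in retrieval: rank stored lemmas by word overlap with the problem.
        words = set(problem.lower().split())
        ranked = sorted(
            self.skills.values(),
            key=lambda s: len(words & set(s.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def add(self, name: str, lemma: str) -> None:
        # Keep the first verified version of each named skill.
        self.skills.setdefault(name, lemma)


def prove_with_library(
    problems: List[str],
    prover: Callable[[str, List[str]], Tuple[str, Dict[str, str]]],  # hypothetical LLM prover
    verify: Callable[[str], bool],                                   # hypothetical proof checker
    library: SkillLibrary,
) -> List[str]:
    """Attempt each problem with retrieved skills; store newly verified lemmas."""
    solved = []
    for problem in problems:
        context = library.retrieve(problem)
        proof, new_lemmas = prover(problem, context)
        for name, lemma in new_lemmas.items():
            if verify(lemma):  # only verified lemmas enter the library
                library.add(name, lemma)
        if verify(proof):
            solved.append(problem)
    return solved
```

The point of the sketch is the feedback loop: lemmas verified while working on one problem become retrievable context for later problems, so the library is not fixed during proving.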

Plankton Metabolism in Coastal Waters of the Guangdong-Hong Kong-Macao Greater Bay: Regional Variance and Driving Factors

Liangkui Zhang, Gang Li, Chenhui Xiang et al.|Frontiers in Marine Science|2022

Cited by 5|Open Access

The metabolism of field plankton communities, including gross primary production (GPP), community respiration (CR), and net community production (NCP), usually indicates the health, resource production, and carbon budget of marine ecosystems. In this study, we explored the regional variance and driving forces of plankton metabolism in coastal waters of the Guangdong-Hong Kong-Macao Greater Bay Area (GGBA), a rapidly developed area with complex hydrological and environmental states. The results showed that the maximum GPP and CR occurred in the estuarine plume of the GGBA in summer, while in winter the more active plankton community metabolism occurred in Daya Bay, with the GPP and CR being mediated by the nutrient level and temperature, respectively. Moreover, four regional zones were identified on the basis of the environmental and biological factors in the surface water of the GGBA, i.e., the river-runoff, river-plume, nearshore, and far-offshore zones. The metabolic states in these zones varied significantly due to the regional and seasonal variations of, for example, the nutrient level, temperature, and turbidity driven by multiple factors including land-derived runoffs, anthropogenic activities, the Yuedong Coastal Current, and offshore seawater intrusions. On the whole, the GGBA exhibited weak heterotrophic processes in both summer (NCP = -24.9 ± 26.7 mg C m⁻³ d⁻¹) and winter (NCP = -51.2 ± 8.51 mg C m⁻³ d⁻¹). In addition, we found that higher CR occurred in the bottom layers of the river-plume and nearshore zones where hypoxia occurred, indicating a possible contribution of plankton community respiration to the hypoxia in the GGBA.


Top publications by citations