Weidong Ma

Shihezi University

ORCID: 0000-0002-6027-9133

Publishes on Game Theory and Applications, Cryospheric studies and observations, Complex Network Analysis Techniques. 105 papers and 10.3k citations.

Top publications by citations

LightGBM: A Highly Efficient Gradient Boosting Decision Tree
Guolin Ke, Qi Meng, Thomas Finley et al.|HAL (Le Centre pour la Communication Scientifique Directe)|2017
Cited by 9.5k | Open Access

Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm, and has quite a few effective implementations such as XGBoost and pGBRT. Although many engineering optimizations have been adopted in these implementations, the efficiency and scalability are still unsatisfactory when the feature dimension is high and data size is large. A major reason is that for each feature, they need to scan all the data instances to estimate the information gain of all possible split points, which is very time consuming. To tackle this problem, we propose two novel techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). With GOSS, we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain. We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size. With EFB, we bundle mutually exclusive features (i.e., they rarely take nonzero values simultaneously), to reduce the number of features. We prove that finding the optimal bundling of exclusive features is NP-hard, but a greedy algorithm can achieve quite good approximation ratio (and thus can effectively reduce the number of features without hurting the accuracy of split point determination by much). We call our new GBDT implementation with GOSS and EFB LightGBM. Our experiments on multiple public datasets show that, LightGBM speeds up the training process of conventional GBDT by up to over 20 times while achieving almost the same accuracy.
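As a rough illustration of the GOSS idea, the following minimal sketch keeps the instances with the largest absolute gradients and randomly samples a fraction of the remainder. The abstract only states that most small-gradient instances are excluded; the keep-top/sample-rest split and the (1 - a) / b re-weighting follow the description in the full paper. The function name, parameter names, and default rates are illustrative assumptions, not the LightGBM library's API.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) for a GOSS-style subsample.

    top_rate (a) and other_rate (b) are hypothetical names for the
    fraction of large-gradient instances kept and the fraction of the
    data sampled from the small-gradient remainder.
    """
    n = len(gradients)
    top_n = int(top_rate * n)
    rand_n = int(other_rate * n)

    # Sort instance indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top, rest = order[:top_n], order[top_n:]

    # Randomly sample from the small-gradient instances.
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(rand_n, len(rest)))

    # Amplify the sampled small-gradient instances so that the gain
    # estimate is not biased toward the large-gradient instances.
    factor = (1.0 - top_rate) / other_rate
    indices = top + sampled
    weights = [1.0] * len(top) + [factor] * len(sampled)
    return indices, weights
```

With ten instances and the assumed defaults, this keeps the two largest-gradient instances at weight 1 and one randomly chosen small-gradient instance at an amplified weight; split gains would then be estimated on this weighted subset instead of the full data.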

A Highly Efficient Gradient Boosting Decision Tree
Guolin Ke, Qi Meng, Taifeng Wang et al.|Neural Information Processing Systems|2017
Cited by 124

Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm, and has quite a few effective implementations such as XGBoost and pGBRT. Although many engineering optimizations have been adopted in these implementations, the efficiency and scalability are still unsatisfactory when the feature dimension is high and data size is large. A major reason is that for each feature, they need to scan all the data instances to estimate the information gain of all possible split points, which is very time consuming. To tackle this problem, we propose two novel techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). With GOSS, we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain. We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size. With EFB, we bundle mutually exclusive features (i.e., they rarely take nonzero values simultaneously), to reduce the number of features. We prove that finding the optimal bundling of exclusive features is NP-hard, but a greedy algorithm can achieve quite good approximation ratio (and thus can effectively reduce the number of features without hurting the accuracy of split point determination by much). We call our new GBDT implementation with GOSS and EFB LightGBM. Our experiments on multiple public datasets show that, LightGBM speeds up the training process of conventional GBDT by up to over 20 times while achieving almost the same accuracy.

Budgeted multi-armed bandits with multiple plays
Yingce Xia, Tao Qin, Weidong Ma et al.|Unknown|2016
Cited by 51

We study the multi-play budgeted multi-armed bandit (MP-BMAB) problem, in which pulling an arm yields both a random reward and a random cost, and a player pulls L (≥ 1) arms in each round. The player aims to maximize her total expected reward under a budget constraint B on the pulling costs. We present a multiple ratio confidence bound policy: in each round, we first calculate a truncated upper (lower) confidence bound for the expected reward (cost) of each arm, and then pull the L arms with the maximum ratio of the sum of the upper confidence bounds of rewards to the sum of the lower confidence bounds of costs. We design a 0-1 integer linear fractional programming oracle that can pick such L arms in polynomial time. We prove that the regret of our policy is sublinear in general and is log-linear for certain parameter settings. We further consider two special cases of MP-BMABs: (1) We derive a lower bound for any consistent policy for MP-BMABs with Bernoulli reward and cost distributions. (2) We show that the proposed policy can also solve the conventional budgeted MAB problem (a special case of MP-BMABs with L = 1) and provides better theoretical results than existing UCB-based pulling policies.
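The selection step of the policy can be sketched as follows, under stated assumptions. The abstract does not give the form of the confidence bounds, so the Hoeffding-style radius, the truncation of rewards at 1, and the positive cost floor below are hypothetical choices. The exhaustive search over arm subsets is only a stand-in for the paper's polynomial-time 0-1 integer linear fractional programming oracle and is practical only for a small number of arms.

```python
import math
from itertools import combinations


def select_arms(mean_reward, mean_cost, pulls, t, L, cost_floor=1e-3):
    """Pick L arms maximizing sum(UCB rewards) / sum(LCB costs).

    All names and the bound formulas are illustrative assumptions;
    every arm is assumed to have been pulled at least once.
    """
    K = len(mean_reward)
    ucb, lcb = [], []
    for i in range(K):
        # Assumed Hoeffding-style confidence radius.
        radius = math.sqrt(2.0 * math.log(t) / pulls[i])
        # Truncated upper bound on the expected reward.
        ucb.append(min(mean_reward[i] + radius, 1.0))
        # Truncated lower bound on the expected cost, kept positive.
        lcb.append(max(mean_cost[i] - radius, cost_floor))

    # Brute-force replacement for the fractional programming oracle.
    def ratio(subset):
        return sum(ucb[i] for i in subset) / sum(lcb[i] for i in subset)

    return max(combinations(range(K), L), key=ratio)
```

The objective is a ratio of sums rather than a sum of per-arm ratios, so ranking arms individually by reward over cost is not guaranteed to find the best subset; this is the gap the oracle in the abstract is designed to close.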

Alternative Pathway to Phase Down Coal Power and Achieve Negative Emission in China
Rui Wang, Haoran Li, Wenjia Cai et al.|Environmental Science & Technology|2022
Cited by 47

Although widely recognized as the key to climate goals, coal “phase down” has long been contested for its side effects on energy security and social development. Retrofitting coal power units for biomass and coal co-firing with carbon capture and storage provides an alternative way to avoid these side effects and make deep carbon dioxide emission cuts or even achieve negative emissions. However, it remains unclear how much emission reduction this approach can unlock at most, which is key information for promoting the technology on a large scale. Here, we focus on helping China’s 4536 coal power units make differentiated retrofit choices based on unit-level heterogeneity information and resource spatial matching results. We found that China’s coal power units have the potential to achieve 0.4 Gt of negative CO2 emission in 2025, and the cumulative negative CO2 emission would reach 10.32 Gt by 2060. To achieve negative CO2 emission, the biomass resource amount should be 1.65 times the existing agricultural and forestry residues, and the biomass and coal co-firing ratio should exceed 70%. Coal power units should seize their time window; otherwise, the maximum negative emission potential would decrease at a rate of 0.35 Gt per year.