Jinzhuo Wang

Peking University

ORCID: 0000-0002-9464-4426

Publishes on Human Pose and Action Recognition, Topic Modeling, Advanced Vision and Imaging. 63 papers and 682 citations.

#5 in HER2

Top publications by citations

Recent advances of Transformers in medical image analysis: A comprehensive review
Kun Xia, Jinzhuo Wang|MedComm – Future Medicine|2023
Cited by 53|Open Access

Recent works have shown that Transformer's excellent performances on natural language processing tasks can be maintained on natural image analysis tasks. However, the complicated clinical settings in medical image analysis and varied disease properties bring new challenges for the use of Transformer. The computer vision and medical engineering communities have devoted significant effort to medical image analysis research based on Transformer with especial focus on scenario‐specific architectural variations. In this paper, we comprehensively review this rapidly developing area by covering the latest advances of Transformer‐based methods in medical image analysis of different settings. We first give introduction of basic mechanisms of Transformer including implementations of self‐attention and typical architectures. The important research problems in various medical image data modalities, clinical visual tasks, organs and diseases are then reviewed systemically. We carefully collect 276 very recent works and 76 public medical image analysis datasets in an organized structure. Finally, discussions on open problems and future research directions are also provided. We expect this review to be an up‐to‐date roadmap and serve as a reference source in pursuit of boosting the development of medical image analysis field.

Millimeter-scale magnetic implants paired with a fully integrated wearable device for wireless biophysical and biochemical sensing
Ji Wan, Zhongyi Nie, Jie Xu et al.|Science Advances|2024
Cited by 52|Open Access

Implantable sensors can directly interface with various organs for precise evaluation of health status. However, extracting signals from such sensors mainly requires transcutaneous wires, integrated circuit chips, or cumbersome readout equipment, which increases the risks of infection, reduces biocompatibility, or limits portability. Here, we develop a set of millimeter-scale, chip-less, and battery-less magnetic implants paired with a fully integrated wearable device for measuring biophysical and biochemical signals. The wearable device can induce a large amplitude damped vibration of the magnetic implants and capture their subsequent motions wirelessly. These motions reflect the biophysical conditions surrounding the implants and the concentration of a specific biochemical depending on the surface modification. Experiments in rat models demonstrate the capabilities of measuring cerebrospinal fluid (CSF) viscosity, intracranial pressure, and CSF glucose levels. This miniaturized system opens the possibility for continuous, wireless monitoring of a wide range of biophysical and biochemical conditions within the living organism.

Video Imagination from a Single Image with Transformation Generation
Cited by 42

In this work, we focus on a challenging task: synthesizing multiple imaginary videos given a single image. Major problems come from high dimensionality of pixel space and the ambiguity of potential motions. To overcome those problems, we propose a new framework that produces imaginary videos by transformation generation. The generated transformations are applied to the original image in a novel volumetric merge network to reconstruct frames in imaginary video. Through sampling different latent variables, our method can output different imaginary video samples. The framework is trained in an adversarial way with unsupervised learning. For evaluation, we propose a new assessment metric RIQA. In experiments, we test on 3 datasets varying from synthetic data to natural scene. Our framework achieves promising performance in image quality assessment. The visual inspection indicates that it can successfully generate diverse five-frame videos in acceptable perceptual quality.

Multiscale Deep Alternative Neural Network for Large-Scale Video Classification
Jinzhuo Wang, Wenmin Wang, Wen Gao|IEEE Transactions on Multimedia|2018
Cited by 31

With the rapid increase in the amount of multimedia data, video classification has become a demanding and challenging research topic. Compared with image classification, video classification requires mapping a video that contains hundreds of frames to semantic tags, which poses many challenges to the direct use of advanced models originally designed for image-oriented tasks. On the other hand, continuous frames in a video also give us more visual clues that we can leverage to achieve better classification. One of the most important clues is the context in the spatiotemporal domain. In this paper, we introduce the multiscale deep alternative neural network (DANN), a novel architecture combining the strengths of both convolutional neural network and recurrent neural networks to achieve a deep network that can collect rich context hierarchies for video classification. In particular, the DANN is stacked with alternative layers, each of which consists of a volumetric convolutional layer followed by a recurrent layer. The former acts as a local feature learner, whereas the latter is used to collect contexts. Compared with popular deep feed-forward neural networks, the DANN learns local features and their contexts from the very beginning. This setting enables preserving context evolutions, which we show to be essential for improving the accuracy of video classification. To release the full potential of the DANN, we develop a deeper version with stochastic-layer skip-connections and construct a multiscale DANN to incorporate contexts at different scales. We show how to apply the multiscale DANN for video classification with carefully designed configurations in terms of both input-output settings and training-testing methods. The DANN is shown to be robust to not only human-centric videos, but also natural videos. As there are few large-scale natural disaster video datasets, we construct a new large-scale one and make it publicly available. Experiments on four datasets show the effectiveness of our method for both human actions and natural events.
