Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron, Louis Martin, Kevin H. Stone et al.|arXiv (Cornell University)|2023
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
Deep Denoising for Scientific Discovery: A Case Study in Electron Microscopy
Sreyas Mohan, Ramón Manzorro, Joshua Vincent et al.|IEEE Transactions on Computational Imaging|2022
Denoising is a fundamental challenge in scientific imaging. Deep convolutional neural networks (CNNs) provide the current state of the art in denoising natural images, where they produce impressive results. However, their potential has been inadequately explored in the context of scientific imaging. Denoising CNNs are typically trained on real natural images artificially corrupted with simulated noise. In contrast, in scientific applications, noiseless ground-truth images are usually not available. To address this issue, we propose a simulation-based denoising (SBD) framework, in which CNNs are trained on simulated images. We test the framework on data obtained from transmission electron microscopy (TEM), an imaging technique with widespread applications in material science, biology, and medicine. SBD outperforms existing techniques by a wide margin on a simulated benchmark dataset, as well as on real data. We analyze the generalization capability of SBD, demonstrating that the trained networks are robust to variations of imaging parameters and of the underlying signal structure. Our results reveal that state-of-the-art architectures for denoising photographic images may not be well adapted to scientific-imaging data. For instance, substantially increasing their field-of-view dramatically improves their performance on TEM images acquired at low signal-to-noise ratios. We also demonstrate that standard performance metrics for photographs (such as PSNR and SSIM) may fail to produce scientifically meaningful evaluation. We propose several metrics to remedy this issue for the case of atomic resolution electron microscope images. In addition, we propose a technique, based on likelihood computations, to visualize the agreement between the structure of the denoised images and the observed data. Finally, we release a publicly available benchmark dataset of TEM images, containing 18,000 examples.
Probabilistic Transformer For Time Series Analysis
Binh Tang, David S. Matteson|Neural Information Processing Systems|2021
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Lili Yu, Bowen Shi, Ramakanth Pasunuru et al.|arXiv (Cornell University)|2023
We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multi-task supervised fine-tuning (SFT) stage. It is also a general-purpose model that can do both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs. Extensive experiments demonstrate that this recipe is highly effective for multi-modal models. CM3Leon achieves state-of-the-art performance in text-to-image generation with 5x less training compute than comparable methods (zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon can also demonstrate unprecedented levels of controllability in tasks ranging from language-guided image editing to image-controlled generation and segmentation.
Predicting poverty with vegetation index
Binh Tang, Yanyan Liu, David S. Matteson|Applied Economic Perspectives and Policy|2021
Accurate and timely predictions of the poverty status of communities in developing countries are critical to policymakers. Previous work has applied convolutional neural networks (CNNs) to high‐resolution satellite imagery to perform community‐level poverty prediction. Although promising, such imagery has limitations in predicting poverty among poor communities. We provide the first evidence that a publicly available, moderate‐resolution vegetation index (the normalized difference vegetation index [NDVI]), can be used with CNNs to produce accurate poverty predictions contemporaneously among poor communities heavily dependent on agriculture. We also show that the NDVI can effectively detect consumption variation over time. To our knowledge, this is the first attempt to use remote sensing data to predict future‐period consumption expenditure at the community level.
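Note on the index named in the abstract above: NDVI is conventionally computed per pixel from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red), giving values between -1 and 1. The following is a minimal Python sketch of that standard definition only, not the authors' pipeline; the function names `ndvi` and `ndvi_grid` and the handling of a zero denominator are illustrative assumptions.

```python
def ndvi(nir, red):
    """Return NDVI for one pixel from NIR and red reflectance values."""
    total = nir + red
    if total == 0:
        # Assumption for this sketch: report 0.0 instead of dividing by zero.
        return 0.0
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() element-wise to two equally shaped 2-D lists (hypothetical helper)."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


if __name__ == "__main__":
    # Toy reflectance values, invented for illustration only.
    nir_band = [[0.75, 0.5], [0.25, 0.0]]
    red_band = [[0.25, 0.5], [0.75, 0.0]]
    print(ndvi_grid(nir_band, red_band))
```

Dense, healthy vegetation reflects strongly in the near-infrared and absorbs red light, so it yields values closer to 1, while bare soil or water yields values near or below 0.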