Eemil Lagerspetz

Theory and Practice of Bloom Filters for Distributed Systems

Sasu Tarkoma, Christian Esteve Rothenberg, Eemil Lagerspetz|IEEE Communications Surveys & Tutorials|2011

Cited by 521

Many network solutions and overlay networks utilize probabilistic techniques to reduce information processing and networking costs. This survey article presents a number of frequently used and useful probabilistic techniques. Bloom filters and their variants are of prime importance, and they are heavily used in various distributed systems. This has been reflected in recent research and many new algorithms have been proposed for distributed systems that are either directly or indirectly based on Bloom filters. In this survey, we give an overview of the basic and advanced techniques, reviewing over 20 variants and discussing their application in distributed systems, in particular for caching, peer-to-peer systems, routing and forwarding, and measurement data summarization.

Low-Cost Outdoor Air Quality Monitoring and Sensor Calibration

Francesco Concas, Julien Mineraud, Eemil Lagerspetz et al.|ACM Transactions on Sensor Networks|2021

Cited by 241|Open Access

The significance of air pollution and the problems associated with it are fueling deployments of air quality monitoring stations worldwide. The most common approach for air quality monitoring is to rely on environmental monitoring stations, which unfortunately are very expensive both to acquire and to maintain. Hence, environmental monitoring stations are typically sparsely deployed, resulting in limited spatial resolution for measurements. Recently, low-cost air quality sensors have emerged as an alternative that can improve the granularity of monitoring. The use of low-cost air quality sensors, however, presents several challenges: They suffer from cross-sensitivities between different ambient pollutants; they can be affected by external factors, such as traffic, weather changes, and human behavior; and their accuracy degrades over time. Periodic re-calibration can improve the accuracy of low-cost sensors, particularly with machine-learning-based calibration, which has shown great promise due to its capability to calibrate sensors in-field. In this article, we survey the rapidly growing research landscape of low-cost sensor technologies for air quality monitoring and their calibration using machine learning techniques. We also identify open research challenges and present directions for future research.

Carat

Adam J. Oliner, Anand Iyer, Ion Stoica et al.|Unknown|2013

Cited by 146

We aim to detect and diagnose energy anomalies, abnormally heavy battery use. This paper describes a collaborative black-box method, and an implementation called Carat, for diagnosing anomalies on mobile devices. A client app sends intermittent, coarse-grained measurements to a server, which correlates higher expected energy use with client properties like the running apps, device model, and operating system. The analysis quantifies the error and confidence associated with a diagnosis, suggests actions the user could take to improve battery life, and projects the amount of improvement. During a deployment to a community of more than 500,000 devices, Carat diagnosed thousands of energy anomalies in the wild. Carat detected all synthetically injected anomalies, produced no known instances of false positives, projected the battery impact of anomalies with 95% accuracy, and, on average, increased a user's battery life by 11% after 10 days (compared with 1.9% for the control group).

Predicting Depression From Smartphone Behavioral Markers Using Machine Learning Methods, Hyperparameter Optimization, and Feature Importance Analysis: Exploratory Study

Kennedy Opoku Asare, Yannik Terhorst, Julio Vega et al.|JMIR mHealth and uHealth|2021

Cited by 144|Open Access

BACKGROUND: Depression is a prevalent mental health challenge. Current depression assessment methods using self-reported and clinician-administered questionnaires have limitations. Instrumenting smartphones to passively and continuously collect moment-by-moment data sets to quantify human behaviors has the potential to augment current depression assessment methods for early diagnosis, scalable, and longitudinal monitoring of depression.
OBJECTIVE: The objective of this study was to investigate the feasibility of predicting depression with human behaviors quantified from smartphone data sets, and to identify behaviors that can influence depression.
METHODS: Smartphone data sets and self-reported 8-item Patient Health Questionnaire (PHQ-8) depression assessments were collected from 629 participants in an exploratory longitudinal study over an average of 22.1 days (SD 17.90; range 8-86). We quantified 22 regularity, entropy, and SD behavioral markers from the smartphone data. We explored the relationship between the behavioral features and depression using correlation and bivariate linear mixed models (LMMs). We leveraged 5 supervised machine learning (ML) algorithms with hyperparameter optimization, nested cross-validation, and imbalanced data handling to predict depression. Finally, with the permutation importance method, we identified influential behavioral markers in predicting depression.
RESULTS: Of the 629 participants from at least 56 countries, 69 (10.97%) were females, 546 (86.8%) were males, and 14 (2.2%) were nonbinary. Participants' age distribution is as follows: 73/629 (11.6%) were aged between 18 and 24, 204/629 (32.4%) were aged between 25 and 34, 156/629 (24.8%) were aged between 35 and 44, 166/629 (26.4%) were aged between 45 and 64, and 30/629 (4.8%) were aged 65 years and over. Of the 1374 PHQ-8 assessments, 1143 (83.19%) responses were nondepressed scores (PHQ-8 score <10), while 231 (16.81%) were depressed scores (PHQ-8 score ≥10), as identified based on PHQ-8 cut-off. A significant positive Pearson correlation was found between screen status-normalized entropy and depression (r=0.14, P<.001). LMM demonstrates an intraclass correlation of 0.7584 and a significant positive association between screen status-normalized entropy and depression (β=.48, P=.03). The best ML algorithms achieved the following metrics: precision, 85.55%-92.51%; recall, 92.19%-95.56%; F1, 88.73%-94.00%; area under the curve receiver operating characteristic, 94.69%-99.06%; Cohen κ, 86.61%-92.90%; and accuracy, 96.44%-98.14%. Including age group and gender as predictors improved the ML performances. Screen and internet connectivity features were the most influential in predicting depression.
CONCLUSIONS: Our findings demonstrate that behavioral markers indicative of depression can be unobtrusively identified from smartphone sensors' data. Traditional assessment of depression can be augmented with behavioral markers from smartphones for depression diagnosis and monitoring.

Toward Massive Scale Air Quality Monitoring

Naser Hossein Motlagh, Eemil Lagerspetz, Petteri Nurmi et al.|IEEE Communications Magazine|2020

Cited by 118|Open Access

Dangers associated with poor air quality are driving deployments of air quality monitoring technology. These deployments rely on either professional-grade measurement stations or a small number of low-cost sensors integrated into urban infrastructure. In this article, we present a research vision of real-time massive scale air quality sensing that integrates tens of thousands or even millions of air quality sensors to monitor air quality at fine spatial and temporal resolution. We highlight opportunities and challenges of our vision by discussing use cases, key requirements, and reference technologies in order to establish a roadmap on how to realize this vision. We address the feasibility of our vision, introducing a testbed deployment in Helsinki, Finland, and carrying out controlled experiments that address collaborative and opportunistic sensor calibration, a key research challenge for our vision.

