SILVIA METELLI

Research.




Working Papers.







Publications.


Abstract:

Objective: Meta-analyses of observational studies are frequently published in the literature, but they are generally considered inferior to those involving randomised controlled trials (RCTs) only. This is due both to the increased risk of bias that observational studies may entail and to the high heterogeneity that might be present. In this article, we highlight aspects of meta-analyses with observational studies that need more careful consideration in comparison to meta-analyses of RCTs.

Methods: We present an overview of recommendations from the literature with respect to how the different steps of a meta-analysis involving observational studies should be comprehensively conducted. We focus more on issues arising at the step of the quantitative synthesis, in terms of handling heterogeneity and biases. We briefly describe some sophisticated synthesis methods, which may allow for more flexible modelling approaches than common meta-analysis models. We illustrate the issues encountered in the presence of observational studies using an example from mental health, which assesses the risk of myocardial infarction in antipsychotic drug users.

Results: The increased heterogeneity observed among studies challenges the interpretation of the summary diamond in the forest plot, while the inclusion of short exposure studies may lead to an exaggerated estimate of the risk of myocardial infarction in this population.

Conclusions: In the presence of observational study designs, prior to synthesis, investigators should carefully consider whether all studies at hand are able to answer the same clinical question. The decision to undertake a quantitative synthesis should be guided by examination of the amount of clinical and methodological heterogeneity and by assessment of possible biases.
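
As a rough illustration of how between-study heterogeneity enters a quantitative synthesis, the sketch below pools per-study effects with the widely used DerSimonian-Laird random-effects estimator and reports the I² statistic. It is not taken from the paper: the function name `random_effects_summary` and its input format (effects on a scale such as the log odds ratio, each with a within-study variance) are assumptions made for this example.

```python
import math


def random_effects_summary(effects, variances):
    """DerSimonian-Laird random-effects pooling (illustrative sketch).

    effects   -- per-study effect estimates, e.g. log odds ratios (assumed input)
    variances -- the corresponding within-study variances
    """
    k = len(effects)
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures the spread of the studies around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I^2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se, "tau2": tau2, "i2": i2, "q": q}
```

A large I² or tau² signals that the pooled estimate averages over studies that may not be estimating the same quantity, which is the situation the abstract warns about.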

Link to paper.





Abstract:

Monitoring computer network traffic for anomalous behaviour presents an important security challenge. Arrivals of new edges in a network graph represent connections between a client and server pair not previously observed, and in rare cases these might suggest the presence of intruders or malicious implants. We propose a Bayesian model and anomaly detection method for simultaneously characterising existing network structure and modelling likely new edge formation. The method is demonstrated on real computer network authentication data and successfully identifies some machines which are known to be compromised.
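
To make the notion of a "new edge" concrete, the following minimal sketch scans a stream of connection events and records each client and server pair the first time it appears. It covers only this bookkeeping step, not the Bayesian model described in the abstract; the function name `flag_new_edges` and the `(client, server)` tuple format are hypothetical choices for illustration.

```python
def flag_new_edges(events, known_edges=None):
    """Return the (client, server) pairs not previously observed, in arrival order.

    events      -- iterable of (client, server) tuples (assumed event format)
    known_edges -- optional iterable of pairs already seen, e.g. from a training period
    """
    seen = set(known_edges or ())
    new_edges = []
    for client, server in events:
        edge = (client, server)
        if edge not in seen:
            # First observation of this pair: record it as a new edge.
            seen.add(edge)
            new_edges.append(edge)
    return new_edges
```

A model such as the one proposed would then assess how surprising each flagged edge is; this sketch makes no attempt at that scoring.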

Link to paper.



Abstract:

Computer networks are complex and the analysis of their structure in search of anomalous behaviour is both a challenging and important task for cyber security. For instance, new edges, i.e. connections from a host or user to a computer that has not been connected to before, provide potentially strong statistical evidence for detecting anomalies. Unusual new edges can sometimes be indicative of both legitimate activity, such as automated update requests permitted by the client, and illegitimate activity, such as denial of service (DoS) attacks to cause service disruption or intruders escalating privileges by traversing through the host network. In both cases, capturing and accumulating evidence of anomalous new edge formation represents an important security application. Computer networks tend to exhibit an underlying cluster structure, where nodes are naturally grouped together based on similar connection patterns. What constitutes anomalous behaviour may strongly differ between clusters, so inferring these peer groups constitutes an important step in modelling the types of new connections a user would make. In this article, we present a two-step Bayesian statistical method aimed at clustering similar users inside the network and simultaneously modelling new edge activity, exploiting both overall-level and cluster-level covariates.

Link to paper.



Abstract:

In multilevel models for binary responses, estimation is computationally challenging due to the need to evaluate intractable integrals. In this paper, we investigate the performance of integrated nested Laplace approximation (INLA), a fast deterministic method for Bayesian inference. In particular, we conduct an extensive simulation study to compare the results obtained with INLA to the results obtained with a traditional stochastic method for Bayesian inference (MCMC Gibbs sampling), and with maximum likelihood through adaptive quadrature. Particular attention is devoted to the case of a small number of clusters. The specification of the prior distribution for the cluster variance plays a crucial role and turns out to be more relevant than the choice of the estimation method. The simulations show that INLA performs very well, achieving good accuracy (similar to MCMC) with reduced computational times (similar to adaptive quadrature).

Link to paper.



Abstract:

Anomalous connections in a computer network graph can be a signal of malicious behaviour. For instance, a compromised computer node tends to form a large number of new client edges in the network graph, connecting to server IP (Internet Protocol) addresses which have not previously been visited. This behaviour can be caused by malware (malicious software) performing a denial of service (DoS) attack, to cause disruption or further spread malware. Alternatively, the rapid formation of new edges by a compromised node can be caused by an intruder seeking to escalate privileges by traversing through the host network. However, study of computer network flow data suggests new edges are also regularly formed by uninfected hosts, and often in bursts. Statistically detecting anomalous formation of new edges requires reliable models of the normal rate of new edges formed by each host. Network traffic data are complex, so the number of variables which might be included in such a statistical model can be large; without proper treatment, this would lead to overfitted models with poor predictive performance. In this paper, Bayesian variable selection is applied to a logistic regression model for new edge formation for the purpose of selecting the best subset of variables to include.

Link to paper.


Reports.