Atmosphere Ocean Science Colloquium

Time Series Analysis and Data-Driven Model Discrimination Beyond Usual Assumptions: Mathematical Ideas, Algorithmic Methods, HPC and Application Examples

Speaker: Illia Horenko, ICS Lugano

Location: Warren Weaver Hall 1302

Date: Wednesday, February 5, 2014, 3:30 p.m.

Synopsis:

Because many real-life application areas are intrinsically multiscale, the results of data analysis in them may be distorted by implicit assumptions imposed by the data analysis methods. For example, parametric Bayesian learning approaches impose strong implicit a priori assumptions such as stationarity, Gaussianity (for Gaussian Mixture Models, GMMs) and homogeneous/stationary Markovianity (e.g., in Hidden Markov Models, HMMs, and related approaches like Bayesian Mixture Models). Such a priori assumptions may also cause problems when comparing different model descriptions of the same data, i.e., at the step of statistical model discrimination (e.g., when comparing the output of non-parametric K-means clustering with the outputs of parametric GMMs or HMMs). On the other hand, non-parametric statistical methods result in infinite-dimensional variational problems and are computationally intractable for realistic applications when, e.g., inequality constraints are present that prohibit the deployment of standard explicit variational Euler-Lagrange-like optimizers. Moreover, both parametric and non-parametric problems may suffer from ill-posedness of the underlying problem formulation.

Generic scenarios in which these implicit assumptions of available methods may be violated for realistic processes will be discussed from a mathematical perspective, and alternative approaches based on an applied mathematics viewpoint on statistics, computational data analysis and model discrimination will be described. The main ideas behind a non-stationary and non-parametric time series analysis framework (combining concepts from functional analysis, partial differential equations, statistics, information theory and high-performance computing) that makes it possible to go beyond these implicit assumptions of standard tools will be explained.

Recently published results will be presented, showing that the resulting framework is a non-stationary and non-parametric generalization of a large class of standard statistical data analysis approaches. Through mild and physically motivated regularization inducing persistence of the obtained parameters (such as Tikhonov, L1 and graph-induced regularizations), the resulting framework weakens the implicit assumptions necessary for standard data analysis tools such as stationary regression analysis and dimension reduction methods, discrete homogeneous Markov and Bernoulli process models, generalized linear models, K-means and fuzzy clustering methods, as well as Bayesian learning methods (like GMMs and HMMs). A brief general overview of published real-life applications of the resulting FEM-BV framework (Finite Element Model of data analysis with Bounded Variation of model parameters) will be given, showing some examples from: (i) sociology; (ii) economics and risk modeling; (iii) informatics; (iv) geosciences; (v) biophysics and bioinformatics.
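
To make the idea of bounded-variation regularization concrete, one common way to write a clustering problem of this type is sketched below. The notation is assumed here for illustration and does not come from the abstract: $x_t$ are the observations, $\gamma_i(t)$ are time-dependent cluster affiliations, $\theta_i$ are the parameters of the $i$-th local model, $g$ is a model distance function, and $C$ bounds the variation of each affiliation in time. The formulation used in the talk may differ.

```latex
\begin{align}
  \min_{\Gamma,\,\Theta}\quad
    & \sum_{t=1}^{T} \sum_{i=1}^{K} \gamma_i(t)\, g(x_t, \theta_i) \\
  \text{subject to}\quad
    & \sum_{i=1}^{K} \gamma_i(t) = 1, \qquad \gamma_i(t) \ge 0
      \quad \text{for all } t, \\
    & \sum_{t=1}^{T-1} \bigl|\gamma_i(t+1) - \gamma_i(t)\bigr| \le C
      \quad \text{for all } i.
\end{align}
```

In this sketch, taking $g(x_t,\theta_i) = \lVert x_t - \theta_i \rVert^2$ and removing the bound $C$ recovers the usual K-means problem, while $C = 0$ forces time-independent affiliations. Intermediate values of $C$ allow only a limited number of regime changes, which is one way to express the persistence mentioned above.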