# Program

Registration starts at 5 PM on Wednesday.

## Wed Aug 30 (pm)

17.00-18.00    Registration

18.00-19.00     Opening (E. Vera)

19.00-20.00    Reception

## Thu Aug 31

9.00-9.30    Andrew Connolly, U. Washington, Seattle, USA,  Searching for things that aren't always there.

The traditional approach to analysing astronomical data is to work with catalogs of sources whose properties have been measured from astronomical images. With the dramatic increase in the rate at which we collect data, these techniques result in a number of statistical and computational challenges: how do we extract knowledge from large and complex data sets; how do we account for the noise and gaps within data streams; and how do we understand when we have detected a fundamentally new class of event or physical phenomenon. This is not just a question of the size of the data (collecting and processing petabyte data sets scales well with projected technology developments); it is a fundamental question of how we discover, represent, visualize and interact with the knowledge that these data contain. In this talk I will describe two new approaches for extracting knowledge from images that rely on a generative approach, where we model the sources we are searching for and look for evidence that these models are correct. Using the cases of differential chromatic refraction and slowly moving asteroids, I will show how this change in methodology can enable us to extract information at higher spectral resolution and at lower signal-to-noise than the classical catalog-based techniques.

9.30-9.45    Francisco Förster, CMM-U.Chile / MAS,  Real-time analysis for transient detection and classification: from HiTS to ALeRCE

With a new generation of large-etendue survey telescopes, the variable sky will be studied with an unprecedented combination of area, depth and cadence. Extracting relevant information in this new domain will require real-time analysis of the data. I will briefly present some of the main results of the High cadence Transient Survey (HiTS), a project which explored this new parameter space in real time using the Dark Energy Camera (DECam). Then I will discuss the problem of dealing with the large volumes of alerts generated in this context, which was an important component of HiTS but will become much more difficult with larger projects such as ZTF or LSST. I will present some of our efforts under the Automatic Learning for the Rapid Classification of Events (ALeRCE) collaboration to characterize these alerts and trigger complementary observations, in order to understand the nature of transients and variable objects and to extract physical parameters from their observables.

9.45-10.00    Pablo Estévez, DIE-U.Chile / MAS,  Kalman & correntropy filtering for transient detection in image streams

The surge of very large astronomical cameras has opened new possibilities for astrophysical variability studies that pose new massive data analysis challenges. Increasingly important is the detection and classification of transients in streams of data, such as those that will be produced by the Large Synoptic Survey Telescope (LSST). In this work we introduce a novel method for detecting transients in a stream of images, which we apply to real data from the High Cadence Transient Survey (HiTS). HiTS is a survey looking for young supernovae (SNe) and other transients with timescales of hours using data from the Dark Energy Camera (DECam). By studying the temporal evolution of photometry flux images, i.e. point spread function (PSF) integrated fluxes centered around every pixel of an image, we can detect light curves that fit a particular model from the data stream. We describe the flux evolution with two iterative filtering tools: the Kalman filter and a non-linear successor, the correntropy filter. The latter implements an information-theoretic cost function called cross-correntropy that allows artifacts to be discarded, compared with the Kalman filter results. For this work we focus on light curves that rise smoothly relative to their sampling, i.e. with characteristic time derivative evolution timescales larger than the cadence. Assuming that young supernovae can be described by smoothly rising light curves, which is a good approximation given our short cadences, the proposed method was able to detect a set of new SN candidates not previously found by HiTS and rediscovered a subset of HiTS supernova candidates with a higher purity. We show that this method could lead to higher detection efficiencies in the case of slowly rising, low signal-to-noise ratio transients.
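
As a rough illustration of the Kalman filtering step mentioned in this abstract, the sketch below tracks a flux and its rate of change from a sequence of flux measurements. It is not the authors' implementation and does not include the correntropy filter: the constant-rate state model, the function name `kalman_track`, and all noise parameters are assumptions chosen for illustration.

```python
def kalman_track(measurements, dt=1.0, process_noise=1e-6, measurement_var=1.0):
    """Track (flux, flux rate) with a linear Kalman filter.

    Assumed model (hypothetical, for illustration only):
      state        x = [flux, rate], flux_{k+1} = flux_k + dt * rate_k
      measurement  z = flux + noise with variance `measurement_var`
    Returns the list of (flux, rate) estimates after each measurement.
    """
    flux, rate = 0.0, 0.0
    # Symmetric 2x2 state covariance, stored as three numbers (vague prior).
    p00, p01, p11 = 1e3, 0.0, 1e3
    estimates = []
    for z in measurements:
        # Predict: propagate the state and covariance one step forward.
        flux = flux + dt * rate
        p00 = p00 + 2.0 * dt * p01 + dt * dt * p11 + process_noise * dt ** 3 / 3.0
        p01 = p01 + dt * p11 + process_noise * dt ** 2 / 2.0
        p11 = p11 + process_noise * dt

        # Update: correct the prediction with the new flux measurement.
        innovation = z - flux
        s = p00 + measurement_var          # innovation variance
        k0, k1 = p00 / s, p01 / s          # Kalman gain
        flux += k0 * innovation
        rate += k1 * innovation
        p00, p01, p11 = (1.0 - k0) * p00, (1.0 - k0) * p01, p11 - k1 * p01

        estimates.append((flux, rate))
    return estimates


if __name__ == "__main__":
    # Synthetic, noiseless, linearly rising light curve (illustrative only).
    ramp = [2.0 * t for t in range(1, 51)]
    final_flux, final_rate = kalman_track(ramp)[-1]
    print(final_flux, final_rate)
```

In a detection setting, one would presumably flag pixels whose estimated rate stays positive and significant over several epochs; that decision rule is not specified in the abstract and is left out here.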

10.00-10.15    Pablo Huijse, MAS, Robust Period Estimation using Mutual Information for Multi-band Light Curves

We propose a Quadratic Mutual Information (QMI) estimator for period detection in multidimensional and sparsely sampled time series. The proposed method does not assume a particular model for the light curve and is robust to non-Gaussian noise and outliers. QMI is compared with the multiband Lomb-Scargle and AoV methods using LSST light curves generated with OpSim and CatSim. Results show that QMI is more robust to noise and low sample size. A Python package is publicly available.

10.15-10.30    Guillermo Cabrera, U. Concepción / MAS,  Deep-HiTS: rotation invariant convolutional neural network for transient detection within the ALeRCE system

Deep-HiTS is a rotation-invariant convolutional neural network for transient detection which is part of the Automatic Learning for the Rapid Classification of Events (ALeRCE) system. Deep-HiTS is used for classifying images of transient candidates into artifacts or real sources. CNNs have the advantage of learning the features automatically from the data while achieving high performance. Deep-HiTS was conceived to analyze DECam images within the High cadence Transient Survey (HiTS). For a fixed number of approximately 2,000 allowed false transient candidates per night, Deep-HiTS is able to reduce the misclassified real transients by approximately 1/5 as compared to a Random Forest model. Deep-HiTS is easily extendable and applicable to other surveys. We show how we have extended our model for detecting supernovae in data produced by the VLT Survey Telescope (VST). We have made all our code and data available to the community to allow further developments and comparisons.

10:30-11:15    Coffee break

11.15-11.30    Bryce Kalmbach, U. Washington, Seattle, USA,  Estimating Spectra from Photometry

Measuring the physical properties of galaxies, such as redshift, frequently requires the use of Spectral Energy Distributions (SEDs). SED template sets are, however, often small in number and cover limited portions of photometric color space. Here I will present a new method to estimate SEDs as a function of color from a small training set of template SEDs. I will first cover the mathematical background behind the technique and then demonstrate its ability to reconstruct spectra based upon colors. I will then compare it to other common interpolation and extrapolation methods and finally show an example application to photometric redshift estimation.

11.30-11.45    TBD

11.45-12.15    Felipe Tobar, CMM-U.Chile,  Bayesian nonparametric spectrum estimation: A motivating example from interferometry

Spectral estimation (SE) refers to representing a function in terms of its distribution of power across frequencies. Though of natural importance for Signal Processing, SE is also crucial in a number of disciplines. In Astronomy, for instance, SE allows us to reconstruct images in interferometry and to determine periodicities of light curves. The theory behind the spectral representation is clear: the spectrum of a function is its Fourier transform. However, in real-world applications we need to recover spectra having only partial, unevenly sampled, and noisy observations of the process of interest. In this regard, the two main classic approaches to SE are radically different. First, the parametric approach imposes a stringent model on the data that allows for direct calculation of the Fourier transform. Second, the nonparametric approach computes the Fourier transform via finite differences and averaging, and therefore lacks structure. In this talk, we present a novel view on SE exploiting the nonparametric and structure-preserving properties of continuous-time Gaussian processes. In a nutshell, we model the process of interest (which is partially observed) in probabilistic terms and then compute the posterior probability density of spectra in closed form. To provide intuition into the proposed approach, we will first focus on the finite-parameter case to see how to reconstruct an astronomical image from spectral measurements using a Bayesian sum of Gaussians. We will next see ongoing efforts extending this concept to the infinite-parameter case using a class of Gaussian processes that admit closed-form Fourier transformation. Finally, we will interpret the ability of the proposed model to represent and propagate uncertainty from spatial measurements to spectral estimates.

12.15-12.45    Jean Charles Lamirel, U. Henri Poincaré, Nancy, France,  Feature maximization metric and its application to large data: contrast graphs and identification of community roles in graphs

In the first part of our talk, we will present the feature maximization metric. This metric is based on the F-measure of a variable f associated with a class c. It is defined as the harmonic mean of the feature recall and the feature preponderance. The feature selection process based on this metric can therefore be defined as a non-parametrized process in which a class feature is characterized by using both its ability to discriminate the class to which it relates (feature recall) and its ability to faithfully represent the data in that class (feature preponderance). A contrast estimation step can be exploited in addition to the first selection step. Its role is to estimate the gain of information produced by a feature on a class. This is proportional to the ratio between the value of the feature F-measure of a feature in the class and the mean value of the feature F-measure of that same feature in the set of classes. The feature maximization metric has been used successfully for classification and clustering tasks on highly multidimensional data of different natures. These experiments have shown that it makes it possible to reduce the complexity of the multidimensional learning problems concerned by isolating the variables or representative features of the problem to be treated from a very large set of variables. These capacities also make it possible to exploit it very efficiently in order to generate explanations of the learning results, which gives it properties similar to those of symbolic methods without, however, having their computational complexity. In the second part, we will show in two different ways how this measure is applicable to the analysis of large graphs. First, we present contrast graphs, which are force-driven graphs whose role is to highlight the dependencies between features and classes by exploiting the gain of information obtained from the feature F-measure.
We present a first illustrative application that we have made of these graphs in Scientometrics for the analysis of transdisciplinarity between scientific domains. We show that the method makes it possible both to reduce the complexity of the graph representation and to extract key information from the processed data (knowledge transmitters, transitional topics). We present a second application of contrast graphs in the context of the ISTEX-R project to analyze the changes of the topics arising in a single scientific field explored over a long period of time. We show in particular how random walk techniques applied to the contrast graphs make it possible to isolate periods of thematic stability, which can then be used as input to Bayesian diachronic differentiation methods. Second, we study on realistic graphs the correlations that exist between the feature F-measure and the usual centrality measures, in particular those aiming to characterize the node community roles. We show that this measure is linked to the node centrality of the graphs, and that it is particularly well suited to measuring their connectivity with regard to the structure of communities. We also show that the usual measures for detecting community roles are strongly dependent on the size of the communities, whereas the ones we propose are, by definition, linked to the density of the community, which makes the results comparable from one graph to another.
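
The feature F-measure and contrast described in this abstract can be sketched as follows. The abstract gives only the harmonic-mean definition and the contrast ratio; the explicit formulas used here for feature recall (weight of a feature in a class over its weight in all classes) and feature preponderance (weight of a feature in a class over the weight of all features in that class) are assumptions, as are the function names and the toy data.

```python
from collections import defaultdict


def feature_f_measure(weights, labels):
    """Return {class: [feature F-measure per feature]}.

    weights[d][f] is the (non-negative) weight of feature f in data item d,
    labels[d] is the class of item d. Recall and preponderance follow the
    assumed definitions stated in the lead-in.
    """
    n_features = len(weights[0])
    class_feature_sum = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(weights, labels):
        sums = class_feature_sum[label]
        for f, w in enumerate(row):
            sums[f] += w

    # Total weight of each feature over all classes.
    feature_total = [
        sum(sums[f] for sums in class_feature_sum.values())
        for f in range(n_features)
    ]

    ff = {}
    for label, sums in class_feature_sum.items():
        class_total = sum(sums)  # total weight of all features in this class
        scores = []
        for f in range(n_features):
            recall = sums[f] / feature_total[f] if feature_total[f] else 0.0
            preponderance = sums[f] / class_total if class_total else 0.0
            denom = recall + preponderance
            # Harmonic mean of recall and preponderance.
            scores.append(2.0 * recall * preponderance / denom if denom else 0.0)
        ff[label] = scores
    return ff


def contrast(ff):
    """Ratio of a feature's F-measure in a class to its mean over all classes."""
    classes = list(ff)
    n_features = len(ff[classes[0]])
    mean_ff = [sum(ff[c][f] for c in classes) / len(classes) for f in range(n_features)]
    return {
        c: [ff[c][f] / mean_ff[f] if mean_ff[f] else 0.0 for f in range(n_features)]
        for c in classes
    }


if __name__ == "__main__":
    # Toy data (illustrative only): 4 items, 2 features, 2 classes.
    toy_weights = [[4, 1], [2, 1], [0, 3], [1, 2]]
    toy_labels = ["a", "a", "b", "b"]
    ff = feature_f_measure(toy_weights, toy_labels)
    print(ff)
    print(contrast(ff))
```

Under these assumed definitions, a contrast above 1 marks a feature that is more characteristic of the class than of the average class, which is one way to read the "gain of information" mentioned above.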

12.45-14.45    Lunch

14:45-15:15    Axel Osses, CMM-U.Chile,  Numerical biomedicine and inverse problems: some mathematical challenges in nuclear medicine, elastography and cardiovascular medicine

The techniques of photon and positron emission tomography (SPECT, PET) are the most extensively used in nuclear medicine for identifying radioactive sources in the treatment of cancer and of other neurobiological diseases such as Alzheimer’s and epilepsy. Elastography is a medical technique that uses the propagation of sound in living tissues for the detection of tumors through the indirect and non-invasive estimation of their elastic parameters. In cardiovascular medicine, an important clinical problem is to indirectly estimate pressure gradients in arteries from the speed of the bloodstream. All these biomedical techniques consist of the estimation of sources or parameters in some partial differential equations. In addition, some of these techniques are fundamentally based on data obtained from specific magnetic resonance imaging (MRI) examinations, where the measured magnetization also satisfies some partial differential equations. One of the main objectives of the Numerical Biomedicine research at CMM is the mathematical analysis and numerical simulation of such biomedical inverse techniques for the understanding, optimization and improvement of the existing medical algorithms.

15:15-15:45    Matthew Graham, Caltech, Pasadena, USA,  Trends in astronomical time series analysis

The advent of large archival collections of astronomical time series and the contemporaneous emergence of sophisticated analysis techniques are transforming our understanding of astrophysical temporal phenomena. Systematic explorations are revealing both population behaviors and individual extreme sources. In this talk, we will review these developments using AGN as an illustrative case and consider the potential of facilities on the near horizon, such as ZTF and LSST, for further discoveries.

15.45-16.15     Nelson Padilla, AIUC-UC,  Numerical cosmological simulations of alternative gravity theories

I will present some of the current efforts to simulate alternative gravity cosmologies, where the accelerated expansion of the universe is achieved without resorting to dark energy. These gravity theories are covariant modifications of General Relativity that can be written in a simple form consisting of the standard dark-energy-dominated model plus an extra field. Because of this, numerical simulation codes designed to solve hydrodynamics in expanding universes can be modified to simulate these alternative models. I will show results for f(R), DGP and Symmetron models, along with interesting new ways to tell these apart from General Relativity.

16:45-17:00    Coffee break

17:00-17:30    Marcos Orchard, DIE-U.Chile,  Application of Multiple-imputation-particle-filter for Parameter Estimation of Visual Binary Stars with Incomplete Observations

In visual binary stars, mass estimation can be accomplished through the study of their orbital parameters: Kepler’s Third Law establishes a strict mathematical relation between orbital period, orbit size (semi-major axis) and the total mass of the system. Although, in theory, a few observations on the plane of the sky may be enough to obtain a decent estimate for binary star orbits, astronomers must frequently deal with the problem of partial measurements (i.e., observations having one component missing, either in $(X, Y)$ or $(\rho, \theta)$ representation), which are often discarded. This article presents a particle-filter-based method to perform the estimation and uncertainty characterization of these orbital parameters in the context of partial measurements. The proposed method uses a multiple imputation strategy to cope with the problem of missing data. The algorithm is tested on synthetic data of the relative position of binary stars. The following cases are studied: i) fully available data (ground truth); ii) incomplete observations are discarded; iii) a multiple imputation approach is used. In comparison to a situation where partial observations are ignored, a significant reduction in the empirical estimation variance is observed when using multiple imputation schemes, with no numerically significant decrease in estimate accuracy.
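
The relation from Kepler's Third Law mentioned in this abstract can be written, in units of solar masses, astronomical units and years, as total mass = a³ / P². The sketch below applies it to a visual binary, converting the angular semi-major axis to AU by dividing by the parallax. The function name and the input values are hypothetical and chosen only to make the arithmetic easy to follow; the particle-filter and multiple-imputation steps of the talk are not reproduced here.

```python
def total_mass_solar(semi_major_axis_arcsec, parallax_arcsec, period_years):
    """Total mass of a visual binary in solar masses via Kepler's Third Law.

    The angular semi-major axis divided by the parallax (both in arcseconds)
    gives the physical semi-major axis in AU; then M_total = a^3 / P^2 with
    a in AU and P in years.
    """
    a_au = semi_major_axis_arcsec / parallax_arcsec
    return a_au ** 3 / period_years ** 2


if __name__ == "__main__":
    # Illustrative numbers only: a = 0.2", parallax = 0.1" (so a = 2 AU), P = 2 yr.
    print(total_mass_solar(0.2, 0.1, 2.0))
```

Because the mass scales with the cube of the semi-major axis and the inverse square of the period, uncertainty in the orbital parameters propagates strongly into the mass, which is why recovering information from partial observations matters.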

17.30-18.00    Jorge Silva, DIE-U.Chile, The astrometric lessons of Gaia-GBOT experiment

Since the beginning of the Gaia mission, to ensure the full capabilities of the Gaia measurements, a programme of daily observations of the satellite itself with Earth-based telescopes, called Ground Based Optical Tracking (GBOT), has been in place [1]. These observations are carried out mainly with two facilities: the VLT Survey Telescope (ESO VST) at Cerro Paranal in Chile and the Liverpool Telescope on the Canary Island of La Palma. The constraint of 0.02 arcseconds on the tracking astrometric quality and the fact that Gaia is a faint and relatively fast moving target (its magnitude in red is around 21 and its apparent speed around 0.04″/s) led us to rigorously analyse the reachable astrometric precision for CCD observations of this kind of celestial object. We present here the main results of this study, which uses the Cramér-Rao lower bound to characterize the precision limit for the PSF center when drifting in the CCD frame. This work extends earlier studies dealing with one-dimensional detectors and stationary sources [2,3], firstly to the case of standard two-dimensional CCD sensors, and then to moving sources. [1] Altmann, M. et al., 2014, in SPIE, 9149, 15. [2] Mendez, R. A. et al., 2013, in PASP, 125, 580. [3] Mendez, R. A. et al., 2014, in PASP, 126, 798.

18.00-18.30    Mauricio Marín, U. Santiago de Chile,  Distributed Processing Platform for Supporting Emergency Management from the Social Computing Side

One of the main weaknesses of disaster management in Chile is the lack of comprehensive data integration among the different technical entities for supporting decision making in emergency management. Currently, this problem is being solved by enabling coordination among organizations capable of providing formal data and situation assessment, all supported by geographical information systems. However, first hour/minute assessment data can be quickly made available to those geographical information systems from affected communities and volunteers by means of software tools such as crowdsourcing for disasters and related tools for supporting humanitarian computing. To make this possible at large scale, a properly distributed computing infrastructure for emergency management needs to be developed and deployed nationwide. This implies the development of processing platforms devised to process massive streams of data in real time. We refer to services for providing parallel and distributed processing of data such as text messages, images and scientific data streams. The underlying infrastructure should feature (1) elasticity in the use of computational resources, (2) efficient parallel processing of events and operations on events, (3) scalable performance across multiple geographically distributed data centers, (4) mobility of applications among streaming platforms, (5) fault tolerance at processor level and across data centers, (6) platform services deployed on mobile phones, (7) accommodation of multiple concurrent applications with resource allocation policies based on job priority and reputation, and (8) a set of tools devised to support software development. This talk describes an effort in this direction from an R&D project funded by Fondef Conicyt.

## Fri Sep 1

9:00-10:00    Tutorial:   A. Mahabal   Classification of Astronomical Transients

10:00-10:30    Steffen Härtel, SCIAN/BNI/CIMT, F-Med. U-Chile,  Medical Informatics and Health Information System

In recent years, the use of optical microscopy and medical image processing has become increasingly relevant for research and medical practice. In vivo microscopy has contributed to the study of cellular structure and function with previously unattained resolution, setting a basis for a deeper understanding of diseases like cancer, Alzheimer's, or Parkinson's. Common goals of computational methods for image processing in this context are the quantification of static or dynamic structures either at cellular level, like nuclei or membranes, or at tissue level. Typical analysis tasks are shape description and tracking, whose combination leads to the quantification of dynamics.

10:30-11:00    Coffee break

11:00-11:15    Mauricio Cerda, SCIAN/BNI/CIMT/CENS, F-Med. U-Chile,  Image Processing, Health Information Science, and Big Data Management

In recent years, the use of optical microscopy and medical image processing has become increasingly relevant for research and medical practice. In vivo microscopy has contributed to the study of cellular structure and function with previously unattained resolution, setting a basis for a deeper understanding of diseases like cancer, Alzheimer's, or Parkinson's. Common goals of computational methods for image processing in this context are the quantification of static or dynamic structures either at cellular level, like nuclei or membranes, or at tissue level. Typical analysis tasks are shape description and tracking, whose combination leads to the quantification of dynamics. The new generation of high-throughput optical microscopy now operating in South America includes light-sheet microscopy and automated acquisition setups (tissue scanner, multi-well). These technologies generate large volumes of data with specific optical characteristics, which make proper image processing methods critical for the efficient extraction of relevant information. In this talk, I will show the current work of our lab in deploying storage and network technologies available to local researchers to promote cutting-edge collaborative work at the intersection of Cell Biology, Microscopy, and Image Processing. Examples of these efforts are large-scale cell tracking in developmental biology and tissue quantification in gastric cancer.

11.15-11.30    Paola Scavone, IIBCE, Montevideo, Uruguay, In vivo Biofilm Development in Flow Chambers under Dynamic Conditions

Biofilms are the most successful microbial lifestyle on Earth; they are assemblies of microbial cells enclosed in a polysaccharide matrix and associated with a surface. Biofilm-associated organisms differ from their planktonic counterparts with respect to the transcribed genes and the growth rate. They are relevant for biogeochemical processes such as bioremediation or degradation, but also in a medical context. In a biomedical context, biofilms are responsible for more than 80% of infectious diseases and 65% of nosocomial infections. Cystic fibrosis, native valve endocarditis, otitis media, periodontitis, and chronic prostatitis are caused by biofilm-associated microorganisms. Indwelling medical devices such as urinary catheters harbor biofilms and are associated with infections. One of the most relevant features is that bacteria in biofilms are highly resistant to antimicrobial agents and to clearance by the immune system. Different biofilm characteristics can be important for infectious disease: (i) detachment of cells or aggregates may result in bloodstream or urinary tract infection, (ii) cells may exchange resistance plasmids within biofilms, (iii) cells in biofilms have reduced susceptibility to antimicrobial agents even in susceptible strains, and (iv) production of endotoxins and virulence factors. The majority of studies have been based on fixed biofilms grown on cover slips, but considering the nature of biofilms it is essential to develop reliable, hydrated, dynamic flow chambers to study biofilms in real time. In this context, our group is developing in vivo flow chambers under dynamic conditions within a 3D matrix using 4-lens LSFM. This system will allow us to approximate the natural conditions encountered by bacteria in urinary catheters and to understand the role of different structures during biofilm formation.

11.30-11.45    Jocelyn Dunstan, SCIAN/CIMT, F-Med. U-Chile, Machine Learning reveals Food Patterns and Obesity Prevalence across Countries

Obesity is recognized as a global pandemic, which has raised interest from a number of scientific disciplines. Using standardized industry data from different countries, we have implemented a machine learning approach to identify food patterns associated with a higher obesity risk. The input variables (or predictors) to the proposed model were the volumes of food purchased across 52 categories. For the method, we considered random forests, an ensemble technique composed of multiple rule-based decision trees, to perform a binary classification of countries’ obesity risk, i.e., “low” vs. “high” obesity prevalence. The outcome of the random forest provides estimates of the obesity risk and also a ranking of the variables that best predict the outcome. The latter can be used to design a national dietary risk index based on the impact that certain foods have on the obesity risk. The method achieved 71% accuracy and identified highly processed foods as important predictors of obesity at the country level, which have been recognized as one of the main drivers of the nutrition transition.

11.45-12.00    Rodrigo Martínez, SCIAN/CIMT, Escuela de Salud Publica F-Med, U-Chile, Using Simulation to Leverage Patient Safety and Health Efficiency

The recognition of patients' rights has become a powerful incentive to look for alternatives in medical education and training. Although patients' participation is irreplaceable, simulation tools can help to build competences and improve knowledge and reasoning prior to contact with real patients. Virtual platforms stimulate cooperative creation and analysis of simulated and virtual patients, under standardized conditions in controlled environments. Using the same approach, decision makers in health management, who are under increasing pressure to allocate scarce resources, can analyze different scenarios before offering managers, policy makers and stakeholders the best options to deliver healthcare in demanding populations. To meet both challenges, it is necessary to create or have access to existing databases containing diverse classes of information, including clinical, social, economic and educational data, through electronic platforms and tools at a national level that enhance the exchange of key information. This presentation will discuss some examples of clinical simulation and analyze a specific healthcare model that will be used as a basic framework to create virtual environments for managerial purposes.

12.00-12.15    Paulina Ruiz, CIMT/D. Tecnología Medica, F-Med, U-Chile,  Data Quality

The development of personalized or precision medicine has emerged as an alternative to conventional medical therapeutic strategies; it considers the individual characteristics of patients and their relationship with the environment in order to choose the best therapy available for each individual. This is achieved through the interaction of internal hospital information systems, such as the electronic clinical record and the laboratory information system, among others, with external data systems such as electronic databases, all of which should communicate with each other. Therefore, the implementation of interoperable systems that use international standards is imperative to achieve the necessary communication between information systems. Consequently, an interoperability environment should be created in our health information systems to enable improvements related to personalized medicine using information technologies in health. In this work, we discuss the current state of the interoperability needed for implementation and wide adoption, and the challenges in achieving the improvements that this new paradigm for the treatment of diseases entails.

12.15-12.30    Víctor Castañeda, SCIAN-Lab, F-Med. U-Chile,  Telemedicine and Data Connectivity @ Campus Norte, Universidad de Chile

BioMedHPC, a U-Redes project of the Universidad de Chile, is focused on creating high-speed network infrastructure and training people in big data. This project has connected six laboratories of the Faculty of Medicine and the University Hospital (HCUCH) to the fastest research network in Chile. They have different needs, such as microscopic image processing, genetic sequencing and others. Thanks to the close collaboration with NLHPC and REUNA, users of this network can today process data on the NLHPC cluster and access our new database at 10 Gbps, allowing data to be transferred as if from a local hard drive. One example of the use of this network is the new telemedicine project promoted by the Rector of the University, which establishes the need to develop telemedicine in order to advance medical informatics and support the resolution of public health issues, through the collaboration of a multidisciplinary group from different knowledge areas. Currently, 7 telemedicine projects are running in the Hospital using the BioMedHPC and REUNA networks. A palliative care tele-committee, children's tele-psychiatry, tele-kinesiology and tele-pathology are some examples of running projects. All of them are focused on reducing gaps in Chilean public health. The projects are the result of the BioMedHPC network and an internal telemedicine contest. Both have fostered the development of innovative solutions for healthcare assistance, tele-education of health professionals and medical consultation inside the Faculty of Medicine and the Hospital, demonstrating that multidisciplinary group interaction produces high-quality solutions and research.

12.30-12.45    Stefan Sigle, SCIAN/CIMT, F-Med. U-Chile/U-Münster/U-Heilbronn,  Establishing patient-reported data for pharmacoepidemiology through resilience in socio-technical systems

Pharmacoepidemiology tries to establish an image of the use and effects of drugs on the population. Collecting continuous pharmacoepidemiologic and pharmacoeconomic data is challenging due to the distributed nature of socio-technical systems (STS), while complexity increases due to system heterogeneity. Patient-reported data (crowdsourcing) is a promising approach to collect information about drug consumption and acquisition. In order to ensure data quality and acceptance, risk management (RM) is needed to ensure resilience against systemic impairments and a resulting loss of data. A process-based, user-centered development in combination with interoperable healthcare communication standards (e.g. HL7 – FHIR) will be carried out in order to provide a resilient STS. The established system will serve for the parameterization and evaluation of the resilience-based RM model using key performance indicators and risk state transition probabilities. I address the hypothesis that a resilience-based RM improves the availability and reliability of distributed socio-technical systems. The goal is to provide a method to formally describe resilience in distributed STS, providing a quantitative measurement of resilience. This opens the possibility of a trade-off optimization in terms of cost, value and control, based on probabilistic system transition parameters. Well-understood systems can serve as initial parameter providers for the model, allowing for a complemented RM beyond the scope of traditional approaches, taking advantage of and revealing the potential of resilience to optimize systems. Based on the outcomes, an extrapolated general system description is elaborated and validated on other systems, opening the possibility of resilience benchmarking.

12.45-14.45    Lunch

14.45-16.15    Round table:   Joint session with CORFO astroinformatics initiative

16.45-17.00    Coffee break

17.00-17.30    Juan Velásquez, DIE-U.Chile, Twitter for marijuana infodemiology

Today, online social networks seem to be good tools for quickly monitoring what is going on in the population, since they provide environments where users can freely share large amounts of information about their own lives. Given the well-known limitations of surveys, this novel kind of data can be used to gain additional real-time insights into people's actual behavior related to drug use. The aim of this work is to use text messages (tweets) and relationships between Chilean Twitter users to predict marijuana use among them. To do this, we collected Twitter accounts using a location-based criterion and built a set of features based on the tweets they posted and on egocentric network metrics. To obtain tweet-based features, tweets were filtered using marijuana-related keywords, and a set of 1000 tweets was manually labeled to train algorithms capable of predicting marijuana use from tweets. In addition, a sentiment classifier for tweets was developed using the TASS corpus. We then conducted a survey to obtain real marijuana-use labels for the accounts, and these labels were used to train supervised machine learning algorithms. The per-user marijuana-use classifier achieved precision, recall and F-measure values close to 0.7, implying significant predictive power of the selected variables. We obtained a model capable of predicting marijuana use by Twitter users and estimating their opinion about marijuana. This information can serve as an efficient (fast and low-cost) tool for marijuana surveillance and can support decision making about drug policies.
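
For readers unfamiliar with the three scores quoted above, the following minimal sketch shows how precision, recall and the F-measure are commonly computed for a binary classifier. The function name and the toy labels are illustrative assumptions, not data or code from the study.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = user reports marijuana use, 0 = does not.
survey_labels = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(survey_labels, predicted)
```

Precision measures how many predicted users are true users, recall measures how many true users are found, and the F-measure balances the two.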

17.30-17.50    Pablo Román, U. Santiago de Chile, High performance compressed sensing for image synthesis

Image synthesis is the process of reconstructing an image from interferometric data. With the advent of high-throughput telescopes, novel methods are required in order to reach the high-fidelity regime. Today, image synthesis is based on gridding so that the fast Fourier transform can be used. However, gridding reduces the data, losing statistical significance. We are currently studying a Compressed Sensing approach together with the use of Hermite functions to represent image and data. Thus, the imaging process and its inversion are modelled in a continuous space, and the need for gridding or interpolation is eliminated. Also, since Hermite functions are invariant under Fourier transformation, we are able to compute Fourier data without the use of the Fourier transform. We explore convex splitting algorithms such as the alternating direction method of multipliers (ADMM). Nevertheless, the optimization problem remains compute-intensive, and therefore we design and implement the algorithm on GPU using CUDA.
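
The Hermite-function property mentioned above can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the unitary Fourier convention with kernel exp(-ikx)/sqrt(2π), in which the n-th normalized Hermite function is mapped to itself times (-i)^n. The function names and integration settings are hypothetical and are not taken from the authors' GPU implementation.

```python
import cmath
import math


def hermite_function(n, x):
    """Normalized Hermite function psi_n(x), built with the three-term recurrence."""
    psi_prev = 0.0
    psi = math.pi ** -0.25 * math.exp(-0.5 * x * x)  # psi_0
    for m in range(n):
        psi_next = (math.sqrt(2.0 / (m + 1)) * x * psi
                    - math.sqrt(m / (m + 1)) * psi_prev)
        psi_prev, psi = psi, psi_next
    return psi


def fourier_transform(f, k, half_width=10.0, step=0.01):
    """Unitary Fourier transform of f at frequency k, by a plain Riemann sum."""
    n_steps = int(round(2 * half_width / step))
    total = 0j
    for j in range(n_steps + 1):
        x = -half_width + j * step
        total += f(x) * cmath.exp(-1j * k * x)
    return total * step / math.sqrt(2 * math.pi)


def eigen_residual(n, k):
    """|F[psi_n](k) - (-i)^n psi_n(k)|, which should be close to zero."""
    transformed = fourier_transform(lambda x: hermite_function(n, x), k)
    return abs(transformed - (-1j) ** n * hermite_function(n, k))


residuals = [eigen_residual(n, k) for n in range(4) for k in (0.0, 0.7, 1.5)]
```

Under this convention, an image expanded in Hermite functions with coefficients c_n has Fourier coefficients (-i)^n c_n in the same basis, which is why no explicit transform is needed.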

17.50-18.10    Fernando Rannou, U. Santiago de Chile, Convex penalization for image synthesis

Current radio interferometers sample the Fourier transform of the sky image at higher rates than ever before, producing large amounts of data. Despite better and denser uv-coverage, reconstructing the sky image (image synthesis) remains an ill-posed problem. CLEAN is an efficient synthesis method and the de facto approach for image reconstruction, but it requires user assistance during iteration, especially for extended, smooth objects. Other methods based on optimization and regularization have been proposed, but they are expensive alternatives because they require large amounts of computing resources. GPGPU is a parallel computing technology that can provide efficient solutions to optimization-based image synthesis, making it practical for astronomical research purposes. In this work we develop and study a penalized maximum-likelihood algorithm for image synthesis on GPU. We study the effect of quadratic and total variation penalizers on several imaging parameters, for both synthetic and real ALMA data sets.
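
To make the two penalizers concrete, here is a minimal sketch of a quadratic penalty and an anisotropic total variation penalty on a small image stored as nested lists. The exact definitions, the weighting and the function names are assumptions made for illustration; the abstract does not specify the forms used by the authors.

```python
def quadratic_penalty(image):
    """Sum of squared pixel values; discourages large intensities overall."""
    return sum(v * v for row in image for v in row)


def total_variation(image):
    """Anisotropic total variation: sum of absolute differences between neighbours."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                tv += abs(image[i + 1][j] - image[i][j])  # vertical neighbour
            if j + 1 < cols:
                tv += abs(image[i][j + 1] - image[i][j])  # horizontal neighbour
    return tv


def penalized_objective(data_misfit, image, weight, penalty):
    """Data misfit (e.g. a chi-squared value computed elsewhere) plus a weighted penalty."""
    return data_misfit + weight * penalty(image)


flat = [[1.0, 1.0], [1.0, 1.0]]
spike = [[0.0, 0.0], [0.0, 2.0]]
```

In this toy case both images have the same quadratic penalty (4.0), while total variation is 0.0 for the flat image and 4.0 for the spike, which illustrates why total variation favours piecewise-smooth reconstructions.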

18.10-18.30    Miguel Cárcamo, U. Santiago de Chile,  High performance multifrequency synthesis for radio astronomy

Multi-frequency synthesis uses visibility data measured over a range of frequencies when forming a continuum image. In particular, continuous synchrotron emission is known to show a power-law spectrum. In this work, we propose a method in which the optimization problem is recast so that the image parameters of this spectrum are estimated from visibility data in several frequency bands, instead of estimating the sky image directly. We have found that this approach not only fits the Maximum Entropy Method well but also suits the Single Instruction Multiple Thread (SIMT) paradigm. This has allowed us to develop a powerful High Performance Computing (HPC) algorithm to solve the optimization problem. Several ALMA and simulated datasets are used to evaluate imaging and GPU performance.
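
As a small illustration of the power-law spectrum mentioned above, the sketch below assumes the common per-pixel parameterization I(ν) = I0 · (ν/ν0)^α, with a reference intensity I0 and a spectral index α. This parameterization, the function names and the numerical values are assumptions for illustration; the abstract does not state the exact model used.

```python
import math


def power_law_intensity(i0, alpha, nu, nu0):
    """Pixel intensity at frequency nu for reference intensity i0 at nu0."""
    return i0 * (nu / nu0) ** alpha


def spectral_index(i1, nu1, i2, nu2):
    """Spectral index implied by noise-free intensities at two frequencies."""
    return math.log(i1 / i2) / math.log(nu1 / nu2)


# Hypothetical pixel: steep synchrotron-like spectrum, frequencies in Hz.
i0, alpha, nu0 = 2.0, -0.7, 230e9
low_band = power_law_intensity(i0, alpha, 100e9, nu0)
high_band = power_law_intensity(i0, alpha, 345e9, nu0)
recovered_alpha = spectral_index(low_band, 100e9, high_band, 345e9)
```

With two parameters per pixel describing every band, the number of unknowns no longer grows with the number of frequency channels.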

20.00    Conference dinner

## am – Sat Sep 2

9.00-9.30    Ignacio Toledo, ALMA,  Scheduling challenges in the era of big observatories: The ALMA experience.

There are plenty of publications, theses, proceedings and other literature about the design and development of scheduling algorithms for the Atacama Large Millimeter/submillimeter Array. Many of these works have been the result of great research efforts, and new algorithms have been proposed. We can trace the first publications on ALMA scheduling back to more than a decade before the start of ALMA Early Science in 2011. However, most of these proposed solutions and algorithms did not make it into daily ALMA operations. In practice, less sophisticated algorithms are currently being used to provide recommendations and thereby guide the criteria of the Astronomer on Duty, who ultimately decides which project is queued for observation. The purpose of this presentation is to share with the community the experience of ALMA in developing a scheduling algorithm over the last 10 years. We will show how much of its design was based on the requirements and constraints defined in the original ALMA operations plan, and how the definition of the scheduling problem has changed along with the ALMA operations requirements. We will also introduce the actual constraints the scheduler has to deal with today, after almost 6 years of science operations, with an emphasis on how these operational changes have made many of the previous implementations obsolete as a consequence of the change in the problem definition. Finally, we will present the current scheduling platform and future plans after the long learning path our team has followed. ALMA's experience, from theoretical design to real implementation, may be important for the scheduling algorithms currently being designed for the next generation of big observatories, and it presents new challenges for high performance computing and optimization research.

9.30-10.00    John Carpenter, ALMA,  Mining the ALMA Archive

ALMA will be the premier telescope for sensitive, high resolution observations at submillimeter wavelengths for decades to come. Each year, ALMA solicits proposals from the community for new observations. Successful proposal teams have proprietary access to the data for 1 year, after which the data become publicly available through the ALMA archive. As the public archive continues to grow, it will become the source for an increasing number, if not the majority, of publications. Archival research will include analyzing data in new ways not anticipated by the original proposal, and combining data from different programs to conduct more comprehensive studies. I will highlight areas of potential archival research, and the challenges faced by the data analysis.

10.00-10.30    Ginés Guerrero, CMM-U.Chile, NLHPC,  National Laboratory for High Performance Computing (NLHPC) – How to use Leftraru?

I will briefly describe the NLHPC project and give some tips on using our supercomputer, Leftraru.

10.30-11.15    Coffee break

11.15-11.45    Alejandro Jofré, CMM-U.Chile

In this talk we give an overview of fundamental properties of “optimal points” and some algorithms for computing these equilibria. We also give some examples from logistics, engineering and economics.

11.45-12.25    Marko Budinich & Damien Eveillard, U. Nantes, France

Recent biotechnology experiments have drastically improved our ability to investigate biological systems. However, behind the trend of biological “big data” and related data analysis, such a surge of knowledge also promotes the need for other modeling frameworks. In particular, one must today integrate diverse quantitative behaviors within a single model, while considering genomic knowledge. For this purpose, the multi-objective (MO) paradigm appears as a natural and promising extension of standard systems biology modeling. In this seminar, we will present how multi-objective techniques have recently attracted attention in systems biology. After a short overview of the new biological hypotheses raised by the MO paradigm, we will highlight a few case studies that must be further investigated. For the sake of illustration, we will present how the multi-objective paradigm allows the modeling of interplays within microbial communities. Indeed, constraint-based models (CBMs) build predictive models from recent high-resolution -omics datasets for single-strain systems. We consider herein microbial ecosystems as a combination of single-strain metabolic networks that exchange nutrients. To accommodate this change in modeling, two multi-objective extensions of CBMs for modeling communities will be described: multi-objective flux balance analysis (MO-FBA) and multi-objective flux variability analysis (MO-FVA). As a biological result, multiple trade-offs between nutrients and growth rates, as well as thermodynamically favorable relative abundances at the community level, will be emphasized. We expect this approach to be used for integrating genomic information in microbial ecosystems. The resulting models will provide insights into behaviors (including diversity) that take place at the ecosystem scale.

12.25-12.45    Juan Carlos Letelier, Biology Department, U. Chile,   Big Data and Natural Disasters in Chile’s 21st Century

12.25-13.00    Final discussion