machine learning – vialab | Dr. Christopher Collins

Visual Analytics Tools for Academic Advising

Contributors:

Riley Weagant, Christopher Collins, Taylor Smith, and Michael Lombardo

Post-secondary institutions have a wealth of student data at their disposal. These data have recently been used to explore student retention, a complex problem that has persisted in education for decades and that researchers are now attempting to address using machine learning. This research describes our attempt to use academic data from Ontario Tech University to predict the likelihood of a student withdrawing from the university after their upcoming semester. We used academic data collected between 2007 and 2011 to train a random forest model that predicts whether or not a student will drop out. Finally, we used the confidence level of the model’s prediction to represent a student’s “likelihood of success”, which is displayed on a bee swarm plot as part of an application intended for use by academic advisors.
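
As a rough illustration of how an ensemble's vote share can serve as a "likelihood of success" score, the sketch below bags decision stumps in plain Python. It is a simplified stand-in for a random forest, not the model used in this work; the features (GPA, number of failed courses), the toy records, and all function names are assumptions made for the example.

```python
import random

def fit_stump(rows, labels):
    """Pick the (feature, threshold) rule with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            # Base rule: predict 1 (withdraws) when the feature is below t.
            errors = sum((r[f] < t) != bool(y) for r, y in zip(rows, labels))
            # Also consider the flipped rule, whose errors are the complement.
            for flipped, e in ((False, errors), (True, len(rows) - errors)):
                if best is None or e < best[0]:
                    best = (e, f, t, flipped)
    return best[1:]

def stump_predict(stump, row):
    f, t, flipped = stump
    return int((row[f] < t) != flipped)

def fit_ensemble(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample of the records (bagging)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        stumps.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx]))
    return stumps

def likelihood_of_success(stumps, row):
    """Share of stumps that vote 'does not withdraw' for this student."""
    votes_withdraw = sum(stump_predict(s, row) for s in stumps)
    return 1.0 - votes_withdraw / len(stumps)

# Hypothetical records: (GPA, failed courses); label 1 means the student withdrew.
rows = [(3.6, 0), (3.1, 1), (2.9, 0), (1.8, 3), (1.5, 4), (2.0, 2)]
labels = [0, 0, 0, 1, 1, 1]

stumps = fit_ensemble(rows, labels)
strong = likelihood_of_success(stumps, (3.8, 0))
weak = likelihood_of_success(stumps, (1.2, 5))
print(strong, weak)
```

A score of this kind, one value per student, is what a bee swarm plot would place along its axis. A real random forest would additionally grow deeper trees and sample a random subset of features at each split.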

Guidance in the human–machine analytics process

Contributors:

Christopher Collins, Natalia Andrienko, Tobias Schreck, Jing Yang, Jaegul Choo, Ulrich Engelke, Amit Jena, and Tim Dwyer

In this paper, we list the goals for and the pros and cons of guidance, and we discuss the role that it can play not only in key low-level visualization tasks but also in the more sophisticated model-generation tasks of visual analytics. Recent advances in artificial intelligence, particularly in machine learning, have led to high hopes regarding the possibilities of using automatic techniques to perform some of the tasks that data analysts currently do manually using visualization. However, visual analytics remains a complex activity, combining many different subtasks. Some of these tasks are relatively low-level, and it is clear how automation could play a role, for example in classification and clustering of data. Other tasks are much more abstract and require significant human creativity, for example, linking insights gleaned from a variety of disparate and heterogeneous data artifacts to build support for decision making. We outline the potential applications of guidance and discuss challenges in implementing it, including the inputs to guidance systems and how to provide guidance to users. We propose potential methods for evaluating the quality of guidance at different phases in the analytic process and introduce the potential negative effects of guidance as a source of bias in analytic decision-making.

Acknowledgements

This paper is the direct result of an NII Shonan Meeting at the Shonan Village Center in Japan. We acknowledge the hospitality of the Center in making this research possible. This work was partly supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) [grant RGPIN-2015-03916], the Fraunhofer Cluster of Excellence on “Cognitive Internet Technologies”, and by the EU through project Track&Know (grant agreement 780754).

A Visual Analytics Framework for Adversarial Text Generation

Contributors:

Brandon Laughlin, Christopher Collins, Karthik Sankaranarayanan, and Khalil El-Khatib

This paper presents a framework that enables a user to more easily make corrections to adversarial texts. While attack algorithms have been demonstrated to automatically build adversaries, changes made by the algorithms can often have poor semantics or syntax. Our framework is designed to facilitate human intervention by aiding users in making corrections. The framework extends existing attack algorithms to work within an evolutionary attack process paired with a visual analytics loop. Using an interactive dashboard, a user can review the generation process in real time and receive suggestions from the system for edits to be made. The adversaries can be used either to diagnose robustness issues within a single classifier or to compare various classifier options. With the weaknesses identified, the framework can also be used as a first step in mitigating adversarial threats. The framework can be used as part of further research into defence methods in which the adversarial examples are used to evaluate new countermeasures. We demonstrate the framework with a word swapping attack for the task of sentiment classification.
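
To make the idea of a word swapping attack concrete, here is a minimal sketch of a greedy synonym swap against a toy lexicon-based sentiment classifier. It is not the evolutionary attack process described above; the lexicon, the synonym table, and every function name are hypothetical and chosen only for illustration.

```python
# Hypothetical sentiment lexicon and synonym table, for illustration only.
LEXICON = {"great": 2.0, "good": 1.0, "fine": 0.3, "decent": 0.4,
           "bad": -1.0, "boring": -1.5, "slow": -0.5}
SYNONYMS = {"great": ["fine", "decent"], "good": ["fine", "decent"]}

def score(tokens):
    """Sum of per-word sentiment weights; unknown words count as 0."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)

def classify(tokens):
    return "positive" if score(tokens) > 0 else "negative"

def word_swap_attack(tokens, max_swaps=3):
    """Greedily swap words for synonyms until the predicted label flips."""
    original = classify(tokens)
    # Push the score down for a positive text, up for a negative one.
    sign = 1 if original == "positive" else -1
    current = list(tokens)
    swaps = []
    for _ in range(max_swaps):
        if classify(current) != original:
            break
        best = None
        for i, tok in enumerate(current):
            for cand in SYNONYMS.get(tok, []):
                trial = current[:i] + [cand] + current[i + 1:]
                gain = sign * (score(current) - score(trial))
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, i, cand)
        if best is None:
            break  # no swap moves the score in the attack direction
        _, i, cand = best
        swaps.append((i, current[i], cand))
        current[i] = cand
    return current, swaps, classify(current) != original

text = "a great and good story but boring".split()
adversarial, swaps, flipped = word_swap_attack(text)
print(" ".join(adversarial), swaps, flipped)
```

The swap that flips the label here replaces "great" with "fine", which also weakens the sentence for a human reader. Changes of that kind are the ones a user would want to review and correct.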

Visual Analytics for Topic Model Optimization

Contributors:

Mennatallah El-Assady, Fabian Sperrle, Oliver Deussen, Daniel Keim, and Christopher Collins

To effectively assess the potential consequences of human interventions in model-driven analytics systems, we establish the concept of speculative execution as a visual analytics paradigm for creating user-steerable preview mechanisms. This paper presents an explainable, mixed-initiative topic modelling framework that integrates speculative execution into the algorithmic decision-making process. Our approach visualizes the model-space of our novel incremental hierarchical topic modelling algorithm, unveiling its inner workings. We support the active incorporation of the user’s domain knowledge in every step through explicit model manipulation interactions. In addition, users can initialize the model with expected topic seeds, the backbone priors. For a more targeted optimization, the modelling process automatically triggers a speculative execution of various optimization strategies, and requests feedback whenever the measured model quality deteriorates. Users compare the proposed optimizations to the current model state and preview their effect on the next model iterations, before applying one of them. This supervised human-in-the-loop process targets maximum improvement for minimum feedback and has proven to be effective in three independent studies that confirm topic model quality improvements.
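
The preview mechanism can be sketched in a few lines, assuming a toy model and quality measure. Each candidate optimization runs on a copy of the model, the results are ranked, and nothing is applied until a choice is made. The cohesion measure, the two strategies, and all names below are assumptions for illustration, not the framework's actual algorithm.

```python
import copy

def cohesion(topic):
    """Toy measure: share of words common to every document in the topic."""
    if not topic:
        return 0.0
    union = set().union(*topic)
    common = set.intersection(*topic)
    return len(common) / len(union) if union else 0.0

def quality(model):
    return sum(cohesion(t) for t in model) / len(model)

# Two hypothetical optimization strategies; each returns a new model.
def split_first(model):
    first, rest = model[0], model[1:]
    mid = len(first) // 2
    return [first[:mid], first[mid:]] + rest

def merge_first_two(model):
    return [model[0] + model[1]] + model[2:]

def speculate(model, strategies):
    """Run every strategy on a copy and rank the previews by quality."""
    previews = []
    for name, strategy in strategies.items():
        candidate = strategy(copy.deepcopy(model))  # live model stays untouched
        previews.append((quality(candidate), name, candidate))
    return sorted(previews, key=lambda p: p[0], reverse=True)

def step(model, previous_quality, strategies, choose):
    """Ask for feedback only when the measured quality has deteriorated."""
    if quality(model) >= previous_quality:
        return model
    picked = choose(speculate(model, strategies))  # None keeps the current model
    return model if picked is None else picked[2]

# A model is a list of topics; a topic is a list of documents (word sets).
model = [
    [{"tax", "budget"}, {"tax", "vote"}, {"goal", "match"}, {"goal", "team"}],
    [{"rain", "storm"}, {"rain", "wind"}],
]
strategies = {"split": split_first, "merge": merge_first_two}
previews = speculate(model, strategies)
new_model = step(model, 0.5, strategies, choose=lambda ranked: ranked[0])
print([(name, round(q, 3)) for q, name, _ in previews])
```

In an interactive system, `choose` would be the user comparing the previews in the interface. Here it simply takes the top-ranked one.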

As seen on SpecEx.

ThreadReconstructor: Modeling Reply-Chains to Untangle Conversational Text

Contributors:

Mennatallah El-Assady, Rita Sevastjanova, Daniel Keim, and Christopher Collins

We present ThreadReconstructor, a visual analytics approach for detecting and analyzing the implicit conversational structure of discussions, e.g., in political debates and forums. Our work is motivated by the need to reveal and understand single threads in massive online conversations and verbatim text transcripts. We combine supervised and unsupervised machine learning models to generate a basic structure that is enriched by user-defined queries and rule-based heuristics. Depending on the data and tasks, users can modify and create various reconstruction models that are presented and compared in the visualization interface. Our tool enables the exploration of the generated threaded structures and the analysis of the untangled reply-chains, comparing different models and their agreement. To understand the inner workings of the models, we visualize their decision spaces, including all considered candidate relations. In addition to a quantitative evaluation, we report qualitative feedback from an expert user study with four forum moderators and one machine learning expert, showing the effectiveness of our approach.
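
One simple rule-based heuristic of the kind that could feed such a reconstruction is lexical overlap: link each utterance to the earlier utterance it shares the most words with. The sketch below is an assumption-laden illustration, not ThreadReconstructor's model; the Jaccard measure, the threshold, and the sample utterances are all hypothetical. It also keeps every scored candidate relation, which is the raw material for a decision-space view.

```python
def tokens(text):
    return set(text.lower().split())

def jaccard(a, b):
    """Word-set overlap between two utterances, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def reconstruct(utterances, threshold=0.2):
    """Return each utterance's parent index (None for a thread root)
    together with all scored candidate relations that were considered."""
    parents, candidates = [], []
    for i, text in enumerate(utterances):
        scored = [(jaccard(tokens(text), tokens(utterances[j])), j)
                  for j in range(i)]
        candidates.append(scored)
        best = max(scored, default=(0.0, None))
        parents.append(best[1] if best[0] >= threshold else None)
    return parents, candidates

utterances = [
    "taxes should fund public transit",
    "the stadium deal is too expensive",
    "public transit needs more than taxes",
    "the stadium deal will bring jobs",
]
parents, candidates = reconstruct(utterances)
print(parents)
```

Two interleaved discussions are untangled here: utterance 2 attaches to utterance 0, and utterance 3 to utterance 1. Raising the threshold produces more roots; lowering it produces longer chains.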

Detecting Negative Emotion for Mixed Initiative Visual Analytics

Contributors:

Prateek Panwar and Christopher Collins

This work describes an efficient model for detecting negative mental states caused by visual analytics tasks. We developed a method for collecting data from multiple sensors, including galvanic skin response (GSR) and eye-tracking, and for quickly generating labelled training data for the machine learning model. Using this method, we created a dataset from 28 participants carrying out intentionally difficult visualization tasks. We conclude by discussing the best-performing model, Random Forest, and its future applications for providing just-in-time assistance for visual analytics.

Progressive Learning of Topic Modeling Parameters

Contributors:

Mennatallah El-Assady, Rita Sevastjanova, Fabian Sperrle, Daniel Keim, and Christopher Collins

Topic modelling algorithms are widely used to analyze the thematic composition of text corpora but remain difficult to interpret and adjust. Addressing these limitations, we present a modular visual analytics framework, tackling the understandability and adaptability of topic models through a user-driven reinforcement learning process that does not require a deep understanding of the underlying topic modelling algorithms. Given a document corpus, our approach initializes two algorithm configurations based on a parameter space analysis that enhances document separability. We abstract the model complexity in an interactive visual workspace for exploring the automatic matching results of two models, investigating topic summaries, analyzing parameter distributions, and reviewing documents. The main contribution of our work is an iterative decision-making technique in which users provide document-based relevance feedback that allows the framework to converge to a user-endorsed topic distribution. We also report feedback from a two-stage study which shows that our technique results in topic model quality improvements on two independent measures.

This research was given a Best VAST Paper Honorable Mention Award at VAST 2017.

To apply our technique on your own data or try out a demo, please visit http://visargue.dbvis.de/ (Individual accounts will be created upon request).

#FluxFlow

Contributors:

Jian Zhao, Nan Cao, Zhen Wen, Yale Song, Yu-Ru Lin, and Christopher Collins

We present FluxFlow, an interactive visual analysis system for revealing and analyzing anomalous information spreading in social media. Every day, millions of messages are created, commented on, and shared by people on social media websites, such as Twitter and Facebook. This provides valuable data for researchers and practitioners in many application domains, such as marketing, to inform decision-making. Distilling valuable social signals from the huge crowd’s messages, however, is challenging because crowd behaviours are heterogeneous and dynamic. The core difficulty is enabling data analysts to distinguish, in a timely fashion, anomalous information behaviours, such as the spreading of rumours or misinformation, from more conventional patterns, such as popular topics and newsworthy events. FluxFlow incorporates advanced machine learning algorithms to detect anomalies and offers a set of novel visualization designs for presenting the detected threads for deeper analysis. We evaluated FluxFlow with real datasets containing the Twitter feeds captured during significant events such as Hurricane Sandy. Quantitative measurements of the algorithmic performance and qualitative interviews with domain experts show that the back-end anomaly detection model is effective in identifying anomalous retweeting threads, and that the front-end interactive visualizations are intuitive and useful for analysts to discover insights in data and comprehend the underlying analytical model.