data investigation – vialab | Dr. Christopher Collins

Hierarchical Matrix for Visual Analysis of Cross-Linguistic Features

Contributors:

Mariana Shimabukuro, Jessica Zipf, Mennatallah El-Assady, and Christopher Collins

This paper presents a visualization technique for cross-linguistic error analysis in large learner corpora. H-Matrix combines a matrix, which linguists commonly use to investigate cross-linguistic patterns, with a tree diagram that aggregates matrix rows and lets users interactively re-weight their importance to create custom investigative views. Our technique helps experts perform data operations such as feature aggregation, filtering, ordering, and language comparison interactively, without having to reprocess the data. H-Matrix dynamically links the high-level multi-language overview to the extracted textual examples and to a reading view where linguists can see the detected features in context and confirm and generate hypotheses.
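The aggregation and re-weighting idea described above can be illustrated with a minimal sketch. This is not the H-Matrix implementation: the `Node` class, the weighted-mean aggregation rule, and the feature and language names are all assumptions chosen for illustration. Leaves stand for matrix rows (one value per language), and each internal node summarizes its children by a weighted mean, so changing a weight changes the aggregated row without touching the underlying data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A node in a hypothetical feature tree (names are illustrative only)."""
    name: str
    weight: float = 1.0                        # importance relative to siblings
    values: Optional[Dict[str, float]] = None  # leaf only: language -> feature rate
    children: List["Node"] = field(default_factory=list)


def aggregate(node: Node) -> Dict[str, float]:
    """Return language -> weighted mean of the rows below this node."""
    if not node.children:
        return dict(node.values or {})
    weight_sum = sum(child.weight for child in node.children)
    if weight_sum == 0:
        return {}
    totals: Dict[str, float] = {}
    for child in node.children:
        for lang, value in aggregate(child).items():
            totals[lang] = totals.get(lang, 0.0) + child.weight * value
    # A language missing from a child is treated as contributing zero.
    return {lang: total / weight_sum for lang, total in totals.items()}


# Two hypothetical error features grouped under one parent row.
missing = Node("missing_article", values={"de": 0.2, "ja": 0.6})
extra = Node("extra_article", values={"de": 0.1, "ja": 0.2})
articles = Node("articles", children=[missing, extra])

before = aggregate(articles)   # equal weights
extra.weight = 3.0             # interactive re-weighting, no reprocessing
after = aggregate(articles)
```

With equal weights the parent row is the plain mean of its children; after raising the weight of `extra_article`, the parent row shifts toward that feature's values.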

The source code for H-Matrix can be found on our GitHub.

Acknowledgements

The authors wish to thank the reviewers, our colleagues, and domain experts. This work was supported in part by NSERC Canada Research Chairs and a grant from SFB-TRR 161. This research has also been made possible by the Ontario Research Fund, funding research excellence.

EduApps: Helping Non-Native English Speakers with Language Structure

Contributors:

Mariana Shimabukuro and Christopher Collins

Errors caused by first-language (L1) influence are very frequent among learners of English as a second language (L2), and even more so at higher proficiency levels (upper-intermediate and advanced). Our project aims to use learner corpora to analyze the errors made by learners from specific L1s. Based on this analysis, we want to focus on one specific type of error and research a way to identify it automatically in learners’ essays, depending on their L1. This would allow us to implement an application that helps English as a Second Language (ESL) students identify and analyze their errors and better understand the reasoning behind them, consequently improving the students’ English level.

About the EduApps initiative

EduApps is a suite of apps housed in an online environment that focuses on the health, well-being, and development of one’s mind, body, and community. Our research project, titled “There’s an App for That”, is investigating the design process, development, implementation, and evaluation of this suite of educational apps. Specifically, we are interested in helping students build confidence and competence in the cognitive, socio-emotional, and physical domains. We are also interested in the impact a learning portal can have on students’ learning, on teachers, and on the surrounding community. We hope that our research can build capacity for investigating and affecting innovation in the use of digital technology in formal and informal education settings. We have partnered with school boards and community organizations to develop and research the apps. More about each of the domains (their purpose, apps, and related research) can be found at http://eduapps.ca/.

Parallel Tag Clouds

Contributors:

Christopher Collins, Fernanda B. Viégas, and Martin Wattenberg

Do court cases differ from place to place? What kind of picture do we get by looking at a country’s collection of law cases? We introduce Parallel Tag Clouds: a new way to visualize differences amongst facets of very large metadata-rich text corpora. We have pointed Parallel Tag Clouds at a collection of over 600,000 US Circuit Court decisions spanning a period of 50 years and have discovered regional as well as linguistic differences between courts. The visualization technique combines graphical elements from parallel coordinates and traditional tag clouds to provide rich overviews of a document collection while acting as an entry point for the exploration of individual texts. We augment basic parallel tag clouds with a details-in-context display and an option to visualize changes over a second facet of the data, such as time. We also address text mining challenges such as selecting the best words to visualize and doing so quickly enough to maintain interactivity.
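To make the word-selection step concrete, here is a minimal sketch of one way to pick distinguishing words for each facet. The text above does not say which scoring function is used, so the log-likelihood ratio (G²), a common choice for comparing a sub-corpus against the rest of a corpus, is an assumption here, as are the function names `g2` and `top_words` and the toy facet data.

```python
import math
from collections import Counter
from typing import Dict, List


def g2(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood ratio for one word.

    a: word count in the facet, b: word count in the rest of the corpus,
    c: total tokens in the facet, d: total tokens in the rest.
    """
    expected_facet = c * (a + b) / (c + d)
    expected_rest = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_facet)
    if b > 0:
        score += b * math.log(b / expected_rest)
    return 2.0 * score


def top_words(facets: Dict[str, List[str]], k: int) -> Dict[str, List[str]]:
    """For each facet, return up to k words most over-represented in it."""
    counts = {name: Counter(tokens) for name, tokens in facets.items()}
    overall = Counter()
    for facet_counts in counts.values():
        overall.update(facet_counts)
    total = sum(overall.values())

    result: Dict[str, List[str]] = {}
    for name, facet_counts in counts.items():
        c = sum(facet_counts.values())
        d = total - c
        scored = []
        for word, a in facet_counts.items():
            b = overall[word] - a
            # Keep only words relatively more frequent here than elsewhere
            # (cross-multiplied to avoid dividing by zero).
            if a * d > b * c:
                scored.append((g2(a, b, c, d), word))
        scored.sort(reverse=True)
        result[name] = [word for _, word in scored[:k]]
    return result


# Toy data: two hypothetical facets (e.g. two courts) as token lists.
facets = {
    "first": ["court", "patent", "patent", "appeal"],
    "second": ["court", "immigration", "immigration", "appeal"],
}
selected = top_words(facets, k=1)
```

Words that are equally frequent everywhere, such as "court" in this toy data, are not selected for either facet, which leaves each column of the visualization with the terms that set that facet apart. On a corpus of realistic size, the per-facet counts would need to be precomputed or indexed to keep the display interactive.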

This research received the VAST Test of Time Award at the IEEE VIS conference in 2019.

Slides from the presentation at IEEE VAST 2009