
NEREx: Named-Entity Relationship Exploration in Conversations

Contributors:

Mennatallah El-Assady, Rita Sevastjanova, Bela Gipp, Daniel Keim, and Christopher Collins

We present NEREx, an interactive visual analytics approach for the exploratory analysis of verbatim conversational transcripts. By revealing different perspectives on multi-party conversations, NEREx provides an entry point for the analysis through high-level overviews and offers mechanisms to form and verify hypotheses through linked detail views. Using tailored named-entity extraction, we abstract important entities into ten categories and extract their relations with a distance-restricted entity-relationship model. This model accommodates the often ungrammatical structure of verbatim transcripts, relating two entities if they appear in the same sentence within a small distance window. Our tool enables the exploratory analysis of multi-party conversations through several linked views that reveal thematic and temporal structures in the text. In addition to distant reading, we integrate close-reading views for text-level investigation. Beyond the exploratory and temporal analysis of conversations, NEREx helps users generate and validate hypotheses and perform comparative analyses of multiple conversations. We demonstrate the applicability of our approach on real-world data from the 2016 U.S. Presidential Debates through a qualitative study with three domain experts from political science.
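
To illustrate the distance-restricted relation extraction described above, the sketch below relates two extracted entities whenever they co-occur in the same sentence within a small token window. The entity categories, window size, and function names are illustrative assumptions for this example, not the implementation used in NEREx.

```python
# Minimal sketch of a distance-restricted entity-relationship model,
# assuming entities have already been extracted and positioned per sentence.
# Category names and the window size below are illustrative, not NEREx's own.
from itertools import combinations
from typing import NamedTuple


class Entity(NamedTuple):
    text: str       # surface form, e.g. "Secretary Clinton"
    category: str   # one of the entity categories (illustrative labels here)
    position: int   # token index within the sentence


def extract_relations(sentence_entities, window=5):
    """Relate two entities that occur in the same sentence
    within `window` tokens of each other."""
    relations = []
    for a, b in combinations(sentence_entities, 2):
        if abs(a.position - b.position) <= window:
            relations.append((a, b))
    return relations


# Entities found in one (possibly ungrammatical) transcript sentence.
sentence = [
    Entity("Secretary Clinton", "person", 0),
    Entity("jobs", "economy", 4),
    Entity("Mexico", "location", 12),
]
print(extract_relations(sentence))  # only the first two entities fall within the window
```

Because the window is measured in tokens rather than grammatical structure, the same rule applies even when a transcript sentence is fragmentary or ungrammatical.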

For a demo, please visit: http://visargue.inf.uni-konstanz.de/


Lattice Uncertainty Visualization: Understanding Machine Translation

Contributors:

Christopher Collins, Gerald Penn, and Sheelagh Carpendale

Lattice graphs are used as underlying data structures in many statistical processing systems, including natural language processing. Lattices compactly represent multiple possible outputs and are usually hidden from users. Applications such as machine translation and automated speech recognition typically present users with a single best guess about the appropriate output, with apparent complete confidence. We present a novel visualization intended to reveal the uncertainty and variability inherent in the statistically derived outputs of language technologies.

Through case studies in cross-lingual instant messaging chat and speech recognition, we show how our visualization uses a hybrid layout along with varying transparency, colour, and size to reveal the various hypotheses considered by the algorithms and help people make better-informed decisions about statistically derived outputs.
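
As a rough illustration of the kind of structure being visualized, the sketch below builds a tiny word lattice in which every path through the graph is one alternative hypothesis with an associated probability. The node names, words, and scores are invented for this example and are not taken from the paper or any particular recognizer.

```python
# Minimal sketch of a word lattice: a directed acyclic graph whose paths
# enumerate the alternative outputs a statistical system considered.
from collections import defaultdict


class WordLattice:
    def __init__(self):
        # node -> list of (next_node, word, probability) edges
        self.edges = defaultdict(list)

    def add_edge(self, src, dst, word, prob):
        self.edges[src].append((dst, word, prob))

    def paths(self, node="<s>", words=(), prob=1.0):
        """Yield every hypothesis (word sequence) with its combined probability."""
        if node == "</s>":
            yield " ".join(words), prob
            return
        for dst, word, p in self.edges[node]:
            yield from self.paths(dst, words + (word,), prob * p)


# Two competing speech-recognition readings sharing the same lattice nodes.
lat = WordLattice()
lat.add_edge("<s>", "n1", "recognize", 0.6)
lat.add_edge("<s>", "n1", "wreck a nice", 0.4)
lat.add_edge("n1", "</s>", "speech", 0.7)
lat.add_edge("n1", "</s>", "beach", 0.3)
for hypothesis, p in lat.paths():
    print(f"{p:.2f}  {hypothesis}")
```

Showing only the single highest-probability path hides the competing hypotheses above; the visualization's use of transparency, colour, and size is one way to surface them without overwhelming the reader.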


Acknowledgements