
DocuBurst: Visualizing Document Content using Language Structure

Contributors:

Christopher Collins, Gerald Penn, Sheelagh Carpendale, Brittany Kondo, Bradley Chicoine

DocuBurst is the first visualization of document content that takes advantage of the human-created structure in lexical databases. We use an accepted design paradigm to generate visualizations that improve the usability and utility of WordNet as the backbone for document content visualization. A radial, space-filling layout of hyponymy (IS-A relation) is presented with interactive techniques of zoom, filter, and details-on-demand for the task of document visualization. The techniques can be generalized to multiple documents.
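
The radial, space-filling layout described above can be illustrated with a small sketch. It is not the DocuBurst or prefuse implementation: the `Node` class, the `radial_layout` function, and the choice to size each wedge by the word count of its subtree are all assumptions made for illustration. Each node in an IS-A tree receives a ring (set by its depth) and an angular span (a share of its parent's span).

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One concept in a hypothetical IS-A (hyponymy) tree."""
    label: str
    count: int = 0  # assumed: occurrences of this word in the document
    children: list = field(default_factory=list)


def subtree_count(node):
    """Total occurrences for a node and all of its hyponyms."""
    return node.count + sum(subtree_count(c) for c in node.children)


def radial_layout(root, ring_width=1.0):
    """Map each label to (inner_radius, outer_radius, start_angle, end_angle).

    Assumes labels are unique. Children split their parent's angular span
    in proportion to their subtree counts; depth selects the ring.
    """
    wedges = {}

    def place(node, depth, start, end):
        wedges[node.label] = (depth * ring_width, (depth + 1) * ring_width, start, end)
        weights = [subtree_count(c) for c in node.children]
        total = sum(weights)
        if total == 0:
            return  # no occurrences below this node, so nothing to subdivide
        angle = start
        for child, weight in zip(node.children, weights):
            span = (end - start) * weight / total
            place(child, depth + 1, angle, angle + span)
            angle += span

    place(root, 0, 0.0, 2 * math.pi)
    return wedges


if __name__ == "__main__":
    # Toy tree: "dog" and "cat" are hyponyms of "animal".
    tree = Node("animal", children=[Node("dog", count=3), Node("cat", count=1)])
    for label, wedge in radial_layout(tree).items():
        print(label, wedge)
```

In this toy tree, "dog" receives three quarters of the circle and "cat" one quarter, both in the ring just outside "animal". Zooming and filtering could then be treated as re-running the layout on a chosen subtree, though that is only one possible design.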

A live demo is available.

Media Coverage

DocuBurst featured in Marti Hearst’s wonderful book, Search User Interfaces
DocuBurst featured in the Toronto Star!
DocuBurst on ‘information aesthetics’ blog
Interview with Margaux Watt of CBC Radio One Manitoba’s “Up To Speed”, 21 Feb 2008
A feature story on DocuBurst aired on FairChild TV’s “Media Focus” (cable 36 in Toronto) on Friday, March 14, 2008

Resources

Software

The code for displaying and interacting with radial, space-filling trees in prefuse is open source and available for download. It is distributed as a zip file that can be imported into Eclipse. It depends on the prefuse information visualization toolkit and, unfortunately, is minimally documented at this time:

Radial Space Filling Trees in prefuse [.zip] (requires separate prefuse download) or
Mavenized code, including pom, courtesy Brian O’Neill or
Executable Jar with prefuse embedded [.jar]

Lattice Uncertainty Visualization: Understanding Machine Translation

Contributors:

Christopher Collins, Gerald Penn, and Sheelagh Carpendale

Lattice graphs are used as underlying data structures in many statistical processing systems, including natural language processing. Lattices compactly represent multiple possible outputs and are usually hidden from users. We present a novel visualization intended to reveal the uncertainty and variability inherent in statistically-derived outputs of language technologies. Applications such as machine translation and automated speech recognition typically present users with a best guess about the appropriate output, with apparent complete confidence.

Through case studies in cross-lingual instant messaging chat and speech recognition, we show how our visualization uses a hybrid layout along with varying transparency, colour, and size to reveal the various hypotheses considered by the algorithms and help people make better-informed decisions about statistically derived outputs.
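
To make the idea of a lattice concrete, the following sketch shows how a small graph can compactly encode several candidate outputs, and how a score might be mapped to transparency. The edge list, the probabilities, and the `hypotheses` and `opacity` helpers are hypothetical; none of this is taken from the authors' system, and real lattices from translation or speech systems are much larger.

```python
from collections import defaultdict

# Hypothetical lattice edges: (source_node, target_node, word, probability).
EDGES = [
    (0, 1, "the", 0.9),
    (0, 1, "a", 0.1),
    (1, 2, "cat", 0.6),
    (1, 2, "cap", 0.4),
    (2, 3, "sat", 1.0),
]


def build_lattice(edges):
    """Index outgoing edges by source node."""
    outgoing = defaultdict(list)
    for source, target, word, prob in edges:
        outgoing[source].append((target, word, prob))
    return outgoing


def hypotheses(lattice, start, end):
    """Yield (words, score) for every path from start to end.

    The score is the product of edge probabilities along the path.
    Assumes the lattice is acyclic.
    """
    stack = [(start, [], 1.0)]
    while stack:
        node, words, score = stack.pop()
        if node == end:
            yield words, score
            continue
        for target, word, prob in lattice.get(node, []):
            stack.append((target, words + [word], score * prob))


def opacity(prob, floor=0.2):
    """Map a probability in [0, 1] to an opacity in [floor, 1]."""
    return floor + (1.0 - floor) * prob


if __name__ == "__main__":
    lattice = build_lattice(EDGES)
    ranked = sorted(hypotheses(lattice, 0, 3), key=lambda h: h[1], reverse=True)
    for words, score in ranked:
        print(" ".join(words), round(score, 3), round(opacity(score), 3))
```

Five edges here stand for four complete hypotheses. A system that shows only the top-ranked path hides the other three, which is the situation the visualization is meant to expose.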

WordNet Visualization

Contributors:

Christopher Collins

Interfaces to lexical databases in NLP have often ignored the design principles developed in the information visualization research community. We present a design paradigm and show that it can be used to generate visualizations that maximize the usability and utility of WordNet. The techniques can be applied more generally to other lexical databases used in NLP research.