Information overload? There's a solution for that

Researchers develop new software architecture that accelerates research and boosts quality of writing

March 5, 2013

By Laurence Miall

René Witte and Bahar Sateli. | Photo by Concordia University

It is estimated that every two days, humans create as much information as they did between the dawn of civilization and 2003. For scientific researchers and professionals in various industries, reviewing and selecting information that is relevant for a specific purpose is difficult and time-intensive. Understanding it can be even more challenging.

This is why Concordia University researchers are pioneering computer solutions that make reading, understanding and writing faster and more accurate. Their approach is a potent mix of two technologies, one highly specialized and one enormously popular.

The researchers have married Natural Language Processing (NLP), a specialized technology for automatically analyzing human language, to MediaWiki, the widely used software that runs websites such as Wikipedia.

The new open source architecture was developed by Bahar Sateli, now a PhD student, as part of her master’s thesis under the supervision of professor René Witte in the Department of Computer Science and Software Engineering. It has been tested through two new applications: IntelliGenWiki, for genomics, and ReqWiki, for software engineering. Sateli will present the work at the Semantic MediaWiki Conference in New York City, March 20 to 22.

IntelliGenWiki was tested at Concordia’s Centre for Structural and Functional Genomics, where researchers have been working to identify enzymes that can produce environmentally friendly biofuels. With IntelliGenWiki, scientists identified relevant research abstracts 67 per cent faster than by conventional reading, and retrieved useful data from full academic papers 20 per cent faster. Best of all, because the tools run inside a familiar wiki, people with no training in Natural Language Processing could conduct these searches.

“What we do speeds up research and increases the potential for faster scientific breakthroughs,” says Sateli.

“This is a new paradigm for human-computer interaction,” explains Witte, founder and director of Concordia’s Semantic Software Lab, where IntelliGenWiki and other applications have been developed. “People are used to the idea of humans working together on sites such as Wikipedia. What’s new is we’ve introduced a way for humans to work collaboratively with computers and use Semantic Assistants [a software architecture] for these knowledge-intensive tasks. The novel introduction of Natural Language Processing architecture to MediaWiki helps with such tasks as summarization, entity detection, quality assurance and more.”

Another example of the researchers’ open source Wiki-NLP innovation is ReqWiki, now in its second year of use by some of Concordia’s software engineering students. A vital part of building new software is the requirements development process, in which business leaders and engineers must agree on exactly what will go into a new product. Engineers prepare requirements documents, but, being human, they make errors. Even a simple mistake, such as using the passive instead of the active voice, can lead to costly confusion: a requirement such as “The data shall be validated” does not say which part of the system, or which person, is responsible for the validation.

ReqWiki helps authors write requirements documents with fewer ambiguities, so that the different parties understand each other better. Concordia students who use ReqWiki produce requirements documents of measurably higher quality than those created with a traditional word processor.

Greg Butler, co-founder of the Centre for Structural and Functional Genomics and co-author of the paper on IntelliGenWiki, sums up the potential of the Concordia research. “What we’re doing is capturing meaning,” he says. “With computer assistance, humans can understand each other better.”
