Thesis - Context QA

Experiments in Interactive Restricted-Domain Question Answering

Author: Martin Liljenback

Abstract:

The need for more advanced data mining and search engine technologies has increased steadily since the introduction of the Internet. With the exponential growth of information available on the web, combined with a public that is becoming more educated in search technology, there is a great need to provide results quickly and efficiently for a large range of very specific questions. Current natural language processing is still in a primitive state. No single solution can provide quality results for the broad range of potential questions by using indexed data extracted from the web. However, there are several ways to improve the results. One is to develop more extensive ways of interacting with users in order to target results to the individual's specific needs.

This thesis focuses on a field of research called Question Answering Systems. In Question Answering, the system answers plain-text questions by applying natural language processing, information retrieval, and data mining to structured or unstructured text data. The thesis summarizes the research in this area and describes how its algorithms and techniques have evolved into their current state.

Furthermore, I conclude that there are many compelling reasons to build more refined and targeted knowledge bases. With a targeted knowledge base and knowledge of an individual's specific needs, several algorithms can be applied that provide better results and efficiency than an open-domain question answering system. I show that index-based search engines are far from providing the same level of accuracy as a restricted-domain QA system. As part of the thesis, a complete restricted-domain QA system named ContextQA is developed. A series of experiments is conducted in which ContextQA is configured to use different algorithmic approaches to restricted-domain question answering. The results show that high accuracy can be obtained within a restricted domain with limited resources.