A Distributed Graph Database for Large-Scale Text Analytics

SCHEME: CORE

CALL: 2017

DOMAIN: IS - Business Service Design

FIRST NAME: Martin

LAST NAME: Theobald

INDUSTRY PARTNERSHIP / PPP: No

INDUSTRY / PPP PARTNER:

HOST INSTITUTION: University of Luxembourg

KEYWORDS: Information Extraction, Knowledge Base Construction, Distributed Graph Databases, Big Data Analytics

START: 2018-06-01

END: 2021-05-31

WEBSITE: https://www.uni.lu

Submitted Abstract

The World Wide Web is the most comprehensive, but likely also the most complex, source of information that we have access to today. More than 95 percent of all information in the Surface Web, i.e., the part of the Web that is publicly accessible either as static pages or as dynamically generated content, is in fact estimated to consist of text. This textual data is only occasionally interspersed with semi-structured components such as form fields, lists, and tables, or the so-called “infoboxes” in Wikipedia. Yet these infoboxes, together with some additional metadata, still constitute the main source of information for all currently available Web-extracted knowledge bases (KBs), such as DBpedia, YAGO, Freebase, and Wikidata. This means that, for the purpose of Information Extraction (IE) and KB construction, we currently exploit only a very small fraction of the information that is published on the Web.

In this project proposal, and in contrast to virtually all present KB endeavours, we advocate that the text itself is the most comprehensive knowledge base we can possibly have. That is, by exploiting the syntactic and semantic dependencies of the information conveyed in Web documents, BigText aims to build a large-scale, distributed graph database of highly interlinked and semantically enriched documents. This database is to serve as a basis for high-accuracy information retrieval, for mining syntactic and semantic relationships among real-world entities, and, more broadly, for a whole range of online analytical tasks in text and knowledge mining. In other words, we intend to investigate a radically new approach to information access and retrieval that bridges the three key areas of Information Extraction, Information Retrieval, and Big Data Analytics.
