Geospatial crowdsourced data fitness analysis for spatial data infrastructure based disaster management actions

Koswatte, Saman (2017) Geospatial crowdsourced data fitness analysis for spatial data infrastructure based disaster management actions. [Thesis (PhD/Research)]

The reporting of disasters has shifted from official media reports to citizen reporters who are at the disaster scene. This kind of crowd-based reporting, whether it concerns disasters or any other events, is often referred to as 'Crowdsourced Data' (CSD). CSD are freely and widely available thanks to current technological advances. However, the quality of CSD is often problematic because they are created by citizens of varying skills and backgrounds. CSD are generally considered unstructured, and their quality remains poorly defined. Moreover, CSD may lack location information, and any locations that are available may be incomplete or of uncertain quality. Traditional data quality assessment methods and parameters are also often incompatible with CSD, because such data are unstructured, undocumented and lack metadata. Although other research has identified credibility and relevance as possible indicators for assessing CSD quality, the available methods for assessing these indicators are still immature.

In the 2011 Australian floods, citizens and disaster management administrators used the Ushahidi Crowdmap crowd-mapping platform and the Twitter social media platform extensively to communicate flood-related information, including hazards, evacuations, help services, road closures and property damage. This research designed a CSD quality assessment framework and tested the quality of the Ushahidi Crowdmap and Twitter data from the 2011 Australian floods. In particular, it explored location availability and location quality assessment, semantic extraction of hidden location toponyms, and analysis of the credibility and relevance of reports. The research followed the Design Science (DS) research method, which is often used in Information Science (IS) research.

The location availability of the Ushahidi Crowdmap and Twitter data was assessed, and the quality of the available locations was evaluated by comparing them with three datasets: Google Maps, OpenStreetMap (OSM) and road data from the Queensland Department of Natural Resources and Mines (QDNRM). Missing locations were semantically extracted using Natural Language Processing (NLP) and gazetteer lookup techniques. The credibility of the Ushahidi Crowdmap dataset was assessed using a naive Bayesian Network (BN) model of the kind commonly used in spam email detection. CSD relevance was assessed by adapting Geographic Information Retrieval (GIR) relevance assessment techniques, which are also used in the IT sector. Thematic and geographic relevance were assessed using a Term Frequency-Inverse Document Frequency Vector Space Model (TF-IDF VSM) and NLP based on semantic gazetteers.
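The gazetteer lookup step can be illustrated with a minimal sketch. It assumes a hypothetical in-memory gazetteer (`GAZETTEER`, whose coordinates are approximate and chosen only for illustration) and a hypothetical helper `extract_toponyms` that greedily matches the longest word n-grams in a report against that gazetteer. The thesis combines NLP with gazetteer lookup. This sketch omits the NLP stage (for example named-entity recognition) and is not the author's implementation.

```python
import re

# Hypothetical mini-gazetteer: lower-case place name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "brisbane": (-27.47, 153.03),
    "ipswich": (-27.61, 152.76),
    "toowoomba": (-27.56, 151.95),
    "grantham": (-27.58, 152.20),
    "lockyer valley": (-27.54, 152.27),
}

def extract_toponyms(text, gazetteer=GAZETTEER, max_ngram=3):
    """Return (name, (lat, lon)) pairs for gazetteer entries found in free text.

    Uses greedy longest-match over word n-grams, so "lockyer valley" is
    matched as one place name rather than as two unmatched words.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in gazetteer:
                found.append((candidate, gazetteer[candidate]))
                i += n
                break
        else:
            # No gazetteer entry starts at this token, so advance by one word.
            i += 1
    return found

tweet = "Road closed between Grantham and the Lockyer Valley, avoid travel to Brisbane"
print([name for name, _ in extract_toponyms(tweet)])
# ['grantham', 'lockyer valley', 'brisbane']
```

A real gazetteer contains many ambiguous names (the same place name in several states, for example), so matched toponyms would still need disambiguation before they could be used as locations.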
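Credibility scoring in the style of a spam filter can be sketched with a multinomial naive Bayes classifier that uses Laplace smoothing. This is a simpler stand-in for the naive Bayesian Network model named above and does not reproduce it. The class name `NaiveBayesCredibility`, the labels `credible` and `not_credible`, and the training reports are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesCredibility:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing, as in classic spam filters."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)  # reports per class, used for the priors
        self.word_counts = {c: Counter() for c in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_scores(self, text):
        """Unnormalised log posterior for each class."""
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c, n_c in self.doc_counts.items():
            score = math.log(n_c / n_docs)  # log prior
            denom = self.totals[c] + self.alpha * len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

# Hypothetical hand-labelled training reports.
train_texts = [
    "road closed at bridge water over road",
    "evacuation centre open at showgrounds",
    "flood water rising near creek please avoid",
    "win free tickets click link now",
    "follow me for free prizes",
    "lol whatever click here",
]
train_labels = ["credible"] * 3 + ["not_credible"] * 3

model = NaiveBayesCredibility().fit(train_texts, train_labels)
print(model.predict("water over the road near the bridge"))  # credible
print(model.predict("click link for free prizes"))           # not_credible
```

With only six training reports the probabilities carry no real meaning, and the example only demonstrates the mechanics. How such a model is trained largely determines how useful its output is.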
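Thematic relevance ranking with a TF-IDF vector space model can be sketched as follows. Each report and the query are converted into TF-IDF weighted term vectors, and the reports are then ranked by their cosine similarity to the query. The function names, the smoothed IDF formula and the sample reports are assumptions made for illustration. The gazetteer-based geographic relevance component is not covered here.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def build_tfidf(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1 keeps every weight positive.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in tf.items()})
    return vectors, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_by_relevance(query, docs):
    """Rank documents by cosine similarity to the query; returns (score, doc_index) pairs."""
    vectors, idf = build_tfidf(docs)
    q_tf = Counter(t for t in tokenize(query) if t in idf)  # out-of-vocabulary query terms dropped
    total = sum(q_tf.values()) or 1
    q_vec = {t: (c / total) * idf[t] for t, c in q_tf.items()}
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scored, key=lambda s: (-s[0], s[1]))  # ties broken by original order

# Hypothetical reports and query.
reports = [
    "evacuation centre open at the school",
    "road closed due to flood water on the highway",
    "great coffee this morning",
]
for score, idx in rank_by_relevance("flood road closure", reports):
    print(f"{score:.3f}  {reports[idx]}")
```

Plain term matching cannot recognise that "closure" and "closed" are related. Stemming or the semantic resources mentioned above would be needed to close that gap.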

The results of the CSD location comparison showed that combining non-authoritative and authoritative data improved location determination. The semantic location analysis indicated some improvement in the location availability of the tweets and Crowdmap data; however, the quality of the newly derived locations remained uncertain. The credibility analysis revealed that spam email detection approaches are feasible for detecting CSD credibility. However, it was critical to train the model in a controlled environment using structured training that included modified training samples. The use of GIR techniques for CSD relevance analysis produced promising results. A separate relevance-ranked list of the same CSD was prepared through manual analysis, and the two lists generally agreed, indicating the system's potential to analyse relevance in a similar way to humans.
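The location comparison can be sketched as follows. The sketch measures the great-circle (haversine) distance between a CSD location and the position of the same feature in each reference dataset, then checks that distance against a tolerance. The `compare_location` helper, the 500 m tolerance, the source keys and the coordinates are all hypothetical. This abstract does not describe how the thesis actually matched and scored locations.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (spherical approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def compare_location(report, references, tolerance_m=500.0):
    """Compare a CSD location with each reference source's location for the same feature.

    report: (lat, lon); references: {source_name: (lat, lon) or None if the source has no match}.
    Returns {source_name: (distance_m, within_tolerance)} for sources that have a match.
    """
    results = {}
    for source, ref in references.items():
        if ref is None:
            continue
        d = haversine_m(report[0], report[1], ref[0], ref[1])
        results[source] = (d, d <= tolerance_m)
    return results

# Hypothetical coordinates for one reported road closure.
crowdmap_point = (-27.5800, 152.2000)
refs = {
    "google_maps": (-27.5812, 152.2010),
    "osm": (-27.5790, 152.1995),
    "qdnrm_roads": None,  # feature not found in this source
}
for source, (dist, ok) in compare_location(crowdmap_point, refs).items():
    print(f"{source}: {dist:.0f} m, within tolerance: {ok}")
```

A spherical Earth model keeps the sketch dependency-free. The trade-off is an error of up to about 0.5% compared with ellipsoidal distances, which is negligible relative to typical crowdsourced positional error.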
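Agreement between the system's relevance ranking and the manual ranking can be quantified with a rank correlation coefficient. The abstract does not say which measure was used. The sketch below assumes Spearman's rho and Kendall's tau, applied to two complete rankings of the same reports with no ties. The report identifiers are hypothetical.

```python
from itertools import combinations

def _ranks(order):
    # Map each item to its 0-based position in the ranked list.
    return {item: r for r, item in enumerate(order)}

def _check(a, b):
    if len(a) != len(b) or set(a) != set(b) or len(set(a)) != len(a):
        raise ValueError("both lists must rank exactly the same distinct items")
    if len(a) < 2:
        raise ValueError("need at least two items")

def spearman_rho(system_order, manual_order):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)); valid when there are no ties."""
    _check(system_order, manual_order)
    n = len(system_order)
    sys_r, man_r = _ranks(system_order), _ranks(manual_order)
    d2 = sum((sys_r[i] - man_r[i]) ** 2 for i in system_order)
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(system_order, manual_order):
    """Kendall's tau-a: (concordant - discordant) pairs divided by total pairs."""
    _check(system_order, manual_order)
    sys_r, man_r = _ranks(system_order), _ranks(manual_order)
    concordant = discordant = 0
    for a, b in combinations(system_order, 2):
        if (sys_r[a] - sys_r[b]) * (man_r[a] - man_r[b]) > 0:
            concordant += 1
        else:
            discordant += 1
    n = len(system_order)
    return (concordant - discordant) / (n * (n - 1) / 2)

system = ["r3", "r1", "r2", "r5", "r4"]  # hypothetical system ranking
manual = ["r1", "r3", "r2", "r4", "r5"]  # hypothetical manual ranking
print(spearman_rho(system, manual))  # 0.8
print(kendall_tau(system, manual))   # 0.6
```

Both coefficients range from -1 (reversed order) to 1 (identical order). Kendall's tau counts swapped pairs directly, so it is often easier to interpret when the lists are short.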

This research showed that CSD fitness analysis can potentially improve the accuracy, reliability and currency of CSD, and that CSD may be used to fill information gaps in authoritative sources. The integrated and autonomous CSD qualification framework presented here provides guidance for flood disaster first responders and could be adapted to support other types of emergency.

Statistics for USQ ePrint 34617
Item Type: Thesis (PhD/Research)
Item Status: Live Archive
Additional Information: Doctor of Philosophy (PhD) thesis.
Faculty/School / Institute/Centre: Current - Faculty of Health, Engineering and Sciences - School of Civil Engineering and Surveying (1 Jul 2013 -)
Supervisors: McDougall, Kevin; Liu, Xiaoye
Date Deposited: 24 Jul 2018 04:13
Last Modified: 05 Mar 2019 05:43
Uncontrolled Keywords: crowdsourced data; relevance; semantics; geographic information retrieval; natural language processing; disaster management
Fields of Research (2008): 08 Information and Computing Sciences > 0899 Other Information and Computing Sciences > 089999 Information and Computing Sciences not elsewhere classified
04 Earth Sciences > 0406 Physical Geography and Environmental Geoscience > 040604 Natural Hazards
09 Engineering > 0909 Geomatic Engineering > 090903 Geospatial Information Systems
Identification Number or DOI: doi:10.26192/5c09fc67f0cd3