Integrating Chinese Data from Sina Weibo to the LITMUS Landslide Detection System
Detecting landslides has been a challenging problem for researchers because there are no dedicated physical sensors for them. LITMUS is a landslide detection system based on information from both social media platforms and physical sensors. It is limited, however, in that it supports only English data. We propose to integrate Chinese data from Sina Weibo into the LITMUS landslide detection system to extend its service. The Chinese LITMUS pipeline starts by collecting data from Sina Weibo using a web crawler. It then applies several filtering techniques to remove part of the noise in the dataset. Subsequently, the system geo-tags the data items using a combination of Named Entity Recognition (NER)-based and gazetteer-based approaches. Data items that contain the same location entity are grouped into one cluster, which represents a candidate event. The system then classifies each data item using Word2Vec and a Support Vector Machine (SVM) to identify the remaining noise. Lastly, the system decides whether a candidate event is an actual landslide event based on the majority label that the classifier assigns to the data items in its cluster. Our experiments show that the classification component achieves about 0.96 in precision, recall, and F-measure on the evaluation dataset, and that the system is able to detect a large number of landslides in China.
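The grouping and decision steps described above can be illustrated with a minimal sketch. It is not the LITMUS implementation: the three-entry gazetteer, the function names, the label strings, and the strict-majority rule for ties are all assumptions made for illustration, and the per-item labels are supplied by hand where the real system would obtain them from the Word2Vec and SVM classifier.

```python
from collections import Counter, defaultdict

# Hypothetical gazetteer with a few place names, for illustration only.
GAZETTEER = {"四川", "云南", "贵州"}


def geo_tag(text, gazetteer=GAZETTEER):
    """Return a gazetteer entry that occurs in the text, or None.

    Entries are checked in sorted order so the result is deterministic
    when a text mentions more than one place.
    """
    for place in sorted(gazetteer):
        if place in text:
            return place
    return None


def cluster_by_location(items):
    """Group labels of (text, label) items by their tagged location.

    Items with no recognizable location are dropped.
    """
    clusters = defaultdict(list)
    for text, label in items:
        place = geo_tag(text)
        if place is not None:
            clusters[place].append(label)
    return dict(clusters)


def detect_events(clusters, positive="landslide"):
    """Return locations where more than half of the items are positive.

    Requiring a strict majority (ties count as no event) is an assumption.
    """
    events = []
    for place, labels in clusters.items():
        if Counter(labels)[positive] > len(labels) / 2:
            events.append(place)
    return sorted(events)


# Hand-labelled example posts; a trained classifier would supply the labels.
posts = [
    ("四川发生山体滑坡", "landslide"),
    ("四川滑坡救援进行中", "landslide"),
    ("四川股市滑坡", "irrelevant"),
    ("云南销量滑坡", "irrelevant"),
    ("今天天气很好", "irrelevant"),
]
print(detect_events(cluster_by_location(posts)))
```

In this toy input, two of the three posts tagged with one location are labelled as landslide reports, so that cluster is reported as an event, while the cluster whose only post uses the word figuratively is not.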