From storms to quakes, natural disasters wreak havoc
As soon as a natural disaster strikes, users are eager to know more about it. Often, they look for related facts: What is the severity of the storm? What was the magnitude of the earthquake? In 2012, there were 905 natural catastrophes worldwide, 93% of which were weather-related. Overall costs were $170 billion! Storms (meteorological events) represented 45% of these disasters; floods (hydrological), 36%; heat waves, cold waves, droughts and wildfires (climatological), 12%; and earthquakes and volcanic eruptions (geophysical), 7%.
Netizens turn to search engines for news on disasters
Searchers are interested in knowing about the damage caused by these natural calamities, e.g., the death toll or the number of homes destroyed. Search engines currently provide a ten-blue-link interface for such queries. Users would find the results far more relevant if they were instead shown a structured summary of the fresh event. This would not just reduce the number of clicks required to reach the relevant information but would also keep users updated with detailed event information. It is especially helpful in times of disaster, making information available at users' fingertips quickly and putting their concerns to rest.
Data, data everywhere - but extracting useful data can be a challenge
To show a structured summary of such events, one needs to obtain fresh information about them. In the case of natural disasters, where else but Twitter would one find information faster? There's a catch, though: tweets can be highly unstructured and noisy. Extracting useful information from them is, no doubt, challenging. First, one needs to pick out all tweets relating to a particular disaster. Next, these must be linguistically analysed to extract useful, structured information. How do we extract meaning (semantic attribute-values) from tweets? In many cases there is no standard schema in place. How do we create one? And after we have created a schema and extracted structured information, how do we map attributes from the extracted information to the standard attributes in event schemas?
Here’re some ways to decode social media data
We use dependency parsing and part-of-speech tagging for linguistic parsing of tweets. Next, we design novel algorithms to extract both numeric as well as textual attribute-value pairs from tweets. We generate schemas for five different event-types by leveraging Wikipedia Infoboxes and complement this with some manual efforts. Finally, we present a novel algorithm to map extracted information to standard structured fields in the event schemas.
A dependency representation is a simple description of the grammatical relationships in a sentence, and it is what we use to extract textual relations. Dependencies are binary grammatical relations, each holding between a governor and a dependent. For example, in “Conflagration engulfs 110000 acres”, “engulfs” is the governor and “conflagration” is the dependent of the “nsubj” (nominal subject) dependency.
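This governor-dependent view can be sketched in a few lines of Python. The triples below are written out by hand for the example sentence; a real pipeline would obtain them from a dependency parser (e.g., Stanford CoreNLP or spaCy).

```python
# A dependency parse as a list of (relation, governor, dependent) triples,
# hand-written here for "Conflagration engulfs 110000 acres".
dependencies = [
    ("nsubj", "engulfs", "Conflagration"),  # nominal subject
    ("dobj", "engulfs", "acres"),           # direct object
    ("num", "acres", "110000"),             # numeric modifier
]

def dependents_of(governor, deps):
    """Return (relation, dependent) pairs governed by the given word."""
    return [(rel, dep) for rel, gov, dep in deps if gov == governor]

print(dependents_of("engulfs", dependencies))
# [('nsubj', 'Conflagration'), ('dobj', 'acres')]
```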
Extracting the values for numeric attributes
The first step is to split sentences into self-complete sub-units. Next, a dependency analysis of each sub-unit detects numeric words (values) and the things (attributes) described by those numbers, using the “nn”/“num” (numeric) dependencies. Since each dependency relates only a pair of words, extracting a complete attribute name may require combining several dependencies. For example, “The earthquake had a magnitude of 5.7” leads to “(magnitude, 5.7).”
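A minimal sketch of this step, again over hand-written triples for the example sentence (a real system would take them from a parser; the use of a collapsed “prep_of” relation here is an illustrative simplification of combining dependencies):

```python
def is_number(token):
    """True if the token parses as a number (commas allowed)."""
    try:
        float(token.replace(",", ""))
        return True
    except ValueError:
        return False

def extract_numeric_pairs(deps):
    """Collect (attribute, value) pairs from numeric dependencies:
    'num' links a number to the noun it counts, and a collapsed
    'prep_of' links a measurement noun to its value."""
    pairs = []
    for rel, gov, dep in deps:
        if rel == "num" and is_number(dep):
            pairs.append((gov, dep))       # e.g. num(acres, 110000)
        elif rel == "prep_of" and is_number(dep):
            pairs.append((gov, dep))       # e.g. prep_of(magnitude, 5.7)
    return pairs

# Hand-written triples for "The earthquake had a magnitude of 5.7"
deps = [
    ("det", "earthquake", "The"),
    ("nsubj", "had", "earthquake"),
    ("dobj", "had", "magnitude"),
    ("prep_of", "magnitude", "5.7"),
]
print(extract_numeric_pairs(deps))  # [('magnitude', '5.7')]
```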
Extracting the values for textual attributes
Textual attribute-value pairs are ones in which the value is non-numeric text. Given a tweet, there are three ways to obtain attribute-value pairs:
- A central attribute-value pair related to the subject of the tweet
- Attribute-value pairs related to the root word of the tweet
- Attribute-value pairs connected to preposition dependencies
For example, “Cyclone Hudhud damages Visakhapatnam Airport” leads to “(Airport, Visakhapatnam).”
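The Hudhud example can be sketched with the same triple representation. Restricting attention to the direct object and its noun-compound (“nn”) modifier is an illustrative simplification; the full method also follows the subject, root-word and preposition routes listed above.

```python
def extract_textual_pairs(deps):
    """Pair each direct object with its noun-compound modifier,
    e.g. nn(Airport, Visakhapatnam) -> (Airport, Visakhapatnam)."""
    objects = {dep for rel, _, dep in deps if rel == "dobj"}
    return [(gov, dep) for rel, gov, dep in deps
            if rel == "nn" and gov in objects]

# Hand-written triples for "Cyclone Hudhud damages Visakhapatnam Airport"
deps = [
    ("nn", "Hudhud", "Cyclone"),
    ("nsubj", "damages", "Hudhud"),
    ("nn", "Airport", "Visakhapatnam"),
    ("dobj", "damages", "Airport"),
]
print(extract_textual_pairs(deps))  # [('Airport', 'Visakhapatnam')]
```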
Schema Filling
For each event type, we use the tweets of one event in the training phase to manually learn attributes for the event schema. For every event, the schema contains attribute names, units, value ranges, data-types, and synonyms. We then use each extracted (attribute, value) pair to fill the value of the best-matching schema attribute.
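A hedged sketch of this matching step: each extracted pair is matched against an attribute's synonym list, then type-checked and range-checked. The tiny earthquake schema below is illustrative only, not the actual schema from the study.

```python
# Toy schema: attribute -> synonyms, data-type, and plausible value range.
earthquake_schema = {
    "magnitude": {"synonyms": {"magnitude", "intensity"},
                  "dtype": float, "range": (0.0, 10.0)},
    "deaths":    {"synonyms": {"deaths", "death toll", "killed"},
                  "dtype": int, "range": (0, 10**7)},
}

def fill_schema(schema, extracted):
    """Fill each schema slot with the first type-valid, in-range match."""
    filled = {}
    for attr, value in extracted:
        for name, spec in schema.items():
            if name in filled or attr.lower() not in spec["synonyms"]:
                continue
            try:
                v = spec["dtype"](value)
            except ValueError:
                continue  # wrong data-type: not a match
            lo, hi = spec["range"]
            if lo <= v <= hi:
                filled[name] = v
    return filled

pairs = [("magnitude", "5.7"), ("killed", "12"), ("magnitude", "99")]
print(fill_schema(earthquake_schema, pairs))
# {'magnitude': 5.7, 'deaths': 12}
```

Note how the out-of-range value 99 is rejected for “magnitude”, one simple way range checks keep noisy tweet values out of the schema.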
Study provides encouraging results
We selected five natural disaster event types: earthquakes, hurricanes, floods, wildfires and landslides. For each event type, we crawled tweets of 3–5 recent events using the Twitter search API. On average, the data set consists of more than 3000 tweets per event. The event schemas contain 30 attributes per event on average. We measured the precision with which the proposed algorithms could fill event schemas, and also the recall, that is, the number of event attributes that could be filled. The table below shows the attribute-value pairs extracted for the event “Chile Earthquake 2014”.
We observed an average precision of 0.851 and an average recall of 0.385 across 20 events. Since tweets do not always contain all the information about an event, we also extracted the top 20 URLs mentioned in tweets for each event and mined attribute-value pairs from those pages. Using tweets plus URLs increases precision and recall to 0.874 and 0.460 respectively.

To summarize: we discussed the problem of extracting structured information about natural disaster events from Twitter. We solved it by first performing linguistic analyses such as part-of-speech tagging and dependency parsing, then applying novel algorithms for attribute-value extraction and for mapping extracted attributes to the schema of the corresponding event type.
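For clarity, precision here is the fraction of filled schema slots whose values are correct, and recall is the fraction of schema attributes that received a value at all. The counts in this sketch are invented purely to show the computation; they are not from the study.

```python
def precision_recall(num_correct, num_filled, num_schema_attrs):
    """Precision: correct fills / all fills.
    Recall: slots filled / slots in the schema."""
    return num_correct / num_filled, num_filled / num_schema_attrs

# Hypothetical event: 30-attribute schema, 13 slots filled, 11 correct.
p, r = precision_recall(num_correct=11, num_filled=13, num_schema_attrs=30)
print(round(p, 3), round(r, 3))  # 0.846 0.433
```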
So far so good. There is more work to do
Such a structured event summary can significantly improve the relevance of the displayed results by providing key information about the event to the user without any extra clicks. In the future, we would like to generalize this approach to other events on Twitter, which means some more work in the days ahead.
- As told to Jasmine Kohli