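0 Experiment Overview

According to one survey, about 218 million people are affected by natural and man-made disasters each year, and about 68,000 people lose their lives. The frequency of natural disasters such as earthquakes and volcanic eruptions has stayed roughly constant, but the number of terrorist activities has increased over the same period.

The goal of this experiment is to explore terrorist incidents around the world. We will look at trends in terrorism, the regions where it occurs most often, and so on.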

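1 Development Preparation

1.1 Preparing the Dataset

There is one dataset, named globalterrorismdb_0617dist.csv. The original dataset records terrorist incidents from 1970.1.1~2017.1.27. Because the original is too large, the dataset used in this experiment keeps only the incidents from 2016.1.1~2017.1.27.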
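
The text does not say how the smaller file was produced. As a minimal sketch, assuming the full file uses the raw column name iyear (the name that is renamed to Year later in this experiment), a subset like this could be cut with a boolean filter. The toy frame below is made up and stands in for the real data.

```python
import pandas as pd

# Toy stand-in for the full database (values are made up; the real file has 135 columns).
full = pd.DataFrame({
    'iyear':  [1970, 2015, 2016, 2016, 2017],
    'imonth': [1, 12, 1, 6, 1],
    'iday':   [1, 31, 1, 15, 27],
})

# Keep only incidents from 2016 onward and renumber the rows.
subset = full[full['iyear'] >= 2016].reset_index(drop=True)
print(len(subset))
```

The same filter applied to the real file would then be written out with subset.to_csv(...), under the assumption that the truncation was done purely by year.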

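1.2 Dataset Description

The dataset stores data about terrorist incidents. It has 135 fields in total, including the date of the incident, country, region, attack type, target, number killed, number wounded, motive, city, longitude, and latitude.

Because the dataset has 135 fields, it cannot be shown in full here. Later steps select the important fields and display the first 5 rows, which will give you a feel for the data.

1.3 Importing Packages and the Dataset

Import the packages and the dataset with the following code: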

# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.patches as mpatches

# import dataset
dataset = pd.read_csv('globalterrorismdb_0617dist.csv',encoding='ISO-8859-1')

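The dataset has 135 fields, many of which are not of interest here. We therefore rename the fields we care about and then keep only those 17 fields.

The fields have the following meanings:

'Year': year of the incident
'Month': month of the incident
'Day': day of the incident
'Country': country
'Region': region
'AttackType': attack type (e.g. bombing)
'Target': targeted group (e.g. civilians)
'Killed': number of people killed
'Wounded': number of people wounded
'Summary': summary of the incident
'Group': terrorist group (e.g. Taliban, ISIL)
'TargetType': type of target
'WeaponType': weapon type
'Motive': motive for the attack
'City': city
'Latitude': latitude
'Longitude': longitude

The code for selecting the fields of interest is as follows: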

# selected subset of dataset
dataset.rename(columns={'iyear':'Year',
                        'imonth':'Month',
                        'iday':'Day',
                        'country_txt':'Country',
                        'region_txt':'Region',
                        'attacktype1_txt':'AttackType',
                        'target1':'Target',
                        'nkill':'Killed',
                        'nwound':'Wounded',
                        'summary':'Summary',
                        'gname':'Group',
                        'targtype1_txt':'TargetType',
                        'weaptype1_txt':'WeaponType',
                        'motive':'Motive',
                        'city':'City',
                        'latitude':'Latitude',
                        'longitude':'Longitude',},inplace=True)
dataset = dataset[['Year','Month','Day','Country','Region','City','Latitude',
                   'Longitude','AttackType','Killed','Wounded','Target',
                   'Summary','Group','TargetType','WeaponType','Motive']] # 17 columns

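When looking at terrorist incidents, we usually care about how many people were killed, how many were wounded, and the total number of casualties. So we create a new field, Casualties, defined as the number killed plus the number wounded. The code is as follows:

dataset['Casualties'] = dataset['Killed'] + dataset['Wounded']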
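
One detail of this sum is worth seeing on a small example: in pandas, adding a number to a missing value gives a missing value, so Casualties is unknown whenever either count is unknown. The sketch below uses a made-up toy frame. The CasualtiesFilled column is only an illustrative alternative (treating unknown counts as 0) and is not what this experiment does.

```python
import numpy as np
import pandas as pd

# Toy data (made up): one row with both counts, two rows with one count missing.
toy = pd.DataFrame({'Killed': [2, np.nan, 0], 'Wounded': [5, 3, np.nan]})

# Plain addition, as in the experiment: any missing operand gives NaN.
toy['Casualties'] = toy['Killed'] + toy['Wounded']

# Illustrative alternative (an assumption, not the experiment's choice):
# count unknown values as 0 before adding.
toy['CasualtiesFilled'] = toy['Killed'].fillna(0) + toy['Wounded'].fillna(0)
print(toy)
```

Keeping the NaN is the more cautious choice, because filling with 0 silently understates casualties for incidents with incomplete records.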

Display the first 5 rows of the dataset to get a first impression of it. The code is as follows:

dataset.head(5)

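By this point, you should understand what each field means.

To see which terrorist groups were active during 2016.1.1~2017.1.27, use the following code:

dataset['Group'].unique()

The Taliban is probably the most familiar name in the list, since it appears often in the news.

To see how many terrorist incidents occurred during 2016.1.1~2017.1.27, use the following code:

dataset.shape[0]

There are 13,490 incidents in total, an average of 34.41 per day.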
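
The daily average can be reproduced with a short calculation. This sketch takes the incident count reported above and divides it by the number of days elapsed between the first and last date of the subset. The variable names are illustrative.

```python
from datetime import date

n_events = 13490  # incident count reported above

# Days elapsed between the first and last date covered by the subset.
n_days = (date(2017, 1, 27) - date(2016, 1, 1)).days

per_day = n_events / n_days
print(n_days, round(per_day, 2))  # 392 34.41
```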

Count the missing values in each field with the following code:

# take care of missing data
missing_data_df = dataset.isnull().sum()
missing_data_df

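The output above shows that the Motive field is missing for most incidents, and that for 1,237 incidents either the number killed or the number wounded is unknown.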
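
Raw counts of missing values are easier to judge as percentages. The sketch below, on a made-up toy frame, computes the share of missing values per column and counts the rows where the number killed or the number wounded is unknown. The names missing_pct and unknown_toll are illustrative.

```python
import numpy as np
import pandas as pd

# Toy data (made up) with gaps in all three columns.
toy = pd.DataFrame({
    'Killed':  [1, np.nan, 0, 4],
    'Wounded': [np.nan, 2, 0, 1],
    'Motive':  [None, None, None, 'Unknown'],
})

# Share of missing values per column, as a percentage.
missing_pct = toy.isnull().mean() * 100

# Rows where the number killed or the number wounded (or both) is unknown.
unknown_toll = (toy['Killed'].isnull() | toy['Wounded'].isnull()).sum()

print(missing_pct.round(1))
print(unknown_toll)
```

On the real data, the same two expressions can be applied to dataset instead of toy.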

2 Exploring the Dataset

Find the country with the most terrorist incidents using the following code:

# statistics by country
stats_with_country = dataset['Country'].value_counts()  # incidents per country, most frequent first
# name the columns explicitly so the result does not depend on the pandas version
stats_with_country = stats_with_country.rename_axis('Country').reset_index(name='Count')
print('%s has most terrorism counts. The number is %d'  %(stats_with_country.iloc[0,0], stats_with_country.iloc[0,1]))

We find that Iraq is the country with the most terrorist attacks.
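
To see what the per-country table looks like, the counting step can be tried on a toy frame. The sketch below names the columns explicitly with rename_axis and reset_index(name=...), which avoids relying on the default column names produced by value_counts().reset_index(), since those defaults changed in pandas 2.0. The data is made up.

```python
import pandas as pd

# Toy data (made up): one row per incident.
toy = pd.DataFrame({'Country': ['Iraq', 'Iraq', 'India', 'Iraq', 'India', 'Nigeria']})

counts = (toy['Country']
          .value_counts()            # incidents per country, most frequent first
          .rename_axis('Country')    # name the index explicitly
          .reset_index(name='Count'))
print(counts)
```

The first row of counts is then the country with the most incidents, which is what iloc[0,0] and iloc[0,1] read out.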

Find the region with the most terrorist incidents using the following code:

stats_with_region = dataset['Region'].value_counts()  # incidents per region, most frequent first
# name the columns explicitly so the result does not depend on the pandas version
stats_with_region = stats_with_region.rename_axis('Region').reset_index(name='Count')
print('%s has most terrorism counts. The number is %d'  %(stats_with_region.iloc[0,0], stats_with_region.iloc[0,1]))

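We find that the Middle East & North Africa is the region with the most terrorist attacks.

Iraq is the country with the most attacks. Which country has the highest death toll? The code is as follows: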

stats_with_killed = dataset.groupby(['Country'])[['Killed']].sum()
stats_with_killed = stats_with_killed.reset_index()
stats_with_killed = stats_with_killed.sort_values(['Killed'], ascending=[False])
print('%s killed most people. The killed people number is %d'  %(stats_with_killed.iloc[0,0], stats_with_killed.iloc[0,1]))

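The output above shows that the country with the highest death toll is also Iraq.

Of the 13,490 incidents, which one killed the most people? The code is as follows: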

max_killed_index = dataset.loc[dataset['Killed'].idxmax()]
max_killed_year = max_killed_index.Year
max_killed_country = max_killed_index.Country
max_killed_number = max_killed_index.Killed
print('In all terrorism events, this event killed most people. In %s, %s killed %d'  %(max_killed_year, max_killed_country, max_killed_number))

The output above shows that an incident in Syria in 2016 killed 433 people, making it the deadliest single incident in the dataset.
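
The lookup relies on idxmax, which returns the index label of the largest value and skips missing values by default. A minimal sketch on made-up data, with nlargest shown as one way to list several of the deadliest incidents at once:

```python
import numpy as np
import pandas as pd

# Toy data (made up); the second incident has an unknown death toll.
toy = pd.DataFrame({
    'Year':    [2016, 2016, 2017],
    'Country': ['A', 'B', 'C'],
    'Killed':  [12, np.nan, 40],
})

# Row of the deadliest incident; NaN values are ignored by idxmax.
row = toy.loc[toy['Killed'].idxmax()]

# The two deadliest incidents, highest death toll first.
top2 = toy.nlargest(2, 'Killed')

print(row.Country, row.Killed)
print(top2)
```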

Living in China, we naturally care about the situation in our own country. The code is as follows:

china_killed_people = stats_with_killed.query('Country == "China"').iloc[0,1]
print('In China, %d people are killed during all the terrorism events.' %(china_killed_people))

The output above shows that although China also had terrorist incidents, the death toll was very low.
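
Reading the value with iloc[0,1] raises an IndexError when the country has no row in the table at all. As a hedged sketch, a small helper can return 0 in that case. The function name killed_in and the toy table are hypothetical and not part of the experiment's code.

```python
import pandas as pd

# Toy per-country death tolls (made up), shaped like stats_with_killed.
stats = pd.DataFrame({'Country': ['A', 'B'], 'Killed': [100.0, 3.0]})

def killed_in(stats, country):
    """Return the death toll for `country`, or 0 if the table has no row for it."""
    match = stats.loc[stats['Country'] == country, 'Killed']
    return int(match.iloc[0]) if not match.empty else 0

print(killed_in(stats, 'B'))  # 3
print(killed_in(stats, 'Z'))  # 0
```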

3 Visualizing the Dataset

Visualize the 10 countries with the most terrorist incidents using the following code:

# visualize them (top 10)
plt.figure()
plt.title('Country vs Count')
plt.xticks(rotation = 90)
ax = sns.barplot(x = 'Country', y = 'Count', data = stats_with_country.iloc[0:10, :])

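Figure 37: Execution result

The figure above shows that Iraq ranks first and Syria ranks tenth.

Visualize the 10 regions with the most terrorist incidents using the following code: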

plt.figure()
plt.title('Region vs Count')
plt.xticks(rotation = 90)
ax = sns.barplot(x = 'Region', y = 'Count', data = stats_with_region.iloc[0:10, :])

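Figure 38: Execution result

The figure above shows that the Middle East & North Africa region has the most terrorist incidents.

Visualize the 10 countries with the highest death tolls from terrorist incidents using the following code: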

plt.figure()
plt.title('Country vs Killed')
plt.xticks(rotation = 90)
ax = sns.barplot(x = 'Country', y = 'Killed', data = stats_with_killed.iloc[0:10, :])

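Figure 39: Execution result

The figure above shows that Iraq has the highest death toll.

Some of you may say this is the fault of the United States. That claim is not rigorous as it stands. A better approach is to take the terrorism data starting from 1970 and draw a line chart of the number of terrorist incidents in Iraq per year. Only if the numbers are low before 2003 and rise sharply from 2003 onward would the data support a connection with the United States.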
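
The yearly series for such a line chart can be built with a filter and a groupby. This is a minimal sketch on a made-up toy frame that stands in for the full data from 1970 onward. It assumes the Year and Country column names used in this experiment, and the numbers are not real.

```python
import pandas as pd

# Toy stand-in for the full data (values are made up, not real counts).
full = pd.DataFrame({
    'Year':    [2001, 2002, 2003, 2003, 2004, 2004, 2004],
    'Country': ['Iraq', 'Iran', 'Iraq', 'Iraq', 'Iraq', 'Iraq', 'Iraq'],
})

# Number of incidents in Iraq per year, ordered by year.
iraq_by_year = (full[full['Country'] == 'Iraq']
                .groupby('Year')
                .size()
                .sort_index())
print(iraq_by_year)
```

With matplotlib installed, iraq_by_year.plot(kind='line') draws the chart. Years with no incidents are absent from the series, so they appear as gaps rather than zeros unless the series is reindexed over the full year range.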

Visualize the ranking of attack types with the following code:

# statistics by attack type
plt.figure()
plt.title('Attack Methods by Terrorists')
plt.xticks(rotation = 90)
sns.countplot(x = 'AttackType', data = dataset, order = dataset['AttackType'].value_counts().index)

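Figure 40: Execution result

The figure above shows that bombings are the most common attack type, possibly because they cause more casualties.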
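
The guess that bombings cause more casualties can be checked rather than assumed, by comparing the average Casualties per attack type. The sketch below uses a made-up toy frame with the AttackType and Casualties columns from this experiment, and the labels in it are only examples.

```python
import numpy as np
import pandas as pd

# Toy data (made up); one bombing has an unknown casualty count.
toy = pd.DataFrame({
    'AttackType': ['Bombing/Explosion', 'Bombing/Explosion',
                   'Armed Assault', 'Armed Assault'],
    'Casualties': [10, np.nan, 4, 2],
})

# Per attack type: number of incidents with a known casualty count,
# and the mean casualties over those incidents.
summary = (toy.groupby('AttackType')['Casualties']
           .agg(['count', 'mean'])
           .sort_values('mean', ascending=False))
print(summary)
```

Both count and mean ignore missing values, so the averages describe only incidents whose casualty figures are known.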

Visualize the ranking of target types with the following code:

# statistics by target type
plt.figure()
plt.title('Favorite Targets')
plt.xticks(rotation = 90)
sns.countplot(x = 'TargetType', data = dataset, order = dataset['TargetType'].value_counts().index)