Friday, October 6, 2017

Using XML file to restore posts to Blogger - by Python ElementTree API


Backup/Restore function of Blogger

The Blogger backup file is an XML file.
Please refer to the Blogger Developer's Guide for more information. The following is an example of a feed for a blog with only one post; note that a real Blogger feed contains actual IDs and URLs.
<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="http://www.blogger.com/styles/atom.css"
  type="text/css"?>
<feed xmlns='http://www.w3.org/2005/Atom'
    xmlns:gd='http://schemas.google.com/g/2005'
    gd:etag='W/"D08FQn8-eip7ImA9WxZbFEw."'>
  <id>tag:blogger.com,1999:blog-blogID</id>
  <updated>2008-04-17T00:03:33.152-07:00</updated>
  <title>Lizzy's Diary</title>
  <subtitle type='html'></subtitle>
  <link rel='http://schemas.google.com/g/2005#feed'
    type='application/atom+xml'
    href='http://blogName.blogspot.com/feeds/posts/default' />
  <link rel='self' type='application/atom+xml'
    href='http://www.blogger.com/feeds/blogID/posts/default' />
  <link rel='alternate' type='text/html'
    href='http://blogName.blogspot.com/' />
  <author>
    <name>Elizabeth Bennet</name>
    <uri>http://www.blogger.com/profile/profileID</uri>
    <email>noreply@blogger.com</email>
  </author>
  <generator version='7.00'
    uri='http://www2.blogger.com'>Blogger</generator>
  <entry gd:etag='W/"D0YHRn84eip7ImA9WxZUFk8."'>
    <id>tag:blogger.com,1999:blog-blogID.post-postID</id>
    <published>2008-04-07T20:25:00.005-07:00</published>
    <updated>2008-04-07T20:25:37.132-07:00</updated>
    <title>Quite disagreeable</title>
    <content type='html'>&lt;p&gt;I met Mr. Bingley's friend Mr. Darcy
      this evening. I found him quite disagreeable.&lt;/p&gt;</content>
    <link rel='edit' type='application/atom+xml'
      href='http://www.blogger.com/feeds/blogID/posts/default/postID' />
    <link rel='self' type='application/atom+xml'
      href='http://www.blogger.com/feeds/blogID/posts/default/postID' />
    <link rel='alternate' type='text/html'
      href='http://blogName.blogspot.com/2008/04/quite-disagreeable.html' />
    <author>
      <name>Elizabeth Bennet</name>
      <uri>http://www.blogger.com/profile/profileID</uri>
      <email>noreply@blogger.com</email>
    </author>
  </entry>
</feed>
Reference: Blogger APIs Client Library for Python

Using ElementTree to parse and insert posts to backup xml

ElementTree is a Python API for parsing and creating XML data. To use it, include one of the following import lines in the program (the rest of the code refers to the module as ET):
import xml.etree.ElementTree as ET
or, using the lxml package's compatible implementation,
from lxml import etree as ET
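For example, a minimal sketch (the file name blog-backup.xml is just a placeholder) that parses a backup feed with the standard-library module and lists its entry titles:
import xml.etree.ElementTree as ET

NS = '{http://www.w3.org/2005/Atom}'

tree = ET.parse('blog-backup.xml')          # a Blogger backup file
root = tree.getroot()
for entry in root.findall(NS + 'entry'):    # Atom tags live in a namespace
    print(entry.find(NS + 'title').text)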

Loading an XML file as a template

The following function takes an XML file as input and returns the ElementTree and its root element.
def load_xml_template(self, name):
    parser = ET.XMLParser(encoding='utf-8')
    tree = ET.parse(name, parser)
    root = tree.getroot()
    return tree, root

Output to an XML file using the find() function of ElementTree

def output_to_xml(self, post_list):
  # Change and write the new xml (note: requires `import copy` for deepcopy below)
  tree, root = self.load_xml_template('template.xml')

  entry = root.find(self.prepend_ns('entry'))

  entry.find(self.prepend_ns('id')).text        = post_list[0][1]
  entry.find(self.prepend_ns('published')).text = post_list[0][3]
  entry.find(self.prepend_ns('updated')).text   = post_list[0][3]
  entry.find(self.prepend_ns('title')).text     = post_list[0][2]
  entry.find(self.prepend_ns('content')).text   = post_list[0][4]

  # Ignore the first one
  for post in post_list[1:]:
    entry2 = copy.deepcopy(entry)
    entry2.find(self.prepend_ns('id')).text        = post[1]
    entry2.find(self.prepend_ns('published')).text = post[3]
    entry2.find(self.prepend_ns('updated')).text   = post[3]
    entry2.find(self.prepend_ns('title')).text     = post[2]
    entry2.find(self.prepend_ns('content')).text   = post[4]
    root.append(entry2)

  global xml_filename
  tree.write(xml_filename, encoding='utf-8', xml_declaration=True)

  self.log('Saved file %s' % xml_filename)

Tags with Namespace declared in ElementTree

Since the tags we are searching for are declared within a namespace ("http://www.w3.org/2005/Atom"), we have to specify that namespace when searching for them. To simplify the process, a helper function prepend_ns() is created.
def prepend_ns(self, s):
    return '{http://www.w3.org/2005/Atom}' + s
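Alternatively, both xml.etree.ElementTree and lxml accept a namespaces mapping in find()/findall(), which avoids prepending the '{...}' prefix by hand. A minimal sketch, reusing the root element returned by load_xml_template():
ATOM = {'atom': 'http://www.w3.org/2005/Atom'}

entry = root.find('atom:entry', ATOM)              # same as find(prepend_ns('entry'))
title = entry.find('atom:title', ATOM).text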

Using ISO datetime

Blogger uses the ISO 8601 datetime format. Here is the transformation function.
def iso_datetime(datetime_string):
  real_date  = datetime.strptime(datetime_string, '%d, %b %Y %H:%M')
  # ISO 8601, e.g. '2008-04-07T20:25:00.005-07:00'
  iso_date = real_date.strftime('%Y-%m-%dT%H:%M:%S.000-08:00')
  return real_date, iso_date
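A quick usage example (a minimal sketch; it assumes from datetime import datetime is in scope, and uses the '31, Jul 2014 15:19' date format seen in the crawler post below):
real_date, iso_date = iso_datetime('31, Jul 2014 15:19')
print(iso_date)   # -> 2014-07-31T15:19:00.000-08:00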

Wednesday, September 20, 2017

Backup articles from "tian.yam.com" using Python Scrapy framework as a crawler


Preface

One of my friends needed to back up all the articles from a blog site called “tian.yam.com” (天空部落格). She might move her blog to another platform such as Blogger or a self-built WordPress site. Unfortunately, it seemed that “tian.yam.com” provides no tool for backing up all of her articles (or maybe I just don't know of one), so I started to research feasible ways to do it.

Scrapy

I found a perfect tool for this kind of job: Scrapy.
Scrapy is an open source and collaborative framework for extracting the data you need from websites, in a fast, simple, yet extensible way.
I utilized Scrapy as a Python crawler to fetch the contents of the blog and save them to local disk storage.

Installation

To install Scrapy using conda, run:
conda install -c conda-forge scrapy 
(I use conda for installation since I have Anaconda installed)
or
pip install Scrapy
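Either way, a quick import check confirms the installation (a minimal sketch):
# Confirm that Scrapy is importable and show its version.
import scrapy
print(scrapy.__version__)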

Start A Project

It is easy to get started by following the official tutorial.
We can start a project by typing the command:
scrapy startproject blog
It creates the whole blog project directory:
blog/
    scrapy.cfg   # deploy configuration file
    blog/        # project's Python module, you'll import your code from here
        __init__.py
        items.py      # project items definition file
        pipelines.py  # project pipelines file
        settings.py   # project settings file
        spiders/      # a directory where you'll later put your spiders
            __init__.py

Write The Spider

Then I wrote a Python program called blog.py under the directory blog/spiders.
A class BlogCrawler is declared, subclassing scrapy.Spider. The methods start_requests and parse belong to the Spider interface; all we need to do is add our code to them.
In addition, Beautiful Soup is a powerful Python library for parsing HTML and XML documents.
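As a quick, self-contained illustration of the CSS selectors used below (the HTML snippet and URL here are made up for demonstration; the class names match the real pages):
from bs4 import BeautifulSoup

html = '''
<div class="post-content-block">
  <a href="https://duduh.tian.yam.com/posts/12345">A sample post</a>
  <span class="post-date">31, Jul 2014 15:19</span>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
block = soup.select('.post-content-block')[0]
print(block.select('a')[0]['href'])        # the post link
print(block.select('.post-date')[0].text)  # the post date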
import scrapy
from bs4 import BeautifulSoup
from datetime import datetime
import requests
import shutil
import os

class BlogCrawler(scrapy.Spider):
  name = 'blog'

  global mt_filename, post_list
  mt_filename = 'duduh_blog.txt'
  post_list = []

  # delete the mt file if it exists.
  if os.path.exists(mt_filename):
    os.remove(mt_filename)

  def start_requests(self):
    # Get every post from page 1 to 34
    #
    post_url = 'https://duduh.tian.yam.com/posts?page='
    urls = []
    for i in range(1,35):
      urls.append(post_url + str(i))

    # parse each post in every page
    for url in urls:
      yield scrapy.Request(url=url, callback=self.parse)

  def closed(self, reason):
    global post_list
    post_list.sort(reverse=True)
    # print(post_list)
    # print(len(post_list))
    self.output_to_mt(post_list)

  def parse(self, response):
    global post_list
    res = BeautifulSoup(response.body, "html.parser")
    titles = res.select('.post-content-block')

    # Iterate over each title block in titles
    for title in titles:
      link = title.select('a')[0]['href']
      post_date = title.select('.post-date')[0].text
      yield scrapy.Request(link, self.parse_detail)

Parse Multiple Layers of Web Pages

If multiple layers of web pages need to be parsed, another scrapy.Request can be yielded within the parse function. In our code, parse_detail is passed as the callback of the Request yielded from parse.
def parse_detail(self, response):
    global post_list

    def get_and_save_img(post_image_url):
      res = requests.get(post_image_url, stream = True)
      directory = os.getcwd() + '/images/'
      if not os.path.exists(directory):
        os.makedirs(directory)

      if res.status_code == 200 and post_image_url.split('.')[-1] == 'jpg':
        filename = post_image_url.split('/')[-1]
        filepath = directory + filename
        f = open(filepath, 'wb')
        res.raw.decode_content = True
        shutil.copyfileobj(res.raw, f)
        f.close()
        del res

    def convert_datetime(datetime_string):
      # code example:
      # d = datetime.strptime('2007-07-18 10:03:19', '%Y-%m-%d %H:%M:%S')
      # day_string = d.strftime('%Y-%m-%d')
      #
      # Now, date input format example:
      #   31, Jul 2014 15:19
      # This should be converted in the format MM/DD/YYYY hh:mm:ss AM|PM.
      # The AM|PM is optional.
      #
      # ** using %b for month name.
      #
      real_date  = datetime.strptime(datetime_string, '%d, %b %Y %H:%M')
      mt_date = real_date.strftime('%m/%d/%Y %H:%M:%S')
      return real_date, mt_date

    res = BeautifulSoup(response.body, "lxml")
    detail = res.select('.post-content-block')

    post_title   = detail[0].select('h3')[0].text
    real_date, post_date    = convert_datetime(detail[0].select('.post-date')[0].text)
    post_content = detail[0].select('.post-content')[0].extract()

    post_images  = detail[0].select('img')
    ## Get and save images
    for post_image in post_images:
      get_and_save_img(post_image['src'])
    # Save the results in the global list 'post_list' 
    post_list.append([real_date, post_title, post_date, post_content])

The closed Function

After all the iterations, all data has been retrieved and saved to the global list post_list, so it's time to write it out to a file. In the Scrapy framework, the built-in method closed is called when all the work is done.
def closed(self, reason):
  global post_list
  post_list.sort(reverse=True)
  self.output_to_mt(post_list)

Save Backup File in MT (Movable Type Import / Export) Format

The Movable Type Import / Export format document is here.
An example is as follows:
TITLE: A dummy title
BASENAME: a-dummy-title
AUTHOR: Foo Bar
DATE: 01/31/2002 03:31:05 PM
PRIMARY CATEGORY: Media
CATEGORY: News
----- (-----\n)
BODY:
This is the body.
Another paragraph here.
Another paragraph here.
-----
EXTENDED BODY:
Here is some more text.
Another paragraph here.
Another paragraph here.
-----
COMMENT:
AUTHOR: Foo
DATE: 01/31/2002 15:47:06
This is
the body of this comment.
-----
COMMENT:
AUTHOR: Bar
DATE: 02/01/2002 04:02:07 AM
IP: 205.66.1.32
EMAIL: me@bar.com
This is the body of
another comment. It goes
up to here.
-----
PING:
TITLE: My Entry
URL: http://www.foo.com/old/2002/08/
IP: 206.22.1.53
BLOG NAME: My Weblog
DATE: 08/05/2002 16:09:12
This is the start of my
entry, and here it…
----- (-----\n)
-------- (--------\n)
The code is here:
def output_to_mt(self, post_list):
  global mt_filename
  for post in post_list:
    mt  = 'TITLE: ' + post[1] + '\n'
    mt  = mt + 'AUTHOR: duduh' + '\n'
    mt += 'DATE: '
    mt  = mt + post[2] + '\n'
    mt += '-----\n'
    mt  = mt + 'BODY:' + '\n'
    mt  = mt + str(post[3]) + '\n'
    mt += '-----\n'
    mt += '--------\n'
    with open(mt_filename, 'a+') as f:
      f.write(mt)
    self.log('Saved file %s' % mt_filename)

Run The Spider

The command to run the spider is:
scrapy crawl blog
Then we get the result file. Done!
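As an extra sanity check, a few lines of Python (a minimal sketch; the file name matches mt_filename in the spider above) can count the exported posts:
# Count the TITLE lines in the exported Movable Type file.
with open('duduh_blog.txt', encoding='utf-8') as f:
    titles = [line for line in f if line.startswith('TITLE: ')]
print('%d posts exported' % len(titles))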

Tuesday, August 1, 2017

BikeOrderForm


Here is my recent work: (other works)

BikeOrderForm

BikeOrderForm is a site which demos the major functionality of an on-line ordering page for a bike website.
It consists of an ordering form for choosing models as well as the corresponding color and size.
The color and size attributes differ from model to model. For example, model A might come in blue, red and yellow, while model B might come in blue only, and there is no general rule for assigning colors and sizes to each model. This system provides a form to set up the color and size configuration for each model, so that users can pick the correct color and size for a given model.
This demo site utilizes techniques as follows:
The demo site: https://bikeorderform.herokuapp.com/
The code in github: https://github.com/chaoyee/bikeorderform
My blog for coding: http://charles4code.blogspot.tw/

Monday, July 17, 2017

Spree Commerce Customization


  1. Create a 116x50 pixel image named “spree_50.png” and put it at app/assets/images/logo/.

  2. Use a new file to override the original one.

    • Create a new file app/models/spree/app_configuration.rb.

    • Copy into it the content of
      (user)/.rvm/gems/ruby-2.X.X/gems/spree_core/app/models/spree/app_configuration.rb

    • Modify the default preference:

      preference :logo, :string, default: 'logo/spree_50.png'

      changing spree_50.png to xxx.png

  3. Decorator design pattern

    Add a decorator file:
    app/models/spree/app_configuration_decorator.rb

    then add xxx.png as your logo image:

Spree::AppConfiguration.class_eval do
  preference :logo, :string, default: 'logo/xxx.png'
end

The decorator design pattern is the better solution for changing the default logo of a Spree site.

Deployment

bundle exec rake railties:install:migrations
bundle exec rake db:migrate
bundle exec rake db:seed
bundle exec rake spree_sample:load

Demo Site

The demo site:
- Demoshop
- Admin page, login as: admin@test.com/spree123

The code base is here

Monday, July 10, 2017

Upgrade Rails From 4.2 to 5.0


Please see the reference: Upgrading from Rails 4.2 to Rails 5.0

Ruby version > 2.2.2

  $ rvm get stable
  $ rvm install 2.4.1              # install a Ruby >= 2.2.2 (5.1.2 below is the Rails version, not a Ruby version)
  $ rvm --default use 2.4.1        # use it as the default and current version
  $ rvm list

Modify Gemfile

  gem 'rails', '4.2.5.1' -> '5.1.2'
  gem 'coffee-rails'     -> # , '~> 4.1.0' 

Active Record Models Now Inherit from ApplicationRecord by Default.

Create an application_record.rb file in app/models/ and add the following content:
    class ApplicationRecord < ActiveRecord::Base
      self.abstract_class = true
    end

Then modify all models as:

    class Post < ApplicationRecord
                :
    end

Comment the following line in config/application.rb

#config.active_record.raise_in_transactional_callbacks = true

Change ‘for’ to ‘permit’ in ‘controllers/application_controller.rb’ (if gem devise is used)

class ApplicationController < ActionController::Base
  :
  :
  protected

  def configure_permitted_parameters
    devise_parameter_sanitizer.permit(:sign_up) { |u| u.permit(:name, :email, :password, :password_confirmation)}
    devise_parameter_sanitizer.permit(:account_update) { |u| u.permit(:name, :email, :password, :password_confirmation, :current_password) }
  end
end

Update Database Migration File

Since Rails 5, it is necessary to append version information to ActiveRecord::Migration in database migration files. For example, [5.0] is added in the following example:

class CreateOrders < ActiveRecord::Migration[5.0]
  def change
    create_table :orders do |t|
      t.string :po_number
      t.date :shipment_require_date
      t.date :order_date
      t.string :ship_to
      t.text :reference

      t.timestamps
    end
  end
end

Alternatively, we can add a small snippet that checks which version of Active Record is running and appends the version information to the class declaration, like this:

migration_superclass = if ActiveRecord::VERSION::MAJOR >= 5
  ActiveRecord::Migration["#{ActiveRecord::VERSION::MAJOR}.#{ActiveRecord::VERSION::MINOR}"]
else
  ActiveRecord::Migration
end

And also change ActiveRecord::Migration to migration_superclass.

class CreateOrders < migration_superclass

Update gem spring

The system shows a warning message while executing bundle install:

Array values in the parameter to `Gem.paths=` are deprecated.
Please use a String or nil.

The issue has been posted here.

The solution is to update gem spring:

bundle update spring && bundle exec spring binstub --remove --all && bundle exec spring binstub --all

Git Reset to a Certain Commit


“git revert”

What if I need to undo the change made in the last commit, but it has already been committed and pushed to the remote repository? How can I do that?

If you just want to undo the last commit, the command git revert can back the change out.

git revert HEAD

But in this way, the revert history is recorded in the log as follows:

$ git log
commit 7bcf5e3b6fc47e875ec226ce2b13a53df73cf626
Author: yourname <yourname@yourmail.com>
Date:   Wed Jul 8 15:46:28 2017 +0900

    Revert "a certain change"

    This reverts commit 0d4a808c26908cd5fe4b6294a00150342d1a58be.

commit 0d4a808c26908cd5fe4b6294a00150342d1a58be
Author: yourname <yourname@yourmail.com>
Date:   Mon Jul 6 23:19:26 2017 +0900

    a certain change

Back to a certain commit

If I need to move back to a certain commit (for example, 4 commits back), I can do as follows:

First, run git reset with the SHA value of that commit. It moves the branch back to that commit. For example:

git reset a8b5a0afea1e1f5faccda4a698c0002bdcc7bf892

The branch is now back at that point, but the working tree content is not. The command git status lets you know which files changed during this period of time.

And then:

git checkout -f

The working tree content is now restored to the state of that commit.

Finally,

git reset origin/master

This points the local branch back to the latest remote position while keeping the older content in the working tree; commit the resulting changes and push. The remote repository's content goes back to that earlier state, while the change history (commit log) remains.

Tuesday, May 9, 2017

How to Install OpenCV 3 for Python3 through Conda in OSX?


The system is using Python 3.

$ python --version
Python 3.6.1 :: Anaconda custom (x86_64)

After typing the following command:

$ anaconda show menpo/opencv3

It shows:

Using Anaconda API: https://api.anaconda.org
Name:    opencv3
Summary:
Access:  public
Package Types:  conda
Versions:
   + 3.1.0
   + 3.2.0

To install this package with conda run:
     conda install --channel https://conda.anaconda.org/menpo opencv3

So, just type the last line of the command output above:

$ conda install --channel https://conda.anaconda.org/menpo opencv3

or

$ conda install -c menpo opencv3=3.2.0

(from https://anaconda.org/menpo/opencv3)
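To verify that the package can be imported, a minimal check:
# Quick check that OpenCV is importable; print its version.
import cv2
print(cv2.__version__)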

Done!

Tuesday, April 11, 2017

Statistical Analysis and Insight Report on Residential Burglary in Taipei City (台北市住宅竊盜件數統計分析洞察報告)

January 2015 - February 2017 (ROC 104/01 - 106/02)

Data source: Criminal Investigation Division, Taipei City Police Department (臺北市政府警察局刑事警察大隊)

Data last updated: 2017-03-13 19:56:55

Data link

Description of the dataset

This dataset is the raw data on residential burglary cases in Taipei City compiled by the Criminal Investigation Division of the Taipei City Police Department. Most of the valid data falls between January 2015 and February 2017. It contains three main fields: date of occurrence (發生日期), time interval of occurrence (發生時段) and location of occurrence (發生地點).
This report aims to find out which administrative district suffers the most residential burglaries, and in which time intervals the most and the fewest burglaries occur. Data preprocessing comes first, below.
In [1]:
# First, read in the previously cleaned data (the df_time1.csv file)
#
import pandas as pd
df = pd.read_csv("台北市10401-10602住宅竊盜點位資訊.csv", encoding='big5')
print('筆數', df['編號'].count())
df.head()
筆數 1293
Out[1]:
編號 案類 發生日期 發生時段 發生地點
0 1 住宅竊盜 970609 01~03 台北市信義區松隆里松山路615巷1 ~ 30號
1 2 住宅竊盜 991013 22~24 台北市內湖區成功路4段331 ~ 360號
2 3 住宅竊盜 991024 13~15 台北市南港區東新里興南街52巷1 ~ 30號
3 4 住宅竊盜 1010601 07~09 台北市大安區誠安里忠孝東路3段251巷10弄1 ~ 30號
4 5 住宅竊盜 1010606 10~12 台北市中山區通北街65巷2弄1 ~ 30號
In [2]:
# Data preprocessing
#
del df['案類']
df.head()
Out[2]:
編號 發生日期 發生時段 發生地點
0 1 970609 01~03 台北市信義區松隆里松山路615巷1 ~ 30號
1 2 991013 22~24 台北市內湖區成功路4段331 ~ 360號
2 3 991024 13~15 台北市南港區東新里興南街52巷1 ~ 30號
3 4 1010601 07~09 台北市大安區誠安里忠孝東路3段251巷10弄1 ~ 30號
4 5 1010606 10~12 台北市中山區通北街65巷2弄1 ~ 30號
In [3]:
# We can derive a '發生地點行政區' (district of occurrence) column from '發生地點' (location),
# so that statistics can be computed per administrative district.
# Characters 4-6 of the string in df['發生地點'] become the content of df['發生地點行政區'].
#
for i in range(len(df)):
    df.loc[i,'發生地點行政區'] = df['發生地點'][i][3:6] 
del df['發生地點']           # drop the '發生地點' column
df.head()
Out[3]:
編號 發生日期 發生時段 發生地點行政區
0 1 970609 01~03 信義區
1 2 991013 22~24 內湖區
2 3 991024 13~15 南港區
3 4 1010601 07~09 大安區
4 5 1010606 10~12 中山區

Analysis Method

The following statistical analyses are carried out:

  • Total number and statistics of residential burglary cases by administrative district
  • Number of residential burglary cases by time interval

A. Total number and statistics of residential burglary cases by administrative district

In [4]:
# A. Total number and statistics of residential burglary cases by district
#
df3 = df.groupby('發生地點行政區').count()
del df3['發生日期'],df3['發生時段']
df3 = df3.rename(columns = {'編號':'件數'})
df3 = df3.sort_values('件數', ascending=False)
df3['比率'] = round(df3['件數']/df3['件數'].sum()*100, 2)
df3
Out[4]:
件數 比率
發生地點行政區
中山區 177 13.69
士林區 140 10.83
內湖區 130 10.05
北投區 129 9.98
萬華區 124 9.59
大安區 113 8.74
中正區 96 7.42
松山區 96 7.42
文山區 92 7.12
信義區 82 6.34
大同區 59 4.56
南港區 55 4.25
In [5]:
# Mean, maximum, minimum
#
print('最多件數', df3['件數'].max())
print('最少件數', df3['件數'].min())
print('平均件數', df3['件數'].mean())
最多件數 177
最少件數 55
平均件數 107.75
In [6]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['font.family'] = 'SimHei'

df3.plot(kind='bar', title='台北市住宅竊盜件數長條圖 (97年06月09日 ~ 106年2月27日)', fontsize=15, figsize=(15,5))
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x111077e10>
In [7]:
df3['件數'].plot(kind='pie', title='台北市住宅竊盜件數圓餅圖 (97年06月09日 ~ 106年2月27日)', autopct='%1.1f%%', startangle=270, fontsize=14, figsize=(10,10))
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x1136d1748>

B. Number of residential burglary cases by time interval

In [8]:
# B. Number of residential burglary cases by time interval between
#    2008-06-09 and 2017-02-27 (ROC 97/06/09 - 106/02/27)
#    df_ti (time interval)
#
df_ti = df.groupby('發生時段').count()
del df_ti['發生日期'],df_ti['發生地點行政區']
df_ti = df_ti.rename(columns = {'編號':'件數'})
df_ti['比率'] = round(df_ti['件數']/df_ti['件數'].sum()*100, 2)
df_ti
Out[8]:
件數 比率
發生時段
01~03 123 9.51
04~06 98 7.58
07~09 166 12.84
10~12 222 17.17
13~15 179 13.84
16~18 160 12.37
19~21 176 13.61
22~24 169 13.07
In [9]:
# Bar chart
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['font.family'] = 'SimHei'

df_ti.sort_values(by='件數', ascending=False).plot(kind='bar', title='台北市 97年06月09日 ~ 106年2月27日之間,不同時段發生的住宅竊盜件數', figsize=(10,5), fontsize=19)
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x113b9b940>
In [10]:
# Pivot table ('發生時段' vs '發生地點行政區')
#
df_pt2 = df.pivot_table(values='編號',index='發生時段', columns='發生地點行政區', aggfunc='count',fill_value='0')
print('台北市 97年06月09日 ~ 106年2月27日之間,各行政區相對於發生時段的住宅竊盜件數')
df_pt2
台北市 97年06月09日 ~ 106年2月27日之間,各行政區相對於發生時段的住宅竊盜件數
Out[10]:
發生地點行政區 中山區 中正區 信義區 內湖區 北投區 南港區 士林區 大同區 大安區 文山區 松山區 萬華區
發生時段
01~03 22 11 7 8 14 3 16 7 8 9 6 12
04~06 14 7 10 14 12 3 6 6 5 7 3 11
07~09 24 13 8 18 10 10 14 9 16 11 16 17
10~12 22 17 10 21 24 10 19 8 27 16 28 20
13~15 17 9 15 21 23 5 22 12 17 13 10 15
16~18 16 11 5 16 24 6 23 5 15 11 8 20
19~21 26 14 13 17 9 10 24 7 10 10 16 20
22~24 36 14 14 15 13 8 16 5 15 15 9 9
In [11]:
df_pt2.plot(kind='area', title='台北市 97年06月09日 ~ 106年2月27日之間,各行政區相對於發生時段的住宅竊盜件數 (堆疊面積圖)', figsize=(15,8), fontsize=17)
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x111077da0>

Insights from the Analysis

A. Total number and statistics of residential burglary cases by administrative district

During this period, the administrative district with the most residential burglary cases was 中山區 (Zhongshan District), with 177 cases, or 13.7% of the total. The district with the fewest cases was 南港區 (Nangang District), with only 55 cases, or 4.25% of the total. The average per district was 107.75 cases. The district with the most cases had 3.22 times (177/55) as many as the district with the fewest. This may be related to the relatively large number of adult-entertainment businesses in 中山區, although this inference would need supporting data.
The district with the fewest cases, 南港區 with 55 cases (4.25% of the total), is, like 大同區 (Datong District) with 59 cases (4.56%), a comparatively quiet district with a higher proportion of residential housing. Again, relevant data would be needed to confirm this.

B. Number of residential burglary cases by time interval

During this period, the 10~12 interval saw the most cases, 222 in total, or 17.17% of all cases, while the interval with the fewest was 04~06, with only 98 cases, or 7.58% of the total. This tells us that homes are most likely to be burglarized in the 10~12 interval, after people have left for work and school in the morning, which is exactly the opposite of the common belief that the small hours are when burglars are most active. The two early-morning intervals (01~03 and 04~06) each account for less than 10% of the total.
In addition, the pivot table and the stacked area chart show that the trend is roughly the same across districts.
The conclusion: burglars apparently still consider the morning hours, when everyone has left for work or school and the house is empty, the best time to strike. And burglars are human too; they also need rest in the small hours.