Friday, October 6, 2017

Using XML file to restore posts to Blogger - by Python ElementTree API

Backup/Restore function of Blogger

The Blogger backup file is an XML file in Atom feed format.
Please refer to the Blogger Developer’s Guide for more information. The following is an example of a feed for a blog with only one post. Note that a real Blogger feed contains actual IDs and URLs in place of placeholders such as blogID and postID.
<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="http://www.blogger.com/styles/atom.css"
  type="text/css"?>
<feed xmlns='http://www.w3.org/2005/Atom'
    xmlns:gd='http://schemas.google.com/g/2005'
    gd:etag='W/"D08FQn8-eip7ImA9WxZbFEw."'>
  <id>tag:blogger.com,1999:blog-blogID</id>
  <updated>2008-04-17T00:03:33.152-07:00</updated>
  <title>Lizzy's Diary</title>
  <subtitle type='html'></subtitle>
  <link rel='http://schemas.google.com/g/2005#feed'
    type='application/atom+xml'
    href='http://blogName.blogspot.com/feeds/posts/default' />
  <link rel='self' type='application/atom+xml'
    href='http://www.blogger.com/feeds/blogID/posts/default' />
  <link rel='alternate' type='text/html'
    href='http://blogName.blogspot.com/' />
  <author>
    <name>Elizabeth Bennet</name>
    <uri>http://www.blogger.com/profile/profileID</uri>
    <email>noreply@blogger.com</email>
  </author>
  <generator version='7.00'
    uri='http://www2.blogger.com'>Blogger</generator>
  <entry gd:etag='W/"D0YHRn84eip7ImA9WxZUFk8."'>
    <id>tag:blogger.com,1999:blog-blogID.post-postID</id>
    <published>2008-04-07T20:25:00.005-07:00</published>
    <updated>2008-04-07T20:25:37.132-07:00</updated>
    <title>Quite disagreeable</title>
    <content type='html'>&lt;p&gt;I met Mr. Bingley's friend Mr. Darcy
      this evening. I found him quite disagreeable.&lt;/p&gt;</content>
    <link rel='edit' type='application/atom+xml'
      href='http://www.blogger.com/feeds/blogID/posts/default/postID' />
    <link rel='self' type='application/atom+xml'
      href='http://www.blogger.com/feeds/blogID/posts/default/postID' />
    <link rel='alternate' type='text/html'
      href='http://blogName.blogspot.com/2008/04/quite-disagreeable.html' />
    <author>
      <name>Elizabeth Bennet</name>
      <uri>http://www.blogger.com/profile/profileID</uri>
      <email>noreply@blogger.com</email>
    </author>
  </entry>
</feed>
Reference: Blogger APIs Client Library for Python

Using ElementTree to parse the backup XML and insert posts

ElementTree is a Python API for parsing and creating XML data. The code in this post uses the lxml library, whose etree module implements that API. Because the functions below refer to the module as ET, import it under that alias:
from lxml import etree as ET
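If the script also has to run on machines where lxml is not installed, one common pattern is to fall back to the standard library module, which offers the same core calls used in this post (parse, getroot, find, write). This is only a sketch of that idea; the fallback is an assumption and not part of the original script.

```python
# Prefer lxml when it is available; otherwise fall back to the standard library.
# Assumption: the rest of the script only uses calls that both modules share.
try:
    from lxml import etree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Quick sanity check that the chosen module can parse XML.
element = ET.fromstring('<feed><entry/></feed>')
print(element.tag, len(element))
```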

Loading an XML file as a template

Takes the path of an XML file as input and returns the parsed ElementTree together with its root element.
def load_xml_template(self, name):
    parser = ET.XMLParser(encoding='utf-8')
    tree = ET.parse(name, parser)
    root = tree.getroot()
    return tree, root
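A minimal usage sketch of the loader follows. It makes several assumptions: the function is written as a plain function instead of a method, the standard library xml.etree.ElementTree stands in for lxml (both accept these calls), and the template is a hypothetical, heavily trimmed feed written to a temporary file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree

ATOM_NS = 'http://www.w3.org/2005/Atom'

# Hypothetical template; a real Blogger backup contains many more elements.
TEMPLATE = (
    "<?xml version='1.0' encoding='utf-8'?>\n"
    "<feed xmlns='http://www.w3.org/2005/Atom'>"
    "<title>Lizzy's Diary</title>"
    "<entry><title>Quite disagreeable</title></entry>"
    "</feed>"
)

def load_xml_template(name):
    """Parse the file at `name` and return (tree, root element)."""
    parser = ET.XMLParser(encoding='utf-8')
    tree = ET.parse(name, parser)
    return tree, tree.getroot()

# Write the template to a temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'template.xml')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(TEMPLATE)
    tree, root = load_xml_template(path)

# The root tag carries the Atom namespace in braces.
print(root.tag)
```

The tree object is what gets written back to disk later, while the root element is the starting point for searching and appending entries.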

Writing the output XML file using the find() function in ElementTree

import copy

def output_to_xml(self, post_list):
  # Each post is a sequence: [1] = id, [2] = title, [3] = date, [4] = content
  tree, root = self.load_xml_template('template.xml')

  # The template holds a single <entry>; overwrite it with the first post
  entry = root.find(self.prepend_ns('entry'))

  entry.find(self.prepend_ns('id')).text        = post_list[0][1]
  entry.find(self.prepend_ns('published')).text = post_list[0][3]
  entry.find(self.prepend_ns('updated')).text   = post_list[0][3]
  entry.find(self.prepend_ns('title')).text     = post_list[0][2]
  entry.find(self.prepend_ns('content')).text   = post_list[0][4]

  # Clone the entry for every remaining post and append the copy to the feed
  for post in post_list[1:]:
    entry2 = copy.deepcopy(entry)
    entry2.find(self.prepend_ns('id')).text        = post[1]
    entry2.find(self.prepend_ns('published')).text = post[3]
    entry2.find(self.prepend_ns('updated')).text   = post[3]
    entry2.find(self.prepend_ns('title')).text     = post[2]
    entry2.find(self.prepend_ns('content')).text   = post[4]
    root.append(entry2)

  # xml_filename is a module-level variable holding the output path
  global xml_filename
  tree.write(xml_filename, encoding='utf-8', xml_declaration=True)

  self.log('Saved file %s' % xml_filename)
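The copy-and-fill pattern can also be tried in isolation. The sketch below is a variation, not the original method: it assumes the standard library xml.etree.ElementTree in place of lxml, a hypothetical trimmed template, posts stored as dicts with made-up keys, and a hypothetical helper fill_entry(). It detaches the template entry first so that every post, including the first, is produced the same way.

```python
import copy
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree

ATOM = '{http://www.w3.org/2005/Atom}'

# Hypothetical one-entry template, trimmed to the fields that get overwritten.
TEMPLATE = (
    "<feed xmlns='http://www.w3.org/2005/Atom'>"
    "<entry><id/><published/><updated/><title/><content type='html'/></entry>"
    "</feed>"
)

# Hypothetical posts; the dict keys are illustrative names, not Blogger fields.
posts = [
    {'id': 'tag:blogger.com,1999:blog-blogID.post-1',
     'date': '2017-10-06T10:00:00.000-08:00',
     'title': 'First', 'content': '<p>One</p>'},
    {'id': 'tag:blogger.com,1999:blog-blogID.post-2',
     'date': '2017-10-07T10:00:00.000-08:00',
     'title': 'Second', 'content': '<p>Two</p>'},
]

def fill_entry(entry, post):
    """Overwrite the text of the five fields this post changes."""
    for tag, key in (('id', 'id'), ('published', 'date'), ('updated', 'date'),
                     ('title', 'title'), ('content', 'content')):
        entry.find(ATOM + tag).text = post[key]

root = ET.fromstring(TEMPLATE)
template_entry = root.find(ATOM + 'entry')
root.remove(template_entry)  # keep an untouched pattern outside the tree

for post in posts:
    entry = copy.deepcopy(template_entry)  # independent copy per post
    fill_entry(entry, post)
    root.append(entry)

# Serialize with Atom as the default namespace; markup in content is escaped.
ET.register_namespace('', 'http://www.w3.org/2005/Atom')
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```

Detaching the template entry means an empty post list simply yields a feed with no entries instead of raising an IndexError.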

Namespace-qualified tags in ElementTree

Because the tags we search for are declared within a namespace, “http://www.w3.org/2005/Atom”, we have to specify that namespace when searching for them. To simplify the process, a helper function prepend_ns() is created.
def prepend_ns(self, s):
    return '{http://www.w3.org/2005/Atom}' + s
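For comparison, find() also accepts a prefix-to-URI mapping as its second argument, which avoids building the braced tag names by hand. The sketch below assumes the standard library xml.etree.ElementTree (lxml accepts the same call), a hypothetical trimmed feed, and an arbitrary prefix name atom chosen for illustration.

```python
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree

# The prefix 'atom' is an arbitrary local name; only the URI has to match.
NS = {'atom': 'http://www.w3.org/2005/Atom'}

# Hypothetical, heavily trimmed feed.
FEED = (
    "<feed xmlns='http://www.w3.org/2005/Atom'>"
    "<entry><title>Quite disagreeable</title></entry>"
    "</feed>"
)

def prepend_ns(s):
    return '{http://www.w3.org/2005/Atom}' + s

root = ET.fromstring(FEED)

# 1) Braced form, as produced by prepend_ns()
by_braces = root.find(prepend_ns('entry')).find(prepend_ns('title'))

# 2) Prefix form with a namespace mapping
by_prefix = root.find('atom:entry/atom:title', NS)

# 3) An unqualified name does not match a namespaced element
unqualified = root.find('entry')

print(by_braces.text, by_prefix.text, unqualified)
```

Both qualified lookups reach the same element, while the unqualified search returns None, which is the failure that prepend_ns() guards against.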

Using ISO datetime

Blogger uses the ISO 8601 datetime format. The following function converts a date string such as '14, Jul 2017 20:25' into that format.
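from datetime import datetime

def iso_datetime(datetime_string):
  # Input looks like '14, Jul 2017 20:25'
  real_date  = datetime.strptime(datetime_string, '%d, %b %Y %H:%M')
  # ISO 8601 puts the month before the day, e.g. '2017-07-14T20:25:00.000-08:00'
  # (the -08:00 UTC offset is hard-coded)
  iso_date = real_date.strftime('%Y-%m-%dT%H:%M:%S.000-08:00')
  return real_date, iso_date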
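The function above writes the -08:00 offset as literal text in the format string. As an alternative sketch, the offset can be attached to the datetime object itself and the string produced by isoformat(). The function name iso_datetime_tz and the constant PACIFIC_STANDARD are hypothetical, and a fixed offset of eight hours behind UTC is assumed, so daylight saving time is not handled.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-08:00 offset, matching the hard-coded value above.
PACIFIC_STANDARD = timezone(timedelta(hours=-8))

def iso_datetime_tz(datetime_string, tz=PACIFIC_STANDARD):
    """Parse '14, Jul 2017 20:25' and return (aware datetime, ISO 8601 string)."""
    naive = datetime.strptime(datetime_string, '%d, %b %Y %H:%M')
    aware = naive.replace(tzinfo=tz)  # attach the offset without shifting the time
    return aware, aware.isoformat(timespec='milliseconds')

_, stamp = iso_datetime_tz('14, Jul 2017 20:25')
print(stamp)  # 2017-07-14T20:25:00.000-08:00
```

Passing a different tz value changes the offset in the output without touching the format logic.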