Backup/Restore function of Blogger
The Blogger backup file is an XML file in Atom format. Please refer to the Blogger Developer's Guide for more information. The following is an example of a feed for a blog with only one post. The IDs and URLs shown (blogID, postID, blogName, profileID) are placeholders; a real Blogger feed contains actual IDs and URLs.
<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="http://www.blogger.com/styles/atom.css"
type="text/css"?>
<feed xmlns='http://www.w3.org/2005/Atom'
xmlns:gd='http://schemas.google.com/g/2005'
gd:etag='W/"D08FQn8-eip7ImA9WxZbFEw."'>
<id>tag:blogger.com,1999:blog-blogID</id>
<updated>2008-04-17T00:03:33.152-07:00</updated>
<title>Lizzy's Diary</title>
<subtitle type='html'></subtitle>
<link rel='http://schemas.google.com/g/2005#feed'
type='application/atom+xml'
href='http://blogName.blogspot.com/feeds/posts/default' />
<link rel='self' type='application/atom+xml'
href='http://www.blogger.com/feeds/blogID/posts/default' />
<link rel='alternate' type='text/html'
href='http://blogName.blogspot.com/' />
<author>
<name>Elizabeth Bennet</name>
<uri>http://www.blogger.com/profile/profileID</uri>
<email>noreply@blogger.com</email>
</author>
<generator version='7.00'
uri='http://www2.blogger.com'>Blogger</generator>
<entry gd:etag='W/"D0YHRn84eip7ImA9WxZUFk8."'>
<id>tag:blogger.com,1999:blog-blogID.post-postID</id>
<published>2008-04-07T20:25:00.005-07:00</published>
<updated>2008-04-07T20:25:37.132-07:00</updated>
<title>Quite disagreeable</title>
<content type='html'>&lt;p&gt;I met Mr. Bingley's friend Mr. Darcy
this evening. I found him quite disagreeable.&lt;/p&gt;</content>
<link rel='edit' type='application/atom+xml'
href='http://www.blogger.com/feeds/blogID/posts/default/postID' />
<link rel='self' type='application/atom+xml'
href='http://www.blogger.com/feeds/blogID/posts/default/postID' />
<link rel='alternate' type='text/html'
href='http://blogName.blogspot.com/2008/04/quite-disagreeable.html' />
<author>
<name>Elizabeth Bennet</name>
<uri>http://www.blogger.com/profile/profileID</uri>
<email>noreply@blogger.com</email>
</author>
</entry>
</feed>
Reference: Blogger APIs Client Library for Python
Using ElementTree to parse and insert posts to backup xml
ElementTree is a Python API for parsing and creating XML data. The examples here use lxml's implementation of it, lxml.etree. To use it, include one of the following lines in the program:

from lxml import etree
or
from lxml import etree as ET

The code below uses the ET alias.
Loading an xml file as a template
Takes the name of an xml file as input. Outputs the ElementTree and its root element.

def load_xml_template(self, name):
    # Parse the file as UTF-8 and return both the tree and its root element
    parser = ET.XMLParser(encoding='utf-8')
    tree = ET.parse(name, parser)
    root = tree.getroot()
    return tree, root
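To see what the loader returns, here is a minimal sketch that parses a small feed from memory instead of from a file. It assumes the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra dependencies (the parse and getroot calls used here are the same); the trimmed-down template and the name load_xml_from_bytes are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down template; a real Blogger backup has more elements.
TEMPLATE = b"""<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
  <title>Lizzy's Diary</title>
  <entry>
    <id>tag:blogger.com,1999:blog-blogID.post-postID</id>
    <title>Quite disagreeable</title>
  </entry>
</feed>"""


def load_xml_from_bytes(data):
    """Parse XML bytes and return (tree, root), mirroring load_xml_template()."""
    parser = ET.XMLParser(encoding='utf-8')
    tree = ET.parse(io.BytesIO(data), parser)
    return tree, tree.getroot()


tree, root = load_xml_from_bytes(TEMPLATE)
# Tags carry the namespace in the form {namespace}localname
print(root.tag)
# Number of direct children of <feed>
print(len(root))
```

The root tag comes back with the Atom namespace attached, which is why plain tag names do not match when searching the tree.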
Output to xml file using ‘Find’ function in ElementTree
import copy

def output_to_xml(self, post_list):
    # Each post is a sequence: [1] id, [2] title, [3] date, [4] content
    # Load the template and fill its single entry with the first post
    tree, root = self.load_xml_template('template.xml')
    entry = root.find(self.prepend_ns('entry'))
    entry.find(self.prepend_ns('id')).text = post_list[0][1]
    entry.find(self.prepend_ns('published')).text = post_list[0][3]
    entry.find(self.prepend_ns('updated')).text = post_list[0][3]
    entry.find(self.prepend_ns('title')).text = post_list[0][2]
    entry.find(self.prepend_ns('content')).text = post_list[0][4]
    # The first post is already written; copy the entry for each remaining post
    for post in post_list[1:]:
        entry2 = copy.deepcopy(entry)
        entry2.find(self.prepend_ns('id')).text = post[1]
        entry2.find(self.prepend_ns('published')).text = post[3]
        entry2.find(self.prepend_ns('updated')).text = post[3]
        entry2.find(self.prepend_ns('title')).text = post[2]
        entry2.find(self.prepend_ns('content')).text = post[4]
        root.append(entry2)
    # xml_filename is a module-level variable holding the output path
    global xml_filename
    tree.write(xml_filename, encoding='utf-8', xml_declaration=True)
    self.log('Saved file %s' % xml_filename)
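The same copy-and-fill idea can be tried without a class or a template file. The sketch below is only an illustration under a few assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml, represents each post as a dict rather than the positional list used above, and the names fill_entry, build_feed and FIELDS are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

ATOM_URI = 'http://www.w3.org/2005/Atom'
ATOM = '{%s}' % ATOM_URI
# Serialize Atom as the default namespace instead of an ns0: prefix
ET.register_namespace('', ATOM_URI)

# Hypothetical minimal template with a single empty entry
TEMPLATE = """<feed xmlns='http://www.w3.org/2005/Atom'>
  <entry>
    <id/><published/><updated/><title/><content type='html'/>
  </entry>
</feed>"""

FIELDS = ('id', 'published', 'updated', 'title', 'content')


def fill_entry(entry, post):
    """Copy each field of the post dict into the matching child element."""
    for field in FIELDS:
        entry.find(ATOM + field).text = post[field]


def build_feed(template_xml, posts):
    """Return the feed as a string with one <entry> per post."""
    root = ET.fromstring(template_xml)
    template_entry = root.find(ATOM + 'entry')
    # Clone the empty entry once per extra post, then fill everything in
    clones = [copy.deepcopy(template_entry) for _ in posts[1:]]
    fill_entry(template_entry, posts[0])
    for clone, post in zip(clones, posts[1:]):
        fill_entry(clone, post)
        root.append(clone)
    return ET.tostring(root, encoding='unicode')


posts = [
    {'id': 'post-1', 'published': '2008-04-07T20:25:00.000-07:00',
     'updated': '2008-04-07T20:25:00.000-07:00',
     'title': 'First', 'content': '<p>Hello</p>'},
    {'id': 'post-2', 'published': '2008-04-08T09:00:00.000-07:00',
     'updated': '2008-04-08T09:00:00.000-07:00',
     'title': 'Second', 'content': '<p>Again</p>'},
]
feed_xml = build_feed(TEMPLATE, posts)
print(feed_xml)
```

Assigning HTML to the text of the content element lets ElementTree escape the markup on output, which is what a content element with type='html' expects.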
Tags with Namespace declared in ElementTree
The tags being searched for are declared within the Atom namespace, http://www.w3.org/2005/Atom, so that namespace has to be specified when searching for them. To simplify the process, a helper function prepend_ns() is created.

def prepend_ns(self, s):
    # ElementTree expects namespaced tags in the form '{namespace}tag'
    return '{http://www.w3.org/2005/Atom}' + s
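As an alternative to building the qualified name by hand, find() also accepts a prefix-to-namespace mapping. The following sketch compares the two approaches; it assumes the standard library's xml.etree.ElementTree (lxml's find() takes the same namespaces argument), and the 'atom' prefix and the tiny feed are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET

# The prefix name 'atom' is arbitrary; only the URI has to match the document
NS = {'atom': 'http://www.w3.org/2005/Atom'}

FEED = """<feed xmlns='http://www.w3.org/2005/Atom'>
  <entry><title>Quite disagreeable</title></entry>
</feed>"""

root = ET.fromstring(FEED)


def prepend_ns(s):
    """Standalone version of the helper above."""
    return '{http://www.w3.org/2005/Atom}' + s


# Approach 1: fully qualified tag names
title_a = root.find(prepend_ns('entry')).find(prepend_ns('title')).text
# Approach 2: prefixed path plus a namespace mapping
title_b = root.find('atom:entry/atom:title', NS).text
# Without any namespace the tag does not match, so find() returns None
missing = root.find('entry')

print(title_a, title_b, missing)
```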
Using ISO datetime
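Blogger uses the ISO 8601 datetime format. Here is the transformation function. It expects input such as '07, Apr 2017 20:25' and writes the date as year-month-day with a fixed UTC offset of -08:00.

from datetime import datetime

def iso_datetime(datetime_string):
    real_date = datetime.strptime(datetime_string, '%d, %b %Y %H:%M')
    # ISO 8601, e.g. '2017-04-07T20:25:00.000-08:00'
    iso_date = real_date.strftime('%Y-%m-%dT%H:%M:%S.000-08:00')
    return real_date, iso_date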
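The UTC offset in the function above is fixed in the format string. If posts come from a different time zone, one option is to attach the offset to the datetime and let isoformat() produce the string. This is a sketch under that assumption, not the original method; the name iso_datetime_tz and the utc_offset_hours parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def iso_datetime_tz(datetime_string, utc_offset_hours=-8):
    """Parse 'DD, Mon YYYY HH:MM' and return (aware datetime, ISO 8601 string)."""
    naive = datetime.strptime(datetime_string, '%d, %b %Y %H:%M')
    # Attach a fixed UTC offset; the wall-clock time itself is not shifted
    aware = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    # timespec='milliseconds' yields the .000 fraction seen in Blogger feeds
    return aware, aware.isoformat(timespec='milliseconds')


_, stamp = iso_datetime_tz('07, Apr 2017 20:25')
print(stamp)  # 2017-04-07T20:25:00.000-08:00
```

Note that %b matches month abbreviations in the current locale, so 'Apr' parses as expected only under an English or C locale.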