Friday, December 30, 2016

Convert String to Base64 in Python


Code to convert a string (encoded as UTF-8) to Base64, and to decode the Base64 back to a UTF-8 string:

import base64

si = input('\nenter text:\n')
print('\nyou entered\n' + si)

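# Encode the text to UTF-8 bytes, then to Base64 (the result is also bytes)
b64 = base64.b64encode(si.encode('utf-8'))
print('\nbase64 is\n' + b64.decode('ascii'))

# The same Base64 output shown as a list of byte values
b64_bs = bytearray(b64)
print('\nbase64 byte stream is\n' + str(list(b64_bs)))

# Decode Base64 back to bytes, then decode the bytes back to text
so = base64.b64decode(b64).decode('utf-8', 'ignore')
print('\nstring is\n' + so)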

Convert String to Binary in Python


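Code to convert a string to binary and binary back to a string:

def string2bits(s=''):
    # One binary string per character, left-padded to at least 8 digits
    return [bin(ord(x))[2:].zfill(8) for x in s]

def bits2string(b=()):
    # Convert each binary string back to its character
    return ''.join(chr(int(x, 2)) for x in b)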

si = input('\nenter text:\n')
b = string2bits(si)
so = bits2string(b)

print('\nyou entered\n' + si)

print('\nbinary is\n' + str(b))

print('\nstring is\n' + so)
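
The functions above work per character, so a character with a code point above 255 produces an entry longer than 8 bits. A minimal sketch of a byte-based variant follows, assuming you want exactly 8 bits per UTF-8 byte; the names text_to_bits and bits_to_text are made up for this example.

```python
def text_to_bits(text, encoding='utf-8'):
    """Return one 8-bit binary string per encoded byte of text."""
    return [format(byte, '08b') for byte in text.encode(encoding)]

def bits_to_text(bits, encoding='utf-8'):
    """Rebuild the text from a list of 8-bit binary strings."""
    return bytes(int(chunk, 2) for chunk in bits).decode(encoding)

sample = 'hi \u20ac'              # the euro sign takes three bytes in UTF-8
bits = text_to_bits(sample)
print(bits)                       # six entries, each exactly 8 characters long
print(bits_to_text(bits) == sample)
```

Because the conversion goes through bytes, the round trip should hold for any text the chosen encoding can represent.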

TypeError: Can't convert 'bytes' object to str implicitly


Error:

Traceback (most recent call last):
  File "b64-str.py", line 10, in <module>
    print('\nstring is\n' + so)
TypeError: Can't convert 'bytes' object to str implicitly


Problem Code:

so = base64.b64decode(b64)
print('\nstring is\n' + so)


Solution (base64.b64decode() returns bytes, so decode them to str before concatenating):

so = base64.b64decode(b64).decode('utf-8', 'ignore')
print('\nstring is\n' + so)

AttributeError: module 'json' has no attribute 'dump'

Problem:

Today I was trying to create a JSON file from Python when I got this error: "AttributeError: module 'json' has no attribute 'dump'". I found the solution here: http://stackoverflow.com/a/13612382/4064166

Solution:

The problem was that I had named my own script json.py, so "import json" loaded my file instead of the standard json module. Renaming the script solved the issue.

Code:

import json
with open("dump.json", "w") as outfile:
    json.dump({'number':6650, 'strings':'lorem ipsum', 'x':'x', 'y':'y'}, outfile, sort_keys = True, indent=4, ensure_ascii=False)
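
If you are unsure whether a local file is shadowing the module, a quick check along these lines may help. It is only a diagnostic sketch: it prints where json was loaded from, and a path inside your own project folder would point to a stray json.py.

```python
import json
import os

# The standard module lives inside the Python installation. If this path
# points into your own project folder, a local json.py is shadowing it.
module_path = os.path.abspath(json.__file__)
print('json was imported from: ' + module_path)
print('has dump(): ' + str(hasattr(json, 'dump')))
```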


Make file read-only in Python

If you want to make a file read-only:

import os
from stat import S_IREAD

os.chmod('path/to/file.txt', S_IREAD)


If you want to make a file writable:

import os
from stat import S_IWRITE

os.chmod('path/to/file.txt', S_IWRITE)
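
One caveat: on Unix-like systems os.chmod() replaces the whole permission mode, so passing only S_IREAD or only S_IWRITE drops every other permission bit (on Windows only the read-only flag is affected). A minimal sketch that changes just the write bits is below; the helper names set_read_only and set_writable are made up for this example, and it runs against a temporary file.

```python
import os
import stat
import tempfile

def set_read_only(path):
    """Clear every write bit but keep the other permission bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

def set_writable(path):
    """Add the owner's write bit back without touching the other bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR)

# Try it on a throwaway file rather than a real one.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
set_read_only(demo_path)
print(oct(stat.S_IMODE(os.stat(demo_path).st_mode)))
set_writable(demo_path)
print(oct(stat.S_IMODE(os.stat(demo_path).st_mode)))
os.remove(demo_path)
```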


Thursday, December 29, 2016

Create PDF from Python

I was googling for a Python library that I could use to write strings to a PDF file. Most of what I found were programs that copy data from one PDF into another, or solutions for editing an existing PDF, which was not what I was looking for.

I wanted to create a PDF from scratch.

After hours of searching, I finally found a solution on this page by David Fischer, and it uses just one module, reportlab.

The code is shared in this gist on my GitHub.

TypeError: write() argument must be str, not bytes

Recently I encountered this error, "TypeError: write() argument must be str, not bytes", when trying to write a PDF file from Anaconda 3. The solution was to open the output file in binary mode.


Error:

J:\>python pdf.py
Traceback (most recent call last):
  File "pdf.py", line 31, in <module>
    fd.write(buf.getvalue())
TypeError: write() argument must be str, not bytes

Problem code:

# Write the PDF to a file
with open('test.pdf', 'w') as fd:
    fd.write(buf.getvalue())

Solution:

# Write the PDF to a file
with open('test.pdf', 'wb') as fd:
    fd.write(buf.getvalue())
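
The difference between the two modes can be reproduced without any PDF library. This sketch assumes buf behaves like an io.BytesIO, whose getvalue() returns bytes; the placeholder content and the temporary file path are made up for the demonstration.

```python
import io
import os
import tempfile

# Stand-in for the PDF buffer: getvalue() on a BytesIO returns bytes.
buf = io.BytesIO(b'%PDF-1.4 placeholder bytes, not a real PDF')
target = os.path.join(tempfile.mkdtemp(), 'test.bin')

try:
    with open(target, 'w') as fd:       # text mode expects str
        fd.write(buf.getvalue())
except TypeError as err:
    print('text mode failed: ' + str(err))

with open(target, 'wb') as fd:          # binary mode accepts bytes
    written = fd.write(buf.getvalue())
print('binary mode wrote ' + str(written) + ' bytes')
```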

Wednesday, December 28, 2016

Extract all internal and external links from a URL

How does it work?
  • This is a Python script that takes complete URLs as command line arguments.
  • It fetches each URL and parses the HTML using BeautifulSoup.
  • All anchor hyper-references are extracted from the parsed page, and simple processing then sorts them into two bins: internal and external.
  • If a link contains http or https and the queried URL is part of the link, it is sorted as internal; otherwise it is external.
  • All links starting with / or // are treated as internal links.
  • Other link types such as javascript, mail and telephone links are ignored.
  • All internal page jumps starting with # are ignored.
  • The script does not produce a unique list of internal/external links, so if the same link appears more than once it is counted each time.
  • It treats top-level domains and subdomains as different sites, so they are counted as external. For example, if you query www.google.com, links to news.google.com and www.google.co.in are both treated as different domains and counted as external links.
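
The sorting rules above can be sketched roughly as follows. This is an illustration, not the original script: classify_links is a made-up name, it works on a plain list of href strings (which could be collected with BeautifulSoup's soup.find_all('a', href=True)), and it reads the "otherwise" rule literally, so relative links such as page.html land in the external bin.

```python
def classify_links(page_url, hrefs):
    """Sort href values into (internal, external) lists; duplicates are kept."""
    internal, external = [], []
    for href in hrefs:
        href = href.strip()
        # Skip empty values and in-page jumps.
        if not href or href.startswith('#'):
            continue
        # Skip javascript, mail and telephone links.
        if href.lower().startswith(('javascript:', 'mailto:', 'tel:')):
            continue
        if href.startswith('/'):
            # Covers both "/" and "//" prefixes, as the rules state.
            internal.append(href)
        elif href.startswith(('http://', 'https://')) and page_url in href:
            internal.append(href)
        else:
            external.append(href)
    return internal, external

internal, external = classify_links(
    'https://www.example.com',
    ['/about', '#top', 'mailto:me@example.com',
     'https://www.example.com/blog', 'https://news.example.com/', '/about'],
)
print(len(internal), len(external))
```

In this sample the subdomain link ends up in the external bin and /about is counted twice, matching the last two points in the list.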


Solve charmap codec error in Python

Error:

'charmap' codec can't encode character '\u2013' in position 112: character maps to <undefined>

Problem code:

        for d in soup.find_all(href=re.compile(url)):
            print(d)

The cause is not the Python code itself but the Windows console. In my case one character (U+2013, an en dash) could not be encoded in the command prompt's active code page, so I had to switch the console encoding to UTF-8. The solution is simple: just run the following command.

Solution:

J:\>chcp 65001
Active code page: 65001

CHCP changes the active console code page. Code page 65000 is UTF-7 and code page 65001 is UTF-8.
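
If changing the console code page is not an option, the same failure can also be handled from the Python side. The sketch below uses cp437 as an example of a legacy console code page (an assumption; your console may use a different one) and shows two possible workarounds.

```python
import sys

text = 'pages 10\u201320'    # contains an en dash (U+2013)

# A legacy code page such as cp437 has no slot for the en dash, which is
# what triggers the 'charmap' error when print() encodes the text.
try:
    text.encode('cp437')
except UnicodeEncodeError as err:
    print('cp437 failed: ' + err.reason)

# Option 1: replace unencodable characters with '?' before printing.
safe = text.encode('cp437', errors='replace').decode('cp437')
print(safe)

# Option 2 (Python 3.7+): ask stdout itself to emit UTF-8.
if hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(encoding='utf-8')
    print(text)
```

Option 1 loses the original character, while option 2 keeps it but still relies on the console font being able to display it.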

Friday, December 09, 2016

YouTube Rewind (2010 - 2016)

#YouTubeRewind

YouTube Rewind is a video series produced and created by YouTube and Portal A Interactive. These videos are an overview and recap of each year's viral videos, events, memes and music. Each year, the number of YouTube celebrities featured in the video, as well as the presentation of the series have increased.

[Source: https://en.wikipedia.org/wiki/YouTube_Rewind]



YouTube Rewind 2010: Year in Review