Posts

Create PDF from Python

I was googling for a Python library that I could use to write strings to a PDF file, but mostly I found programs that copy data from one PDF into another. The other programs and solutions I found were for editing an existing PDF, which was not what I was looking for. I wanted to create a PDF from scratch. After hours of searching I finally found my solution when I arrived on this page by David Fischer, and it uses just one module, called reportlab. The code is shared via this gist on my GitHub.

TypeError: write() argument must be str, not bytes

Recently I encountered this error when trying to write a PDF file from Anaconda 3: "TypeError: write() argument must be str, not bytes". The solution was to open the file in binary mode before writing the stream.

Error:

    J:\>python pdf.py
    Traceback (most recent call last):
      File "pdf.py", line 31, in <module>
        fd.write(buf.getvalue())
    TypeError: write() argument must be str, not bytes

Problem code:

    # Write the PDF to a file
    with open('test.pdf', 'w') as fd:
        fd.write(buf.getvalue())

Solution:

    # Write the PDF to a file
    with open('test.pdf', 'wb') as fd:
        fd.write(buf.getvalue())

Ref.: http://stackoverflow.com/a/5513856

This Python error happens when binary data is written to a file opened in text mode, which is common when producing a PDF or other binary output. Opening the file in binary mode, for example with the wb flag, lets you write bytes directly and resolves it. Writing text to a file opened in binary mo...
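To see the difference between the two modes without a PDF library, here is a minimal sketch that uses only the standard library. The bytes in buf are a made-up placeholder standing in for whatever the PDF library left in the buffer, and the temporary path is only for illustration.

```python
import io
import os
import tempfile

# Placeholder bytes standing in for real PDF output (assumption, not a valid PDF).
buf = io.BytesIO(b"%PDF-1.4\n% placeholder bytes only\n")

path = os.path.join(tempfile.mkdtemp(), "test.pdf")

# Text mode ('w') expects str, so passing bytes raises TypeError.
try:
    with open(path, "w") as fd:
        fd.write(buf.getvalue())
    text_mode_error = None
except TypeError as exc:
    text_mode_error = str(exc)

# Binary mode ('wb') accepts the bytes unchanged.
with open(path, "wb") as fd:
    fd.write(buf.getvalue())

# Read the file back to confirm the bytes round-trip exactly.
with open(path, "rb") as fd:
    written = fd.read()

print("text mode:", text_mode_error)
print("binary mode wrote", len(written), "bytes")
```

The same rule applies in the other direction: a file opened with 'wb' will reject a str, so text has to be encoded first.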

Extract all internal and external links from a URL

How does it work? This is a Python script that takes complete URLs as command-line arguments. It then parses the HTML at each URL using BeautifulSoup. All anchor hrefs are extracted from the parsed page and then sorted into two bins: internal and external. If a link contains http or https and the queried URL is part of the link, it is sorted as internal; otherwise it is external. All links starting with / or // are internal. Other schemes such as javascript, mail and telephone links are ignored, as are in-page jumps starting with #. The script does not deduplicate links, so a link that appears several times is counted each time. It treats different top-level domains and subdomains as different sites, so they count as external: if you query www.google.com, links to news.google.com and www.google.co.in will both be treated as different domains and thus will be count...
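The sorting rules described above can be sketched roughly as follows. This is not the script from the post: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra installs, it works on an HTML string instead of fetching a page, and the names HrefCollector and sort_links are made up for illustration. Sending every link that matches no rule to the external bin is my reading of the "otherwise" in the description.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag, duplicates included."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


# Page jumps and non-web schemes are skipped entirely.
IGNORED_PREFIXES = ("#", "javascript:", "mailto:", "tel:")


def sort_links(html, url):
    """Split anchor hrefs into (internal, external) lists, without deduplication."""
    collector = HrefCollector()
    collector.feed(html)
    internal, external = [], []
    for href in collector.hrefs:
        if href.lower().startswith(IGNORED_PREFIXES):
            continue
        if href.startswith("/"):
            # Covers both "/" and "//" links.
            internal.append(href)
        elif href.startswith("http") and url in href:
            internal.append(href)
        else:
            # Assumption: anything else counts as external.
            external.append(href)
    return internal, external


sample = """
<a href="/about">About</a>
<a href="https://www.example.com/blog">Blog</a>
<a href="https://news.example.com/">News</a>
<a href="#top">Top</a>
<a href="mailto:me@example.com">Mail</a>
<a href="/about">About again</a>
"""
internal, external = sort_links(sample, "https://www.example.com")
print("internal:", internal)
print("external:", external)
```

Note how /about lands in the internal list twice and the news subdomain lands in the external list, matching the behaviour described in the post.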

Solve charmap codec error in Python

Error:

    'charmap' codec can't encode character '\u2013' in position 112: character maps to <undefined>

Problem code:

    for d in soup.find_all(href=re.compile(url)):
        print(d)

This is not a bug in Python itself but a consequence of the Windows console. In my case there was one character that could not be encoded properly in the command prompt, so I had to change the console encoding to UTF-8. The solution is pretty simple: just type the following command.

Solution:

    J:\>chcp 65001
    Active code page: 65001

CHCP changes the active console code page. Code page 65000 is UTF-7 and code page 65001 is UTF-8.

ref1.: http://stackoverflow.com/a/32383309
ref2.: http://ss64.com/nt/chcp.html

This Python error, the charmap codec cannot encode a character, comes from Windows defaulting to a legacy code page that cannot represent certain Unicode characters such a...
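The error can be reproduced on any platform with a small sketch. Here an in-memory stream encoded as code page 437 stands in for a legacy Windows console; that choice of code page and the sample text are assumptions for illustration, since the post does not say which code page was active.

```python
import io

text = "2010 \u2013 2016"  # contains an en dash, U+2013

# Simulated legacy console: cp437 has no mapping for the en dash.
legacy_console = io.TextIOWrapper(io.BytesIO(), encoding="cp437")
try:
    print(text, file=legacy_console)
    legacy_console.flush()
    legacy_error = None
except UnicodeEncodeError as exc:
    legacy_error = str(exc)

# The same text on a UTF-8 stream encodes without trouble.
utf8_buffer = io.BytesIO()
utf8_console = io.TextIOWrapper(utf8_buffer, encoding="utf-8")
print(text, file=utf8_console)
utf8_console.flush()
utf8_bytes = utf8_buffer.getvalue()

print("legacy stream:", legacy_error)
print("utf-8 stream :", utf8_bytes)
```

If changing the console code page is not an option, Python 3.7 and later also allow sys.stdout.reconfigure(encoding="utf-8") from inside the script, although whether the characters then display correctly still depends on the console and its font.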

YouTube Rewind (2010 - 2016)

#YouTubeRewind YouTube Rewind is a video series produced and created by YouTube and Portal A Interactive. These videos are an overview and recap of each year's viral videos, events, memes and music. Each year, the number of YouTube celebrities featured in the video, as well as the presentation of the series have increased. [Source: https://en.wikipedia.org/wiki/YouTube_Rewind ] YouTube Rewind 2010: Year in Review