Saturday, October 01, 2016

Create JSON from HTML in Perl

Assumption: Your HTML file is not pretty-printed, i.e. all of its content is on a single line.
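
If your file is pretty-printed after all, one possible workaround is to read the whole file and remove the line breaks before running the code below. This is only a sketch: flatten_html is a hypothetical helper that is not part of the script, and it assumes no cell text is wrapped across lines (wrapped words would be joined together).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): collapse a
# pretty-printed HTML string into one line by removing each line break
# together with the indentation that follows it.
sub flatten_html {
    my ($html) = @_;
    $html =~ s/\r?\n\s*//g;
    return $html;
}

# Possible usage: slurp the whole file instead of reading only its first line.
# my $html = do { local $/; open my $fh, '<', 'files.html' or die $!; <$fh> };
# $html = flatten_html($html);
```

The flattened string can then take the place of the single line that the script reads from files.html.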

Code:

#!/usr/bin/perl

use strict;
use warnings;
use utf8;
use JSON;
use autodie;

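# Open with the three-argument form; decode as UTF-8 (assumed encoding) so
# that encode_json below does not double-encode non-ASCII characters.
open my $fh1, '<:encoding(UTF-8)', 'files.html' or die "Cannot open files.html: $!\n";

# Read the first line, which holds the whole document (see the assumption above).
my $html = <$fh1>;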

$html =~ s/<html><head><title>JSON Viewer<\/title><\/head><body><table border="1"><thead><tr><th width="300">Keys<\/th><th width="500">Values<\/th><\/tr><\/thead><tbody>//g;
$html =~ s/<\/tbody><\/table><\/body><\/html>//g;

# Split what remains into table rows.
my @rows = split m{</tr>}, $html;

my %data;

foreach my $row (@rows)
{
    # Each row holds two cells: the key, then the <br/>-separated values.
    # Taking the cells by position (rather than testing for <br/>) also
    # handles a value cell that contains a single value without <br/>.
    my ($key, $values) = split m{</td>}, $row;
    next unless defined $values;    # skip blank or malformed rows

    $key    =~ s/<tr><td>//;
    $values =~ s/<td>//;

    push @{ $data{$key} }, split m{<br/>}, $values;
}

close($fh1) or die "Cannot close files.html: $!\n";

my $json = encode_json \%data;
open my $fh2, '>', 'files.json' or die "open failed <output: files.json>: $!\n";
print $fh2 $json or die "print failed <output: files.json>: $!\n";
close $fh2 or die "close failed <output: files.json>: $!\n";
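
To illustrate the mapping from table rows to JSON, here is a small self-contained sketch that applies the same splitting approach to an in-memory string. The rows_to_hash helper and the sample rows are made up for this example, and the input is assumed to already have the page header and footer stripped.

```perl
use strict;
use warnings;
use JSON;

# Hypothetical helper: turn the remaining <tr>...</tr> rows into a hash of
# key => [values]. The names here are illustrative only.
sub rows_to_hash {
    my ($html) = @_;
    my %data;
    foreach my $row (split m{</tr>}, $html) {
        my ($key, $values) = split m{</td>}, $row;
        next unless defined $values;    # skip blank or malformed rows
        $key    =~ s/<tr><td>//;
        $values =~ s/<td>//;
        push @{ $data{$key} }, split m{<br/>}, $values;
    }
    return \%data;
}

# Invented sample input: two rows in the layout the script expects.
my $sample = '<tr><td>fruits</td><td>apple<br/>banana<br/></td></tr>'
           . '<tr><td>colors</td><td>red<br/></td></tr>';

# canonical() sorts the keys so the output is reproducible.
print JSON->new->utf8->canonical->encode(rows_to_hash($sample)), "\n";
# prints {"colors":["red"],"fruits":["apple","banana"]}
```

Every key maps to an array, even when the cell holds a single value. Plain encode_json does not guarantee key order, so canonical is used here only to make the printed result predictable.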