Just my blog

Blog about everything, mostly about tech stuff I made. Here is a list of the stuff I use for my blog. Feel free to ask me about implementations.

My GitHub | My LinkedIn

Software I recommend

MobaXterm - SSH, RDP, FTP...
Thunderbird - email client
FileZilla - FTP client/server
NirSoft - Windows utilities
Sysinternals - Windows utilities
Pi-hole - ad blocking via DNS
NUT - UPS manager
Rpi MON - Raspberry Pi monitoring
FreeCAD - 3D modelling
FreeCommander - Far-like file manager
Bitwarden - password manager

There are no referral links!

Python libraries I recommend

Django - web framework
Celery - asynchronous task queue (multitasking)
django-celery-beat - Celery + Django (periodic tasks)
django-celery-results - Celery + Django (task results)
Pillow - Python imaging library
mod_wsgi - Apache + Python
Requests - the best for web (HTTP) requests
openpyxl - create Excel documents
P4Python - Perforce + Python
Paramiko - SSH + Python
pyVmomi - ESXi/vCenter + Python

I use all of these libraries, so feel free to ask me about them.

There are no referral links!

Python HTMLParser

How to spend two days when you know nothing about Python:

Task: parse the HTML code of a page that stores the VK ID and username of every person who shared a post.

from html.parser import HTMLParser
import re

# Read the saved page as UTF-8 text
with open('test.html', 'r', encoding='utf-8') as content_file:
    read_data = content_file.read()


class MyHTMLParser(HTMLParser):
    vk_read = ''  # last VK id found in a link, empty until the first match

    def handle_starttag(self, tag, attrs):
        # Turn the attribute list into a string and take the trailing
        # non-space run that starts with '/', e.g. "/id168265578')]"
        vk_id = str(attrs)
        vk = re.findall(r'/\S+$', vk_id)
        vk_fnd = str(vk)
        if re.search(r"/\w+'\)\]", vk_fnd):
            # Strip the leftover list/tuple punctuation, leaving "id168265578"
            vk_read = vk_fnd
            for ch in ['/', ')', '[', ']', '"', "'"]:
                vk_read = vk_read.replace(ch, '')
            self.vk_read = vk_read

    def handle_data(self, data):
        # A text node made of exactly two words is taken as a "First Last" name
        vk_name = str(data)
        if re.match(r'\S+\s+\S+$', vk_name):
            print("@{0} - {1}".format(self.vk_read, vk_name))


parser = MyHTMLParser()
parser.feed(read_data)
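The handle_starttag logic above is easier to follow with concrete values. This sketch repeats the same steps in a standalone function (extract_id_from_str is a hypothetical name, and the two attribute lists are made-up inputs modelled on the UPD2 example). It suggests that the regex on str(attrs) only finds the id when href happens to be the last attribute of the tag.

```python
import re


def extract_id_from_str(attrs):
    """Hypothetical helper repeating the str(attrs) + regex steps of handle_starttag."""
    vk_id = str(attrs)                    # list of tuples -> one string
    vk = re.findall(r'/\S+$', vk_id)      # trailing non-space run starting with '/'
    vk_fnd = str(vk)                      # list -> string again
    if not re.search(r"/\w+'\)\]", vk_fnd):
        return None
    for ch in ['/', ')', '[', ']', '"', "'"]:
        vk_fnd = vk_fnd.replace(ch, '')   # strip list/tuple punctuation
    return vk_fnd


href_last = [('class', 'like_img_cont'), ('href', '/id168265578')]
href_first = [('href', '/id168265578'), ('class', 'like_img_cont')]

print(extract_id_from_str(href_last))   # id168265578
print(extract_id_from_str(href_first))  # None: the space after "')," breaks \S+$
```

This dependence on attribute order is one reason the UPD2 idea of reading the value from the list directly is more robust.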

Now I know more. First bug and first fix: print() failed on some names with UnicodeEncodeError: 'charmap' codec can't encode character '\u0406' in position 15: character maps to <undefined>

from html.parser import HTMLParser
import re

# Read the saved page as UTF-8 text
with open('test.html', 'r', encoding='utf-8') as content_file:
    read_data = content_file.read()

# Fix 1: fixed the charset error by replacing the characters the console
# cannot print ('і' U+0456 and 'І' U+0406) with '?'


class MyHTMLParser(HTMLParser):
    vk_read = ''  # last VK id found in a link, empty until the first match

    def handle_starttag(self, tag, attrs):
        # Same id extraction as before
        vk_id = str(attrs)
        vk = re.findall(r'/\S+$', vk_id)
        vk_fnd = str(vk)
        if re.search(r"/\w+'\)\]", vk_fnd):
            vk_read = vk_fnd
            for ch in ['/', ')', '[', ']', '"', "'"]:
                vk_read = vk_read.replace(ch, '')
            self.vk_read = vk_read

    def handle_data(self, data):
        vk_name = str(data)
        if re.match(r'\S+\s+\S+$', vk_name):
            # Fix 1: swap 'і' and 'І' for '?' so print() no longer fails
            for ch in ['\u0456', '\u0406']:
                vk_name = vk_name.replace(ch, '?')
            print("@{0} - {1}".format(self.vk_read, vk_name))


parser = MyHTMLParser()
parser.feed(read_data)
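Replacing only 'і' and 'І' covers the names seen so far, but any other character missing from the console code page would raise the same error. A more general sketch lets the encoder replace everything the code page cannot represent. The helper name console_safe is hypothetical, and the default 'cp866' is an assumption (a Russian Windows console code page that has no 'І'/'і'); pass the real encoding, for example sys.stdout.encoding.

```python
def console_safe(text, encoding='cp866'):
    """Replace every character the given code page cannot encode with '?'.

    The default 'cp866' is an assumption; use the console's real encoding.
    """
    return text.encode(encoding, errors='replace').decode(encoding)


print(console_safe('Іван Петров'))  # ?ван Петров
```

On Python 3.7 and later, sys.stdout.reconfigure(errors='replace') applies the same policy to everything printed, without touching each string.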

UPD2: I found another solution for this code by rethinking its logic. The way I use vk = re.findall('/\S+$', vk_id) can be simplified. I use vk_id = str(attrs) to convert the list to a string and then search that string for something that matches the regex. But I should just access the value in the list directly! Example:

attrs = [('href', '/id168265578'), ('class', 'like_img_cont')]
print("attrs[0][1]:", attrs[0][1])  # the href value: /id168265578
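Following the UPD2 idea, here is a minimal sketch that reads href straight from attrs with dict(attrs) instead of converting the list to a string. It assumes a profile link looks like href="/id<digits>", as in the example above; real VK markup may differ. The class name VkSharesParser and the results list are hypothetical additions for illustration.

```python
import re
from html.parser import HTMLParser


class VkSharesParser(HTMLParser):
    """Hypothetical parser: remembers the last /id<digits> link and pairs it
    with two-word text nodes that follow it."""

    def __init__(self):
        super().__init__()
        self.vk_read = ''   # last VK id seen in an href
        self.results = []   # collected (vk_id, name) pairs

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values may be None
        href = dict(attrs).get('href') or ''
        match = re.fullmatch(r'/(id\d+)', href)
        if match:
            self.vk_read = match.group(1)

    def handle_data(self, data):
        name = data.strip()
        # Same "two words" heuristic as the original code
        if self.vk_read and re.fullmatch(r'\S+\s+\S+', name):
            self.results.append((self.vk_read, name))
            print("@{0} - {1}".format(self.vk_read, name))


parser = VkSharesParser()
parser.feed('<a href="/id168265578" class="like_img_cont"><img></a>'
            '<a href="/id168265578">Ivan Petrov</a>')
print(parser.results)  # [('id168265578', 'Ivan Petrov')]
```

Because the lookup is by attribute name, the order of href and class in the tag no longer matters.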

I will update this section later.

Modified: 2024 April 21 (Sun) 18:49

Alex 2015 April 29 (Wed) 19:20 python

tech

Tech posts about installing or setting up something.

raspberry

Everything related to Raspberry Pi.

python

Using Python or coding in Python.

linux

octopus

Octopus is a framework for test execution, statistics collection and virtual machine deployment automation.

windows

Windows OS and related issues and stuff.

REST

REST APIs for/from different services

Django

Django web (server) framework

virtualization

SQL DB

SQL databases and related issues

project man

Project management tools and thoughts, software and not.

web

Site hosting, web servers, web browsing, web development.

personal

Just personal thoughts; they aren't always readable or adequate.

photo

My photos or photo related stuff.