Python Memo: Things Picked Up Along the Way

  • 2020.01.14

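    Use the paramiko module to show SFTP upload/download progress, estimated time remaining, transfer rate, and so on. The put/get methods of the SFTP client accept a callback parameter for this.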

    
    import math
    import sys
    import time
    
    import paramiko
    
    
    # The original snippet showed only the two methods; the class name here is a placeholder.
    class SftpTransfer:
        def progress_bar(self, transferred, toBeTransferred, suffix=''):
            # paramiko calls this as callback(bytes_transferred_so_far, total_bytes)
            elapsed = max(time.time() - self.begint, 1e-6)  # guard against division by zero
            ratio = transferred / toBeTransferred if toBeTransferred else 1.0
            percent = '{:.2%}'.format(ratio)
            timespent = 'elapsed: %.2fs' % elapsed
            udspeed = 'rate: %.2fKB/s' % (transferred / 1024 / elapsed)
            if transferred:
                timeleft = 'estimated time left: %.2fs' % (
                    (toBeTransferred - transferred) / transferred * elapsed)
            else:
                timeleft = 'estimated time left: unknown'
            sys.stdout.write('\r')
            sys.stdout.write('%s  %d/%d:[%-50s] %s %s %s %s' % (
                suffix, transferred, toBeTransferred, '=' * int(math.floor(ratio * 50)),
                percent, udspeed, timespent, timeleft))
            sys.stdout.flush()
            if transferred == toBeTransferred:
                sys.stdout.write('\n')
    
        def RemoteRun(self, host, user, pwd, port, commtxt=None, lfile=None, rfile=None, updown=None):
            result = ""
            i = 1
            action = 'download' if updown == 1 else 'upload'
            # Create a Transport object; host and port must be passed as one tuple
            transport = paramiko.Transport((host, port))
            try:
                # Open the connection
                transport.connect(username=user, password=pwd)
                # Get an SFTP client
                sftp = paramiko.SFTPClient.from_transport(transport)
                self.begint = time.time()
                if updown == 1:
                    # Download: remote file path, local file path
                    sftp.get(rfile, lfile, callback=self.progress_bar)
                else:
                    # Upload: local file path, remote file path
                    sftp.put(lfile, rfile, callback=self.progress_bar)
                result = '%s success!' % action
            except Exception as e:
                result = '%s error:%s' % (action, repr(e))
                i = 0
            finally:
                # Close the connection even if the transfer failed
                transport.close()
            return i, result
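
    The arithmetic inside the progress callback can be tried without an SFTP server. The sketch below is only an illustration: format_progress and its parameters are hypothetical names, not part of paramiko, and the elapsed time is passed in explicitly instead of being read from the clock.

```python
def format_progress(transferred, total, elapsed, width=50):
    """Build one progress line from byte counts and elapsed seconds."""
    # Treat an empty file as already complete to avoid dividing by zero.
    ratio = transferred / total if total else 1.0
    bar = '=' * int(ratio * width)
    rate_kb = transferred / 1024 / elapsed if elapsed > 0 else 0.0
    if transferred:
        # Remaining bytes divided by the average speed so far.
        left = '%.2fs' % ((total - transferred) / transferred * elapsed)
    else:
        left = 'unknown'
    return '[%-*s] %.2f%% %.2fKB/s left:%s' % (width, bar, ratio * 100, rate_kb, left)


# 512 of 1024 bytes after one second, drawn on a 10-character bar
line = format_progress(512, 1024, 1.0, width=10)
print(line)
```

    A function of this shape could be called from the callback with the elapsed time measured since the transfer started.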
    
  • Formatted HTML output with the bs4 library: the .prettify() method prints HTML source in a more readable, indented form.

    prettify() returns a str.

    
    import requests
    from bs4 import BeautifulSoup
    
    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    # prettify() only returns the formatted string, so print it to see the result
    print(soup.prettify())
    
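  • When scraping pages protected by front-end obfuscation (such as CSDN), or opening a page shared from WeChat in a desktop browser, requests.get() may fail to return the real, complete page content. In that case, selenium can be used instead.

    For CSDN, requests.get() returns content like the following (partial excerpt):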

    
    var arg1='33FA29F3022C92644090636A74E707E1F8EC9E6D';
    var _0x4818=['\x63\x73\x4b\x48\x77\x71\x4d\x49',
    ......
    function setCookie(name,value){var expiredate=new Date();expiredate.setTime(expiredate.getTime()+(3600*1000));document.cookie=name+"="+value+";expires="+expiredate.toGMTString()+";max-age=3600;path=/";}
    function reload(x) {setCookie("acw_sc__v2", x);document.location.reload();}
    

    Using selenium + Chrome:

    
    from selenium import webdriver
    
    page_url = 'https://blog.csdn.net/m0_37907797/article/details/102759257'
    chrome = webdriver.Chrome()
    try:
        chrome.get(page_url)
        html = chrome.page_source
        print(html)
    finally:
        # quit() closes every window and also ends the chromedriver process
        chrome.quit()
    

    This time the real page content is retrieved!
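
    To decide when the selenium fallback is needed, the text returned by requests.get() can be checked for the cookie-and-reload stub shown above. This is a rough heuristic sketch, not a documented interface: the function name is hypothetical, and the two marker strings are simply copied from the excerpt, so they may change whenever the site changes its script.

```python
def looks_like_js_challenge(html):
    """Return True if the text resembles the cookie-and-reload stub shown above."""
    # Both markers come from the excerpt; they are assumptions about this one site.
    markers = ('acw_sc__v2', 'document.location.reload()')
    return all(marker in html for marker in markers)


stub = 'function reload(x) {setCookie("acw_sc__v2", x);document.location.reload();}'
normal_page = '<html><body>article text</body></html>'

print(looks_like_js_challenge(stub))         # True
print(looks_like_js_challenge(normal_page))  # False
```

    When the check returns True, the page can be fetched again with selenium as in the example above.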

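  • When using selenium, to hide the browser's default "Chrome is being controlled by automated test software" infobar: most of the methods found online use option.add_argument('disable-infobars'), which is deprecated in newer versions of Chrome. The newer approach is option.add_experimental_option("excludeSwitches", ['enable-automation']).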

    
    from selenium import webdriver
    
    option = webdriver.ChromeOptions()
    option.add_experimental_option("excludeSwitches", ['enable-automation'])
    page_url = 'https://blog.csdn.net/m0_37907797/article/details/102759257'
    # The chrome_options keyword is deprecated; pass the options object as options=
    chrome = webdriver.Chrome(options=option)
    try:
        chrome.get(page_url)
        html = chrome.page_source
        print(html)
    finally:
        # quit() closes every window and also ends the chromedriver process
        chrome.quit()
    
  • 2020.04.09

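    Avoiding scientific notation when writing a CSV file: convert the number to a string and append '\t' to it. Python itself writes every digit; the scientific notation typically appears when the file is opened in a spreadsheet program such as Excel, and the trailing tab makes the cell be treated as text.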

    
    resu_csv = 'test.csv'
    cardnum = 360101198809081721
    cardnum = str(cardnum) + '\t'
    with open(resu_csv, 'w') as fp:
        fp.write(cardnum)
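
    The same trick can be combined with the standard csv module when several columns are written. The sketch below is a minimal illustration under stated assumptions: as_text_cell and rows_to_csv are hypothetical helper names, the header and the 'demo' value are made up for the example, and the output goes to an in-memory buffer instead of test.csv.

```python
import csv
import io


def as_text_cell(value):
    """Convert a value to a string and append a tab, as in the note above."""
    return str(value) + '\t'


def rows_to_csv(rows):
    """Write rows as CSV text, tab-suffixing every int so its digits stay intact."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([as_text_cell(c) if isinstance(c, int) else c for c in row])
    return buffer.getvalue()


text = rows_to_csv([['name', 'cardnum'], ['demo', 360101198809081721]])
print(repr(text))
```

    To write a real file, open it with newline='' and pass the file object to csv.writer in place of the buffer.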
    
