-
2020.01.14
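Use the paramiko module to show SFTP upload/download progress, estimated time remaining, transfer rate, and so on. The put/get methods of the SFTP client accept a callback parameter, which paramiko calls with the number of bytes transferred so far and the total number of bytes.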
import math
import sys
import time

import paramiko


class SftpTransfer:  # wrapper class so the `self` methods below are runnable; the name is a placeholder
    def progress_bar(self, transferred, toBeTransferred, suffix=''):
        # Callback for sftp.get/put: receives bytes transferred so far and total bytes
        elapsed = time.time() - self.begint
        # Guard against empty files and a zero elapsed time
        ratio = transferred / toBeTransferred if toBeTransferred else 1.0
        speed = transferred / 1024 / elapsed if elapsed > 0 else 0.0
        remaining = (toBeTransferred - transferred) / transferred * elapsed if transferred else 0.0
        percent = '{:.2%}'.format(ratio)
        timespent = 'elapsed: {:.2f}s'.format(elapsed)
        udspeed = 'rate: {:.2f}KB/s'.format(speed)
        timeleft = 'estimated time left: {:.2f}s'.format(remaining)
        sys.stdout.write('\r')
        sys.stdout.write('%s %d/%d:[%-50s] %s %s %s %s' % (
            suffix, transferred, toBeTransferred,
            '=' * int(math.floor(ratio * 50)),
            percent, udspeed, timespent, timeleft))
        sys.stdout.flush()
        if transferred == toBeTransferred:
            sys.stdout.write('\n')

    def RemoteRun(self, host, user, pwd, port, commtxt=None, lfile=None, rfile=None, updown=None):
        result = ""
        i = 1
        # Create a Transport object; the address must be passed as a (host, port) tuple,
        # otherwise the port is taken as the second positional argument (the window size)
        transport = paramiko.Transport((host, port))
        try:
            # Establish the connection
            transport.connect(username=user, password=pwd)
            # Get the SFTP object
            sftp = paramiko.SFTPClient.from_transport(transport)
            self.begint = time.time()
            if updown == 1:
                # Download: remote file path, local file path
                sftp.get(rfile, lfile, callback=self.progress_bar)
                result = 'download success!'
            else:
                # Upload: local file path, remote file path
                sftp.put(lfile, rfile, callback=self.progress_bar)
                result = 'upload success!'
        except Exception as e:
            result = 'transfer error:%s' % repr(e)
            i = 0
        finally:
            # Close the connection even if the transfer failed
            transport.close()
        return i, result
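The arithmetic behind the progress line can be tried without an SFTP server. The following is a minimal sketch, not part of the original code: `transfer_stats` and its `bar_width` parameter are hypothetical names, and the byte counts in the usage lines are made-up sample values.

```python
import math


def transfer_stats(transferred, total, elapsed, bar_width=50):
    """Return (fraction, rate_kb_s, eta_seconds, bar) for a transfer in progress.

    Hypothetical helper: mirrors the formulas used in progress_bar above.
    """
    # Fraction done; an empty file counts as complete.
    fraction = transferred / total if total else 1.0
    # Average rate so far in KB/s; avoid dividing by a zero elapsed time.
    rate_kb_s = transferred / 1024 / elapsed if elapsed > 0 else 0.0
    # Remaining time = remaining bytes / average bytes per second so far.
    if transferred >= total:
        eta = 0.0
    elif transferred:
        eta = (total - transferred) / transferred * elapsed
    else:
        eta = float("inf")  # nothing transferred yet, so no estimate
    bar = "=" * int(math.floor(fraction * bar_width))
    return fraction, rate_kb_s, eta, bar


# Sample values: 512 KB of a 2048 KB file after 4 seconds.
fraction, rate, eta, bar = transfer_stats(512 * 1024, 2048 * 1024, 4.0)
print("[{:<50}] {:.2%} {:.1f}KB/s, {:.1f}s left".format(bar, fraction, rate, eta))
```

Keeping the calculation in a pure function like this makes the zero-division cases (empty file, first callback) easy to check separately from the terminal output.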
-
Formatted HTML output with the bs4 library: the .prettify() method prints HTML in a more readable, indented form.
prettify() returns a str.
import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
# prettify() only returns the formatted string, so print it to see the result
print(soup.prettify())
-
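When crawling pages that use front-end obfuscation (such as CSDN), or when opening a page shared from WeChat in a desktop browser, requests.get() may not return the real, complete page content. In that case selenium can be used instead.
For CSDN, the content returned by requests.get() looks like this (partial excerpt):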
var arg1='33FA29F3022C92644090636A74E707E1F8EC9E6D'; var _0x4818=['\x63\x73\x4b\x48\x77\x71\x4d\x49', ...... function setCookie(name,value){var expiredate=new Date();expiredate.setTime(expiredate.getTime()+(3600*1000));document.cookie=name+"="+value+";expires="+expiredate.toGMTString()+";max-age=3600;path=/";} function reload(x) {setCookie("acw_sc__v2", x);document.location.reload();}
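Using selenium + Chrome:
from selenium import webdriver

page_url = 'https://blog.csdn.net/m0_37907797/article/details/102759257'
chrome = webdriver.Chrome()
try:
    chrome.get(page_url)
    # page_source holds the HTML after the browser has run the page's JavaScript
    html = chrome.page_source
    print(html)
finally:
    # quit() closes the window and also ends the driver process
    chrome.quit()
With this approach the real page content is retrieved.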
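To decide when the selenium fallback is needed, the response body can be checked for the markers visible in the excerpt above. This is only a rough heuristic sketch: `looks_like_js_challenge` is a hypothetical helper, and it assumes the stub keeps using the `acw_sc__v2` cookie name and the `var arg1=` assignment, which the site may change at any time.

```python
def looks_like_js_challenge(html):
    """Heuristic: True if the body looks like the cookie-setting stub, not a real page."""
    # Markers taken from the excerpt above; they are an assumption, not a stable API.
    markers = ("acw_sc__v2", "var arg1=")
    return any(marker in html for marker in markers)


# A shortened stub built from the excerpt, and an ordinary page for comparison.
sample_stub = (
    "var arg1='33FA29F3022C92644090636A74E707E1F8EC9E6D'; "
    'function reload(x) {setCookie("acw_sc__v2", x);document.location.reload();}'
)
sample_page = "<html><body><h1>Article</h1></body></html>"

print(looks_like_js_challenge(sample_stub))
print(looks_like_js_challenge(sample_page))
```

A crawler could call this on `r.text` after requests.get() and start the browser only when it returns True, since launching Chrome for every page is much slower.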
-
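Removing the default "Chrome is being controlled by automated test software" info bar when using selenium: most of the methods found online use option.add_argument('disable-infobars'), which no longer works in newer versions of Chrome. Use this instead: option.add_experimental_option("excludeSwitches", ['enable-automation'])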
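from selenium import webdriver

option = webdriver.ChromeOptions()
# Exclude the switch that makes Chrome show the automation info bar
option.add_experimental_option("excludeSwitches", ['enable-automation'])
page_url = 'https://blog.csdn.net/m0_37907797/article/details/102759257'
# `options=` replaces the deprecated `chrome_options=` keyword
chrome = webdriver.Chrome(options=option)
try:
    chrome.get(page_url)
    html = chrome.page_source
    print(html)
finally:
    # quit() closes the window and also ends the driver process
    chrome.quit()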
-
2020.04.09
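Avoiding scientific notation when writing a CSV file: spreadsheet applications such as Excel show long numbers in scientific notation when they open the file. Converting the number to a string and appending '\t' makes the cell be treated as text, which solves the problem.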
resu_csv = 'test.csv'
cardnum = 360101198809081721
# Convert to str and append a tab so the value is read as text
cardnum = str(cardnum) + '\t'
with open(resu_csv, 'w') as fp:
    fp.write(cardnum)
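The same trick can be combined with the standard csv module when a file has several columns. The sketch below is an illustration under assumptions: the `name` column and the `user0` value are made up, and an in-memory buffer stands in for the file so the result can be inspected directly.

```python
import csv
import io

card_numbers = [360101198809081721]

# Write to an in-memory buffer; a real script would use open('test.csv', 'w', newline='').
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "cardnum"])  # hypothetical header
for i, number in enumerate(card_numbers):
    # The trailing tab is what keeps spreadsheet programs from reformatting the number.
    writer.writerow(["user%d" % i, str(number) + "\t"])

# The tab is part of the stored value, so strip it when reading the file back in code.
buffer.seek(0)
rows = list(csv.reader(buffer))
restored = [int(row[1].strip()) for row in rows[1:]]
print(restored)
```

The trade-off is that every program reading the file later sees the extra tab, so values should be stripped before they are compared or converted.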