What is the Baiduspider Referer

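The Baiduspider Referer is the Referer field that Baiduspider sends in the HTTP headers when it fetches a URL. Note that this definition has nothing to do with Baidu's recent announcement that it is removing keyword data from the Referer. This post is about HTTP requests issued by the spider, whereas Baidu is removing the data from requests issued by users. When Baiduspider fetches the logo on the Baidu home page, it issues a request like this: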

GET /img/bd_logo1.png HTTP/1.1
Host: www.baidu.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Referer: https://www.baidu.com/

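The Referer field above states plainly that the spider discovered www.baidu.com/img/bd_logo1.png on the page www.baidu.com and fetched it from there. You should be able to see the corresponding records in your server access log. So far, the Referer field has only been observed when Baiduspider fetches a page and, at the same time, fetches the img, js and css resources in that page. This extra crawling probably does not count against the crawl quota Baidu allocates to the site; it is a "buy one, get one free" deal.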

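What it means for webmasters

Suppose you find a batch of URLs (img, js and css only) returning errors (4xx or 5xx) but cannot find where they are linked from, that is, you cannot tell where Baiduspider discovered these broken URLs. This field helps you locate the source quickly.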

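An example

In our SEO log analysis system, for instance, we could see that paths matching the following URL pattern were crawled 60,000 to 100,000 times a day, and every one of those requests returned 404.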

/\d{6}/\d{2}/.+

(Figure: 404 responses in the log statistics)

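A month had passed since the problem was discovered, and I had searched the whole site without finding the entry point. Today I happened to look closely at the logs, remembered the Baiduspider Referer, and located the problem right away. The 404 URLs came from a set of pages that nobody maintains or pays attention to (as is so often the case), although their indexing and traffic are both good. The company's image system was updated recently and every image URL changed, but this set of pages was never updated to match.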
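The lookup described above can be sketched in a few lines of Python. This is only a minimal sketch, not the log analysis system mentioned in this post: it assumes an access log in the Apache/Nginx combined format, and the function name count_404_referers, the example.com URLs and the sample lines are all made up for illustration.

```python
import re
from collections import Counter

# One line of the "combined" access log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# The URL pattern from the example in this post.
BROKEN_PATH = re.compile(r'^/\d{6}/\d{2}/.+')

BAIDU_UA = ('Mozilla/5.0 (compatible; Baiduspider/2.0; '
            '+http://www.baidu.com/search/spider.html)')


def count_404_referers(lines):
    """Count Referer values of Baiduspider requests that got a 404 on BROKEN_PATH."""
    referers = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # not a combined-format line
        if 'Baiduspider' not in match.group('agent'):
            continue
        if match.group('status') != '404':
            continue
        parts = match.group('request').split(' ')
        if len(parts) < 2 or not BROKEN_PATH.match(parts[1]):
            continue
        referers[match.group('referer')] += 1
    return referers


# Made-up sample lines, for illustration only.
SAMPLE = [
    '1.2.3.4 - - [01/Jan/2015:00:00:01 +0800] "GET /201401/05/a.jpg HTTP/1.1" '
    '404 0 "http://www.example.com/old/page1.html" "' + BAIDU_UA + '"',
    '1.2.3.4 - - [01/Jan/2015:00:00:02 +0800] "GET /201401/06/b.jpg HTTP/1.1" '
    '404 0 "http://www.example.com/old/page1.html" "' + BAIDU_UA + '"',
    '1.2.3.4 - - [01/Jan/2015:00:00:03 +0800] "GET /css/site.css HTTP/1.1" '
    '200 512 "http://www.example.com/page2.html" "' + BAIDU_UA + '"',
    '5.6.7.8 - - [01/Jan/2015:00:00:04 +0800] "GET /201401/05/a.jpg HTTP/1.1" '
    '404 0 "http://www.example.com/old/page1.html" "Mozilla/5.0 (Windows NT 6.1)"',
]

if __name__ == '__main__':
    # The most frequent referers are the pages that still link to the broken URLs.
    for referer, hits in count_404_referers(SAMPLE).most_common():
        print(hits, referer)
```

Sorting by hit count puts the pages that carry the most broken links at the top, which is usually where the fix has to start.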

What if the site does not log the Referer

IIS: tick "cs(Referer)" in the log field selection shown in the screenshots (iis-log-1, iis-log-2).

Apache, for reference:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
CustomLog log/access_log combined

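Nginx, for reference. The predefined combined format already records the Referer. To declare the format yourself, give it another name (main here), because combined cannot be redefined:

log_format main '$remote_addr - $remote_user [$time_local] '
                '"$request" $status $body_bytes_sent '
                '"$http_referer" "$http_user_agent"';
access_log logs/access.log main;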

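Closing remarks

  • Many SEO problems are not immediately fatal, so they do not get fixed promptly. The traffic is eaten away bit by bit, like ants gnawing at an elephant.
  • Systematically accumulated knowledge does pay off at the critical moment.
  • Thanks to 飞鹰 for the corrections to this post.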


layout: post
title: '[Android Lint] "XXXX" is not translated in "en" (English), "zh" (Chinese)'
date: 2015-12-18 14:40:30
comments: true
categories: dev
tags: [android]

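1. The problem: Today, while packaging a bilingual Android project, the export reported a baffling error. In other words, the error appears when the package is exported. Let's take a closer look at it.

The error reads "XXXX" is not translated in "en" (English), "zh" (Chinese), and it shows up under Lint Warnings as an error.

2. The cause: According to the message, I had not provided translations for the strings in my string file. Looking at where the error was reported, it turned out that a Library referenced by the project is internationalized and contains the two folders values-en and values-zh, which is why exporting the project produced this error.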

3. Temporary workaround: In Eclipse, go to Window -> Preferences -> Android -> Lint Error Checking, and under Correctness: Messages select MissingTranslation.

The temporary workaround is to change the Severity of MissingTranslation to Warning.

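4. Final solution: If your project is internationalized or bilingual, create the two folders values-en and values-zh in the project, then put the Chinese string.xml in the values-zh folder and the English string.xml in the values-en folder.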
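Before exporting, it can help to list which strings are still untranslated. The following is a rough sketch and not how Lint itself works: it assumes the conventional res/values/strings.xml layout (the file name is a parameter, since this post calls the file string.xml), it only looks at string elements, and the helper names are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def string_names(path):
    """Return the names of the translatable <string> entries in one resource file."""
    root = ET.parse(path).getroot()
    return {
        node.get('name')
        for node in root.iter('string')
        if node.get('translatable') != 'false'
    }


def missing_translations(res_dir, locales=('en', 'zh'), filename='strings.xml'):
    """Map each locale to the default string names that it does not define."""
    default = string_names(os.path.join(res_dir, 'values', filename))
    missing = {}
    for locale in locales:
        path = os.path.join(res_dir, 'values-' + locale, filename)
        translated = string_names(path) if os.path.exists(path) else set()
        missing[locale] = sorted(default - translated)
    return missing


def _write(path, body):
    """Write a tiny resource file (used only by the demo below)."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write('<resources>' + body + '</resources>')


def demo():
    """Build a throwaway res/ tree in which values-zh lacks the "hello" string."""
    with tempfile.TemporaryDirectory() as res:
        _write(os.path.join(res, 'values', 'strings.xml'),
               '<string name="app_name">Demo</string>'
               '<string name="hello">Hello</string>'
               '<string name="build_id" translatable="false">42</string>')
        _write(os.path.join(res, 'values-en', 'strings.xml'),
               '<string name="app_name">Demo</string>'
               '<string name="hello">Hello</string>')
        _write(os.path.join(res, 'values-zh', 'strings.xml'),
               '<string name="app_name">Demo zh</string>')
        return missing_translations(res)


if __name__ == '__main__':
    print(demo())
```

Run against the res folder of both the project and the referenced Library, a check like this shows which side is missing the entries that the Lint message complains about.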

Sample run.sh

#!/bin/bash

HADOOP_HOME=/usr/lib/hadoop
HADOOP_STREAMING=/usr/lib/hadoop-0.20-mapreduce

orderfromid='12345'
srcid='semid1'


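# Intended to be yesterday's date, for processing a single day.
DATETIME=`date -d last-day +%Y%m%d`
# Overrides the value above with an explicit list of days; Hadoop expands the {a,b,...} glob in the input path.
DATETIME='{20131202,20131203,20131204,20131205,20131206,20131207,20131208}'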
input='/data/logs/webserver/website/hotel/'$DATETIME
output='/homedir/username/keyword_convertion/'$srcid'/'$DATETIME
mapper_file='./mapper.py'
reducer_file='./reducer.py'

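# -numReduceTasks 0 makes this a map-only job: the reducer is not run and the mapper output is written to $output.
# mapred.text.key.partitioner.options only takes effect together with -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner.
$HADOOP_HOME/bin/hadoop jar $HADOOP_STREAMING/contrib/streaming/hadoop-streaming.jar \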
    -D mapred.text.key.partitioner.options=-k1,1 \
    -D map.output.key.field.separator=' ' \
    -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
    -input $input \
    -output $output \
    -mapper $mapper_file \
    -file $mapper_file \
    -reducer $reducer_file \
    -file $reducer_file \
    -cmdenv srcid="$srcid" \
    -cmdenv orderfromid="$orderfromid" \
    -numReduceTasks 0

Sample mapper

#!/usr/bin/python
# Hadoop Streaming mapper (Python 2). Reads IIS W3C log lines from stdin and
# emits "CookieGuid date time<TAB>value" for order (type=o) and traffic (type=d) records.
import sys
import os
from urlparse import parse_qs,urlparse

# Set by run.sh through -cmdenv.
orderfromid = os.environ['orderfromid']
srcid = os.environ['srcid']

for line in sys.stdin:
  line = line.strip()
  # Skip the "#..." header lines of the IIS log.
  if line[:1] == '#':
    continue

  try:
    date,time,sitename,computername,s_ip,method,uri_stem,uri_query,port,username,c_ip,cs_version,ua,c_cookies,referer,cs_host,sc_status,sc_substatus,sc_win32_status,sc_bytes,cs_bytes,time_taken = line.split(' ')
  except ValueError:
    # The line does not have the expected number of fields.
    continue

  # Build the key for every line, so that a record is never printed with a key
  # left over from an earlier line.
  cookies={}
  for cookie in c_cookies.split(';+'):
    kv = cookie.split('=')
    if len(kv) != 2:
      continue
    k,v = kv
    cookies[k]=v

  if 'CookieGuid' not in cookies:
    continue
  key = cookies['CookieGuid']+' '+date+' '+time

  query_pairs = parse_qs(uri_query)

  if 'TableName' in query_pairs:
    TableName = query_pairs['TableName'][0]
    if TableName == 'FlowStatiOrder':
      # Order record: keep it only if it carries the configured OrderFrom value.
      if 'OrderFrom' not in query_pairs or 'OrderId' not in query_pairs or query_pairs['OrderFrom'][0] != orderfromid:
        continue
      value = "type=o&orderid="+query_pairs['OrderId'][0]
    elif TableName == 'FlowStatiData':
      # Traffic record: srcid and uuid are read from the query string of the referer.
      referer_pairs = urlparse(referer)
      referer_query_pairs = parse_qs(referer_pairs.query)
      if 'srcid' not in referer_query_pairs or 'uuid' not in referer_query_pairs or referer_query_pairs['srcid'][0] != srcid:
        continue
      value = "type=d&uuid="+referer_query_pairs['uuid'][0]
    else:
      continue
  else:
    # Any other request: srcid and uuid are read from its own query string.
    if 'srcid' not in query_pairs or 'uuid' not in query_pairs or query_pairs['srcid'][0] != srcid:
      continue
    value = "type=d&uuid="+query_pairs['uuid'][0]

  print key+"\t"+value
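The key that the mapper emits is the visitor's CookieGuid followed by the date and time of the request, so that sorting by key groups each visitor's records and puts them in time order. The key construction in isolation might look like the sketch below. It is written for Python 3 (urllib.parse instead of the Python 2 urlparse module used in the sample), and the helper names parse_iis_cookies and make_key are made up for illustration.

```python
from urllib.parse import parse_qs


def parse_iis_cookies(c_cookies):
    """Split an IIS cookie field such as "a=1;+b=2" into a dict.

    Pairs that do not contain exactly one "=" are skipped, as in the sample mapper.
    """
    cookies = {}
    for cookie in c_cookies.split(';+'):
        kv = cookie.split('=')
        if len(kv) != 2:
            continue
        cookies[kv[0]] = kv[1]
    return cookies


def make_key(c_cookies, date, time):
    """Return "CookieGuid date time", or None when the visitor has no CookieGuid."""
    cookies = parse_iis_cookies(c_cookies)
    if 'CookieGuid' not in cookies:
        return None
    return cookies['CookieGuid'] + ' ' + date + ' ' + time


if __name__ == '__main__':
    # Made-up field values, for illustration only.
    print(make_key('CookieGuid=abc123;+lang=zh', '2013-12-02', '08:00:01'))
    print(parse_qs('TableName=FlowStatiOrder&OrderFrom=12345&OrderId=987'))
```

Keeping the date and time inside the key, rather than in the value, is what lets the sort phase do the ordering, so the reducer only has to compare each record with the one before it.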

Sample reducer

#!/usr/bin/python
# Hadoop Streaming reducer (Python 2). Expects lines sorted by key
# ("CookieGuid date time"), so that one visitor's records are adjacent and in time order.
import sys
from urlparse import parse_qs
last_line = None

for line in sys.stdin:
  cookie_date_time,value = line.split("\t")
  cookie,date,time = cookie_date_time.split(" ")
  value_pairs = parse_qs(value)

  if last_line:
    last_cookie_date_time,last_value = last_line.split("\t")
    last_cookie, last_date,last_time = last_cookie_date_time.split(' ')
    last_value_pairs  = parse_qs(last_value)
    if 'uuid' in last_value_pairs:
      last_value_uuid = last_value_pairs['uuid'][0].strip()
    else:
      # The previous record carries no uuid (for example, it was an order).
      last_value_uuid = 'xxxxxxxxxxx'

    # Emit "uuid<TAB>1" when the current record is an order from the same cookie
    # as the previous record, otherwise "uuid<TAB>0".
    if 'type' in value_pairs and value_pairs['type'][0].strip() == 'o':
      if last_cookie == cookie:
        print last_value_uuid+"\t"+"1"
      else:
        print last_value_uuid+"\t"+"0"
    else:
      print last_value_uuid+"\t"+"0"

  last_line = line
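To see what the reducer produces, the same comparison can be written as a small function and fed a few hand-made lines. This is a sketch for Python 3 that mirrors the logic of the sample reducer; the function name attribute_orders and the sample records are hypothetical.

```python
from urllib.parse import parse_qs


def attribute_orders(sorted_lines):
    """Yield (uuid, flag) pairs the way the sample reducer prints them.

    For every line after the first, the previous line's uuid is emitted with
    flag "1" if the current line is an order from the same cookie, else "0".
    """
    last = None
    for line in sorted_lines:
        key, value = line.rstrip('\n').split('\t')
        cookie = key.split(' ')[0]
        pairs = parse_qs(value)
        if last is not None:
            last_cookie, last_pairs = last
            # Same placeholder as the sample reducer when there is no uuid.
            uuid = last_pairs.get('uuid', ['xxxxxxxxxxx'])[0]
            is_order = pairs.get('type', [''])[0] == 'o'
            yield uuid, '1' if is_order and last_cookie == cookie else '0'
        last = (cookie, pairs)


# Made-up mapper output, already sorted by key.
SAMPLE = [
    'c1 2013-12-02 08:00:01\ttype=d&uuid=u1',
    'c1 2013-12-02 08:05:00\ttype=o&orderid=987',
    'c2 2013-12-02 09:00:00\ttype=d&uuid=u2',
]

if __name__ == '__main__':
    for uuid, flag in attribute_orders(SAMPLE):
        print(uuid + '\t' + flag)
```

Because each record is only compared with its immediate predecessor, the result depends on the input being sorted by cookie and time, and the last record of the input is never emitted on its own.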

Debugging: viewing the intermediate results (the output of the map phase)

-numReduceTasks 0

Common commands

hadoop fs -ls /data/logs/webserver/website/hotel/20131128
hadoop fs -cat /data/logs/webserver/website/hotel/20131128/log.1385589547988

Deleting files and directories

hadoop fs -rmr /data/logs/webserver/website/hotel/20131128
hadoop fs -rm /data/logs/webserver/website/hotel/20131128/log.1385589547988

Merging the results and exporting them to a local file

hadoop fs -getmerge  /data/logs/webserver/website/hotel/20131128 /homedir/username/output

Listing HDFS paths by range

hadoop fs -ls /data/logs/webserver/website/hotel/{20131129..20131205}

Input by range

-input /homedir/username/keyword_convertion/searchengine/{20131202,20131203,20131204,20131205,20131206,20131207,20131208}
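Typing the list of days by hand gets tedious for longer ranges. A small helper can build the {a,b,...} glob from a start and end date. This is a minimal sketch; the function name date_glob is made up, and the directory naming (one YYYYMMDD folder per day) is taken from the paths above.

```python
from datetime import date, timedelta


def date_glob(start, end):
    """Build a path glob such as {20131202,20131203} covering start..end inclusive."""
    if end < start:
        raise ValueError('end must not be before start')
    days = (end - start).days + 1
    names = [(start + timedelta(days=i)).strftime('%Y%m%d') for i in range(days)]
    if len(names) == 1:
        return names[0]  # a single day needs no braces
    return '{' + ','.join(names) + '}'


if __name__ == '__main__':
    base = '/data/logs/webserver/website/hotel/'
    print(base + date_glob(date(2013, 12, 2), date(2013, 12, 8)))
```

Generating the list from real dates also handles month and year boundaries, which a plain numeric range such as {20131129..20131205} does not.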

Android: Uri lives in the android.net package

1. Open the web browser

Intent returnIt;  // reused by the snippets in this list
Uri myBlogUri = Uri.parse("http://xxxxx.com");
returnIt = new Intent(Intent.ACTION_VIEW, myBlogUri);

2. Map

Uri mapUri = Uri.parse("geo:38.899533,-77.036476");  
returnIt = new Intent(Intent.ACTION_VIEW, mapUri); 

3. Open the dialer

Uri telUri = Uri.parse("tel:100861");  
returnIt = new Intent(Intent.ACTION_DIAL, telUri);  

4. Call a number directly

// ACTION_CALL requires the android.permission.CALL_PHONE permission.
Uri callUri = Uri.parse("tel:100861");
returnIt = new Intent(Intent.ACTION_CALL, callUri);

5. Uninstall

Uri uninstallUri = Uri.fromParts("package", "xxx", null);  
returnIt = new Intent(Intent.ACTION_DELETE, uninstallUri);

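6. Install

// ACTION_PACKAGE_ADDED is a broadcast the system sends after a package has been installed; it cannot start an installation.
// A common way to launch the installer on older Android versions is ACTION_VIEW on the APK file (the path is a placeholder).
Uri installUri = Uri.parse("file:///sdcard/download/xxx.apk");
returnIt = new Intent(Intent.ACTION_VIEW);
returnIt.setDataAndType(installUri, "application/vnd.android.package-archive");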

7. Play media

Uri playUri = Uri.parse("file:///sdcard/download/everything.mp3");  
returnIt = new Intent(Intent.ACTION_VIEW, playUri); 

8. Open the email client

Uri emailUri = Uri.parse("mailto:xxxx@gmail.com");  
returnIt = new Intent(Intent.ACTION_SENDTO, emailUri);  

9. Send an email

returnIt = new Intent(Intent.ACTION_SEND);
String[] tos = { "xxxx@gmail.com" };
String[] ccs = { "xxxx@gmail.com" };
returnIt.putExtra(Intent.EXTRA_EMAIL, tos);
returnIt.putExtra(Intent.EXTRA_CC, ccs);
returnIt.putExtra(Intent.EXTRA_TEXT, "body");
returnIt.putExtra(Intent.EXTRA_SUBJECT, "subject");
returnIt.setType("message/rfc822");
// createChooser() returns a new Intent, so keep its result.
returnIt = Intent.createChooser(returnIt, "Choose Email Client");

10. Send an SMS

// setType() clears any data already set on the Intent, so no tel: Uri is passed here.
returnIt = new Intent(Intent.ACTION_VIEW);
returnIt.putExtra("sms_body", "yyyy");
returnIt.setType("vnd.android-dir/mms-sms");

11. Send an SMS to a given number

Uri smsToUri = Uri.parse("smsto:100861");
returnIt = new Intent(Intent.ACTION_SENDTO, smsToUri);
returnIt.putExtra("sms_body", "yyyy");

12. Send an MMS

Uri mmsUri = Uri.parse("content://media/external/images/media/23");  
returnIt = new Intent(Intent.ACTION_SEND);  
returnIt.putExtra("sms_body", "yyyy");  
returnIt.putExtra(Intent.EXTRA_STREAM, mmsUri);  
returnIt.setType("image/png");