Title: Does controlling freedom of speech count as controlling thought?
Author: 孟賢夢    Date: 2016.12.14
In the internet age, controlling what a web search can turn up is another way of controlling thought. Seen in that light, what exactly are government agencies doing with taxpayers' money instead of their actual work?
Tailing people and searching their persons and homes is outdated; now it's unlawfully obstructing web searches!

CNET News | Politics and Law
Feds use robots.txt files to stay invisible online. Lame.
Some federal government Web sites, including the Office of the Director of National Intelligence, are trying to remain hidden online by blocking search engines from indexing them. Not only is this lame, but it's a good reason to ignore their robots.txt files.
by Declan McCullagh | August 24, 2007 5:00 AM PDT
I noticed, when writing a story on Thursday about the bizarre claims by National Intelligence Director Mike McConnell, that the DNI is trying to hide from search engines. Its robots.txt file says, simply:

    User-agent: *
    Disallow: /
That blocks all search engines, including Google, MSN, Yahoo, and so on, from indexing any files at the Office of the Director of National Intelligence's Web site. (Here's some background on the Robots Exclusion Protocol if you're rusty.)
So I figured it would be interesting to see what other fedgov sites did the same. I wrote a quick Perl program to connect to federal government Web sites, check for the presence of a broad robots.txt exclusion, and report the results. By way of disclaimer, it's the same database I used in an article from early 2006, so it's probably a bit out-of-date.
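The article doesn't include the script itself, but the check it describes takes only a few lines. Here's a minimal Perl sketch of the same idea; the sample site list and the blanket-exclusion test are my own assumptions, not McCullagh's code:

    #!/usr/bin/perl
    # Sketch: flag sites whose robots.txt shuts out all crawlers.
    use strict;
    use warnings;
    use LWP::Simple qw(get);

    # Hypothetical sample list; the real script walked a database of
    # federal government sites.
    my @sites = ('http://www.dni.gov', 'http://thomas.loc.gov');

    for my $site (@sites) {
        my $robots = get("$site/robots.txt");
        next unless defined $robots;
        # Rough heuristic for a broad exclusion: a wildcard user-agent
        # record plus a bare "Disallow: /" line somewhere in the file.
        if ($robots =~ /^User-agent:\s*\*/mi && $robots =~ m{^Disallow:\s*/\s*$}mi) {
            print "$site blocks all search engines\n";
        }
    }

A real parser would match each Disallow line to its user-agent record, but for a quick one-pass survey the heuristic is good enough.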
The government sites that mark themselves as entirely off-limits via robots.txt:
http://www.dni.gov/robots.txt
https://gits-sec.treas.gov/robots.txt
http://thomas.loc.gov/robots.txt
http://www.erl.noaa.gov/robots.txt
http://www.nwd.usace.army.mil/robots.txt
http://www.tricare.mil/robots.txt
Some government sites favor one search engine over another (Customs and Border Protection bans all non-governmental search engines except Google; one Army Corps of Engineers site bans Alexa's spider; the Ginnie Mae agency bans Google's image search bot but not, say, Altavista's; the Minority Business Development Agency completely bans all crawlers but Google's; and one Bureau of Reclamation site bans Googlebot v2.1 but allows MSN's bot):
http://cbp.gov/robots.txt
http://www.nad.usace.army.mil/robots.txt
http://www.ginniemae.gov/robots.txt
http://www.mbda.gov/robots.txt
http://www.mp.usbr.gov/
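For a sense of what that favoritism looks like in the file itself, a robots.txt that admits only Google's crawler follows roughly this pattern (my illustration, not any agency's actual file):

    # Google's crawler may fetch everything: an empty Disallow permits all paths.
    User-agent: Googlebot
    Disallow:

    # Every other crawler matches the wildcard record and is shut out entirely.
    User-agent: *
    Disallow: /

Crawlers obey the most specific user-agent record that matches them, so Googlebot never sees the wildcard block.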
And here are some sites that seem to have had trouble with misbehaving Web crawlers in the past:
http://www.cdc.gov/robots.txt
http://www.glerl.noaa.gov/robots.txt
http://www.usbr.gov/robots.txt
http://www.onr.navy.mil/robots.txt
http://www.senate.gov/robots.txt
http://www.usdoj.gov/robots.txt
Now, I'm the last person to suggest that using robots.txt to cordon off subsets of your Web site is somehow evil. At News.com, we use it to tell search engines not to index our "email story" pages, for instance, and on my own Web site I use it as well. Blocking misbehaving Web crawlers is important and necessary. And robots.txt may be appropriate when a Web site's address changes, which seems to have happened in the case of the National Oceanic and Atmospheric Administration's site in the first chunk of examples above, or when it becomes defunct, which seems to have happened with the Treasury Department's "gits-sec" Web site above.
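A subset exclusion of the News.com sort looks quite different; something like the following, with a hypothetical path standing in for the real one:

    User-agent: *
    Disallow: /email-story/

Only the one directory is fenced off; the rest of the site stays open to indexing.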
But why should entire federal offices like the Director of National Intelligence want to remain invisible online? I can think of two reasons: (a) avoiding the situation of posting a report that turned out to be embarrassing and was discovered by Google and (b) letting the Feds modify a file such as a transcript without anyone noticing. (There have been allegations of the Bush administration altering, or at least creatively interpreting, transcripts before. And I've documented how a transcript of a public meeting was surreptitiously deleted -- and then restored.)
Neither situation benefits the public. In fact, I'd say it calls for a friendly amendment to the Robots Exclusion Protocol: Search engines should ignore robots.txt when a government agency is trying to use it to keep its entire Web site hidden from the public.
About Declan McCullagh
Declan McCullagh is the chief political correspondent for CNET. Declan previously was a reporter for Time and the Washington bureau chief for Wired and wrote the Taking Liberties section and Other People's Money column for CBS News' Web site.