在這個項目中,我試圖複製Sergey Brin和Larry Page開發的基於鏈接的排名系統在1998年的論文“ Pagerank引文排名:將秩序列入網絡”。該算法仍然是Google的Web搜索工具的基礎。蘭維爾(Langville)和邁耶(Meyer)在其2004年的《 Pagerank Inside》(Pagerank)的紙張中提供了有關Pagerank的構建和組成部分的其他指導。
我們首先要構建一個馬爾可夫矩陣,每個條目都是“從狀態I到州J的行為”(Langville and Meyer 2004)。該超鏈接矩陣轉化為隨機,不可還原和原始基質。該馬爾可夫矩陣將融合到主要的特徵向量。該向量是Pagerank向量,它指示圖中每個網頁的重要性。為此,BRIN和PAGE使用Power方法,該方法僅存儲每次迭代的先前迭代,並為隨機,不可減至和原始矩陣快速收斂。
訪問pagerank.py文件以查看實現。特別是,power_method和make_personalization_vector函數是實現的核心。 power_method函數實施。該算法已迭代,直到收斂到足夠低的誤差為止,通常為1E-6。由此產生的特徵向量Yeilds Pagerank載體。請注意,由於該函數將功率方法應用於Thex相對稀疏的超鏈接矩陣,將其轉換為隨機,不可減至和原始矩陣,因此該算法的運行時比我們明確定義了後者矩陣。同時,Make_Personalization_Vector方法接受查詢,然後創建一個提及查詢的文章的向量。這允許我們的power_method函數完成最針對性的頁面排名,這些排名最常鏈接到我們已經知道的頁面相關的頁面。
run pagerank.py --data=./lawfareblog.csv.gz --search_query=weapons
INFO:gensim.models.keyedvectors:precomputing L2-norms of word weight vectors
INFO:root:rank=0 pagerank=4.5715e-03 url=www.lawfareblog.com/why-did-you-wait-moral-emptiness-and-drone-strikes
INFO:root:rank=1 pagerank=3.1107e-03 url=www.lawfareblog.com/dc-district-court-dismisses-journalists-drone-lawsuit
INFO:root:rank=2 pagerank=2.0231e-03 url=www.lawfareblog.com/revived-cia-drone-strike-program-comments-new-policy
INFO:root:rank=3 pagerank=1.9667e-03 url=www.lawfareblog.com/us-court-appeals-dc-circuit-dismisses-suit-over-us-drone-strike
INFO:root:rank=4 pagerank=1.1788e-03 url=www.lawfareblog.com/iran-shoots-down-us-drone-domestic-and-international-legal-implications
INFO:root:rank=5 pagerank=1.1620e-03 url=www.lawfareblog.com/slaughterbots-and-other-anticipated-autonomous-weapons-problems
INFO:root:rank=6 pagerank=1.1276e-03 url=www.lawfareblog.com/german-courts-weigh-legal-responsibility-us-drone-strikes
INFO:root:rank=7 pagerank=8.3738e-04 url=www.lawfareblog.com/shift-jsoc-drone-strikes-does-not-mean-cia-has-been-sidelined
INFO:root:rank=8 pagerank=7.8704e-04 url=www.lawfareblog.com/atomwaffen-division-member-pleads-guilty-firearms-charge
INFO:root:rank=9 pagerank=7.8570e-04 url=www.lawfareblog.com/waiving-imminent-threat-test-cia-drone-strikes-pakistan這些結果的“無人機”和“目標”的底部包含了更多示例,這些示例有助於說明新系統的效果,儘管該文件中的所有結果都應用了新的排名系統。
第1部分:測試基本功率方法實施
實現算法後,我們可以在6個網頁圖上測試基本算法,類似於Langville和Meyer 2004中使用的算法。我們獲得以下結果:請注意,已打開詳細標籤,以說明在調試語句中,算法收斂到恆定的Epsilon項1E-6。這些頁面的排名是,第四個URL排名最高,第一個URL是最低的URL。這些結果通過Mike Izbicki的實施來驗證。
[In]: run pagerank.py --data=small.csv.gz --verbose
DEBUG:root:computing indices
DEBUG:root:computing values
DEBUG:root:root=i=0 accuracy=2.7229e-01
DEBUG:root:root=i=1 accuracy=1.3318e-01
DEBUG:root:root=i=2 accuracy=8.1429e-02
DEBUG:root:root=i=3 accuracy=3.7537e-02
DEBUG:root:root=i=4 accuracy=2.4824e-02
DEBUG:root:root=i=5 accuracy=1.2412e-02
DEBUG:root:root=i=6 accuracy=8.0711e-03
DEBUG:root:root=i=7 accuracy=4.3977e-03
DEBUG:root:root=i=8 accuracy=2.7713e-03
DEBUG:root:root=i=9 accuracy=1.5882e-03
DEBUG:root:root=i=10 accuracy=9.7799e-04
DEBUG:root:root=i=11 accuracy=5.7459e-04
DEBUG:root:root=i=12 accuracy=3.4924e-04
DEBUG:root:root=i=13 accuracy=2.0761e-04
DEBUG:root:root=i=14 accuracy=1.2532e-04
DEBUG:root:root=i=15 accuracy=7.4923e-05
DEBUG:root:root=i=16 accuracy=4.5074e-05
DEBUG:root:root=i=17 accuracy=2.6996e-05
DEBUG:root:root=i=18 accuracy=1.6217e-05
DEBUG:root:root=i=19 accuracy=9.7244e-06
DEBUG:root:root=i=20 accuracy=5.8908e-06
DEBUG:root:root=i=21 accuracy=3.5000e-06
DEBUG:root:root=i=22 accuracy=2.0959e-06
DEBUG:root:root=i=23 accuracy=1.2712e-06
INFO:root:rank=0 pagerank=6.8019e-01 url=4
INFO:root:rank=1 pagerank=5.3070e-01 url=6
INFO:root:rank=2 pagerank=4.1114e-01 url=5
INFO:root:rank=3 pagerank=2.0103e-01 url=2
INFO:root:rank=4 pagerank=1.5937e-01 url=3
INFO:root:rank=5 pagerank=1.4440e-01 url=1第2部分:搜索查詢接下來,我們可以使用許多命令行參數來完善我們的搜索。該算法現在將返回與我們的查詢最相關的這些頁面。 --search_query參數接受一個字符串,並將其與每個鏈接進行比較,並過濾出不包括查詢的鏈接。對於此部分,我使用Mike Izbicki教授準備的數據集,該數據集從Defense Blog www.lawfareblog.com上繪製了超鏈接。從這一點開始,我不會使用詳細命令使結果更加簡潔。以下結果與Izbicki教授實施中獲得的結果相似。
如果我們進行搜索查詢“電暈”以查找有關大流行的文章,則根據Pagerank的最相關鏈接:
[In]: run pagerank.py --data=./lawfareblog.csv.gz --search_query=corona
DEBUG:root:computing indices
DEBUG:root:computing values
INFO:gensim.models.keyedvectors:precomputing L2-norms of word weight vectors
INFO:root:rank=0 pagerank=1.0038e-03 url=www.lawfareblog.com/lawfare-podcast-united-nations-and-coronavirus-crisis
INFO:root:rank=1 pagerank=8.9224e-04 url=www.lawfareblog.com/house-oversight-committee-holds-day-two-hearing-government-coronavirus-response
INFO:root:rank=2 pagerank=7.0390e-04 url=www.lawfareblog.com/britains-coronavirus-response
INFO:root:rank=3 pagerank=6.9153e-04 url=www.lawfareblog.com/prosecuting-purposeful-coronavirus-exposure-terrorism
INFO:root:rank=4 pagerank=6.7041e-04 url=www.lawfareblog.com/israeli-emergency-regulations-location-tracking-coronavirus-carriers
INFO:root:rank=5 pagerank=6.6256e-04 url=www.lawfareblog.com/why-congress-conducting-business-usual-face-coronavirus
INFO:root:rank=6 pagerank=6.5046e-04 url=www.lawfareblog.com/congressional-homeland-security-committees-seek-ways-support-state-federal-responses-coronavirus
INFO:root:rank=7 pagerank=6.3620e-04 url=www.lawfareblog.com/paper-hearing-experts-debate-digital-contact-tracing-and-coronavirus-privacy-concerns
INFO:root:rank=8 pagerank=6.1248e-04 url=www.lawfareblog.com/house-subcommittee-voices-concerns-over-us-management-coronavirus
INFO:root:rank=9 pagerank=6.0187e-04 url=www.lawfareblog.com/livestream-house-oversight-committee-holds-hearing-government-coronavirus-response接下來,我們將調整搜索查詢為“特朗普”:
[In]: run pagerank.py --data=./lawfareblog.csv.gz --search_query=trump
DEBUG:root:computing indices
DEBUG:root:computing values
INFO:gensim.models.keyedvectors:precomputing L2-norms of word weight vectors
INFO:root:rank=0 pagerank=5.7826e-03 url=www.lawfareblog.com/trump-asks-supreme-court-stay-congressional-subpeona-tax-returns
INFO:root:rank=1 pagerank=5.2338e-03 url=www.lawfareblog.com/document-trump-revokes-obama-executive-order-counterterrorism-strike-casualty-reporting
INFO:root:rank=2 pagerank=5.1297e-03 url=www.lawfareblog.com/trump-administrations-worrying-new-policy-israeli-settlements
INFO:root:rank=3 pagerank=4.6599e-03 url=www.lawfareblog.com/dc-circuit-overrules-district-courts-due-process-ruling-qasim-v-trump
INFO:root:rank=4 pagerank=4.5934e-03 url=www.lawfareblog.com/donald-trump-and-politically-weaponized-executive-branch
INFO:root:rank=5 pagerank=4.3071e-03 url=www.lawfareblog.com/how-trumps-approach-middle-east-ignores-past-future-and-human-condition
INFO:root:rank=6 pagerank=4.0935e-03 url=www.lawfareblog.com/why-trump-cant-buy-greenland
INFO:root:rank=7 pagerank=3.7591e-03 url=www.lawfareblog.com/oral-argument-summary-qassim-v-trump
INFO:root:rank=8 pagerank=3.4509e-03 url=www.lawfareblog.com/dc-circuit-court-denies-trump-rehearing-mazars-case
INFO:root:rank=9 pagerank=3.4484e-03 url=www.lawfareblog.com/second-circuit-rules-mazars-must-hand-over-trump-tax-returns-new-york-prosecutors最後,我們將搜索查詢更改為“伊朗”:
[In]: run pagerank.py --data=./lawfareblog.csv.gz --search_query=iran
DEBUG:root:computing indices
DEBUG:root:computing values
INFO:gensim.models.keyedvectors:precomputing L2-norms of word weight vectors
INFO:root:rank=0 pagerank=5.1297e-03 url=www.lawfareblog.com/trump-administrations-worrying-new-policy-israeli-settlements
INFO:root:rank=1 pagerank=5.0168e-03 url=www.lawfareblog.com/update-military-commissions-continued-health-issues-recusal-motion-and-new-cell-al-iraqi
INFO:root:rank=2 pagerank=4.5746e-03 url=www.lawfareblog.com/praise-presidents-iran-tweets
INFO:root:rank=3 pagerank=4.4174e-03 url=www.lawfareblog.com/how-us-iran-tensions-could-disrupt-iraqs-fragile-peace
INFO:root:rank=4 pagerank=4.3659e-03 url=www.lawfareblog.com/haftar-attacking-tripoli-us-needs-re-engage-libya
INFO:root:rank=5 pagerank=3.4237e-03 url=www.lawfareblog.com/france-makes-play-try-foreign-fighters-iraq
INFO:root:rank=6 pagerank=2.6928e-03 url=www.lawfareblog.com/cyber-command-operational-update-clarifying-june-2019-iran-operation
INFO:root:rank=7 pagerank=2.2567e-03 url=www.lawfareblog.com/document-sens-kaine-and-young-introduce-bill-revoke-iraq-force-authorizations
INFO:root:rank=8 pagerank=1.9391e-03 url=www.lawfareblog.com/aborted-iran-strike-fine-line-between-necessity-and-revenge
INFO:root:rank=9 pagerank=1.8263e-03 url=www.lawfareblog.com/its-not-only-iraq-and-syria第3部分:對網頁結構的擔憂大多數網站都有很多結構,因為大多數頁面都連接到主頁和其他一些廣泛的頁面,例如www.lawfareblog.com/topics。因為Pagerank確實鏈接排名,所以許多頁面鏈接到的站點經常但並非總是具有更高的評分。如果我們檢查www.lawfareblog.com上最大的Pageranks,我們可以看到其中幾個寬頁出現:
run pagerank.py --data=./lawfareblog.csv.gz
DEBUG:root:computing indices
DEBUG:root:computing values
INFO:root:rank=0 pagerank=2.8741e-01 url=www.lawfareblog.com/about-lawfare-brief-history-term-and-site
INFO:root:rank=1 pagerank=2.8741e-01 url=www.lawfareblog.com/lawfare-job-board
INFO:root:rank=2 pagerank=2.8741e-01 url=www.lawfareblog.com/litigation-documents-resources-related-travel-ban
INFO:root:rank=3 pagerank=2.8741e-01 url=www.lawfareblog.com/subscribe-lawfare
INFO:root:rank=4 pagerank=2.8741e-01 url=www.lawfareblog.com/our-comments-policy
INFO:root:rank=5 pagerank=2.8741e-01 url=www.lawfareblog.com/upcoming-events
INFO:root:rank=6 pagerank=2.8741e-01 url=www.lawfareblog.com/support-lawfare
INFO:root:rank=7 pagerank=2.8741e-01 url=www.lawfareblog.com/snowden-revelations
INFO:root:rank=8 pagerank=2.8741e-01 url=www.lawfareblog.com/topics
INFO:root:rank=9 pagerank=2.8741e-01 url=www.lawfareblog.com/documents-related-mueller-investigation
但是,如果我們想了解博客最受歡迎的內容是什麼,這些頁面通常沒有用。為了收集有關文章的數據,這些數據通常比鏈接的頁面較少,而不是更寬的頁面,我們可以使用--filter_ratio參數。它刪除了“所有頁面都超過指定的口糧”(Izbicki教授)。現在,我們可以估計最重要的文章:
run pagerank.py --data=./lawfareblog.csv.gz --filter_ratio=0.2
DEBUG:root:computing indices
DEBUG:root:computing values
INFO:root:rank=0 pagerank=3.4696e-01 url=www.lawfareblog.com/trump-asks-supreme-court-stay-congressional-subpeona-tax-returns
INFO:root:rank=1 pagerank=2.9521e-01 url=www.lawfareblog.com/livestream-nov-21-impeachment-hearings-0
INFO:root:rank=2 pagerank=2.9040e-01 url=www.lawfareblog.com/opening-statement-david-holmes
INFO:root:rank=3 pagerank=1.5179e-01 url=www.lawfareblog.com/lawfare-podcast-ben-nimmo-whack-mole-game-disinformation
INFO:root:rank=4 pagerank=1.5099e-01 url=www.lawfareblog.com/todays-headlines-and-commentary-1964
INFO:root:rank=5 pagerank=1.5099e-01 url=www.lawfareblog.com/todays-headlines-and-commentary-1963
INFO:root:rank=6 pagerank=1.5071e-01 url=www.lawfareblog.com/lawfare-podcast-week-was-impeachment
INFO:root:rank=7 pagerank=1.4957e-01 url=www.lawfareblog.com/todays-headlines-and-commentary-1962
INFO:root:rank=8 pagerank=1.4367e-01 url=www.lawfareblog.com/cyberlaw-podcast-mistrusting-google
INFO:root:rank=9 pagerank=1.4240e-01 url=www.lawfareblog.com/lawfare-podcast-bonus-edition-gordon-sondland-vs-committee-no-bull
第4部分:eigengaps p barbar矩陣的特徵由
[In]: run pagerank.py --data=./lawfareblog.csv.gz --verbose
[In]: run pagerank.py --data=./lawfareblog.csv.gz --verbose --alpha=0.99999
[In]: run pagerank.py --data=./lawfareblog.csv.gz --verbose --filter_ratio=0.2
[In]: run pagerank.py --data=./lawfareblog.csv.gz --verbose --filter_ratio=0.2 --alpha=0.99999
雖然前三個命令進行了15、11和22次迭代,但最後一個命令進行了684次迭代。這是因為過濾的P矩陣具有較小的特徵,因此在較大的alpha邊界,收斂需要更長的時間。 Alpha值的這種變化也導致不同的Pagerank排名。首先,第三個命令的結果,該命令過濾P圖,但不使用大型alpha:
[In]: run pagerank.py --data=./lawfareblog.csv.gz --verbose --filter_ratio=0.2
INFO:root:rank=0 pagerank=3.4696e-01 url=www.lawfareblog.com/trump-asks-supreme-court-stay-congressional-subpeona-tax-returns
INFO:root:rank=1 pagerank=2.9521e-01 url=www.lawfareblog.com/livestream-nov-21-impeachment-hearings-0
INFO:root:rank=2 pagerank=2.9040e-01 url=www.lawfareblog.com/opening-statement-david-holmes
INFO:root:rank=3 pagerank=1.5179e-01 url=www.lawfareblog.com/lawfare-podcast-ben-nimmo-whack-mole-game-disinformation
INFO:root:rank=4 pagerank=1.5099e-01 url=www.lawfareblog.com/todays-headlines-and-commentary-1964
INFO:root:rank=5 pagerank=1.5099e-01 url=www.lawfareblog.com/todays-headlines-and-commentary-1963
INFO:root:rank=6 pagerank=1.5071e-01 url=www.lawfareblog.com/lawfare-podcast-week-was-impeachment
INFO:root:rank=7 pagerank=1.4957e-01 url=www.lawfareblog.com/todays-headlines-and-commentary-1962
INFO:root:rank=8 pagerank=1.4367e-01 url=www.lawfareblog.com/cyberlaw-podcast-mistrusting-google
INFO:root:rank=9 pagerank=1.4240e-01 url=www.lawfareblog.com/lawfare-podcast-bonus-edition-gordon-sondland-vs-committee-no-bull
然後,第四命令的結果,該命令過濾p圖並使用一個大的alpha:
[In]: run pagerank.py --data=./lawfareblog.csv.gz --verbose --filter_ratio=0.2 --alpha=0.99999
INFO:root:rank=0 pagerank=7.0149e-01 url=www.lawfareblog.com/covid-19-speech-and-surveillance-response
INFO:root:rank=1 pagerank=7.0149e-01 url=www.lawfareblog.com/lawfare-live-covid-19-speech-and-surveillance
INFO:root:rank=2 pagerank=1.0552e-01 url=www.lawfareblog.com/cost-using-zero-days
INFO:root:rank=3 pagerank=3.1755e-02 url=www.lawfareblog.com/lawfare-podcast-former-congressman-brian-baird-and-daniel-schuman-how-congress-can-continue-function
INFO:root:rank=4 pagerank=2.2040e-02 url=www.lawfareblog.com/events
INFO:root:rank=5 pagerank=1.6027e-02 url=www.lawfareblog.com/water-wars-increased-us-focus-indo-pacific
INFO:root:rank=6 pagerank=1.6026e-02 url=www.lawfareblog.com/water-wars-drill-maybe-drill
INFO:root:rank=7 pagerank=1.6023e-02 url=www.lawfareblog.com/water-wars-disjointed-operations-south-china-sea
INFO:root:rank=8 pagerank=1.6020e-02 url=www.lawfareblog.com/water-wars-song-oil-and-fire
INFO:root:rank=9 pagerank=1.6020e-02 url=www.lawfareblog.com/water-wars-sinking-feeling-philippine-china-relations 第1部分:基本功率方法實現個性化向量是通過查詢過濾的替代方法。個性化向量確定哪些網頁最常見於有關查詢本身的頁面。這與以前的--search_query參數不同,該參數確定了整個Pagerank,然後是與查詢相關的最高匹配的過濾器。為了證明這一點,我們可以在使用個性化向量與--search_query參數時比較相同查詢的排名。
[In]: run pagerank.py --data=./lawfareblog.csv.gz --filter_ratio=0.2 --personalization_vector_query=corona
INFO:gensim.models.keyedvectors:precomputing L2-norms of word weight vectors
INFO:root:rank=0 pagerank=6.3213e-01 url=www.lawfareblog.com/covid-19-speech-and-surveillance-response
INFO:root:rank=1 pagerank=6.3211e-01 url=www.lawfareblog.com/lawfare-live-covid-19-speech-and-surveillance
INFO:root:rank=2 pagerank=1.4962e-01 url=www.lawfareblog.com/chinatalk-how-party-takes-its-propaganda-global
INFO:root:rank=3 pagerank=1.1626e-01 url=www.lawfareblog.com/brexit-not-immune-coronavirus
INFO:root:rank=4 pagerank=1.1626e-01 url=www.lawfareblog.com/rational-security-my-corona-edition
INFO:root:rank=5 pagerank=8.8833e-02 url=www.lawfareblog.com/trump-cant-reopen-country-over-state-objections
INFO:root:rank=6 pagerank=8.5443e-02 url=www.lawfareblog.com/prosecuting-purposeful-coronavirus-exposure-terrorism
INFO:root:rank=7 pagerank=8.5443e-02 url=www.lawfareblog.com/britains-coronavirus-response
INFO:root:rank=8 pagerank=7.1883e-02 url=www.lawfareblog.com/lawfare-podcast-united-nations-and-coronavirus-crisis
INFO:root:rank=9 pagerank=6.8968e-02 url=www.lawfareblog.com/house-oversight-committee-holds-day-two-hearing-government-coronavirus-response [In]: run pagerank.py --data=./lawfareblog.csv.gz --filter_ratio=0.2 --search_query=corona
INFO:gensim.models.keyedvectors:precomputing L2-norms of word weight vectors
INFO:root:rank=0 pagerank=8.1320e-03 url=www.lawfareblog.com/house-oversight-committee-holds-day-two-hearing-government-coronavirus-response
INFO:root:rank=1 pagerank=7.7908e-03 url=www.lawfareblog.com/lawfare-podcast-united-nations-and-coronavirus-crisis
INFO:root:rank=2 pagerank=5.2262e-03 url=www.lawfareblog.com/livestream-house-oversight-committee-holds-hearing-government-coronavirus-response
INFO:root:rank=3 pagerank=3.9584e-03 url=www.lawfareblog.com/britains-coronavirus-response
INFO:root:rank=4 pagerank=3.8114e-03 url=www.lawfareblog.com/prosecuting-purposeful-coronavirus-exposure-terrorism
INFO:root:rank=5 pagerank=3.3973e-03 url=www.lawfareblog.com/paper-hearing-experts-debate-digital-contact-tracing-and-coronavirus-privacy-concerns
INFO:root:rank=6 pagerank=3.3633e-03 url=www.lawfareblog.com/cyberlaw-podcast-how-israel-fighting-coronavirus
INFO:root:rank=7 pagerank=3.3557e-03 url=www.lawfareblog.com/israeli-emergency-regulations-location-tracking-coronavirus-carriers
INFO:root:rank=8 pagerank=3.2160e-03 url=www.lawfareblog.com/congress-needs-coronavirus-failsafe-its-too-late
INFO:root:rank=9 pagerank=3.1036e-03 url=www.lawfareblog.com/why-congress-conducting-business-usual-face-coronavirus第2部分:查找相關的文章,但不要提及我們的查詢,個性化向量特別有用,因為它跟踪哪些文章與查詢最相關。這意味著,如果我們想了解間接效果我們的查詢可能會產生,我們仍然可以使用相同的個性化向量搜索查詢。同時,我們可以使用--search_query參數僅介紹這些文章不包括查詢本身。因為減號不會在Gensim模型中,所以這些結果通過而無需搜索相關單詞。我得到以下“電暈”的結果:
[In]: run pagerank.py --data=./lawfareblog.csv.gz --filter_ratio=0.2 --personalization_vector_query=corona --search_query=-corona
INFO:gensim.models.keyedvectors:precomputing L2-norms of word weight vectors
INFO:root:rank=0 pagerank=6.3213e-01 url=www.lawfareblog.com/covid-19-speech-and-surveillance-response
INFO:root:rank=1 pagerank=6.3211e-01 url=www.lawfareblog.com/lawfare-live-covid-19-speech-and-surveillance
INFO:root:rank=2 pagerank=1.4962e-01 url=www.lawfareblog.com/chinatalk-how-party-takes-its-propaganda-global
INFO:root:rank=3 pagerank=8.8833e-02 url=www.lawfareblog.com/trump-cant-reopen-country-over-state-objections
INFO:root:rank=4 pagerank=6.8562e-02 url=www.lawfareblog.com/lawfare-podcast-mom-and-dad-talk-clinical-trials-pandemic
INFO:root:rank=5 pagerank=6.5838e-02 url=www.lawfareblog.com/fault-lines-foreign-policy-quarantined
INFO:root:rank=6 pagerank=6.1389e-02 url=www.lawfareblog.com/limits-world-health-organization
INFO:root:rank=7 pagerank=5.5939e-02 url=www.lawfareblog.com/chinatalk-dispatches-shanghai-beijing-and-hong-kong
INFO:root:rank=8 pagerank=5.4060e-02 url=www.lawfareblog.com/trump-asks-supreme-court-stay-congressional-subpeona-tax-returns
INFO:root:rank=9 pagerank=4.9363e-02 url=www.lawfareblog.com/us-moves-dismiss-case-against-company-linked-ira-troll-farm儘管這些文章主要與冠狀病毒有關,但也有一些更明顯的主題。例如,有一個重要的導彈防禦聽證會排名第9,這可能是在大流行期間發生的,並受到它的影響,但顯然與冠狀病毒直接相關。
第3部分:查找相關的文章,但不要提及我們的查詢(續),我們可以查看以上文章相關的另一個示例,但沒有明確提及查詢。對於“伊朗”,我們得到以下結果:
[In]: run pagerank.py --data=./lawfareblog.csv.gz --filter_ratio=0.2 --personalization_vector_query=iran --search_query=-iran
INFO:gensim.models.keyedvectors:precomputing L2-norms of word weight vectors
INFO:root:rank=0 pagerank=3.9977e-01 url=www.lawfareblog.com/omphalos
INFO:root:rank=1 pagerank=2.4971e-01 url=www.lawfareblog.com/haftar-attacking-tripoli-us-needs-re-engage-libya
INFO:root:rank=2 pagerank=2.4398e-01 url=www.lawfareblog.com/cancellation-algerias-elections-opportunity-democratization
INFO:root:rank=3 pagerank=2.3989e-01 url=www.lawfareblog.com/yemen-houthi-strategy-has-promise-and-risk
INFO:root:rank=4 pagerank=2.3954e-01 url=www.lawfareblog.com/how-trumps-approach-middle-east-ignores-past-future-and-human-condition
INFO:root:rank=5 pagerank=2.1294e-01 url=www.lawfareblog.com/trump-asks-supreme-court-stay-congressional-subpeona-tax-returns
INFO:root:rank=6 pagerank=1.5762e-01 url=www.lawfareblog.com/livestream-nov-21-impeachment-hearings-0
INFO:root:rank=7 pagerank=1.5762e-01 url=www.lawfareblog.com/opening-statement-david-holmes
INFO:root:rank=8 pagerank=1.1823e-01 url=www.lawfareblog.com/trump-administrations-worrying-new-policy-israeli-settlements
INFO:root:rank=9 pagerank=8.0923e-02 url=www.lawfareblog.com/senate-examines-threats-homeland
我認為這是一個有趣的擴展,特別是要了解有關伊朗國際影響力的更多信息。在過去的幾十年中,伊朗在中東和北非的部分地區一直在慢慢發展。在排名最高的文章中,這種影響非常明顯,因為有有關阿爾及利亞,利比亞和以色列的文章。關於伊朗與美國關係的一些文章的出現也說明了2020年初引起的轟炸和緊張局勢。這些結果表明http://www.lawfareblog.com/認為美國關係和中東地區政治是與伊朗有關的一些重要主題。
這裡還有一些突出顯示更新的排名系統的示例:這些結果用於“無人機”:
run pagerank.py --data=./lawfareblog.csv.gz --search_query=drones
INFO:gensim.models.keyedvectors:precomputing L2-norms of word weight vectors
INFO:root:rank=0 pagerank=4.5715e-03 url=www.lawfareblog.com/why-did-you-wait-moral-emptiness-and-drone-strikes
INFO:root:rank=1 pagerank=3.1107e-03 url=www.lawfareblog.com/dc-district-court-dismisses-journalists-drone-lawsuit
INFO:root:rank=2 pagerank=2.0231e-03 url=www.lawfareblog.com/revived-cia-drone-strike-program-comments-new-policy
INFO:root:rank=3 pagerank=1.9667e-03 url=www.lawfareblog.com/us-court-appeals-dc-circuit-dismisses-suit-over-us-drone-strike
INFO:root:rank=4 pagerank=1.7738e-03 url=www.lawfareblog.com/whats-point-charging-foreign-state-linked-hackers
INFO:root:rank=5 pagerank=1.2885e-03 url=www.lawfareblog.com/video-justice-department-announces-indictment-two-chinese-government-hackers
INFO:root:rank=6 pagerank=1.1973e-03 url=www.lawfareblog.com/daisy-chain-associated-forces-potential-use-force-niger-against-al-mourabitoun
INFO:root:rank=7 pagerank=1.1788e-03 url=www.lawfareblog.com/iran-shoots-down-us-drone-domestic-and-international-legal-implications
INFO:root:rank=8 pagerank=1.1620e-03 url=www.lawfareblog.com/slaughterbots-and-other-anticipated-autonomous-weapons-problems
INFO:root:rank=9 pagerank=1.1506e-03 url=www.lawfareblog.com/dhss-joint-task-forces-next-chapter這些結果是針對“目標”:
run pagerank.py --data=./lawfareblog.csv.gz --search_query=targets
INFO:gensim.models.keyedvectors:precomputing L2-norms of word weight vectors
INFO:root:rank=0 pagerank=6.0299e-03 url=www.lawfareblog.com/update-military-commissions-big-september-911-case
INFO:root:rank=1 pagerank=5.0168e-03 url=www.lawfareblog.com/update-military-commissions-continued-health-issues-recusal-motion-and-new-cell-al-iraqi
INFO:root:rank=2 pagerank=4.6573e-03 url=www.lawfareblog.com/targeting-al-baghdadi-and-selective-notification-congress-assessing-issues
INFO:root:rank=3 pagerank=3.2598e-03 url=www.lawfareblog.com/federal-judge-dismisses-military-commissions-defendants-8th-amendment-claim
INFO:root:rank=4 pagerank=3.2329e-03 url=www.lawfareblog.com/military-commissions-judge-rules-against-government-privilege-claim-nashiri-case
INFO:root:rank=5 pagerank=3.2204e-03 url=www.lawfareblog.com/week-military-commissions-98-session-kangaroo-lapel-pin-edition
INFO:root:rank=6 pagerank=3.1839e-03 url=www.lawfareblog.com/military-commission-judge-bars-government-using-defendants-statements-fbi-clean-teams-911-case
INFO:root:rank=7 pagerank=3.1759e-03 url=www.lawfareblog.com/last-week-military-commissions-defense-moves-classification-review-process
INFO:root:rank=8 pagerank=3.1533e-03 url=www.lawfareblog.com/court-military-commissions-review-upholds-life-sentence-al-bahlul
INFO:root:rank=9 pagerank=2.9309e-03 url=www.lawfareblog.com/summary-dc-circuit-vacates-military-judges-rulings-al-nashiri