autoscraper

AutoScraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python

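This project makes automatic web scraping easy. It takes the URL or HTML content of a web page together with a list of sample data that you want to scrape from that page. The sample data can be text, a URL, or any HTML tag value on the page. AutoScraper learns the scraping rules and returns similar elements. You can then use the learned object with new URLs to get similar content, or the exact same elements, from those pages.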

Installation

It is compatible with Python 3.

Install the latest version from the git repository using pip:

$ pip install git+https://github.com/alirezamika/autoscraper.git

Install from PyPI:

$ pip install autoscraper

Install from source:

$ python setup.py install

Usage

Getting similar results

Say we want to fetch all related post titles on a Stack Overflow page:

from autoscraper import AutoScraper

url = 'https://stackoverflow.com/questions/2081586/web-scraping-with-python'

# We can add one or multiple candidates here.
# You can also put urls here to retrieve urls.
wanted_list = ["What are metaclasses in Python?"]

scraper = AutoScraper()
result = scraper.build(url, wanted_list)
print(result)

Here is the output:

[
    'How do I merge two dictionaries in a single expression in Python (taking union of dictionaries)?',
    'How to call an external command?',
    'What are metaclasses in Python?',
    'Does Python have a ternary conditional operator?',
    'How do you remove duplicates from a list whilst preserving order?',
    'Convert bytes to a string',
    'How to get line count of a large file cheaply in Python?',
    "Does Python have a string 'contains' substring method?",
    'Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?'
]

Now you can use the scraper object to get the related topics of any Stack Overflow page:

scraper.get_result_similar('https://stackoverflow.com/questions/606191/convert-bytes-to-a-string')

Getting exact results

Say we want to scrape live stock prices from Yahoo Finance:

from autoscraper import AutoScraper

url = 'https://finance.yahoo.com/quote/AAPL/'

wanted_list = ["124.81"]

scraper = AutoScraper()

# Here we can also pass html content via the html parameter instead of the url (html=html_content)
result = scraper.build(url, wanted_list)
print(result)

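Note that the content of the page changes dynamically, so you need to update wanted_list with a current value if you copy this code.

You can also pass any custom requests module parameter. For example, you may want to use proxies or custom headers: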

proxies = {
    "http": 'http://127.0.0.1:8001',
    "https": 'https://127.0.0.1:8001',
}

result = scraper.build(url, wanted_list, request_args=dict(proxies=proxies))
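
Since request_args is an ordinary dictionary of keyword arguments for the requests library, proxies, headers, and a timeout can be assembled in one place. The helper below is only a minimal sketch: `make_request_args`, the `my-scraper/0.1` user agent string, and the 10-second default timeout are illustrative assumptions and not part of autoscraper.

```python
def make_request_args(proxy=None, user_agent=None, timeout=10):
    """Assemble keyword arguments meant to be passed as request_args.

    Hypothetical helper: only the resulting dict is handed to autoscraper.
    """
    args = {"timeout": timeout}  # seconds, a standard requests option
    if proxy:
        # Route both plain and TLS traffic through the same proxy.
        args["proxies"] = {"http": proxy, "https": proxy}
    if user_agent:
        args["headers"] = {"User-Agent": user_agent}
    return args


request_args = make_request_args(
    proxy="http://127.0.0.1:8001",
    user_agent="my-scraper/0.1",
)

# Intended use with the scraper from the example above:
# result = scraper.build(url, wanted_list, request_args=request_args)
```

Keeping the options in one dictionary makes it straightforward to reuse the same settings for later calls.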

Now we can get the price of any symbol:

scraper.get_result_exact('https://finance.yahoo.com/quote/MSFT/')

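You may want to get other information as well. For example, to get the market cap too, just append a sample of it to the wanted list. The get_result_exact method retrieves the data in the same order as the wanted list.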
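
Because the exact results come back in the order of the wanted list, the flat list can be paired with field names. The following is a small sketch under that assumption: `label_results`, the field names, and the sample values are hypothetical and stand in for the return value of a real get_result_exact call.

```python
def label_results(fields, values):
    """Pair field names with results returned in wanted-list order."""
    if len(fields) != len(values):
        # A shorter list may mean a value was not found on the new page.
        raise ValueError(
            f"expected {len(fields)} values, got {len(values)}"
        )
    return dict(zip(fields, values))


# Placeholder for: values = scraper.get_result_exact(url)
values = ["124.81", "2.1T"]
record = label_results(["price", "market_cap"], values)
print(record)
```

The length check is there so that a missing value raises an error instead of silently shifting the remaining values onto the wrong field names.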

Another example: say we want to scrape the About text, the number of stars, and the link to the issues page from GitHub repository pages:

from autoscraper import AutoScraper

url = 'https://github.com/alirezamika/autoscraper'

wanted_list = [
    'A Smart, Automatic, Fast and Lightweight Web Scraper for Python',
    '6.2k',
    'https://github.com/alirezamika/autoscraper/issues',
]

scraper = AutoScraper()
scraper.build(url, wanted_list)

Simple, right?

Saving the model

We can now save the built model to use it later. To save:

# Give it a file path
scraper.save('yahoo-finance')

And to load:

scraper.load('yahoo-finance')
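
The save and load calls can be combined so that the rules are learned only once and reused on later runs. This is a sketch, assuming that save writes a file at exactly the given path; `load_or_build` is a hypothetical helper that accepts any object with the build, save, and load methods shown above.

```python
import os


def load_or_build(scraper, model_path, url, wanted_list):
    """Reuse a saved model if one exists; otherwise learn and save it."""
    if os.path.exists(model_path):
        # A model was saved on an earlier run, so skip the learning step.
        scraper.load(model_path)
    else:
        scraper.build(url, wanted_list)
        scraper.save(model_path)
    return scraper
```

With the Yahoo Finance example this would be called as `load_or_build(AutoScraper(), 'yahoo-finance', url, wanted_list)`. If the page layout changes, deleting the saved file forces the rules to be learned again.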

Tutorials

See this gist for more advanced usages.
AutoScraper and Flask: Create an API From Any Website in Less Than 5 Minutes

Issues

Feel free to open an issue if you have any problem using the module.

Support the project

Happy Coding ♥

Additional information

Version: 1.1.14
Type: Other source code
Updated: 2025-02-25
Size: 12.55KB
Source: GitHub
