
shootingstar failure log: launching a Selenium scraping Flask server on EC2

by RIEM 2023. 4. 14.

Connect to the EC2 instance (Ubuntu 22.04)

git clone https://github.com/thursdaycurry/shootingstar-scraper-flask.git

 

sudo apt-get update
sudo apt-get upgrade
sudo apt install openjdk-8-jre
sudo apt install openjdk-8-jdk
sudo apt install python3-pip
sudo pip3 install flask
sudo pip3 install python-dotenv
sudo pip3 install selenium

A problem occurs

flask --app main.py run
Traceback (most recent call last):
  File "/usr/local/bin/flask", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 1050, in main
    cli.main()
  File "/usr/lib/python3/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 84, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 911, in run_command
    raise e from None
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 897, in run_command
    app = info.load_app()
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 308, in load_app
    app = locate_app(import_name, name)
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 218, in locate_app
    __import__(module_name)
  File "/home/ubuntu/shootingstar-scraper-flask/main.py", line 18, in <module>
    from scrapers.agent_newyorktimes import getArticlesFromNewyorkTimes, target_url_nyt
  File "/home/ubuntu/shootingstar-scraper-flask/scrapers/agent_newyorktimes.py", line 31, in <module>
    driver = webdriver.Chrome(executable_path=driver_path, options=options)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/chrome/webdriver.py", line 80, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/chromium/webdriver.py", line 104, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 286, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 378, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 440, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 245, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
Stacktrace:
#0 0x55ffc2526fe3 <unknown>
#1 0x55ffc2265d36 <unknown>
#2 0x55ffc228cf4a <unknown>
#3 0x55ffc228aa9b <unknown>
#4 0x55ffc22ccaf7 <unknown>
#5 0x55ffc22cc11f <unknown>
#6 0x55ffc22c3693 <unknown>
#7 0x55ffc229603a <unknown>
#8 0x55ffc229717e <unknown>
#9 0x55ffc24e8dbd <unknown>
#10 0x55ffc24ecc6c <unknown>
#11 0x55ffc24f64b0 <unknown>
#12 0x55ffc24edd63 <unknown>
#13 0x55ffc24c0c35 <unknown>
#14 0x55ffc2511138 <unknown>
#15 0x55ffc25112c7 <unknown>
#16 0x55ffc251f093 <unknown>
#17 0x7f8e35874b43 <unknown>

 

selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary

The error message says that the Chrome binary file cannot be found.
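Before starting the server, it can help to confirm whether any Chrome executable is visible on PATH at all. The following is a minimal sketch, not part of the original scraper: `find_chrome_binary` is a hypothetical helper, and the candidate names are assumptions based on common Linux package names.

```python
import shutil

# Assumed executable names; adjust for the distribution in use.
CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium-browser",
    "chromium",
)


def find_chrome_binary(candidates=CANDIDATES, which=shutil.which):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_chrome_binary()
    print(found or "no Chrome binary found on PATH")
```

If this prints that nothing was found, the browser itself is probably missing, regardless of which ChromeDriver or Selenium version is installed.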

 

sudo pip install selenium==3.141.0
I suspect this problem is specific to my program. It was written to run on Selenium 3.141, so an error occurs unless that version is installed.
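A quick way to see which Selenium version is actually installed is to ask the package metadata. This is a rough sketch under the assumption stated above, that the scraper targets the 3.x line; `major_version` and `installed_selenium_version` are hypothetical helper names.

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '3.141.0'."""
    return int(version.split(".")[0])


def installed_selenium_version():
    """Return the installed selenium version string, or None if absent."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_selenium_version()
    if version is None:
        print("selenium is not installed")
    elif major_version(version) != 3:
        # Assumption: the scraper was written against Selenium 3.141.
        print(f"selenium {version} found; the scraper expects 3.141")
    else:
        print(f"selenium {version} matches the expected major version")
```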

A different error occurs

ubuntu@ip-172-31-84-33:~/shootingstar-scraper-flask$ flask --app main.py run
Traceback (most recent call last):
  File "/usr/local/bin/flask", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 1050, in main
    cli.main()
  File "/usr/lib/python3/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 84, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 911, in run_command
    raise e from None
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 897, in run_command
    app = info.load_app()
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 308, in load_app
    app = locate_app(import_name, name)
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 218, in locate_app
    __import__(module_name)
  File "/home/ubuntu/shootingstar-scraper-flask/main.py", line 18, in <module>
    from scrapers.agent_newyorktimes import getArticlesFromNewyorkTimes, target_url_nyt
  File "/home/ubuntu/shootingstar-scraper-flask/scrapers/agent_newyorktimes.py", line 31, in <module>
    driver = webdriver.Chrome(executable_path=driver_path, options=options)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/common/service.py", line 81, in start
    raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Yet another error occurred. It looks like an issue with the ChromeDriver path.

# EC2 linux server
driver_path = "/home/ubuntu/shootingstar-scraper/chromedriver_linux64/chromedriver"
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--headless')
driver = webdriver.Chrome(executable_path=driver_path, options=options)

I checked the driver path variable in the scraper. The code above is how the driver location is currently set.

ubuntu@ip-172-31-84-33:~/shootingstar-scraper-flask$ pwd
/home/ubuntu/shootingstar-scraper-flask

 

Running pwd showed why the path did not match: the application directory itself had been renamed.

# EC2 linux server
driver_path = "/home/ubuntu/shootingstar-scraper-flask/chromedriver_linux64/chromedriver"
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--headless')
driver = webdriver.Chrome(executable_path=driver_path, options=options)

I corrected the application directory part of the driver path.
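A hardcoded absolute path breaks whenever the project directory is renamed, which is what happened here. One way to avoid that is to derive the path from the location of the scraper file. This is only a sketch: `project_root_of` and `chromedriver_path` are hypothetical helpers, and it assumes the scraper module sits one level below the project root, as `scrapers/agent_newyorktimes.py` does in the traceback.

```python
from pathlib import Path


def project_root_of(module_file):
    """Return the directory two levels above a module file.

    Assumes a layout of <project root>/scrapers/<module>.py.
    """
    return Path(module_file).resolve().parent.parent


def chromedriver_path(project_root):
    """Build the driver path under the given project root directory."""
    return Path(project_root) / "chromedriver_linux64" / "chromedriver"
```

Inside the scraper module this would be used as `driver_path = str(chromedriver_path(project_root_of(__file__)))`, so the path follows the directory wherever it is cloned.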

The binary error again

ubuntu@ip-172-31-84-33:~/shootingstar-scraper-flask$ flask --app main.py run
Traceback (most recent call last):
  File "/usr/local/bin/flask", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 1050, in main
    cli.main()
  File "/usr/lib/python3/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 84, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 911, in run_command
    raise e from None
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 897, in run_command
    app = info.load_app()
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 308, in load_app
    app = locate_app(import_name, name)
  File "/usr/local/lib/python3.10/dist-packages/flask/cli.py", line 218, in locate_app
    __import__(module_name)
  File "/home/ubuntu/shootingstar-scraper-flask/main.py", line 18, in <module>
    from scrapers.agent_newyorktimes import getArticlesFromNewyorkTimes, target_url_nyt
  File "/home/ubuntu/shootingstar-scraper-flask/scrapers/agent_newyorktimes.py", line 31, in <module>
    driver = webdriver.Chrome(executable_path=driver_path, options=options)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary

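The Chrome binary problem came back. So it does not look like a Selenium version issue after all, which should have been obvious. Most likely the Chrome browser itself had simply not been installed on the instance yet.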

# Install related dependencies
sudo apt-get update
sudo apt-get install -y unzip xvfb libxi6 libgconf-2-4

wget https://chromedriver.storage.googleapis.com/112.0.5615.49/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/local/bin/
sudo chown root:root /usr/local/bin/chromedriver
sudo chmod +x /usr/local/bin/chromedriver

Following https://sites.google.com/chromium.org/driver/?pli=1, I installed the stable version, 112.0.5615.49.

# Install Google Chrome
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt-get install -f
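
ChromeDriver is pinned to 112.0.5615.49 above, while the .deb installs whatever the current stable Chrome is, so the two can drift apart. ChromeDriver generally expects the same major version as the browser. The sketch below compares the two from the text printed by `google-chrome --version` and `chromedriver --version`; the helper names are hypothetical and the sample strings in the usage note are illustrative inputs, not output captured from this server.

```python
import re
import subprocess


def major_version(version_output):
    """Extract the major version from text like 'Google Chrome 112.0.5615.49'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, chromedriver_output):
    """Return True when both outputs report the same major version."""
    return major_version(chrome_output) == major_version(chromedriver_output)


def read_version(command):
    """Run '<command> --version' and return its standard output."""
    result = subprocess.run(
        [command, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `versions_match(read_version("google-chrome"), read_version("chromedriver"))` can be run on the instance, and `versions_match("Google Chrome 112.0.5615.49", "ChromeDriver 112.0.5615.49")` evaluates to True.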

The next step was supposed to be setting the path, but I tried running the server first just in case..

Why does it run now? Even so, GET requests sent to the public address get no response, even though I opened port 5000.
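
One possible explanation, which this log does not confirm: `flask run` listens on 127.0.0.1 by default, so the server is reachable only from the instance itself unless it is started with `--host=0.0.0.0`. A small TCP check can help separate that from a security group problem. This is a sketch using only the standard library, and `is_port_open` is a hypothetical helper.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `is_port_open("127.0.0.1", 5000)` on the instance and then the same function with the public address from another machine narrows things down. If the first succeeds and the second fails, the bind address or the inbound rule is the likelier culprit rather than the scraper code.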

Wondering whether the way the Chrome options were passed might be the problem, I changed it once more: options -> chrome_options

driver = webdriver.Chrome(executable_path=driver_path, chrome_options=options)

Even so, nothing changed..

 
