Crawl Website in Python


Last updated 4 years ago

This page explains how to use Python to extract information (the page title) from a live website. The code is fairly simple; to follow it, we need to be comfortable using a terminal and have basic programming knowledge. Resources and libraries used:

  • A terminal window

  • Python 3 installed and accessible via the terminal window

  • pip, the standard Python package installer

  • requests - a popular and simple HTTP library

  • Beautiful Soup - a library used to parse HTML and extract information with ease

  • 10 minutes to understand and type the commands

Let's start writing code.

Check Python is installed

Python comes preinstalled on most Linux distributions and on many macOS versions; on Windows it must be downloaded and installed. Once it is properly installed, we can start the Python console by typing python in the terminal (on some systems the command is python3).

$ python
Python 3.8.4 (tags/v3.8.4:dfa645a, Jul 13 2020, 16:46:45) [MSC v.1924 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
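
The same check can also be made from inside Python. This is a minimal sketch, assuming that Python 3.6 or newer is enough for this tutorial; the threshold and the helper name `python_is_recent` are illustrative choices, not requirements stated on this page.

```python
import sys

def python_is_recent(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum`.

    The (3, 6) default is an assumed threshold, used only for illustration.
    """
    return sys.version_info[:2] >= minimum

print(sys.version.split()[0])   # version string of the running interpreter
print(python_is_recent())
```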

Install libraries

  • Requests - simple HTTP library for Python, built for human beings.

  • Beautiful Soup - Python library for pulling data out of HTML and XML files.

$ pip install requests
$ pip install BeautifulSoup4
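
To confirm that both installs succeeded before going further, the import names can be looked up without actually importing the libraries. This is a small sketch using the standard library's `importlib`; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

def missing_packages(import_names):
    """Return the names from `import_names` that cannot be imported."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]

# Beautiful Soup is installed as "beautifulsoup4" but imported as "bs4".
print(missing_packages(["requests", "bs4"]))  # an empty list means both are available
```

If a name shows up in the list, rerunning the matching pip command in the same environment is usually the first thing to try.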

Write code in Python Console

The first step is to import the libraries used in our code:

>>> import requests                        # HTTP library used to download the page
>>> from bs4 import BeautifulSoup as bs    # HTML parser, imported under the short name bs

Once the libraries are imported, we can use the helpers they expose. The following code snippet defines a variable that holds the website address and downloads the page using the requests library.

>>> site = 'https://google.com'            # define the website we want to process
>>> page = requests.get( site )            # download the page
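
A network request can fail or hang, so a script version of this step usually adds a timeout and a status check. This is a minimal sketch, assuming a 10-second timeout is acceptable; the helper name `download` is hypothetical and not part of requests.

```python
import requests

def download(url, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()   # turn 4xx/5xx responses into exceptions
    except requests.RequestException as error:
        # Covers connection errors, timeouts, bad URLs and HTTP error codes.
        print(f"Could not download {url}: {error}")
        return None
    return response.content
```

Calling `download('https://google.com')` returns the same bytes as `page.content` in the console session, or `None` when something goes wrong.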

At this point, the downloaded page can be passed to Beautiful Soup and parsed.

>>> soup = bs(page.content, 'html.parser') # Parse the downloaded page with BeautifulSoup
>>> soup.title                             # Print the title   
<title>Google</title>
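
Note that `soup.title` returns the whole `<title>` tag. To get only the text, and to cope with pages that have no title, something like the following can be used. It is a sketch that parses an inline HTML string so it runs without network access; `page_title` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

def page_title(html):
    """Return the text of the <title> tag, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)

sample = "<html><head><title>Example page</title></head><body></body></html>"
print(page_title(sample))                  # Example page
print(page_title("<p>no title here</p>"))  # None
```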

This simple tutorial should encourage curious minds to explore other Python topics and try to code more useful things. Here is a short list of suggestions:

  • List all images of a web page

  • List the inner links (to other pages, the same domain)

  • List the outer links (external websites)
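
For the two link suggestions, most of the work is deciding whether a URL belongs to the same domain. This is a minimal sketch using only the standard library's `urllib.parse`; it assumes "same domain" means an identical host name, and the helper name `split_links` is hypothetical.

```python
from urllib.parse import urljoin, urlparse

def split_links(base_url, hrefs):
    """Split hrefs into (inner, outer) lists of absolute URLs.

    A link is "inner" when its host equals the host of base_url
    (an assumed definition; subdomains count as outer here).
    """
    base_host = urlparse(base_url).netloc.lower()
    inner, outer = [], []
    for href in hrefs:
        absolute = urljoin(base_url, href)       # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue                             # skip mailto:, javascript:, etc.
        if parsed.netloc.lower() == base_host:
            inner.append(absolute)
        else:
            outer.append(absolute)
    return inner, outer

inner, outer = split_links(
    "https://example.com/docs/",
    ["/about", "page.html", "https://python.org/", "mailto:me@example.com"],
)
print(inner)  # ['https://example.com/about', 'https://example.com/docs/page.html']
print(outer)  # ['https://python.org/']
```

With a parsed `soup` object, the link addresses can be collected with `[a["href"] for a in soup.find_all("a", href=True)]`, and the image addresses for the first suggestion with `[img["src"] for img in soup.find_all("img", src=True)]`.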

Links & Resources

Python - the official website

Python Cheatsheet - this site should make you curious

Join AppSeed - For support and production-ready starters