AppSeed Docs


Parse HTML Components

This page explains how to parse and extract information from a page (local or remote).


Last updated 3 years ago


Parsing HTML and extracting the relevant information is useful in many contexts: scanning a page for a price change, extracting a component, detecting broken links, etc.
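To make the broken-link use case concrete, here is a minimal sketch of its first half: collecting every link on a page and resolving relative URLs. It assumes the HTML is already loaded as a string; the function name `collect_links` and the sample URLs are hypothetical, and actually testing each URL (for example with one HTTP request per link) is left out.

```python
from lxml import html


def collect_links(html_page, base_url):
    """Return the absolute href of every <a> element in html_page."""
    doc = html.fromstring(html_page)
    # Rewrite relative hrefs such as "/docs" into absolute URLs based on base_url
    doc.make_links_absolute(base_url)
    # The XPath expression selects the href attribute values directly
    return [str(href) for href in doc.xpath('//a/@href')]


page = '<html><body><a href="/docs">Docs</a> <a href="https://example.org/">Ext</a></body></html>'
print(collect_links(page, 'https://example.com/'))
```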

AppSeed, in particular, uses HTML parsing for two things:

  • Page structure detection

  • Component extraction
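AppSeed's own structure-detection logic is not described on this page. As a rough illustration only, assuming that "structure" means the direct children of `<body>` and their `id` attributes, a first outline of a page can be obtained with a single Lxml XPath query (`page_outline` is a hypothetical helper name):

```python
from lxml import html


def page_outline(html_page):
    """List (tag, id) for each direct child element of <body>."""
    doc = html.fromstring(html_page)
    # '*' matches elements only, so whitespace text and comments are skipped
    return [(el.tag, el.get('id')) for el in doc.xpath('//body/*')]


page = """<html><body>
<header id="navbar"></header>
<section id="features"><p>...</p></section>
<footer></footer>
</body></html>"""
print(page_outline(page))
```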

For newcomers: AppSeed uses automation tools to convert lifeless UI Kits into simple starters generated in many frameworks and patterns. For instance, Pixel Lite, an open-source design provided by Themesberg, has been translated to Flask and Django using HTML parsing as the first phase of the translation process.

Required libraries and tools

  • Python - interpreter

  • Beautiful Soup - a well-known parsing library

  • Lxml - used to compensate for BS4 limitations
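Beautiful Soup does not evaluate XPath expressions, which is one limitation Lxml makes up for here. The comparison below uses a hypothetical toy snippet: BS4 reaches the element through a CSS selector (`select_one` relies on the soupsieve package, which recent beautifulsoup4 releases install as a dependency), while Lxml accepts an XPath expression directly.

```python
from bs4 import BeautifulSoup
from lxml import html

snippet = '<html><body><section id="features"><h2>80 components</h2></section></body></html>'

# Beautiful Soup: no XPath support, so use a CSS selector instead
soup = BeautifulSoup(snippet, 'html.parser')
bs4_title = soup.select_one('#features h2').get_text()

# Lxml: an XPath expression (as copied from the browser) works as-is
dom = html.fromstring(snippet)
lxml_title = dom.xpath('//*[@id="features"]//h2')[0].text_content()

print(bs4_title, '|', lxml_title)
```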

The process

The flow explained in this article consists of a few simple steps:

  • Load the HTML content - this can be done from a local file or from a LIVE website

  • Analyze the page and extract the XPath expression for a component

  • Use the Lxml library to extract the HTML

  • Format the component and save it on disk
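The same steps can also be chained in a script instead of a console session. This is a sketch under stated assumptions, not AppSeed's implementation: the page is already loaded (step 1 happens before the call), the first XPath match is taken as the component, and `extract_component` and its parameters are hypothetical names.

```python
from bs4 import BeautifulSoup
from lxml import html
from lxml.etree import tostring


def extract_component(html_page, xpath, out_path):
    """Extract the first element matching xpath, prettify it and save it to out_path."""
    # Step 2: parse the already loaded page and evaluate the XPath expression
    matches = html.fromstring(html_page).xpath(xpath)
    if not matches:
        raise ValueError(f'No element matches {xpath!r}')
    # Step 3: serialize the first match back to HTML text, without trailing tail text
    raw = tostring(matches[0], encoding='unicode', method='html', with_tail=False)
    # Step 4: format the fragment with Beautiful Soup and save it on disk
    pretty = BeautifulSoup(raw, 'html.parser').prettify()
    with open(out_path, 'w', encoding='utf-8') as out:
        out.write(pretty)
    return pretty
```

For example, `extract_component(html_page, '//*[@id="features"]', 'features.html')` would save the Pixel Lite features section to a hypothetical `features.html` file in the current directory.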

Install libraries via PIP

$ pip install requests 
$ pip install lxml
$ pip install beautifulsoup4

From this point on, all the code is typed in a Python console:

$ python [ENTER]
>>>

Load the content from a local file

>>> with open('./app/templates/index.html', 'r', encoding='utf-8') as f:
...     html_page = f.read()
...

Load the content from a LIVE website

>>> import requests
>>> page = requests.get('https://demo.themesberg.com/pixel-lite/index.html')
>>> html_page = page.content
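The two loading options can be wrapped in one helper. A minimal sketch, assuming anything starting with `http://` or `https://` is remote and everything else is a local path; `load_html` and its `timeout` parameter are hypothetical names. It returns `response.text` (a decoded `str`) rather than `page.content` (raw `bytes`), and `raise_for_status()` stops early instead of parsing an error page.

```python
def load_html(source, timeout=10):
    """Return the HTML text of a local file or an http(s) URL."""
    if source.startswith(('http://', 'https://')):
        import requests  # only needed for remote pages

        response = requests.get(source, timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
        return response.text  # decoded str; response.content would be raw bytes
    with open(source, 'r', encoding='utf-8') as f:
        return f.read()
```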

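At this point, the html_page variable holds the entire HTML content (a str when read from the local file, bytes when taken from page.content), and we can pass it to BS4 or Lxml to extract the components. To visualize the page structure, we can use the browser's developer tools:

[Image: HTML Parser - Target Component Inspection.]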

The target component will be extracted using an XPath expression copied from the browser's developer tools:

//*[@id="features"]
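The browser is not the only way to obtain an XPath expression. As a sketch on a hypothetical snippet, Lxml can locate an element by id with `get_element_by_id()` and then generate a structural, absolute path to it with `getroottree().getpath()`:

```python
from lxml import html

page = '<html><body><div>intro</div><section id="features"><p>x</p></section></body></html>'
doc = html.fromstring(page)

# Find the element by its id attribute, then ask the tree for a structural path to it
section = doc.get_element_by_id('features')
path = doc.getroottree().getpath(section)
print(path)

# The generated expression resolves back to the same element
print(doc.xpath(path) == [section])  # True
```

Unlike the browser's id-based expression, the generated path is positional, so it usually breaks sooner when elements are added before the target; the rest of this page keeps using the id-based form.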

To extract the component, the XPath expression //*[@id="features"] is passed to the Lxml library, which isolates the matching code:

>>> from lxml import html
>>> html_dom = html.fromstring( html_page )
>>> component = html_dom.xpath( '//*[@id="features"]' )
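Note that `xpath()` always returns a list, even when the expression can only match one element, which is why the next step indexes it with `[0]`. A quick check on a hypothetical snippet:

```python
from lxml import html

page = '<html><body><section class="section" id="features"><h2>80 components</h2></section></body></html>'
component = html.fromstring(page).xpath('//*[@id="features"]')

# xpath() returns a list of matches, even for an id-based expression
print(len(component))             # 1
print(component[0].tag)           # section
print(component[0].get('class'))  # section

# An expression with no match yields an empty list rather than an error
print(html.fromstring(page).xpath('//*[@id="missing"]'))  # []
```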
  

To extract the raw HTML from the component object, we use the tostring helper exposed by the Lxml library (xpath() returns a list, so the first match is taken):

>>> from lxml.etree import tostring
>>> component_html = tostring( component[0] )

The next step is to load the fragment into Beautiful Soup and prettify the HTML before saving it on disk:

>>> from bs4 import BeautifulSoup as bs
>>> soup = bs( component_html, 'html.parser' )
>>> print( soup.prettify() )

The component is fully extracted and parsable:

  <section class="section section-lg pb-0" id="features">
   <div class="container">
    <div class="row">
     
     ...
     
     <div class="col-12 col-md-4">
      <div class="icon-box text-center mb-5 mb-md-0">
       <div class="icon icon-shape icon-lg bg-white shadow-lg border-light rounded-circle icon-secondary mb-3">
        <span class="fas fa-box-open">
        </span>
       </div>
       <h2 class="my-3 h5">
        80 components
       </h2>
       <p class="px-lg-4">
        Beatifully crafted and creative components made with great care for each pixel
       </p>
      </div>
     </div>
     
     ...
     
     </div>
    </div>
   </div>
  </section>
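Because the component is now a standalone fragment, Beautiful Soup can query it further. As an illustration, assuming every feature box carries the `icon-box` class as in the output above, this sketch collects each title and description from a trimmed, hypothetical sample of the component:

```python
from bs4 import BeautifulSoup

# Trimmed sample based on the extracted component (one feature box)
component_html = """
<section class="section section-lg pb-0" id="features">
  <div class="col-12 col-md-4">
    <div class="icon-box text-center mb-5 mb-md-0">
      <h2 class="my-3 h5">80 components</h2>
      <p class="px-lg-4">Beautifully crafted and creative components</p>
    </div>
  </div>
</section>
"""

soup = BeautifulSoup(component_html, 'html.parser')
# One dict per feature box: the <h2> holds the title, the <p> the description
features = [
    {'title': box.h2.get_text(strip=True), 'text': box.p.get_text(strip=True)}
    for box in soup.select('.icon-box')
]
print(features)
```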

The rendered version:

[Image: HTML Parser - Extracted Component.]

Resources

Load content from remote HTML file (the LIVE sample)

Use XPath in Beautiful Soup - related article published on StackOverflow

Web Scraping - the right way (with sample)

How to get the content from Lxml object - StackOverflow article
