PHP Web Scraping
This list contains PHP libraries related to web scraping and data processing.
Web-scraping Frameworks
HTML/XML Parsing
Text processing
Specific Formats Processing
Natural Language Processing
Browser automation and emulation
Cloud Computing
URL Manipulation
Web Content Extracting
DNS Resolving
Computer Vision
API Clients
Other PHP Lists
Guzzle - A comprehensive HTTP client.
Buzz - Another HTTP client.
Requests - A simple HTTP library.
HTTPFul - A chainable HTTP client.
Goutte - A simple web scraper.
PHP Spider - A comprehensive web spider.
Web-Scraping Frameworks
Crawler (crwlr) - A library for rapid (web) crawler and scraper development.
Roach - A port of the popular Scrapy package for Python. Includes adapters for Laravel and Symfony.
HTML/XML Parsing
HTML5 PHP - An HTML5 parser and serializer library.
QueryPath - A jQuery-like library for working with XML and HTML documents in PHP. It supports HTML5 via the HTML5-PHP project.
DiDOM - A very fast HTML parser built on top of plain PHP.
PHPScraper - A highly opinionated web interface.
DomCrawler (Symfony) - A component that eases DOM navigation for HTML and XML documents.
Text Processing
Libraries for parsing and manipulating plain text.
ANSI to HTML5 - An ANSI to HTML5 converter library.
Patchwork UTF-8 - A portable library for working with UTF-8 strings.
Hoa String - Another UTF-8 string library.
Stringy - A string manipulation library with multibyte support.
Color Jizz - A library for manipulating and converting colours.
Text - A text manipulation library.
Flux - A regular expression building library.
Device Detector - Another library for parsing user agent strings.
Mobile-Detect - A lightweight PHP class for detecting mobile devices (including tablets).
UA Parser - A library for parsing user agent strings.
Units of measure
ByteUnits - A library to parse, format and convert byte units in binary and metric systems.
PHP Units of Measure - A library for converting between units of measure.
PHP Conversion - Another library for converting between units of measure.
Phone number
LibPhoneNumber for PHP - A PHP implementation of Google's phone number handling library.
Specific Formats Processing
Libraries for parsing and manipulating specific text formats.
CSV - A CSV data manipulation library.
PHPWord - A library for working with Microsoft Word documents.
PHPExcel - A library for working with Microsoft Excel documents.
PHPPowerPoint - A library for working with Microsoft PowerPoint documents.
ExcelAnt - A library for manipulating Microsoft Excel documents.
PHP Markdown - A Markdown parser.
CommonMark PHP - A Markdown parser which supports the full CommonMark spec.
Parsedown - Another Markdown parser.
Ciconia - Another Markdown parser that supports GitHub Flavored Markdown.
Cebe Markdown - A fast and extensible Markdown parser.
Decoda - A lightweight lexical string parser for BBCode styled markup.
JsonMapper - A library that maps nested JSON structures onto PHP classes.
vobject - The VObject library allows you to easily parse and manipulate iCalendar and vCard objects.
File Type Detection
Hoa Mime - Another MIME detection library.
Canal - A library to determine internet media types.
Apache MIME Types - A library that parses Apache MIME types.
GeoJSON - A GeoJSON implementation.
Natural Language Processing
Libraries for working with human languages.
PHP NlpTools - Natural language processing tools in PHP.
nlpTools - A natural language processing toolkit for PHP.
Browser automation and emulation
php-webdriver - A PHP client for WebDriver.
PHP PhantomJS - Executes PhantomJS commands through PHP.
Mink - A universal API for multiple browser emulators (Selenium, Zombie.js, Goutte).
Spork - A process forking library.
Libraries for asynchronous network programming.
React - An event driven non-blocking I/O library.
Rx.PHP - A reactive extension library.
Hoa EventSource - An event source library.
Evenement - An event dispatcher library.
Event - An event library with a focus on domain events.
Broadway - An event source and CQRS library.
Pheanstalk - A Beanstalkd client library.
PHP AMQP - A pure PHP AMQP library.
Thumper - A RabbitMQ pattern library.
Bernard - A multibackend abstraction library.
Cloud Computing
Libraries for parsing email.
Email Reply Parser - An email reply parser library.
Email Validator - A small email address validation library.
URL Manipulation
Libraries for parsing URLs.
Purl - A URL manipulation library.
PHP Domain Parser - A domain suffix parser library.
Uri (The PHP League) - A simple URL manipulation library (PSR-7 compatible).
Url (crwlr) - Swiss Army knife for urls.
Web Content Extracting
Youtube-Downloader - A PHP script for downloading videos from YouTube; it also parses YouTube feeds into RSS enclosures for podcatchers.
Libraries for working with WebSocket.
Ratchet - A web socket library.
Hoa WebSocket - Another web socket library.
DNS Resolving
Net_DNS2 - A native PHP DNS resolver and updater.
Computer Vision
OpenCV-for-PHP - An OpenCV binding for PHP.
Other PHP lists