PHP Web Scraping
This list contains PHP libraries related to web scraping and data processing.
Network
Web-Scraping Frameworks
HTML/XML Parsing
Text Processing
Specific Formats Processing
Natural Language Processing
Browser Automation and Emulation
Multiprocessing
Asynchronous
Queue
Cloud Computing
Email
URL Manipulation
Web Content Extracting
WebSocket
DNS Resolving
Computer Vision
Geocoding
API Clients
Other PHP Lists
Network
Guzzle - A comprehensive HTTP client.
Buzz - Another HTTP client.
Requests - A simple HTTP library.
HTTPFul - A chainable HTTP client.
Goutte - A simple web scraper.
PHP Spider - A comprehensive web spider.
Web-Scraping Frameworks
Crawler - (crwlr) - A library for rapid (web) crawler and scraper development.
Roach - A port of the popular Scrapy framework for Python. Includes adapters for Laravel and Symfony.
HTML/XML Parsing
HTML5 PHP - An HTML5 parser and serializer library.
QueryPath - A jQuery-like library for working with XML and HTML documents in PHP. It now supports HTML5 via the HTML5-PHP project.
DiDOM - A fast HTML parser built on top of PHP's native DOM extension.
PHPScraper - A highly opinionated interface for scraping the web.
DomCrawler - (Symfony) - The DomCrawler component eases DOM navigation for HTML and XML documents.
Text Processing
Libraries for parsing and manipulating plain texts.
General
ANSI to HTML5 - An ANSI to HTML5 converter library.
Patchwork UTF-8 - A portable library for working with UTF-8 strings.
Hoa String - Another UTF-8 string library.
Stringy - A string manipulation library with multibyte support.
Color Jizz - A library for manipulating and converting colours.
Text - A text manipulation library.
Flux - A regular expression building library.
User-agent
UA Parser - A library for parsing user agent strings.
Device Detector - Another library for parsing user agent strings.
Mobile-Detect - A lightweight PHP class for detecting mobile devices (including tablets).
Units of measure
ByteUnits - A library to parse, format and convert byte units in binary and metric systems.
PHP Units of Measure - A library for converting between units of measure.
PHP Conversion - Another library for converting between units of measure.
Phone number
LibPhoneNumber for PHP - A PHP implementation of Google's phone number handling library.
Specific Formats Processing
Libraries for parsing and manipulating specific text formats.
CSV
CSV - A CSV data manipulation library.
Office
PHPWord - A library for working with Microsoft Word documents.
PHPExcel - A library for working with Microsoft Excel documents.
PHPPowerPoint - A library for working with Microsoft PowerPoint documents.
ExcelAnt - A library for manipulating Microsoft Excel documents.
Markdown
PHP Markdown - A Markdown parser.
CommonMark PHP - A Markdown parser which supports the full CommonMark spec.
Parsedown - Another Markdown parser.
Ciconia - Another Markdown parser that supports GitHub-flavoured Markdown.
Cebe Markdown - A fast and extensible Markdown parser.
BBCode
Decoda - A lightweight lexical string parser for BBCode styled markup.
JSON
JsonMapper - A library that maps nested JSON structures onto PHP classes.
vCard
vobject - The VObject library allows you to easily parse and manipulate iCalendar and vCard objects.
File Type Detection
Hoa Mime - A MIME detection library.
Canal - A library to determine internet media types.
Apache MIME Types - A library that parses Apache MIME types.
GeoJSON
GeoJSON - A GeoJSON implementation.
Natural Language Processing
Libraries for working with human languages.
PHP NlpTools - Natural Language Processing Tools in PHP
nlpTools - Natural Language Processing Toolkit for PHP
Browser Automation and Emulation
php-webdriver - A PHP client for WebDriver.
PHP PhantomJS - Execute PhantomJS commands through PHP.
Mink - A universal API for multiple browser emulators (Selenium, Zombie.js, Goutte).
Multiprocessing
Spork - A process forking library.
Asynchronous
Libraries for asynchronous network programming.
React - An event driven non-blocking I/O library.
Rx.PHP - A reactive extension library.
Hoa EventSource - An event source library.
Evenement - An event dispatcher library.
Event - An event library with a focus on domain events.
Broadway - An event source and CQRS library.
Queue
Pheanstalk - A Beanstalkd client library.
PHP AMQP - A pure PHP AMQP library.
Thumper - A RabbitMQ pattern library.
Bernard - A multi-backend abstraction library.
Cloud Computing
TODO
Email
Libraries for parsing email.
Email Reply Parser - An email reply parser library.
Email Validator - A small email address validation library.
URL Manipulation
Libraries for parsing URLs.
Purl - A URL manipulation library.
PHP Domain Parser - A domain suffix parser library.
Uri (The PHP League) - A simple URL manipulation library (PSR-7 compatible).
Url (crwlr) - A Swiss Army knife for URLs.
Web Content Extracting
Video
Youtube-Downloader - A PHP script for downloading videos from YouTube; it can also parse YouTube feeds into RSS enclosures for podcatchers.
WebSocket
Libraries for working with WebSocket.
Ratchet - A WebSocket library.
Hoa WebSocket - Another WebSocket library.
Elephant.io - Yet another WebSocket library.
DNS Resolving
Net_DNS2 - Native PHP DNS Resolver and Updater
Computer Vision
OpenCV-for-PHP - An OpenCV binding for PHP
Geocoding
Other PHP Lists