A lightweight PHP library for parsing XML and HTML documents into traversable object structures, built on top of PHP's `DOMDocument` with enhanced error handling and validation support.
- PHP 8.3+
- ext-dom
- ext-xml
- ext-libxml
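To confirm at runtime that the current PHP build meets these requirements, here is a minimal sketch that uses the built-in `extension_loaded()` function and the `PHP_VERSION_ID` constant. The `missingExtensions()` helper is hypothetical and is not part of the library.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not part of matecat/xml-dom-parser):
// returns the required extensions that are not loaded in this PHP build.
function missingExtensions(array $required = ['dom', 'xml', 'libxml']): array
{
    return array_values(array_filter(
        $required,
        static fn (string $extension): bool => !extension_loaded($extension)
    ));
}

// PHP_VERSION_ID encodes 8.3.0 as 80300.
if (PHP_VERSION_ID < 80300) {
    echo 'PHP 8.3+ is required, running ', PHP_VERSION, PHP_EOL;
}

$missing = missingExtensions();
if ($missing !== []) {
    echo 'Missing extensions: ', implode(', ', $missing), PHP_EOL;
}
```

Composer checks a package's declared platform requirements when it installs it. Not every deployment runs Composer on the target machine, though, so a check like this can be a useful safeguard.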
Install with Composer:

```bash
composer require matecat/xml-dom-parser
```

Parse an XML document:

```php
require 'vendor/autoload.php';

use Matecat\XmlParser\XmlParser;

$xml = '<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<message>Hello!</message>
</note>';

$parsed = XmlParser::parse($xml);

// Access elements
echo $parsed[0]->tagName;                // "to"
echo $parsed[0]->inner_html[0]->text;    // "Tove"
```

Parse an XML fragment that has several top-level elements and no single root:

```php
use Matecat\XmlParser\XmlParser;

$fragment = '<tag id="1">Content</tag><tag id="2">More content</tag>';
$parsed = XmlParser::parse($fragment, isXmlFragment: true);

echo $parsed[0]->attributes['id'];       // "1"
echo $parsed[1]->inner_html[0]->text;    // "More content"
```

Parse an HTML document:

```php
use Matecat\XmlParser\HtmlParser;

$html = '<html><head><title>Test</title></head><body><div>Content</div></body></html>';
$parsed = HtmlParser::parse($html);

echo $parsed[0]->inner_html[0]->tagName; // "head"
echo $parsed[0]->inner_html[1]->tagName; // "body"
```

For more control over XML loading and validation, use `XmlDomLoader` with an optional `Config`:
```php
use Matecat\XmlParser\XmlDomLoader;
use Matecat\XmlParser\Config;

// Basic loading
$dom = XmlDomLoader::load($xmlContent);

// With configuration
$config = new Config(
    setRootElement: 'root',                  // Wrap content in a root element
    allowDocumentType: false,                // Reject DOCTYPE declarations
    xmlOptions: LIBXML_NONET,                // libxml parser flags (LIBXML_NONET disables network access)
    schemaOrCallable: '/path/to/schema.xsd'  // XSD validation
);
$dom = XmlDomLoader::load($xmlContent, $config);
```

Validate XML against an XSD schema by passing the schema path as `schemaOrCallable`:
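```php
use Matecat\XmlParser\XmlDomLoader;
use Matecat\XmlParser\Config;

$config = new Config(schemaOrCallable: '/path/to/schema.xsd');
$dom = XmlDomLoader::load($xml, $config);
```

Alternatively, pass a custom validation callable as `schemaOrCallable`. It receives the loaded `DOMDocument` and returns whether the document is valid: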
```php
use Matecat\XmlParser\Config;
use Matecat\XmlParser\XmlDomLoader;
use DOMDocument;

$validator = function (DOMDocument $dom, bool $internalErrors): bool {
    // Custom validation logic: accept the document only if it
    // contains at least one <required-element>
    return $dom->getElementsByTagName('required-element')->length > 0;
};

$config = new Config(schemaOrCallable: $validator);
$dom = XmlDomLoader::load($xml, $config);
```

Each parsed element is an object with the following properties:
| Property | Type | Description |
|---|---|---|
| `node` | `string` | The raw XML/HTML of the element |
| `tagName` | `string` | The element's tag name |
| `attributes` | `array` | Key-value pairs of attributes |
| `text` | `string\|null` | Text content (for text nodes) |
| `self_closed` | `bool\|null` | Whether the element is self-closing |
| `has_children` | `bool\|null` | Whether the element has child nodes |
| `inner_html` | `ArrayObject` | Child nodes (elements and text nodes) |
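The following plain-PHP sketch makes this element shape concrete. It walks a `DOMDocument` and builds arrays with the same keys. It is not the library's implementation. The helpers `toElementArray()` and `childrenOf()` are hypothetical, and the result uses arrays rather than objects. Because DOM does not record whether the source wrote `<tag/>`, `self_closed` is only approximated. The sketch also wraps the fragment from the `isXmlFragment` example in a synthetic `<root>` element. That step is an assumption suggested by the `setRootElement` option, not a description of the library's internals.

```php
<?php

declare(strict_types=1);

// Hypothetical helper, not the library's implementation: maps an element or
// text node to an array with the same keys as the properties listed above.
function toElementArray(DOMElement|DOMText $node): array
{
    $markup = $node->ownerDocument->saveXML($node);

    if ($node instanceof DOMText) {
        // Text nodes only carry their raw markup and text content.
        return ['node' => $markup, 'text' => $node->textContent];
    }

    $attributes = [];
    foreach ($node->attributes as $attr) {
        $attributes[$attr->name] = $attr->value;
    }

    return [
        'node'         => $markup,
        'tagName'      => $node->tagName,
        'attributes'   => $attributes,
        'text'         => null,
        // DOM does not record whether the source used <tag/>, so this sketch
        // approximates "self-closed" as "has no child nodes".
        'self_closed'  => !$node->hasChildNodes(),
        'has_children' => $node->hasChildNodes(),
        'inner_html'   => childrenOf($node),
    ];
}

// Converts the element and text children of $parent, skipping comments,
// processing instructions and other node types.
function childrenOf(DOMNode $parent): ArrayObject
{
    $children = new ArrayObject();
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement || $child instanceof DOMText) {
            $children[] = toElementArray($child);
        }
    }
    return $children;
}

// Wrap the fragment in a synthetic root so DOMDocument accepts several
// top-level elements, then return the root's children.
$fragment = '<tag id="1">Content</tag><tag id="2">More content</tag>';
$dom = new DOMDocument();
$dom->loadXML('<root>' . $fragment . '</root>');
$elements = childrenOf($dom->documentElement);

echo $elements[0]['attributes']['id'], PHP_EOL;      // 1
echo $elements[1]['inner_html'][0]['text'], PHP_EOL; // More content
```

Returning the root's children, rather than the root itself, is what lets a fragment with several top-level elements come back as a flat, indexable list.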
The library throws specific exceptions for different error conditions:
- `XmlParsingException`: XML syntax errors or validation failures
- `InvalidXmlException`: XML is well-formed but invalid against the schema
- `DomDependecyMissingException`: required PHP extensions are missing
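The split between syntax errors and schema violations reflects two separate failure points in PHP's DOM extension. `DOMDocument::loadXML()` returns `false` for malformed input. `DOMDocument::schemaValidateSource()` returns `false` for a well-formed document that breaks the schema. The plain-PHP sketch below (ext-dom and ext-libxml only) collects libxml's error messages and raises a different exception for each case. The classes `SketchParsingException` and `SketchInvalidXmlException` and the function `loadAndValidate()` are hypothetical stand-ins, not the library's code.

```php
<?php

declare(strict_types=1);

// Hypothetical exception types standing in for the library's own classes.
final class SketchParsingException extends RuntimeException {}
final class SketchInvalidXmlException extends RuntimeException {}

// Joins the errors libxml collected since the last libxml_clear_errors() call.
function libxmlErrorSummary(): string
{
    $messages = array_map(
        static fn (LibXMLError $error): string => sprintf('line %d: %s', $error->line, trim($error->message)),
        libxml_get_errors()
    );
    return implode('; ', $messages);
}

// Hypothetical loader: malformed XML and schema violations raise different exceptions.
function loadAndValidate(string $xml, string $xsdSource): DOMDocument
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    try {
        $dom = new DOMDocument();
        if (!$dom->loadXML($xml, LIBXML_NONET)) {
            throw new SketchParsingException(libxmlErrorSummary());
        }
        if (!$dom->schemaValidateSource($xsdSource)) {
            throw new SketchInvalidXmlException(libxmlErrorSummary());
        }
        return $dom;
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous); // restore the caller's setting
    }
}

$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

try {
    // Well-formed, but <from> is not allowed by the schema.
    loadAndValidate('<note><from>Jani</from></note>', $xsd);
} catch (SketchParsingException | SketchInvalidXmlException $e) {
    echo get_class($e), ': ', $e->getMessage(), PHP_EOL;
}
```

With the library itself, you catch its own exception classes instead, for example: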
```php
use Matecat\XmlParser\XmlParser;
use Matecat\XmlParser\Exception\XmlParsingException;

try {
    $parsed = XmlParser::parse('<invalid><xml>');
} catch (XmlParsingException $e) {
    echo "Parse error: " . $e->getMessage();
}
```

This project is licensed under the MIT License.