
matecat/xml-dom-parser

Matecat XML DOM Parser

A lightweight PHP library for parsing XML and HTML documents into traversable object structures. Built on top of PHP's DOMDocument with enhanced error handling and validation support.

Requirements

  • PHP 8.3+
  • ext-dom
  • ext-xml
  • ext-libxml

Installation

composer require matecat/xml-dom-parser

Usage

Parsing XML

use Matecat\XmlParser\XmlParser;

$xml = '<?xml version="1.0"?>
<note>
    <to>Tove</to>
    <from>Jani</from>
    <message>Hello!</message>
</note>';

$parsed = XmlParser::parse($xml);

// Access elements
echo $parsed[0]->tagName;              // "to"
echo $parsed[0]->inner_html[0]->text;  // "Tove"

Parsing XML Fragments

use Matecat\XmlParser\XmlParser;

$fragment = '<tag id="1">Content</tag><tag id="2">More content</tag>';

$parsed = XmlParser::parse($fragment, isXmlFragment: true);

echo $parsed[0]->attributes['id'];     // "1"
echo $parsed[1]->inner_html[0]->text;  // "More content"
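A fragment with several top-level elements is not a well-formed XML document by itself, so plain DOM cannot load it directly. The sketch below shows the general technique of wrapping the fragment in a synthetic root element before loading it with PHP's built-in DOMDocument. It is an illustration under that assumption, not this library's internal code, and the wrapper name `fragment-root` is hypothetical.

```php
<?php
// Plain-DOM sketch (not this library's code): wrap the fragment in a
// synthetic root so DOMDocument accepts multiple top-level elements.
// "fragment-root" is a hypothetical wrapper name.
$fragment = '<tag id="1">Content</tag><tag id="2">More content</tag>';

$dom = new DOMDocument();
$loaded = $dom->loadXML('<fragment-root>' . $fragment . '</fragment-root>');

$ids = [];
$texts = [];
foreach ($dom->documentElement->childNodes as $child) {
    // Only collect element nodes; whitespace text nodes would be skipped.
    if ($child instanceof DOMElement) {
        $ids[] = $child->getAttribute('id');
        $texts[] = $child->textContent;
    }
}

echo implode(', ', $ids), PHP_EOL; // 1, 2
echo $texts[1], PHP_EOL;           // More content
```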

Parsing HTML

use Matecat\XmlParser\HtmlParser;

$html = '<html><head><title>Test</title></head><body><div>Content</div></body></html>';

$parsed = HtmlParser::parse($html);

echo $parsed[0]->inner_html[0]->tagName;  // "head"
echo $parsed[0]->inner_html[1]->tagName;  // "body"
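For comparison, this is roughly how the same HTML looks through PHP's built-in DOMDocument, which the library is built on. It is a sketch of the underlying DOM API only. libxml's HTML parser warns about tags it does not recognise, so errors are collected internally rather than printed.

```php
<?php
// Plain-DOM sketch: load HTML with DOMDocument and list the children of <html>.
$html = '<html><head><title>Test</title></head><body><div>Content</div></body></html>';

// Collect libxml warnings instead of emitting them, then restore the setting.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$childTags = [];
foreach ($dom->documentElement->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $childTags[] = $child->tagName;
    }
}

echo implode(', ', $childTags), PHP_EOL; // head, body
```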

Using XmlDomLoader Directly

For more control over XML loading and validation:

use Matecat\XmlParser\XmlDomLoader;
use Matecat\XmlParser\Config;

// Basic loading
$dom = XmlDomLoader::load($xmlContent);

// With configuration
$config = new Config(
    setRootElement: 'root',        // Wrap content in a root element
    allowDocumentType: false,      // Reject DOCTYPE declarations
    xmlOptions: LIBXML_NONET,      // libxml options
    schemaOrCallable: '/path/to/schema.xsd'  // XSD validation
);

$dom = XmlDomLoader::load($xmlContent, $config);
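How `XmlDomLoader` applies `allowDocumentType` and `xmlOptions` internally is not described here. As a rough plain-DOM approximation of those two options, one can pass `LIBXML_NONET` to block network access during loading and reject any document that carries a DOCTYPE. The helper name `loadWithoutDoctype()` is hypothetical.

```php
<?php
// Plain-DOM approximation (not this library's code) of two Config options:
// LIBXML_NONET disables network access, and DOCTYPE declarations are rejected.
// loadWithoutDoctype() is a hypothetical helper name.
function loadWithoutDoctype(string $xml): DOMDocument
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $ok = $dom->loadXML($xml, LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        throw new InvalidArgumentException('Malformed XML');
    }
    // DOMDocument::$doctype is null when the input has no DOCTYPE declaration.
    if ($dom->doctype !== null) {
        throw new InvalidArgumentException('DOCTYPE declarations are not allowed');
    }
    return $dom;
}

$dom = loadWithoutDoctype('<root><item/></root>');
echo $dom->documentElement->tagName, PHP_EOL; // root

$rejectedMessage = null;
try {
    loadWithoutDoctype('<!DOCTYPE root><root/>');
} catch (InvalidArgumentException $e) {
    $rejectedMessage = $e->getMessage();
    echo $rejectedMessage, PHP_EOL; // DOCTYPE declarations are not allowed
}
```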

Schema Validation

Validate XML against an XSD schema:

use Matecat\XmlParser\XmlDomLoader;
use Matecat\XmlParser\Config;

$config = new Config(schemaOrCallable: '/path/to/schema.xsd');
$dom = XmlDomLoader::load($xml, $config);
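XSD validation in PHP ultimately goes through libxml's schema support. The sketch below uses the built-in `DOMDocument::schemaValidateSource()` with a small made-up schema to show what "valid against an XSD" means in practice. With a schema file on disk, `$dom->schemaValidate('/path/to/schema.xsd')` is the equivalent call. The helper `isValid()` is hypothetical.

```php
<?php
// Plain-DOM sketch of XSD validation; the schema and isValid() are made up.
$xsd = <<<XSD
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="message" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

function isValid(string $xml, string $xsd): bool
{
    // Collect libxml errors quietly; a real loader would report them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $valid = $dom->loadXML($xml) && $dom->schemaValidateSource($xsd);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $valid;
}

$good = isValid('<note><to>Tove</to><message>Hello!</message></note>', $xsd);
$bad  = isValid('<note><message>Hello!</message></note>', $xsd); // missing <to>

var_dump($good, $bad); // bool(true), then bool(false)
```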

Or use a custom validation callable:

use Matecat\XmlParser\Config;
use Matecat\XmlParser\XmlDomLoader;
use DOMDocument;

$validator = function (DOMDocument $dom, bool $internalErrors): bool {
    // Custom validation logic
    return $dom->getElementsByTagName('required-element')->length > 0;
};

$config = new Config(schemaOrCallable: $validator);
$dom = XmlDomLoader::load($xml, $config);
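Because the validator is an ordinary closure, its logic can be checked against plain DOMDocument instances before it is passed to `Config`. In this sketch the second argument is simply passed as `false`; what `XmlDomLoader` actually supplies for `$internalErrors` is not documented in this README.

```php
<?php
// Exercise the validator closure directly against two plain DOM documents.
$validator = function (DOMDocument $dom, bool $internalErrors): bool {
    // Accept only documents that contain at least one <required-element>.
    return $dom->getElementsByTagName('required-element')->length > 0;
};

$withElement = new DOMDocument();
$withElement->loadXML('<root><required-element/></root>');

$withoutElement = new DOMDocument();
$withoutElement->loadXML('<root><other/></root>');

$accepted = $validator($withElement, false);
$rejected = $validator($withoutElement, false);

var_dump($accepted, $rejected); // bool(true), then bool(false)
```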

Parsed Element Structure

Each parsed element is an object with the following properties:

Property      Type          Description
node          string        The raw XML/HTML of the element
tagName       string        The element's tag name
attributes    array         Key-value pairs of attributes
text          string|null   Text content (for text nodes)
self_closed   bool|null     Whether the element is self-closing
has_children  bool|null     Whether the element has child nodes
inner_html    ArrayObject   Child nodes (elements and text nodes)
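To make the structure concrete, here is a hypothetical sketch of how a DOM tree could be mapped onto the properties listed above using only built-in DOM classes. `toElement()` is an illustrative helper, not part of this library, and the real parser may populate these fields differently. In particular, libxml does not record whether the source wrote `<a/>` or `<a></a>`, so treating a childless element as self-closing is an assumption.

```php
<?php
// Hypothetical mapping of a DOMNode onto the documented properties.
// toElement() is illustrative only; it is not this library's implementation.
function toElement(DOMNode $node): stdClass
{
    $el = new stdClass();
    $el->node = $node->ownerDocument->saveXML($node); // raw markup of this node
    $el->tagName = $node instanceof DOMElement ? $node->tagName : null;
    $el->attributes = [];
    $el->text = $node instanceof DOMText ? $node->nodeValue : null;
    $el->self_closed = null;
    $el->has_children = $node->hasChildNodes();
    $el->inner_html = new ArrayObject();

    if ($node instanceof DOMElement) {
        foreach ($node->attributes as $attr) {
            $el->attributes[$attr->name] = $attr->value;
        }
        // Assumption: an element without children is reported as self-closing.
        $el->self_closed = !$node->hasChildNodes();
    }
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $el->inner_html->append(toElement($child)); // recurse into children
        }
    }
    return $el;
}

$dom = new DOMDocument();
$dom->loadXML('<tag id="1">Content</tag>');
$el = toElement($dom->documentElement);

echo $el->tagName, PHP_EOL;             // tag
echo $el->attributes['id'], PHP_EOL;    // 1
echo $el->inner_html[0]->text, PHP_EOL; // Content
```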

Exception Handling

The library throws specific exceptions for different error conditions:

  • XmlParsingException - XML syntax errors or validation failures
  • InvalidXmlException - XML is well-formed but invalid against schema
  • DomDependecyMissingException - Required PHP extensions are missing

use Matecat\XmlParser\XmlParser;
use Matecat\XmlParser\Exception\XmlParsingException;

try {
    $parsed = XmlParser::parse('<invalid><xml>');
} catch (XmlParsingException $e) {
    echo "Parse error: " . $e->getMessage();
}
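These exceptions sit on top of libxml's own error reporting. As a point of reference (PHP's built-in API, not how this library builds its exception messages), this is how plain DOM surfaces the same malformed input: with internal error handling enabled, `loadXML()` returns false and each problem can be read back as a `LibXMLError`.

```php
<?php
// Plain-DOM sketch: collect libxml parse errors instead of emitting warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadXML('<invalid><xml>'); // unclosed tags, so parsing fails

$messages = [];
foreach (libxml_get_errors() as $error) {
    // Each LibXMLError carries level, code, message, line and column.
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();
libxml_use_internal_errors($previous);

var_dump($loaded); // bool(false)
echo implode(PHP_EOL, $messages), PHP_EOL;
```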

License

This project is licensed under the MIT License.
