class UriParser

Provides an RFC 3986 compliant solution to URL parsing.

UriParser provides a method for parsing URLs that complies accurately with the RFC specification. Unlike the built-in function parse_url(), the parser in this library is based on the ABNF definition of the generic URI syntax. In other words, this library rejects any URL that is invalid according to the specification and parses valid URLs exactly as the specification defines them.
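To illustrate what parsing against the ABNF means, here is a minimal standalone sketch that checks a single rule, path-abempty, directly from its ABNF definition in RFC 3986. The function isValidPathAbempty() is hypothetical and is not how UriParser is implemented; it only shows the kind of strictness involved. By contrast, the PHP manual states that parse_url() is not meant to validate URLs and also accepts partial and invalid ones.

```php
<?php
// Hypothetical helper (not part of UriParser): checks a path against the
// RFC 3986 "path-abempty" rule used after an authority component:
//   path-abempty = *( "/" segment )
//   segment      = *pchar
//   pchar        = unreserved / pct-encoded / sub-delims / ":" / "@"
function isValidPathAbempty(string $path): bool
{
    // unreserved, sub-delims, ":" and "@" as single characters,
    // or a percent sign followed by exactly two hex digits
    $pchar = '(?:[A-Za-z0-9\-._~!$&\'()*+,;=:@]|%[0-9A-Fa-f]{2})';

    // zero or more "/" segment pairs, and nothing else
    return preg_match('#^(?:/' . $pchar . '*)*\z#', $path) === 1;
}

var_dump(isValidPathAbempty('/docs/a%20b')); // bool(true)
var_dump(isValidPathAbempty('/docs/a b'));   // bool(false): raw space
var_dump(isValidPathAbempty('/docs/%zz'));   // bool(false): bad escape
```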

While the intention of this library is to provide an accurate implementation of URL parsing, it is possible to use it for parsing any valid URI, since the parser is based simply on the generic URI syntax. Some of the features are just better suited to dealing with URLs. The parser does not, however, perform any additional validation based on the URI scheme.
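Because the generic syntax is scheme-independent, it also decomposes URIs that are not URLs, such as mailto: or urn: identifiers. The sketch below uses the regular expression from RFC 3986, Appendix B, which splits any URI reference into its five components. The function splitUriReference() is hypothetical and, unlike UriParser, performs no validation at all: the Appendix B expression matches every string. In this sketch an empty string stands for both an absent and an empty component.

```php
<?php
// Hypothetical helper: splits a URI reference with the regular expression
// from RFC 3986, Appendix B. It only decomposes the string; it does not
// validate it, so even malformed input produces a result.
function splitUriReference(string $uri): array
{
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
    preg_match($pattern, $uri, $m);

    // preg_match omits trailing unmatched groups, hence the ?? fallbacks
    return [
        'scheme'    => $m[2] ?? '',
        'authority' => $m[4] ?? '',
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? '',
        'fragment'  => $m[9] ?? '',
    ];
}

print_r(splitUriReference('mailto:user@example.com'));
print_r(splitUriReference('urn:isbn:0451450523'));
```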

While the RFC specification does not allow non-ASCII (UTF-8) characters in URIs, they are still commonly used, especially in user input. To accommodate this, the parser provides two additional compatibility modes that permit UTF-8 in some of the URI components, in addition to providing basic support for international domain names.

Constants

MODE_RFC3986

Parsing mode that conforms strictly to the RFC 3986 specification

MODE_UTF8

Parsing mode that allows UTF-8 characters in some URI components

MODE_IDNA2003

Parsing mode that also converts international domain names to ASCII

Methods

__construct()

Creates a new instance of UriParser.

setMode(int $mode)

Sets the parsing mode.

Uri|null parse(string $uri)

Parses the URL using the generic URI syntax.

Details

__construct()

Creates a new instance of UriParser.

setMode(int $mode)

Sets the parsing mode.

The parser supports three different parsing modes as indicated by the available parsing mode constants. The modes are as follows:

  • MODE_RFC3986 adheres strictly to the RFC specification and does not allow any non-ASCII characters in the URIs. This is the default mode.

  • MODE_UTF8 allows UTF-8 characters in the user information, path, query and fragment components of the URI. These characters will be converted to the appropriate percent-encoded sequences.

  • MODE_IDNA2003 also allows UTF-8 characters in the domain name and converts the international domain name to ASCII according to the IDNA 2003 standard.
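The conversion that MODE_UTF8 is described as performing can be illustrated with a small standalone sketch. The helper percentEncodeNonAscii() is hypothetical and not part of UriParser; it simply replaces every byte outside the ASCII range with a %XX sequence, assuming the input is already valid UTF-8 (the library may well check more than this). The IDNA 2003 host conversion of MODE_IDNA2003 is not covered here. The arrow function requires PHP 7.4 or later.

```php
<?php
// Hypothetical helper: percent-encodes every byte in the range 0x80-0xFF,
// which covers all bytes of multi-byte UTF-8 characters. ASCII characters,
// including existing "%XX" sequences, are left untouched. It assumes the
// input is valid UTF-8 and does not check this.
function percentEncodeNonAscii(string $component): string
{
    return preg_replace_callback(
        '/[\x80-\xFF]/', // no "u" modifier, so the match is per byte
        static fn (array $m): string => sprintf('%%%02X', ord($m[0])),
        $component
    );
}

// "\u{E9}" is "é", encoded in UTF-8 as the bytes C3 A9
echo percentEncodeNonAscii("/caf\u{E9}"), "\n"; // /caf%C3%A9
```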

Parameters

int $mode One of the parsing mode constants

Uri|null parse(string $uri)

Parses the URL using the generic URI syntax.

This method returns the Uri instance constructed from the components parsed from the URL. The URL is parsed using either the absolute URI pattern or the relative URI pattern based on which one matches the provided string. If the URL cannot be parsed as a valid URI, null is returned instead.
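The choice between the absolute and the relative pattern can be sketched as follows. classifyUriReference() is hypothetical: it returns a label instead of a Uri instance, and its character check is a coarse stand-in for the full ABNF, so it accepts some strings the real parser would reject. It relies on two rules from RFC 3986: a URI with a scheme starts with ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":", and a relative reference may not contain ":" before its first "/", "?" or "#".

```php
<?php
// Hypothetical sketch: decides between the absolute and the relative
// pattern and returns null when neither applies. The character check is
// coarse and does not reproduce the full RFC 3986 grammar.
function classifyUriReference(string $uri): ?string
{
    // Every character must be unreserved, reserved or part of "%XX".
    $allowed = '~^(?:[A-Za-z0-9._\~:/?#\[\]@!$&\'()*+,;=-]|%[0-9A-Fa-f]{2})*\z~';
    if (preg_match($allowed, $uri) !== 1) {
        return null;
    }

    // scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":"
    if (preg_match('/^[A-Za-z][A-Za-z0-9+.-]*:/', $uri) === 1) {
        return 'absolute';
    }

    // Without a valid scheme, a ":" before the first "/", "?" or "#"
    // cannot be part of a relative reference (path-noscheme rule).
    if (preg_match('~^[^/?#]*:~', $uri) === 1) {
        return null;
    }

    return 'relative';
}

var_dump(classifyUriReference('http://example.com/')); // string(8) "absolute"
var_dump(classifyUriReference('../a/b?c'));            // string(8) "relative"
var_dump(classifyUriReference('/a b'));                // NULL
```

Since parse() returns null for invalid input, calling code should compare its result against null before using it.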

Parameters

string $uri The URL to parse

Return Value

Uri|null The parsed URL or null if the URL is invalid