XML Parsers
-
template<typename _Handler, typename _Config = sax_parser_default_config>
class orcus::sax_parser : public orcus::sax::parser_base
Template-based SAX parser that calls its handler directly instead of going through function pointers, which gives better performance, especially on large XML streams.
Public Functions
-
sax_parser(const char *content, const size_t size, handler_type &handler)
-
sax_parser(const char *content, const size_t size, bool transient_stream, handler_type &handler)
-
~sax_parser()
-
void parse()
-
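The point about avoiding function pointers can be illustrated with a toy example. The sketch below is not orcus code: `toy_sax_parser`, `recording_handler`, and `run_toy_parser` are hypothetical names, and the scanner recognises only bare start tags, end tags, and text. It only shows how a parser templated on its handler type can invoke callbacks as ordinary member calls that the compiler resolves at compile time.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical toy scanner, NOT the orcus implementation. It recognises only
// <name>, </name> and plain text: no attributes, entities or error reporting.
template<typename Handler>
class toy_sax_parser
{
    std::string_view m_content;
    Handler& m_handler;

public:
    toy_sax_parser(const char* content, std::size_t size, Handler& handler) :
        m_content(content, size), m_handler(handler) {}

    void parse()
    {
        std::size_t pos = 0;
        while (pos < m_content.size())
        {
            if (m_content[pos] == '<')
            {
                std::size_t close = m_content.find('>', pos);
                if (close == std::string_view::npos)
                    return; // malformed input: this sketch simply stops

                std::string_view tag = m_content.substr(pos + 1, close - pos - 1);

                // Plain member calls on the template parameter type; no
                // function pointer or virtual dispatch is involved.
                if (!tag.empty() && tag.front() == '/')
                    m_handler.end_element(tag.substr(1));
                else
                    m_handler.start_element(tag);

                pos = close + 1;
            }
            else
            {
                std::size_t next = m_content.find('<', pos);
                if (next == std::string_view::npos)
                    next = m_content.size();

                m_handler.characters(m_content.substr(pos, next - pos));
                pos = next;
            }
        }
    }
};

// Hypothetical handler that records each event as a string.
struct recording_handler
{
    std::vector<std::string> events;

    void start_element(std::string_view name) { events.push_back("start:" + std::string(name)); }
    void end_element(std::string_view name) { events.push_back("end:" + std::string(name)); }
    void characters(std::string_view text) { events.push_back("text:" + std::string(text)); }
};

// Convenience wrapper: parse a string and return the recorded events.
std::vector<std::string> run_toy_parser(const std::string& xml)
{
    recording_handler handler;
    toy_sax_parser<recording_handler> parser(xml.data(), xml.size(), handler);
    parser.parse();
    return handler.events;
}
```

Because the handler type is a template parameter, a handler only needs to provide member functions with matching names. It does not have to derive from a common base class in this sketch.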
-
template<typename _Handler>
class orcus::sax_ns_parser
SAX-based XML parser with proper namespace handling.
Public Functions
-
sax_ns_parser(const char *content, const size_t size, xmlns_context &ns_cxt, handler_type &handler)
-
sax_ns_parser(const char *content, const size_t size, bool transient_stream, xmlns_context &ns_cxt, handler_type &handler)
-
~sax_ns_parser()
-
void parse()
-
-
template<typename _Handler>
class orcus::sax_token_parser
XML parser that tokenizes element and attribute names while parsing.
Public Functions
-
sax_token_parser(const char *content, const size_t size, const tokens &_tokens, xmlns_context &ns_cxt, handler_type &handler)
-
sax_token_parser(const char *content, const size_t size, bool transient_stream, const tokens &_tokens, xmlns_context &ns_cxt, handler_type &handler)
-
~sax_token_parser()
-
void parse()
-
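To make the idea of tokenizing names concrete, here is a minimal sketch of a name-to-token lookup. It is an illustration only and does not reproduce the orcus `tokens` class: `toy_tokens`, `toy_token`, `toy_unknown_token`, and the convention that token 0 means "unknown" are all assumptions made for this example.

```cpp
#include <cstddef>
#include <string_view>
#include <unordered_map>

using toy_token = std::size_t;              // hypothetical token type
constexpr toy_token toy_unknown_token = 0;  // assumption: 0 marks an unknown name

// Hypothetical lookup table mapping known names to small integers.
class toy_tokens
{
    std::unordered_map<std::string_view, toy_token> m_map;

public:
    // The name array must outlive this object, since only views are stored.
    // The name at index i is assigned token i + 1.
    toy_tokens(const char* const* names, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
            m_map.emplace(names[i], i + 1);
    }

    // Return the token for a name, or toy_unknown_token if it is not known.
    toy_token get_token(std::string_view name) const
    {
        auto it = m_map.find(name);
        return it == m_map.end() ? toy_unknown_token : it->second;
    }
};
```

With names reduced to integers once during parsing, a handler can branch on them with a `switch` statement instead of repeating string comparisons in every callback.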
Parser Handlers
-
class orcus::sax_handler
Public Functions
-
void doctype(const orcus::sax::doctype_declaration &param)
Called when a doctype declaration <!DOCTYPE … > is encountered.
- Parameters
param
: struct containing doctype declaration data.
-
void start_declaration(const orcus::pstring &decl)
Called when <?… is encountered, where the ‘…’ may be an arbitrary identifier. One common declaration is <?xml, which is typically given at the start of an XML stream.
- Parameters
decl
: name of the identifier.
-
void end_declaration(const orcus::pstring &decl)
Called when the closing tag (>) of a <?… ?> declaration is encountered.
- Parameters
decl
: name of the identifier.
-
void start_element(const orcus::sax::parser_element &elem)
Called at the start of each element.
- Parameters
elem
: information of the element being parsed.
-
void end_element(const orcus::sax::parser_element &elem)
Called at the end of each element.
- Parameters
elem
: information of the element being parsed.
-
void characters(const orcus::pstring &val, bool transient)
Called when a segment of text content is parsed. Each text segment is a direct child of an element. An element may have several text segments when it also has child elements that are direct siblings of the text, or when the text is split by a comment.
- Parameters
val
: value of the text content.
transient
: when true, the text content has been converted and is stored in a temporary buffer because it contains one or more encoded characters. In that case the passed text value must either be converted to a non-text value immediately or be interned within the scope of the callback.
-
void attribute(const orcus::sax::parser_attribute &attr)
Called when an attribute of an element is parsed. Note that when the attribute’s transient flag is set, the attribute value is stored in a temporary buffer because it contains one or more encoded characters, and it must be processed within the scope of the callback.
- Parameters
attr
: struct containing attribute information.
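The transient rule for text and attribute values can be sketched as follows. This is a stand-alone illustration under stated assumptions, not orcus code: `std::string_view` stands in for `orcus::pstring`, and `toy_string_pool`, `collecting_handler`, and `demo_transient` are hypothetical names. The handler copies a transient value into a pool before the callback returns, and keeps a non-transient value as a view.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// Hypothetical string pool. std::unordered_set never relocates its elements
// on rehash, so views into pooled strings remain valid while the pool lives.
class toy_string_pool
{
    std::unordered_set<std::string> m_store;

public:
    std::string_view intern(std::string_view s)
    {
        return *m_store.emplace(s).first;
    }

    std::size_t size() const { return m_store.size(); }
};

// Hypothetical handler following the documented shape characters(val, transient).
struct collecting_handler
{
    toy_string_pool& pool;
    std::vector<std::string_view> texts;

    void characters(std::string_view val, bool transient)
    {
        if (transient)
            val = pool.intern(val); // copy now; the source buffer may be reused
        texts.push_back(val);
    }
};

// Simulates one stable text segment and one transient text segment.
std::vector<std::string> demo_transient()
{
    toy_string_pool pool;
    collecting_handler handler{pool, {}};

    const std::string stable = "plain"; // stands in for the original stream
    std::string scratch = "a & b";      // stands in for a temporary decode buffer

    handler.characters(stable, false);
    handler.characters(scratch, true);
    scratch = "overwritten";            // simulate the buffer being reused

    // Copy the views out while 'stable' and 'pool' are still alive.
    std::vector<std::string> out;
    for (std::string_view t : handler.texts)
        out.emplace_back(t);
    return out;
}
```

Had the handler stored the transient view without interning it, the second entry would have referred to the reused buffer rather than to the decoded text.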
-
-
class orcus::sax_ns_handler
-
class orcus::sax_token_handler
Public Functions
-
void declaration(const orcus::xml_declaration_t &decl)
Called immediately after the entire XML declaration has been parsed.
- Parameters
decl
: struct containing the attributes of the XML declaration.
-
void start_element(const orcus::xml_token_element_t &elem)
Called at the start of each element.
- Parameters
elem
: struct containing the element’s information as well as all the attributes that belong to the element.
-
void end_element(const orcus::xml_token_element_t &elem)
Called at the end of each element.
- Parameters
elem
: struct containing the element’s information as well as all the attributes that belong to the element.
-
void characters(const orcus::pstring &val, bool transient)
Called when a segment of text content is parsed. Each text segment is a direct child of an element. An element may have several text segments when it also has child elements that are direct siblings of the text, or when the text is split by a comment.
- Parameters
val
: value of the text content.
transient
: when true, the text content has been converted and is stored in a temporary buffer because it contains one or more encoded characters. In that case the passed text value must either be converted to a non-text value immediately or be interned within the scope of the callback.
-
Namespace
-
class orcus::xmlns_repository
Central XML namespace repository that stores all namespaces that are used in the current session.
Public Functions
-
xmlns_repository()
-
~xmlns_repository()
-
void add_predefined_values(const xmlns_id_t *predefined_ns)
Add a set of predefined namespace values to the repository.
- Parameters
predefined_ns
: predefined set of namespace values. This is a null-terminated array of xmlns_id_t. This xmlns_repository instance will assume that the instances of these xmlns_id_t values will be available throughout its life cycle; caller needs to ensure that they won’t get deleted before the corresponding xmlns_repository instance is deleted.
-
xmlns_context create_context()
-
xmlns_id_t get_identifier(size_t index) const
Get XML namespace identifier from its numerical index.
- Return
valid namespace identifier, or XMLNS_UNKNOWN_ID if not found.
- Parameters
index
: numeric index of namespace.
-
std::string get_short_name(xmlns_id_t ns_id) const
-
std::string get_short_name(size_t index) const
-
-
class orcus::xmlns_context
XML namespace context. A new context should be used for each XML stream since the namespace keys themselves are not interned. Don’t hold an instance of this class any longer than the life cycle of the XML stream it is used in.
An empty key value is associated with a default namespace.
Public Functions
-
xmlns_context(const xmlns_context &r)
-
~xmlns_context()
-
xmlns_id_t push(const pstring &key, const pstring &uri)
-
xmlns_id_t get(const pstring &key) const
Get the current namespace identifier for a specified namespace alias.
- Return
current namespace identifier associated with the alias.
- Parameters
key
: namespace alias to get the current namespace identifier for.
-
size_t get_index(xmlns_id_t ns_id) const
Get a unique index value associated with a specified identifier. An index value is guaranteed to be unique regardless of contexts.
- Return
index value associated with the identifier.
- Parameters
ns_id
: a namespace identifier to obtain index for.
-
std::string get_short_name(xmlns_id_t ns_id) const
Get a ‘short’ name associated with a specified identifier. A short name is a string value conveniently short enough for display purposes, but still guaranteed to be unique to the identifier it is associated with.
Note that the xmlns_repository class has a method of the same name, and that method works identically to this method.
- Return
short name for the specified identifier.
- Parameters
ns_id
: a namespace identifier to obtain short name for.
-
pstring get_alias(xmlns_id_t ns_id) const
Get an alias currently associated with a given namespace identifier.
- Return
alias name currently associated with the given namespace identifier, or an empty string if the given namespace is currently not associated with any aliases.
- Parameters
ns_id
: namespace identifier.
-
std::vector<xmlns_id_t> get_all_namespaces() const
-
void dump(std::ostream &os) const
-
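The relationship between a shared repository and per-stream contexts can be sketched with a small stand-alone model. Everything in it is hypothetical and written only for illustration: `toy_ns_repository`, `toy_ns_context`, `toy_ns_id`, and `toy_unknown_ns` are not orcus types, the identifier is a plain index here, and `pop` is this sketch's own addition rather than one of the members listed above. The example URIs are placeholders.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

using toy_ns_id = std::size_t; // hypothetical identifier type for this sketch
constexpr toy_ns_id toy_unknown_ns = static_cast<toy_ns_id>(-1);

// Hypothetical repository: stores each URI once and assigns it a stable index.
class toy_ns_repository
{
    std::unordered_map<std::string, toy_ns_id> m_ids;
    std::vector<std::string> m_uris;

public:
    toy_ns_id intern(const std::string& uri)
    {
        auto it = m_ids.find(uri);
        if (it != m_ids.end())
            return it->second;

        toy_ns_id id = m_uris.size();
        m_ids.emplace(uri, id);
        m_uris.push_back(uri);
        return id;
    }
};

// Hypothetical per-stream context: each alias owns a stack of identifiers so
// that an inner declaration can shadow an outer one. The empty alias plays
// the role of the default namespace.
class toy_ns_context
{
    toy_ns_repository& m_repo;
    std::map<std::string, std::vector<toy_ns_id>> m_aliases;

public:
    explicit toy_ns_context(toy_ns_repository& repo) : m_repo(repo) {}

    // Bind an alias to a URI and return the identifier of that URI.
    toy_ns_id push(const std::string& alias, const std::string& uri)
    {
        toy_ns_id id = m_repo.intern(uri);
        m_aliases[alias].push_back(id);
        return id;
    }

    // Drop the innermost binding of an alias (sketch-only helper).
    void pop(const std::string& alias)
    {
        auto it = m_aliases.find(alias);
        if (it != m_aliases.end() && !it->second.empty())
            it->second.pop_back();
    }

    // Return the identifier currently bound to an alias, if any.
    toy_ns_id get(const std::string& alias) const
    {
        auto it = m_aliases.find(alias);
        if (it == m_aliases.end() || it->second.empty())
            return toy_unknown_ns;
        return it->second.back();
    }
};
```

In this model two contexts built on the same repository obtain the same identifier for the same URI, while the alias bindings stay private to each context.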