eXtensible Markup Language Explanations

According to Abbreviationfinder, XML stands for eXtensible Markup Language.

Rules for XML documents

The XML specification requires a conforming “parser” [1] to reject documents that do not meet the basic syntax rules. Many HTML parsers, by contrast, accept complicated or unsound markup and guess at what the document is trying to say.
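The strict behaviour described above can be observed with an ordinary XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the sample strings and the helper name `is_well_formed` are invented for illustration and are not part of the XML specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser does not guess: a syntax error aborts parsing.
        return False


good = "<note><to>Ana</to><body>Hello</body></note>"
bad = "<note><to>Ana<body>Hello</body></note>"  # <to> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

An HTML parser would typically try to repair the second string; an XML parser reports an error instead.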

To avoid the loose structure found in HTML documents, the creators of XML made the language strict about document structure, leaving the way a document is displayed to other technologies.

XML distinguishes three kinds of documents: invalid, valid and well formed.

  • Invalid documents do not follow the syntax rules defined by the XML specification. A document is also invalid if a developer has defined rules for what it may contain in a DTD (Document Type Definition) or Schema and the document does not follow them.
  • Valid documents follow both the XML syntax rules and the rules defined in their own DTD or Schema.
  • Well-formed documents follow the XML syntax rules but do not have a Schema or DTD.

To determine whether a document is valid, a decision must be made about which method will define its rules: DTD or Schema.

Document Type Definition or DTD: defines the elements that can appear in an XML document, the order in which they can appear, how they can be nested and other basic details of the structure of the XML document. DTDs are part of the original XML specification and are very similar to SGML DTDs.

XML Schema: can define every document structure that a DTD can define, as well as data types and rules much more complex than a DTD can express. The W3C published the XML Schema specification in 2001, about three years after the original XML specification.

DTD example:

<!ELEMENT address (name, street, city, state, zip-code)>
<!ELEMENT name (title?, first-name, last-name)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT first-name (#PCDATA)>
<!ELEMENT last-name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip-code (#PCDATA)>

As the example shows, the DTD defines the structure of the XML document and leaves no doubt about where each piece of data belongs; that place is determined by the elements and attributes.
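To make the content model concrete, the sketch below parses a sample address document and checks the order of its child elements by hand. Several things here are assumptions: the sample data is invented, the name parts are written as `first-name` and `last-name` because XML names cannot contain spaces, and the check is hand-written because Python's standard `xml.etree.ElementTree` does not validate against a DTD. A validating parser would do this work from the DTD itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document that follows the address structure.
DOCUMENT = """
<address>
  <name>
    <title>Dr.</title>
    <first-name>Ana</first-name>
    <last-name>Lopez</last-name>
  </name>
  <street>12 Main Street</street>
  <city>Springfield</city>
  <state>IL</state>
  <zip-code>62701</zip-code>
</address>
"""


def child_tags(element):
    """List the tag names of an element's direct children, in order."""
    return [child.tag for child in element]


def matches_address_model(root) -> bool:
    """Hand-written check of the address and name content models."""
    if root.tag != "address":
        return False
    if child_tags(root) != ["name", "street", "city", "state", "zip-code"]:
        return False
    name_children = child_tags(root.find("name"))
    # title is optional (title?), so two child sequences are allowed.
    return name_children in (
        ["title", "first-name", "last-name"],
        ["first-name", "last-name"],
    )


root = ET.fromstring(DOCUMENT.strip())
print(matches_address_model(root))
```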

XML Schema example:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="address">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="name"/>
        <xsd:element ref="street"/>
        <xsd:element ref="city"/>
        <xsd:element ref="state"/>
        <xsd:element ref="zip-code"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="name">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="title" minOccurs="0"/>
        <xsd:element ref="first-name"/>
        <xsd:element ref="last-name"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="title" type="xsd:string"/>
  <xsd:element name="first-name" type="xsd:string"/>
  <xsd:element name="last-name" type="xsd:string"/>
  <xsd:element name="street" type="xsd:string"/>
  <xsd:element name="city" type="xsd:string"/>

  <xsd:element name="state">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:length value="2"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>

  <xsd:element name="zip-code">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="[0-9]{5}(-[0-9]{4})?"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>

</xsd:schema>

The schema is much longer than the DTD, but it states more precisely which documents are valid, because it adds the characteristics and restrictions of the data that each element may hold.
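The two data restrictions in the schema (a fixed length of 2 and a pattern for the postal code) can be mirrored in a few lines of Python. This is only an illustrative sketch: the function names are invented, and a real application would normally hand the schema to a validating XML library rather than re-implement its facets. Note that a schema pattern must match the whole value, which corresponds to `re.fullmatch` in Python.

```python
import re

# Same expression as the pattern facet, with the extraction spaces removed.
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def valid_state(value: str) -> bool:
    """Mirror the length facet: exactly two characters."""
    return len(value) == 2


def valid_zip(value: str) -> bool:
    """Mirror the pattern facet: five digits, optionally followed by -dddd."""
    return ZIP_PATTERN.fullmatch(value) is not None


for candidate in ["62701", "62701-1234", "6270", "62701-12"]:
    print(candidate, valid_zip(candidate))
```

A DTD cannot express either check, since `#PCDATA` accepts any text.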

Characteristics

  • It allows you to create your own tags and to assign attributes to them.
  • Structure and layout in an XML document are completely separate.
  • XML is stored as text (not binary), so documents can be read directly by both computers and people.
  • Each document includes metadata about itself, which helps web search engines return more relevant and precise results.
  • It can be exported to other publication formats (HTML, PDF, RTF rich text, among others).
  • XML is an open standard not subject to any type of license.
  • XML supports internationalization: it can work with any character set, including Unicode (for example, encoded as UTF-8).
  • XML follows strict syntax rules, so documents are easy to process automatically.
  • XML allows information to be shared between heterogeneous systems or data sources, such as web pages and different databases.
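Several of these characteristics (author-defined tags and attributes, plain-text storage, and support for non-ASCII characters) can be seen in a short round trip. The sketch below uses Python's standard `xml.etree.ElementTree`; the tag names `book` and `title`, the attribute `lang`, and the sample text are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# "book", "title" and "lang" are made-up names: XML lets authors define their own.
book = ET.Element("book", attrib={"lang": "es"})
title = ET.SubElement(book, "title")
title.text = "Cien años de soledad"  # non-ASCII text is allowed

# Serialize to bytes as UTF-8; the result is ordinary, readable text.
data = ET.tostring(book, encoding="utf-8")
print(data.decode("utf-8"))

# Reading the bytes back yields the same structure and characters.
parsed = ET.fromstring(data)
print(parsed.get("lang"), parsed.find("title").text)
```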

XML in content management

The term content management is used to refer to the application of a series of techniques and tools for the coding, storage and distribution of publications in digital format.

It is in this area that markup languages (initially SGML and later XML) have been a constant, owing to their open nature, their independence from vendors and from specific hardware and software platforms, and the possibility of reusing the same content in multiple products and publications [2].

A content management application integrates tools for maintaining and managing a website, so that content can be updated easily without knowing the details of the HTML encoding or the physical location of the pages on the web server. Features that these applications typically incorporate include:

  • Maintenance of the physical and logical structure of the site.
  • Creation of new content and editing of existing content using templates, usually through a web browser.
  • Automatic maintenance of site navigation and hyperlinks between pages.
  • Approval, review and validation of the contents until they are made public on the website.
  • Validity periods of the contents.
  • Control of changes and revisions.
  • Sharing of content between different pages (connected pages).

The prevalence of dynamic pages on the web adds pressure to use XML, since such pages are generated in a presentation format determined by the user’s needs while the underlying content stays the same.

XML can be used as a content storage base through native XML databases, which store and manage a collection of XML documents without performing any type of prior transformation.

In this model, the XML document is the main unit of information storage. Documents are stored as text, which gives flexibility.

XML can also be used as a model for metadata representation. The advantages over other alternatives are found in its orientation towards the Internet, the ease of its exchange and subsequent processing using a single common syntax, and the option of combining and interleaving the metadata within the full text of the documents.

However, this requires an indexing and retrieval system that can discriminate between documents based on the content of specific elements or attributes.
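The kind of retrieval meant here can be sketched as a filter over a small collection of documents. Everything in the example is an assumption made for illustration: the collection, the `subject` element, the `lang` attribute and the `search` function are invented, and a real system would build an index rather than re-parse every document on each query.

```python
import xml.etree.ElementTree as ET

# A tiny, invented collection standing in for stored documents.
COLLECTION = {
    "doc1": '<article lang="en"><subject>XML</subject><title>Schemas</title></article>',
    "doc2": '<article lang="es"><subject>XML</subject><title>Esquemas</title></article>',
    "doc3": '<article lang="en"><subject>HTML</subject><title>Forms</title></article>',
}


def search(collection, subject=None, lang=None):
    """Return ids of documents whose subject text and lang attribute match."""
    hits = []
    for doc_id, text in collection.items():
        root = ET.fromstring(text)
        if subject is not None and root.findtext("subject") != subject:
            continue
        if lang is not None and root.get("lang") != lang:
            continue
        hits.append(doc_id)
    return hits


print(search(COLLECTION, subject="XML"))
print(search(COLLECTION, subject="XML", lang="en"))
```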

Finally, XML serves as a means of exchanging and integrating content. It is not only a format for encoding texts and documents, but a set of specifications that establish how a text can be processed and presented.

Specifications such as XSLT, DOM or XPath make it possible to process XML documents based on different vocabularies and through various programming languages (Visual Basic, Java, etc.), using a common, standard and clearly documented model.
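As a rough illustration of two of these specifications, the sketch below reads the same document through the DOM interface (`xml.dom.minidom`) and through the limited XPath subset supported by `xml.etree.ElementTree`, both from Python's standard library. The sample document and its element names are invented, and XSLT is left out because the standard library does not include an XSLT processor.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

SOURCE = (
    "<library>"
    "<book year='1998'><title>XML Basics</title></book>"
    "<book year='2001'><title>Schemas</title></book>"
    "</library>"
)

# DOM: walk the tree through the standard node interface.
dom = xml.dom.minidom.parseString(SOURCE)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]

# XPath (the subset ElementTree supports): select by attribute value.
root = ET.fromstring(SOURCE)
xpath_titles = [t.text for t in root.findall("./book[@year='2001']/title")]

print(dom_titles)
print(xpath_titles)
```

The same expressions and node interfaces are available in other languages, which is the point of a common model.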

The possibility of obtaining XML documents over the network and processing them easily, whether to integrate them into repositories and databases or to display them as part of a website, offers great flexibility and opens the door to many kinds of integration.
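One way to picture this integration is to load the elements of an XML document into a relational table. The sketch below is hypothetical throughout: the `FEED` string stands in for a document that would be fetched over the network (no request is made), the table layout is invented, and an in-memory SQLite database is used only so the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for a document obtained over the network.
FEED = """<addresses>
  <address><city>Springfield</city><state>IL</state></address>
  <address><city>Portland</city><state>OR</state></address>
</addresses>"""


def load_addresses(xml_text: str, connection: sqlite3.Connection) -> int:
    """Insert one row per address element and return the number of rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS address (city TEXT, state TEXT)"
    )
    rows = [
        (item.findtext("city"), item.findtext("state"))
        for item in ET.fromstring(xml_text).iter("address")
    ]
    connection.executemany("INSERT INTO address VALUES (?, ?)", rows)
    connection.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
print(load_addresses(FEED, conn))
print(conn.execute("SELECT city FROM address WHERE state = 'OR'").fetchall())
```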

References

  1. A parser is a piece of code that reads a document and interprets its content.
  2. This practice is usually known by the English term “single sourcing”, and refers to the processes that produce different publications or information products from a single content repository through automated procedures.
