Introduction to XML workshop

5 Oct 2005
TTU Advanced Technology Learning Center
Taught by Bosah Chukwuogo

All markup languages have similar characteristics

XML is eXtensible Markup Language
- extensible: capable of being extended
- markup language: a system for marking or tagging data to indicate its logical structure
- it is just a markup language

XML marks data to indicate its logical structure



Handouts, PowerPoint Slides


Student1, Student2, Student3, etc

HTML is built like XML

1. XML was born out of the need to describe data in clear text so that it can be interpreted across different languages and environments and easily transported
2. Since this data must work across different, incompatible platforms, it must describe itself independently
3. SGML was used by book publishers to mark up books, and it is also used in theses; without SGML or another formatting language you can’t create those massive encyclopedias
4. XML is now a universal standard

Example 1 showing logical structure for describing a list of authors

<?xml version="1.0"?>


Wesley
Albert
Fryer


Alexander
Radford
Fryer
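
A minimal sketch of how an author list like Example 1 might be tagged and then read back with Python's standard library. The element names (authors, author, first, middle, last) are assumptions for illustration, not necessarily the names used in the workshop.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagging of the author list; every element name is an assumption.
AUTHORS_XML = """<?xml version="1.0"?>
<authors>
  <author>
    <first>Wesley</first>
    <middle>Albert</middle>
    <last>Fryer</last>
  </author>
  <author>
    <first>Alexander</first>
    <middle>Radford</middle>
    <last>Fryer</last>
  </author>
</authors>"""

root = ET.fromstring(AUTHORS_XML)

# Each <author> is a child of <authors>; its own children hold the name parts.
full_names = [
    " ".join(part.text for part in author)
    for author in root.findall("author")
]
print(full_names)
```

The nesting carries the logical structure: the parser does not need to know anything about authors to tell which names belong together.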

As long as a file has the XML declaration at the front, it is treated as an XML file (regardless of extension)
- the declaration can appear only once in a file, at the very start, and it must be closed with ?>

Elements in an XML file are just entities: markup entities

Everything has to be properly tagged: opening and closing tags, to be compliant with XML standards
- this is not negotiable

Child and parent relationships apply to the entities that you are creating
- this is like indenting in unordered and ordered lists in a word processor

In this example:
Jim
This is opening tag – element value – closing tag
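
The opening tag, element value, closing tag pattern can be seen by parsing a single element. This is only a sketch; the tag name "name" is an assumption, since the notes record just the value "Jim".

```python
import xml.etree.ElementTree as ET

# "name" is an assumed tag name; only the value "Jim" comes from the notes.
element = ET.fromstring("<name>Jim</name>")

print(element.tag)   # the name shared by the opening and closing tags
print(element.text)  # the element value that sits between them
```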

XML elements can also have attributes, which provide additional information about the elements. Example:

<?xml version="1.0"?>


Chevrolet
Corvette

White 1999


Dodge
Ram

Red 2000

These attributes play a very important role when you are describing complex relationships
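
One way the car example might look with attributes, sketched in Python. The element names (cars, car, make, model) and the attribute names (color, year) are assumptions; the notes only preserve the values.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: color and year are stored as attributes of <car>.
CARS_XML = """<?xml version="1.0"?>
<cars>
  <car color="White" year="1999">
    <make>Chevrolet</make>
    <model>Corvette</model>
  </car>
  <car color="Red" year="2000">
    <make>Dodge</make>
    <model>Ram</model>
  </car>
</cars>"""

root = ET.fromstring(CARS_XML)

for car in root.findall("car"):
    # Child elements are read with findtext(), attributes with get().
    print(car.findtext("make"), car.findtext("model"), car.get("color"), car.get("year"))

# Attributes can also be used to select elements.
red_cars = [car.findtext("model") for car in root.findall("car[@color='Red']")]
print(red_cars)
```

Whether a value belongs in an attribute or in a child element is a design choice; what matters is that the people consuming the file agree on it.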

Before, you just had websites that held information, and people had to come to your website to see what you had there
- the idea behind RSS: each time your website changes, you publish an XML file with information about what has changed on your website
- so I don’t have to always go to your website to check it for changes
- you are pushing out the content to users
- XML is the driving force making the blogging revolution possible
- XML is a way to structure data in certain predefined formats
XML uses tags to contain and structure data
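
A small sketch of what reading such a feed could look like. The feed below is made up for illustration (titles and example.com URLs are placeholders) and follows the usual RSS 2.0 layout of rss, channel and item elements.

```python
import xml.etree.ElementTree as ET

# A made-up feed; titles and URLs are placeholders.
FEED_XML = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>Placeholder feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""

channel = ET.fromstring(FEED_XML).find("channel")

# Each <item> describes one change on the site.
updates = [
    (item.findtext("title"), item.findtext("link"))
    for item in channel.findall("item")
]
for title, link in updates:
    print(title, link)
```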

XML is platform independent, language independent, can be sent via FTP, HTTP, etc.

Attributes have to have values. This is incorrect:

This is correct
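
A quick way to see the rule in action is to hand both forms to a parser. This is a sketch using Python's standard library; the element name car and the attribute name color are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# An attribute with no value is rejected by the parser.
print(is_well_formed('<car color></car>'))

# The same attribute with a quoted value is accepted.
print(is_well_formed('<car color="White"></car>'))
```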

Certain characters cannot be used just anywhere
- like in an expression: 4 > 5
- XML treats the angle brackets and the ampersand as markup, so those characters are special
- you need to encode them as entity references, for example &gt; for >, &lt; for < and &amp; for &
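
A sketch of the encoding step in Python, using the standard library's escape helper. The expression is extended beyond the one in the notes so that all three special characters appear; the element name expr is an assumption.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

expression = "4 > 5 & 2 < 3"

# escape() replaces &, < and > with their entity references.
encoded = escape(expression)
print(encoded)

# Once encoded, the text can sit safely inside an element,
# and the parser turns it back into the original characters.
element = ET.fromstring("<expr>" + encoded + "</expr>")
print(element.text)
```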

XML has a tree structure: nesting tags inside one another (parent tags, child tags, etc.) inevitably results in a tree

Child tags indicate ownership of attributes / characteristics
In a diagram of an XML file’s tree structure, each circle is called a node; a node-set is a node and its descendants (a tree branch)
- some people refer to nodes at the same level as siblings
- some people call child nodes descendants
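
The vocabulary can be tried out on a tiny tree. This is only a sketch; the document and the value "Acme" are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up document: an author with a name and a publisher that has its own name.
TREE_XML = "<author><name>Jim</name><publisher><name>Acme</name></publisher></author>"
root = ET.fromstring(TREE_XML)

# Direct children of the root; these two nodes are siblings of each other.
children = [child.tag for child in root]

# Every element below the root, at any depth (iter() yields the root first, so skip it).
descendants = [node.tag for node in root.iter()][1:]

print(children)
print(descendants)
```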

XPath is a way of identifying exactly where you are in the XML tree
- everything in an XML file has a unique location
- - allows you to pinpoint the exact location of data
- Like breadcrumbs in a hierarchically organized website
- Author/publisher/address/street

XPath is a query language, like SQL, that lets you run queries against XML files
- XPath differentiates between author/name and author/publisher/name
- XPath is a specification, or standard
- There are several XPath software tools that let you query XML files, both open source and closed source
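
A sketch of path-based lookups in Python. ElementTree supports only a limited subset of XPath, which is enough here; the sample document and its values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up document with a <name> at two different locations in the tree.
DOC = """<authors>
  <author>
    <name>Jim</name>
    <publisher>
      <name>Example Press</name>
      <address><street>1 Main St</street></address>
    </publisher>
  </author>
</authors>"""

root = ET.fromstring(DOC)

# The same tag name is told apart by its path from the root.
author_names = [n.text for n in root.findall("author/name")]
publisher_names = [n.text for n in root.findall("author/publisher/name")]

# A longer path pinpoints one exact location, like breadcrumbs.
street = root.findtext("author/publisher/address/street")

print(author_names, publisher_names, street)
```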

XPath will let you specify complex queries and find anything in an XML file
- XML files can get very big
- Even RSS feeds get huge, it is too hard to visually inspect them
- Need some kind of tool to help navigate and work with XML files

Namespaces: an XML file can have a namespace
- allows you to stay in your own space of names, that is, what your definitions are
- don’t worry about these, they don’t come up as much in real life
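
For the cases where namespaces do come up, here is a sketch of two elements that share a tag name but live in different namespaces. The prefixes, the example.com URIs and the values are all placeholders.

```python
import xml.etree.ElementTree as ET

# Two different <title> elements, kept apart by their namespaces (URIs are placeholders).
DOC = """<catalog xmlns:bk="http://example.com/books" xmlns:ppl="http://example.com/people">
  <bk:title>XML Basics</bk:title>
  <ppl:title>Dr.</ppl:title>
</catalog>"""

root = ET.fromstring(DOC)

# Map the prefixes used in the queries to the namespace URIs.
ns = {"bk": "http://example.com/books", "ppl": "http://example.com/people"}

book_title = root.findtext("bk:title", namespaces=ns)
person_title = root.findtext("ppl:title", namespaces=ns)
print(book_title, person_title)
```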

There are element nodes, attribute nodes, text nodes, comment nodes, and namespace nodes

Another way to close a tag is with a forward slash inside the tag itself; the XML reader looks backward

Alternative format:
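
A sketch comparing the two ways of closing an empty element. The element name car and its color attribute are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Separate opening and closing tags.
long_form = ET.fromstring('<car color="Red"></car>')

# The same element closed with a forward slash inside the tag.
short_form = ET.fromstring('<car color="Red"/>')

# To the parser the two forms describe the same element.
same = (
    long_form.tag == short_form.tag
    and long_form.attrib == short_form.attrib
    and long_form.text == short_form.text
)
print(same)
```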

Issue of interoperability is huge with XML
- you have to define what your file means
- attributes must always have open and closed quotes
- browsers tend to forgive these kinds of things, but XML does not
- make sure your XML as well as HTML is always well formed

Tag names cannot contain spaces
- certain characters are not allowed: ampersand (&) must be replaced with &amp;

XML is case sensitive
- to the computer <Bob> and <boB> are different
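
Case sensitivity is easy to check with a parser. A sketch in Python; the tag names are just illustrations.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Opening and closing tags must match exactly, including case.
print(parses("<Bob>hello</Bob>"))
print(parses("<Bob>hello</boB>"))

# Tags that differ only in case are two different elements.
root = ET.fromstring("<people><Bob/><boB/></people>")
tags = [child.tag for child in root]
print(tags)
```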

An example you often see of a needed ampersand code is &copy; (©)

There are multiple ways to represent information contained in a table in an XML file
- bottom line is if the people consuming your information understand it

My question: Can Excel export as XML?

What does this all mean?
DTD: Document Type Definition
- if I consume your XML data, I have to know what it means so we can exchange data
- this was an early attempt allowing parties to agree on a format for XML exchange
- this is really out of date now
- you could validate XML documents

There is a difference between validation and having good syntax
- if you get an XML file and it means what you want it to mean (it follows the agreed format), then it is valid; good syntax alone only makes it well formed
- the terms are used as synonyms at places like the W3C validator

DTD test: validating XML files verifies that
- proper nesting has occurred
- all required tags are present and accounted for
- specific units of information are of the correct type and fall within the specified legal values
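
Python's standard library parser checks syntax but does not validate against a DTD, so the sketch below is a hand-written stand-in for the kinds of checks listed above, not real DTD validation. The rules (required make and model children, a year attribute between 1900 and 2099) and all names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules that both parties would have agreed on in advance.
REQUIRED_CHILDREN = ("make", "model")
LEGAL_YEARS = range(1900, 2100)

def check_car(car):
    """Return a list of problems found in one <car> element."""
    problems = []
    # Required tags must be present.
    for tag in REQUIRED_CHILDREN:
        if car.find(tag) is None:
            problems.append(f"missing <{tag}>")
    # The year must be a number within the legal range.
    year = car.get("year", "")
    if not (year.isdecimal() and int(year) in LEGAL_YEARS):
        problems.append(f"bad year {year!r}")
    return problems

good = ET.fromstring('<car year="1999"><make>Chevrolet</make><model>Corvette</model></car>')
bad = ET.fromstring('<car year="soon"><make>Dodge</make></car>')

print(check_car(good))
print(check_car(bad))
```

Both documents are well formed; only the first one also satisfies the agreed rules.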

XML that passes the DTD test is valid with respect to that DTD

Shortfalls of DTDs
- could only work with dates up to 2000; there were certain dates you couldn’t represent
- time span representations were difficult / impossible

XML schemas are a more advanced version of DTDs with several advantages
- provides support for namespaces, to help resolve conflicts in tag names
- richer datatypes than DTDs
- user defined types called archetypes
- allowance for attribute grouping, many attributes often go together

Schema is just a way to validate an XML file

XML is a cousin to HTML and can be formatted with XSLT

XSLT style sheet controls the presentation

You should also put the encoding attribute in the XML declaration so a browser knows what to do with your XML (what character encoding it is in)
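
A sketch of why the encoding attribute matters: the same text becomes different bytes under different encodings, and the declaration tells the reader which one was used. The element name city and its value are made up for illustration.

```python
import xml.etree.ElementTree as ET

text = '<?xml version="1.0" encoding="ISO-8859-1"?><city>São Paulo</city>'

# Store the document as bytes in the encoding named in its declaration.
data = text.encode("iso-8859-1")

# The parser reads the declaration and decodes the bytes accordingly.
city = ET.fromstring(data).text
print(city)

# The same text in UTF-8 would be a different sequence of bytes.
print(data == text.encode("utf-8"))
```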

Also use stylesheet specification for formatting

XSLT helps you control formatting of content

XML + XSLT Stylesheet = HTML or XML or WML (wireless markup language for cell phones)

HTML output is dynamic based on what data is contained in the XML string
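
Python's standard library has no XSLT processor, so the sketch below uses a plain Python function as a stand-in for a stylesheet. It is not XSLT; it only illustrates the idea that the HTML output follows whatever data the XML contains. The element names cars, car, make and model are assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def cars_to_html(xml_text):
    """Turn a hypothetical <cars> document into an HTML list (a stand-in for a stylesheet)."""
    root = ET.fromstring(xml_text)
    items = [
        "<li>{} {}</li>".format(
            escape(car.findtext("make", "")),
            escape(car.findtext("model", "")),
        )
        for car in root.findall("car")
    ]
    return "<ul>" + "".join(items) + "</ul>"

# The output changes with the data: one <car> in, one list item out.
html = cars_to_html("<cars><car><make>Dodge</make><model>Ram</model></car></cars>")
print(html)
```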

XSLT stylesheets are XML documents and conform to all the properties of XML


© Creative Commons License