User:Miriya52/week9

Refer to the project glossary for acronyms and definitions.

Week 9: January 30 - February 3

Objectives

 * Parse a basic GIFT question to the pyslet data structure

Summary

 * Read through the pyslet structures and parser and took notes to get a high-level view of how data percolates through the structures. Read a tutorial on the XML format to better understand the pyslet XML structures and terminology.
 * Revised some code to be able to parse the question title. The elements after the question title are still not parsing correctly.

Progress Update

 * I read a tutorial on the XML format and finally understand the XML terminology better: how the format is designed around a tree structure, elements vs. attributes, entity references, and namespaces. This helped me tremendously in thinking about how GIFT maps onto this data structure.
 * Now I can look at this QTI example and identify all the different XML components.

Simplifying the pyslet GIFT format for the most basic GIFT question

 * Only use elements. Ignore attributes. See XML elements vs attributes. The general rule of thumb is that metadata is stored as attributes and data is stored as elements. As the GIFT parser develops, I may add metadata in attributes, but for now it's simpler to use only elements.
 * XML is case sensitive, but this is irrelevant for GIFT: the GIFT format symbols do not rely on case-sensitive keywords.
 * Ignore ID references (see bottom of XML attributes)
 * In namespaces, ignore name prefixes for now.
 * Ignore entity references for escaping GIFT format symbols in text content. Table this as a future feature.
 * Ignore namespaces for now, which seem to be used to give elements a unique identifier.
 * Ignore XMLHttpRequest objects, which are used to request data from a web server.
 * Well Formed GIFT Documents, based on Well Formed XML Documents:
   * GIFT documents must have a root element.
   * The closing tag of a GIFT element is a newline ('\n'). (Not sure if this is always true, but will assume this for now.)
   * GIFT format symbols do not need to consider case sensitivity.
   * GIFT elements have a specific structure.
   * A GIFT validator will syntax-check the GIFT text.
 * A valid GIFT document is a "Well Formed" GIFT document that also conforms to the rules of a document definition, either a DTD (Document Type Definition) or a schema. It may be possible to support this in the GIFT structures by writing a DTD file that defines the structure of a GIFT document. For now, ignore "valid documents" and DTDs, and assume the syntax of the input text conforms to the rules. A DTD is primarily used for interchanging data and for verifying data, whether from an external source or your own. Since I'm still in development, best practice is to wait until the specification is stable before adding a document definition.
 * Determine how CDATA (unparsed character data) applies to GIFT. The GIFT format symbols are illegal characters in text content, so there should be a way to escape them; need to look into how GIFT handles this. In XML, everything inside a CDATA section is ignored by the parser.
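One possible approach to the escaping question, sketched under the assumption that a backslash in front of a format symbol marks it as literal text (the convention used by Moodle's GIFT format). The names `unescape_gift` and `find_unescaped` are hypothetical helpers for illustration and are not part of pyslet.

```python
import re

# Assumption: a backslash before one of these symbols makes it literal text.
GIFT_SPECIALS = "~=#{}:"

_ESCAPED = re.compile(r"\\([" + re.escape(GIFT_SPECIALS) + r"])")


def unescape_gift(text):
    """Replace each backslash-escaped format symbol with the bare symbol."""
    return _ESCAPED.sub(r"\1", text)


def find_unescaped(text, symbol):
    """Return the index of the first unescaped occurrence of symbol, or -1."""
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            i += 2  # skip the backslash and the character it escapes
            continue
        if text[i] == symbol:
            return i
        i += 1
    return -1
```

A parser could call `find_unescaped` to locate the real delimiters first and only then apply `unescape_gift` to the text content it extracts.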

Basic GIFT question with a possible XML representation
Here is an example of a basic GIFT question with a possible XML representation. Once I wrote out the XML representation, the tree structure was obvious.

Basic GIFT question
//Comment line
::Question title:: Question {
=A correct answer
~Wrong answer1
#A response to wrong answer1
}
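As a rough illustration of how the format symbols could drive parsing, here is a minimal sketch that reads one basic question into a plain dict. It is not the pyslet parser: `parse_basic_gift` and the dict layout are hypothetical, it assumes one GIFT element per line (the newline acting as the closing tag), and it does not handle escaped symbols.

```python
import re

SAMPLE = """//Comment line
::Question title:: Question {
=A correct answer
~Wrong answer1
#A response to wrong answer1
}
"""

# ::title:: question text, optionally followed by the opening brace
_TITLE_LINE = re.compile(r"^::(.*?)::\s*(.*?)\s*\{?\s*$")


def parse_basic_gift(text):
    """Parse one basic GIFT question into a dict (assumed layout, no escapes)."""
    result = {"comments": [], "title": None, "question": None, "responses": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "}":
            continue  # blank line or end of the answer block
        if line.startswith("//"):
            result["comments"].append(line[2:].strip())
        elif line.startswith("::"):
            match = _TITLE_LINE.match(line)
            if match is None:
                raise ValueError("malformed title line: %r" % line)
            result["title"] = match.group(1).strip()
            result["question"] = match.group(2)
        elif line[0] in "=~":
            result["responses"].append(
                {"correct": line[0] == "=", "value": line[1:].strip(), "feedback": None}
            )
        elif line.startswith("#"):
            if not result["responses"]:
                raise ValueError("feedback without a preceding answer")
            # Feedback attaches to the most recent answer.
            result["responses"][-1]["feedback"] = line[1:].strip()
        else:
            raise ValueError("unrecognized line: %r" % line)
    return result


parsed = parse_basic_gift(SAMPLE)
```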

Possible XML representation
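<root>
  <comment>Comment line</comment>
  <questionTitle>Question title</questionTitle>
  <question>Question</question>
  <responses>
    <correctResponse>
      <value>A correct answer</value>
    </correctResponse>
    <wrongResponse>
      <value>Wrong answer1</value>
      <feedback>A response to wrong answer1</feedback>
    </wrongResponse>
  </responses>
</root>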


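 * <comment>, <questionTitle>, <question>, <value>, and <feedback> have text content.
 * <root>, <responses>, <correctResponse>, and <wrongResponse> are elements that contain other elements.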
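To make the tree structure concrete, here is a small sketch that builds the same representation with Python's standard `xml.etree.ElementTree` module. The element names are taken from the possible DTD on this page; `build_question_tree` is a hypothetical helper, and pyslet's own structures are not used here.

```python
import xml.etree.ElementTree as ET


def build_question_tree():
    """Build the basic question as an element tree (names follow the draft DTD)."""
    root = ET.Element("root")
    ET.SubElement(root, "comment").text = "Comment line"
    ET.SubElement(root, "questionTitle").text = "Question title"
    ET.SubElement(root, "question").text = "Question"

    # responses holds one correctResponse and one wrongResponse
    responses = ET.SubElement(root, "responses")
    correct = ET.SubElement(responses, "correctResponse")
    ET.SubElement(correct, "value").text = "A correct answer"
    wrong = ET.SubElement(responses, "wrongResponse")
    ET.SubElement(wrong, "value").text = "Wrong answer1"
    ET.SubElement(wrong, "feedback").text = "A response to wrong answer1"
    return root


root = build_question_tree()
xml_text = ET.tostring(root, encoding="unicode")
```

Only the leaf elements carry text; the others exist purely to group their children.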

Possible issues

 * Unclear whether there can be multiple wrongResponse elements in the responses element. May need a single wrongResponse element with multiple value# elements. Might just use one wrongResponse for now.

GIFT Document Object Model (DOM) representation
Everything in a GIFT document is a node: the entire document is a document node, and every GIFT element is an element node.

The root node holds four child nodes: <comment>, <questionTitle>, <question>, and <responses>.

The <responses> node holds the child nodes <correctResponse> and <wrongResponse>.

The text of an element node is stored in a text node. So the <comment> element node holds a text node with the value "Comment line". "Comment line" is not the value of the <comment> element.
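The difference between an element node and its text node can be seen with Python's standard `xml.dom.minidom` module. This is only an illustration on a cut-down document (a root holding a single comment element), not the pyslet structures.

```python
from xml.dom import minidom

doc = minidom.parseString("<root><comment>Comment line</comment></root>")

# The element node for <comment> and the text node it holds are separate nodes.
comment = doc.documentElement.getElementsByTagName("comment")[0]
text_node = comment.firstChild

is_document = doc.nodeType == doc.DOCUMENT_NODE
is_element = comment.nodeType == comment.ELEMENT_NODE
is_text = text_node.nodeType == text_node.TEXT_NODE

element_value = comment.nodeValue  # element nodes have no value of their own
text_value = text_node.nodeValue   # the text lives in the child text node
```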

Possible GIFT DTD
<!DOCTYPE root [
<!ELEMENT root (comment, questionTitle, question, responses)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT questionTitle (#PCDATA)>
<!ELEMENT question (#PCDATA)>
<!ELEMENT responses (correctResponse, wrongResponse)>
<!ELEMENT correctResponse (value)>
<!ELEMENT wrongResponse (value, feedback)>
<!ELEMENT value (#PCDATA)>
<!ELEMENT feedback (#PCDATA)>
]>

Question title is parsed correctly into structures
The unittest test_element is working. The parser can parse the comment, and then the question title into the structures. The question is still not being parsed correctly.

Future To-Do List
Once parsing a basic question works, next steps are:
 * Handle GIFT format symbols (//, ::, {, }, =, ~, #) that show up in the text; these need to be escaped. Perhaps they could be handled as an "entity reference" or via a context manager. XML entity references are used to handle XML format symbols that show up in text data. Perhaps use a DTD for entity declarations.
 * Add DTD/Schema for verification of "valid" documents.

Plan for next week

 * Continue working on parser to parse basic question
 * Live work session with Jayvdb on Thursday to get to midterm deliverable.