User:Miriya52/week8

From mediawiki.org

Refer to project glossary for acronyms and definitions.
Week 8: January 23 - 27

Objectives

  • Implement functionality for reading a .txt file into the parser. Pass test_gift_structures (test_string(), test_read_string(), test_read_file()).
  • Add more types to parser

Summary

I added more unittests and the method implementations needed to pass them. I am still working out how the pyslet XML components interact with the structures, and how to adapt them for the GIFT format. While tracing from the parser to the structures, I found that I need to take some time to understand how the classes and methods interact to populate the structures, and which of those classes and methods are still relevant for the GIFT format.

The issue is that XML [example] has a declaration, and every tag follows the same notation. The QTI format has more types of elements, many of which have no representation in GIFT. GIFT instead specifies its own control characters to represent questions, with a syntax significantly different from QTI.

The goal is for Document.read() to take a file or string and parse it properly.

I added logic to gift.parse_content() to parse a basic GIFT format. However, this is not enough to insert the parsed elements into the namespace and tree structure. xml.parser.parse_stag() addresses parsing the declaration for an XML file. A GIFT file does not have a declaration, but I may still need to implement this method for the GIFT format to make it compatible with the tree structure. I am also working out which functionality in xml.parser.parse_element() is necessary for the GIFT parser and structures.
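As a rough illustration of what "a basic GIFT format" can mean, here is a minimal standalone sketch. It is not the gift.parse_content() implementation and does not use pyslet. The helper name parse_simple_question and the regular expression are hypothetical, and the sketch assumes a single question with an optional ::title::, question text, and one {...} answer block, with no escapes or feedback.

```python
import re

# Hypothetical helper, not part of pyslet: handles one simple GIFT question.
QUESTION_RE = re.compile(
    r"^(?:::(?P<title>.*?)::)?(?P<text>[^{]*)\{(?P<answers>[^}]*)\}\s*$",
    re.DOTALL,
)


def parse_simple_question(source):
    """Split one simple GIFT question into title, text, and answers."""
    # Drop comment lines, which start with '//'.
    lines = [ln for ln in source.splitlines()
             if not ln.lstrip().startswith("//")]
    match = QUESTION_RE.match("\n".join(lines).strip())
    if match is None:
        raise ValueError("not a simple GIFT question")
    answers = []
    # '=' marks a correct answer, '~' marks an incorrect one.
    for marker, body in re.findall(r"([=~])([^=~]*)", match.group("answers")):
        answers.append((marker == "=", body.strip()))
    return {
        "title": match.group("title"),
        "text": match.group("text").strip(),
        "answers": answers,
    }
```

A flat dictionary like this is enough to check the syntax handling, but it does not address the harder problem described above, which is getting the parsed pieces into the document tree.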

Here is a list of the classes I'm looking at in the pyslet.xml structures and parsers, with notes in progress on how they can be adapted to GIFT:

  • structures.Document(Node): base class for documents; the root is an Element object. When parsing a file or stream, call Document.read().
    • structures.Document.get_element_class(): derived classes override this method so that the parser can create instances of custom classes based on the document context and element name.
    • structures.Document.add_child(): creates the root element of the document.
  • structures.ElementType: element type definitions (EMPTY, ANY, MIXED, ELEMENT_CONTENT, SGMLCDATA)
    • SGML is the Standard Generalized Markup Language, the family that XML and HTML come from. GIFT is not SGML, but do I still need to create a parallel element of this type for the GIFT format?
    • The parameters of this class are: entity, name, content_type, content_model, particle_map
    • ElementType.build_model(): builds internal structures; checks whether content_type is ELEMENT_CONTENT or MIXED. (I need to determine the difference between ELEMENT_CONTENT and MIXED.)
  • structures.GIFTDTD: document type declaration container for entity, element, and attribute declarations
    • GIFT does not have declarations, but do I still need to implement a parallel of this class for GIFT?
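To make the get_element_class() / add_child() relationship in the notes above more concrete, here is a small self-contained sketch. Every class in it (SketchElement, SketchQuestion, SketchDocument, SketchGIFTDocument) is hypothetical and only loosely modeled on the description above; the real pyslet classes have more responsibilities and their signatures may differ.

```python
class SketchElement:
    """Hypothetical generic element holding a name and child elements."""

    def __init__(self, parent, name):
        self.parent = parent
        self.name = name
        self.children = []

    def add_child(self, child_class, name):
        # Create the child with this element as its parent.
        child = child_class(self, name)
        self.children.append(child)
        return child


class SketchQuestion(SketchElement):
    """Hypothetical custom class for 'question' elements."""


class SketchDocument:
    """Hypothetical document whose only child is the root element."""

    def __init__(self):
        self.root = None

    def get_element_class(self, name):
        # Default: every element name maps to the generic class.
        return SketchElement

    def add_child(self, child_class, name):
        if self.root is not None:
            raise ValueError("document already has a root element")
        self.root = child_class(self, name)
        return self.root


class SketchGIFTDocument(SketchDocument):
    """Overrides the lookup so a parser gets custom classes by name."""

    classes = {"question": SketchQuestion}

    def get_element_class(self, name):
        return self.classes.get(name, SketchElement)
```

In this sketch a parser would call doc.get_element_class(name) each time it recognizes an element, then pass the result to add_child() on the current node, so the parser itself never needs to know about the custom classes.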

Here is a list of the other classes and methods that I still need to figure out how to adapt to GIFT:

  • structures.Element(Node): base class to represent all GIFT elements, including those that have unknown content models or require no special processing. Since GIFT is simple, it may be possible to represent it completely with Elements.
  • parser.parse_document(): This is definitely a key method for Document.read() to work. What in here do I need to keep and what can be ignored?
  • parser.parse_element(): the element is added as a child of the current element using Node.add_child(). This is also a key method for Document.read() to work. I have it implemented for gift.parser, but without the SGML, DTD, and stag methods. It is not working, and I need to figure out whether those omitted parts need to be included.
  • NameParticle and ContentParticle: how do these classes fit into the data model, and are they still needed for GIFT?

Code status: Travis CI is not passing right now, as I am working on Document.read() to pass the parser tests.

Progress Update

While debugging reading a .txt file into the parser, found that the imported text shows up in the parser as its int representation.

  File "/Users/annaliao/repos/pyslet/pyslet/gift/structures.py", line 1968, in read_from_entity
    parser.parse_document(self)
  File "/Users/annaliao/repos/pyslet/pyslet/gift/parser.py", line 512, in parse_document
    self.parse_element()
  File "/Users/annaliao/repos/pyslet/pyslet/gift/parser.py", line 673, in parse_element
    self.parse_content()
  File "/Users/annaliao/repos/pyslet/pyslet/gift/parser.py", line 749, in parse_content
    raise GIFTFatalError("parse_content: unexpected character, %s" % self.the_char)
pyslet.gift.parser.GIFTFatalError: parse_content: unexpected character, 47

For example:

>>> parser.the_char
47
>>> parser.next_char()
>>> parser.the_char
47
>>> parser.next_char()
>>> parser.the_char
72
>>> parser.next_char()
>>> parser.the_char
101
>>> parser.next_char()
>>> parser.the_char
108

Each int is the Unicode code point of a character read from the file:
'/': https://unicode-table.com/en/#002F, &#47;
'H': https://unicode-table.com/en/#0048, &#72;
'e': https://unicode-table.com/en/#0065, &#101;
'l': https://unicode-table.com/en/#006C, &#108;

Worked around this issue by calling chr(<int>) to get the Unicode str representation of the int.

# Convert an int code point to its one-character str.
if isinstance(self.the_char, int):
    self.the_char = chr(self.the_char)

This is not a good long-term solution. I need to find where the ints are introduced when reading in a file, or add a function for the conversion.
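One plausible cause, not yet confirmed for this code, is that the file contents reach the parser as bytes rather than str. In Python 3, iterating over or indexing a bytes object yields ints, which matches the 47, 47, 72, 101 sequence above. The sketch below shows the behaviour and a possible conversion helper; read_chars is a hypothetical name, and UTF-8 is an assumed encoding.

```python
import io

raw = b"//Hello"

# Iterating over bytes yields ints (the code points, for ASCII data).
as_ints = list(raw)

# Decoding first yields one-character strings instead.
as_chars = list(raw.decode("utf-8"))


def read_chars(stream, encoding="utf-8"):
    """Hypothetical helper: return an iterator of characters from a
    binary or text stream, decoding bytes when necessary."""
    data = stream.read()
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return iter(data)
```

If this is the cause, decoding once at the point where the file is read would remove the need for the per-character chr() check.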

STATUS: test_gift_structures.test_string() and test_gift_structures.test_read_file() are the unittests for this functionality. These tests are not passing yet.

structures.escape_char_data

The GIFT format has special control characters, including '//', '::', '{', '=', and '~'. These need to be handled when parsing GIFT-formatted text. Specifically, the parser must distinguish the case where the quiz author intentionally includes one in the text from the case where it is an invalid use.
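A minimal sketch of one way to escape and unescape such characters follows. It assumes the backslash convention described in the Moodle GIFT documentation (a control character is taken literally when preceded by a backslash), and it assumes the character set ~ = # { } : as the ones needing escapes. Escaping the backslash itself is a choice made for this sketch so that the round trip is lossless. The function names are hypothetical and this is not the structures.escape_char_data implementation.

```python
# Assumed set of GIFT control characters that need a backslash escape.
GIFT_SPECIAL = "~=#{}:"


def escape_gift_text(text):
    """Prefix each control character (and each backslash) with a backslash."""
    out = []
    for ch in text:
        if ch == "\\" or ch in GIFT_SPECIAL:
            out.append("\\")
        out.append(ch)
    return "".join(out)


def unescape_gift_text(text):
    """Reverse escape_gift_text: drop each backslash, keep what follows."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Take the escaped character; a trailing backslash is kept as-is.
            ch = next(chars, "\\")
        out.append(ch)
    return "".join(out)
```

Detecting an invalid, unescaped use of a control character is a separate step that depends on the parser's current state, so it is left out here.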

Parser objects parse entities

Using XMLParser as the reference for gift.parser, I decided I need to revisit the GIFTentity unittests to make sure these structures objects work as expected. GIFTentity objects act as context managers.

I also added some Document unittests for reading from a file, and some test_gift_parser unittests.
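For reference, the context-manager behaviour mentioned for entity objects can be sketched as below. SketchEntity is a hypothetical stand-in that wraps an in-memory text stream; it borrows the the_char and next_char() names from the debugging session above but is not the GIFTentity class, whose actual interface may differ.

```python
import io


class SketchEntity:
    """Hypothetical stand-in for an entity; not the pyslet class."""

    def __init__(self, text):
        self._stream = io.StringIO(text)
        self.the_char = None
        self.closed = False
        self.next_char()  # position on the first character

    def next_char(self):
        # Advance one character; the_char becomes None at end of input.
        ch = self._stream.read(1)
        self.the_char = ch if ch else None

    def close(self):
        self._stream.close()
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always release the stream; do not suppress exceptions.
        self.close()
        return False
```

A unittest for this pattern would typically open the entity in a with block, step through a few characters, and then assert that the entity is closed after the block exits, including when an exception is raised inside it.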

Plan for next week

  • Continue to investigate the existing pyslet.xml components and how they interact with the data structures. See how they adapt to GIFT.
  • Work on getting unittests with Document.read() to pass