DocumentID: ECMA-376/Part2/AnnexA
Title: ECMA-376, Part2: Annex A. Resolving Unicode Strings to Part Names
Extracted-From: ECMA-376 Office Open XML File Formats, 1st Edition / December 2006
Warning: Converted to HTML format by a script known to have bugs

Annex A: Resolving Unicode Strings to Part Names

Package clients might use Unicode strings for referencing parts in a package. [Example: Values of xsd:anyURI data type within XML markup are Unicode strings. end example]

This annex specifies how such Unicode strings shall be resolved to part names.

The diagram below illustrates the conversion path from the Unicode string to a part name. The numbered arcs identify string transformations.

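Figure A-1. Strings are converted to part names for referencing parts

[Figure not reproduced. Per A.1 through A.3, its arcs are: [1-2] Unicode string to IRI, [2-3] IRI to URI, [3-4] URI reference to part name.]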

A Unicode string representing a URI can be passed to the producer or consumer. The producing or consuming application shall convert the Unicode string to a URI. If the URI is a relative reference, the application shall resolve it using the base URI of the part, which is expressed using the pack scheme, to the URI of the referenced part. [M1.33]

The process for resolving a Unicode string to a part name follows Arcs [1-2], [2-3], and [3-4].

A.1 Creating an IRI from a Unicode String

With reference to Arc [1-2] in Figure A-1, a Unicode string is converted to an IRI by percent-encoding each ASCII character that does not belong to the set of reserved or unreserved characters as defined in RFC 3986.
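The conversion along Arc [1-2] can be sketched in Python as follows. This is an illustrative sketch, not normative text: the function name unicode_to_iri is hypothetical, and the code assumes that the percent character (%) is left untouched so that existing percent-encoded triplets survive. That reading matches the examples in A.4, but it is an interpretation, because "%" is neither a reserved nor an unreserved character in RFC 3986.

```python
import string

# RFC 3986 character classes (sections 2.2 and 2.3).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
RESERVED = set(":/?#[]@!$&'()*+,;=")


def unicode_to_iri(text: str) -> str:
    """Percent-encode ASCII characters outside reserved/unreserved.

    Assumption: '%' is passed through unchanged (see lead-in).
    Non-ASCII characters are left as they are; they are handled
    later, when the IRI is converted to a URI.
    """
    out = []
    for ch in text:
        is_ascii = ord(ch) < 128
        if is_ascii and ch not in UNRESERVED and ch not in RESERVED and ch != "%":
            out.append(f"%{ord(ch):02X}")
        else:
            out.append(ch)
    return "".join(out)


print(unicode_to_iri("\\a.xml"))    # %5Ca.xml
print(unicode_to_iri("/a/ц.xml"))   # /a/ц.xml
```

The back slash is encoded because it belongs to neither RFC 3986 set, while the Cyrillic letter passes through unchanged at this stage.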

A.2 Creating a URI from an IRI

With reference to Arc [2-3] in Figure A-1, an IRI is converted to a URI by converting non-ASCII characters as defined in Step 2 in §3.1 of RFC 3987.

If a consumer converts the URI back into an IRI, the conversion shall be performed as specified in §3.2 of RFC 3987. [M1.34]
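The forward conversion along Arc [2-3] can be sketched in Python as follows. The function name iri_to_uri is hypothetical. As a simplification, the sketch percent-encodes every non-ASCII character as its UTF-8 octets, rather than checking membership in the ucschar and iprivate classes of RFC 3987, and it assumes the input contains no unpaired surrogates. The reverse conversion of §3.2 is not shown.

```python
def iri_to_uri(iri: str) -> str:
    """Replace each non-ASCII character with its percent-encoded UTF-8 octets.

    Simplification: all non-ASCII characters are encoded, which is
    broader than the ucschar/iprivate classes named in RFC 3987.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.append("".join(f"%{octet:02X}" for octet in ch.encode("utf-8")))
    return "".join(out)


print(iri_to_uri("/a/ц.xml"))   # /a/%D1%86.xml
```

ASCII characters, including any percent-encoded triplets already present, are copied through unchanged.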

A.3 Resolving a Relative Reference to a Part Name

If the URI reference obtained in §A.2 is a URI, it is resolved in the regular way, that is, with no package-specific considerations. Otherwise, if the URI reference is a relative reference, it is resolved (with reference to Arc [3-4] in Figure A-1) as follows:

  1. Percent-encode each open bracket ([) and close bracket (]).
  2. Percent-encode each percent (%) character that is not followed by a hexadecimal notation of an octet value.
  3. Un-percent-encode each percent-encoded unreserved character.
  4. Un-percent-encode each forward slash (/) and back slash (\).
  5. Convert all back slashes to forward slashes.
  6. If present in a segment containing non-dot (".") characters, remove trailing dot (".") characters from each segment.
  7. Replace each occurrence of multiple consecutive forward slashes (/) with a single forward slash.
  8. If a single trailing forward slash (/) is present, remove that trailing forward slash.
  9. Remove complete segments that consist of three or more dots.
  10. Resolve the relative reference against the base URI of the part holding the Unicode string, as it is defined in §5.2 of RFC 3986. The path component of the resulting absolute URI is the part name.
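The ten steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not a reference implementation. The names normalize_reference and resolve_to_part_name are hypothetical. The reference is assumed to be non-empty and to consist of a path only (no scheme, authority, query, or fragment). The base is passed as the part name of the part holding the string, that is, the path component of its pack URI. Step 10 uses a hand-written, simplified merge and dot-segment removal rather than the complete algorithm of §5.2 of RFC 3986.

```python
import re
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def _decode_unreserved(match):
    # Decode a %XX triplet only if it stands for an unreserved character.
    ch = chr(int(match.group(1), 16))
    return ch if ch in UNRESERVED else match.group(0)


def normalize_reference(ref: str) -> str:
    """Apply steps 1 through 9 to a relative reference."""
    ref = ref.replace("[", "%5B").replace("]", "%5D")             # step 1
    ref = re.sub(r"%(?![0-9A-Fa-f]{2})", "%25", ref)              # step 2
    ref = re.sub(r"%([0-9A-Fa-f]{2})", _decode_unreserved, ref)   # step 3
    ref = re.sub(r"%2[Ff]", "/", ref)                             # step 4
    ref = re.sub(r"%5[Cc]", lambda m: "\\", ref)
    ref = ref.replace("\\", "/")                                  # step 5
    ref = "/".join(                                               # step 6
        seg.rstrip(".") if seg.strip(".") else seg
        for seg in ref.split("/")
    )
    ref = re.sub(r"/{2,}", "/", ref)                              # step 7
    if ref.endswith("/"):                                         # step 8
        ref = ref[:-1]
    return "/".join(                                              # step 9
        seg for seg in ref.split("/") if not re.fullmatch(r"\.{3,}", seg)
    )


def resolve_to_part_name(ref: str, base_part_name: str) -> str:
    """Step 10 (simplified): resolve against the base and drop dot segments."""
    ref = normalize_reference(ref)
    if ref.startswith("/"):
        merged = ref
    else:
        # Replace the last segment of the base path with the reference.
        merged = base_part_name[: base_part_name.rfind("/") + 1] + ref
    resolved = []
    for seg in merged.split("/")[1:]:
        if seg == "..":
            if resolved:
                resolved.pop()
        elif seg != ".":
            resolved.append(seg)
    return "/" + "/".join(resolved)


base = "/doc/base.xml"   # hypothetical part holding the string
print(resolve_to_part_name("%5C%41.xml", base))       # /A.xml
print(resolve_to_part_name("/%2e/%2e/a.xml", base))   # /a.xml
print(resolve_to_part_name("b.xml", base))            # /doc/b.xml
```

The order of the steps matters: unreserved characters are decoded in step 3 before dot segments are removed in step 10, which is why /%2e/%2e/a.xml collapses to /a.xml.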

A.4 String Conversion Examples

[Example:

Examples of Unicode strings converted to IRIs, URIs, and part names are shown below:

Unicode string    IRI               URI               Part name
--------------    ---               ---               ---------
/a/b.xml          /a/b.xml          /a/b.xml          /a/b.xml
/a/ц.xml          /a/ц.xml          /a/%D1%86.xml     /a/%D1%86.xml
/%41/%61.xml      /%41/%61.xml      /%41/%61.xml      /A/a.xml
/%25XY.xml        /%25XY.xml        /%25XY.xml        /%25XY.xml
/%XY.xml          /%XY.xml          /%25XY.xml        /%25XY.xml
/%2541.xml        /%2541.xml        /%2541.xml        /%2541.xml
/../a.xml         /../a.xml         /../a.xml         /a.xml
/./ц.xml          /./ц.xml          /./%D1%86.xml     /%D1%86.xml
/%2e/%2e/a.xml    /%2e/%2e/a.xml    /%2e/%2e/a.xml    /a.xml
\a.xml            %5Ca.xml          %5Ca.xml          /a.xml
\%41.xml          %5C%41.xml        %5C%41.xml        /A.xml
/%D1%86.xml       /%D1%86.xml       /%D1%86.xml       /%D1%86.xml
\%2e/a.xml        %5C%2e/a.xml      %5C%2e/a.xml      /a.xml

end example]
