Allow access to other XML docs in docx file like the header and footer #73

Open: yjukaku wants to merge 1 commit into master

yjukaku commented Oct 16, 2019

This adds support for retrieving all of the header and footer documents embedded in the docx file, as well as the numbering docs.

This is based on the work in #22 and #42.

It also closes #49 and #32

@yjukaku yjukaku force-pushed the add-other-doc-access branch from ad67fc8 to 990de9c Compare October 22, 2019 20:21
@yjukaku yjukaku force-pushed the add-other-doc-access branch from 990de9c to 46beee1 Compare October 22, 2019 20:24
@fercreek

@chrahunt we need this solution from @yjukaku

@yjukaku
Author

yjukaku commented Oct 8, 2020

👋 Is there anything holding up this PR from merging? Anything we can do to help?

@nathanvda
Contributor

This PR would solve a problem I am currently encountering (namely, setting a bookmark in a header). I am willing to help get this PR merged. What is holding it back?

@satoryu
Member

satoryu commented Jun 29, 2021

There is a conflicting file now.

@yjukaku do you have time to resolve the conflict?

@nathanvda
Contributor

I tried to get this working. The main difference now is that, for Office365 files, we have to try document.xml first and, if that does not exist, fall back to document2.xml.

So I created a local version that is more in line with the current code: instead of iterating over DOCUMENT_PATHS, I added explicit methods to load_headers and footers and numbering, as we already have a load_styles too.

But when adapting the update method accordingly, I noticed that we only update word/document.xml regardless of the source (leaving document2.xml as is?). I am not sure whether that is OK or a problem. Can I ignore that for now?
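
The document.xml / document2.xml fallback described above could be sketched roughly like this. It is only an illustration: `CANDIDATE_DOCUMENT_PATHS` and `main_document_path` are hypothetical names, not part of the gem, and the sketch assumes the zip entry names are already available as an array of strings.

```ruby
# Hypothetical sketch, not the gem's code: choose the main document part
# by trying the usual name first and the Office365 variant second.
CANDIDATE_DOCUMENT_PATHS = ['word/document.xml', 'word/document2.xml'].freeze

# entry_names: array of zip entry names (assumed to be plain strings).
# Returns the first candidate present in the archive, or nil if none is.
def main_document_path(entry_names)
  CANDIDATE_DOCUMENT_PATHS.find { |path| entry_names.include?(path) }
end
```

If the path returned here were remembered at load time, an update step could write back to that same entry rather than always to word/document.xml, which is the concern raised in the last paragraph above.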

@yjukaku
Author

yjukaku commented Jun 29, 2021

> I added explicit methods to load_headers and footers and numbering, as we already have a load_styles too.

I was trying to DRY the code with the DOCUMENT_PATHS hash, but if that's not needed 🤷‍♂️ .

> Can I ignore that for now?

I personally would expect the document file name to be the same as the original when updated. It appears the better way to find the proper document name would be to check the file [Content_Types].xml in the zip, then look for an Override tag in that XML file whose ContentType attribute has the value application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml. That will tell us exactly which file is the "main" one, and a similar method can be used for the headers, footers, numbering, styles, etc.

See http://officeopenxml.com/anatomyofOOXML.php under Content Types

Here's a sample [Content_Types].xml:

<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/_rels/document.xml.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/settings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml"/>
  <Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
  <Override PartName="/word/numbering.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml"/>
  <Override PartName="/word/footer1.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml"/>
  <Override PartName="/word/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/>
  <Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/>
  <Override PartName="/customXml/_rels/item1.xml.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/customXml/itemProps1.xml" ContentType="application/vnd.openxmlformats-officedocument.customXmlProperties+xml"/>
  <Override PartName="/customXml/item1.xml" ContentType="application/xml"/>
  <Override PartName="/docProps/custom.xml" ContentType="application/vnd.openxmlformats-officedocument.custom-properties+xml"/>
  <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
  <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
</Types>
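
A minimal sketch of that lookup, assuming the content-types XML has already been read out of the zip as a string. It uses REXML only to keep the example dependency-free (the gem itself may parse XML differently), and `part_names_for`, `MAIN_DOCUMENT_TYPE`, `FOOTER_TYPE` and `SAMPLE_CONTENT_TYPES` are hypothetical names introduced for illustration.

```ruby
require 'rexml/document'

MAIN_DOCUMENT_TYPE =
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'
FOOTER_TYPE =
  'application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml'

# Hypothetical helper: returns the zip entry names (leading slash removed)
# of every Override element whose ContentType matches content_type.
def part_names_for(content_types_xml, content_type)
  root = REXML::Document.new(content_types_xml).root
  names = []
  root.each_element do |el|
    next unless el.name == 'Override'
    next unless el.attributes['ContentType'] == content_type
    names << el.attributes['PartName'].delete_prefix('/')
  end
  names
end

# Trimmed-down version of the sample above, for trying the helper out.
SAMPLE_CONTENT_TYPES = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
    <Override PartName="/word/document.xml" ContentType="#{MAIN_DOCUMENT_TYPE}"/>
    <Override PartName="/word/footer1.xml" ContentType="#{FOOTER_TYPE}"/>
  </Types>
XML

main_parts = part_names_for(SAMPLE_CONTENT_TYPES, MAIN_DOCUMENT_TYPE)
footer_parts = part_names_for(SAMPLE_CONTENT_TYPES, FOOTER_TYPE)
puts "main: #{main_parts.inspect}, footers: #{footer_parts.inspect}"
```

The same helper can be called with other content types (numbering, styles, and so on), which avoids hard-coding file names such as document2.xml. A file may contain several footers or headers, which is why the helper returns an array.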

nathanvda added a commit to nathanvda/docx that referenced this pull request Jul 1, 2021
Inspired by PR ruby-docx#73, we adapted the code to work on top of the current state.
@aunghtain

Can we merge this PR as well? I need access to numbering and header/footer. Thanks.

@panozzaj

Thanks, the proposed change seems good at a high level to me. (I'm not affiliated with the project, just someone who has started using the library.) This would be helpful for one case I saw today where the important text information we wanted was in the document footer. Right now that information is inaccessible.

I wouldn't want to delay this PR, but what do you think about adding the header or footer contents to methods like .text on documents? Maybe it could take the contents of any headers and put that at the top of the document text, and the contents of the footers at the end. That way document.text would truly give you all of the text of the document.
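
As a rough illustration of that ordering only: the sketch below assumes the header, body and footer text has already been extracted into plain strings, and `full_text` is a hypothetical name, not an existing method of the library.

```ruby
# Hypothetical sketch: combine already-extracted text so that header text
# comes first, then the body, then footer text.
# headers and footers are arrays of strings; body is a single string.
def full_text(headers:, body:, footers:)
  [*headers, body, *footers].reject(&:empty?).join("\n")
end
```

Empty parts are skipped so that a document without headers or footers yields just its body text.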

FeminismIsAwesome pushed a commit to FeminismIsAwesome/docx that referenced this pull request Jul 30, 2023
Inspired by ruby-docx#73, but stripped down to just the header to see if that might be more amenable to getting in.

Also, because of the TODO note in the update function, this only supports reading these files, not updating them.
@aunghtain

Any update on this? I've been waiting for it for more than a year now.

Successfully merging this pull request may close these issues.

Replace Header/Footer bookmarks doesn't work
6 participants