
Creating LangChain Documents from Strings

Problem Statement

When working with LangChain's document processing capabilities, you may need to create a Document object directly from a string variable in Python. This common requirement isn't clearly documented, and developers often hit errors when they pass these hand-built documents to LangChain chains.

The core issue manifests when:

  1. Creating a document from a string using the Document class
  2. Attempting to use this document in a load_qa_chain operation
  3. Encountering the error: AttributeError: 'tuple' object has no attribute 'page_content'

This occurs despite apparently correct document creation:

python
doc = Document(page_content="text", metadata={"source": "local"})
print(type(doc))  # Valid Document type
print(doc.page_content)  # Correct content

Solutions

<code-group>
<code-block title="LangChain >= v0.1.11 (New Structure)">
```python
from langchain_core.documents import Document

doc = Document(
    page_content="Your text content here",
    metadata={"source": "local"}
)
```
</code-block>

<code-block title="LangChain < v0.1.11 (Legacy)">
```python
from langchain.docstore.document import Document

doc = Document(
    page_content="Your text content here",
    metadata={"source": "local"}
)
```
</code-block>
</code-group>

IMPORTANT: LangChain Restructuring

LangChain has been split into separate packages (langchain-core, langchain-community, langchain-text-splitters). The new structure is backwards-compatible, but for new projects, import from langchain_core.documents.
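
If the same code has to run against both older and newer installations, one option is to try the import locations in order. The sketch below is a generic helper that returns the first attribute it can import from a list of candidate modules. The name `import_first` is hypothetical and not part of LangChain; the two module paths are the ones mentioned above.

```python
import importlib


def import_first(candidates, attr):
    """Return `attr` from the first module in `candidates` that provides it.

    `import_first` is a hypothetical helper for illustration only.
    """
    errors = []
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{module_name}: {exc}")
    raise ImportError(f"Could not import {attr!r}; tried: " + "; ".join(errors))


# The two locations mentioned above, newest first.
DOCUMENT_MODULES = ["langchain_core.documents", "langchain.docstore.document"]
```

With LangChain installed, `Document = import_first(DOCUMENT_MODULES, "Document")` would resolve to whichever location your version provides.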

Solution 2: Create Documents for Chains

LangChain chains expect input_documents to be a list of Document objects, even when there is only one document:

python
from langchain_core.documents import Document

# Create document
doc = Document(
    page_content="Financial report Q4 2023...",
    metadata={"source": "internal"}
)

# Correct usage: wrap the document in a list.
# `chain` and `query` are assumed to be defined elsewhere.
chain({"input_documents": [doc], "human_input": query})

Why the Error Occurs

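The 'tuple' object has no attribute 'page_content' error happens when:

  • You pass a single Document object instead of a list
  • The chain iterates over input_documents, and iterating over a Document (a Pydantic model) yields (field name, value) tuples rather than documents, so the page_content lookup fails on a tuple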
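
The failure can be reproduced without LangChain installed. The sketch below uses a hypothetical stand-in class, `FakeDocument`, that mimics one behavior of a Pydantic-based Document: iterating it yields `(field_name, value)` tuples. The hypothetical `combine` function loops over its input roughly the way a document-combining chain would. Neither is LangChain code, and the real internals may differ.

```python
class FakeDocument:
    """Hypothetical stand-in for a Document; not the LangChain class."""

    def __init__(self, page_content, metadata=None):
        self.page_content = page_content
        self.metadata = metadata or {}

    def __iter__(self):
        # Pydantic models iterate as (field_name, value) tuples;
        # this stand-in copies that behavior.
        yield ("page_content", self.page_content)
        yield ("metadata", self.metadata)


def combine(docs):
    """Join page contents, looping over `docs` as a chain would."""
    return "\n\n".join(d.page_content for d in docs)


doc = FakeDocument("text", {"source": "local"})

# A list works: each item is a document.
combined = combine([doc])

# A bare document does not: iterating it yields tuples.
try:
    combine(doc)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

Here `error_message` ends up holding the same kind of AttributeError text shown in the problem statement, while the list call succeeds.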

Solution 3: Creating Multiple Documents

When working with multiple strings:

python
from langchain_text_splitters import CharacterTextSplitter
from langchain_core.documents import Document

# For predefined texts
texts = ["First text", "Second text"]
metadatas = [{"source": "doc1"}, {"source": "doc2"}]

documents = []
for text, meta in zip(texts, metadatas):
    documents.append(Document(page_content=text, metadata=meta))

# Alternatively, using built-in method
documents = CharacterTextSplitter().create_documents(texts, metadatas=metadatas)
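
One thing to watch in the loop above: `zip` stops at the shorter input, so a missing metadata entry silently drops a text. A minimal sketch of a stricter pairing step follows. `pair_texts_with_metadata` is a hypothetical helper name, and it returns plain keyword dictionaries so that it runs without LangChain installed.

```python
def pair_texts_with_metadata(texts, metadatas=None):
    """Return one {"page_content", "metadata"} dict per text.

    Hypothetical helper: raises instead of silently truncating when
    the two lists have different lengths.
    """
    if metadatas is None:
        metadatas = [{} for _ in texts]
    if len(texts) != len(metadatas):
        raise ValueError(
            f"Got {len(texts)} texts but {len(metadatas)} metadata entries"
        )
    return [
        # Copy each metadata dict so documents do not share mutable state.
        {"page_content": text, "metadata": dict(meta)}
        for text, meta in zip(texts, metadatas)
    ]


pairs = pair_texts_with_metadata(
    ["First text", "Second text"],
    [{"source": "doc1"}, {"source": "doc2"}],
)
```

Each dictionary can then be unpacked into the constructor, for example `[Document(**kwargs) for kwargs in pairs]`.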

Additional Considerations

Using Documents with Memory Chains

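When combining documents with conversation memory, make sure the prompt template declares every variable the chain fills in, including {context}, which is where the "stuff" chain inserts the document text: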

python
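from langchain.chains.question_answering import load_qa_chain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

# {context} is where the "stuff" chain inserts the document text.
template = """You are a financial analyst analyzing:
{context}
{chat_history}
Human: {human_input}
Analyst:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input", "context"],
    template=template
)

# input_key tells the memory which input holds the user's message.
memory = ConversationBufferMemory(
    memory_key="chat_history",
    input_key="human_input"
)

# `llm` is assumed to be an already-configured LLM instance.
chain = load_qa_chain(
    llm,
    chain_type="stuff",
    memory=memory,
    prompt=prompt
)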
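
A mismatch between the placeholders in the template and the declared input variables is easy to miss. The sketch below uses only the standard library's `string.Formatter` to list the placeholders in a format-style template and compare them with a declared list. `template_variables` and `check_prompt_variables` are hypothetical helper names, not LangChain functions.

```python
from string import Formatter


def template_variables(template):
    """Return the set of {placeholder} names in a format-style template."""
    return {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # skips literal text segments and empty "{}" fields
    }


def check_prompt_variables(template, input_variables):
    """Return (missing, unused) variable names, each as a sorted list."""
    found = template_variables(template)
    declared = set(input_variables)
    missing = sorted(found - declared)  # in the template, not declared
    unused = sorted(declared - found)   # declared, not in the template
    return missing, unused


template = """You are a financial analyst analyzing:
{context}
{chat_history}
Human: {human_input}
Analyst:"""

# "context" is deliberately left out of the declared variables here.
missing, unused = check_prompt_variables(
    template, ["chat_history", "human_input"]
)
```

In this run `missing` contains `"context"`, which is the kind of omission that keeps documents from reaching the prompt.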

Best Practices

  1. Include Metadata: Always include source information
  2. Check List Structure: Verify input_documents is a list
  3. Version Compatibility: Use correct imports for your LangChain version
  4. Text Management: For large texts, use text splitters:
    python
    from langchain_text_splitters import CharacterTextSplitter
    
    splitter = CharacterTextSplitter(chunk_size=1000)
    documents = splitter.create_documents([long_text])
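
The list check in item 2 can be automated with a small guard. This is a sketch that relies on duck typing (anything with a `page_content` attribute counts as a document), so it runs without LangChain. `ensure_document_list` is a hypothetical helper name.

```python
from types import SimpleNamespace


def ensure_document_list(docs):
    """Return `docs` as a list of document-like objects.

    A single document is wrapped in a list; a list or tuple is checked
    item by item. Hypothetical helper, not part of LangChain.
    """
    if hasattr(docs, "page_content"):
        # A bare document was passed: wrap it.
        return [docs]
    if isinstance(docs, (list, tuple)):
        for index, item in enumerate(docs):
            if not hasattr(item, "page_content"):
                raise TypeError(
                    f"Item {index} has no page_content: {type(item).__name__}"
                )
        return list(docs)
    raise TypeError(f"Expected a document or a list, got {type(docs).__name__}")


# SimpleNamespace stands in for a Document in this sketch.
doc = SimpleNamespace(page_content="text", metadata={"source": "local"})
wrapped = ensure_document_list(doc)
```

Calling the guard right before the chain, as in `{"input_documents": ensure_document_list(doc)}`, turns a confusing AttributeError into either a correct list or an explicit TypeError.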

Troubleshooting Checklist

  1. ❌ Getting AttributeError: 'tuple' object has no attribute 'page_content'
    → Wrap your document in a list: [doc]
  2. ❌ ImportError for langchain.docstore.document
    → Use from langchain_core.documents import Document instead
  3. ❌ Documents not appearing in context
    → Verify all variables in prompt template match chain input
  4. ❌ Unexpected document processing
    → Build documents with a text splitter's create_documents method so page_content and metadata are set consistently

By following these patterns, you can reliably create LangChain documents from strings and pass them to chains without hitting the tuple attribute error.