Creating LangChain Documents from Strings
Problem Statement
When working with LangChain's document processing capabilities, you may need to create a Document object directly from a string variable in Python. This common requirement isn't clearly documented, and developers often encounter errors when attempting to use these custom documents in LangChain chains.
The core issue manifests when:
- Creating a document from a string using the Document class
- Attempting to use this document in a load_qa_chain operation
- Encountering the error:
AttributeError: 'tuple' object has no attribute 'page_content'
This occurs despite apparently correct document creation:
doc = Document(page_content="text", metadata={"source": "local"})
print(type(doc)) # Valid Document type
print(doc.page_content) # Correct content
Solutions
Solution 1: Correct Document Creation (Recommended)
LangChain >= v0.1.11 (New Structure):
```python
from langchain_core.documents import Document

doc = Document(
    page_content="Your text content here",
    metadata={"source": "local"}
)
```
LangChain < v0.1.11 (Legacy):
```python
from langchain.docstore.document import Document

doc = Document(
    page_content="Your text content here",
    metadata={"source": "local"}
)
```
IMPORTANT: LangChain Restructuring
LangChain has been split into separate packages (langchain-core, langchain-community, langchain-text-splitters). The new structure is backwards-compatible, but for new projects, import from langchain_core.documents.
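If the same script has to run against both package layouts, one option is to probe the two import paths named above in order. The helper below is a minimal sketch: resolve_document_class and CANDIDATE_MODULES are illustrative names, not part of LangChain, and the function returns None when neither package is installed.

```python
import importlib

# Candidate module paths, newest layout first. Both come from this article.
CANDIDATE_MODULES = (
    "langchain_core.documents",     # new package structure
    "langchain.docstore.document",  # legacy location
)


def resolve_document_class():
    """Return the first importable Document class, or None if none is found."""
    for module_name in CANDIDATE_MODULES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # this layout is not installed, try the next one
        document_cls = getattr(module, "Document", None)
        if document_cls is not None:
            return document_cls
    return None


Document = resolve_document_class()
print(Document)  # a class if LangChain is installed, otherwise None
```

For a project pinned to a single LangChain version, a plain import of the matching path is simpler and fails loudly, which is usually preferable.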
Solution 2: Create Documents for Chains
LangChain chains require a list of documents for input_documents, even when there is only a single document:
from langchain_core.documents import Document
# Create document
doc = Document(
page_content="Financial report Q4 2023...",
metadata={"source": "internal"}
)
# Correct usage: Wrap in list
chain({"input_documents": [doc], "human_input": query})
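Because the list requirement is easy to forget, a small guard can normalize the input before it reaches the chain. This is a sketch only: as_document_list is a hypothetical helper, and it relies on duck typing (any object with a page_content attribute counts as a document). It is demonstrated with a SimpleNamespace stand-in rather than a real Document so that it runs without LangChain installed.

```python
from types import SimpleNamespace


def as_document_list(docs):
    """Return a list of document-like objects suitable for input_documents."""
    # A single document-like object gets wrapped in a list.
    if hasattr(docs, "page_content"):
        return [docs]
    # Otherwise treat the input as an iterable and validate each item.
    items = list(docs)
    for item in items:
        if not hasattr(item, "page_content"):
            raise TypeError(
                f"expected a Document-like object, got {type(item).__name__}"
            )
    return items


# Stand-in for a Document (assumption: only page_content/metadata matter here).
fake_doc = SimpleNamespace(page_content="text", metadata={"source": "local"})

single = as_document_list(fake_doc)     # bare object is wrapped
already = as_document_list([fake_doc])  # list passes through unchanged
print(len(single), len(already))
```

With such a guard in place, the call would read chain({"input_documents": as_document_list(doc), "human_input": query}).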
Why the Error Occurs
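The 'tuple' object has no attribute 'page_content' error happens when you pass a single Document object instead of a list:
- The chain iterates over input_documents, expecting each item to be a Document
- Document is a Pydantic model, and iterating over a Pydantic model yields (field name, value) tuples
- The chain then tries to read page_content from one of those tuples and fails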
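The failure can be reproduced without running a chain at all. The sketch below uses collect_page_content, a hypothetical and heavily simplified stand-in for the loop a chain runs over its documents. If langchain-core is not installed, it falls back to a tiny Document class that only mimics the relevant behavior (iteration yields field/value pairs, as Pydantic models do); that fallback is an assumption for illustration, not LangChain's implementation.

```python
try:
    from langchain_core.documents import Document
except ImportError:
    # Fallback stand-in (assumption): iterates like a Pydantic model.
    class Document:
        def __init__(self, page_content, metadata=None):
            self.page_content = page_content
            self.metadata = metadata or {}

        def __iter__(self):
            # Yield (field_name, value) tuples, one per attribute.
            return iter(vars(self).items())


def collect_page_content(docs):
    """Simplified stand-in for a chain loop that reads each document's text."""
    return [d.page_content for d in docs]


doc = Document(page_content="text", metadata={"source": "local"})

# Bare Document: the loop walks the model's fields, which are tuples.
try:
    collect_page_content(doc)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# One-element list: the loop sees the Document itself.
contents = collect_page_content([doc])
print(contents)
```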
Solution 3: Creating Multiple Documents
When working with multiple strings:
from langchain_text_splitters import CharacterTextSplitter
from langchain_core.documents import Document
# For predefined texts
texts = ["First text", "Second text"]
metadatas = [{"source": "doc1"}, {"source": "doc2"}]
documents = []
for text, meta in zip(texts, metadatas):
    documents.append(Document(page_content=text, metadata=meta))
# Alternatively, using built-in method
documents = CharacterTextSplitter().create_documents(texts, metadatas=metadatas)
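One detail worth guarding against in the manual loop: zip stops at the shorter input, so a missing metadata entry silently drops a text. The sketch below checks the lengths first and produces keyword arguments that can be passed to Document. build_document_kwargs is a hypothetical helper name, and the code deliberately avoids importing LangChain so it runs on its own.

```python
def build_document_kwargs(texts, metadatas=None):
    """Pair each text with its metadata, refusing mismatched lengths."""
    if metadatas is None:
        # No metadata supplied: give every text its own empty dict.
        metadatas = [{} for _ in texts]
    if len(texts) != len(metadatas):
        raise ValueError(
            f"{len(texts)} texts but {len(metadatas)} metadata entries"
        )
    # Copy each metadata dict so later edits do not leak between documents.
    return [
        {"page_content": text, "metadata": dict(meta)}
        for text, meta in zip(texts, metadatas)
    ]


texts = ["First text", "Second text"]
metadatas = [{"source": "doc1"}, {"source": "doc2"}]
kwargs_list = build_document_kwargs(texts, metadatas)
print(kwargs_list)

# With LangChain installed, the documents would then be built as:
# documents = [Document(**kwargs) for kwargs in kwargs_list]
```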
Additional Considerations
Using Documents with Memory Chains
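Ensure your prompt template contains every variable the chain fills in, including the one that receives the document text ({context} in this example):
from langchain.chains.question_answering import load_qa_chain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
# {context} is where the documents are inserted
template = """You are a financial analyst analyzing:
{context}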
{chat_history}
Human: {human_input}
Analyst:"""
prompt = PromptTemplate(
input_variables=["chat_history", "human_input", "context"],
template=template
)
memory = ConversationBufferMemory(
memory_key="chat_history",
input_key="human_input"
)
chain = load_qa_chain(
llm,
chain_type="stuff",
memory=memory,
prompt=prompt
)
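A mismatch between the placeholders in the template and the declared input_variables is a common reason for documents not reaching the prompt. The check below is a sketch using only the standard library; it assumes the template uses Python format-style {name} placeholders, and template_variables is a hypothetical helper name.

```python
from string import Formatter


def template_variables(template):
    """Return the set of {placeholder} names in a format-style template."""
    return {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # skip literal-only segments
    }


template = """You are a financial analyst analyzing:
{context}
{chat_history}
Human: {human_input}
Analyst:"""

declared = {"chat_history", "human_input", "context"}
found = template_variables(template)

missing = declared - found     # declared but absent from the template
undeclared = found - declared  # present in the template but not declared
print(missing, undeclared)
```

Both sets should be empty before the prompt is handed to the chain.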
Best Practices
- Include Metadata: Always include source information
- Check List Structure: Verify input_documents is a list
- Version Compatibility: Use correct imports for your LangChain version
- Text Management: For large texts, use text splitters:
from langchain_text_splitters import CharacterTextSplitter
splitter = CharacterTextSplitter(chunk_size=1000)
documents = splitter.create_documents([long_text])
Troubleshooting Checklist
- ❌ Getting AttributeError: 'tuple' object has no attribute 'page_content'
→ Wrap your document in a list: [doc]
- ❌ ImportError for langchain.docstore.document
→ Use from langchain_core.documents import Document instead
- ❌ Documents not appearing in context
→ Verify all variables in prompt template match chain input
- ❌ Unexpected document processing
→ Use LangChain's create_documents method for proper formatting
By following these patterns, you can reliably create LangChain documents from strings and pass them to chains without hitting the tuple attribute error.