Keep the original segmentation approach, and compute an “interference rate” for each retrieved paragraph: the degree to which its title and outline, rather than its content, drive the retrieval score. Paragraphs whose interference rate exceeds a threshold are down-weighted so that titles and outlines do not unduly influence the ranking.
// 1. Maintain a single material library index
materialLibrary = createVectorIndex(allParagraphs) // Complete format: "title\noutline\ncontent"

// 2. Retrieval implementation
function query(userInput):
    initialResults = materialLibrary.retrieve(userInput, top_k=10) // Retrieve more for re-ranking

    // Calculate interference rates
    for result in initialResults:
        title, outline, content = separateComponents(result.text)
        interferenceRate = calculateInterferenceRate(title + outline, content, userInput)
        result.interferenceRate = interferenceRate

    // Down-weight results whose interference rate exceeds the threshold
    interferenceThreshold = 1.3
    for result in initialResults:
        if result.interferenceRate > interferenceThreshold:
            result.adjustedScore = result.originalScore / result.interferenceRate
        else:
            result.adjustedScore = result.originalScore

    sortedResults = sortBy(adjustedScore)(initialResults) // highest adjusted score first
    return sortedResults.top(5)
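The re-ranking step can be sketched in Python as follows. The source does not define `calculateInterferenceRate`, so this sketch assumes the interference rate is the ratio of the full-text similarity score ("title\noutline\ncontent") to the content-only similarity score. The `Hit` class, its field names, and the content-only scores in the usage lines are hypothetical and serve only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    segment_id: str
    full_score: float     # query similarity against "title\noutline\ncontent"
    content_score: float  # query similarity against the content alone


def interference_rate(hit: Hit) -> float:
    """Assumed definition: full-text score divided by content-only score."""
    if hit.content_score <= 0:
        # The match comes entirely from the title/outline.
        return float("inf")
    return hit.full_score / hit.content_score


def rerank(hits: list[Hit], threshold: float = 1.3, top_n: int = 5) -> list[tuple[str, float]]:
    """Divide the score of high-interference hits by their rate, then sort."""
    adjusted = []
    for hit in hits:
        rate = interference_rate(hit)
        score = hit.full_score / rate if rate > threshold else hit.full_score
        adjusted.append((hit.segment_id, score))
    adjusted.sort(key=lambda pair: pair[1], reverse=True)
    return adjusted[:top_n]


# Hypothetical scores: only "Segment 3" exceeds the 1.3 threshold (0.84 / 0.60 = 1.4).
hits = [
    Hit("Segment 4", 0.88, 0.80),
    Hit("Segment 3", 0.84, 0.60),
    Hit("Segment 2", 0.76, 0.74),
]
ranked = rerank(hits)
```

Under this assumed definition, dividing a penalized score by its own rate reduces it to the content-only score, so a paragraph that matched mainly through its title drops below paragraphs whose content matched the query.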
Example paragraph (Segment 3):
***
Split:
Calculation (example assumed values):
User Query: "Enterprise Management System Standards\n2. Position Requirements"
Result | Original Score | Description |
---|---|---|
Segment 4 | 0.88 | Directly related to “position requirements” |
Segment 3 | 0.84 | Related to position conditions |
Segment 2 | 0.76 | General provisions |
Endpoint: POST http://example.com:8080/ai/chat/autoReference
Request Body:
{
  "searchWord": "Artificial Intelligence Technology\n2. Development History of AI",
  "interference": 1.01
}
Response (Excerpt):
{
  "success": true,
  "data": [
    {
      "title+outline+content": [{ "score": 0.8789253, "id": "2149793922940857", "content": "..." }],
      "content": [{ "score": 0.866995, "id": "21497939229408532", "content": "..." }],
      "interferenceCalculation": [{ "interferenceRatio": 1.0090, "fullContentScore": 0.87149847 }],
      "finalTop5": [{ "score": 0.86370814, "id": "21497939229408515" }]
    }
  ]
}
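A minimal Python sketch of a client for this endpoint, using only the standard library. The `Content-Type` header is an assumption (the source does not state it), and the helper names `build_auto_reference_request` and `interference_ratios` are hypothetical. The request is only constructed, not sent, because the host shown is a placeholder.

```python
import json
import urllib.request

ENDPOINT = "http://example.com:8080/ai/chat/autoReference"  # placeholder host from the example


def build_auto_reference_request(search_word: str, interference: float) -> urllib.request.Request:
    """Build (but do not send) the POST request with the JSON body shown above."""
    payload = {"searchWord": search_word, "interference": interference}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, not stated in the source
        method="POST",
    )


def interference_ratios(response_text: str) -> list[float]:
    """Collect every interferenceRatio value from a response shaped like the excerpt."""
    body = json.loads(response_text)
    ratios = []
    for entry in body.get("data", []):
        for calc in entry.get("interferenceCalculation", []):
            ratios.append(calc["interferenceRatio"])
    return ratios


request = build_auto_reference_request(
    "Artificial Intelligence Technology\n2. Development History of AI", 1.01
)
```

Against a real deployment, passing `request` to `urllib.request.urlopen` and decoding the response body would yield the text that `interference_ratios` expects.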
Maintain the original data storage structure while dynamically processing paragraphs during retrieval to solve short paragraph issues and improve semantic completeness.
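One way to picture retrieval-time merging is the sketch below. It assumes that paragraphs are available in document order and that a hit counts as "short" when it falls below a word-count threshold; the function name `expand_short_hit`, the `min_words` parameter, and the preference for following paragraphs over preceding ones are all illustrative choices, not details given in the source. Stored data is never modified: the merged text is built on the fly.

```python
def expand_short_hit(paragraphs: list[str], hit_index: int, min_words: int = 200) -> tuple[int, int, str]:
    """Grow a short hit with neighbouring paragraphs until it reaches min_words.

    Returns (start, end, merged_text), where paragraphs[start:end] were joined.
    The input list is left untouched.
    """
    start, end = hit_index, hit_index + 1

    def word_total(lo: int, hi: int) -> int:
        return sum(len(p.split()) for p in paragraphs[lo:hi])

    while word_total(start, end) < min_words and (start > 0 or end < len(paragraphs)):
        if end < len(paragraphs):
            end += 1    # prefer the paragraph that follows the hit
        else:
            start -= 1  # no following paragraph left: take the preceding one
    return start, end, "\n".join(paragraphs[start:end])
```

A production version would presumably also stop at section boundaries so that unrelated text is not merged; that check is omitted here because the source does not specify how sections are identified.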
User Query: Enterprise management system standards position requirements detailed description
Identify original text sections (illustrative) and merge:
Final retrieval ranking:
Utilize parent-child segment structure to build hierarchical indexing: parent segments (merged short segments) serve as high-level entry points, while child segments retain fine-grained information.
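childSegments = getAllChildSegments()
parentSegmentSet = []
mergedIDs = set()
threshold = 200 // segments below this word count are treated as short
i = 0
while i < len(childSegments):
    p = childSegments[i]
    j = i + 1
    if p.wordCount < threshold:
        group = [p]
        while j < len(childSegments) and topicRelated(p, childSegments[j]) and childSegments[j].wordCount < threshold:
            group.append(childSegments[j]); j += 1
        if len(group) > 1:
            parentText = merge(group) // Only first segment retains title+outline
            parentSegmentSet.append(parentText)
            mergedIDs.update(segment.id for segment in group) // mark as merged
    i = j // continue after the last segment examined
// Retrieval index uses: parent segments + unmerged child segments
retrievalIndex = parentSegmentSet + [s for s in childSegments if s.id not in mergedIDs]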
Parent Segment = Child Segment 1 (title+outline+content) + “\n” + Child Segment 2 (content) + …
With a threshold of 200 words, Segment 7 and Segment 8 (both short) merge → Parent Segment A.
Structure:
Vector Index:
retrievalVectorSet = { Segment1..6, ParentSegmentA, Segment9 }
parentChildMapping = { ParentSegmentA: [Segment7, Segment8] }
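The retrieval set and parent-child mapping above can be produced by a short Python sketch such as the following. Several details are assumptions made for illustration: word counts are taken over the content only, `topic_related` is a placeholder that treats segments with the same title and outline as related, and parent IDs are generated from the child IDs rather than named "ParentSegmentA".

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seg_id: str
    title: str
    outline: str
    content: str

    @property
    def word_count(self) -> int:
        return len(self.content.split())  # assumption: count content words only

    def full_text(self) -> str:
        return f"{self.title}\n{self.outline}\n{self.content}"


def topic_related(a: Segment, b: Segment) -> bool:
    # Placeholder assumption: same title and outline means same topic.
    return a.title == b.title and a.outline == b.outline


def build_index(children: list[Segment], threshold: int = 200):
    """Return (retrieval_set, parent_child_mapping) for a list of child segments."""
    retrieval_set: list[tuple[str, str]] = []  # (id, text) pairs to embed
    parent_child: dict[str, list[str]] = {}
    i = 0
    while i < len(children):
        group = [children[i]]
        j = i + 1
        if children[i].word_count < threshold:
            while (j < len(children)
                   and children[j].word_count < threshold
                   and topic_related(children[i], children[j])):
                group.append(children[j])
                j += 1
        if len(group) > 1:
            parent_id = "Parent(" + "+".join(s.seg_id for s in group) + ")"
            # Only the first child keeps its title and outline.
            text = "\n".join([group[0].full_text()] + [s.content for s in group[1:]])
            retrieval_set.append((parent_id, text))
            parent_child[parent_id] = [s.seg_id for s in group]
        else:
            retrieval_set.append((group[0].seg_id, group[0].full_text()))
        i = j
    return retrieval_set, parent_child
```

Keeping the mapping separate from the retrieval set means a hit on a parent segment can still be resolved back to its fine-grained child segments when needed.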
Retrieval Example: "Enterprise Management System Standards\n1. Term Regulations"
Child segment difference strategy:
Preprocessing optimization: Complete segmentation, layout analysis, and semantic merging before indexing to improve index foundation quality.
Approach | Modification Scope | Implementation Timeline | Advantages | Risks/Costs |
---|---|---|---|---|
Approach 1 | Post-retrieval re-ranking | Short | Quick deployment, low intrusion | Threshold tuning depends on experiments |
Approach 2 | Dynamic retrieval process reorganization | Medium | Balance completeness and real-time performance | Additional runtime computation |
Approach 3 | Add hierarchical indexing | Medium-long | Multi-granularity retrieval flexibility | Complexity of maintaining dual indexes |
Approach 4 | Data re-indexing | Long | Highest foundation quality | Large transformation scope, high deployment cost |
Note: All example scores in this document are hypothetical values used solely to illustrate computational processes.