Commit ea5e63e

Merge pull request HKUDS#164 from MrGidea/main
Multiple file types support input
2 parents 3e99d3f + dbeceac commit ea5e63e

1 file changed: +17 -0 lines changed

README.md

Lines changed: 17 additions & 0 deletions
@@ -22,6 +22,7 @@ This repository hosts the code of LightRAG. The structure of this code is based
 </div>
 
 ## 🎉 News
+- [x] [2024.10.29]🎯🎯📒📒Multi-file types are now supported by `textract`.
 - [x] [2024.10.20]🎯🎯📒📒We’ve added a new feature to LightRAG: Graph Visualization.
 - [x] [2024.10.18]🎯🎯📒📒We’ve added a link to a [LightRAG Introduction Video](https://youtu.be/oageL-1I0GE). Thanks to the author!
 - [x] [2024.10.17]🎯🎯📒📒We have created a [Discord channel](https://discord.gg/mvsfu2Tg)! Welcome to join for sharing and discussions! 🎉🎉
@@ -285,6 +286,19 @@ with open("./newText.txt") as f:
     rag.insert(f.read())
 ```
 
+### Multi-file Type Support
+
+`textract` supports reading file types such as TXT, DOCX, PPTX, CSV, and PDF.
+
+```python
+import textract
+
+file_path = 'TEXT.pdf'
+text_content = textract.process(file_path)  # returns bytes
+
+rag.insert(text_content.decode('utf-8'))
+```
+
 ### Graph Visualization
 
 <details>
@@ -863,3 +877,6 @@ archivePrefix={arXiv},
 primaryClass={cs.IR}
 }
 ```
+
+
+
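
The "Multi-file Type Support" snippet added in this commit handles a single file. A minimal sketch of how the same pattern might be extended to several files is shown here. The helper names `extract_text` and `insert_files`, the `SUPPORTED_SUFFIXES` filter, and the injectable `process` argument are all assumptions made for illustration and are not part of LightRAG or `textract`. The only `textract` call used is `textract.process(path)`, which returns bytes, as in the README example.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional

# Hypothetical filter mirroring the file types named in the README text.
SUPPORTED_SUFFIXES = {".txt", ".docx", ".pptx", ".csv", ".pdf"}


def extract_text(path: Path, process: Optional[Callable[[str], bytes]] = None) -> str:
    """Extract a file's text as a UTF-8 string.

    `process` defaults to textract.process; it is injectable so the helper
    can be exercised without textract installed.
    """
    if process is None:
        import textract  # third-party; imported lazily

        process = textract.process
    raw = process(str(path))  # bytes, as in the README example
    return raw.decode("utf-8")


def insert_files(
    rag,
    paths: Iterable[str],
    process: Optional[Callable[[str], bytes]] = None,
) -> List[str]:
    """Insert each supported file into `rag`; return the names inserted.

    `rag` is assumed to be any object with an insert(text) method,
    such as the LightRAG instance used elsewhere in the README.
    """
    inserted = []
    for raw_path in paths:
        path = Path(raw_path)
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue  # skip file types outside the README's list
        rag.insert(extract_text(path, process))
        inserted.append(path.name)
    return inserted
```

With a configured `rag` object, a call might look like `insert_files(rag, ["report.pdf", "slides.pptx"])`. Files with other extensions are skipped silently in this sketch; a real integration may prefer to log or raise instead.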
