AI-Assisted Data Room File Organizer

Posted 3 weeks ago

Worldwide

Summary

Create an AI-Assisted Static Data Room Organizer and Retrieval Package for Real Estate Development Documents.

I need a contractor to build a practical tool that can take one or more master “Inbox” folders of mixed project documents and produce a clean, static, data-room-style folder system and retrieval package for a real estate development / infrastructure project.

This is not just automated file sorting. The final deliverable must be a static access and retrieval system that includes organized folders, copied files, a searchable local index, a master manifest, confidence scoring, classification explanations, duplicate detection, and a review queue. The system should allow me to drop all files into a master folder, run the tool, and receive a complete organized data-room package.

Required outcome:
* Organized folder and subfolder hierarchy.
* Original files preserved and not modified.
* Files copied into assigned output folders.
* CSV/Excel master manifest.
* Searchable local HTML index that opens without a server.
* Confidence score for each classification.
* Short explanation of why each file was classified that way.
* Exact and near-duplicate report.
* “Needs Review” queue for uncertain files.
* Audit/error log showing how each file was processed.
* Editable taxonomy so the folder structure can be revised and rerun.

An LLM or equivalent semantic reasoning layer is required. This cannot be only a keyword sorter. The system must synthesize file names, extracted text, headings, dates, parties, document type, and context to determine the correct primary folder. The LLM should return assigned folder, confidence score, short rationale, key supporting terms/entities, and review-needed flag.

Core workflow:
1. Accept one or more input folders with files/subfolders.
2. Extract text and metadata from each file.
3. OCR scanned PDFs and image-based documents where possible.
4. Classify documents into the defined data-room taxonomy using AI/LLM reasoning.
5. Copy each file into the correct output folder without altering originals.
6. Optionally rename copied files using a clean naming convention.
7. Generate the manifest, searchable HTML index, duplicate report, review queue, and audit log.
8. Allow manual corrections and rerun.

Required file types: PDF including scanned PDFs, Word .doc/.docx, Excel .xls/.xlsx, PowerPoint .ppt/.pptx, images .jpg/.png/.tif, text files, and .msg/.eml emails if feasible.

Initial folder hierarchy should be editable but include:
00_Admin_Index
01_Project_Overview
02_Land_Control_PSA
03_Title_Survey_ALTA
04_Zoning_Land_Use_Local_Approvals
05_Environmental_RCRA_BRAC_FOSET
06_Wetlands_Streams_USACE
07_Floodplain_Drainage_Stormwater
08_Geotechnical_Soils
09_Civil_Engineering_Site_Planning
10_Power_Utility_AEP_SWEPCO
11_BTM_Generation_BESS_Energy
12_Natural_Gas
13_Water_Wastewater
14_Fiber_Telecom
15_Permitting_FAST41_Federal_State
16_Vendors_Proposals_Budgets
17_Capital_Markets_Investor_Materials
18_Correspondence
19_Unclassified_Review_Queue

Each main folder may include subfolders such as:
* Agreements / Correspondence / Reports / Maps_Exhibits / Proposals / Cost_Estimates / Permits / Review_Needed, as applicable.

The taxonomy should be stored in an editable JSON, YAML, CSV, or Excel config file.

Preferred architecture: Local-first or hybrid. Use local extraction, OCR, duplicate detection, and embeddings where possible. Use AI/API classification only where needed or approved. If API use is required, disclose exactly what text/metadata is sent externally and allow local-only or limited-text classification.

Possible technologies: Python, Tesseract OCR, PyMuPDF/pdfplumber, python-docx, openpyxl, python-pptx, sentence-transformers, FAISS/Chroma, OpenAI API or other LLM classifier, Streamlit, Flask, or simple desktop GUI.

Minimum UI: Command-line tool with clear setup instructions and config file.
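The per-file classifier answer described above (assigned folder, confidence score, short rationale, key terms/entities, review-needed flag) could be validated before anything is copied. This is a minimal Python sketch, not a specified design: it assumes the LLM is prompted to reply with a JSON object, that the taxonomy config is a JSON file with a hypothetical `folders` key, and that 0.70 is an acceptable review threshold. `parse_llm_reply`, `load_taxonomy`, and `REVIEW_THRESHOLD` are illustrative names, and no LLM call is made here.

```python
import json
from dataclasses import dataclass, field

# Assumed cut-off: answers below this confidence are flagged for the review queue.
REVIEW_THRESHOLD = 0.70
REVIEW_FOLDER = "19_Unclassified_Review_Queue"


@dataclass
class Classification:
    """One classifier answer, holding the fields the posting asks the LLM to return."""
    assigned_folder: str
    confidence: float
    rationale: str
    key_terms: list[str] = field(default_factory=list)
    review_needed: bool = False


def load_taxonomy(path: str) -> set[str]:
    """Read folder names from a hypothetical config like {"folders": ["00_Admin_Index", ...]}."""
    with open(path, encoding="utf-8") as fh:
        return set(json.load(fh)["folders"])


def parse_llm_reply(raw: str, folders: set[str]) -> Classification:
    """Validate a raw JSON reply and decide whether the file needs manual review."""
    try:
        data = json.loads(raw)
        result = Classification(
            assigned_folder=str(data["assigned_folder"]),
            confidence=float(data["confidence"]),
            rationale=str(data.get("rationale", "")),
            key_terms=[str(t) for t in data.get("key_terms", [])],
            review_needed=bool(data.get("review_needed", False)),
        )
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        # Malformed output never lands silently in a real folder.
        return Classification(REVIEW_FOLDER, 0.0, f"Unparseable reply: {exc}", [], True)

    # A folder outside the taxonomy is redirected to the review queue.
    if result.assigned_folder not in folders:
        result.rationale += f" (unknown folder {result.assigned_folder!r})"
        result.assigned_folder = REVIEW_FOLDER
        result.review_needed = True
    # Out-of-range or low confidence only sets the flag; the suggested folder is kept.
    if not 0.0 <= result.confidence <= 1.0 or result.confidence < REVIEW_THRESHOLD:
        result.review_needed = True
    return result
```

Low-confidence answers keep the model's suggested folder and are only flagged, so the manifest can show both the suggestion and the reason the file went to the Needs Review queue.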
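For the exact and near-duplicate report, one common local approach is a SHA-256 hash of each file's bytes for exact matches, plus word-shingle Jaccard similarity on the extracted text for near-duplicates. The sketch below assumes text extraction/OCR has already produced one string per file; the 5-word shingle size and 0.8 threshold are illustrative assumptions. The embeddings the posting mentions (sentence-transformers with FAISS/Chroma) would be an alternative for catching semantically similar documents that share little exact wording.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file bytes in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicates(paths: list[Path]) -> list[list[Path]]:
    """Group files whose bytes are identical (same SHA-256)."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in paths:
        groups[sha256_of(p)].append(p)
    return [group for group in groups.values() if len(group) > 1]


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Overlapping k-word windows of lower-cased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(texts: dict[str, str], threshold: float = 0.8,
                    k: int = 5) -> list[tuple[str, str, float]]:
    """Pairs of documents whose extracted text overlaps at or above `threshold`.

    Every pair is compared, so cost grows quadratically with the number of files;
    a very large batch would want MinHash/LSH or an embedding index instead.
    """
    sigs = {name: shingles(text, k) for name, text in texts.items()}
    pairs = []
    for a, b in combinations(sorted(sigs), 2):
        score = jaccard(sigs[a], sigs[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs
```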
Preferred UI: Simple local browser or desktop app where I can select input folder, select output folder, choose/edit taxonomy, run classification, view progress, open the review queue, export manifest, generate the static HTML index, and rerun after corrections.

Security requirements:
* Do not modify originals.
* Do not upload documents to third-party cloud services unless explicitly enabled.
* Do not retain client documents.
* Do not store API keys in plain text.
* Use dummy/sample files for testing unless real files are specifically approved.

Deliverables: Working tool/app/script, source code, editable taxonomy file, organized folder output generator, CSV/Excel manifest, searchable local HTML index, duplicate report, Needs Review report, error/audit log, installation instructions, user guide, and demo using sample files.

Acceptance criteria: The tool successfully processes a mixed master folder, classifies files using LLM/semantic reasoning, copies them into the folder hierarchy, preserves originals, creates a searchable static index, produces an audit-friendly manifest, detects duplicates, and flags uncertain files for review.

This is not intended to be a full enterprise data room SaaS platform. I need a practical, reliable static document organization and retrieval tool that prepares a usable real estate development data-room package from a large batch of documents.
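The "do not modify originals" requirement and the audit-friendly manifest can be checked mechanically: hash each original before and after copying, and record both results in the manifest. A minimal sketch, assuming classification has already produced a mapping from source path to folder name; `build_room`, `copy_into_room`, the numeric-suffix rule for name clashes, and the manifest column names are all hypothetical. A real manifest would also carry the confidence score, rationale, and review flag.

```python
import csv
import hashlib
import shutil
from pathlib import Path

# Hypothetical manifest columns for this sketch.
FIELDS = ["original_path", "output_path", "folder", "sha256",
          "original_unchanged", "copy_matches"]


def file_hash(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_into_room(src: Path, output_root: Path, folder: str) -> Path:
    """Copy one original into its assigned folder; `src` is only ever read."""
    dest_dir = output_root / folder
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    counter = 1
    # Same-named files from different input folders get a numeric suffix instead of overwriting.
    while dest.exists():
        dest = dest_dir / f"{src.stem}_{counter}{src.suffix}"
        counter += 1
    shutil.copy2(src, dest)  # copy2 also tries to preserve timestamps
    return dest


def build_room(assignments: dict[Path, str], output_root: Path,
               manifest: Path) -> list[dict]:
    """Copy every file, re-hash each original afterwards, and write a CSV manifest."""
    rows = []
    for src, folder in assignments.items():
        before = file_hash(src)
        dest = copy_into_room(src, output_root, folder)
        rows.append({
            "original_path": str(src),
            "output_path": str(dest),
            "folder": folder,
            "sha256": before,
            "original_unchanged": file_hash(src) == before,
            "copy_matches": file_hash(dest) == before,
        })
    manifest.parent.mkdir(parents=True, exist_ok=True)
    with manifest.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Because name clashes get a suffix rather than overwriting, a rerun after corrections would be expected to write into a fresh or emptied output folder.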

  • $200.00 fixed-price
  • Experience level: Intermediate
  • Remote job
  • Project type: Ongoing project
Skills and Expertise
Mandatory skills
Microsoft Excel
Data Entry
Nice-to-have skills
Python
Microsoft Word
Activity on this job
  • Proposals: 10 to 15
  • Last viewed by client: 2 weeks ago
  • Hires: 1
  • Interviewing: 4
  • Invites sent: 0
  • Unanswered invites: 0
About the client
Member since Mar 18, 2013
  • United States (Pace), 11:10 AM local time
  • $28K total spent; 450 hires, 29 active
  • 1,013 hours
