Friday, October 3, 2025

Bulk Landlord Import: Migrating Legacy Data Without Manual Entry

Paul (Founder)
Development
Cat observing data migration process and CSV parsing operations on developer's screen

Agencies adopting new property management systems face a daunting challenge: transferring hundreds or thousands of existing landlord records from spreadsheets, legacy databases, or previous software. Manual re-entry isn't feasible: it's time-consuming and error-prone, and it delays adoption. Bulk import transforms migration from weeks of data entry to minutes of validation and confirmation.

Week 40 enhanced LetAdmin's landlord CSV import system with intelligent name parsing, duplicate detection, URL download support for cloud-stored spreadsheets, and multi-tenant scoping ensuring imports land in the correct agency databases. This article explores how we built import functionality that handles messy real-world data whilst maintaining data integrity and multi-tenant isolation.

What Your Team Will Notice

The landlord import interface accepts CSV files either through direct upload or by providing a URL to a hosted spreadsheet. This flexibility accommodates various workflows: staff can upload local CSV exports from previous systems, or they can provide Google Sheets sharing links, Dropbox URLs, or other cloud-hosted spreadsheet locations.

The import process validates data before committing anything to the database. After uploading or specifying a URL, the system displays a preview showing how many landlords will be created, which records have potential issues (missing required fields, invalid email formats), and which appear to be duplicates of existing landlords.

Duplicate detection prevents accidentally creating multiple records for the same landlord. The system checks names and email addresses against existing records, flagging matches: "Jane Smith (jane@example.com) appears to match existing landlord #1234." Staff can choose to skip duplicates, update existing records, or create anyway if the match is a false positive.

For agencies managing multiple branches or operating in multi-tenant deployments, imports scope to specific agencies. Administrators select which agency should receive imported landlords, ensuring data lands in the correct tenant boundary and preventing cross-agency contamination.

Once validated, clicking "Import" creates landlord records in bulk. The system reports success counts and any failures: "Successfully imported 247 landlords. Skipped 12 duplicates. Failed 3 records due to validation errors (see details)." This transparency allows staff to quickly identify and address any import issues.

Under the Bonnet: CSV Parsing and Validation

The import task handles various CSV formats with flexible column mapping:

require 'csv'
require 'net/http'

namespace :landlords do
  desc "Import landlords from CSV file or URL"
  task :import, [:source, :agency_id] => :environment do |_t, args|
    source = args[:source]
    agency_id = args[:agency_id]

    agency = Agency.find(agency_id)
    ActsAsTenant.current_tenant = agency

    csv_data = fetch_csv_data(source)
    imported_count = 0
    skipped_count = 0
    failed_records = []

    CSV.parse(csv_data, headers: true) do |row|
      landlord_params = extract_landlord_params(row)

      # Check for duplicates
      if duplicate_exists?(landlord_params)
        skipped_count += 1
        next
      end

      # Create landlord
      landlord = Landlord.new(landlord_params.merge(agency: agency))

      if landlord.save
        imported_count += 1
      else
        failed_records << { row: row, errors: landlord.errors.full_messages }
      end
    end

    puts "Import completed:"
    puts "  Successfully imported: #{imported_count}"
    puts "  Skipped (duplicates): #{skipped_count}"
    puts "  Failed: #{failed_records.count}"

    if failed_records.any?
      puts "\nFailed records:"
      failed_records.each do |record|
        puts "  #{record[:row].to_h.values.join(', ')}"
        puts "    Errors: #{record[:errors].join(', ')}"
      end
    end
  end
end

This task structure separates concerns: data fetching, parsing, validation, and creation happen in distinct steps, making the logic easier to test and debug.

Intelligent Name Parsing

Landlord data arrives in various formats. Some CSVs have separate first_name and last_name columns; others have full_name fields requiring parsing. Some include titles (Mr, Mrs, Dr); others don't. The import system handles these variations intelligently:

def extract_landlord_params(row)
  # Resolve name fields first, then merge in the shared contact fields
  name_params =
    if row['first_name'].present? && row['last_name'].present?
      {
        title: extract_title(row['title']),
        first_name: row['first_name'].strip,
        last_name: row['last_name'].strip
      }
    # Fall back to full_name parsing
    elsif row['full_name'].present?
      parse_full_name(row['full_name'])
    else
      raise "Missing required name fields: row must have 'first_name' and 'last_name', or 'full_name'"
    end

  name_params.merge(
    landlord_type: 'individual',
    email: row['email']&.strip&.downcase,
    phone: row['phone']&.strip,
    mobile: row['mobile']&.strip,
    address_line_1: row['address_line_1'],
    address_line_2: row['address_line_2'],
    town: row['town'],
    county: row['county'],
    postcode: row['postcode']&.strip&.upcase
  )
end

def parse_full_name(full_name)
  parts = full_name.strip.split(/\s+/)

  # Check if first part is a title
  title = nil
  if parts.first.match?(/^(Mr|Mrs|Miss|Ms|Dr|Prof)\.?$/i)
    title = parts.shift.gsub('.', '')
  end

  # Remaining parts: first name(s) and last name
  if parts.length >= 2
    last_name = parts.pop
    first_name = parts.join(' ')
  else
    # Single name - use as last name
    first_name = nil
    last_name = parts.first
  end

  {
    title: title,
    first_name: first_name,
    last_name: last_name
  }
end

This parsing handles common patterns: "Mr John Smith" extracts title, first name, and last name; "Jane Smith" extracts first and last names without title; "Dr Sarah Jane Williams" correctly identifies compound first names.

The parser isn't perfect—names are culturally diverse and complex patterns (multiple surnames, mononyms, hyphenated names) can confound automated parsing. But it handles common UK name formats reliably, with edge cases addressable through manual correction after import.
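One incremental improvement is keeping common surname particles ("van", "de", "von") attached to the last name. The sketch below is a hypothetical variant of the parser, not the production method, and the particle list is illustrative:

```ruby
# Hypothetical variant of full-name parsing that keeps common surname
# particles attached to the last name ("Ludwig van Beethoven" ->
# first_name "Ludwig", last_name "van Beethoven").
SURNAME_PARTICLES = %w[van de von di della da].freeze

def parse_full_name_with_particles(full_name)
  parts = full_name.strip.split(/\s+/)

  # Strip a leading title if present, as in the main parser
  title = parts.shift.delete('.') if parts.first&.match?(/\A(Mr|Mrs|Miss|Ms|Dr|Prof)\.?\z/i)

  # Walk backwards from the end, absorbing particles into the surname
  last_parts = [parts.pop].compact
  last_parts.unshift(parts.pop) while parts.any? && SURNAME_PARTICLES.include?(parts.last.downcase)

  {
    title: title,
    first_name: parts.any? ? parts.join(' ') : nil,
    last_name: last_parts.join(' ')
  }
end
```

This still mishandles surnames that merely resemble particles, so any such list needs tuning against the agency's actual data.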

Duplicate Detection Logic

Creating duplicate landlord records causes confusion: which record is current? Where should new properties be associated? Duplicate detection prevents this by checking existing records before creation:

def duplicate_exists?(landlord_params)
  # Despite the predicate name, this returns the matching Landlord (truthy)
  # or nil, so callers can inspect the existing record.
  # Check for exact email match
  if landlord_params[:email].present?
    existing_by_email = Landlord.find_by(email: landlord_params[:email])
    return existing_by_email if existing_by_email
  end

  # Check for name match
  if landlord_params[:first_name].present? && landlord_params[:last_name].present?
    existing_by_name = Landlord.where(
      first_name: landlord_params[:first_name],
      last_name: landlord_params[:last_name]
    ).first
    return existing_by_name if existing_by_name
  end

  # No duplicates found
  nil
end

This logic prioritises email matching (the most reliable identifier) before falling back to name matching. It returns the existing landlord if found, allowing callers to decide whether to skip, update, or create despite the match.

Future enhancements might include fuzzy name matching (detecting "Jon Smith" as duplicate of "John Smith"), phone number matching, or address comparison for higher confidence duplicate detection.
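As a sketch of what fuzzy matching could look like, the helpers below compute a plain-Ruby Levenshtein edit distance; the `levenshtein` and `fuzzy_name_match?` names and the threshold are illustrative, not part of the current system:

```ruby
# Minimal Levenshtein edit distance: the number of insertions, deletions,
# and substitutions needed to turn one string into the other.
def levenshtein(a, b)
  a = a.downcase
  b = b.downcase
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Treat names within a small edit distance as likely duplicates,
# so "Jon Smith" flags against "John Smith".
def fuzzy_name_match?(name_a, name_b, threshold: 2)
  levenshtein(name_a, name_b) <= threshold
end
```

In production this would more likely live in the database (PostgreSQL's trigram or fuzzystrmatch extensions) than in Ruby, to avoid loading every landlord into memory.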

URL Download Support

Agencies often maintain landlord data in Google Sheets for easy collaboration. Requiring export-download-upload cycles creates friction. URL download support allows importing directly from sharing links:

def fetch_csv_data(source)
  if source.start_with?('http://', 'https://')
    download_csv_from_url(source)
  elsif File.exist?(source)
    File.read(source)
  else
    raise "Source must be a valid file path or URL: #{source}"
  end
end

def download_csv_from_url(url)
  # Convert Google Sheets URLs to CSV export format
  url = convert_google_sheets_url(url) if url.include?('docs.google.com/spreadsheets')

  uri = URI.parse(url)
  response = Net::HTTP.get_response(uri)

  unless response.is_a?(Net::HTTPSuccess)
    raise "Failed to download CSV from #{url}: #{response.code} #{response.message}"
  end

  response.body
end

def convert_google_sheets_url(url)
  # Convert sharing URL to CSV export URL
  # From: https://docs.google.com/spreadsheets/d/SHEET_ID/edit...
  # To:   https://docs.google.com/spreadsheets/d/SHEET_ID/export?format=csv

  if url.match(%r{/spreadsheets/d/([A-Za-z0-9_-]+)})
    sheet_id = Regexp.last_match(1)
    "https://docs.google.com/spreadsheets/d/#{sheet_id}/export?format=csv"
  else
    url
  end
end

This converter transforms Google Sheets sharing URLs into CSV export URLs automatically. Staff can copy the sharing link from Google Sheets without understanding export mechanics.

The downloader uses standard Ruby HTTP libraries to fetch remote content. For production deployments, this might be enhanced with timeout handling, redirect following, authentication support for private URLs, and retry logic for transient network failures.
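A hardened fetch might look like the sketch below, using only stdlib Net::HTTP; the `fetch_with_limits` name and the specific timeout and hop limits are illustrative:

```ruby
require 'net/http'
require 'uri'

# Illustrative hardened download: explicit timeouts plus a bounded number
# of redirect hops, raising on anything that is not eventually a success.
def fetch_with_limits(url, redirect_limit: 5)
  raise ArgumentError, "Too many redirects" if redirect_limit.zero?

  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes
  raise ArgumentError, "Unsupported URL scheme: #{url}" unless uri.is_a?(URI::HTTP)

  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https',
                             open_timeout: 5, read_timeout: 15) do |http|
    http.get(uri.request_uri)
  end

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Follow the Location header, decrementing the remaining hop budget
    fetch_with_limits(response['location'], redirect_limit: redirect_limit - 1)
  else
    raise "Download failed: #{response.code} #{response.message}"
  end
end
```

Redirect following matters particularly for Google Sheets, whose export endpoints may redirect before serving the CSV body.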

Multi-Tenant Import Scoping

In multi-tenant deployments, imports must scope to specific agencies. The task accepts an agency_id parameter and sets tenant context before processing:

task :import, [:source, :agency_id] => :environment do |_t, args|
  agency = Agency.find(args[:agency_id])
  ActsAsTenant.current_tenant = agency

  # All Landlord queries and creations now automatically scope to this agency
  CSV.parse(csv_data, headers: true) do |row|
    landlord = Landlord.new(landlord_params.merge(agency: agency))
    landlord.save
  end
end

Setting ActsAsTenant.current_tenant ensures all database operations respect tenant boundaries. Duplicate checks query only the current agency's landlords; new records associate with the specified agency automatically.

This pattern prevents cross-tenant data contamination whilst simplifying import code—no need to manually add agency scoping to every query and record creation.
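The mechanics can be illustrated with a deliberately simplified, framework-free sketch. This only mimics the behaviour acts_as_tenant provides; the real gem hooks into ActiveRecord, and all the names here are hypothetical:

```ruby
# Framework-free sketch of tenant scoping: a module-level current tenant
# plus a query helper that always filters by it.
module Tenancy
  class << self
    attr_accessor :current_tenant
  end
end

LandlordRecord = Struct.new(:name, :agency_id)

ALL_LANDLORDS = [
  LandlordRecord.new('Jane Smith', 1),
  LandlordRecord.new('John Smith', 2)
].freeze

# Every lookup is implicitly filtered by the current tenant,
# so callers never see another agency's rows.
def scoped_landlords
  ALL_LANDLORDS.select { |l| l.agency_id == Tenancy.current_tenant }
end

Tenancy.current_tenant = 1
scoped_landlords.map(&:name)  # only agency 1's landlords are visible
```

Because the filter lives in one place rather than in every call site, forgetting to scope a query becomes impossible by construction, which is the same property the import task relies on.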

Testing Import Logic

Comprehensive tests verify import functionality with various data formats and edge cases:

RSpec.describe "Landlord CSV Import" do
  describe "name parsing" do
    it "parses full names with titles" do
      row = { 'full_name' => 'Mr John Smith', 'email' => 'john@example.com' }
      params = extract_landlord_params(row)

      expect(params[:title]).to eq('Mr')
      expect(params[:first_name]).to eq('John')
      expect(params[:last_name]).to eq('Smith')
    end

    it "handles compound first names" do
      row = { 'full_name' => 'Dr Sarah Jane Williams', 'email' => 'sarah@example.com' }
      params = extract_landlord_params(row)

      expect(params[:title]).to eq('Dr')
      expect(params[:first_name]).to eq('Sarah Jane')
      expect(params[:last_name]).to eq('Williams')
    end

    it "handles names without titles" do
      row = { 'full_name' => 'Jane Smith', 'email' => 'jane@example.com' }
      params = extract_landlord_params(row)

      expect(params[:title]).to be_nil
      expect(params[:first_name]).to eq('Jane')
      expect(params[:last_name]).to eq('Smith')
    end
  end

  describe "duplicate detection" do
    let(:agency) { create(:agency) }

    before do
      ActsAsTenant.current_tenant = agency
    end

    it "detects duplicates by email" do
      existing = create(:landlord, email: 'john@example.com', agency: agency)
      params = { email: 'john@example.com', first_name: 'John', last_name: 'Smith' }

      duplicate = duplicate_exists?(params)
      expect(duplicate).to eq(existing)
    end

    it "detects duplicates by name" do
      existing = create(:landlord, first_name: 'Jane', last_name: 'Smith', agency: agency)
      params = { first_name: 'Jane', last_name: 'Smith' }

      duplicate = duplicate_exists?(params)
      expect(duplicate).to eq(existing)
    end

    it "does not detect duplicates across agencies" do
      other_agency = create(:agency)
      create(:landlord, email: 'john@example.com', agency: other_agency)

      params = { email: 'john@example.com', first_name: 'John', last_name: 'Smith' }
      duplicate = duplicate_exists?(params)

      expect(duplicate).to be_nil
    end
  end

  describe "URL downloading" do
    it "converts Google Sheets URLs to CSV export format" do
      sheets_url = 'https://docs.google.com/spreadsheets/d/ABC123/edit#gid=0'
      export_url = convert_google_sheets_url(sheets_url)

      expect(export_url).to eq('https://docs.google.com/spreadsheets/d/ABC123/export?format=csv')
    end

    it "downloads CSV data from valid URLs" do
      stub_request(:get, "https://example.com/landlords.csv")
        .to_return(status: 200, body: "first_name,last_name\nJohn,Smith\n")

      csv_data = fetch_csv_data('https://example.com/landlords.csv')
      expect(csv_data).to include("John,Smith")
    end
  end
end

These tests ensure import logic remains correct as the system evolves, particularly when adding support for new CSV formats or enhancing duplicate detection.

What's Next

The CSV import foundation enables sophisticated migration workflows: import preview interfaces showing detailed validation before committing, import templates downloadable from the system with correct column headers and example data, import history tracking showing who imported which records when, and incremental import support allowing periodic synchronisation with external systems.

Future enhancements might include field mapping configuration (allowing agencies to specify which CSV columns map to which landlord fields), transformation rules (applying consistent formatting or corrections during import), and import scheduling (automatically importing from URLs on regular schedules for ongoing synchronisation).

By building flexible CSV import infrastructure with intelligent parsing, duplicate detection, and multi-tenant scoping, LetAdmin transforms landlord data migration from a manual bottleneck into a streamlined process, accelerating agency onboarding and reducing adoption friction.