Agencies adopting new property management systems face a daunting challenge: transferring hundreds or thousands of existing landlord records from spreadsheets, legacy databases, or previous software. Manual re-entry isn't feasible: it's time-consuming and error-prone, and it delays adoption. Bulk import transforms migration from weeks of data entry to minutes of validation and confirmation.
Week 40 enhanced LetAdmin's landlord CSV import system with intelligent name parsing, duplicate detection, URL download support for cloud-hosted spreadsheets, and multi-tenant scoping that ensures imports land in the correct agency's database. This article explores how we built import functionality that handles messy real-world data whilst maintaining data integrity and multi-tenant isolation.
What Your Team Will Notice
The landlord import interface accepts CSV files either through direct upload or by providing a URL to a hosted spreadsheet. This flexibility accommodates various workflows: staff can upload local CSV exports from previous systems, or they can provide Google Sheets sharing links, Dropbox URLs, or other cloud-hosted spreadsheet locations.
The import process validates data before committing anything to the database. After uploading or specifying a URL, the system displays a preview showing how many landlords will be created, which records have potential issues (missing required fields, invalid email formats), and which appear to be duplicates of existing landlords.
Duplicate detection prevents accidentally creating multiple records for the same landlord. The system checks names and email addresses against existing records, flagging matches: "Jane Smith (jane@example.com) appears to match existing landlord #1234." Staff can choose to skip duplicates, update existing records, or create anyway if the match is a false positive.
For agencies managing multiple branches or operating in multi-tenant deployments, imports scope to specific agencies. Administrators select which agency should receive imported landlords, ensuring data lands in the correct tenant boundary and preventing cross-agency contamination.
Once validated, clicking "Import" creates landlord records in bulk. The system reports success counts and any failures: "Successfully imported 247 landlords. Skipped 12 duplicates. Failed 3 records due to validation errors (see details)." This transparency allows staff to quickly identify and address any import issues.
Under the Bonnet: CSV Parsing and Validation
The import task handles various CSV formats with flexible column mapping:
require 'csv'
require 'net/http'

namespace :landlords do
  desc "Import landlords from CSV file or URL"
  task :import, [:source, :agency_id] => :environment do |_t, args|
    source = args[:source]
    agency_id = args[:agency_id]

    agency = Agency.find(agency_id)
    ActsAsTenant.current_tenant = agency

    csv_data = fetch_csv_data(source)

    imported_count = 0
    skipped_count = 0
    failed_records = []

    CSV.parse(csv_data, headers: true) do |row|
      begin
        landlord_params = extract_landlord_params(row)
      rescue ArgumentError => e
        # Report rows with unparseable names rather than aborting the whole run
        failed_records << { row: row, errors: [e.message] }
        next
      end

      # Skip rows that match an existing landlord
      if duplicate_exists?(landlord_params)
        skipped_count += 1
        next
      end

      landlord = Landlord.new(landlord_params.merge(agency: agency))

      if landlord.save
        imported_count += 1
      else
        failed_records << { row: row, errors: landlord.errors.full_messages }
      end
    end

    puts "Import completed:"
    puts "  Successfully imported: #{imported_count}"
    puts "  Skipped (duplicates): #{skipped_count}"
    puts "  Failed: #{failed_records.count}"

    if failed_records.any?
      puts "\nFailed records:"
      failed_records.each do |record|
        puts "  #{record[:row].to_h.values.join(', ')}"
        puts "  Errors: #{record[:errors].join(', ')}"
      end
    end
  end
end
This task structure separates concerns: data fetching, parsing, validation, and creation happen in distinct steps, making the logic easier to test and debug.
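For day-to-day use, the task runs from the project root; the file path and agency id below are illustrative:

# Import a local CSV export into agency 42 (quoting guards against shell globbing)
bin/rails "landlords:import[tmp/landlords.csv,42]"

# Or import directly from a hosted spreadsheet URL
bin/rails "landlords:import[https://example.com/landlords.csv,42]"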
Intelligent Name Parsing
Landlord data arrives in various formats. Some CSVs have separate first_name and last_name columns; others have full_name fields requiring parsing. Some include titles (Mr, Mrs, Dr); others don't. The import system handles these variations intelligently:
def extract_landlord_params(row)
  # Try separate first/last name columns first
  if row['first_name'].present? && row['last_name'].present?
    {
      landlord_type: 'individual',
      title: extract_title(row['title']),
      first_name: row['first_name'].strip,
      last_name: row['last_name'].strip,
      email: row['email']&.strip&.downcase,
      phone: row['phone']&.strip,
      mobile: row['mobile']&.strip,
      address_line_1: row['address_line_1'],
      address_line_2: row['address_line_2'],
      town: row['town'],
      county: row['county'],
      postcode: row['postcode']&.strip&.upcase
    }
  # Fall back to full_name parsing
  elsif row['full_name'].present?
    parsed = parse_full_name(row['full_name'])
    {
      landlord_type: 'individual',
      title: parsed[:title],
      first_name: parsed[:first_name],
      last_name: parsed[:last_name],
      email: row['email']&.strip&.downcase,
      phone: row['phone']&.strip,
      mobile: row['mobile']&.strip,
      address_line_1: row['address_line_1'],
      address_line_2: row['address_line_2'],
      town: row['town'],
      county: row['county'],
      postcode: row['postcode']&.strip&.upcase
    }
  else
    raise ArgumentError, "Missing required name fields: row must have 'first_name' and 'last_name', or 'full_name'"
  end
end

def parse_full_name(full_name)
  parts = full_name.strip.split(/\s+/)

  # Check if the first part is a title
  title = nil
  if parts.first.match?(/^(Mr|Mrs|Miss|Ms|Dr|Prof)\.?$/i)
    title = parts.shift.gsub('.', '')
  end

  # Remaining parts: first name(s) and last name
  if parts.length >= 2
    last_name = parts.pop
    first_name = parts.join(' ')
  else
    # Single name - use it as the last name
    first_name = nil
    last_name = parts.first
  end

  {
    title: title,
    first_name: first_name,
    last_name: last_name
  }
end

# Normalise an explicit title column ("mr." becomes "Mr"); blank values become nil
def extract_title(raw_title)
  return nil if raw_title.blank?

  raw_title.strip.gsub('.', '').capitalize
end
This parsing handles common patterns: "Mr John Smith" extracts title, first name, and last name; "Jane Smith" extracts first and last names without title; "Dr Sarah Jane Williams" correctly identifies compound first names.
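A few calls make that concrete:

parse_full_name('Mr John Smith')
# => { title: "Mr", first_name: "John", last_name: "Smith" }

parse_full_name('Jane Smith')
# => { title: nil, first_name: "Jane", last_name: "Smith" }

parse_full_name('Dr Sarah Jane Williams')
# => { title: "Dr", first_name: "Sarah Jane", last_name: "Williams" }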
The parser isn't perfect—names are culturally diverse and complex patterns (multiple surnames, mononyms, hyphenated names) can confound automated parsing. But it handles common UK name formats reliably, with edge cases addressable through manual correction after import.
Duplicate Detection Logic
Creating duplicate landlord records causes confusion: which record is current? Where should new properties be associated? Duplicate detection prevents this by checking existing records before creation:
# Returns the matching Landlord record (truthy) when a likely duplicate exists,
# or nil when the row looks new, so callers can inspect the match
def duplicate_exists?(landlord_params)
  # Check for an exact email match first
  if landlord_params[:email].present?
    existing_by_email = Landlord.find_by(email: landlord_params[:email])
    return existing_by_email if existing_by_email
  end

  # Fall back to an exact name match
  if landlord_params[:first_name].present? && landlord_params[:last_name].present?
    existing_by_name = Landlord.find_by(
      first_name: landlord_params[:first_name],
      last_name: landlord_params[:last_name]
    )
    return existing_by_name if existing_by_name
  end

  # No duplicates found
  nil
end
This logic prioritises email matching (the more reliable identifier) before falling back to exact name matching. It returns the existing landlord when a match is found, allowing callers to decide whether to skip, update, or create anyway if the match is a false positive.
Future enhancements might include fuzzy name matching (detecting "Jon Smith" as duplicate of "John Smith"), phone number matching, or address comparison for higher confidence duplicate detection.
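To sketch the fuzzy-matching direction, a small edit-distance check along these lines could flag near-miss spellings. The helper and its threshold are illustrative only, not part of the current import:

# Hypothetical sketch: treat names within one edit of each other as likely duplicates
def fuzzy_name_match?(first_a, last_a, first_b, last_b)
  levenshtein(first_a.downcase, first_b.downcase) <= 1 &&
    levenshtein(last_a.downcase, last_b.downcase) <= 1
end

# Classic dynamic-programming Levenshtein edit distance
def levenshtein(a, b)
  rows = Array.new(a.length + 1) { |i| [i] + Array.new(b.length, 0) }
  (0..b.length).each { |j| rows[0][j] = j }

  (1..a.length).each do |i|
    (1..b.length).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      rows[i][j] = [
        rows[i - 1][j] + 1,       # deletion
        rows[i][j - 1] + 1,       # insertion
        rows[i - 1][j - 1] + cost # substitution
      ].min
    end
  end

  rows[a.length][b.length]
end

fuzzy_name_match?('Jon', 'Smith', 'John', 'Smith') # => true

Whether a distance of one is the right threshold would need validating against real agency data before switching anything like this on.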
URL Download Support
Agencies often maintain landlord data in Google Sheets for easy collaboration. Requiring export-download-upload cycles creates friction. URL download support allows importing directly from sharing links:
def fetch_csv_data(source)
  if source.start_with?('http://', 'https://')
    download_csv_from_url(source)
  elsif File.exist?(source)
    File.read(source)
  else
    raise "Source must be a valid file path or URL: #{source}"
  end
end

def download_csv_from_url(url)
  # Convert Google Sheets sharing URLs to their CSV export equivalent
  url = convert_google_sheets_url(url) if url.include?('docs.google.com/spreadsheets')

  uri = URI.parse(url)
  response = Net::HTTP.get_response(uri)

  unless response.is_a?(Net::HTTPSuccess)
    raise "Failed to download CSV from #{url}: #{response.code} #{response.message}"
  end

  response.body
end

def convert_google_sheets_url(url)
  # From: https://docs.google.com/spreadsheets/d/SHEET_ID/edit...
  # To:   https://docs.google.com/spreadsheets/d/SHEET_ID/export?format=csv
  if url.match(%r{/spreadsheets/d/([a-zA-Z0-9_-]+)})
    sheet_id = Regexp.last_match(1)
    "https://docs.google.com/spreadsheets/d/#{sheet_id}/export?format=csv"
  else
    url
  end
end
This converter transforms Google Sheets sharing URLs into CSV export URLs automatically. Staff can copy the sharing link from Google Sheets without understanding export mechanics.
The downloader uses standard Ruby HTTP libraries to fetch remote content. For production deployments, this might be enhanced with timeout handling, redirect following, authentication support for private URLs, and retry logic for transient network failures.
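A hardened downloader might look something like the sketch below; the timeout values and redirect cap are illustrative choices, not current LetAdmin behaviour:

require 'net/http'

# Sketch: download with connection/read timeouts and bounded redirect following
def download_csv_following_redirects(url, redirect_limit: 5)
  raise "Too many redirects downloading #{url}" if redirect_limit.zero?

  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https',
                             open_timeout: 5, read_timeout: 10) do |http|
    http.get(uri.request_uri)
  end

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Assumes an absolute Location header, which Google's export endpoint returns
    download_csv_following_redirects(response['location'], redirect_limit: redirect_limit - 1)
  else
    raise "Failed to download CSV from #{url}: #{response.code} #{response.message}"
  end
end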
Multi-Tenant Import Scoping
In multi-tenant deployments, imports must scope to specific agencies. The task accepts an agency_id parameter and sets tenant context before processing:
task :import, [:source, :agency_id] => :environment do |_t, args|
  agency = Agency.find(args[:agency_id])
  ActsAsTenant.current_tenant = agency

  # All Landlord queries and creations now automatically scope to this agency
  CSV.parse(csv_data, headers: true) do |row|
    # ... duplicate checks and param extraction as shown earlier ...
    landlord = Landlord.new(landlord_params.merge(agency: agency))
    landlord.save
  end
end
Setting ActsAsTenant.current_tenant ensures all database operations respect tenant boundaries. Duplicate checks query only the current agency's landlords; new records associate with the specified agency automatically.
This pattern prevents cross-tenant data contamination whilst simplifying import code—no need to manually add agency scoping to every query and record creation.
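acts_as_tenant also offers a block form, ActsAsTenant.with_tenant, which restores the previous tenant when the block exits. A sketch, assuming Landlord is declared acts_as_tenant(:agency) and csv_rows stands in for the parsed CSV:

# Tenant context applies inside the block and is restored afterwards,
# even if the block raises
ActsAsTenant.with_tenant(agency) do
  csv_rows.each do |row|
    Landlord.create!(extract_landlord_params(row).merge(agency: agency))
  end
end

This form suits imports triggered from long-lived processes (background jobs, consoles) where leaving a tenant set globally could leak into later work.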
Testing Import Logic
Comprehensive tests verify import functionality with various data formats and edge cases:
RSpec.describe "Landlord CSV Import" do
  describe "name parsing" do
    it "parses full names with titles" do
      row = { 'full_name' => 'Mr John Smith', 'email' => 'john@example.com' }
      params = extract_landlord_params(row)

      expect(params[:title]).to eq('Mr')
      expect(params[:first_name]).to eq('John')
      expect(params[:last_name]).to eq('Smith')
    end

    it "handles compound first names" do
      row = { 'full_name' => 'Dr Sarah Jane Williams', 'email' => 'sarah@example.com' }
      params = extract_landlord_params(row)

      expect(params[:title]).to eq('Dr')
      expect(params[:first_name]).to eq('Sarah Jane')
      expect(params[:last_name]).to eq('Williams')
    end

    it "handles names without titles" do
      row = { 'full_name' => 'Jane Smith', 'email' => 'jane@example.com' }
      params = extract_landlord_params(row)

      expect(params[:title]).to be_nil
      expect(params[:first_name]).to eq('Jane')
      expect(params[:last_name]).to eq('Smith')
    end
  end

  describe "duplicate detection" do
    let(:agency) { create(:agency) }

    before { ActsAsTenant.current_tenant = agency }

    it "detects duplicates by email" do
      existing = create(:landlord, email: 'john@example.com', agency: agency)
      params = { email: 'john@example.com', first_name: 'John', last_name: 'Smith' }

      expect(duplicate_exists?(params)).to eq(existing)
    end

    it "detects duplicates by name" do
      existing = create(:landlord, first_name: 'Jane', last_name: 'Smith', agency: agency)
      params = { first_name: 'Jane', last_name: 'Smith' }

      expect(duplicate_exists?(params)).to eq(existing)
    end

    it "does not detect duplicates across agencies" do
      other_agency = create(:agency)
      create(:landlord, email: 'john@example.com', agency: other_agency)
      params = { email: 'john@example.com', first_name: 'John', last_name: 'Smith' }

      expect(duplicate_exists?(params)).to be_nil
    end
  end

  describe "URL downloading" do
    it "converts Google Sheets URLs to CSV export format" do
      sheets_url = 'https://docs.google.com/spreadsheets/d/ABC123/edit#gid=0'

      expect(convert_google_sheets_url(sheets_url))
        .to eq('https://docs.google.com/spreadsheets/d/ABC123/export?format=csv')
    end

    it "downloads CSV data from valid URLs" do
      stub_request(:get, "https://example.com/landlords.csv")
        .to_return(status: 200, body: "first_name,last_name\nJohn,Smith\n")

      expect(fetch_csv_data('https://example.com/landlords.csv')).to include("John,Smith")
    end
  end
end
These tests ensure import logic remains correct as the system evolves, particularly when adding support for new CSV formats or enhancing duplicate detection.
What's Next
The CSV import foundation enables sophisticated migration workflows: import preview interfaces showing detailed validation before committing, import templates downloadable from the system with correct column headers and example data, import history tracking showing who imported which records when, and incremental import support allowing periodic synchronisation with external systems.
Future enhancements might include field mapping configuration (allowing agencies to specify which CSV columns map to which landlord fields), transformation rules (applying consistent formatting or corrections during import), and import scheduling (automatically importing from URLs on regular schedules for ongoing synchronisation).
By building flexible CSV import infrastructure with intelligent parsing, duplicate detection, and multi-tenant scoping, LetAdmin transforms landlord data migration from a manual bottleneck into a streamlined process, accelerating agency onboarding and reducing adoption friction.
