Data Structures · #5 of 11

Strings + Text Processing

Immutability, slicing, formatting, and methods

Why it matters

Many interview problems involve text in some form, so fluency with string operations saves real time.

The idea

Strings are immutable sequences of characters. Slicing and string methods always return NEW strings; nothing modifies a string in place. f-strings (f"...") are the modern way to format.
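A minimal sketch of what immutability means in practice: item assignment fails, and every "change" builds a new string. The variable names are illustrative only.

```python
s = "hello"

# Item assignment is not supported on str: this raises TypeError.
try:
    s[0] = "J"
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)

# A "change" really builds a new string; the original is untouched.
t = "J" + s[1:]
print(s)  # hello
print(t)  # Jello

# Methods return new strings as well.
u = s.upper()
print(s, u)  # hello HELLO
```

Because each concatenation creates a new object, building a long string piece by piece in a loop is usually better done by collecting the parts in a list and calling "".join(...) once.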

Try it

Slicing — s[start:stop:step]:

s = "abcdefghij"
print(s[2:5])       # 'cde'
print(s[:4])        # 'abcd'
print(s[-3:])       # 'hij'
print(s[::2])       # 'acegi'
print(s[::-1])      # 'jihgfedcba'  (reverse)

Common methods — split, strip, join, replace, find:

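raw = "  Hello,  World,  Python  "
parts = [p.strip() for p in raw.split(",")]   # split on commas, then trim each piece
print(parts)                    # ['Hello', 'World', 'Python']
print(",".join(parts))          # 'Hello,World,Python'
print("python" in raw.lower())  # True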

Formatting with f-strings, which are usually clearer than .format() and %:

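name, score = "Ada", 0.875
# >10 right-aligns in 10 columns; 6.1% shows a percentage with one decimal, 6 wide
print(f"{name:>10} | {score:6.1%}")
# !r applies repr() (adds quotes); <10 left-aligns; 08b is zero-padded binary, 8 wide
print(f"{name!r:<10} | binary: {42:08b}")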

# Multi-line + expressions
print(f"""
sum:  {2 + 3}
list: {[i*i for i in range(5)]}
""")

String methods you must know

| Category | Methods | What they do |
| --- | --- | --- |
| Clean | strip, lstrip, rstrip | remove whitespace |
| Case | lower, upper, title, capitalize | change case |
| Search | find, index, count | locate and count |
| Match | startswith, endswith | prefix/suffix checks |
| Split/Join | split, rsplit, splitlines, 'sep'.join(...) | tokenizing and joining |
| Replace | replace | substitution |
| Check | isalnum, isalpha, isdigit, isnumeric, isspace | validation |
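A short sketch exercising a few rows of the table. The sample strings are made up for illustration. The detail worth remembering is that find returns -1 on a miss, while index raises ValueError.

```python
path = "report_2024.csv"

# Search: find returns -1 when nothing matches; index raises instead.
print(path.find("_"))     # 6
print(path.find("xyz"))   # -1
try:
    path.index("xyz")
except ValueError:
    print("index raised ValueError")
print(path.count("."))    # 1

# Match: endswith also accepts a tuple of candidate suffixes.
print(path.startswith("report"))         # True
print(path.endswith((".csv", ".tsv")))   # True

# Split/Join: rsplit with maxsplit=1 splits once, starting from the right.
stem, ext = path.rsplit(".", 1)
print(stem, ext)          # report_2024 csv
print("line one\nline two".splitlines())  # ['line one', 'line two']

# Check: isdigit is True only if every character is a digit.
print("2024".isdigit())   # True
print("20.24".isdigit())  # False
```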

name = "  Ada Lovelace  "
print(name.strip().lower())         # 'ada lovelace'
print("data-science".split("-"))    # ['data', 'science']
print("-".join(["a", "b"]))         # 'a-b'

Palindrome normalization pattern

A reusable recipe: keep only alphanumeric characters and lowercase them, then compare against the reverse.

def normalize(s: str) -> str:
    return "".join(ch.lower() for ch in s if ch.isalnum())
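One way to finish the recipe is to compare the normalized text with its reverse. This is a sketch: is_palindrome is a hypothetical helper name, not something defined in the lesson, and normalize is repeated so the block runs on its own.

```python
def normalize(s: str) -> str:
    # Keep letters and digits only, lowercased.
    return "".join(ch.lower() for ch in s if ch.isalnum())


def is_palindrome(s: str) -> bool:
    # Hypothetical helper: compare the cleaned text with its reverse.
    cleaned = normalize(s)
    return cleaned == cleaned[::-1]


print(normalize("A man, a plan, a canal: Panama"))      # amanaplanacanalpanama
print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("Hello, World"))                    # False
```

Under this definition an empty string, or one with no alphanumeric characters, counts as a palindrome, since the cleaned text equals its own reverse.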

Quick check

Mini drills

Do's and don'ts

Going deeper — bytes vs str

Text and binary are different types in Python 3:

  • str = text (Unicode)
  • bytes = raw binary

Encode to go from text to bytes, decode to come back:

b = "hello".encode("utf-8")
text = b.decode("utf-8")
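A small sketch of why the distinction matters, using a made-up sample string with one non-ASCII character: the character count and the byte count differ, the wrong codec fails, and the two types do not mix.

```python
text = "café"
data = text.encode("utf-8")

print(len(text))   # 4 characters
print(len(data))   # 5 bytes: 'é' takes two bytes in UTF-8
print(data)        # b'caf\xc3\xa9'
print(data.decode("utf-8") == text)  # True, the round trip is lossless

# Decoding with a codec that cannot represent the bytes raises an error.
try:
    data.decode("ascii")
except UnicodeDecodeError:
    print("not valid ASCII")

# str and bytes cannot be concatenated directly.
try:
    "abc" + b"def"
except TypeError:
    print("cannot mix str and bytes")
```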

Common mistakes

Key takeaways