Regular Expressions in Practice

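What: What Is a Regular Expression

A regular expression (Regex or RegExp for short) is a tool for matching text patterns: a special syntax describes the rules a string must follow. In JavaScript you use one through the RegExp object or the literal form /pattern/flags.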

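Core Syntax Cheat Sheet

Syntax	Meaning	Example
`.`	Any character except line terminators	`/a.c/` matches abc, a1c
`\d`	Digit [0-9]	`/\d+/` matches 123
`\w`	Word character [a-zA-Z0-9_]	`/\w+/` matches hello_1
`\s`	Whitespace character	`/\s+/` matches spaces, tabs
`^`	Start of input (start of each line with the `m` flag)	`/^Hello/`
`$`	End of input (end of each line with the `m` flag)	`/world$/`
`*`	0 or more times	`/ab*c/`
`+`	1 or more times	`/ab+c/`
`?`	0 or 1 time	`/ab?c/`
`{n,m}`	n to m times	`/\d{3,4}/`
`[abc]`	Character class	`/[aeiou]/`
`[^abc]`	Negated character class	`/[^0-9]/`
`(abc)`	Capturing group	`/(ab)+/`
`(?:abc)`	Non-capturing group	`/(?:ab)+/`
`(?=x)`	Positive lookahead	`/\d(?=px)/`
`(?!x)`	Negative lookahead	`/\d(?!px)/`
`|`	Alternation (or)	`/cat|dog/`
`\`	Escape	`/\.com/`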

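Flags

Flag	Meaning
`g`	Global matching (find all matches)
`i`	Case-insensitive matching
`m`	Multiline mode (`^`/`$` match at the start and end of each line)
`s`	dotAll (`.` also matches line terminators)
`u`	Unicode mode
`y`	Sticky mode (the match must start exactly at lastIndex)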
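
To make the flag table concrete, here is a small sketch that exercises `i`, `m`, `s`, and `y` on one sample string. The string and the variable names are illustrative assumptions, not part of the original notes.

```javascript
// Sample input: two lines, differing only in case at the start
const text = 'Hello\nhello world'

// g + i: find every "hello" regardless of case
const anyCase = text.match(/hello/gi).length        // 2

// Without m, ^ anchors only to the start of the whole input
const startOnly = text.match(/^hello/gi).length     // 1
// With m, ^ also matches right after each line break
const everyLine = text.match(/^hello/gim).length    // 2

// s (dotAll): "." normally refuses to match the newline
const dotDefault = /Hello.hello/.test(text)         // false
const dotAll = /Hello.hello/s.test(text)            // true

// y (sticky): the match must begin exactly at lastIndex
const sticky = /hello/y
sticky.lastIndex = 6                                // index of the lowercase "hello"
const stickyHit = sticky.test(text)                 // true
sticky.lastIndex = 5                                // index of the newline
const stickyMiss = sticky.test(text)                // false
```

Note that a failed sticky match resets `lastIndex` to 0, just as a failed global match does.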

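Why: Why Regular Expressions Are Worth Mastering

1. Form validation

For format checks on email addresses, phone numbers, ID card numbers and the like, a regular expression is the most direct tool.

2. Text processing

Log parsing, template substitution, code transformation, and other text-manipulation tasks.

3. Common in interviews

Regular expressions come up regularly in front-end interviews.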

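How: Practical Scenarios

1. Common Validation Patterns

// Email
const emailReg = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/

// Mobile number (mainland China)
const phoneReg = /^1[3-9]\d{9}$/

// ID card number (18 characters; format only, the checksum is not verified)
const idCardReg = /^\d{17}[\dXx]$/

// URL
const urlReg = /^https?:\/\/[\w\-]+(\.[\w\-]+)+[/#?]?.*$/

// Chinese characters (U+4E00 to U+9FA5)
const chineseReg = /^[一-龥]+$/

// Password strength (at least 8 characters with a lowercase letter, an uppercase letter, and a digit;
// only letters, digits, and !@#$%^&* are allowed)
const strongPwdReg = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d!@#$%^&*]{8,}$/

// IPv4
const ipv4Reg = /^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$/

// Date YYYY-MM-DD (format only; impossible dates such as 2024-02-31 still pass)
const dateReg = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/

// HTML tag
const htmlTagReg = /<([a-z][a-z0-9]*)\b[^>]*>(.*?)<\/\1>/gi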
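
The date pattern above checks shape only, so a value like 2024-02-31 passes. One way to close that gap, sketched here under the assumption that a calendar check with the built-in `Date` is acceptable, is to capture the parts and round-trip them. The helper name `isRealDate` and the capturing variant of the pattern are hypothetical.

```javascript
// Same shape as dateReg above, with the year also captured
const dateShape = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/

function isRealDate(str) {
  const m = dateShape.exec(str)
  if (!m) return false
  const year = Number(m[1])
  const month = Number(m[2])
  const day = Number(m[3])
  // Date rolls impossible days over into the next month,
  // so a mismatch after the round trip means the date does not exist
  const date = new Date(Date.UTC(year, month - 1, day))
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  )
}

isRealDate('2024-02-29')  // true (leap year)
isRealDate('2023-02-29')  // false
isRealDate('2024-02-31')  // false, although dateShape.test('2024-02-31') is true
```

The split is deliberate: the regex stays readable and handles the format, while the calendar rules live in ordinary code.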

2. String Methods with Regular Expressions

// test: does the string match?
/^1[3-9]\d{9}$/.test('13800138000')  // true

// match: extract a match
'2024-01-15'.match(/(\d{4})-(\d{2})-(\d{2})/)
// ['2024-01-15', '2024', '01', '15', index: 0, input: '2024-01-15', groups: undefined]

// matchAll: extract every match (returns an iterator; the regex must have the g flag)
const str = 'key1=val1; key2=val2; key3=val3'
for (const match of str.matchAll(/(\w+)=(\w+)/g)) {
  console.log(match[1], match[2])  // key1 val1, key2 val2, key3 val3
}

// replace: substitute (only the first match unless the regex has the g flag)
'hello world'.replace(/world/, 'JavaScript')  // 'hello JavaScript'

// replaceAll: substitute every match (a regex argument must have the g flag)
'aabbcc'.replaceAll(/b/g, 'x')  // 'aaxxcc'

// replace with callback
'price: 100, tax: 20'.replace(/\d+/g, (match) => {
  return `$${Number(match).toFixed(2)}`
})
// 'price: $100.00, tax: $20.00'

// split: break a string apart
'one,two;three|four'.split(/[,;|]/)  // ['one', 'two', 'three', 'four']

// search: index of the first match
'hello world'.search(/world/)  // 6
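
The return shape of `match` depends on the `g` flag, which is easy to trip over. The following sketch contrasts the two and shows `matchAll` as the way to get both every match and its capture groups; the sample string and variable names are illustrative.

```javascript
const s = 'a1 b22 c333'

// Without g: first match only, plus capture groups and index
const first = s.match(/[a-z](\d+)/)
// first[0] === 'a1', first[1] === '1', first.index === 0

// With g: every full match, but no capture groups
const all = s.match(/[a-z](\d+)/g)           // ['a1', 'b22', 'c333']

// No match returns null rather than an empty array
const none = s.match(/x/g)                   // null

// matchAll: every match with its groups and index
const details = [...s.matchAll(/[a-z](\d+)/g)].map((m) => ({
  text: m[0],
  digits: m[1],
  index: m.index,
}))
// [{ text: 'a1', digits: '1', index: 0 }, { text: 'b22', digits: '22', index: 3 }, { text: 'c333', digits: '333', index: 7 }]

// matchAll refuses a regex without the g flag
let threw = false
try {
  s.matchAll(/x/)
} catch (err) {
  threw = err instanceof TypeError
}
```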

3. Named Capture Groups

// (?<name>pattern) is a named capture group
const dateStr = '2024-01-15'
const dateMatch = dateStr.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/)

dateMatch.groups.year   // '2024'
dateMatch.groups.month  // '01'
dateMatch.groups.day    // '15'

// Named capture groups in replace
dateStr.replace(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/, '$<day>/$<month>/$<year>')
// '15/01/2024'
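
Named groups also combine well with `matchAll` and with the `\k<name>` backreference. The sketch below assumes a sentence containing several dates; the sample text and variable names are made up for illustration.

```javascript
const text = 'Released 2024-01-15, patched 2024-02-03'
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g

// Destructure the named groups of every match
const dates = Array.from(
  text.matchAll(dateRe),
  ({ groups: { year, month, day } }) => `${day}/${month}/${year}`
)
// ['15/01/2024', '03/02/2024']

// \k<name> refers back to a named group inside the same pattern
const doubled = /\b(?<word>\w+) \k<word>\b/.exec('this is is a test')
const repeatedWord = doubled.groups.word   // 'is'
```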

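4. Lookahead and Lookbehind

// Positive lookahead (?=): match a position that is followed by x
'1px 2em 3px 4rem'.match(/\d+(?=px)/g)   // ['1', '3']

// Negative lookahead (?!): match a position that is not followed by x
'1px 2em 3px 4rem'.match(/\d+(?!px)/g)   // ['2', '4']

// Positive lookbehind (?<=): match a position that is preceded by x (ES2018)
'¥100 $200 ¥300'.match(/(?<=¥)\d+/g)      // ['100', '300']

// Negative lookbehind (?<!): match a position that is not preceded by x.
// \d is excluded too; a bare (?<!¥)\d+ would also match the '00' inside ¥100 and ¥300
'¥100 $200 ¥300'.match(/(?<![¥\d])\d+/g)  // ['200']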
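
The negative-lookahead example works because every number in it is a single digit. With multi-digit numbers the engine can backtrack and match part of a number, so the assertion usually needs to exclude a following digit as well. A short sketch with an illustrative input:

```javascript
const css = '10px 20em 30px'

// Naive: "10" fails the lookahead, so the engine backs off to "1",
// which is followed by "0" (not "px") and therefore matches
const naive = css.match(/\d+(?!px)/g)       // ['1', '20', '3']

// Fixed: also forbid a following digit, so a number cannot be split
const fixed = css.match(/\d+(?!\d|px)/g)    // ['20']
```

The same reasoning applies to negative lookbehind, where the assertion should also forbid a preceding digit.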

Practical example: password strength check:

function checkPasswordStrength(password) {
  const checks = {
    length: password.length >= 8,
    lowercase: /[a-z]/.test(password),
    uppercase: /[A-Z]/.test(password),
    digit: /\d/.test(password),
    special: /[!@#$%^&*]/.test(password),
  }

  const score = Object.values(checks).filter(Boolean).length

  return {
    score,
    level: score < 3 ? 'weak' : score < 5 ? 'medium' : 'strong',
    checks,
  }
}
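
A possible variation on the same idea, useful when a form needs to tell the user what is missing rather than show a score. The `rules` table and the `failedRules` helper are hypothetical names; the individual checks mirror the ones used above.

```javascript
// Each rule pairs a label with a predicate on the password
const rules = [
  { name: 'length', test: (s) => s.length >= 8 },
  { name: 'lowercase', test: (s) => /[a-z]/.test(s) },
  { name: 'uppercase', test: (s) => /[A-Z]/.test(s) },
  { name: 'digit', test: (s) => /\d/.test(s) },
  { name: 'special', test: (s) => /[!@#$%^&*]/.test(s) },
]

// Returns the names of the rules the password does not satisfy
function failedRules(password) {
  return rules.filter((rule) => !rule.test(password)).map((rule) => rule.name)
}

failedRules('abc')        // ['length', 'uppercase', 'digit', 'special']
failedRules('Abcdef1!')   // []
```

Keeping one small regex per rule, instead of a single pattern with stacked lookaheads, makes each failure reportable on its own.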

5. Template Engine

// A minimal template engine
function template(str, data) {
  return str.replace(/\{\{(\w+)\}\}/g, (match, key) => {
    return data[key] ?? match
  })
}

template('Hello, {{name}}! You are {{age}} years old.', { name: 'Alice', age: 25 })
// 'Hello, Alice! You are 25 years old.'

// Template with nested property paths
function deepTemplate(str, data) {
  return str.replace(/\{\{([\w.]+)\}\}/g, (match, path) => {
    return path.split('.').reduce((obj, key) => obj?.[key], data) ?? match
  })
}

deepTemplate('{{user.name}} - {{user.email}}', {
  user: { name: 'Alice', email: 'alice@example.com' }
})
// 'Alice - alice@example.com'

6. CSS Unit Conversion

// Convert px to rem
function pxToRem(css, baseFontSize = 16) {
  return css.replace(/(\d+(?:\.\d+)?)px/g, (match, value) => {
    return `${(Number(value) / baseFontSize).toFixed(4)}rem`
  })
}

pxToRem('.card { padding: 16px; font-size: 14px; }')
// '.card { padding: 1.0000rem; font-size: 0.8750rem; }'
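
If the padded output (`1.0000rem`) is not wanted, one option is to round to four decimals and then drop trailing zeros with `parseFloat`. This is a sketch of that variation; `pxToRemCompact` is a hypothetical name.

```javascript
function pxToRemCompact(css, baseFontSize = 16) {
  return css.replace(/(\d+(?:\.\d+)?)px/g, (match, value) => {
    // toFixed(4) rounds; parseFloat then strips the trailing zeros
    const rem = parseFloat((Number(value) / baseFontSize).toFixed(4))
    return `${rem}rem`
  })
}

pxToRemCompact('.card { padding: 16px; font-size: 14px; margin: 0px 8px; }')
// '.card { padding: 1rem; font-size: 0.875rem; margin: 0rem 0.5rem; }'
```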

7. Log Parsing

// Parse an Nginx access-log line
const logLine = '192.168.1.1 - - [10/Jan/2024:13:55:36 +0800] "GET /api/users HTTP/1.1" 200 1234'

const logReg = /^(\S+) - - \[([^\]]+)\] "(\w+) (\S+) HTTP\/[\d.]+" (\d+) (\d+)/
// exec is a RegExp method, so call it on the pattern, not on the string
const match = logReg.exec(logLine)

if (match) {
  const [, ip, time, method, path, status, size] = match
  console.log({ ip, time, method, path, status, size })
}
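
With six positional captures, the destructuring above depends on getting the order right. A sketch of the same parser using named groups, which also converts the numeric fields; `parseLogLine` and the group names are illustrative assumptions.

```javascript
const sampleLine = '192.168.1.1 - - [10/Jan/2024:13:55:36 +0800] "GET /api/users HTTP/1.1" 200 1234'

// Same pattern as above, with each capture given a name
const namedLogReg =
  /^(?<ip>\S+) - - \[(?<time>[^\]]+)\] "(?<method>\w+) (?<path>\S+) HTTP\/[\d.]+" (?<status>\d+) (?<size>\d+)/

function parseLogLine(line) {
  const m = namedLogReg.exec(line)
  if (!m) return null                       // line is not in the expected format
  const { ip, time, method, path, status, size } = m.groups
  return { ip, time, method, path, status: Number(status), size: Number(size) }
}

const entry = parseLogLine(sampleLine)
// entry.method === 'GET', entry.path === '/api/users', entry.status === 200
```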

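8. Performance

const items = ['pattern-1', 'other']  // sample data

// ❌ A new regex is created on every iteration
for (const item of items) {
  if (/pattern/.test(item)) { /* ... */ }
}

// ✅ Create the regex once, outside the loop
const pattern = /pattern/
for (const item of items) {
  if (pattern.test(item)) { /* ... */ }
}

// ❌ Catastrophic backtracking (nested quantifiers)
/(a+)+b/.test('aaaaaaaaaaaaaaaaaaaac')  // exponential time; may appear to hang

// ✅ Avoid nested quantifiers. Atomic groups (?>...) and possessive quantifiers
// exist in some engines (PCRE, Java) but are a SyntaxError in JavaScript,
// so flatten the pattern instead
/a+b/.test('aaaaaaaaaaaaaaaaaaaac')  // false, returns quickly

// ❌ Overly broad pattern
/.*<\/div>/

// ✅ A more precise match
/[^<]*<\/div>/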
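
When a pattern cannot simply be flattened, JavaScript can approximate an atomic group with a lookahead plus a backreference, because the engine does not backtrack into a lookahead once it has succeeded. The sketch below applies that well-known workaround to the nested-quantifier example; the variable names are illustrative.

```javascript
// (?=(a+)) captures the longest run of a's without consuming it,
// then \1 consumes exactly that run. The engine cannot later
// shorten the run, which is what an atomic group would guarantee.
const atomicLike = /(?:(?=(a+))\1)+b/

const matches = atomicLike.test('aaaaab')                 // true
const rejects = atomicLike.test('a'.repeat(30) + 'c')     // false, without exponential backtracking
```

For this particular pattern `/a+b/` is still the better fix; the emulation matters when the group contents are more complex.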

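Common Problems and Pitfalls

Problem	Cause	Solution
`.` does not match newlines	By default `.` does not match `\n`	Use the `s` flag or `[\s\S]`
`^`/`$` do not match at line breaks	Without the `m` flag they anchor only to the whole input	Use the `m` flag
Repeated calls on a global regex give inconsistent results	With the `g` flag, `lastIndex` persists between calls	Reset `reg.lastIndex = 0` before each match
Catastrophic backtracking	Nested quantifiers cause exponential backtracking	Avoid nested quantifiers; cap input length or enforce a timeout externally (JavaScript has no built-in regex timeout)
Greedy vs. lazy	Quantifiers are greedy by default (match as much as possible)	Append `?` to make them lazy: `.*?`
Unicode matching fails	By default matching works on 16-bit UTF-16 code units	Use the `u` flag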
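
The Unicode row is the least intuitive of these, so here is a brief sketch of what the `u` flag changes. The emoji is written as an escape to keep the snippet encoding-safe; the variable names are illustrative.

```javascript
// U+1F600 lies outside the Basic Multilingual Plane,
// so it occupies two UTF-16 code units
const emoji = '\u{1F600}'
const codeUnits = emoji.length                    // 2

// Without u, "." matches a single code unit, so one "." is not enough
const withoutU = /^.$/.test(emoji)                // false
// With u, "." matches a whole code point
const withU = /^.$/u.test(emoji)                  // true

// u also enables Unicode property escapes (ES2018)
const isHan = /^\p{Script=Han}+$/u.test('正则')   // true
```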

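Best Practices

Precompile: hoist regexes that are built inside loops out to module top level.
Be deliberate about greediness: where .* over-matches, use .*? or, often better, a negated character class such as [^>]*.
Use named groups: (?<name>...) is more readable than $1.
Write tests: complex regexes need tests that cover edge cases.
Skip the regex when you can: string.includes(), string.startsWith(), and similar methods are simpler and more readable.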

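Interview Questions

1. What is the difference between greedy and lazy matching? How do you switch between them?

Answer: Greedy matching (the default) consumes as many characters as possible: /<.*>/ matches the whole of <div>content</div>, because .* runs to the last >. Lazy matching, written by adding ? after the quantifier, consumes as few as possible: /<.*?>/ matches only <div>. To switch, append ? to *, +, ?, or {n,m}. Lazy matching, or a negated character class such as [^>]*, is the usual choice when extracting HTML tags, quoted strings, and similar delimited text.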
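
The difference is easiest to see side by side. A short sketch using the tag example from the answer plus a quoted-string case; the variable names are illustrative.

```javascript
const html = '<div>content</div>'

const greedy = html.match(/<.*>/)[0]        // '<div>content</div>'
const lazy = html.match(/<.*?>/)[0]         // '<div>'
// A negated class gives the same result without relying on backtracking
const negated = html.match(/<[^>]*>/)[0]    // '<div>'

const quoted = 'say "hi" and "bye"'
const greedyQuote = quoted.match(/".*"/)[0]   // '"hi" and "bye"'
const lazyQuotes = quoted.match(/".*?"/g)     // ['"hi"', '"bye"']
```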

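2. What is the difference between lookahead and lookbehind in regular expressions?

Answer: Lookahead inspects what comes after the current position; lookbehind inspects what comes before it. Lookahead: (?=x) is positive (x must follow) and (?!x) is negative (x must not follow). Lookbehind: (?<=x) is positive (x must precede) and (?<!x) is negative (x must not precede). What they share: both are zero-width assertions. They consume no characters, so the asserted text is not part of the match. For example, /(?<=¥)\d+/ matches the 100 in ¥100 without the ¥. Lookbehind was added in ES2018 and is not supported in Safari before 16.4.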

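3. How do you add thousands separators with a regex (1234567 → 1,234,567)?

Answer: Two approaches:

// Approach 1: lookahead + non-word boundary
'1234567'.replace(/\B(?=(\d{3})+$)/g, ',')  // '1,234,567'

// Explanation: \B matches a non-word boundary (here, a position between two digits); the positive lookahead (?=(\d{3})+$) requires the digits from here to the end to come in a multiple of 3

// Approach 2: toLocaleString (pass a locale, since the separator depends on it)
(1234567).toLocaleString('en-US')  // '1,234,567'

How approach 1 works: a comma is inserted before every group of 3 digits counted from the right. \B prevents an insertion at the very start, and (?=(\d{3})+$) ensures the number of digits from the current position to the end is a multiple of 3.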
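
Because the lookahead is anchored with `$`, approach 1 leaves a number with a decimal part untouched ('1234567.891' comes back unchanged). A sketch of one way around that, assuming plain decimal notation, is to group only the integer part; `formatThousands` is a hypothetical helper.

```javascript
function formatThousands(value) {
  // Split off the fraction so $ in the lookahead means "end of the integer part"
  const [intPart, fraction] = String(value).split('.')
  const grouped = intPart.replace(/\B(?=(\d{3})+$)/g, ',')
  return fraction === undefined ? grouped : `${grouped}.${fraction}`
}

formatThousands(1234567.891)   // '1,234,567.891'
formatThousands('-1234')       // '-1,234'
formatThousands(999)           // '999'
```

Values whose string form uses exponent notation are outside what this sketch handles.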

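4. What does the lastIndex property of a regular expression do? When do you need to watch out for it?

Answer: Every RegExp object has a lastIndex property, but it is only used when the g or y flag is set, where it gives the position at which the next match starts. Points to watch: (1) exec() and test() update lastIndex in g mode; (2) if test() is called several times on the same regex, lastIndex advances after each successful match, so the next call starts from where the previous one ended; (3) when a match fails, lastIndex is reset to 0. A common bug: reg.test(str) is used in an if condition, reg carries the g flag, and the second call may return false. Fixes: omit g when you do not need global matching, or set reg.lastIndex = 0 before each call.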
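
The behaviour described above can be reproduced in a few lines. The sketch below shows the state changes and the classic `filter` bug; the sample inputs and variable names are illustrative.

```javascript
const re = /foo/g
const str = 'foo'

const first = re.test(str)          // true
const afterFirst = re.lastIndex     // 3: the next search starts past the match
const second = re.test(str)         // false: nothing left from index 3
const afterSecond = re.lastIndex    // 0: reset after the failure
const third = re.test(str)          // true again

// The same state leaks across different strings
const hasDigitGlobal = /\d/g
const inputs = ['a1', 'b2', 'c3']
const buggy = inputs.filter((s) => hasDigitGlobal.test(s))   // ['a1', 'c3']

// Without g there is no state to carry over
const hasDigit = /\d/
const correct = inputs.filter((s) => hasDigit.test(s))       // ['a1', 'b2', 'c3']
```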

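5. How do you strip HTML tags from a string with a regex?

Answer:

const str = '<p>Hello <b>world</b></p>'  // sample input

// Basic version
str.replace(/<[^>]*>/g, '')

// Also removes an unterminated tag at the end of the string
// (a > inside an attribute value still breaks it)
str.replace(/<[^>]*(?:>|$)/g, '')

// Safer version (also drops the contents of script and style elements)
function stripHtml(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '')
    .replace(/<[^>]*>/g, '')
    .trim()
}

Note: parsing HTML with regexes is unreliable (comments, CDATA, > inside attributes, and so on); in production, prefer DOMParser.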

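6. What are the benefits of named capture groups compared with numbered ones?

Answer: A named capture group (?<name>...) gives each group a name, which you read through match.groups.name. Benefits: (1) readability: match.groups.year says more than match[1]; (2) maintainability: inserting a new group in the middle of a pattern shifts the numbers but leaves the names intact; (3) use in replace: $<name> instead of $1 makes the replacement template clearer. The problem with numbered groups: in a regex with 5 capture groups, which one is match[3]? You have to count parentheses. With named groups you write match.groups.month directly. Supported from ES2018 onward.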

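7. What is catastrophic backtracking? How do you avoid it?

Answer: Catastrophic backtracking is a performance problem in which a backtracking regex engine makes an exponential number of attempts. The typical trigger is a nested quantifier such as /(a+)+b/: when the input is aaaaaaaaaaaac (not ending in b), the engine tries every way of splitting the run of a's among the a+ iterations (on the order of 2^n), and the CPU stalls. Ways to avoid it: (1) avoid nested quantifiers, for example /a+b/ instead of /(a+)+b/; (2) use an atomic group (?>...), which does not backtrack once matched (supported in some engines, but not in JavaScript); (3) set a regex timeout, which some languages support but which has to be implemented by hand in JavaScript; (4) limit input length by truncating or rejecting overlong input.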
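
Point (4) is the cheapest safeguard to add. A minimal sketch, assuming that rejecting overlong input is acceptable for the use case; `safeTest` and the default limit of 1000 are hypothetical choices, not a recommendation from the original text.

```javascript
// Refuse to run the regex at all when the input is suspiciously long
function safeTest(regex, input, maxLength = 1000) {
  if (typeof input !== 'string' || input.length > maxLength) {
    return false
  }
  return regex.test(input)
}

safeTest(/^a+b$/, 'aaab')                    // true
safeTest(/^a+b$/, 'a'.repeat(2000) + 'b')    // false: rejected before matching
```

A length cap only bounds the damage; a pattern with nested quantifiers can still be slow on inputs well under the limit, so it complements rewriting the pattern rather than replacing it.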

8. How would you implement a simple Markdown parser?

Answer: Apply regex replacements in stages, from coarse to fine:

function parseMarkdown(md) {
  let html = md

  // Code blocks (handled first, before the inline-code rule can consume their backticks)
  html = html.replace(/```(\w*)\n([\s\S]*?)```/g, '<pre><code class="$1">$2</code></pre>')

  // Inline code
  html = html.replace(/`([^`]+)`/g, '<code>$1</code>')

  // Headings
  html = html.replace(/^### (.+)$/gm, '<h3>$1</h3>')
  html = html.replace(/^## (.+)$/gm, '<h2>$1</h2>')
  html = html.replace(/^# (.+)$/gm, '<h1>$1</h1>')

  // Bold and italic
  html = html.replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>')
  html = html.replace(/\*(.+?)\*/g, '<em>$1</em>')

  // Images (before links; otherwise the link rule would rewrite the [alt](src) part first)
  html = html.replace(/!\[([^\]]*)\]\(([^)]+)\)/g, '<img src="$2" alt="$1" />')

  // Links
  html = html.replace(/\[([^\]]+)\]\(([^)]+)\)/g, '<a href="$2">$1</a>')

  // Unordered lists
  html = html.replace(/^- (.+)$/gm, '<li>$1</li>')
  html = html.replace(/(<li>.*<\/li>\n?)+/g, '<ul>$&</ul>')

  // Paragraphs
  html = html.replace(/^(?!<[huplo])(.+)$/gm, '<p>$1</p>')

  return html
}

Note: this is a simplified version (for instance, later rules still rewrite text inside code blocks); in production, use a mature library such as marked or remark.