23 - Advanced Regular Expressions

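📄 Contents

- 1 Grouping
  - 1.1 `|` Match either the left or the right expression (common)
  - 1.2 `(ab)` Treat the characters in parentheses as one group (common)
  - 1.3 `\num` Match the string that group num matched (often used for matching tags)
  - 1.4 `(?P<name>)` Name a group
  - 1.5 `(?P=name)` Match the string that the group named name matched
  - Example: matching URLs
- 2 Advanced usage
  - 2.1 `search()` Scan the whole string and return the first successful match
  - 2.2 `findall()` Scan from start to end and return every match as a list
  - 2.3 `sub()` `sub(pattern, repl, string, count=0)`
  - 2.4 `split()` `split(pattern, string, maxsplit=0)`
  - 2.5 Summary
- 3 Greedy and non-greedy matching
  - 3.1 Greedy matching (default): match the longest possible string
  - 3.2 Non-greedy matching: match the shortest possible string, marked with `?`
- 4 Raw strings: `r` disables escaping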

1 Grouping

1.1 `|` Match either the left or the right expression (common)


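```python
import re

res = re.match("abc|def", "def")  # if the left alternative does not match, the right one is tried
print(res.group())
res1 = re.match(r".|\d", "1234")  # "." already matches "1", so the left alternative is used
print(res1.group())
```

Output:

```
def
1
```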
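A minimal sketch of two properties of `|` that the examples above rely on. First, `|` has the lowest precedence, so `abc|def` means "abc" or "def", not "ab, then c or d, then ef"; parentheses are needed to limit it. Second, alternatives are tried from left to right and the first one that succeeds is kept, even if a later one would match more. `re.fullmatch()` (require the whole string to match) is a standard `re` function that the original text does not cover; it is used here only to make the precedence visible.

```python
import re

# | splits the whole pattern: "abc" OR "def"
whole = re.fullmatch(r"abc|def", "abcef")
# Parentheses limit | to "c" or "d" inside the pattern
grouped = re.fullmatch(r"ab(c|d)ef", "abcef")
print(whole)
print(grouped.group())

# Alternatives are tried left to right; the first success is kept
first = re.match(r"a|ab", "ab")
print(first.group())
```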

1.2 `(ab)` Treat the characters in parentheses as one group (common)


```python
import re
res = re.match(r"\w*@(163|126|qq|gmail)\.com", "163@163.com")  # "\." matches a literal dot
print(res.group())
```

Output:

```
163@163.com
```
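Besides grouping the alternatives, the parentheses also capture the text they matched. A minimal sketch, using the made-up addresses `163@qq.com` and `163@outlook.com`: `group(0)` (the same as `group()`) is the whole match, `group(1)` is the text of the first group, and a domain that is not one of the alternatives makes `match()` return `None`.

```python
import re

pattern = r"\w*@(163|126|qq|gmail)\.com"

m = re.match(pattern, "163@qq.com")
print(m.group(0))  # whole match, same as m.group()
print(m.group(1))  # text captured by the first (...) group

# "outlook" is not one of the alternatives, so there is no match
miss = re.match(pattern, "163@outlook.com")
print(miss)
```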

1.3 `\num` Match the string that group num matched (often used for matching tags)


```python
import re

# Without a raw string, every backslash has to be doubled:
# res = re.match("<(\\w*)>\\w*</\\1>", "<html>login</html>")
res = re.match(r"<(\w*)>\w*</\1>", "<html>login</html>")  # r: raw string, backslashes are not escape characters
print(res.group())
res1 = re.match(r"<(\w*)><(\w*)>\w*</\2></\1>", "<html><body>login</body></html>")
print(res1.group())
```

Output:

```
<html>login</html>
<html><body>login</body></html>
```
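The point of `\1` is that it must repeat the exact text group 1 captured, not just any text that `\w*` could match. A minimal sketch, with a made-up mismatched input `<html>login</body>`:

```python
import re

pattern = r"<(\w*)>\w*</\1>"

ok = re.match(pattern, "<html>login</html>")
bad = re.match(pattern, "<html>login</body>")  # closing tag differs from the opening one
print(ok.group(1))
print(bad)
```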

1.4 `(?P<name>)` Name a group

1.5 `(?P=name)` Match the string that the group named name matched


```python
import re
# (?P<L1>...) names a group; (?P=L1) must repeat the text that group captured
res = re.match(r"<(?P<L1>\w*)><(?P<L2>\w*)>\w*</(?P=L2)></(?P=L1)>", "<html><body>login</body></html>")
print(res.group())
```

Output:

```
<html><body>login</body></html>
```
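Named groups can also be read back by name. A minimal sketch using `group(name)` and `groupdict()`, both standard methods of Python's Match object that the original text does not mention:

```python
import re

res = re.match(r"<(?P<L1>\w*)><(?P<L2>\w*)>\w*</(?P=L2)></(?P=L1)>",
               "<html><body>login</body></html>")
print(res.group("L1"))   # same text as res.group(1)
print(res.group("L2"))   # same text as res.group(2)
print(res.groupdict())   # {name: captured text} for every named group
```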

Example: matching URLs


```python
import re

li = ["www.google.com", "www.python.org", "www.jetbrains.com", "www.bearbelly.uk"]
for i in li:
    res = re.match(r"www\.\w*\.(com|org)", i)
    if res:
        print(res.group())
    else:
        print(f"{i} is not a valid URL")
```

Output:

```
www.google.com
www.python.org
www.jetbrains.com
www.bearbelly.uk is not a valid URL
```

❗ Note what `?:` does

Use a non-capturing group `(?: )` to group alternatives without creating an extra captured group.


```python
import re
# The outer group captures; (?:com|org) only groups the alternatives
res = re.match(r"(www\.\w*\.(?:com|org))", "www.google.com")
print(res.group())
```

Output:

```
www.google.com
```

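⚠️ Note the difference between `res.group()` and `res.groups()`: `group()` returns the whole matched string, while `groups()` returns a tuple with the text of every capturing group.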
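A minimal sketch comparing the two on the URL pattern above: with a capturing inner group, `groups()` contains both groups; with `(?:com|org)`, only the outer group remains.

```python
import re

url = "www.google.com"

capturing = re.match(r"(www\.\w*\.(com|org))", url)
print(capturing.group())    # the whole match as one string
print(capturing.groups())   # a tuple with one entry per capturing group

non_capturing = re.match(r"(www\.\w*\.(?:com|org))", url)
print(non_capturing.groups())  # (?:...) adds no entry
```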

2 Advanced usage

2.1 `search()` Scan the whole string and return the first successful match

💡 Returns `None` if nothing matches


```python
import re
res = re.search(r"\d", "Antonio1")  # re.match would return None here: the string does not start with a digit
print(res.group())
```

Output:

```
1
```
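Because `search()` returns `None` when nothing matches, calling `.group()` on the result of a failed search raises `AttributeError`. A minimal sketch of checking the result first; `first_digit` is a hypothetical helper name used only for illustration.

```python
import re


def first_digit(text):
    """Return the first digit in text, or None if there is none (hypothetical helper)."""
    m = re.search(r"\d", text)
    # m is None when nothing matched, so check it before calling .group()
    return m.group() if m else None


print(first_digit("Antonio1"))
print(first_digit("Antonio"))
```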

2.2 `findall()` Scan from start to end, find every match, and return them as a list


```python
import re
res = re.findall("i", "Victoria")
print(res)  # findall returns a list directly, so no group() call is needed
print(type(res))
```

Output:

```
['i', 'i']
<class 'list'>
```
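`findall()` changes what it returns once the pattern contains capturing groups, which is easy to trip over with patterns like the URL example in section 1. A minimal sketch of the documented behavior of Python's `re.findall`, on a made-up string: no groups gives whole matches, one group gives only that group's text, and several groups give tuples.

```python
import re

text = "www.google.com www.python.org"

whole = re.findall(r"www\.\w*\.(?:com|org)", text)    # no capturing group: whole matches
one = re.findall(r"www\.\w*\.(com|org)", text)        # one group: only that group's text
several = re.findall(r"www\.(\w*)\.(com|org)", text)  # several groups: one tuple per match
print(whole)
print(one)
print(several)
```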

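2.3 `sub()` `sub(pattern, repl, string, count=0)`

- pattern: the regular expression describing the text to replace (the old content in the string)
- repl: the new content
- string: the string to operate on
- count: the maximum number of replacements; the default `0` replaces every match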


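```python
import re

res = re.sub("sleeping", "working", "I am sleeping, sleeping.")  # replaces every match by default
print(res)
res1 = re.sub("sleeping", "working", "I am sleeping, sleeping.", count=1)  # limit the number of replacements
print(res1)
res2 = re.sub(r"\d", "2", "Today is day 30 of the month", count=1)  # without count=1 the result would be "22"
print(res2)
```

Output:

```
I am working, working.
I am working, sleeping.
Today is day 20 of the month
```

Passing `count` as a keyword is the safer habit: passing it positionally is deprecated since Python 3.13.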
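`repl` does not have to be a fixed string. A minimal sketch of two other forms Python's `re.sub` accepts: group references such as `\2` inside `repl`, and a function that receives each Match object and returns the replacement text. The sample strings and the function name `double` are made up for illustration.

```python
import re

# \1 and \2 in repl refer to the groups captured by the pattern
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@example")
print(swapped)


def double(m):
    # Receives each Match object and returns the replacement text
    return str(int(m.group()) * 2)


doubled = re.sub(r"\d+", double, "3 apples and 15 pears")
print(doubled)
```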

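2.4 `split()` `split(pattern, string, maxsplit=0)`

- pattern: the regular expression to split on
- string: the string to split
- maxsplit: the maximum number of splits; the default `0` means no limit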


```python
import re
s = "12,23,34,45,56"
res = re.split(",", s, maxsplit=2)  # split at most twice
print(res)
```

Output:

```
['12', '23', '34,45,56']
```
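Where `re.split()` goes beyond `str.split()` is that the separator is a pattern. A minimal sketch with a made-up string that mixes several separators; the second call shows that, when the pattern contains a capturing group, the separators are kept in the result list.

```python
import re

s = "12, 23;34 ,45|56"
# One pattern covers three separators and any spaces around them
parts = re.split(r"\s*[,;|]\s*", s)
print(parts)

# With a capturing group, the separators are included in the result
with_seps = re.split(r"([,;|])", "a,b;c")
print(with_seps)
```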

2.5 Summary

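- `match()`: matches only at the beginning of the string. On success it returns a Match object, and `group()` extracts the matched text; on failure it returns `None`. It matches at most once.
- `search()`: scans the whole string. On success it returns the first match as a Match object, and `group()` extracts the text; on failure it returns `None`. It matches at most once.
- `findall()`: scans the whole string and returns a list of every match (an empty list if nothing matches). Because the result is a plain list, there is no `group()` method to call.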
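A minimal sketch running the three functions on the same made-up string, to make the differences above concrete:

```python
import re

s = "a1b22c333"
m = re.match(r"\d+", s)        # None: s does not start with a digit
first = re.search(r"\d+", s)   # first run of digits anywhere in s
every = re.findall(r"\d+", s)  # every run of digits, already a list
print(m)
print(first.group())
print(every)
```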

3 Greedy and non-greedy matching

3.1 Greedy matching (default): when a match is possible, match the longest possible string


```python
import re
res = re.match("em*", "emmmmm......")  # m* takes every "m" it can
print(res.group())
```

Output:

```
emmmmm
```

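3.2 Non-greedy matching: when a match is possible, match the shortest possible string; append `?` to a quantifier (`*?`, `+?`, `??`, `{m,n}?`) to make it non-greedy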


```python
import re
res = re.match("em+?", "emmmmm......")  # +? takes as few as possible: here one "m"
print(res.group())
res = re.match("m{2,6}?", "mmmmm")  # {2,6}? takes as few as possible: here two
print(res.group())
```

Output:

```
em
mm
```
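The difference matters most when the same delimiter occurs several times, as with HTML tags. A minimal sketch on a made-up snippet: greedy `.*` runs to the last `>`, while `.*?` stops at the first one.

```python
import re

html = "<b>bold</b> and <i>italic</i>"
greedy = re.match(r"<.*>", html).group()  # .* runs to the last ">"
lazy = re.match(r"<.*?>", html).group()   # .*? stops at the first ">"
tags = re.findall(r"<(.*?)>", html)       # one tag name per match
print(greedy)
print(lazy)
print(tags)
```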

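4 Raw strings: `r` disables escaping

❗ In Python, putting `r` in front of a string literal makes it a raw string, in which backslashes are kept as ordinary characters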


```python
print(r"bear\belly")  # without r, "\b" would become a backspace character
```

Output:

```
bear\belly
```
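The reason `r` matters here is that `\b` in a normal string literal is the backspace character. A minimal sketch comparing the two literals:

```python
normal = "bear\belly"  # \b here becomes a single backspace character
raw = r"bear\belly"    # backslash and "b" stay as two characters
print(len(normal), len(raw))
print(repr(normal))
print(raw == "bear\\belly")
```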

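💡 To match a literal backslash `\` in a regular expression, a normal string needs `"\\\\"` (four backslashes): the regex engine needs `\\`, and each of those two backslashes must be escaped again inside the string literal. With a raw string, `r"\\"` is enough.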


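```python
import re
res = re.match("\\\\", "\\name")  # the literal "\\\\" is the regex \\, which matches one backslash
res1 = re.match(r"\\", r"\name")  # raw strings: the pattern is \\ and the text is \name
print(res.group(), res1.group(), sep="(separator)")
```

Output:

```
\(separator)\
```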
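A minimal sketch confirming that `"\\\\"` and `r"\\"` are the same two-character pattern, and that this pattern matches one literal backslash at a time; the Windows-style path is made up for illustration.

```python
import re

# Both literals produce the same two-character pattern: backslash, backslash
print("\\\\" == r"\\", len(r"\\"))

# The regex engine reads that pattern as one literal backslash
hits = re.findall(r"\\", r"C:\Users\bear")
print(hits, len(hits))
```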

