Published on

Shuffling Regular Languages

Authors

Introduction

Regular languages provide a formalism for describing and reasoning about patterns in strings, making them fundamental to designing compilers. As every regular language can be modeled as a deterministic finite automaton (DFA), they also provide several benchmarks for the running times of algorithms with string input. For instance, the set of all email addresses forms a regular language, showing that (omitting space complexity) we can check if some string 1 is a syntactically correct email address in O(n)\mathcal{O}(n). The following was an assignment problem of mine that deals with a lesser-known theorem identifying a closure property of regular languages.

Theorem.

Let Σ\Sigma be some finite alphabet. For languages AΣA \subseteq \Sigma^* and BΣB \subseteq \Sigma^*, let the shuffling of AA and BB, denoted shuffle(A,B)(A,B), be the language

shuffle(A,B)={wΣw=a1b1akbk where a1akA and b1bkB for some kZ+}\text{shuffle}(A,B)=\{w\in \Sigma^*|w=a_1b_1\ldots a_kb_k\text{ where $a_1\ldots a_k\in A$ and $b_1\ldots b_k\in B$ for some $k\in \mathbb{Z}^+$}\}

The class of regular languages is closed under the shullfing operation.

Solution

A quick note

Usually, when proving closure properties, it's much easier to construct a non-deterministic finite automaton and state its equivalence to a DFA. But, in this case, I found it much more intuitive to construct another DFA using the DFAs for languages AA and BB. To see why, try drawing a picture of two (preferably small) DFAs for two specific regular languages, then add the new transition function δ\delta from the proof below.

Proving the theorem

Proof. Let Σ\Sigma be some finite alphabet. Let AΣA\subseteq \Sigma^* and BΣB\subseteq \Sigma^* be regular languages. Since AA and BB are regular languages, let MA=(QA,Σ,δA,q0A,FA)M_A=(Q_A,\Sigma,\delta_A,q^A_0,F_A) and MB=(QB,Σ,δB,q0B,FB)M_B=(Q_B,\Sigma,\delta_B,q^B_0,F_B) be deterministic finite automata such that L(MA)=A\mathcal{L}(M_A)=A and L(MB)=B\mathcal{L}(M_B)=B. Let M=(Q,Σ,δ,(q0A,q0B,A),F)M=(Q,\Sigma,\delta,(q^A_0,q^B_0,A),F) be a deterministic finite automaton where Q=QA×QB×{A,B}Q=Q_A\times Q_B\times \{A,B\}, F=FA×FB×{A}F=F_A\times F_B\times \{A\}, and δ:Q×ΣQ\delta:Q\times\Sigma\mapsto Q is defined by for all qiAQAq^A_i\in Q_A, qjBQBq^B_j\in Q_B, and wΣw\in \Sigma^*,

δ((qiA,qjB,A),w)=(δA(qiA,w),qjB,B)δ((qiA,qjB,B),w)=(qiA,δB(qjB,w),A)\begin{aligned} \delta((q^A_i,q^B_j,A),w)=(\delta_A(q^A_i,w),q^B_j,B)\\ \delta((q^A_i,q^B_j,B),w)=(q^A_i,\delta_B(q^B_j,w),A) \end{aligned}

Let wΣw\in \Sigma^* and notice that wL(M)w\in \mathcal{L}(M) iff δ^((q0A,q0B,A),w)F\hat{\delta}((q_0^A,q_0^B,A),w)\in F2. Since F=FA×FB×{A}F=F_A\times F_B\times \{A\}, by the definition of δ\delta, we can see that MM rejects ww if ww is of odd length or the empty string. Now, consider the case where ww has even length. That is, w=x1y1xkykw=x_1y_1\ldots x_ky_k for some kZ+k\in \mathbb{Z}^+ where x1xkΣx_1\ldots x_k\in \Sigma^* and y1ykΣy_1\ldots y_k\in \Sigma^*. By induction (here), we can show that, δ^((q0A,q0B,A),w)=(δ^A(q0A,x1xk),δ^B(q0B,y1yk),A)\hat{\delta}((q_0^A,q_0^B,A),w)=(\hat{\delta}_A(q_0^A,x_1\ldots x_k),\hat{\delta}_B(q_0^B,y_1\ldots y_k),A). It follows that wL(M)w\in \mathcal{L}(M) iff δ^A(q0A,x1xk)FA\hat{\delta}_A(q_0^A,x_1\ldots x_k)\in F_A and δ^B(q0B,y1yk)FB\hat{\delta}_B(q_0^B,y_1\ldots y_k)\in F_B. Equivalently, wL(M)w\in \mathcal{L}(M) iff x1xkL(MA)=Ax_1\ldots x_k\in \mathcal{L}(M_A)=A and y1ykL(MB)=By_1\ldots y_k\in \mathcal{L}(M_B)=B. Therefore,

L(M)={wΣw=a1b1akbk where a1akA and b1bkB for some kZ+}\mathcal{L}(M)=\{w\in \Sigma^*|w=a_1b_1\ldots a_kb_k\text{ where $a_1\ldots a_k\in A$ and $b_1\ldots b_k\in B$ for some $k\in \mathbb{Z}^+$}\}

Thus, the class of regular languages is closed under the shuffling operation.

Proving the lemma

In case you're interested.

lemma.

Let kZ+k\in \mathbb{Z}^+. For all x1y1xkykΣx_1y_1\ldots x_ky_k\in \Sigma^*,

δ^((q0A,q0B,A),x1y1xkyk)=(δ^A(q0A,x1xk),δ^B(q0B,y1yk),A)\hat{\delta}((q_0^A,q_0^B,A),x_1y_1\ldots x_ky_k)=(\hat{\delta}_A(q_0^A,x_1\ldots x_k),\hat{\delta}_B(q_0^B,y_1\ldots y_k),A)

Proof.

Base Case: Let k=1k=1

δ^((q0A,q0B,A),x1y1)=δ(δ^((q0A,q0B,A),x1),y1)=δ(δ(δ^((q0A,q0B,A),ϵ),x1),y1)=δ(δ((q0A,q0B,A),x1),y1)=δ((δA(q0A,x1),q0B,B)y1)=(δA(q0A,x1),δB(q0B,y1),A)=(δ^A(q0A,x1),δ^B(q0B,y1),A)\begin{aligned} \hat{\delta}((q_0^A,q_0^B,A),x_1y_1)&=\delta(\hat{\delta}((q_0^A,q_0^B,A),x_1),y_1)\\ &=\delta(\delta(\hat{\delta}((q_0^A,q_0^B,A),\epsilon),x_1),y_1)\\ &=\delta(\delta((q_0^A,q_0^B,A),x_1),y_1)\\ &=\delta((\delta_A(q_0^A,x_1),q_0^B,B)y_1)\\ &=(\delta_A(q_0^A,x_1),\delta_B(q_0^B,y_1),A)\\ &=(\hat{\delta}_A(q_0^A,x_1),\hat{\delta}_B(q_0^B,y_1),A) \end{aligned}

Induction Step: Let kZ+k\in \mathbb{Z}^+ and assume

δ^((q0A,q0B,A),x1y1xkyk)=(δ^A(q0A,x1xk),δ^B(q0B,y1yk),A)(1)\hat{\delta}((q_0^A,q_0^B,A),x_1y_1\ldots x_ky_k)=(\hat{\delta}_A(q_0^A,x_1\ldots x_k),\hat{\delta}_B(q_0^B,y_1\ldots y_k),A)\quad\text{(1)}

Then,

δ^((q0A,q0B,A)x1y1xk+1yk+1)=δ(δ^((q0A,q0B,A)x1y1xkykxk+1),yk+1)=δ(δ(δ^((q0A,q0B,A),x1y1xkyk),xk+1),yk+1)=δ(δ((δ^A(q0A,x1xk),δ^B(q0B,y1yk),A),xk+1),yk+1)(1)=δ((δA(δ^A(q0A,x1xk),xk+1),δ^B(q0B,y1yk),B)yk+1)=δ((δ^A(q0A,x1xk+1),δ^B(q0B,y1yk),B),yk+1)=δ(δ^A(q0A,x1xk+1),δ(δ^B(q0B,y1yk),yk+1),A)=δ(δ^A(q0A,x1xk+1),δ^B(q0B,y1yk+1),A)\begin{aligned} \hat{\delta}((q_0^A,q_0^B,A)x_1y_1\ldots x_{k+1}y_{k+1})&=\delta(\hat{\delta}((q_0^A,q_0^B,A)x_1y_1\ldots x_ky_kx_{k+1}),y_{k+1})\\ &=\delta(\delta(\hat{\delta}((q_0^A,q_0^B,A),x_1y_1\ldots x_ky_k),x_{k+1}),y_{k+1})\\ &=\delta(\delta((\hat{\delta}_A(q_0^A,x_1\ldots x_k),\hat{\delta}_B(q_0^B,y_1\ldots y_k),A),x_{k+1}),y_{k+1})&\text{(1)}\\ &=\delta((\delta_A(\hat{\delta}_A(q_0^A,x_1\ldots x_k),x_{k+1}),\hat{\delta}_B(q_0^B,y_1\ldots y_k),B)y_{k+1})\\ &=\delta((\hat{\delta}_A(q_0^A,x_1\ldots x_{k+1}),\hat{\delta}_B(q_0^B,y_1\ldots y_k),B),y_{k+1})\\ &=\delta(\hat{\delta}_A(q_0^A,x_1\ldots x_{k+1}),\delta(\hat{\delta}_B(q_0^B,y_1\ldots y_k),y_{k+1}),A)\\ &=\delta(\hat{\delta}_A(q_0^A,x_1\ldots x_{k+1}),\hat{\delta}_B(q_0^B,y_1\ldots y_{k+1}),A) \end{aligned}

Footnotes

  1. To create your own RFC 5322 regular expression (email address), see the current Internet Message Format

  2. Find the definition of the extended transition function of a DFA δ^\hat{\delta} here